“Can I have that in English, please?” could well have been the reaction of those pondering a dual-core processor purchase on reading my previous article. In this follow-up we’ll explore a bit more closely the situation confronting programmers when dual or multi-core processors are introduced into desktop PCs.

My previous article, Multicore processors: Tomorrow or Today?, attracted a number of bemused comments. I’ll address some of those first, before continuing with a discussion of programming for multi-core processors.


Clarifying the Queries

Thanks to forum members jwenting and benna for their interesting feedback to the earlier article. Their comments raise the following considerations:

“It may be that GPUs are being developed faster than single-core CPUs, but Moore's law continues to apply, and CPU speed continues to increase. The chip makers have not hit any brick wall, and I'm not sure where you got the idea that they did.”

No, I’m sorry to say that it doesn’t. Intel ceased further development of its Pentium 4 core short of the 4GHz level for good reason: it wasn’t achievable. Current and future developments are focused on adding ‘features’ and extra cores to the processor, so that the unit can perform more work. AMD has pulled up with the Athlon FX-57 processor for the same reason. Advances slowed to a crawl when the 3GHz or equivalent performance level was reached, and have now stopped altogether. We may see, in the future, different processor architectures which are more powerful than what we currently have, but they most certainly won’t be faster versions of the systems we use today. They’ll be a whole new playing field.

The pedal is fully to the metal, and we now have to make the vehicle bigger to get more work done. That’s why we’re at the onset of the multi-core era!

“The real drive behind multicore processors is NOT the games industry…. The real power will come first from large scientific and financial applications, maybe CAD applications, in general things that are often run on multi-CPU machines today.”

The real benefit of multi-core processors will really only arrive when ALL computing applications make use of the available hardware resources in a fundamental load sharing environment. Some more specialized applications, such as the ones mentioned in that comment, have been developed with that environment in mind from the outset. They transfer to a multi-core desktop environment seamlessly, giving even better performance than they enjoyed on the ‘hyperthreaded’ desktop processors we’ve seen previously. But an everyday PC application isn’t built on a framework which can seamlessly make use of multiple processing cores.

“They'll just program OS’s to put that game in one core while the OS merrily steams along in another, thus giving both more room than they have now and increasing performance.”

Such a response to the dilemma might seem to be an answer, but in reality it’s only the most basic of ways in which more than one core can be utilized. Sending one application to one core and another application to a second core might share the load out, but it does nothing to ensure that the sharing occurs in a balanced way. Half of the available processor resource might well be handling only a small percentage of the overall workload with such an approach.


What was David Kirk on about?


When commentators such as Nvidia’s David Kirk claim that we are facing a ‘crisis’, they refer primarily to the way in which load sharing is currently approached. As good as current programming technology is, some of the fundamental work has only been confronted in the past several years. Trainees are taught to tackle problems in a single-threaded, one-dimensional manner, and apportioning the workload becomes a process of queuing work to be performed sequentially. Having code interact with a processor which can act on parallel threads is a rather large paradigm shift, and an exponential increase in complexity. Trainees are currently only introduced to such considerations in post-graduate training, and the fundamental frameworks for code to interact with CPUs in this manner don’t really exist yet; the few frameworks that do exist are rather limited in applicability.

Consider that you have a roomful of people to be put to work at a particular task. Your job in overseeing the task needs to accommodate:

• Splitting the job into small, manageable, non-interacting parts.
• Designating some workers to be ‘job creators’ who work on the individual parts.
• Designating some workers to be ‘job finishers’ who work on the completed individual parts.

Ensuring that this is performed in a way which maximizes the effectiveness of the available workers is ‘load balancing’, and that is the key component of programming for a multi-core environment which currently does not exist in any practically employable form. Parallelism in programming for GPUs (graphics processors) is widespread, because graphics processors are built with parallelism in mind. CPUs (central processors) are not. To make the most effective use of dual or multi-core processors, programming needs to occur with parallelism in mind from the outset. Break the BIG problem down into SMALL units. Have the SMALL units handled concurrently. Have a load balancer managing the overall workflow and keeping the timing right!

To do this, the programmer needs a framework which provides the job queues, job creators, job finishers and load balancer. At present, programmers are some way off having that framework available to them. What is needed, as a programmer friend explained to me recently, is for:

Someone smart to build a framework based on dynamic get/put queues for smaller sub-units of work.

If you get the framework right for task communication around job/task create/solve, then life is a lot simpler. It’s a bit like middleware: once you have broken jobs into smaller tasks, each task, rather than being the sole thing currently running, simply gets placed on a FIFO queue until something is ready to do it. You don’t need to know if you have 1 or 1 million processors – that’s what a load balancer is for. All other parts of the problem either generate work or do the work, whilst the load balancer tells the work creators how ‘big’ a job to create before they place it on the FIFO queue.
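To make that a little more concrete, here is a minimal sketch of what such a dynamic get/put queue might look like in C, assuming POSIX threads. The names (job_t, job_queue_t, put_job, get_job) and the fixed capacity are purely illustrative; this is not code from any existing framework.

#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024   /* illustrative fixed capacity */

/* A 'job' here is just a range [low, high) of some shared array; put_job and
   get_job play the role of the get/put operations described in the text. */
typedef struct {
    size_t low;
    size_t high;
} job_t;

typedef struct job_queue {
    job_t items[QUEUE_CAPACITY];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} job_queue_t;

void queue_init(job_queue_t *q)
{
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* PUT: block until there is room, then append a job to the FIFO queue. */
void put_job(job_queue_t *q, job_t job)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = job;
    q->tail = (q->tail + 1) % QUEUE_CAPACITY;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* GET: block until a job is available, then remove and return it. */
job_t get_job(job_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    job_t job = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return job;
}

Worker threads would then loop on get_job and process whatever range they receive, while whoever is splitting the problem calls put_job.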


* * *

Bottom line:

1. Build task-handling FIFO queues and message handlers - a couple of hundred lines of C.

2. Show programmers how to transform existing algorithms for a parallel machine - say, a 30-page PowerPoint presentation.

3. Write a clever load balancer that takes inputs on how fast your CPU/GPU is and how many pipelines or co-processors it has, and sets the tasks that create work to do it in chunks optimised towards these parameters - between 300 and 2,000 lines of C, depending on how sophisticated you want to be.

What's the hassle?

* * *

So, for instance, a parallel quicksort becomes:

qsort(array[low..high])
{
    /* split the array into two parts: everything below the midpoint is lower
       than it (but likely unsequenced), and the converse is true for numbers
       bigger than the midpoint */
    split(array[low..high], midpoint);

    if (midpoint - low > MIN_PARALLEL_TASK)   /* if the task is too big, split it again and again! */
    {
        qsort(array[low..midpoint]);    /* sort the lower half */
        qsort(array[midpoint..high]);   /* sort the higher half */
    }
    else if (midpoint > low)   /* a sort interval of more than 1 number? */
    {
        PUT(array[low..midpoint]);    /* add to task queue */
        PUT(array[midpoint..high]);   /* add to task queue */
    }
    /* else job is finished */
}

Then, if you have say 8 pipelines, you create 8 worker tasks that read (with, say, a GET function) from the queue that PUT writes to, and in those workers you simply implement the sorting of the chunks queued up in the else branch above.

The load balancer has to monitor put and get queue sizes to work out how big MIN_PARALLEL_TASK should be.

So if you had a million numbers to sort and ten co-processors, MIN_PARALLEL_TASK should be the problem size divided by the number of co-processors, i.e. one million numbers / 10 co-processors = 100,000-element job chunks. Split the array into segments 0..100K, 100K..200K, ..., 900K..1M and let each co-processor do an equal amount of work!
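As a purely illustrative sketch of that sizing rule (the function name and the minimum floor are assumptions of mine, not part of any real load balancer):

#include <stddef.h>

/* Hypothetical helper: give each co-processor roughly one chunk of the
   problem, but never create chunks so small that queueing overhead dominates. */
size_t choose_min_parallel_task(size_t problem_size, size_t num_workers)
{
    size_t chunk = (problem_size + num_workers - 1) / num_workers;  /* round up */
    if (chunk < 1024)    /* arbitrary floor */
        chunk = 1024;
    return chunk;
}

With one million numbers and ten co-processors this gives the 100,000-element chunks described above.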

PUT just adds jobs to the queue; GET takes them off. MASTERS split the job until it’s small enough to be PUT onto the work queue; SLAVES GET the sub-jobs, solve them and write the solution back to a shared memory segment. Each SLAVE process works on a separate memory block with no shared variables.
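As an equally rough sketch of the SLAVE side of that picture, building on the illustrative job_t / get_job queue above (the comparison function and the use of the C library’s qsort on each chunk are my own assumptions):

#include <stddef.h>
#include <stdlib.h>

/* Declarations borrowed from the queue sketch earlier in the article. */
typedef struct { size_t low; size_t high; } job_t;
typedef struct job_queue job_queue_t;        /* defined in the queue sketch */
job_t get_job(job_queue_t *q);

/* Everything one SLAVE needs: the shared queue and the shared array. */
typedef struct {
    job_queue_t *queue;
    int         *data;
} slave_arg_t;

/* Comparison function for the C library's qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* SLAVE: repeatedly GET a sub-job and solve it in place.  Each job covers a
   disjoint block of the array, so no two slaves ever touch the same memory. */
void *slave(void *arg)
{
    slave_arg_t *s = arg;
    for (;;) {
        job_t job = get_job(s->queue);
        if (job.low >= job.high)             /* empty range used as a shutdown signal */
            break;
        qsort(s->data + job.low, job.high - job.low, sizeof(int), cmp_int);
    }
    return NULL;
}

A MASTER would then just carve the array into MIN_PARALLEL_TASK-sized ranges, put_job each one, and finally put_job one empty range per slave so the workers know when to stop.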

That's the basics!

The framework currently doesn’t exist. Whilst it is relatively easily achievable, there aren’t currently enough people with the skills to implement it. If code has been poorly written initially (and a lot of code has) then there is little incentive to port it across. The concepts need to be introduced at an undergraduate level if we are to get very far down that road in the short term.


What does all that waffle mean to me?

The situation is this:

• We are currently being confronted with dual-core desktop processors, and informed that they are the next ‘big thing’ in desktop processing power.
• We currently have graphics processors in many machines which are even more powerful in some respects than the CPUs we have installed.
• Almost all of the software we use is designed with single threaded capability in mind, and cannot make meaningful use of either the extra CPU core or the computing power of the graphics processor.
• It will be quite some way down the track before these hurdles will be overcome.

It places a dilemma on those of us deciding whether to purchase or upgrade our PCs, and that dilemma affects not one but two of the major expense components in the PC. Is it worth the expense of going dual-core at this point in time for application performance? Well, yes, but only if the major use of your PC will be running those tasks for which we’d previously have considered a dual-processor system anyway. Is it worth the expense of going for the latest and greatest in 3D display cards? Well, who really knows? We’ve already heard the stories of how the previous generation of display cards was bottlenecked by the CPU, and CPUs really haven’t become any more capable at running what we run.

Not yet, anyway!

There is no doubt that a dual-core processor is potentially a much more capable unit than a single-core processor, even if the clock speed of each core is lower than that of the single-core unit. But unless that clock speed matches or betters that of the single-core processor, we will effectively be spending extra to get a component which will underperform in most tasks, because we simply don’t have suitable software to run on it!

Tomorrow is most certainly going to remain tomorrow, I’m afraid, for most of us anyway!


All 2 Replies

I don't so much disagree with you as think you are being a bit alarmist about all of this. It's true, dual-core processors are a relatively new technology, and most people don't know how to utilize it effectively yet. So what? This is how new technologies go. I don't understand what you are basing your assertion that "It will be quite some way down the track before these hurdles will be overcome" on.

Even if it were to take a while, say three years, what would be wrong with that? People seem to be getting along just fine with current technology. We may be used to huge exponential advances in chip technology, but in other fields, in which technology growth has long since slowed, the older technologies continue to work. What exactly is the problem? I am not even convinced single-core chips have hit a wall, but if that is the case worse things have happened.

Now, on a slightly different note, I do think multi-core technology is quite interesting. These chips are more analogous to the human brain than the single-core chips, and this could mean big advances. That said, the way your programmer friend describes the current implementation, it is still basically a sequential processor. While the separate cores do work on separate things at the same time, they are not at all specialized, and not nearly as dynamic as the human brain. Still, it is a step forward.

"The real benefit of multi-core processors will really only arrive when ALL computing applications make use of the available hardware resources in a fundamental load sharing environment"

Of course, I was talking about the initial phase in which we find ourselves now rather than the ultimate outcome which will be several years yet.

"Such an response to the dilemma night seem to be an answer but in reality it’s only the most basic of ways in which more than one core can be utilized."

Same thing. And this is the traditional way in which multi-CPU machines have worked for decades.
I programmed such systems back in 1997 during my first full-time job as a programmer. The scheduling was "interesting" to build when you needed a process to run on several CPUs at once while at the same time leaving other CPUs free for other processes. It was especially interesting when your application ran several parallel processes which all accessed the same database, which itself ran as yet another series of parallel processes.
And all that in Cobol with embedded SQL, no Java or other languages with built-in multithreading...

"Half of the available processor resource might well be handling only a small percentage of the overall workload with such an approach"

That all depends on the scheduler and how smart you make it.
The scheduler may well be able to detect that core #3 has only 10% CPU load and give it some tasks to do that are currently running in core #2 which is running hot.

"• Almost all of the software we use is designed with single threaded capability in mind, and cannot make meaningful use of either the extra CPU core or the computing power of the graphics processor."

Wrong. It's typically designed for multithreading in a single-CPU environment (with long-term or non-time-critical tasks being given fewer CPU cycles in order to keep the application responsive).
If that weren't the case you'd not be able to press the print button in your word processor while you're performing a spellcheck for example.
Background processes running inside applications would be unheard of.
What the widespread availability of multi-core CPUs at a reasonable price will bring is a widening of this paradigm, to the point where, for example, a game AI no longer has to timeshare with the display and player-response routines, removing any delay in NPC reactions which is currently due to game loops. (AI loops are typically run at a lower thread priority because of their complexity and their lesser importance to the game experience than responsiveness and graphical detail; this will no longer be necessary.)

"So what? This is how new technologies go. "
Well said. No one can program against these things until they exist (at least as simulations). We're now entering the stage where they do exist, and people will start to program against them.
For now that will mainly be as gimmicks and experiments, since market penetration is not yet high enough to warrant the commercial release of mass-market software which makes full use of these new capabilities.
If consumers en masse put off purchases and wait for software which makes use of those capabilities, the entire technology may disappear into obscurity, or at least be removed from the mainstream.

" I am not even convinced single-core chips have hit a wall, but if that is the case worse things have happened."
Neither am I, but what I do know is that single-core chips using current technology have hit a cost/benefit barrier.
While building larger, more complex, faster ones is possible, the cost increase from doing so now outweighs the performance benefits gained.
New technology may well be found to counter that (as has happened in the past; a few years ago it was thought that faster CPUs were impossible and only multi-CPU machines could save the day), but in the current economic climate in the main areas where such innovation happens (North America and Europe; Asia is still more a production area, doing product development based on technology created elsewhere rather than true innovation at a fundamental level), such a breakthrough is unlikely given the high cost of that kind of development.
