Sunday, August 1, 2010

Parallel Processing Predictions: Part I

I've been thinking about parallel processing as of late. I've coded a task scheduler with a few swappable algorithms in the backend to test performance; watched my programs degrade as the number of cores increased; and learned pretty much everything the "oops, I made a mistake" way.

So, standing atop my pile of errors, I'm going to attempt to predict what will happen (some of it very long term). Some of it we already know - hardware will make a massive shift. The rest - well, let's see if I'm right about any of it in the next decade.

No Change For Most: 99% of applications will remain sequential in nature and never directly deal with threads. This sounds crazy, I know, given the massive push towards multi-core - but it's an unshakable conclusion in my mind.

Rather, libraries will be parallel and asynchronous - making "DIY" algorithms an even sillier affair. As a simple example, consider comparing two strings. There is a window between the moment the request to compare the strings is issued and the moment the result is actually used. If we use something like Grand Central Dispatch, where the threads are already spawned and just waiting for work, then we're talking low-overhead implicit parallelization.
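
To make that concrete, here's a minimal sketch of the pattern, assuming Grand Central Dispatch's C API with blocks (the strcmp is just a stand-in for a heavier operation, and a dispatch group is only one of several ways to wait for the result):

```c
#include <dispatch/dispatch.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *a = "parallel";
    const char *b = "parallel";
    __block int cmp = 0;

    // The "request" runs on a thread already sitting in the global
    // pool - no thread creation, so the overhead stays low.
    dispatch_group_t group = dispatch_group_create();
    dispatch_group_async(group,
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            cmp = strcmp(a, b);
        });

    /* ... other sequential work can run here while the comparison is in flight ... */

    // Block only at the point where the value is actually used.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    printf("strings %s\n", cmp == 0 ? "match" : "differ");

    dispatch_release(group);
    return 0;
}
```

The caller still reads almost like sequential code; the waiting is simply deferred until the result actually matters.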

Essentially, libraries will present a sequential interface. User-land code will not need to know about parallelism - but will take advantage of the hardware when the programmer leverages the underlying frameworks.
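
Here's a sketch of what such a library-level interface might look like, again assuming GCD (buffers_equal is a made-up routine, and splitting the comparison into eight chunks is purely illustrative):

```c
#include <dispatch/dispatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Made-up library routine: the caller sees an ordinary, synchronous
// comparison; internally the work is fanned out over the existing
// thread pool with dispatch_apply (which returns once all chunks finish).
static bool buffers_equal(const char *a, const char *b, size_t len) {
    enum { CHUNKS = 8 };
    bool results[CHUNKS];
    bool *ok = results;               // blocks capture this pointer by value
    size_t chunk = (len + CHUNKS - 1) / CHUNKS;

    dispatch_apply(CHUNKS,
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
        ^(size_t i) {
            size_t start = i * chunk;
            size_t n = (start >= len) ? 0
                     : (start + chunk > len) ? len - start : chunk;
            ok[i] = (n == 0) || memcmp(a + start, b + start, n) == 0;
        });

    for (size_t i = 0; i < CHUNKS; i++)
        if (!results[i]) return false;
    return true;
}

int main(void) {
    const char x[] = "the quick brown fox jumps over the lazy dog";
    const char y[] = "the quick brown fox jumps over the lazy dog";
    printf("equal: %s\n", buffers_equal(x, y, sizeof x) ? "yes" : "no");
    return 0;
}
```

From the caller's side this is indistinguishable from a plain memcmp - which is exactly the point: the parallelism lives inside the library.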

Right about now, some people probably want to remind me that the GUI runs in the main thread while heavy processing is normally pushed onto a secondary thread, with mutexes to coordinate the two. I'll counter by recalling olde code that actually split up the processing so that the GUI event pump could run - even olde code that would, at explicit points, yield execution to the GUI. Yielding only when there's data ready for the GUI is much less messy than having locks in obscure places. Normally, for such things, I have a single lock that synchronizes events coming from the GUI thread with my work thread. The more sequential my code, the easier it is to work with.
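
In GCD terms, that "yield when data is available" pattern looks roughly like the sketch below (update_progress_label is a hypothetical stand-in for the toolkit's refresh call, and dispatch_main stands in for the GUI event pump):

```c
#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for "refresh this widget"; a real application
// would touch actual GUI state here, always on the main thread.
static void update_progress_label(long percent) {
    printf("progress: %ld%%\n", percent);
}

int main(void) {
    // Heavy processing runs on a pooled background thread...
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        for (long step = 0; step <= 100; step += 25) {
            /* ... expensive work for this step ... */

            // ...and results are "yielded" to the GUI only when there is
            // something to show: messages onto the main queue instead of
            // locks in obscure places.
            dispatch_async(dispatch_get_main_queue(), ^{
                update_progress_label(step);
            });
        }
        dispatch_async(dispatch_get_main_queue(), ^{ exit(0); });
    });

    dispatch_main();   // stands in for the GUI event pump
}
```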

Procedural Over Functional: Again, the Haskell fans will most likely disagree. Actually, at times I'd rather write parallel code in Haskell... But let's not get distracted - I believe procedural code will maintain its reign. The reason is simplicity. Everyone understands a TO-DO list, a sequence of instructions, manuals, etc. No manual I know asks you to trace back from the desired result to get to the start of the solution (sometimes I feel like I'm rigging code to have side-effects just to get it to run in functional languages).

Even if languages like Haskell are "better", the rich, parallel underlying libraries of the system will make the entire thing moot.

Software Diets: Back-end system libraries will absorb most of the complexity of software. That is, the OS will provide so much functionality that an application will simply have to connect the pieces. Some specialization might be needed, but very little.

This availability of features in the OS will change the software landscape. Software that does not properly integrate with the OS (which will have a global strategy for multiprocessing, and libraries optimized for that strategy) will simply run slower.

Think of it this way: the OS and its libraries will shift together. As hardware changes, so will the OS-provided libraries. Software using its own libraries will need to adapt, but software using the OS-provided libraries won't.

This raises problems for multi-platform toolkits, as I figure certain functionality will be implemented radically differently on different OSes, even though it accomplishes a similar thing.

Hardware Rollercoaster: See the disconnect between the user-land software and the hardware? Well, that means the underlying hardware can change more drastically. If the OS is the only one that needs to know about the specifics of the hardware, and user-land software just needs to focus on the algorithm, then people shouldn't even notice the insanity going on under the hood.

This implies that something akin to the LLVM project would be used to compile to a machine-independent intermediate representation, with compilation completed on the final hardware.

More Open-Source: Not the year of Linux, but the year of open-source libraries. The added complexity of the underlying libraries will mean that they are more costly to develop. I'm not advocating using developers' spare time to advance a corporation's heavy lifting; rather, the problems will be shared among OS vendors. Open-source is just a way to save money by reducing the amount of redundant work among companies.

If I had to wager, I'd say a BSD-style license - so each vendor can advertise having an upper hand when marketing to consumers.

In the end... We aren't headed towards great doom when it comes to multi-core. Yes, people are panicking now as they realize all the caveats; however, we've known for quite a while that 10% of the code is responsible for 90% of the execution time. That 10% of the code, in most cases, should be part of the OS and its supporting libraries.

Final Ramblings: I referred to Apple's technology since I'm more familiar with it. I'm sure Microsoft is doing something similar - and that the .NET runtime might be better suited to the type of parallelism that I'm describing.

Now that I've predicted that nothing much will change for your average programmer, I'm going to do a part 2 and detail exactly how I see the underlying libraries and hardware shifting. In part 3, I'll explore how the development environment could change.

These predictions aren't final. As I learn more and read more of the results from the scientific community, my predictions may be refined over time. I won't delete these pages; rather, I'll admit my mistakes and document how I erred.

One last thing. As a reader, you must have noticed the lack of references in this document. I don't trust texts without references, and neither should you. This part essentially puts forward the theory that user applications will, in the worst case, just need a recompile and will keep on running using all the available hardware.

All this despite the move to multi-core.
