Hide Threading: Threading should be handled by the underlying API, and that API's threads should block while idle so they aren't squandering CPU resources. The threaded component then consumes CPU only while it is actually being used.
Worker Threads: Have a few dedicated threads for the heavy lifting, and choose the number of threads based upon the number of cores that are available. All other threads should be in a nearly permanently blocked state. This means that your worker threads won't have to compete with the other threads for CPU time.
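As a rough sketch of the worker-thread tip, here is one way to size a pool from the core count in Python. The function `heavy_lifting` and its inputs are made up for illustration; only `os.cpu_count()` and `ThreadPoolExecutor` come from the standard library.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def heavy_lifting(n):
    """Stand-in for CPU-heavy work (hypothetical): sum of squares below n."""
    return sum(i * i for i in range(n))

# os.cpu_count() may return None, so fall back to a single worker.
worker_count = os.cpu_count() or 1

# One pool of dedicated workers, sized to the machine rather than hard-coded.
with ThreadPoolExecutor(max_workers=worker_count) as pool:
    results = list(pool.map(heavy_lifting, [10, 100, 1000]))
```

One caveat: in standard CPython builds the global interpreter lock keeps threads from running Python bytecode in parallel, so for truly CPU-bound work a `ProcessPoolExecutor` sized the same way is probably the closer fit.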
Document: Write down what is thread-safe and what isn't. Knowing what's thread-safe is critical to writing code that won't fail in the long run.
Queues: Use a queue structure to send messages between threads. That way a thread doesn't have to block just because another thread hasn't replied yet.
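A minimal sketch of the queue idea, using Python's standard `queue.Queue`. The message strings and the use of `None` as a shutdown signal are assumptions made for the example.

```python
import queue
import threading

inbox = queue.Queue()   # messages travelling from producer to consumer
received = []

def consumer():
    while True:
        message = inbox.get()   # blocks without burning CPU while idle
        if message is None:     # None is our (assumed) shutdown signal
            break
        received.append(message.upper())

worker = threading.Thread(target=consumer)
worker.start()

# The producer hands off its messages and moves on; put() on an unbounded
# queue returns right away even if the consumer is still busy.
for text in ["load", "parse", "render"]:
    inbox.put(text)
inbox.put(None)

worker.join()
```

The consumer also happens to show the "blocked when idle" behaviour: while the queue is empty it simply waits inside `get()`.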
Cache: Be aware of it, and write code that expects the CPU to have a cache. Programs written in Java as well as in C will benefit from being aware of the cache and the underlying structure of the system's memory. For multi-core processors, this can be the source of a speed boost (e.g., the L2 cache is shared among cores on Core 2 Duos).
Compress: Do you really need all that data to travel across the bus? Minimize data transfer when possible by reducing the number of bits needed to represent the data.
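To make the compression tip concrete, here is a small sketch. It assumes a made-up list of readings that all fit in the range 0..255, so each one needs 8 bits rather than the 64 bits of a general-purpose integer.

```python
import struct

readings = [0, 17, 128, 255, 42]   # hypothetical values, all within 0..255

# Naive layout: every value stored as a 64-bit integer (8 bytes each).
wide = struct.pack(f"<{len(readings)}q", *readings)

# Compact layout: one byte per value is enough for this range.
narrow = bytes(readings)

# Nothing is lost: the original values come straight back out.
restored = list(narrow)
```

Here `wide` is 40 bytes and `narrow` is 5, so the same information crosses the bus in an eighth of the space. Whether that pays off in a real program is something to measure.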
Benchmark: You don't know the ultimate gain in speed until you measure it. Test your theories -- premature optimization may actually hurt the performance of the application. Saturating the bus degrades performance, btw.
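A quick sketch of testing a theory instead of trusting it, using the standard `timeit` module. The two string-building functions are hypothetical candidates chosen just to have something to compare.

```python
import timeit

def join_with_plus(words):
    out = ""
    for w in words:
        out += w
    return out

def join_with_join(words):
    return "".join(words)

words = ["x"] * 1000

# Check that both candidates agree before comparing their speed.
assert join_with_plus(words) == join_with_join(words)

for candidate in (join_with_plus, join_with_join):
    # Take the best of three runs to reduce noise from other processes.
    seconds = min(timeit.repeat(lambda: candidate(words), number=200, repeat=3))
    print(f"{candidate.__name__}: {seconds:.4f} s for 200 runs")
```

The numbers will differ from machine to machine, which is the point: run it where the code will actually live.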
Use What You Need: If a single core gets the job done, then use a single core. Distributing a problem has overhead, in programmer time as well as in execution speed.
Have a Game Plan: The most important part. Know how things will be structured overall. If you can't see the whole structure in your mind, consider alternatives, as the design might be too complicated.
Extra thoughts as of February 24, 2015:
Tasks: Use task managers to handle multi-threading when possible. For example, Grand Central Dispatch in OS X automatically manages the number of threads based upon the CPU available across all applications.
Task Stealing: Describing work as tasks, a series of interdependent operations based on data transformations, lets each core focus on recently touched data. Task stealing, if I recall, is quite efficient: an idle core steals tasks from those assigned to other cores. Look at Cilk and Intel Threading Building Blocks as concrete implementations.
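The stealing policy can be sketched as a toy, single-threaded simulation. This is not how Cilk or Threading Building Blocks are implemented; the `Worker` class and `run` function are hypothetical and only illustrate the commonly described arrangement where an owner takes its newest task from one end of a deque while an idle thief takes the oldest task from the other end.

```python
from collections import deque

class Worker:
    """Hypothetical worker holding its own deque of tasks."""
    def __init__(self, name, tasks):
        self.name = name
        self.tasks = deque(tasks)
        self.done = []

    def next_task(self, others):
        if self.tasks:
            # Own work: newest first, since its data was touched most recently.
            return self.tasks.pop()
        for victim in others:
            if victim.tasks:
                # Idle: steal the oldest task from the opposite end.
                return victim.tasks.popleft()
        return None

def run(workers):
    # Round-robin simulation: each worker takes one task per round.
    progress = True
    while progress:
        progress = False
        for w in workers:
            task = w.next_task([o for o in workers if o is not w])
            if task is not None:
                w.done.append(task)
                progress = True

busy = Worker("busy", ["a", "b", "c", "d"])
idle = Worker("idle", [])
run([busy, idle])
```

After the run, `busy.done` is `["d", "c"]` and `idle.done` is `["a", "b"]`: the idle worker picked up half the load without anyone assigning it explicitly.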
Asynchronous: I believe the OpenGL API provides an example of what to strive for in terms of APIs. Writing serial code and logic is easy; parallel is hard. Shouldn't the API run in parallel while giving the illusion of sequential execution?
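Here is a rough sketch of what such an API could look like. The `Renderer` class and its `draw`, `finish`, and `close` methods are invented for illustration and are not OpenGL calls; the idea is only that commands return immediately, execute on a hidden thread in the order they were issued, and the caller blocks only when asking for completion (similar in spirit to `glFinish`).

```python
from concurrent.futures import ThreadPoolExecutor

class Renderer:
    """Hypothetical API: calls return at once, work runs on a hidden thread."""

    def __init__(self):
        # A single hidden worker executes commands in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []
        self.log = []

    def draw(self, shape):
        # Queue the command and return immediately; the caller never sees a thread.
        self._pending.append(self._pool.submit(self.log.append, shape))

    def finish(self):
        # Block until every queued command has completed.
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self):
        self._pool.shutdown()

renderer = Renderer()
renderer.draw("triangle")
renderer.draw("square")
renderer.finish()
renderer.close()
```

The calling code reads like plain sequential logic, yet the work itself happens off the caller's thread.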