I have just spent a bit more quality time with a PS3 running Linux (I opted not to update the firmware). While coding, I realized one thing: spe_context_run is extremely slow. My programming strategy was to use the PPE to quickly assign tasks to the SPUs in a way that would minimize cache misses.
I'll step back a bit. The PPU runs a simple piece of code that decides what each SPU should run next once its current task completes. The choice depends on what data is already loaded into that SPE's local store, what data is already on the Cell but sitting in another SPE's local store, and what has to be uploaded. The problem is that I can't call spe_context_run too often, or else the application becomes too slow. I got the best performance by doing the same amount of work and the same DMA data transfers while minimizing the number of calls to spe_context_run.
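To make that placement decision concrete, here is a minimal sketch in Python (not Cell code). The three cost constants and the names transfer_cost and pick_spe are hypothetical, invented for illustration; only their ordering (local is cheapest, upload is most expensive) reflects the reasoning above.

```python
# Hypothetical relative costs of making one block of data available to an SPE.
COST_LOCAL = 0   # block is already in this SPE's local store
COST_REMOTE = 1  # block is in another SPE's local store
COST_UPLOAD = 4  # block has to be uploaded


def transfer_cost(block, spe, resident):
    """Cost of getting `block` onto `spe`.

    `resident` maps each SPE id to the set of blocks in its local store.
    """
    if block in resident[spe]:
        return COST_LOCAL
    if any(block in blocks for other, blocks in resident.items() if other != spe):
        return COST_REMOTE
    return COST_UPLOAD


def pick_spe(task_blocks, idle_spes, resident):
    """Return the idle SPE with the lowest total transfer cost for a task."""
    return min(
        idle_spes,
        key=lambda spe: sum(transfer_cost(b, spe, resident) for b in task_blocks),
    )


# Example: three SPEs, two of which already hold some data.
resident = {0: {"a", "b"}, 1: {"c"}, 2: set()}
best = pick_spe(["c", "d"], [0, 1, 2], resident)
```

In this example SPE 1 wins because it already holds block "c" and only "d" has to be uploaded.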
What does this mean? The SPE should be treated as an independent machine that so happens to share memory with other SPEs and the PPE. It should be given a big list of tasks that it can work on without any support from the PPE.
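A toy cost model shows why batching helps. This is only a sketch: total_time, launch_overhead and work_per_task are made-up names, and the numbers are in arbitrary units rather than anything measured on the PS3.

```python
def total_time(n_tasks, batch_size, launch_overhead, work_per_task):
    """Time to run n_tasks when each launch processes up to batch_size tasks.

    The work itself is the same in every case; only the number of
    launches (and so the total launch overhead) changes.
    """
    launches = -(-n_tasks // batch_size)  # ceiling division
    return launches * launch_overhead + n_tasks * work_per_task


# Arbitrary units: a launch costs 10x as much as one task's worth of work.
one_task_per_launch = total_time(100, 1, launch_overhead=10, work_per_task=1)
one_big_list = total_time(100, 100, launch_overhead=10, work_per_task=1)
```

With these assumed numbers, launching once per task spends far more time on launch overhead than on the work, while handing over one big list pays the overhead only once.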
Why is this a challenge? I'm integrating the equations of fluid flow, and strictly speaking the SPEs should communicate with each other to exchange boundary conditions. Instead, I'm betting that the boundary conditions only need to be worried about at the change of frame. Why? If I can run the simulation sufficiently fast (80+ fps), then my calculations say that this little cheat will not be noticed by the user of the system.
Numerical modelling for my claim? Consider the CFL condition. Let's say the grid spacing is 1. Then the maximum velocity should be about 1 grid cell per pass of a finite-differencing-based advection scheme. I'd suggest using forward and backward finite differencing for integration rather than central differencing, as central differencing will have trouble with sharp edges. Anyhow, we have a maximum velocity of 1/timestep. At 80 fps, with one pass per frame and one grid cell per pixel, that's about 80 pixels per second that data can travel across the grid.
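For reference, here is a minimal 1D sketch of one-sided differencing with the CFL check built in. It is an illustration in plain Python, not the SPE code. It assumes that "forward and backward" means a first-order upwind scheme (backward difference for positive velocity, forward for negative), a constant velocity u, and periodic boundaries; upwind_step is a hypothetical name.

```python
def upwind_step(q, u, dt, dx=1.0):
    """Advance q_t + u * q_x = 0 by one timestep with first-order upwind
    differencing on a periodic 1D grid. Returns a new list."""
    c = u * dt / dx  # Courant number: grid cells travelled per step
    if abs(c) > 1.0:
        raise ValueError("CFL condition violated: |u| * dt / dx must be <= 1")
    n = len(q)
    if u >= 0:
        # Flow moves toward higher indices: backward difference.
        # q[i - 1] wraps to the last cell when i == 0 (periodic boundary).
        return [q[i] - c * (q[i] - q[i - 1]) for i in range(n)]
    # Flow moves toward lower indices: forward difference.
    return [q[i] - c * (q[(i + 1) % n] - q[i]) for i in range(n)]


# At the CFL limit (one cell per step) a sharp pulse shifts by exactly one cell.
shifted = upwind_step([0, 1, 0, 0], u=1.0, dt=1.0)
```

Below the CFL limit the same scheme smears the sharp edge but does not introduce the oscillations that central differencing tends to produce there.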
That is sufficient for my purposes. For a 1024x1024 grid, it would take 12.8 seconds (1024 cells at 80 cells per second) for something to travel across it. Projected onto a sufficiently large surface (not a monitor), the fluid will feel to the user like it is moving at a brisk pace.
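The arithmetic behind those numbers, as a small sketch. The function names are made up, and it assumes one advection pass per frame and a Courant number of 1 (data moves at most one grid cell per pass).

```python
def max_speed_cells_per_second(fps, dx=1.0, courant=1.0):
    """Fastest that data can cross the grid, with one advection pass per frame."""
    return courant * dx * fps


def crossing_time_seconds(grid_width, fps, dx=1.0, courant=1.0):
    """Time for data moving at the CFL limit to cross grid_width cells."""
    return grid_width * dx / max_speed_cells_per_second(fps, dx, courant)


speed = max_speed_cells_per_second(80)        # 80 cells (pixels) per second
crossing = crossing_time_seconds(1024, 80)    # about 12.8 seconds
```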
Maybe a Lagrangian method would be better. Or even a finite-element method. I've already tuned my code for Eulerian grids... unfortunately. I'll build something better in the next development cycle.