Friday, February 20, 2015

Long live the Cell Processor

Talking with a former coworker yesterday, something dawned upon me.  I have already heard allusions from fellow students years ago to this idea.  The GPU is becoming the Cell.  The exposed API may be different, but the seeds are there.  Let's look at the details.

First, we must get over the concept of 'Wavefront', otherwise known as 'Warp'.  A wavefront is a fancy way of saying that you have multiple threads sharing the same instruction pointer.  In other words, it is akin to multiple people following the same to-do list, with one person telling them what line on the to-do list to work on.

Consequentially, if there is a branch, and two threads within a warp diverge, then both branches must be executed.  Some GPUs will fall back to executing each thread in the warp in sequence leading to a sixteenth of the potential performance.

Each wavefront may have its own personal store, known as thread local storage.  This fast memory is used for accumulating data.

Second, a few preliminaries on the Cell.  It had a PowerPC processor, with 8 SPU - synergistic processing units.  SPUs each could be allocated by the application, they could have programs compiled for them uploaded to them, and could communicate between each other using a super-fast ring network in the CPU.

Each SPU in the Cell, 8 total, had about 128 kilobytes of L1.  The SPUs accessed the L1 directly.  This is crucial to understand the Cell, the SPUs exposed their L1 as memory.  Pointers pointed into the L1.  The L1 is akin to the thread local storage on the GPUs.  As for warps, think of them as operating logically on vectors.  And SPUs where vector processors, with 4 floats per vector opposed to the 16 or more on GPUs.  In this sense, they both take similar algorithms.

Where the two differ is in memory access.  Cell went with the idea that the developer should be responsible for all low-level details - if everything on this hardware is going to be optimized, might as well expose all details.  Logically follows, the programmer is responsible to call asynchronous DMA functions to and from main memory to fill small chunks of L1 not used by the application currently loaded to the SPU.

On the other hand, the GPU, when it stalls for data, will run a different set of threads.  It works based on the assumption that there are a ton of threads, and it is less expensive, and easier for the developer, if one that is ready to run takes over.  That means, though, we have an issue where memory allocated per warp must be shared by all the threads that run in that warp.  The less memory used, the more backup work there is for one a thread stalls.  Hence why you don't want to use too many uniforms in OpenGL.  (There are more details, but this is the essential.)

The question is, given memory bandwidth is the issue with modern GPUs, will we see a rise of handing control over how data is transferred to and from the GPU to the developer?  And by then, will we not have simply gone back and gotten cell-like hardware?

Monday, February 16, 2015

Super-Scalar Execution for Software

Update: Looking through the C++11 changes, there is something that I have completely missed.  std::future and std::async (or another std function that may spawn threads) which provide a bare-bones way of doing plenty of operations asynchronously within the existing standard.  This is a step towards what I describe below -- and devised long before I ranted.

Multiple processors are the new norm, and there is nothing we can do about it.  We have to bite the bullet and add threads to our software.  A few mutexes here, signals there, and before you know it we have race conditions and stalls.

There is something to be said about sequential execution of code and how legible it is.  It is a reason why we have exception handling - the common code runs through an efficient path, and the corner cases which are rare go through the stack-unwinding path.  Simple.  Easy

But what about, say, multi-threaded code.  You now have multiple instruction pointers and things dancing all over the place.  It is, for lack of a better word, a mess.

Let me elaborate.  Mutexes are terrible since they stall threads.  They act as bottle-necks preventing parallel execution.  The more code within a mutex, the less parallelism you end up with.  Task-schedulers fragment what would be linear code into a tree of potentially parallel functions.  Again, try to undersand the algorithm in that mess.

Then, it dawns upon me.  Your modern CPU does something known as super-scalar execution.  Fancy term to say it knows the dependencies among instructions and can execute, within reasonable limits, code out of order.  It can fuse common instructions into one, or it can split complex instructions into many.  For lack of a better word, it rewrites and parallelizes code among its various execution units - and that with each and every piece of linear software -- single thread that is all.

Your compiler knows that this happens, and will often enough try to re-organize code such that the various execution units are always fed with something.  It might do something that appears suboptimal, but which runs faster, since the target processor can run that instruction in parallel.

Back to the point: linear code is easy.  Is there a way to parallelize our linear code?  Doing so automatically is an exercise in frustration since the key to making something happen in parallel is the data.  Data has to be operated on, for lack of better words, atomically.  But the CPU works around this.

How?  Each operation takes up an execution unit.  An execution unit is like a thread of execution waiting to be used.  Each thread of execution will operate for a known amount of time (cycles) for each operation.  Can we replicate this pattern with regular code?

Let me propose the idea of a "Super Scalar API", where we have plenty of worker threads awaiting commands to run.  We become completely data-driven, in that the worker threads will distribute work based on data chunks and not around the series of instructions that must be followed for execution to function.

But now we have this issue: we need a way for the API to know of the data, and for the code to deal with the sequence of events.

First, we need the concept of a pipeline.  Running our sequential code should fill this pipeline with instructions and data.  Then, this pipeline should be able to look at what is being done, and assign work to the appropriate worker threads.  Yes, we do have two points of serial execution (logic + pipeline), but I expect that the brunt of the work should occur on a worker thread.

Second, we need to know what these worker threads would support.  If we know the type of operations that applications do so frequently, then would could make them happen out of order.  Think about us making the API act synchronously, but in reality it is asynchronous.  Again, not unheard of.  OpenGL, for the longest time, worked like this.    STL-like functionality would be the prime candidate; you need to do some operation on some data on some container.  Either as reduction, or as accumulation.  As you can imagine, mutability plays a massive role in this.

Third, and lastly, we need to deal with conditions asynchronously.  If (result of previous operation) needs a better solution.  And, thanks to lambdas, we can do something very nifty.  Define our own If (with a capital I) that captures the condition and feeds it rhough to the pipeline.  Sequential execution illusion is not broken, and we get parallel execution out of it.

Finally, the problem will be there is always a way to have too many interdependencies to make parallelism less possible.  Embarassingly sequential.  But I would argue that if the containers are properly built, then over time parallelization of all applications built upon the abstractions can be improved by seeing how they are used.

And there you have it.  Super-scalar execution as the hero to make massive parallelism possible.  Say, simpler.  Well, that is my thinking.

Saturday, February 14, 2015

Internet Filtering for Social Good

Having one of my goals in the next year is (again) to start a web comic; research has become essential.  My life experience is, frankly, stale.  If I were to let my imagination alone dictate what to write, it would fall flat - become something even I wouldn't like to read.

Note: Opposed to previous comics, I'm looking at drawing something that takes itself as irreverent.  When the piece is too serious on first glance, people fail to see humour.  Being ridiculous doesn't stop me from trying to bring up some sort of point.

Note 2: This post is unrelated to technology, with which I'm more well versed.  It is a slight distraction which I found interesting to research.

Enter political commentary / editorial cartoons; a fine form of expression which limits the story to a few panels and without any obligation to recycle characters.  Better yet, grounded in reality to the point that I don't see employers even caring (consider someone in modern times making a life's work / novel / sculpture / game -- for many people they can't as if they do, they don't own it and it won't come out in any shape or form).

Oddly enough, I'll be link-bombing what I believe to be respectable web-sites (whose contents is not my responsibility); with my opinion of what is in the linked articles (so read the articles to get the whole picture):


UK Tories demand a "ban this terrorist filth" button for the ...
Boing-Boing brings up this interesting factoid.  Wikipedia got swept up and blocked for posting an album cover (that was banned).  This goes hand-in-hand with an idea for a strip where questionable organizations would upload dubious material to publicly editable sites to have them banned.  More as an "haha", internet trolls.


David Cameron pledges anti-terror law for internet after ...
The Guardian has a nice piece describing how the Charlie Hebdo attacks have sparked the ideal of being able to access all communications over the Internet.  There is increased ability to track users via IP.  Yet - you just need a few people good with stenography to hide communications in seemingly innocuous places.  In other words, it would be a game of cat and mouse as communications become hidden in more clever ways masked as your routine day-to-day Internet traffic.  Alternatively, if people were to send a letters encrypted with a one-time pads -- wouldn't we have the same issue but with regular mail?


Despite Ban, YouTube Is Still a Hotbed of Terrorist Group ...
FastCompany.  I don't know them.  I post the link since all the videos have already been taken down.  They describe how "promotional" videos are disseminated to video sharing sites.  It is interesting as I initially assumed the issue was mainly point-to-point communication and nothing so overt.  Bad me.


Why Tolerate Terrorist Publications?
From the New-York Times.  And I'm surprised.  Really.  It is an opinion piece about an online magazine available to the general public describing all sorts of ways to do "bad things"(tm) at airports and whatnot and questions if the American ideal of free speech is proper in such cases.  Sure, a well-written, impassioned article.  Why am I surprised?  The author didn't filter himself.  He states the name of the magazine.  No links, but anyone could probably Google it and find it.


In India, is web censorship justified in the name of national security?
From PBS, this one is a bit more nefarious I find.  Github was blocked.  Completely.  As was The Internet Archive, Vimeo, and other sites.  Notice that these sites are all driven by content not under the control of the creators.  Just so happens they can be given or mirror content that is undesirable to be shared.  Github I used often enough to say it can be essential.  The Internet Archive has the WayBack machine, which I used to see how things were.  This article, essentially, goes full-circle to the idea that a bunch of jokers could have sites driven by user-content banned in certain places by uploading material which governments determine is bad for public consumption.


Farewell Magna Carta: the Counter-Terrorism and Security Bill
Published by OpenDemocracy.  My conscience tells me that I should say they are biased towards the left.  However, from what they say, it is quite eery.  Literally scary.  More so than the taking of passports based on suspicion.  That is of reporting all suspicious activity / people.  And that it typically targets people of a certain faith.  I heard that Germany once used this tactic...  Moving on...


'Terror has no religion': MP urges ban on phrase 'Islamic terrorist' in ...
From RT.  Said by a Russian MP.  Title says it all.


Australian broadcaster bans 'unsuitable' Peppa Pig episode ...
From DailyMail.co.uk - I don't think we're trying anymore.  It's easier to ban.  Ban the children's comic!  I guess that's what happens when television raises kids.  (Yes, I realize everything in Australia, including spiders, are deadly.  I think we should ban Charlotte's Web in that country.)


Russia Moves To Ban Tor And Anonymous Web...
From Vocative (hope Google News only brings up good sources), seems like leading a political campaign is a crime in Russia.  There's a little screenshot of Twitter at the bottom saying that the site of the opposition party was blocked.  Interesting - especially when combined with "Terror has no religion".


China cracks down on 'vulgar culture' of web pseudonyms
From the BBC.  I don't know what to make of this.  In a sense, people should publicly stand behind their convictions.  But at what point must people be able to put out subversive messages against the governing bodies?  They aren't run by deities or people impossible to corrupt.


'Islamist terror threat' shuts down largest festival in northern Germany
From Haaretz, found on Reddit's WorldNews subreddit.  Title says it all.  Why just filter data on the Internet when you can filter events off the Internet as well?


A quick perusing for blocking of sites (wholesale) in America led to no results.  I'll keep an eye out, but the links above have been quite informative.

Wednesday, February 11, 2015

Swift + Spritekit Prototype Result

My latest prototype uses two core technologies: Swift and Spritekit.  This was an experiment to see how ready the language and games toolkit were for making games, quickly.

A caveat that must be repeated: I spent more, I mean significantly more, time with Photoshop.  And that tool has served me well.  A reminder that code is but just a small portion of what is a game.

Now back to the code.  Overall, I find Swift to be a joy to work in when it works.  A few times did I see the syntax highlighting crash and restart (it's nice it didn't bring down XCode).  Spritekit does what it is good at, displaying images from a generated atlas.

90% of the time I was in coding bliss.  The let for constants just forces me to be explicit about what is constant.

Now for the pitfalls.  The 10%.

Spritekit does not have an innate ability to display subsequent frames of animation.  I can do it in code, but it's something I'd expect the framework to handle and not the game.

Sure, we do have this nifty editor to set up a scene, but I can't create empty sprites and save them in the editor.  Rather, load a scene and rip stuff out?  The editor is really there for setting up quick and dirty levels.  I was trying to use it to build a character composed of various independently anchored and animated elements.  Not going to happen with what Xcode provides as selecting what's underneath a lot of stuff can be difficult, if not impossible.  In other words, the editor is cute but didn't prove useful in the cases that I needed it.

Physics is another sore point in the editor.  I wanted the container node to have physics and the children to animate.  Again, not happening nicely.

As for Swift, when it comes to prototype levels, I usually encode them in a string in a source file.  Accessing the contents of a string randomly is slow, so I had to cast that to an array.  However, the level was so large that Swift was timing out evaluating the cast from string to array (brownie points for the language since it was optimizing something at compile time which I naively would assume to be runtime).

What annoyed me the most is that I defined a class called Character at which point the compiler just said OK and didn't really complain.  Two Character classes meant casting something to a character was neigh impossible.  I wish the language would have errored once it saw my stupidity rather than try to work around it before giving up and throwing cryptic messages about types $r0 not matching type $r1...

In conclusion, the combination of SpriteKit and Swift are great to play around in, get something quickly running, in a side project where battling with the language is an option.  If not wait a bit for the language to mature with the API.  That has been my experience when doing a 1.5 day project (yes, you'll hit the limits in 1.5 days depending upon what you're trying to do)

And that was my opinion.  Take it with a grain of salt.