Michael's Unsolicited Rants: 2015

Monday, March 16, 2015

C++14 and Unit Testing Functions Without Side-effects

Beginning in C++14, we may now have constexpr functions which are multiple lines long including loops, and other standard C++ constructs. These functions may be useful at runtime, but we can test them at compile time. At least with Clang that is.

Consider the function:

constexpr int myfunc(int a) { if (a < 10) return -a; else return a*a; }

Now, right under it I can do:

static_assert(myfunc(10) == 100, "test myfunc: edge case when input is 10");

Consider, all the tests that can now be done at compile time. Now it is easier for a programmer to break the build! It is now obliged for a developer to run the tests on their machine before a submit! No modifications to the compiler needed. Even Xcode will underline static-asserts whose condition is violated as you type.

This little gem does have some caveats, though. First, you can't debug the function while it is compiling, so kiss goodbye to the debugger until you comment out the unit tests, call the function with a value unknown at compile time (like input from the user) and then once that is done, re-enable the tests (because of this, you want to write many small functions to combine and test in parts for ease of development). Second, it only works if, as we pray is the case, that the compiler executes code as identically as possible as to what the compiled code would. Third, we are limited to deterministic functional cases which limits the utility of the method.

Thursday, March 5, 2015

Proposal for Steganography

In this post, I'll be detailing the theoretical implementation of a steganographic system which hides a message within an image. The goal is to propose a way to modify the image without modifying the underlying statistical properties of the image as checked by known techniques. Essentially, we are modifying a statistic to encode the data, no?

This is me thinking aloud. I think creating a toy app where people send images from the camera with hidden messages could be fun, as a toy, nothing serious.

Call it "Project Steggy". This blog post will be updated with the ideas that consist of this project. The end goal of this project is an app that can post images to your favourite social network with encoded stenographic features. So your friends get the message, but the rest of the world doesn't (except for the five eyes who know everything).

What we won't do
Use the last bits of colour components for another purpose (known as LSB / Least Significant Bit). The noise in the signal can be detected through automated means. For example, if you have a colour with a red-green-blue tuple of (255, 254, 253) we can say that the numbers are even or odd. Even numbers are 0, odd are 1. In this case, the hidden message would be 101. With a lot of colours, we can have a lot of bits and encode much more data.

Encode within the noisiest components of the discrete cosine transform. Again, a slight alteration of the underlying statistics. In this case, take the last components of the DCT and set them to specific values.

Considered options
Average manipulation. Given a block of pixels, if the average intensity of the pixels in the block is even then we have a 0, and if it is odd we have 1. If we say a block contains a single pixel, we look at whether round((red + green + blue)/3) is even or odd. In this case, we have more freedom - we can change the red, green or blue channels to encode a value. Furthermore, we can look at the average of reds within a 4x4 block, the greens in a 4x4 block, etc. Essentially, reducing the amount of data encoded to better mask it within the image. The less data to encode, the harder it is to find again.

Inverse statistics. This idea is a little on the crazy side, but I like it. Rather than manipulate an image, capture from a video feed until a single frame exhibits the desired statistics which will encode the image. Such a zany solution can work if the number of bits to encode is very small and if there is a bit of noise from the camera's CCD. As you can imagine, to store more than a few bits could take forever in waiting for the right frame.

HDR statistics. Recall that an HDR image is composed of several shots, taken at different brightness levels, then recombined to max out the dynamic range. Now, suppose we captured about 10 frames. Each frame is the same image, but with some variation. If the images are the same, then the pixels are interchangeable. We could use this to pick and choose pixels that match the desired statistics. On the downside, these changes will stand out when dealing with moving objects.

Chosen Option
Colour tweaking. As far as I can tell, the problem is that we doing the equivalent of encoding a second image with fewer bits so that it is hidden to the human but can be rooted out thanks to statistical techniques. What we wish to do is to preserve the colour range and not hide something behind every pixel.

Essentially, let us suppose that we have 'n' colours in a photograph and wish to encode 'n' (or as close to as possible 'n') bytes. Let us not count the colours that are at the edge of the possible such as pure blue, pure white, pure black, etc. These are most likely to appear as a complete white-spot, or black spot, and we don't want to leave a hint of manipulation by not having pure white.

Statistically, taking a picture with very little light yields about 1948 different pixel values, which is sufficient to store 243 bytes. A quick image taken with the intent of having multiple colours has 184049 unique pixel values for a good 22 kilobytes. Or, about 200 lines of text. More than enough to send a message, and hold some meta data about the message itself.

Let's look at the decoding process since the encoding process would mask the scheme.

First, a decoder would take all the unique colours in the image, sort them (treat the colour as a series of 3 numbers, sort by red, then by green if the same red, and then by blue values if red and green the same), and keeping a running total of the number of duplicates. If the number of duplicates is even, then we have a 0. If it is odd, we have a 1. String the digits together, and we have a code.

Encoding, as can be imagined, would suffer a few corner cases, but need not be that difficult. First, we enumerate the colours and a count. All the colours that match the count can be ignored. Then, find the nearest colours (visually) in our list, then within the image change one colour to the other (ideally based upon context). Repeat until all colours are processed.

There are a fixed number of pixels (even if the dynamic range isn't completely used). We can't have an odd sum of pixels. One colour -- the first, the last, or middle, whatever it may be in our sorted list -- must be a dummy. A colour that can be used as a dummy for the last swap and does not constitute as part of the message.

Yes, the amount of data that can be transmitted has been greatly decreased. But statistically, the image is practically identical to what it was before. 1000 unique red-green-blue combinations gives about 128 bytes to play with using this scheme. A short twitter message. In a 640x480 image with 1000 unique colours, we would change 0.32% of the pixels in the worst case.

iOS Specific Notes
To properly integrate with the OS, we will support iOS extensions. One extension will add steganographic data to an image, the other will attempt to find such data in the image.

This will allow a user to seamlessly view encoded images without leaving the host app, and the same for encoding. For example, someone could be editing an image and invoke steganography as an action. What I am curious is if we can send to our encoder then have the option to send it elsewhere (Nested actions?).

Extensions
As the number of pixels increases, there is more leverage to fiddle with the image's underlying statistics. Rather than limit ourselves to even/odd pixel counts, we can move to modulo arithmetic and say mod 3 can be 00, 01, or 1. Then the encoder would do more colour swaps until the desired sums are achieved.

Message Encoding
The weakness in the method is the encoding of the message. The distribution of colours must not betray that there is a message. Using a key, we could hash it, and use the hash to create bit-sequences representing the characters and dummy sequences representing nothing. We need dummy sequences to pad the message so it takes all the bits available in the colours.

My preferred encoding would use a key. The key could be a cute picture, a hex display, a QR code, etc. Anything that describes the bits. Rather than try to mask the message (xor with the key for example), the key could describe a series of offsets. Something like bit 0 is at bit 10, bit 1 is at bit 50, etc. something that makes a mess of the message unless a proper key is supplied.

Let us suppose a key that is 16 bytes. We know there are n unique colour combinations. This is tricky, as the bit layout must not conform to a generic rule. I would say use the first 11 bits for the first offset, then take the remaining 5 and the first 6 for the second offset (added to the first).

If there is a unique descriptor identifying the image as a steganographic image, then the bits should be mixed within that of the message so that as much as possible this unique code can't be used to break the key. The first few bits in the key could drive the code that is used to identify the image.
I have not yet looked at the generation of the sequences, nor of the hashes.

Friday, February 20, 2015

Long live the Cell Processor

Talking with a former coworker yesterday, something dawned upon me. I have already heard allusions from fellow students years ago to this idea. The GPU is becoming the Cell. The exposed API may be different, but the seeds are there. Let's look at the details.

First, we must get over the concept of 'Wavefront', otherwise known as 'Warp'. A wavefront is a fancy way of saying that you have multiple threads sharing the same instruction pointer. In other words, it is akin to multiple people following the same to-do list, with one person telling them what line on the to-do list to work on.

Consequentially, if there is a branch, and two threads within a warp diverge, then both branches must be executed. Some GPUs will fall back to executing each thread in the warp in sequence leading to a sixteenth of the potential performance.

Each wavefront may have its own personal store, known as thread local storage. This fast memory is used for accumulating data.

Second, a few preliminaries on the Cell. It had a PowerPC processor, with 8 SPU - synergistic processing units. SPUs each could be allocated by the application, they could have programs compiled for them uploaded to them, and could communicate between each other using a super-fast ring network in the CPU.

Each SPU in the Cell, 8 total, had about 128 kilobytes of L1. The SPUs accessed the L1 directly. This is crucial to understand the Cell, the SPUs exposed their L1 as memory. Pointers pointed into the L1. The L1 is akin to the thread local storage on the GPUs. As for warps, think of them as operating logically on vectors. And SPUs where vector processors, with 4 floats per vector opposed to the 16 or more on GPUs. In this sense, they both take similar algorithms.

Where the two differ is in memory access. Cell went with the idea that the developer should be responsible for all low-level details - if everything on this hardware is going to be optimized, might as well expose all details. Logically follows, the programmer is responsible to call asynchronous DMA functions to and from main memory to fill small chunks of L1 not used by the application currently loaded to the SPU.

On the other hand, the GPU, when it stalls for data, will run a different set of threads. It works based on the assumption that there are a ton of threads, and it is less expensive, and easier for the developer, if one that is ready to run takes over. That means, though, we have an issue where memory allocated per warp must be shared by all the threads that run in that warp. The less memory used, the more backup work there is for one a thread stalls. Hence why you don't want to use too many uniforms in OpenGL. (There are more details, but this is the essential.)

The question is, given memory bandwidth is the issue with modern GPUs, will we see a rise of handing control over how data is transferred to and from the GPU to the developer? And by then, will we not have simply gone back and gotten cell-like hardware?

Monday, February 16, 2015

Super-Scalar Execution for Software

Update: Looking through the C++11 changes, there is something that I have completely missed. std::future and std::async (or another std function that may spawn threads) which provide a bare-bones way of doing plenty of operations asynchronously within the existing standard. This is a step towards what I describe below -- and devised long before I ranted.

Multiple processors are the new norm, and there is nothing we can do about it. We have to bite the bullet and add threads to our software. A few mutexes here, signals there, and before you know it we have race conditions and stalls.

There is something to be said about sequential execution of code and how legible it is. It is a reason why we have exception handling - the common code runs through an efficient path, and the corner cases which are rare go through the stack-unwinding path. Simple. Easy

But what about, say, multi-threaded code. You now have multiple instruction pointers and things dancing all over the place. It is, for lack of a better word, a mess.

Let me elaborate. Mutexes are terrible since they stall threads. They act as bottle-necks preventing parallel execution. The more code within a mutex, the less parallelism you end up with. Task-schedulers fragment what would be linear code into a tree of potentially parallel functions. Again, try to undersand the algorithm in that mess.

Then, it dawns upon me. Your modern CPU does something known as super-scalar execution. Fancy term to say it knows the dependencies among instructions and can execute, within reasonable limits, code out of order. It can fuse common instructions into one, or it can split complex instructions into many. For lack of a better word, it rewrites and parallelizes code among its various execution units - and that with each and every piece of linear software -- single thread that is all.

Your compiler knows that this happens, and will often enough try to re-organize code such that the various execution units are always fed with something. It might do something that appears suboptimal, but which runs faster, since the target processor can run that instruction in parallel.

Back to the point: linear code is easy. Is there a way to parallelize our linear code? Doing so automatically is an exercise in frustration since the key to making something happen in parallel is the data. Data has to be operated on, for lack of better words, atomically. But the CPU works around this.

How? Each operation takes up an execution unit. An execution unit is like a thread of execution waiting to be used. Each thread of execution will operate for a known amount of time (cycles) for each operation. Can we replicate this pattern with regular code?

Let me propose the idea of a "Super Scalar API", where we have plenty of worker threads awaiting commands to run. We become completely data-driven, in that the worker threads will distribute work based on data chunks and not around the series of instructions that must be followed for execution to function.

But now we have this issue: we need a way for the API to know of the data, and for the code to deal with the sequence of events.

First, we need the concept of a pipeline. Running our sequential code should fill this pipeline with instructions and data. Then, this pipeline should be able to look at what is being done, and assign work to the appropriate worker threads. Yes, we do have two points of serial execution (logic + pipeline), but I expect that the brunt of the work should occur on a worker thread.

Second, we need to know what these worker threads would support. If we know the type of operations that applications do so frequently, then would could make them happen out of order. Think about us making the API act synchronously, but in reality it is asynchronous. Again, not unheard of. OpenGL, for the longest time, worked like this. STL-like functionality would be the prime candidate; you need to do some operation on some data on some container. Either as reduction, or as accumulation. As you can imagine, mutability plays a massive role in this.

Third, and lastly, we need to deal with conditions asynchronously. If (result of previous operation) needs a better solution. And, thanks to lambdas, we can do something very nifty. Define our own If (with a capital I) that captures the condition and feeds it rhough to the pipeline. Sequential execution illusion is not broken, and we get parallel execution out of it.

Finally, the problem will be there is always a way to have too many interdependencies to make parallelism less possible. Embarassingly sequential. But I would argue that if the containers are properly built, then over time parallelization of all applications built upon the abstractions can be improved by seeing how they are used.

And there you have it. Super-scalar execution as the hero to make massive parallelism possible. Say, simpler. Well, that is my thinking.

Saturday, February 14, 2015

Internet Filtering for Social Good

Having one of my goals in the next year is (again) to start a web comic; research has become essential. My life experience is, frankly, stale. If I were to let my imagination alone dictate what to write, it would fall flat - become something even I wouldn't like to read.

Note: Opposed to previous comics, I'm looking at drawing something that takes itself as irreverent. When the piece is too serious on first glance, people fail to see humour. Being ridiculous doesn't stop me from trying to bring up some sort of point.

Note 2: This post is unrelated to technology, with which I'm more well versed. It is a slight distraction which I found interesting to research.

Enter political commentary / editorial cartoons; a fine form of expression which limits the story to a few panels and without any obligation to recycle characters. Better yet, grounded in reality to the point that I don't see employers even caring (consider someone in modern times making a life's work / novel / sculpture / game -- for many people they can't as if they do, they don't own it and it won't come out in any shape or form).

Oddly enough, I'll be link-bombing what I believe to be respectable web-sites (whose contents is not my responsibility); with my opinion of what is in the linked articles (so read the articles to get the whole picture):

UK Tories demand a "ban this terrorist filth" button for the ...
Boing-Boing brings up this interesting factoid. Wikipedia got swept up and blocked for posting an album cover (that was banned). This goes hand-in-hand with an idea for a strip where questionable organizations would upload dubious material to publicly editable sites to have them banned. More as an "haha", internet trolls.

David Cameron pledges anti-terror law for internet after ...
The Guardian has a nice piece describing how the Charlie Hebdo attacks have sparked the ideal of being able to access all communications over the Internet. There is increased ability to track users via IP. Yet - you just need a few people good with stenography to hide communications in seemingly innocuous places. In other words, it would be a game of cat and mouse as communications become hidden in more clever ways masked as your routine day-to-day Internet traffic. Alternatively, if people were to send a letters encrypted with a one-time pads -- wouldn't we have the same issue but with regular mail?

Despite Ban, YouTube Is Still a Hotbed of Terrorist Group ...
FastCompany. I don't know them. I post the link since all the videos have already been taken down. They describe how "promotional" videos are disseminated to video sharing sites. It is interesting as I initially assumed the issue was mainly point-to-point communication and nothing so overt. Bad me.

Why Tolerate Terrorist Publications?
From the New-York Times. And I'm surprised. Really. It is an opinion piece about an online magazine available to the general public describing all sorts of ways to do "bad things"(tm) at airports and whatnot and questions if the American ideal of free speech is proper in such cases. Sure, a well-written, impassioned article. Why am I surprised? The author didn't filter himself. He states the name of the magazine. No links, but anyone could probably Google it and find it.

In India, is web censorship justified in the name of national security?
From PBS, this one is a bit more nefarious I find. Github was blocked. Completely. As was The Internet Archive, Vimeo, and other sites. Notice that these sites are all driven by content not under the control of the creators. Just so happens they can be given or mirror content that is undesirable to be shared. Github I used often enough to say it can be essential. The Internet Archive has the WayBack machine, which I used to see how things were. This article, essentially, goes full-circle to the idea that a bunch of jokers could have sites driven by user-content banned in certain places by uploading material which governments determine is bad for public consumption.

Farewell Magna Carta: the Counter-Terrorism and Security Bill
Published by OpenDemocracy. My conscience tells me that I should say they are biased towards the left. However, from what they say, it is quite eery. Literally scary. More so than the taking of passports based on suspicion. That is of reporting all suspicious activity / people. And that it typically targets people of a certain faith. I heard that Germany once used this tactic... Moving on...

'Terror has no religion': MP urges ban on phrase 'Islamic terrorist' in ...
From RT. Said by a Russian MP. Title says it all.

Australian broadcaster bans 'unsuitable' Peppa Pig episode ...
From DailyMail.co.uk - I don't think we're trying anymore. It's easier to ban. Ban the children's comic! I guess that's what happens when television raises kids. (Yes, I realize everything in Australia, including spiders, are deadly. I think we should ban Charlotte's Web in that country.)

Russia Moves To Ban Tor And Anonymous Web...
From Vocative (hope Google News only brings up good sources), seems like leading a political campaign is a crime in Russia. There's a little screenshot of Twitter at the bottom saying that the site of the opposition party was blocked. Interesting - especially when combined with "Terror has no religion".

China cracks down on 'vulgar culture' of web pseudonyms
From the BBC. I don't know what to make of this. In a sense, people should publicly stand behind their convictions. But at what point must people be able to put out subversive messages against the governing bodies? They aren't run by deities or people impossible to corrupt.

'Islamist terror threat' shuts down largest festival in northern Germany
From Haaretz, found on Reddit's WorldNews subreddit. Title says it all. Why just filter data on the Internet when you can filter events off the Internet as well?

A quick perusing for blocking of sites (wholesale) in America led to no results. I'll keep an eye out, but the links above have been quite informative.

Wednesday, February 11, 2015

Swift + Spritekit Prototype Result

My latest prototype uses two core technologies: Swift and Spritekit. This was an experiment to see how ready the language and games toolkit were for making games, quickly.

A caveat that must be repeated: I spent more, I mean significantly more, time with Photoshop. And that tool has served me well. A reminder that code is but just a small portion of what is a game.

Now back to the code. Overall, I find Swift to be a joy to work in when it works. A few times did I see the syntax highlighting crash and restart (it's nice it didn't bring down XCode). Spritekit does what it is good at, displaying images from a generated atlas.

90% of the time I was in coding bliss. The let for constants just forces me to be explicit about what is constant.

Now for the pitfalls. The 10%.

Spritekit does not have an innate ability to display subsequent frames of animation. I can do it in code, but it's something I'd expect the framework to handle and not the game.

Sure, we do have this nifty editor to set up a scene, but I can't create empty sprites and save them in the editor. Rather, load a scene and rip stuff out? The editor is really there for setting up quick and dirty levels. I was trying to use it to build a character composed of various independently anchored and animated elements. Not going to happen with what Xcode provides as selecting what's underneath a lot of stuff can be difficult, if not impossible. In other words, the editor is cute but didn't prove useful in the cases that I needed it.

Physics is another sore point in the editor. I wanted the container node to have physics and the children to animate. Again, not happening nicely.

As for Swift, when it comes to prototype levels, I usually encode them in a string in a source file. Accessing the contents of a string randomly is slow, so I had to cast that to an array. However, the level was so large that Swift was timing out evaluating the cast from string to array (brownie points for the language since it was optimizing something at compile time which I naively would assume to be runtime).

What annoyed me the most is that I defined a class called Character at which point the compiler just said OK and didn't really complain. Two Character classes meant casting something to a character was neigh impossible. I wish the language would have errored once it saw my stupidity rather than try to work around it before giving up and throwing cryptic messages about types $r0 not matching type $r1...

In conclusion, the combination of SpriteKit and Swift are great to play around in, get something quickly running, in a side project where battling with the language is an option. If not wait a bit for the language to mature with the API. That has been my experience when doing a 1.5 day project (yes, you'll hit the limits in 1.5 days depending upon what you're trying to do)

And that was my opinion. Take it with a grain of salt.

Monday, January 5, 2015

Geometric Algebra Pitfalls

After studying more than I expected to during the holidays, I believe I have finally got my wits together and can now give some clear hints to anyone else who wishes to travel this path. I highly suggest "Linear and Geometric Algebra" by Alan Macdonald. Read the previews and details online before considering to purchase this book. I like it, doesn't mean you will too.

First, and most importantly, write out your multi vectors explicitly. For example, vector (2, 3, 4) should be written as 2e₁ + 3e₂ + 4e₃. Recall the multiplication of the e's (or basis as that is what they form) is not associative.

Consider (1e₁ + 2e₂)(3e₁ + 4e₂).

Intuitively, from the good ol' days, we would do: (1e₁3e₁ + 1e14e2 + 2e₂3e₁ + 2e₂4e₂).

Regrouping the scalers together: (3e₁e₁ + 4e₁e₂ + 6e₂e₁ + 8e₂e₂)

Since e₁e₁ = 1: (3 + 4e₁e₂ + 6e₂e1 + 8)

Switching the place of two adjacent e's flips the sign, ie. e₁e₂ = -e₂e₁: (3 + 4e₁e₂ - 6e₁e₂ + 8)

Final grouping: (11 - 2e1e2)

Notice that, save for the special rules surrounding the es, it is very much as was learned before from the days of multiplying two expressions consisting of x and y, such as (2x+y)(4z+8w).

If a problem in geometric algebra seems difficult, my first reaction has been to ensure that everything uses e explicitly.

Second, when reading articles and books, always be sure to know which dimension the given results apply in. I have yet to find an article that does not give examples or definitions at a lower dimension before jumping to the dimensionless forms thanks to the desire to give out something practical without having to face a wall of mathematics.

Third, the inner (also known as the dot) and outer products are filters on the geometric product. If the result does not match the product, it is zero, and we move on.

Again, let's return to (1e1 + 2e2)(3e1 + 4e2).
The result with the geometric product was: (11 - 2e1e2).
Let us consider the dot product of (1e1 + 2e2) and (3e1 + 4e2). The grade of 1e1 is 1 (to obtain the grade, count the number of unique e's after the scalar). When multiplying two elements together, the dot product says the grade of the result must equal the grade of the second element minus the grade of the first. So when we take the inner product 1e1 with 3e1 where the grade(3e1) - grade(1e1) = 1 - 1 = 0, therefore the result must be a scalar, which it is (3).

The outer product filter is the sum of the grades.

The filter applies only to the two elements being multiplied together at a time, not to the whole multivectors. After the product is done, the results are added as usual.

Finally, just keep on finding more sources. I found it easier to work with the topic once I had a dozen articles each tackling the topic slightly differently. If my mental model of how the geometric algebra was accurate, then each paper I would read would not conflict with the mental model.

Edits:

Since my initial hints were leading people astray. They exemplified my lack of understanding, which should be resolved.
Reorganization for clarity.

Michael's Unsolicited Rants