Thursday, March 5, 2015

Proposal for Steganography

In this post, I'll be detailing the theoretical implementation of a steganographic system which hides a message within an image.  The goal is to propose a way to modify the image without modifying the underlying statistical properties of the image as checked by known techniques.  Essentially, we are modifying a statistic to encode the data, no?

This is me thinking aloud.  I think creating a toy app where people send images from the camera with hidden messages could be fun, as a toy, nothing serious.

Call it "Project Steggy".  This blog post will be updated with the ideas that consist of this project.  The end goal of this project is an app that can post images to your favourite social network with encoded stenographic features.  So your friends get the message, but the rest of the world doesn't (except for the five eyes who know everything).

What we won't do
Use the last bits of colour components for another purpose (known as LSB / Least Significant Bit).  The noise in the signal can be detected through automated means.  For example, if you have a colour with a red-green-blue tuple of (255, 254, 253) we can say that the numbers are even or odd.  Even numbers are 0, odd are 1.  In this case, the hidden message would be 101.  With a lot of colours, we can have a lot of bits and encode much more data.

Encode within the noisiest components of the discrete cosine transform.  Again, a slight alteration of the underlying statistics.  In this case, take the last components of the DCT and set them to specific values.

Considered options
Average manipulation.  Given a block of pixels, if the average intensity of the pixels in the block is even then we have a 0, and if it is odd we have 1.  If we say a block contains a single pixel, we look at whether round((red + green + blue)/3) is even or odd.  In this case, we have more freedom - we can change the red, green or blue channels to encode a value.  Furthermore, we can look at the average of reds within a 4x4 block, the greens in a 4x4 block, etc.  Essentially, reducing the amount of data encoded to better mask it within the image.  The less data to encode, the harder it is to find again.

Inverse statistics.  This idea is a little on the crazy side, but I like it.  Rather than manipulate an image, capture from a video feed until a single frame exhibits the desired statistics which will encode the image.  Such a zany solution can work if the number of bits to encode is very small and if there is a bit of noise from the camera's CCD.  As you can imagine, to store more than a few bits could take forever in waiting for the right frame.

HDR statistics.  Recall that an HDR image is composed of several shots, taken at different brightness levels, then recombined to max out the dynamic range.  Now, suppose we captured about 10 frames.  Each frame is the same image, but with some variation.  If the images are the same, then the pixels are interchangeable.  We could use this to pick and choose pixels that match the desired statistics.  On the downside, these changes will stand out when dealing with moving objects.

Chosen Option
Colour tweaking.  As far as I can tell, the problem is that we doing the equivalent of encoding a second image with fewer bits so that it is hidden to the human but can be rooted out thanks to statistical techniques.  What we wish to do is to preserve the colour range and not hide something behind every pixel.

Essentially, let us suppose that we have 'n' colours in a photograph and wish to encode 'n' (or as close to as possible 'n') bytes.  Let us not count the colours that are at the edge of the possible such as pure blue, pure white, pure black, etc.  These are most likely to appear as a complete white-spot, or black spot, and we don't want to leave a hint of manipulation by not having pure white.

Statistically, taking a picture with very little light yields about 1948 different pixel values, which is sufficient to store 243 bytes.  A quick image taken with the intent of having multiple colours has 184049 unique pixel values for a good 22 kilobytes.  Or, about 200 lines of text.  More than enough to send a message, and hold some meta data about the message itself.

Let's look at the decoding process since the encoding process would mask the scheme.

First, a decoder would take all the unique colours in the image, sort them (treat the colour as a series of 3 numbers, sort by red, then by green if the same red, and then by blue values if red and green the same), and keeping a running total of the number of duplicates.  If the number of duplicates is even, then we have a 0.  If it is odd, we have a 1.  String the digits together, and we have a code.

Encoding, as can be imagined, would suffer a few corner cases, but need not be that difficult.  First, we enumerate the colours and a count.  All the colours that match the count can be ignored.  Then, find the nearest colours (visually) in our list, then within the image change one colour to the other (ideally based upon context).  Repeat until all colours are processed.

There are a fixed number of pixels (even if the dynamic range isn't completely used).  We can't have an odd sum of pixels.  One colour -- the first, the last, or middle, whatever it may be in our sorted list -- must be a dummy.  A colour that can be used as a dummy for the last swap and does not constitute as part of the message.

Yes, the amount of data that can be transmitted has been greatly decreased.  But statistically, the image is practically identical to what it was before.  1000 unique red-green-blue combinations gives about 128 bytes to play with using this scheme.  A short twitter message.  In a 640x480 image with 1000 unique colours, we would change 0.32% of the pixels in the worst case.

iOS Specific Notes
To properly integrate with the OS, we will support iOS extensions.  One extension will add steganographic data to an image, the other will attempt to find such data in the image.

This will allow a user to seamlessly view encoded images without leaving the host app, and the same for encoding.  For example, someone could be editing an image and invoke steganography as an action.  What I am curious is if we can send to our encoder then have the option to send it elsewhere (Nested actions?).

Extensions
As the number of pixels increases, there is more leverage to fiddle with the image's underlying statistics.  Rather than limit ourselves to even/odd pixel counts, we can move to modulo arithmetic and say mod 3 can be 00, 01, or 1.  Then the encoder would do more colour swaps until the desired sums are achieved.

Message Encoding
The weakness in the method is the encoding of the message.  The distribution of colours must not betray that there is a message.  Using a key, we could hash it, and use the hash to create bit-sequences representing the characters and dummy sequences representing nothing.  We need dummy sequences to pad the message so it takes all the bits available in the colours.

My preferred encoding would use a key.  The key could be a cute picture, a hex display, a QR code, etc.  Anything that describes the bits.  Rather than try to mask the message (xor with the key for example), the key could describe a series of offsets.  Something like bit 0 is at bit 10, bit 1 is at bit 50, etc.  something that makes a mess of the message unless a proper key is supplied.

Let us suppose a key that is 16 bytes.  We know there are n unique colour combinations.  This is tricky, as the bit layout must not conform to a generic rule.  I would say use the first 11 bits for the first offset, then take the remaining 5 and the first 6 for the second offset (added to the first).

If there is a unique descriptor identifying the image as a steganographic image, then the bits should be mixed within that of the message so that as much as possible this unique code can't be used to break the key.  The first few bits in the key could drive the code that is used to identify the image.
I have not yet looked at the generation of the sequences, nor of the hashes.


No comments:

Post a Comment