Monday, December 2, 2013

Initial thoughts on video games and Wittgenstein

Overview

Indulging in both Wittgenstein's Tractatus and the new Legend of Zelda game has led to some mishmash of wires in my brain.  Foremost, the Tractatus, the more I read it, seems to describe a game engine of sorts and the limits it imposes upon the imagination.

Since this post is slightly longer than my regular posts, I have broken it down into a few sections.
- First, we will look at the mappings: how concepts from the Tractatus map flawlessly to The Legend of Zelda: A Link Between Worlds.
- Second, I shall muse on how it describes the world.
- Third, we shall consider what this means if we were an NPC (non-player character, a character inhabiting the digital world).

For convenience, I have highlighted any terms used by Wittgenstein in italics.

A little caveat: this writing serves double duty, both to further my understanding of what I read by explaining my thoughts, and as a reference for the future when I look upon the text again.

Mappings

Mappings are very relevant in the Zelda games.  Recall that the behaviours of objects are encoded within themselves.  Interestingly, the behaviour for interacting with a bomb is not encoded within a rock.  I've tried blowing up a rock repeatedly to no avail.  Recall that a rock can only be picked up and thrown.  There is no way to blow up a rock (except for a special type of rock with a crack in it).  Use a ton of dynamite and it will always remain.
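The idea that behaviours are encoded within the objects themselves can be sketched as a lookup table per object. This is illustrative Python of my own, not actual game code:

```python
class GameObject:
    """Each object carries its own table of interactions it responds to."""
    def __init__(self, name, behaviours):
        self.name = name
        self.behaviours = behaviours  # maps interaction -> outcome

    def interact(self, action):
        # An interaction not encoded within the object simply does nothing:
        # no amount of dynamite will ever destroy a plain rock.
        return self.behaviours.get(action, "nothing happens")

# A plain rock only knows how to be picked up and thrown.
rock = GameObject("rock", {"pick up": "lifted", "throw": "thrown"})
# Only the special cracked rock encodes a response to a bomb.
cracked_rock = GameObject("cracked rock", {"bomb": "destroyed"})
```

However many bombs you use, the plain rock answers "nothing happens"; the behaviour lives in the object, not in the bomb.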

A state of affairs is essentially the layout within a dungeon.  A level, if you will.  Using an appropriate editor, many states of affairs can be brought to life.  All states of affairs that can be imagined from the objects are possible.  It is the objects of the game world that limit the imagination.  Instead of composing complexes from atoms, the smallest building block is the size of a rock.

A picture is a design for a level.  It can represent an actual level, but doesn't have to.  It can be a subset of a level, a screen capture.  We can think of a level.

A picture, in this sense, can also exist only as a thought.  The thought encapsulates this toy world.

Game World

Interestingly, the thought process is further confined as we map the terms of the Tractatus to the toy world of a game.

These first few pages, which build a model of the world, so aptly describe the limits of our digital worlds.

I'm still working out the details of how they mesh - but there is something there.  Something describing the limits of what we can make, not just what we can think.

NPCs

There is also something interesting with regard to our own world.  Consider the rabbit that takes over Link's house and sets up shop to sell all his possessions to Link at a healthy profit.  We obviously know that this character is following a simple script, but let us give him the benefit of the doubt and assign consciousness to this character.

Let's suppose one day he decides to question how his world works.  He would be able to analyze what is thinkable, as Wittgenstein has done, and reach eerily similar conclusions.  Except the micro world would be completely different.

To feel the world around him would be to see himself from above and to hear things.  He would see that he is made of pixels.  But that would be normal.  (I could have gone the route of him seeing himself as a polygonal shape - the conclusions would be similar.)  As long as the pixels looked right without any magnification, everything would be ok.

But let's say our NPC is a scientist and decides to zoom in.  Then what would be seen between the pixels?

Let's say we're dealing with an image encoded with the DCT (Discrete Cosine Transform).  Zoom in a lot!  As much as you wish!  Underneath there is a wave-like function that provides progressively blurrier visuals.  There is only one zoom level at which the image is in focus.
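The intuition can be made concrete: a DCT-coded signal is a sum of cosines, and those cosines are defined at every real-valued coordinate, not just at pixel centres. A 1D toy sketch of my own (not any particular codec):

```python
import math

def dct(samples):
    """Forward DCT-II of a short 1D signal."""
    n = len(samples)
    return [sum(s * math.cos(math.pi / n * (i + 0.5) * k)
                for i, s in enumerate(samples))
            for k in range(n)]

def reconstruct(coeffs, t):
    """Evaluate the inverse transform at any real-valued position t.

    At integer sample positions this recovers the original pixels exactly;
    between them, the cosine basis yields smooth in-between values -- what
    our NPC scientist would see when zooming past the pixel grid."""
    n = len(coeffs)
    return (coeffs[0] + 2 * sum(c * math.cos(math.pi / n * (t + 0.5) * k)
                                for k, c in enumerate(coeffs) if k > 0)) / n
```

The "ideal zoom" is the integer grid; everywhere else you are looking at the algorithm, not the pixels.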

I have deviated a bit, but this question has been in the back of my mind lately.  Is there an ideal zoom for our world, beyond which any further magnification reveals the algorithm and not the substrate?  If I think of atoms as pixels (a gross simplification, but hear me out), then as we zoom into atoms we are looking at a byproduct of the matter that holds the universe and not the actual matter.

Furthermore, are we at the intended zoom level, or are we just a happy accident that makes up the most basic of objects?

Wittgenstein must be rolling in his grave by now at how I'm trying to subvert his work to apply to everyday toys.

Saturday, November 2, 2013

First thoughts on Sprite Kit

Today,  I have decided to give Sprite Kit a try on iOS.  I have been disenchanted with the amount of setup needed for my little projects/prototypes and wanted something extremely simple to use.  Sprite Kit delivers.

First, Sprite Kit is a 2D scene graph.  You add nodes and transform them within a hierarchy.  There is a nice way to specify actions such as blinking lights or other little details atop a sprite.

My question was: how viable is this, performance-wise?  After a few tests displaying a large 512x512 tile map, here are my results:
  1. Sprite Kit may be able to manage draw calls, but it works best when you manage the nodes.  For large enough levels, nodes will have to be swapped in and out.  Too many nodes and Sprite Kit will spend too much time processing them all.
  2. Create a cache of nodes and recycle them -- the bottleneck was the allocation and deallocation. Even with a solid color SKSpriteNode, CPU usage would hit 90% when scrolling the map!
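The recycling scheme from point 2 is just an object pool. A generic sketch in Python rather than actual Sprite Kit code (`NodePool` and its methods are my own invention):

```python
class NodePool:
    """Reuse nodes instead of allocating/deallocating them while scrolling."""
    def __init__(self, factory):
        self.factory = factory   # creates a fresh node when the pool is empty
        self.free = []
        self.allocations = 0     # how many nodes were ever actually created

    def acquire(self):
        # Prefer a recycled node; only allocate when none are free.
        if self.free:
            return self.free.pop()
        self.allocations += 1
        return self.factory()

    def release(self, node):
        # A node that scrolled off-screen goes back into the pool.
        self.free.append(node)
```

Tiles scrolling off one edge release their nodes and tiles appearing on the other edge acquire them, so the allocation count stays bounded by the number of visible tiles instead of growing with the map.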
A few other notes after playing with Sprite Kit a bit more:
  1. Render-to-texture effects are achieved through the Core Image filter system, as far as I can tell.  This layout makes a feedback loop a bit challenging: filters are nodes in a tree, and previous results are not in the tree.
  2. The particle system is easy to grow out of.  I tried to use it, but found that changing the velocity of all particles wasn't easy.  Rendering each particle as a sprite was surprisingly efficient.
  3. Blend modes are particularly easy to set up.  They have similar names and behaviours to those in Photoshop - making for a nice way to quickly preview the result.  Note - there are far fewer blend modes in Sprite Kit (normal, add, replace, screen).
  4. Something that deserves more attention: the ability to effortlessly replace a solid colour box with an image should not be underrated.  Why?  Have an idea for a sprite?  Give it the behaviour you want using solid colour boxes and see if it is fun before investing time making it look pretty.  This should push people to try out many behaviours before committing to anything.
There are a few caveats.  Often, when I think of some concept with a visual representation, Sprite Kit is incapable of doing what I wish (or would not do it efficiently), namely:
  1. Complex / warping geometry would be beneficial.  In some instances, rather than encoding a dozen images, certain effects can be achieved through manipulation of geometry.  Suppose you had a circle and wanted it to grow spikes.  Or even a rope - you would be limited to building the rope out of rectangular segments.
  2. Custom shaders can't be overlooked.  So many visual effects can be more readily achieved with them.  That, and multi-pass techniques could add so much visually.  Worse, I do not see a clear step outside the confines of Sprite Kit should these effects ever become desirable.  This may change in future iOS releases.  Hopefully it does.
I believe it is a great step forward in ease-of-creation for games and other multimedia apps.  Still, I wish I could use custom effects (shaders) to provide more interesting visuals.

Since writing this, custom GLSL shaders have been added.  Still, I keep on being over-ambitious with the API.

Friday, October 18, 2013

Effective Computation

One thing that strikes me is how much effort we humans put into writing highly precise rules for machines.  As I grow older, one thing that preoccupies my thoughts is how I can effectively reduce the verbiage that the machine needs.

Current programming methodologies require an army of programmers - a ton of people to implement a solution to a problem.  And, if I were to trust my intuition, the reason everyone underestimates the time to make something is that we naturally take a high-level view rather than a fine-combed, every-detail-accounted-for view.

What has happened, then, to give a high-level view is a massive number of functions.  Really, a ton of functions.  Highly specialized functions.  Want to delete a file?  There's a function.  Want to list all folders?  Another function.  And so on - with little consistency amongst the APIs.

But, let's step back a bit.  What are we actually working with?  Linear algebra has this nice notion of a vector space.  As long as certain basic rules are obeyed, then a set is a vector space.  Then what is the verbiage that we see so often in programming languages?

First, the data.  It's all dictionaries.  An array is a dictionary with unsigned integer values as keys.  A set is a dictionary with just keys, nil values.  A queue is a dictionary with keys always increasing and the minimum key being removed...  File systems are dictionaries of dictionaries.  A 3D scene is a dictionary forming a tree of dictionaries.
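Each of these claims can be written out directly. A sketch in Python, whose built-in dict makes the point easy to see:

```python
# An array is a dictionary keyed by consecutive unsigned integers.
array = {0: "a", 1: "b", 2: "c"}

# A set is a dictionary with just keys; the values carry no information.
members = {"x": None, "y": None}

# A queue is a dictionary with ever-increasing keys,
# where dequeuing removes the minimum key.
def dequeue(q):
    k = min(q)
    return q.pop(k)

queue = {0: "first", 1: "second"}

# A file system is a dictionary of dictionaries.
fs = {"home": {"docs": {"notes.txt": None}}}
```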

We can even simplify it further to a tree.  A binary tree.  Since a dictionary can be implemented using one...

If we can access all data uniformly, then what do we do with logic?  Bring the most common logic into the API.  Finding a file in the file system is the same as finding a node in a scene graph.  This essentially forces consistency across the entire API.
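With data accessed uniformly, a single find serves both the file system and the scene graph. An illustrative sketch (the names and sample data are mine):

```python
def find(tree, predicate, path=()):
    """Depth-first search over any dictionary-of-dictionaries."""
    for key, value in tree.items():
        if predicate(key):
            yield path + (key,)
        if isinstance(value, dict):
            yield from find(value, predicate, path + (key,))

# The same function searches a file system...
fs = {"home": {"docs": {"notes.txt": None}}, "tmp": {}}
# ...and a scene graph.
scene = {"root": {"player": {"sword": {}}, "enemies": {}}}
```

One traversal, two domains; the logic only had to be written once.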

If we want to drop the learning curve and increase the speed of development, then it is this that we must consider.  Focus on what is peculiar to the software at hand, and don't get mired in the details that all software reinvents as a rite of passage.

Thursday, October 10, 2013

The Lambda Calculus and Lazy Evaluation

For quite some time I've been looking into building a programming language that is both functional and supports lazy evaluation.  I may not have the energy to actually write the code, but I do keep thinking about it.

First thing I should have realized earlier: studying the theoretical foundations of functional languages (i.e., the lambda calculus) would help me achieve my goals.

First, what is lazy evaluation?  Essentially, an expression is only evaluated insofar as its result is needed for the final answer.  In a Haskell-like fictional language, I would say a function "Pi" calculates Pi to an infinite number of digits.  Then "take 100 Pi" would take the first 100 digits of Pi and stop computing any further digits.
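Python generators give the same flavour: the infinite computation is only forced as far as the consumer demands. A sketch (a stream of naturals standing in for the digits of Pi):

```python
from itertools import islice

def naturals():
    """An 'infinite' computation: all natural numbers, produced on demand."""
    n = 0
    while True:
        yield n
        n += 1

def take(n, stream):
    # Forces exactly n elements; the rest of the stream is never computed.
    return list(islice(stream, n))
```

`take(100, naturals())` terminates despite the unbounded loop, because evaluation stops the moment the first 100 elements have been matched.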

My initial mental model of this process was very convoluted.  Essentially, requiring back-tracking on the stack.  Now, looking at the lambda calculus, it makes a lot more sense.

Recall that the lambda calculus consists of variables and rewrite rules.  Rules are of the form \xy.yx, where \ is a makeshift lambda, xy are the inputs, and yx the output - essentially flipping the values.

Execution in a functional language starts with a function.  Let's call it R.  Well, R does nothing on its own, so let's evaluate it.  We get T 3 P.  Yup, same as the pseudo-Haskell above, except 3 digits.  Let's try parsing this.

First, some syntax: ':' is the list constructor.  So x:P means x is the first element and P the rest of the list...

I expand it to something like:

(a:b:c:P)

At each step in T n, we take the head of the list from P and recursively call T (n-1).  In the end, we are matching a set of 3 elements to the infinite list P.  Notice that once P evaluates 3 elements, the matching is complete and any further evaluation of P is not needed.

This is what escaped me!  It means that, structured properly, I could evaluate and match expressions and get lazy evaluation for free!  Or so I believe.  I'll keep reflecting on the corner cases, but for now "early match and discard" seems to do the trick.

Saturday, June 22, 2013

The Future Rants

Glimpses of a potential future have been assembled in the past two weeks thanks to leaks that media companies were happy to publish as infotainment.  Interest will wane, and like all other hot topics it will be buried; people will omit what is happening from their collective consciousness unless the fire is fed by making more of a scene of the information.

Arguments for keeping track of everything seem to revolve around preventing people from doing bad things.  If the good guys of enforcement have access to all communications then they can prevent bad things from even landing on the radar.  Looking at patterns, anomalies can be spotted and investigated.  From the anomalies, bad people can be exposed.

Perfect people would manage this information tracking system so it would only be used for its intended purposes.  Perfect people would make the perfect hardware and software setup so it can't be hacked.  Perfect people would peruse the data looking for information without thinking of personal gain.  Perfect people would take utmost care of the data so no outside person sees it.  Perfect algorithms to sift through the data.  Perfect data to identify the right person.  Complete and utter perfection now and indefinitely into the future.

People aren't perfect.  The world isn't split into good and evil.  Good people win, since the winners decide what is good -- a play on "history is written by the winners".

There's enough information floating around the Internet that, if someone were to capture it, they could determine for a single person: income, circle of friends, health status/records, interests, means of expression, opinions, every purchase ever made, current location, affiliations, etc.  A complete profile, perfect for nefarious purposes.

Now I get off my soap box.  This is probably a repetition of what others said.  I just needed to collect my thoughts a bit and reflect.

Tuesday, December 27, 2011

Classifiers for mobile devices

Imagine, for some reason, someone wanted a classifier on a mobile device that could recognize various object classes.  That is, if the system recognized a face it would not be able to tell to whom it belonged - but it would know that it was a face.  Viola and Jones provide a nice system (Haar-like classifiers) to do such classification for faces.  For anyone who wants to go further, Eigenfaces are a good place to start looking.

What I ask is: is there a way to rig a classifier based on Haar-like features to classify multiple features in parallel?  That is, can we efficiently detect 50+ object classes on an iPod/iPad in real time?  (I like developing on iOS; I know Android is more "open", but Apple's tools are a pleasure to use!)

Derivatives are the solution.  I forget in which paper I read this, but the rectangular features that are subtracted from each other are like coarse derivatives that use many pixel values for normalization purposes.  This follows from realizing that the Haar-like features form an overcomplete basis (it's not orthonormal).

So - in other words the Haar-like features present a reduced/compressed form of the image which is convenient for recognition purposes.

According to my arithmetic, at lower capture resolutions, 10 passes over the image have to be done to scan a 24x24 window over the image (one pass per scale).  So we start with a 24x24 window that is slid across the entire image, then we increase its size to, say, 28x28 pixels and slide it across the image, and so forth.  According to Viola and Jones, the increment should be a percentage of the original size (as is the shifting).  I really recommend reading their paper: Rapid Object Detection using a Boosted Cascade of Simple Features.
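For concreteness, the rectangle sums behind these features are made cheap by the integral image described in that paper: any rectangle sum costs four lookups, so a two-rectangle "coarse derivative" costs eight. A sketch (the function names are my own):

```python
def integral_image(img):
    """ii[y][x] holds the sum of all pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum over any rectangle in constant time: four table lookups.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    # Left half minus right half: a coarse horizontal derivative.
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because evaluation cost is independent of window size, sliding the 24x24 window, then 28x28, and so on stays cheap at every scale.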

What I started to ask myself: could a Haar-like feature be shared among multiple classes of objects?  Yes - the features are differences.  The varying thresholds to accept a feature may pose a problem, but that can be solved by how the data is organized.

But I have a bet.  An interesting bet.  If we have just a horizontal and a vertical feature, could we restrict them to a square?  Yes...  That means that the first feature we pick is going to be a square (with either a horizontal or a vertical split for the subtraction).  It also means that we have divided the space into 2 buckets.

Ok, next round - what is the standard deviation among the features from the training set?  That should determine the next round of features that will be placed spatially around the initial feature.  So we detect a rectangle, but that rectangle may belong to a larger feature which we must then scan for by checking surrounding rectangles.

However, we cannot let the number of rectangles to test explode after the initial round.

By using this trick, AdaBoost in the form that Viola and Jones describe would not be sufficient.  Rather, I believe rigging the features so they don't overlap may provide the solution that I need.

This is as far as my thinking has gone (except for a means of accelerating the training so it could be done on the iOS device as well - I've solved it and managed to find a way to reduce storage needs, but that is the simple part of the puzzle that I'm looking at).

Sunday, November 27, 2011

Haskell "to add or multiply" (slow version)

Today I'm rewriting this entire post (minus the code - it's very ugly code in retrospect).  So I'm trying to solve the "to add or multiply" problem from the ACM 2011 finals.  My hunch last week was that it could be solved in linear time.  This week I explored what I thought would be a linear solution and came back with some interesting notes.

Before I jump into the subject: the problem gives two ranges of integers (start and end), and two constants 'a' and 'm'.  Find arbitrary positive integers [i1, i2, i3, ... in] such that the following is contained within the range 'end':
(((start+i1*a)*m^i2 + i3*a)*m^i4 + i5*a) ....

(where arithmetic on ranges is identical to operations on a 2-vector; a range consists of a start value and an end value)

First, I now believe that there is no linear-time solution.  Intuitively, the reason is that there is a destination range and not a destination number.

Consider the inclusive range [a,b].  Now, there's a number 'x' that I can multiply by an integer 'j' to get a value between a and b.  I also want 'j' to be minimal.  This is quite simple: if 'a mod x = 0' then the answer is a/x, else it is 1+a/x.  Recall that the computer always floors integer division, whereas we want the ceiling.
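The minimal multiplier is just ceiling division, which integer arithmetic gives almost for free. A quick sketch of the formula above:

```python
def min_multiplier(a, x):
    """Smallest positive integer j with j*x >= a.

    Equivalent to: a/x when a mod x == 0, else 1 + a/x (with flooring
    division); the double negation turns Python's floor into a ceiling."""
    return -(-a // x)
```

Whether j*x also lands at or below b is what decides if the range [a,b] is actually reachable.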

Within the problem, there's a point at which both j=a/x and j=1+a/x should be explored.  Below I'll try to informally explain this:

For the problem, it is possible to find a value 'k' that is the maximum number of multiplications that may be applied to [p,q] (the start range) until the values exceed [r,s] or the number of integers within the range of [p,q] exceeds that of [r,s].

The maximum number of multiplications is invariant no matter how many addition operators are present.  Simple example ('x' is a starting value, 'a' and 'm' are integers):
(x+a)m = xm + am

'am' will shift values by a constant amount regardless of the starting value 'x'.  This is important: look at the range [2,3] -- [2,3] contains 2 integers, while [2,3]*4 = [8,12] contains 5.  We can add integers before the multiplication and the number of integers in the range will not change.  This allows us to compute another value -- the minimum number of additions needed until a solution.
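The invariant is easy to check with a little interval arithmetic (a sketch; 'width' counts the integers in an inclusive range):

```python
def add_range(r, k):
    # Addition shifts both endpoints; the width never changes.
    return (r[0] + k, r[1] + k)

def mul_range(r, m):
    # Multiplication scales both endpoints; the width grows.
    return (r[0] * m, r[1] * m)

def width(r):
    """Number of integers in the inclusive range [r0, r1]."""
    return r[1] - r[0] + 1
```

So only multiplications change how many integers the range covers, which is why the maximum number of multiplications can be bounded independently of the additions.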

If it is not possible to find a series of additions that satisfies the start range and end range, then we cannot conclude that there is no solution.  Consider an addition increment of 100,000 and a multiplication of 2, with start range [1,2] and end range [5,20]: addition alone overshoots, yet multiplying three times gives [8,16], which fits.

Neither is it possible to rely solely on multiplication.  Consider an addition of 1 and a multiplication of 7, with start range [1,2] and end range [14,21].  The following is contained within the end range: ([1,2]+1)*7 = [14,21]

Knowing these two values helps us when searching for a solution.  The solution becomes:
x*m^j + i1*a*m^0 + i2*a*m^1 + ...

It's a matter of finding i1, i2, i... (j-1 unknowns).  If the multiplication constant is 1 or 0, or if the addition constant is 0, then the solution can be computed directly in constant time (depending upon your implementation of log - or O(n) if you decide to loop over values for the multiplication).

Anyhow, a good strategy is to attempt to maximize the i attached to the m with the largest exponent.  Think of this as a heuristic.  When the number of integers can be in the tens of thousands, a brute-force search will be slow and memory-consuming (the arithmetic is simple; it would just look like a mess here).

I'll continue playing with the numbers, maybe something interesting will pop out.

For historical purposes, here's a very slow / bad implementation of "to add or multiply":

import Data.Monoid
import Data.Char


-- AddOrMultiply
-- Given the ability to add 'a' or multiply 'm', see if there is a sequence
-- that starts in range [p,q] and ends in range [r,s]


data Operation = A Integer | M Integer


instance Show Operation where
  show (A v) = " " ++ (show v) ++ "A"
  show (M v) = " " ++ (show v) ++ "M"


apply :: Operation -> Integer -> Integer
apply (A v) a = (v+a)
apply (M v) a = (v*a)


data Partial = Partial (Integer,Integer,Integer) [Operation] deriving (Show)


applyl :: (Integer,Integer) -> [Operation] -> Partial
applyl (min,max) os = Partial (foldr apply min os,foldr apply max os,0) os


incrementMul :: Operation -> Integer -> Integer
incrementMul (M m) v = 1
incrementMul _ v = v


combineOps :: Operation -> [Operation] -> [Operation]
combineOps o [] = [o]
combineOps (A a) ((A as):xs) = (A $ a+as):xs
combineOps (M m) ((M ms):xs) = (M $ m+ms):xs
combineOps o xs = o:xs


add :: Operation -> Partial -> Partial
add o (Partial (min,max,mul) os) = Partial (apply o min, apply o max, incrementMul o mul) $ combineOps o os


validAdd :: (Integer,Integer)->Partial -> Bool
validAdd _ (Partial (_,_,0) _) = True
validAdd (a,m) (Partial _ ((A a1):_)) = a1 < (a*m)
validAdd _ _ = True


valid :: (Integer,Integer) -> (Integer,Integer) -> Partial -> Bool
valid am (r,s) (Partial (min,max,mul) os) = max <= s && max-min <= s-r && (validAdd am $ Partial (min,max,mul) os)


iteration :: (Integer,Integer) -> (Integer,Integer) -> Partial -> [Partial]
iteration (a,m) rs ptl = filter (valid (a,m) rs) [add (A a) ptl, add (M m) ptl]


-- concatMap :: (a -> [b]) -> [a] -> [b]


startCondition :: (Integer,Integer) -> [Partial]
startCondition (p,q) = [Partial (p,q,0) []]


success :: (Integer,Integer) -> Partial -> Bool
success (r,s) (Partial (min,max,_) _) = min >= r && max <= s


iterationl :: (Integer,Integer) -> (Integer,Integer) -> [Partial] -> [Partial]
iterationl (a,m) rs ps = concatMap (iteration (a,m) rs) ps


everything :: (Integer,Integer) -> (Integer,Integer) -> [Partial] -> [Partial]
everything _ _ [] = []
everything am rs ps = ps ++ (everything am rs $ iterationl am rs ps)


solutions :: (Integer,Integer) -> (Integer,Integer) -> (Integer,Integer) -> [Partial]
solutions am pq rs = filter (success rs) $ everything am rs $ startCondition pq


type Ampqrs = (Integer,Integer,Integer,Integer,Integer,Integer)


reformat :: (Integer,Integer) -> [Operation] -> [Operation]
reformat _ [] = []
reformat am ((A a1):(A a2):xs) = reformat am $ (A $ a1+a2):xs
reformat am ((M m1):(M m2):xs) = reformat am $ (M $ m1+m2):xs
reformat (a,m) ((A a1):xs) = ((A $ a1 `div` a):(reformat (a,m) xs))
reformat (a,m) ((M m1):xs) = ((M $ m1 `div` m):(reformat (a,m) xs))


solutionPartial :: Ampqrs -> [Partial]
solutionPartial (a,m,p,q,r,s) = take 1 $ solutions (a,m) (p,q) (r,s)


operationFromPartial :: Partial -> [Operation]
operationFromPartial (Partial _ o) = o


solution' :: (Integer,Integer) -> [Partial] -> Maybe [Operation]
solution' _ [] = Nothing
solution' am xs = Just . reverse . (reformat am) . operationFromPartial . head $ xs


solutionOp :: Ampqrs -> Maybe [Operation]
solutionOp (a,m,p,q,r,s) = solution' (a,m) $ solutionPartial (a,m,p,q,r,s)


solutionOp' :: [String] -> Maybe [Operation]
solutionOp' [a,m,p,q,r,s] = solutionOp (read a, read m, read p, read q, read r, read s)


parseLine :: String -> Maybe [Operation]
parseLine s = solutionOp' $ words s


display :: Integer -> [String] -> String
display _ [] = []
display _ [a] = []
display i (x:xs) = ("Case " ++ (show i) ++ ": " ++ (show $  parseLine x)) ++ ['\n'] ++ (display (i+1) xs)


main = do
  contents <- getContents
  putStr . (display 1) $ lines contents