Friday, July 1, 2011

My Latest Haskell Experiments

Many moons have risen in the night sky as I tried, with great difficulty, to become a Haskell master.  I lacked knowledge of category theory, and Monads remained this abstract notion with no concrete definition.  My infatuation with the language began with the desire to write a better programming language, one where I could express select computations and generate code to optimally handle a variety of data structures.  Haskell offered purity in its functions and strict type checking.  Winner it clearly was.

This year, marking at least my third attempt, I have revisited Haskell.  No language has managed to elude me before, and I will figure this one out.  This year I invested in a book, and I read it cover to cover.  Another attempt at writing a code generator followed.

As I programmed in Prolog, my mind always translated the code into its sequential equivalent: all possibilities are recursively enumerated until a solution is found.

Haskell, with Monads, is another story.  My problem was a silly attempt to maintain an imperative mindset rather than thinking functionally.  So now I will document what I have discovered -- as many must have done before -- for someone wiser than I who can extract the interesting portions and run forward, someone who wishes to see how the thinking differs.  As for tutorials, many have written much better texts than I could.

I'll start with the concept of laziness.  Haskell is lazy.  In an imperative language, suppose you had a for loop.  As long as it writes to a memory location (or potentially could write to one), it will be run.  Haskell instead takes a goal and follows a chain of dependencies until it has finished the calculation at hand.  Back to our for loop: this means the compiler (or language) would need a way to determine whether a given write to a memory address will actually affect the resulting computation later on.  This is possible for certain local computations on the stack, but anything dealing with shared objects is next to impossible, and forget about static memory locations.
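
A minimal sketch of that chain-of-dependencies behavior (the names here are mine, nothing standard): an infinite list is perfectly fine to define, because only the elements the final result actually demands ever get computed.

    -- Laziness: 'naturals' is an infinite list, but only the elements
    -- that 'take' actually demands are ever computed.
    naturals :: [Integer]
    naturals = [0 ..]

    firstTen :: [Integer]
    firstTen = take 10 naturals

    main :: IO ()
    main = print firstTen   -- prints [0,1,2,3,4,5,6,7,8,9]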

If we follow this path of thought, it means that Haskell must place some restrictions on the language to allow such reasoning.  Two things come to mind.  First, global non-constant variables must go.  Suppose an object were to change a global variable: how would it know that no other object would access that variable?  This becomes impossible to reason about with libraries unless everything registers a priori for the data it will read.

Second, local variables should be constant as well.  Parameters passed to functions should be constant, as should the return values.  Essentially, all data becomes immutable.  Why do we want this?  All computations can now be represented as an acyclic graph.  Start with a variable -- it is computed by calling a function, let's say.  That function's input is known, and the result is computed from information that is known.  So every computation that must be done to produce any value is known.  Then all you need to do is look at the graph of computations and evaluate only the nodes (which hold the operations) whose results contribute to the desired computation at hand.
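
A tiny example of mine to show the "only compute the needed nodes" idea: a binding the result never depends on is simply never evaluated.

    -- Only bindings that the final result depends on get evaluated;
    -- 'expensive' is a node of the graph that is never computed.
    result :: Int
    result =
      let cheap     = 1 + 2
          expensive = error "this node is never evaluated"
      in  cheap * 10

    main :: IO ()
    main = print result   -- prints 30; the error never fires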

Some may wonder about determinism.  That is covered by the second point: if every variable is constant and can only "change" by creating a new constant whose value is computed from a series of predictable constant values, then a function will always return the same output for a given input.
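
In code, this determinism is just referential transparency (a throwaway example of mine): a pure function call can be replaced by its result anywhere it appears.

    -- A pure function: the same input always yields the same output,
    -- so 'area 2.0' may be replaced by its value wherever it appears.
    area :: Double -> Double
    area r = pi * r * r

    main :: IO ()
    main = print (area 2.0 == area 2.0)   -- always True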

Implied by determinism and constant-everything is the need for recursion.  Why recursion?  Each recursive call creates fresh stack space in which to put new constant values, and it also provides a framework for discarding values once they are no longer needed.
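
For instance (a sketch of mine, not from any book), the imperative "running total in a mutable loop variable" becomes a recursive call whose accumulator is a fresh constant in each stack frame:

    -- Each recursive call gets a new, constant accumulator; nothing is
    -- ever overwritten, and old frames vanish when no longer needed.
    sumTo :: Int -> Int -> Int
    sumTo acc 0 = acc
    sumTo acc n = sumTo (acc + n) (n - 1)

    main :: IO ()
    main = print (sumTo 0 100)   -- 5050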

So that leads us to state...  how do we manage state?  Here, Haskell is very elegant.  Imagine you have an object that represents the state.  Manipulating that object returns a new state object; the old one is discarded and the new one is used.  Haskell provides ways to automate this process.  This is the last piece of mind-bending...
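
Here is roughly what that looks like by hand, before any automation (the Counter type is my own invention for illustration):

    -- Hand-rolled state passing: each operation takes the old state
    -- and hands back a brand-new one; the old value is never mutated.
    data Counter = Counter { ticks :: Int } deriving Show

    tick :: Counter -> Counter
    tick (Counter n) = Counter (n + 1)

    main :: IO ()
    main = do
      let s0 = Counter 0
          s1 = tick s0      -- s0 is untouched; s1 is a new value
          s2 = tick s1
      print (s0, s2)        -- (Counter {ticks = 0},Counter {ticks = 2})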

Consider a multi-parameter function that returns a single value.  Suppose you gave it a single argument and a new function were created with one less parameter.  Let's call the first parameter the state, and the return value a combination of the state and some value (which we have yet to define).
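
A small sketch of that partial application (the function 'step' and its log-as-state are mine, purely for illustration):

    -- Giving a two-parameter function its first argument yields a new
    -- one-parameter function.  The "state" here is a running log, and
    -- the result pairs the new log with a value.
    step :: [String] -> Int -> ([String], Int)
    step msgs x = (msgs ++ ["saw " ++ show x], x * 2)

    -- Fix the state; what remains is a function from data to result.
    stepWithEmptyLog :: Int -> ([String], Int)
    stepWithEmptyLog = step []

    main :: IO ()
    main = print (stepWithEmptyLog 21)   -- (["saw 21"],42)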

Now, let's create functions that take two inputs (the state and the data) and produce two outputs (the state and the data).  The first such function takes in a function, the state, and the data, and outputs a state and data.  Suppose the function passed to our function just manipulates the data based on the current state.
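
Something like the following, assuming a counter as the state and an Int as the data (all of this is my own toy version, not code from the post):

    -- A "step" maps a state and data to a new state and new data.
    -- 'apply' threads the state so the worker only reads it.
    type Step s a = s -> a -> (s, a)

    -- Run a worker that transforms the data using the current state,
    -- and bump the state (here: a count of operations performed).
    apply :: (Int -> a -> a) -> Step Int a
    apply worker n x = (n + 1, worker n x)

    main :: IO ()
    main =
      let (n1, x1) = apply (\_ x -> x * 2) 0 10   -- double the data
          (n2, x2) = apply (\s x -> x + s) n1 x1  -- add the op count
      in  print (n2, x2)                          -- (2,21)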

A bit mind-bending, but now the state management is separated from the computations.  This doesn't always work, but when it does it's quite neat.  Error handling can be simplified in a similar way.  The error state can be a container that checks return values to make sure they are valid.  Pushing the idea further: since it's a function calling a function that does the actual work, the calling function can first ensure that only non-error states continue the computation.  Want an exception handler?  Sure, have a function that takes two functions (the second only gets called on error).
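
A sketch of that error container, hand-rolled on top of Either (which already provides all of this in the standard library; the names 'andThen', 'safeDiv', and 'handle' are mine):

    -- The chaining function checks for failure before running the
    -- next computation, so the workers never see an error state.
    andThen :: Either String a -> (a -> Either String b) -> Either String b
    andThen (Left err) _ = Left err    -- error state: stop computing
    andThen (Right x)  f = f x         -- ok: keep going

    safeDiv :: Int -> Int -> Either String Int
    safeDiv _ 0 = Left "divide by zero"
    safeDiv x y = Right (x `div` y)

    -- "Exception" handling: two functions, the second only runs on error.
    handle :: (a -> b) -> (String -> b) -> Either String a -> b
    handle onOk _     (Right x)  = onOk x
    handle _    onErr (Left err) = onErr err

    main :: IO ()
    main = do
      print (safeDiv 100 5 `andThen` \x -> safeDiv x 2)       -- Right 10
      putStrLn (handle show ("caught: " ++)
                 (safeDiv 1 0 `andThen` \x -> safeDiv x 2))   -- caught: divide by zero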

Let's push this a bit further.  Our error function's last parameter is the data, but the first is the state.  Recall that we can bind values to the first parameters to create new functions?  Now push this further and bind a function along with the promise to evaluate the result if it is successful.  In C this might look like: eval(func1, data1, eval(func2, data2, eval(func3, data3, ...)))...  Messy.  In Haskell: eval func1 data1 $ eval func2 data2 $ eval func3 data3.  Read it backwards...
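
The eval above isn't spelled out in full, but one plausible shape for it (my guess, with an Either-wrapped Int as the state and a trivial worker) looks like this:

    -- Each step receives a worker, its data, and the state produced by
    -- the step to its right; the worker only runs on a non-error state.
    type State = Either String Int

    eval :: (Int -> Int -> State) -> Int -> State -> State
    eval _    _ (Left err) = Left err    -- an earlier step failed: stop
    eval func x (Right s)  = func x s    -- thread the good state onward

    start :: State
    start = Right 1

    main :: IO ()
    main = print $
      eval step 3 $ eval step 2 $ eval step 1 start
      where
        -- a trivial worker: fold the data into the state
        -- reads right to left: start, then data 1, then 2, then 3 -> Right 7
        step x s = Right (s + x)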

Isn't that cleaner?  I'd need to go back to the book to give further insights, as I'm jumping into Monads too quickly...  But I think the point is made.

Haskell glues functions within functions (think containers) to abstract away computations.  Error handling can be seen as a series of functions, and so can state maintenance.  The typical machinery for this is a Monad.

OK, quickly: Monads.  A Monad just says that a given object (some type of state) wraps another object (a computation).  The Monad typeclass defines operators that must work for every Monad, and each Monad must conform to a set of rules so it behaves as other programmers expect.

First, a Monad should be able to take a plain computation value and wrap it in the state object.  Second, given a function operating on the data, the Monad should be able to carry out the computation exactly as though it were done devoid of state.  Last, chained computations should give the same result no matter how the chain is grouped, just as (a*b)*c = a*(b*c).
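
Concretely, with Maybe standing in as the "state" container (my example; return and >>= are the real Prelude operators, the rest is made up):

    -- return wraps a plain value; >>= is the checked chaining from
    -- before; the associativity law says regrouping changes nothing:
    --   (m >>= f) >>= g  ==  m >>= (\x -> f x >>= g)
    safeDiv :: Int -> Int -> Maybe Int
    safeDiv _ 0 = Nothing
    safeDiv x y = Just (x `div` y)

    pipeline :: Int -> Maybe Int
    pipeline n = return n >>= safeDiv 100 >>= safeDiv 60

    main :: IO ()
    main = do
      print (pipeline 5)   -- Just 3   (100 div 5 = 20, 60 div 20 = 3)
      print (pipeline 0)   -- Nothing  (the error short-circuits the chain)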

Well, something like that.  A person used to the imperative style can see the concept as a consistent use of the class adapter pattern: the adaptee is the computation, and the client is the state.
