Thursday, September 28, 2017

SIL: Explorations in non-Turing Completeness


I want to demonstrate that Turing Completeness is unnecessary for general purpose computing, through something I have called SIL.

SIL either stands for Stand-In Language, or Simple Intermediary Language. I intended it to stand for one thing, then forgot what it was, then backronymed the other meaning.

SIL has a very simple core grammar, based on the simply typed lambda calculus, consisting of 10 instructions. On top of this sits a Prelude.sil file that defines common functions found in functional programming languages, like fold, map, and zip.

SIL's type system is very simple. It consists of a generic "data" type, a pair type, and an arrow type (taking one type and returning another). I'm working on adding refinement typing, which will be the key to runtime optimizations.

At this stage, I believe I have demonstrated that general purpose computing is possible in SIL. For evidence, please look at the tictactoe.sil file, which implements a two-player (no AI) game. If you're not convinced this is an adequate demonstration, please tell me what else to write!

(note: the master branch is outdated, for the best mix of up-to-dateness and non-brokeness, try the pairType branch)

The Core Grammar

  1. Zero: this can represent the number zero, or a terminator
  2. Pair: this is a pair of other expressions. We can use nested pairs to represent lists, numbers, or really any form of data.
  3. Var: this is misnamed. It actually represents the current closure environment. By manipulating it, we can pull out specific variables.
  4. App: this feeds data to a function
  5. Anno: this is a deprecated instruction, applying one expression encoding a type annotation to another expression, to be evaluated at compile time. In the refinementTypes branch, work is being made to replace this with a more generic "Check" expression that allows checking an expression against a precondition function.
  6. Gate: creates conditional behavior. What is passed in acts as a switch, which is applied to a Pair. If the input is Zero, it selects the left side of the pair; anything else selects the right side.
  7. PLeft: returns the left side of a Pair
  8. PRight: returns the right side of a Pair
  9. Trace: this instruction is simply used for debugging while designing the rest of the language. It echoes the expression within to the console, then evaluates it.
  10. Closure: Originally this was a lambda expression, representing a function. For historical reasons (and possibly future reasons?), I changed this to a closure, but it mostly serves the purpose of a lambda - creating a function.
Using nested pairs we can represent a stack, and this representation is used for closure environments. Values on the stack are the left sides of pairs, and the right sides hold the remainder of the stack. Thus, PLeft Var is an expression representing the first value on the stack, and PLeft (PRight Var) represents the second value.
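The stack lookup described above can be sketched in Haskell. This is an illustrative rendering of part of the core grammar; the constructor names and the `lookupVar` helper are my own and may not match the actual SIL source.

```haskell
-- Illustrative Haskell rendering of part of the SIL core grammar.
data IExpr
  = Zero
  | Pair IExpr IExpr
  | Var
  | PLeft IExpr
  | PRight IExpr
  deriving (Eq, Show)

-- The nth value on the closure-environment stack: n PRights to skip
-- past the first n stack cells, then a PLeft to take the value.
lookupVar :: Int -> IExpr
lookupVar n = PLeft (iterate PRight Var !! n)
```

With this, `lookupVar 0` is `PLeft Var` and `lookupVar 1` is `PLeft (PRight Var)`, matching the examples above.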

Functional programmers may be aware that a fixed point combinator can be used to make non-recursive functions recursive. Through this same principle, we can apply a non-recursive function to a church numeral together with a function to return a default value, and get a function that can recurse limited to the magnitude of the church numeral. This is how SIL handles recursion.
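The technique can be modeled in a few lines of Haskell. This is a sketch of the principle, not SIL code; the names `boundedFix` and `factStep` are illustrative.

```haskell
-- A church numeral applies a function n times.
church :: Int -> (a -> a) -> a -> a
church 0 _ x = x
church n f x = f (church (n - 1) f x)

-- Bounded recursion: apply a non-recursive step function to itself
-- "fuel" times, bottoming out at a fallback once the fuel is exhausted.
boundedFix :: Int -> ((a -> b) -> a -> b) -> (a -> b) -> a -> b
boundedFix fuel step fallback = church fuel step fallback

-- A non-recursive factorial step, written against an abstract "rec".
factStep :: (Int -> Int) -> Int -> Int
factStep rec n = if n <= 0 then 1 else n * rec (n - 1)
```

`boundedFix 10 factStep (const 1) 5` fully computes `120`, while a fuel of `3` hits the fallback early and yields `60`; either way, termination is guaranteed by the finite fuel.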


Most practical programs can potentially run forever. How can a language where all expressions terminate be useful?

The key to this is plugging the core grammar into input/output loop "frames". Each frame takes a pair of the input and program state, and returns a pair of output and program state. If running the SIL expression generates a null program state, the program is aborted; otherwise the state is fed back into the frame and the program continues executing.
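A sketch of such a frame loop in Haskell follows; the names and types here are my own, not SIL's. Each call to the frame terminates; only the outer loop is potentially infinite, and it lives outside the total language.

```haskell
-- A frame maps (input, state) to (output, maybe a next state).
type Frame i s o = (i, s) -> (o, Maybe s)

-- Drive a frame: read input, run the terminating frame, emit output,
-- and either stop (null program state) or loop with the new state.
runFrames :: Monad m => m i -> (o -> m ()) -> Frame i s o -> s -> m ()
runFrames getInput putOutput frame st = do
  i <- getInput
  let (o, next) = frame (i, st)
  putOutput o
  case next of
    Nothing  -> return ()          -- aborted: null program state
    Just st' -> runFrames getInput putOutput frame st'
```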


The Parser

Since constructing SIL ASTs by hand to write programs isn't fun, I've written a simple parser that translates a Haskell-like syntax into the AST.

Identifiers (variables) in the syntax are translated to a series of instructions that look up that variable in the relevant closure environment.


The Type Checker

I am in the middle of rewriting the type checker to support refinement types, so I don't want to give too much information here that will be quickly out of date. With refinement types, type checking will be a two stage process.

The key function of the first stage is making sure that the expression will terminate. For each element of the core grammar, we can infer a simple type composed of four type elements: ZeroType, PairType, ArrowType, and TypeVariable (which should really be called EnvironmentType).

Since SIL is based on simply typed lambda calculus, when we infer a simple type for an expression, if that type is self-referential (as in the case of a fixed-point combinator), that expression will not terminate, and we should generate a type error.
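Using the four type elements named above, the self-reference test is essentially an occurs check. A sketch follows; the real inference machinery is more involved, and the `Int`-indexed type variables are an assumption of mine.

```haskell
-- The four simple type elements from the first stage.
data PartialType
  = ZeroType
  | PairType PartialType PartialType
  | ArrowType PartialType PartialType
  | TypeVariable Int
  deriving (Eq, Show)

-- Does type variable v occur inside t? If unification would bind v to
-- a type containing v, the type is self-referential (as with a
-- fixed-point combinator) and must be rejected.
occursIn :: Int -> PartialType -> Bool
occursIn _ ZeroType         = False
occursIn v (PairType a b)   = occursIn v a || occursIn v b
occursIn v (ArrowType a b)  = occursIn v a || occursIn v b
occursIn v (TypeVariable w) = v == w
```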

With refinement typing, the whole SIL language will be made available to validate expressions. Each expression can be paired with a function written in SIL. If the expression has no free variables, it can be processed at compile time, alerting the programmer to semantic errors.

To handle expressions with free variables, we can run an optimization process on the AST that moves all refinement checks to the top level. At the top level, each check either depends on no free variables and can be run at compile time, or depends on non-deterministic runtime values.
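As a toy model of that optimization (this is not SIL's actual AST; the `Check` constructor and string-named conditions are placeholders), hoisting might look like:

```haskell
-- A toy expression type with embedded refinement checks.
data Expr
  = Leaf
  | Node Expr Expr
  | Check String Expr   -- a check, named by its precondition
  deriving (Eq, Show)

-- Strip every Check node out of the tree and collect the checks,
-- leaving a check-free expression plus a top-level list of conditions.
hoistChecks :: Expr -> (Expr, [String])
hoistChecks Leaf = (Leaf, [])
hoistChecks (Node a b) =
  let (a', ca) = hoistChecks a
      (b', cb) = hoistChecks b
  in (Node a' b', ca ++ cb)
hoistChecks (Check c e) =
  let (e', ce) = hoistChecks e
  in (e', c : ce)
```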

If a user of a program written in SIL inputs data that does not pass runtime checks, the error can be reported to the user before any other execution is performed, and the user can instead input different data. This prevents the SIL program from entering many unrecoverable states. The exception (pun unintended) is when there is no possible user input that will pass validation.


Summary

Hopefully this gives a good view of the current state and near future of SIL. I'm hoping for other contributors and/or feedback.

Wednesday, December 14, 2016

A Better Model of Computation

Modern software engineering is based on problematic fundamentals.

One thing I loved about computers when I was young, was their deterministic nature. If you understood their rules, you could perfectly predict what they would do. However, as the complexity of computers and software has grown, they increasingly behave in practically non-deterministic ways. Programs often crash or suddenly run out of memory, or hang, or get really slow for no apparent reason.

Some of this is unavoidable, since complexity is the enemy of practical predictability. But we could be doing much, much better than we are now.

Computer programs have side effects that we care deeply about, such as memory use, CPU time utilization, and even heat generation. But if you ask a programmer what resources their program uses, you'll be lucky to get even the vaguest of answers. We measure all these effects at the top level and hope that our benchmark runs are representative of real-world use. What we should be doing is using the computer to measure these things from the bottom up.

What we should have, is a world where we know exactly how long a computer program will take to execute down to the nanosecond, given specific inputs.

But wait, doesn't that involve solving the halting problem?

Yes, but only if we're using Turing complete languages. That's why we need to abandon Turing completeness as our foundation paradigm.

To illustrate my perspective, we can segregate the Turing Machine model into two parts, a terminating pure function taking a state and returning a state, and a simple fixed-point combinator for feeding the pure function into itself. The fixed-point combinator is trivial and not worth talking about; all the interesting logic lives in the terminating part.
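The split can be made concrete in a few lines of Haskell. In this sketch (my notation, not a formal Turing machine encoding), `step` stands for the terminating part, and `drive` is the trivial combinator where all the potential non-termination lives:

```haskell
-- The terminating part is a pure state transition: it returns the next
-- state, or Nothing to halt. The driver below is the only place where
-- non-termination can occur.
drive :: (s -> Maybe s) -> s -> s
drive step s = case step s of
  Nothing -> s
  Just s' -> drive step s'
```

For example, `drive (\n -> if n >= 5 then Nothing else Just (n + 1)) 0` counts up and halts at `5`.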

The pure functional programming community has been working on many such segregations of deterministic and non-deterministic code already. The FRP model (if you get past the continuous time part, which distracts from my main point) shows that we can use a state machine to bridge the IO behavior of a computer and the pure functions that actually define what we care about in software. It's a small jump from there, to eliminating non-terminating pure functions entirely.

Of course, perfect deterministic behavior relies on an open hardware model, which unfortunately is not the case in our current world. It would take the compiler having a perfect model of the CPU it was compiling for, with all its caches, branch prediction and instruction prefetch circuitry modeled in software. But even without that, the terminating paradigm should allow compilers to give us bounding predictions that are much more detailed than current big omega metrics.

So if this model of computation is truly better, how to go about demonstrating it?

My plan is to develop a simply typed lambda calculus, marry it to an FRP framework, and write a simple game like tic-tac-toe in it. After that, I want to add in refinement typing and work it into the FRP framework so that inputs that would violate preconditions are not allowed rather than generate runtime errors.

There is much more I could talk about, such as implications for operating system design, opportunities for optimization and how to transpile existing languages into a terminating model, but I'll wait until I have a working proof-of-concept to write about what comes next.

Thursday, September 18, 2014

The Stupidity of the Blockchain Fragmentation

The world is now filled with cryptocurrency fans, which would be great, except that we're cannibalizing our own movement. It's Bitcoin against BitSharesX against Ethereum against Ripple... we've created a situation where we're unwilling to let the others succeed, because it threatens our investments.

Our infighting is unnecessary.

I invested in the Ethereum ICO. Shortly after making my investment, I stumbled on a proposal on bitcointalk. I'm glad that my money has gone towards rewarding Vitalik Buterin and all the rest of the development team for their efforts, but I feel I have to take a public stand now, and say I hope Æthereum succeeds, and Ethereum does not, even though I own a greater proportion of ether than bitcoins.

For those who don't have the patience to read the Æthereum proposal, basically it's all the same code, but a different blockchain. This different blockchain will contain a snapshot of all bitcoin address balances at a certain time, and a mechanism with which bitcoin holders can use their private (bitcoin) keys to retrieve a proportional amount of ether. So bitcoin holders can upgrade to a (possibly) superior technology and keep their original investment safe.

All new cryptocurrencies need this ability. If you see any new ICO that does not give bitcoin holders an automatic stake, you should consider it a scam. Maybe the developers have some quixotic scheme for a more equitable initial distribution of coins, but I'm going to make the judgement that that's unlikely to pan out, in any form, for any coin.

Let me make it clear that I do support developers and initial investors reaping a reward. But I think that can be safely capped at 10-20% initial stake, with bitcoin holders getting the rest.

Cryptocurrency investors should only need to invest once in the idea of cryptocurrency, in archetypal bitcoin, and be able to reap the rewards no matter how the technology changes. It's not the blockchain that matters; it's the balances. It's the weight of emotion: hope and, yes, greed, symbolized in electronic numbers.

Bitcoin sooner or later will be replaced, and we need to have an orderly exit. We can't just hack anything on top of a bitcoin blockchain in perpetuity and expect security and scalability.

We need to stop doing things like BitShares PTS. We already have a blockchain indicating interest in cryptocurrency, and it's called bitcoin.

I really, really hope this post or something similar said in a better way gets popular and permeates the entire cryptocurrency world.

Sunday, April 19, 2009

Learning Happstack

I've been trying to shoehorn Haskell into my job for a while now. Occasionally I'll use it for small command-line tools, but usually it doesn't save any time in such a scenario, so it's hard to justify.

But we do have some internal webapps that need updating. Some of them do horrible things. For instance, to create a customer account, one webapp will write a bunch of raw shell commands to a database, where they are later pulled out by a daemon and run directly. I don't know why this was done; poor man's threading, perhaps.

I'm not touching that particular webapp, since I'm afraid modifying it will unravel more than I hope to correct, but I'm hoping I could shoehorn in some happstack apps in a few other places.

So let's jump right into some code I wrote to try out a simple templated app:


> import Happstack.Server
>    (ServerPartT, Response, dir, toResponse, simpleHTTP, Conf (Conf))
> import Happstack.Helpers (HtmlString (HtmlString))
> import Control.Monad.Trans (lift)
> import Control.Monad (msum, liftM)


If you're going to use happstack, it seems you have to be pretty comfortable with navigating between different levels in composite monads.


> import Text.StringTemplate.Helpers (renderTemplateGroup)
> import Text.StringTemplate (directoryGroup)


It seems that HStringTemplate is the standard way to template HTML in happstack. I think that using '$' as brackets is pretty ugly in HTML. Ideally the template language should mix well with HTML. I've wondered if this couldn't be done with a modified version of HTML comments (<!--* template code *-->). It's rather verbose, but it makes for valid HTML before the template is rendered. But this is a small complaint.


> exampleMethod :: IO [(String, String)]
> exampleMethod = return $ [("hello", "Hello, World.")]
>
> methodsAssList :: [(String, IO [(String, String)])]
> methodsAssList = [("exampleMethod", exampleMethod)]


I wanted to show in this example how requests for different paths could be handled by different methods. In this example that handling is very simplistic. I think a web framework should have a standard way of handling requests, like breaking the request into a path parameter, a request method and a map of string request variables. I haven't discovered such a thing in happstack yet.


> makeMethods :: IO [(String, IO String)]
> makeMethods = do
>    templates <- directoryGroup "templates"
>    let
>       makeRenderTemplateFunction (name, assListM) = let
>             renderFun = do
>                assList <- assListM
>                return $ renderTemplateGroup templates assList name
>          in (name, renderFun)
>
>    return . map makeRenderTemplateFunction $ methodsAssList


This function is used because the HStringTemplate library can load a group of templates in a single directory at once with "directoryGroup". Then we can translate each method name to a name of a template (i.e. templates/exampleMethod.st)

So all exampleMethod above does is give an association list of variables to replace in the template, along with their values.


> fillRequest :: (Monad m) => String -> m String -> ServerPartT m Response
> fillRequest name method =
>    dir name . lift .
>       liftM (toResponse . HtmlString) $ method


This simply associates a method with a path from the root.


> responses :: IO (ServerPartT IO Response)
> responses = liftM (msum . map (uncurry fillRequest)) makeMethods


Now we map each path request to its method, and group these responses together into one response.


> main = responses >>= simpleHTTP (Conf 8080 Nothing)


Finally, we set up the server with some basic settings (the server responds on port 8080) and set the response.

The only other thing necessary is writing the template at "templates/exampleMethod.st":

<html><body><h2>$hello$</h2></body></html>