Thinking in terms of data transformations

Sep 22, 2012

This is a successor to my previous post, Haskellify your Scala, giving another example of how to design Scala in a Haskellish fashion. I am describing a way of thinking design in terms of data transformations, instead of a sequence of instructions to mutate state.

Imperative programming focuses on executing a series of statements to mutate the program state into something desirable. It is fundamentally based on the idea of executing instructions on the CPU. Modern languages provide high level abstractions for specifying the instructions, of course. Java programmers rarely need to care about the Java bytecode instructions 1, and even less so about the actual CPU instructions.

However, none of these abstractions change the underlying mindset. The program is still a sequence of statements, albeit well structured sequence of statements, that are executed in the defined order to achieve the goal.

Functional programming offers a different way of reasoning about programs. Instead of thinking in terms of sequential statements, programs can be viewed as ways of transforming input data into something more useful than the input itself.

It’s about first modelling the problem by defining appropriate data structures, such as lists, queues and trees, and then defining functions, which transform this data.

A tale of two Pong bots

To illustrate my point, I’ll show you two imaginary designs that could take part in the Hello World Open competition. In Hello World Open, programmers compete against each other by writing bots for the game of Pong. Bots receive events over the network about the current state of the game, while controlling their own paddle by sending commands to the server.

The point of this example is not to describe a great Pong bot, but the difference between designing imperatively and functionally. I am presenting two ways of structuring a Pong bot in Scala.

An imperative approach to implementing the core bot logic and the main loop could look something like this (Full source as Gist).

class Bot(connection: PongConnection) {
  var myDirection      = 0
  var missileInventory = 0
  var enemyPosition    = 0

  def update(event: Event) {
    if (conditionA(event)) {
      goUp()
    } else if (conditionB(event)) {
      shootMissile()
    }
  } 
    
  private def goUp() {
    myDirection = 1
    connection.moveUp()
  }

  private def shootMissile() {
    missileInventory -= 1
    connection.shootMissile()
  }
  
  private def conditionA(event: Event): Boolean = sys.error("not implemented")
  private def conditionB(event: Event): Boolean = sys.error("not implemented")
}

object Pong extends App {
  val connection = new PongConnection
  val bot        = new Bot(connection)
  while (connection.isConnected()) {
    val event = connection.readEvent()
    bot.update(event)
  }
} 

A functional version, structured around the idea of transforming data, could look more like this (Full source as Gist).

object Bot {
  def actions(state: State, event: Event): (State, Seq[Action]) = (state, event) match {
    case (s,e) if conditionA(s,e) => State.up(state)    -> Actions.up
    case (s,e) if conditionB(s,e) => State.shoot(state) -> Actions.shoot
    case _                        => state              -> Actions.none
  }

  def conditionA(state: State, event: Event): Boolean = sys.error("not implemented")
  def conditionB(state: State, event: Event): Boolean = sys.error("not implemented")
}

object Pong extends App {
  val events       = createSource()
  val initialState = State()
  
  events.foldLeft(initialState) { case (state, msg) =>
    val (newState, actions) = Bot.actions(state, msg)
    actions.foreach(execute)
    newState
  }

  private def createSource(): Iterator[Event] = sys.error("not implemented")
  private def execute(action: Action) = sys.error("not implemented")
}

Key differences in the two designs

There are countless ways of making both of these implementations better, but I want to concentrate on the key differences between the thought process behind them. This is not a competition between the imperative and the functional version, but an illustration of the differences.

The functional version is built around the concept of transforming the current state and incoming event into a sequence of formalized actions. The imperative version, on the other hand, is executing a series of instructions by calling other objects, in hopes of leading the bot to a victory.

Structuring program as data transformations leads to some nice things. In the functional version it’s easy to isolate all side-effects out of the core bot logic. The decision function is only transforming (State, Event) tuples into new a new state and a sequence of Actions. The imperative version, in contrast, is using Connection to move our world into the desired new state. Nothing prevents you from implementing similar separations for the imperative version, but it usually doesn’t feel as natural.

Functional data transformations are also conveniently testable using techniques like ScalaCheck. It generates a set of random inputs and asserts that given properties are true for all outputs. Similar testing is harder to achieve for imperative code, even if you would utilize e.g. mock objects.

Summary

Thinking in terms of data transformations can lead to better designs. It is one of the ways you can Haskellify Scala code. Transformations, and the data-centric approach in general, forces you to clarify and formalize concepts. It also makes isolating side-effects easy.


  1. Unless you’re doing performance optimizations, at which point javap becomes an invaluable tool, especially with Scala. I could rant a lot about this, but this is not the post for that :)

akisaarinen

A software guy exploring (mostly) technical things.

More about Aki

Follow @akisaarinen


See all posts

Thanks for reading. If you liked this, leave a comment or reach out to me at Twitter or email. You can take a look at my other posts here.
comments powered by Disqus