Sunday, July 4, 2010

Learn the basic concepts and significance of functional programming in 10 minutes

Introduction


If you're a Java programmer, this short article will show you the basic concepts of functional programming and the reason why it may be getting more and more important in the near future. All sample code is in Java so it should be very easy to understand.

What is functional programming?


Functional programming is just like our programming in Java or C#, but without assignments. You may wonder is this possible at all? For example, to calculate the total price of a list of product prices, one might write in functional programming style (the code is still in Java):

abstract class List {
}

class EmptyList extends List {
}

class NonEmptyList extends List {
int head;
List tail;
}

class Foo {
int getTotalPrice(List ps) {
if (ps instanceof EmptyList) {
return 0;
} else {
final NonEmptyList nl = (NonEmptyList) ps; //initialization, NOT assignment
return nl.head + getTotalPrice(nl.tail);
}
}
}

Note that in line 17 there seems to be an assignment, but it is not: It is an initialization and the variable is declared as final, so no modification can be made. This is perfectly allowed in functional programming.
So you can see that assignment is not really required. But why the lack of assignment is a good thing?

Why functional programming may be huge in the near future?


To see the benefit, let's assume that if a price is >= 100 then you'll give a 20% off discount. So, you may modify the code as:

class Foo {
int getTotalPrice(List ps) {
if (ps instanceof EmptyList) {
return 0;
} else {
final NonEmptyList nl = (NonEmptyList) ps;
final int headDiscountedPrice = nl.head >= 100 ? (int) (nl.head * 0.8) : nl.head;
final int tailPrice = getTotalPrice(nl.tail);
return headDiscountedPrice + tailPrice;
}
}
}

Note the three initializations. Because all the variables and fields can't be modified, their order is unimportant and, in a real functional programming language, the order can be changed at wish without changing the return value. For example, you could change it like:

class Foo {
int getTotalPrice(List ps) {
if (ps instanceof EmptyList) {
return 0;
} else {
final int tailPrice = getTotalPrice(nl.tail); //nl is not initialized yet, how it works?
final int headDiscountedPrice = nl.head >= 100 ? (int) (nl.head * 0.8) : nl.head;
final NonEmptyList nl = (NonEmptyList) ps;
return headDiscountedPrice + tailPrice;
}
}
}

Note that when initializing the tailPrice variable, the nl variable hasn't been initialized yet. Will this cause a problem? No. In a real functional programming language, each of three variables will be initialized at the same time with a lazy expression. When the value is really needed, the lazy expression will be evaluated. So, if the value of tailPrice is needed but nl hasn't been evaluated yet, it will be evaluated and the calculation will proceed. No matter what execution order is, the final total price will be the same.
Now, let's get to the core issue of why this is important. As the order of these expressions are unimportant, they can be evaluated concurrently. As nowadays we're getting more CPU cores instead of speedier single CPU, this programming model may become the mainstream in the future as the evaluations of different expressions can be done in different cores.
Note that the semicolon in the code now has a different meaning: In Java, it means sequential execution, but in a real functional programming language, it only separates the expressions but there is no ordering at all. That is, in a real functional programming language, there is no obvious concept of sequential execution.

How to maintain states or write to a database?


How to write to a database in functional programming? Let's say there is special built-in methods to perform reading and writing to a certain database record. Let's try to increment the value of the record twice with the code below:

class Foo {
int read() {
//...
}
void write(int v) {
//...
}
void inc() {
final int v = read();
final int w = v+1;
write(w);
final int t = w+1;
write(t); //Problem: It may occur before write(w)!
}
}

The problem with the code is that, because there is no ordering, write(t) may be evaluated before write(w), so finally the value will have been increased by one, not two! The problem here is that, once we have assignment (the write operation to the record), the ordering becomes important, but there is no obvious ordering in functional programming.

Creating sequential ordering in functional programming


In fact, it is possible to create the effect of sequential evaluation in functional programming. For the above example, you can restructure the code as:

interface Function {
Object eval(Object arg);
}

class Foo {
int write(int v) {
...
}
void inc() {
final int v = read();
new Function() {

public Object eval(Object arg) {
final int w = v+1;
final int r = write(w);
return new Function() {

public Object eval(Object arg) {
final int t = w+1;
return write(t);
}
}.eval(r); //Force r to be evaluated before going into eval()
}
}.eval(v); //Force v to be evaluated before going into eval()
}

Now, you're putting the logic into three steps: step 1: read the record. step 2: write w to the record. step 3: write t to the record. To make sure step 1 is executed before step 2, you wrap step 2 into a Function object and pass the result of step 1 (the "v" variable) as an argument to its eval() method. In a real functional programming language, when calling a method, its arguments will be fully evaluated. So, this guarantees that step 1 is executed before step 2. You use the same trick to put step 3 into a Function object inside step 2 and pass the result of step 2 (the "r" variable) as an argument to its eval() method. For this to work, the write() method must return something instead of void. So, I changed it to return an int.
It works, but the code looks complex. To make it simpler, you can introduce a method like:

class Foo {
Object chain(Object arg, Function f) {
return f.eval(arg);
}
void inc() {
final int v = read();
chain(v, new Function() {

public Object eval(Object arg) {
final int w = v+1;
final int r = write(w);
return chain(r, new Function() {

public Object eval(Object arg) {
final int t = w+1;
return write(t);
}
});
}
});
}

Once you have such a chain() method, you can perform more interesting things. For example, instead of passing the previous result directly as the argument, you could expect it to be an instruction to access the database so that you perform the database access in the chain() in the top level of your program, so that all the code inside is free of any side-effect:

class Read {
}

class Write {
int v;

public Write(int v) {
this.v = v;
}
}
class Foo {
Object chain(Object arg, Function f) {
if (arg instanceof Read) { //arg is now the instruction
return f.eval(read()); //Perform the side effect here
}
if (arg instanceof Write) {
final int v = ((Write)arg).v;
return f.eval(write(v)); //Perform the side effect here
}
return null;
}
void inc() {
chain(new Read(), new Function() {

public Object eval(Object arg) {
final int w = (Integer)arg+1;
final Write r = new Write(w);
return chain(r, new Function() {

public Object eval(Object arg) {
final int t = w+1;
return chain(new Write(t), ...);
}
});
}
});
}
}

In addition, you can perform other interesting things in the chain() method. For example, the argument could also contain an integer indicating the number of available resources. Every time you could reduce it by one before passing to the next step. If it is zero, you could abort.
BTW, you've just learned one of the most intriguing concepts in functional programming: monad. It is just the way above to achieve sequencing, optionally manipulating the previous result, and combining it with the next step.

Better syntax?


Even though it works, the multiple levels of embedding is still very complicated. In a real functional programming, you won't need to write so much to create the anonymous Function object, to define the eval() method and you probably can use an infix operator in place of the chain() method. For example, in Haskell, the code can be written something like this:

inc =
new Read() >>=
\v ->
w = v+1
new Write(w) >>=
\r ->
t = w+1
new Write(t)

Here the >>= infix operator has replaced for the chain() method. The \x -> expresssion defines an anonymous function and x is the argument. It looks simple, right?

What can you do next?


Now you've learned the significance of functional programming. What to do next? If you'd like to learn more about functional programming, if you prefer static typing, I'd recommend that you study Haskell by going through this excellent tutorial. Then, for production, you may check out Scala which supports functional programming and integration with JVM and all existing Java classes. It borrows a lot of concepts from Haskell. It also supports assignments and OO, so it can ease the migration from Java.
If you prefer dynamic typing, for production, you may check out Clojure which also integrates with JVM and existing Java classes.