% Quick Start to Functional Programming Through Haskell
**Páli Gábor János
January 4, 2016**
> module Tutorial where
> import Prelude hiding (abs, const, head, length, map, negate, foldr, map, repeat, replicate, sum, tail, take, uncurry, (.))
> import System.Environment
Introduction
------------
Functional programming is a branch of the declarative paradigm. Such
programs are built up by putting a series of function and type
definitions after each other like they were dominoes and that is going
to be eventually pushed over from the top, usually called a start
expression. In this tutorial, we are going to use the [Haskell]
programming language to demonstrate the basics of this programming
style.
In functional programs, it is not fully determined by the author how the
pieces will exactly fall into their places but the rules are given for
that. In result, the value of the final computation is often the answer
of the whole program. This gives a great freedom and focus to the
programmer to describe problems from a rather abstract and mathematical
perspective, through formulas, which immediately spring up as definition
of functions, for example:
> negate x = 0 - x
This function above calculates the negated value of a number by
subtracting it from zero. We call this `negate` and it has a single
parameter, `x`. It is separated from its body of definition by the `=`
symbol, which gives us an equation. Note that a nice property of pure
functions (like as the ones that can be defined in Haskell) that their
application is basically a substitution of the expression on the
left-hand side with the expression on the right-hand side of the
equation.
In order to make use of this freshly defined function, place this line
into a text file, name it `Tutorial.hs`. It could be then loaded for
interpretation with the `ghci` program. That is the interactive
frontend of the Glorious Glasgow Haskell Compiler, or GHC for short.
(On Windows, it is just enough to double click on the icon for the
file.)
~~~~{.sh}
$ ghci Tutorial.hs
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Tutorial ( Tutorial.lhs, interpreted )
Ok, modules loaded: Tutorial.
~~~~
Once the file was successfully loaded, the interpreter lets us to invoke
and compose functions, in a fashion similar to a calculator. It also
has a prompt that says `Prelude>`. That is the name of a module that is
always loaded along with ours (unless it is disabled), and it contains
many of the basic definitions we are also going to see.
~~~~{.ghci}
Prelude> negate 1
-1
Prelude> negate (negate 1)
1
Prelude> negate -1
:3:1:
Non type-variable argument in the constraint: Num (a -> a)
(Use FlexibleContexts to permit this)
When checking that ‘it’ has the inferred type
it :: forall a. (Num a, Num (a -> a)) => a -> a
~~~~
The interaction with the interpreter is called the Read-Eval-Print (REP)
loop. First, the interpreter tries to parse the input expression, then
if it is successful, evaluates it, and finally it prints the result.
Unfortunately, the last expression did not seem to work out well. This
is an example of when the compiler cannot make sense of the input,
although it looks syntactically all right. It is the result of the
semantical evalutation of the program, called type checking. We do not
go into the exact details for now, only note that it is because in
Haskell the `-` symbol is parsed for both the binary operator of
subtraction and the unary operator of negation. This may cause a slight
misunderstanding, so as a rule of thumb, it is better to put the
negative numbers into parentheses to fix up the problem:
~~~~{.ghci}
Prelude> negate (-1)
1
~~~~
*Exercise.* Define the `increment` function that increases the value of
its arguments. It shall work like this:
~~~~{.ghci}
Prelude> increment (-1)
0
Prelude> increment 3
4
~~~~
Note that the module has to be reloaded on each change to the program.
This can be achieved by the `:reload` command. Commands start with the
`:` (colon) symbol, which makes them different from expressions so they
are processed differently.
~~~~{.ghci}
Prelude> :reload
[1 of 1] Compiling Tutorial ( Tutorial.lhs, interpreted )
Ok, modules loaded: Tutorial.
~~~~
As a shorthand, `:r` may be also used or simply `:` that repeats the
last command. The `:?` command gives the list of all commands.
Branching in Functions
----------------------
Function definitions are often not that straightforward as sometimes
they have to involve branching. For example, assume that we would like
to write a function that calculates the absolute value of a number. The
corresponding algorithm needs to determine first if the number is less
than zero then negate it only if that holds. In any other cases, it
shall just leave the number as it is.
Note that since values cannot be changed through the program's lifetime
— that is the property of single assignment — we will not
touch the argument at all but return a fresh modified value instead.
Branching in Haskell programs can be implemented by the use of guard
expressions, or simply "guards". Guards are Boolean-valued expressions
that can be attached to the left-hand side of function definitions by
the `|` (pipe) symbol.
Consider the following definition of a specialized version for division,
where the actual calculation is performed only if the divisor is not
zero:
> x ./. y | y /= 0 = x / y
With this, the function body will only be evaluated if the guard
condition is satisfied, otherwise the body in question is skipped and a
next one is searched for, from the top to the bottom. In absence of
further possible bodies, the interpreter complains that it will not be
able to compute the return value of the function.
~~~~{.ghci}
Prelude> 42 ./. 0
*** Exception: Tutorial.hs:5:3-26: Non-exhaustive patterns in function ./.
~~~~
That is called a partial function — a function that may not map
every input to an output. As it causes an exception to raise and
therefore may grind the execution of the whole program to halt, a
careful attention must be paid to avoid such definitions. The opposite
of this is the total function, where every element has an image
assigned.
Let us now define `abs`, the absolute value function using guards. Mind
the indentation — there is an expectation from the compiler,
called the off-side rule, that goes by this: for multiline definitions,
the lines that follow the first line have to be indented with spaces
(tab characters are not appreciated).
> abs x | x < 0 = negate x
> | otherwise = x
There we can see two different bodies: the top one takes care of the
exceptions, that is the negative numbers, and applies the previously
defined `negate` function on them, and the bottom one handles everything
else. That latter one is called "fallthrough" and that is the guard
that lets everybody pass. It has `otherwise` as the condition, but it
could be quickly checked the it corresponds to trivially true:
~~~~{.ghci}
Prelude> otherwise
True
~~~~
Note that since the semantics of guards is technically "first satisfied
wins", if `otherwise` appears before anything else, it will just shadow
(overlap with) the rest of the branches.
Haskell also includes an `if` function that acts almost like the `if`
statement in other languages. However, there is an important
difference: in Haskell, there are no statements, everything is
considered an expression. As a consequence, `if` is treated as it was a
function thus both of its branches have to be specified.
~~~~{.ghci}
Prelude> if True then 1 else 0
1
~~~~
Along those lines, the definition of `abs` could be rethough with the
use of `if`:
> abs' x =
> if (x < 0) then (negate x)
> else x
But guards clearly show their advantage over the regular `if` construct
when there are multiple branches, so their use is recommended usually.
*Exercise.* Define the `sgn` (signum) function that returns `-1` for
negative numbers, `1` for positives, and `0` for zero.
Working with Lists
------------------
Besides integers, Haskell features many other types (and it also lets us
to define our own ones), for example, as we have seen with `otherwise`,
we can work with Booleans. Another common type is the list, a heritage
reaching far back to the first functional programming language, LISP (as
in "list processing"). It could be used as an ordinary container for
values. The simplest list is the empty list.
~~~~{.ghci}
Prelude> []
[]
~~~~
But one can also create a more advanced list by enumerating all of its
elements:
~~~~{.ghci}
Prelude> [1,2,3,4,5]
[1,2,3,4,5]
~~~~
or even letting the interpreter enumerate the elements and set only the
lower and upper bounds:
~~~~{.ghci}
Prelude> [1..5]
[1,2,3,4,5]
~~~~
Note that it is not the only way to build lists. Actually, lists are
even built internally with the previously introduced empty list
constant, and the "cons" operator, which is identified by the `:`
symbol. The `:` operator takes and element and a list and puts them
together into a new list. It is right-associative.
~~~~{.ghci}
Prelude> 1 : (2 : (3 : (4 : (5 : []))))
[1,2,3,4,5]
~~~~
Thus it is possible to write functions on lists, for example the one
that checks if the given list is empty or not:
> isEmpty xs = xs == []
that we could use like that:
~~~~{.ghci}
Prelude> isEmpty []
True
Prelude> isEmpty [1..5]
False
~~~~
However, as it might have been expected, it is more common to approach
this in a more different way, which is called pattern matching. Pattern
matching is a method to syntactically compare values with a so-called
pattern. That way, similarly to the use of guard conditions, a series
of alternative function bodies can be listed, each of them assigned with
a pattern. In run time, those patterns are checked against the actual
argument in the order to definition and the body for the first matching
is selected for evaluation, and the rest is left completely unevaluated.
For example, the previously function for checking if a list is empty or
not, is usually phrased like that:
> null [] = True
> null _ = False
Here the first alternative matches for only the empty list and returns
the `True` constant, while for any other lists a fallthough case is
defined by a "wildcard" pattern, that matches for any value. Note that
when a wildcard pattern is used, we cannot also access the value of the
argument, so it is mostly used when an answer has to be returned
irregardless of what was passed to the function. Thus the another name
of wildcard pattern is "anonymous variable".
There is an inverse operation for list construction — that is
about removing the symbol `:` from between the consequent elements
therefore splitting the list into two sections. They are called the
head and the tail of the list, respectively. Those operations can be
only defined with pattern matching:
> head (x:_) = x
> head _ = error "head: empty list"
>
> tail (_:xs) = xs
> tail _ = error "tail: empty list"
Now there we can see that pattern matching is a bit more than just
checking for equivalence: a pattern may contain "holes" that are covered
by variable names. When the match is done, those holes may be filled
with parts of the actual matching value that we can separately address
and use in the function body.
In the definition `head`, a list will only match the first line if it
was built up by the `:` symbol, otherwise (in case of the `[]` symbol,
that is the empty list) it will give an error message. That is because
there is no head (i.e. first element) for the empty list, and that is why it
is a partial function.
There can be also seen that a function for extracting the tail of the
list could be defined on the same account.
~~~~{.ghci}
Prelude> head []
*** Exception: head: empty list
Prelude> head [1..5]
1
Prelude> tail [1..5]
[2,3,4,5]
~~~~
*Exercise.* Define the `isSingleton` function that determines if a list
has exactly one element. It shall work for list of any size, and it
shall work like this:
~~~~{.ghci}
Prelude> isSingleton []
False
Prelude> isSingleton [1]
True
Prelude> isSingleton [1..5]
False
~~~~
Enter the Recursion
-------------------
When working with lists, it is not also uncommon the situation when the
whole list has to be walked. For example, consider the equivalence of
two lists. Since we do not know in advance how long each of the lists
will be, some kind of an iteration has to be implemented.
In functional languages, the concept of recursion is used to express
that. A function is called recursive when it refers to itself in its
own definition. (Hence it is often simply said: "Recursion: see
recursion.") That is why the imperative way of incrementing a number
fails to work here:
> x = x + 1
This gives us an infinite recursion, which is going to keep the
interpreter occupied forever. Note that this loop of infinite
calculations can be broken by the key combination of Ctrl+C.
~~~~{.ghci}
Prelude> x
^CInterrupted.
~~~~
Due to the lazy evaluation strategy that is employed in Haskell, such
recursive functions may work quite well sometimes, though. Consider the
following definition:
> repeat x = x : (repeat x)
~~~~{.ghci}
Prelude> repeat 1
[1,1,1,1,1,1,1,1,1,1,1^CInterrupted.
~~~~
Although this computation will not stop either, but we can at least see
the intermediate results, the endless series of `1`s flooding the
console. Curiously, thanks to laziness or on-demand evaluation of
subexpressions, the following expression will compute in a rather finite
time:
~~~~{.ghci}
Prelude> head (repeat 1)
1
~~~~
It looks quite logical: why to unfold the complete the list when we can
answer the question by taking only the first element? That is why in
Haskell one can work with infinite data structures, which sometimes
makes the description of algorithms surprisingly elegant and concise.
But a finite version of `repeat` can be obtained if we enforce a limit
on how many elements should be generated in total. This is achieved by
branching on the value of a counter and carry on with the computation
only if it is still a positive number. That way we have got a base and
an inductive case. The evaluation of the base case does not include any
self-reference so it could be answered immediately, while the inductive
case does the recursion.
The finite version of `repeat` is called `replicate` and it can be
recursively defined like this:
> replicate n x
> | n <= 0 = []
> | otherwise = x : (replicate (n - 1) x)
Here `replicate` is always called with a number lesser than the previous
one, otherwise this recursion would not terminate either. That is, when
writing inductive cases, it is important make a step towards one of the
base cases by changing at least one of the arguments on the recursive
invocation. For `replicate`, this is done by subtracting one from the
counter and checking if it reached (or went below) zero so the iteration
may stop.
~~~~{.ghci}
Prelude> replicate 0 1
[]
Prelude> replicate 3 1
[1,1,1]
Prelude> replicate (-3) 1
[]
~~~~
*Exercise.* Define the `fromTo` function that takes two integers and
constructs a list by starting with the first parameter as the lower
bound, and increasing adding further numbers until the second parameter
as the upper bound is reached. It shall work like this:
~~~~{.ghci}
Prelude> fromTo 1 5
[1,2,3,4,5]
Prelude> fromTo 1 1
[1]
Prelude> fromTo 5 1
[]
~~~~
More on Recursion
-----------------
Not only numbers, but other values might also be "decremented" and
therefore consumed in a recursive fashion. Lists are an example of
that, where passing on only the tail of the list is a way to reach a
common base case, the empty list, which can be regarded as the zero
element of lists.
Consider the definition of the `take` function that is to take a given
number number of elements from the beginning of the list, if there is so
many. Otherwise it should just return the full list.
> take _ [] = []
> take n _ | n <= 0 = []
> take n (x:xs) = x : take (n - 1) xs
There we can notice that a combination of pattern matching and guards
were applied to express two classes of base cases. The first line says
if the input list is empty, regardless of the value of the second
parameter, simply return the empty list. The second line says that if
the number of values to take is not positive, then again return the
empty list. Finally, the third line says that any other case (actually,
when all the necessary conditions are satisfied so the execution reached
this alternative) get the first element from the input list and continue
building the list with an invocation of `take` where both the number and
the list are shrinked (in the hope that they will eventually hit one of
the previous cases).
Note that `take` is also a lazy function so it can work with infinite
lists, such as the ones that are produced by `repeat`:
~~~~{.ghci}
Prelude> take 3 (repeat 1)
[1,1,1]
~~~~
To understand how this is possible, let us briefly take a deeper look
into the actual evaluation of the expression above.
~~~~
take 3 (repeat 1)
-----------------
< matches the 3rd line of take where n = 3, x = 1, xs = repeat 1 >
1 : take (3 - 1) (repeat 1)
1 : take 2 (repeat 1)
--------------
< matches the 3rd line of take where n = 2, x = 1, xs = repeat 1 >
1 : 1 : take (2 - 1) (repeat 1)
1 : 1 : take 1 (repeat 1)
--------------
< matches the 3rd line of take where n = 1, x = 1, xs = repeat 1 >
1 : 1 : 1 : take (1 - 1) (repeat 1)
1 : 1 : 1 : take 0 (repeat 1)
--------------
< matches the 2nd line of take where n = 0 >
1 : 1 : 1 : []
[1,1,1]
~~~~
In addition, it can be also spotted that `replicate` can be written as
the composition of `take` and `repeat` that plays nicely with modular
programming:
~~~~{.haskell}
replicate n x = take n (repeat x)
~~~~
*Exercise.* Define the `append` function that appends two lists together
by putting all the elements of the second list after the elements of the
first. It shall work like this:
~~~~{.ghci}
Prelude> [] `append` [1..5]
[1,2,3,4,5]
Prelude> [] `append` []
[]
Prelude> [1..5] `append` []
[1,2,3,4,5]
Prelude> [5,4,3] `append` [2,1,0]
[5,4,3,2,1,0]
~~~~
In this exercise, we used the ` ` ` (backtick) symbol. That is a
syntactical aid to push binary alphabetical function names between its
arguments. We could have seen earlier that Haskell supports symbolical
function names, that are often called operators. For operators, they
are infix by default, while the name of other functions are put in
prefix before its arguments. Quoting with this symbol helps to use
prefix functions as they were infix ones for better readability.
Similarly, infix operators could be also turned into prefix ones by
putting them into parentheses. That is, for example, it is possible to
write the `+` operator in the following format:
~~~~{.ghci}
Prelude> (+) 1 2
3
~~~~
The Essence of Recursion Over Lists: Folding
--------------------------------------------
It is worth to discuss a bit that induction on lists follow, and thus
can be captured with a more generic scheme, called "folding". This is
about taking elements of a list and gradually adding them up into some
accumulated value that will eventually give us a result. Notice that
one may fold the list from left to the right:
~~~~{.haskell}
( ... ((((z `f` x0) `f` x1) `f` x2) `f` x3) `f` ... ) `f` xn
~~~~
or right to the left:
~~~~{.haskell}
x0 `f` (x1 `f` (x2 `f` (x3 `f` ( ... (xn `f` z) ... ))))
~~~~
where `f` is the operator, `x0`, `x1`, `x2`, and so on are the
consequent elements of the list to be folded, and `z` is an initial
value for the accumulated value.
For example, consider the addition of elements in a list, where the
operator is the addition operator (`+`), the accumulated value is the
sum of the elements, and the initial value for that is zero. Based on
what we have just learned about recursion over lists, it is actually
easy to implement such a function:
> sum [] = 0
> sum (x:xs) = x + sum xs
Using the method above, we can see that the invocation of that function
will basically turn the list argument into an expression like this:
~~~~
sum [1,2,3] --> ... --> 1 + (2 + (3 + 0))
~~~~
Thanks to the abstraction power hides in functional programming, the
previously mentioned scheme that describes the class of those function
can be also expressed in Haskell with the help of higher-order
functions. Higher-order function is a function that takes a function as
parameter or returns a function as value. That is one of the perks of
the functional approach, where functions are therefore first-class
citizens, that is, they can act like values. For now, we will exploit
only the first case, namely, when one of the parameters is a function,
but later we will explore this idea further.
For beginning, let us take a simple but nice example of the higher-order
functions, `map`. This function takes a function and a list, and
returns a list where the function is applied to every element of the
list.
~~~~{.ghci}
Prelude> map negate [1,-1,2,-2,3,-3]
[-1,1,-2,2,-3,3]
~~~~
Note that `negate` is used here without any arguments, apparently. But
that is exactly what `map` assumes when we turn to its definition:
> map _ [] = []
> map f (x:xs) = f x : map f xs
The function that was passed gets is argument once it comes to the
evaluation. Looks like we can just use the variable as it was a
function — well, that is it!
*Exercise.* Define the `filterList` function that takes a Boolean-valued
function, a predicate, and keeps only the elements where the function
is satisfied, i.e. it is evaluated to `True`. It shall work like this:
~~~~{.ghci}
Prelude> filterList even [1..5]
[2,4]
Prelude> filterList even []
[]
Prelude> filterList even [1,3,5,7]
[]
~~~~
In these examples, `even` is a predicate function that determines if its
argument is an even number or not.
~~~~{.ghci}
Prelude> even 0
True
Prelude> even 1
False
Prelude> even 2
True
~~~~
Taming Functions of Functions
-----------------------------
Now, that we have seen how recursive and higher-order functions could be
made, let us take a look on the "folding" scheme that was mentioned
earlier. For the sake of simplicity, only the "fold right" version, the
`foldr` function is going to be considered, but similarly the "fold
left", i.e. the `foldl` function can be also given:
> foldr _ z [] = z
> foldr f z (x:xs) = x `f` (foldr f z xs)
The algorithm behind the function goes like this: when the list is
empty, return the initial value for the accumulated value, otherwise
take the first element (the head) of the list and combine it with the
value computed from the rest (the tail) of the list recursively by the
(right-associative) function we passed.
With the help of this scheme, now we can come up with another definition
for the `sum` function, which takes only a single line:
~~~~{.haskell}
sum xs = foldr (+) 0 xs
~~~~
There we can see that the `+` function and the zero initial value are
passed to `foldr`. It is easy to verify that if we replaced the
parameters with those symbols in the definition of `foldr`, we will just
get back the definition without that. That is why higher-order
functions are more powerful: every function in the class they represent
can be expressed only with giving it the appropriate function bodies.
Note that because the symbol of `+`, i.e. the function, was again used
without its arguments, parentheses were employed to touch that. Of
course, any function might be passed to `foldr`, even custom functions.
The only true requirement for them is to have the proper type.
Speaking of types: so far we have carefully avoided to talk too much
about them, while they are another important but often invisible
component of Haskell programs. Although we have not yet mentioned the
type of the functions, they are typed. In GHCi, types of expressions
can be queried by the `:type` or the `:t` command. Let us observe the
type of `foldr`:
~~~~{.ghci}
Prelude> :type foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
~~~~
At the first sight this all may be quite daunting. The first step of
decyphering the answer is to note that the type information is attached
to the name or symbol in question by the `::` (colon-colon or double
colon) symbol. Function types include the `->` (type arrow) symbol,
— which is actually an operator but in the space of types —
whose purpose is to construct the type of the function from the type of
the function's domain, `A`, and its range `B`:
~~~~
f :: A -> B
~~~~
For multi-parameter functions, one might have borrowed the classical
mathematical notation, the use of Cartesian products, the `x` symbol for
types.
~~~~
f :: A x B -> C
~~~~
Such a type operator indeed exists in Haskell as well, that is the `,`
(comma), called product type:
~~~~
f :: (A, B) -> C
~~~~
But for technical reasons that we will see later, designers of the language
instead went for using the type arrow everywhere:
~~~~
f :: A -> B -> C
~~~~
In result, we can roughly interpret function type as follows. The first
(n - 1) components correspond to the type of the arguments and the last,
n-th element is the type of the returned value. According to this
explanation, we can figure out that `foldr` has three arguments:
- `a -> b -> b`
- `b`
- `[a]`
Beside the function types, there can be also noticed another type
operator, that is the `[]` or list symbol. It is yet another function
over types but it has a single argument, the type of the elements in the
list. So, like `->` does, it takes a type and constructs the type of
the list of that type.
Armed with all this knowledge from the preceeding paragraphs, we can now
learn more about the actual parameters of `foldr`: the first one must be
a function, the second must be a value, and the last one must be a list.
But the final piece of the puzzle is still missing: why do not we have
some integer types, e.g. `int` everywhere, what are those letters? The
letters in the types are type variables, and they introduce the notion
of parametric polymorphism. That is, our `foldr` function does not work
for a single type of values but it automatically becomes capable of
handling values of almost any type. Only the exact relationships
between the types appearing in the type of `foldr` function are
described by variables. Thus, if we subtitute those variables by actual
types we can obtain (infinitely) many versions of this abstract type.
For example, consider `a = Integer` and `b = Bool`, where `Integer` is
the type of integer numbers and `Bool` is type of Booleans:
~~~~{.haskell}
foldr :: (Integer -> Bool -> Bool) -> Bool -> [Integer] -> Bool
~~~~
It is even allowed to assign the same type to both variables, for
example we could have chosen them both `a` and `b` to be the `Integer`
type.
~~~~{.haskell}
foldr :: (Integer -> Integer -> Integer) -> Integer -> [Integer] -> Integer
~~~~
Though, an important rule of subtitution of type variables that we must
always replace the same type variable with the same type every time.
Hence the following version becomes invalid, `foldr` cannot be used like
this:
~~~~{.haskell}
foldr :: (Bool -> Bool -> Bool) -> Integer -> [Integer] -> Integer
~~~~
The way we had a type assigned to the functions we defined is called
type inference. This is an algorithm that attempts to calculate the
most generic, or as it is often called, the principal type for each of
the functions. For `foldr`, the presented type is the most generic,
because in the description of the function:
- We assumed the third parameter is a list of some values, that is, a
list of of type `a`, because used the empty list symbol (`[]`) and
the list constructor operator (`:`) to match a pattern against it.
- We did not assume anything about the second parameter, so it may not
be of the same type of the elements in list, so classify it as
something of type `b`.
- Finally, we used the variable `f` as a function that takes elements (of
type `a`) extracted from the list and the value calculated from the
recursive invocation of `foldr` (of type `b` as we can see in the
first line, where the second parameter is directly returned), so it
must be of type `a -> b -> b`.
If the compiler fails to infer the type in a similar manner, it raises a
type error and the compiler refuses to generate code for the given
definition. That is why it is also called type checking, because the
type inference cannot only free us from writing type declarations but it
can also verify if the given definition makes sense at all. Therefore
it is a form of semantical evaluation of the program.
Consider a totally wrong version of `foldr`:
~~~~{.haskell}
foldr _ z [] = z
foldr f z (x:xs) = x `f` (foldr z f xs)
~~~~
If we try to load this code to the interpreter, it will emit a nice type
error message to tell that something is not quite right:
~~~~{.ghci}
Prelude> :load Fail.hs
[1 of 1] Compiling Main ( Fail.hs, interpreted )
Fail.hs:4:33:
Occurs check: cannot construct the infinite type:
t1 ~ t -> t1 -> t1
Relevant bindings include
xs :: [t] (bound at Fail.hs:4:14)
x :: t (bound at Fail.hs:4:12)
z :: t1 (bound at Fail.hs:4:9)
f :: t -> t1 -> t1 (bound at Fail.hs:4:7)
foldr :: (t -> t1 -> t1) -> t1 -> [t] -> t1 (bound at Fail.hs:3:1)
In the first argument of ‘foldr’, namely ‘z’
In the second argument of ‘f’, namely ‘(foldr z f xs)’
Failed, modules loaded: none.
~~~~
We can also note that many other variations of `foldr` can be typed, but
they do not conform to the previous specification. So, even if type
declarations are not always mandatory to set, it is considered a good
engineering practice to explictly state the type of the major or
top-level functions. This can help us to detect if the function's type
changes suddenly, like in the case given below. In this example below,
we just forgot to add the backticks around `f` which completely changed
the meaning of the definition:
~~~~{.haskell}
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ z [] = []
foldr f z (x:xs) = x f (foldr f z xs)
~~~~
Thanks to explicit type declaration, the compiler can come to the
rescue:
~~~~{.ghci}
Prelude> :load Fail.hs
[1 of 1] Compiling Main ( Fail.hs, interpreted )
Fail.hs:5:20:
Couldn't match expected type ‘(a -> b -> b) -> b -> b’
with actual type ‘a’
‘a’ is a rigid type variable bound by
the type signature for foldr :: (a -> b -> b) -> b -> [a] -> b
at Fail.hs:3:10
Relevant bindings include
xs :: [a] (bound at Fail.hs:5:14)
x :: a (bound at Fail.hs:5:12)
z :: b (bound at Fail.hs:5:9)
f :: a -> b -> b (bound at Fail.hs:5:7)
foldr :: (a -> b -> b) -> b -> [a] -> b (bound at Fail.hs:4:1)
The function ‘x’ is applied to two arguments,
but its type ‘a’ has none
In the expression: x f (foldr f z xs)
In an equation for ‘foldr’: foldr f z (x : xs) = x f (foldr f z xs)
Failed, modules loaded: none.
~~~~
*Exercise.* Define the `andList` function (in a single line) with
`foldr` that takes a list of Booleans and performs the logical
conjuntion, the `&&` operator, between the elements. It shall work like
that:
~~~~{.ghci}
Prelude> andList []
True
Prelude> andList [True,False,True,True,False,True,False]
False
Prelude> andList [True,True,True]
True
~~~~
As a hint, here is the recursive definition for it:
~~~~{.haskell}
andList :: [Bool] -> Bool
andList [] = True
andList (x:xs) = x && andList xs
~~~~
Partially Applied Functions
---------------------------
If we recall the other higher-order function, `map`, it also has a type
that we can query:
~~~~{.ghci}
Prelude> :t map
map :: (a -> b) -> [a] -> [b]
~~~~
Based on its definition, its type can be deduced by the same argument we
used for `foldr`. And it can indeed work with the `even` predicate that
we introduced for the `filterList` exercise (so `a = Integer` and `b =
Bool`):
~~~~{.ghci}
Prelude> map even [1..5]
[False,True,False,True,False]
~~~~
However, it is good to know that not only one-parameter functions may be
passed to it exclusively. In Haskell, because we abandoned the
mathematical intuition of using Cartesian products, a curious
opportunity pops up that it makes possible to partially apply functions.
To demonstrate this, consider a function that we will call `const`:
> const :: a -> b -> a
> const x _ = x
This function takes two parameters and cancels out the second one.
~~~~{.ghci}
Prelude> const 1 2
1
Prelude> const True False
True
~~~~
As an ordinary prefix function, it is invoked with its arguments listed
after its name. But it is time to add that function application
is actually single-parameter only. This can be shown if we add the
parentheses that could have been omitted so far due to the
left-associative nature of the function application operator, the space:
~~~~{.ghci}
Prelude> ((const True) False)
True
~~~~
We can also check the type of the `const True` expression too:
~~~~{.ghci}
Prelude> :t const True
const True :: b -> Bool
~~~~
It looks like we have just created another function by simply omitting
one of the parameters. Now this explains better why it was opted to
represent functions that way. It also has the consequence that the `->`
type operator is right-associative, so the type of `const` can be
written as follows:
~~~~
const :: a -> (b -> a)
~~~~
This feature comes handy when it comes to using higher-order functions.
Consider the use of `map` where the apparently two-parameter `const` is
fitted by binding the first parameter:
~~~~{.ghci}
Prelude> map (const True) [1..5]
[True,True,True,True,True]
~~~~
None of the numbers appear in the result, except that `True` is exactly
repeated five times after each other, which corresponds to the size of
the list passed.
*Exercise.* Define the `mapList` function in terms of `foldr`. In order
to express the functionality of `map`, try to construct a list as the
result, and define a helper function of type `(a -> b) -> a -> [b] ->
[b]` that takes the function parameter and folds with it. It shall work
as the regular `map` function.
Types are Everywhere
--------------------
As it might have been expected, all of the other featured functions have
types as well, for example `negate` from earlier:
~~~~{.haskell}
negate x = 0 - x
~~~~
And its type:
~~~~{.ghci}
Prelude> :t negate
negate :: Num a => a -> a
~~~~
It almost fully matches with the mental image of types that we have
built up until this point: the function takes a single parameter and
produces a value of the same type. The only difference is the part that
says `Num a =>`. This is called a type context or context for short,
and we can quickly discover that it comes from the use of the `-`
operator:
~~~~{.ghci}
Prelude> :t (-)
(-) :: Num a => a -> a -> a
~~~~
From our previous findings we already know that the presence of type
variables refers to polymorphism, so it could be claimed that `-` may be
applied to values of many different types. Type variables may be
replaced with any Haskell type, but when a context associated with them,
it becomes possible to restrict the values of the variable to types that
are members of some class of types.
`Num` is just a class like that, and it means the "class of types that
represent numbers". In Haskell, such types are for example the
`Integer`, and `Double`, so `-` works for both of them but their
respective versions are used behind the curtains. This can be thought
of a form of overloading function names, so the same name could be
reused for multiple types, or in the other way around, a function may
behave differently in different type settings. That is called ad-hoc
polymorphism.
~~~~{.ghci}
Prelude> -- (-) :: Double -> Double -> Double
Prelude> 0.0 - 1.0
-1.0
Prelude> -- (-) :: Integer -> Integer -> Integer
Prelude> 0 - 1
-1
~~~~
But `Num` does not contain the `-` operator only, it comes with all the
other common operators on numbers. This can be listed with the `:info`,
or `:i` for short, command.
~~~~{.ghci}
Prelude> :info Num
class Num a where
(+) :: a -> a -> a
(-) :: a -> a -> a
(*) :: a -> a -> a
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromInteger :: Integer -> a
-- Defined in ‘GHC.Num’
instance Num Word -- Defined in ‘GHC.Num’
instance Num Integer -- Defined in ‘GHC.Num’
instance Num Int -- Defined in ‘GHC.Num’
instance Num Float -- Defined in ‘GHC.Float’
instance Num Double -- Defined in ‘GHC.Float’
~~~~
The `class` keyword introduces a class, followed by its name, which is
`Num` this time, and a type variable. That is the variable we can use
in describing a set of functions that of all should be present for the
given type in order to be classified as a member. Membership
declarations can be added any time and anywhere by `instance`
definitions where the type-specific implementation of the overloaded
names are assigned.
Mind that it could be misleading to think about type classes as classes
in the object-oriented sense. Althought they bear some similarity, type
classes can be rather viewed as abstract interfaces that pack a group of
functions, called members of the class, to allow the reuse of names with
multiple implementations.
Curiously, not only functions but values (well, which are actually
constant functions of no parameters) may be ad-hoc polymorphic.
Consider the type of the `1` constant:
~~~~{.ghci}
Prelude> :t 1
1 :: Num a => a
~~~~
For what it is worth, why to stick a specific type to the `1` symbol
other than saying that is a number? It may be either an integer or a
floating-point number, there is no information about its actual
representation. The representation is (lazily) narrowed down as we
start to use it. Consider the following expression:
~~~~{.ghci}
Prelude> :t 1 / 3
1 / 3 :: Fractional a => a
~~~~
Another type context appears that says that `1 / 2` is not a number but
a fractional number, therefore excluding the possibility of being an
integer due to the use of `/`, the fractional division. `Fractional` is
a more restrictive subclass of `Num`:
~~~~{.ghci}
Prelude> :i Fractional
class Num a => Fractional a where
(/) :: a -> a -> a
recip :: a -> a
fromRational :: Rational -> a
-- Defined in ‘GHC.Real’
instance Fractional Float -- Defined in ‘GHC.Float’
instance Fractional Double -- Defined in ‘GHC.Float’
~~~~
And we can see that the interpreter eventually chooses a member type
when it is forced to calculate the value of this expression:
~~~~{.ghci}
Prelude> 1 / 3
0.3333333333333333
~~~~
This is due to the built-in defaulting mechanism of GHCi, and the type
it chooses is `Double`. Using the `::` symbol for attaching types, we
can control how to narrow down, that is how to make monotonic a
polymorphic type.
~~~~{.ghci}
Prelude> 1 / 3 :: Float
0.33333334
~~~~
Though this should not be confused with casting: Haskell is a strictly
typed language, thus it does not allow to change type of expressions
withouting providing an appropriate conversion function. Without this
principle, the type system would not be able to convey correctness
guarantees.
~~~~{.ghci}
Prelude> (1 / 3 :: Float) :: Double
:2:2:
Couldn't match expected type ‘Double’ with actual type ‘Float’
In the expression: (1 / 3 :: Float) :: Double
In an equation for ‘it’: it = (1 / 3 :: Float) :: Double
~~~~
Fortunately, many conversion functions exist for numbers, one of them
that we can use here is `realToFrac`:
~~~~{.ghci}
Prelude> :t realToFrac
realToFrac :: (Fractional b, Real a) => a -> b
Prelude> realToFrac (1 / 3 :: Float) :: Double
0.3333333432674408
~~~~
But it promptly becomes evident that this is just a quick fix for the
problem. Because we first performed a calculation with a lower
precision, we irremediably lost information — that is what the
type system tries to make ourselves mindful about.
Embracing Lambdas
-----------------
We have claimed that functions are first-class citizens in functional
languages that means that they can act as values. Being a value implies
that it does not have to be named in order to use it. Like in the
previous examples, we used the `1` and `3` values without attaching
dedicated identifiers to them. That holds for functions as well, as
they could be written with the so-called lambda notation. Using the
lambda symbol only is enough to define a function, we are not required
to name it.
As an illustration, let us rewrite the `negate` function with this
notation:
~~~~{.haskell}
negate = \x -> 0 - x
~~~~
There we can see that the lambda is actually written as `\` (backslash)
and it replaces the name of the function. The lambda symbol is a
quantifier that binds a variable in a scope. This quantifier is
employed to tell which variables and where in the body should be
subtituted on the application of the function. The bound variables and
their scope is delimited by the `->` symbol, which is not the same as
the previously introduced type arrow function, however it is again used
to represent a function.
Let us drop the name of the function and use the right-hand side of the
new naked definition of `negate` at the interpreter.
~~~~{.ghci}
Prelude> :t \x -> 0 - x
\x -> 0 - x :: Num a => a -> a
Prelude> (\x -> 0 - x) 4
-4
~~~~
Multi-parameter functions can be expressed by nested lambdas, and it is
now more visible that the lambda operator, or lambda abstraction is
right-associative, which meets our previous suspicion:
~~~~{.ghci}
Prelude> :t (\x -> (\y -> x))
\x -> \y -> x :: a -> b -> a
Prelude> (\x -> (\y -> x)) 1 2
1
~~~~
Note that nested lambda functions may be simplified and written with
less arrows:
~~~~{.ghci}
Prelude> :t \x y -> x
\x y -> x :: a -> b -> a
~~~~
Lambda abstractions can help us with reducing the need for introducing a
fresh helper function every time when we want to use a higher-order
function. Consider the definition of the `length` function that is to
measure the length of a list: we have inlined the definition of the
helper function, the function parameter of `foldr` at the place of
application and kept everything in a single line.
> length :: Num b => [a] -> b
> length xs = foldr (\_ n -> n + 1) 0 xs
*Exercise.* Unroll the definition of `length` using recursion directly
with the name `listLength`. It shall work identically to the `length`
function.
Constructing Functions on the Spot
----------------------------------
Furthermore, the lambda notation offers a way to construct functions
inside functions, that is, to return function as result. For example, a
function like that could be defined without much ado:
> actuator :: Integer -> (Integer -> Integer)
> actuator n
> | n < 0 = \x -> x - 1
> | n == 0 = \x -> x
> | n > 0 = \x -> x + 1
That is, the `actuator` function returns a different a function
depending on its argument:
~~~~{.ghci}
Prelude> (actuator 0) 5
5
Prelude> (actuator 1) 3
4
~~~~
Another nice example of that is the `uncurry` function. Not a long ago,
we have learned that in functional programming, multi-parameter
functions are represented with a series of nested one-parameter
functions. This is named "currying" after Haskell Curry, an American
logicist from the last century, who laid down many of the foundations of
the typed functional programming languages and he proposed this
technique. Hence the name "uncurry" refers to the conversion from the
curried format to the traditional one. The inverse of it is called
`curry`.
~~~~
A -> B -> C --- uncurry --> (A, B) -> C
<--- curry ----
~~~~
So, here is the definition:
> uncurry :: (a -> b -> c) -> ((a, b) -> c)
> uncurry f = \(x,y) -> f x y
and it should be used like that:
~~~~{.ghci}
Prelude> :t uncurry replicate
uncurry replicate :: (Num a, Ord a) => (a, b) -> [b]
Prelude> (uncurry replicate) (3, True)
[True,True,True]
~~~~
Though, a careful reader may notice that — due to the
right-associativity of the type arrow operator — the `uncurry`
function could be actually written as a two-parameter one:
~~~~{.haskell}
uncurry :: (a -> b -> c) -> (a, b) -> c
uncurry f (x,y) = f x y
~~~~
*Exercise.* Define the `curry` function and show how to use it.
Further Perks of the Lambda Notation
------------------------------------
While we are at lambda functions, we add that there is a whole logical
calculus, the lambda-calculus associated with them, created by Alonzo
Church in the 1930s. They form the mathematical foundations, the
computation model, the semantics of Haskell. Thus, one might say that
lambda functions constitute the machine code of functional programs. In
result, many of their features, such as the previously presented
currying, are inherited from there. Another curious operation of
functions is called the eta-conversion, and it is as follows.
~~~~
\x -> E x <--- eta ---> E if x is bound or does not appear in E
~~~~
The rule states that any function that has a function application with
`E` as the function and its parameter, `x` as the argument in its body,
and `x` does not occur as a variable not bound by any lambda quantifier
in `E` or does not occur in `E` at all, it equals to `E`. Or,
conversely, a function body may be always extended with a quantified
variable on the left and the right, it will not change its meaning.
For example, the following simplification is allowed:
~~~~
\x -> (\x -> x) x --> \x -> x
~~~~
while the following is not:
~~~~
\x -> (\y -> x) x -/-> \y -> x
~~~~
The eta-conversion can be frequently exploited to give shorter function
definitions, consider the following version of the `foldr`-based `sum`
function:
> sum' :: Num a => [a] -> a
> sum' = foldr (+) 0
Composing Functions
-------------------
This amplifies the fact that we can directly construct functions from
other functions. Functions can be put together with the function
composition operator, which can be expressed with our current knowledge
in Haskell with no fuss:
> infixr 9 .
> (.) :: (b -> c) -> (a -> b) -> (a -> c)
> f . g = \x -> f (g x)
An extra to the previous definitions is the first line here, that sets
the associativity and predecence for the function because its infix. As
it was earlier said, symbolic operator names, like the `.` in this case,
or alphabetical names quoted with the backtick symbol automatically
become infix ones, so they may appear after the `infix` family of
keywords. The `infixr` keyword is for right-associative functions, and
`infixl` is for left-associative ones (which is the default), finally,
the `infix` is for functions that are not associative. The number
between the keyword and the name denotes the precedence: it goes from
zero to nine, where zero is the weakest, and nine is the strongest
(which is default).
The function composition operator closely follows the mathematical
definition and it can used like this:
~~~~{.ghci}
Prelude> map ((\x -> x + 1) . (\x -> x * 2)) [1..5]
[3,5,7,9,11]
~~~~
Essentially, it is the same as if we wrote this:
~~~~{.ghci}
Prelude> map (\x -> 2 * x + 1) [1..5]
[3,5,7,9,11]
~~~~
As we can see, the lambda notation gets in our way in such situtations,
that is why the designers of Haskell decided add a syntactical rule that
allows to omit the variables in a way similar to the one that we have
seen for the eta-conversion. That is called sections, and their
application can often make the expressions more comprehensible:
~~~~{.ghci}
Prelude> map ((+ 1) . (* 2)) [1..5]
[3,5,7,9,11]
~~~~
This possibility gives space to more experimentation too, and we can
have many different modes of composing functions, for example consider
that one:
> infixr 3 &&&
> (&&&) :: (a -> b1) -> (a -> b2) -> (a -> (b1, b2))
> f &&& g = \x -> (f x, g x)
It passes the same value to different functions and creates a pair out
of their results.
*Exercise.* Define the `>>>` operator, aka. the forward composition.
It shall work like this:
~~~~{.ghci}
Prelude> ((+1) >>> (*2)) 3 -- (3 + 1) * 2
8
Prelude> ((*2) >>> (+1)) 3 -- (3 * 2) + 1
7
~~~~
Hiding the Function Parameter
-----------------------------
The composition of higher-order functions like the `.` and `&&&` opens
up a very unique way of writing programs. Consider the definition of
the `average` function below, that determines the average of the
elements in a list:
> average :: Fractional a => [a] -> a
> average = uncurry (/) . (sum &&& length)
The function first takes the sum and length of the list (at the same
time) and then it divides that two. The speciality of this style that it
never names the parameter, that is why it is called "point-free" or
"tacit" programming. The name of this technique comes from geometry,
and the point here refers to the parameter.
*Exercise.* Define the `squareRoot` function by implementing the Newton
iteration with the following functions: `\x -> (x + a / x) / 2`,
`iterate`, `dropWhile`, and `head`. Use the lambda function to
calculate the next approximation of the square root of `a`, which is now
`2`:
~~~~{.ghci}
Prelude> (\x -> (x + 2 / x) / 2) 2
1.5
Prelude> (\x -> (x + 2 / x) / 2) 1.5
1.4166666666666665
~~~~
The `iterate` function helps with running the iteration without an end,
so apply the `dropWhile` function to drop the approximations that are
not close enough to the expected result, the number whose square is
almost the same as the number whose square root we are looking for, and
take the first appropriate one with `head`. A limit for the
approximation can be set by a sufficiently small epsilon value.
More information on the referenced functions can be found in [Hoogle],
the API search engine of Haskell.
It shall work like this with the epsilon set to `0.001`:
~~~~{.ghci}
Prelude> squareRoot 2
1.4142156862745097
Prelude> squareRoot 4
2.0000000929222947
~~~~
Lists as Sets
-------------
If we take a step further, another function can be defined that
determines the average deviation of the elements. That is, it
calculates the average of the elements (a mean), followed by the
absolute difference from it for each of them and finally takes the
average of that.
> averageDeviation :: (Ord a, Fractional a) => [a] -> a
> averageDeviation xs = average [ abs (x - mean) | x <- xs ]
> where mean = average xs
We could have used the classical recursion technique for lists, but it
is useful to know that the language offers us some syntactical aid for
that, known as "list comprehensions". It borrows the mathematical
intuition of Zermelo-Fraenkel (ZF) set expressions, where a set is
described by a characteristic function, for example:
~~~~
{ x² | x ∈ ℕ, x is even }
~~~~
This describes the set of square of even numbers. In Haskell, it can be
written almost in the very same fashion. Here `[0..]` is a list that
corresponds to the set of natural numbers, because it is infinite list
(a list without upper bound) that contains every integer starting from
zero, at least in theory.
~~~~{.haskell}
[ x^2 | x <- [0..], even x ]
~~~~
Though the Haskell version is actually a list that is computed: elements
may appear multiple times in the result, and they have a clearly defined
order. That is the reason why some of the ZF-sets will not work as
expected, for example:
~~~~{.haskell}
[ (x,y) | x <- [0..], y <- [0..] ]
~~~~
Theoretically, this should give a list of Cartesian products on natural
numbers. But in Haskell, the `<-` (left arrow) is rather interpreted as
a loop on the elements of the list on its right-hand side, and using
that operator multiple times constructs a nested loop of walking the
lists. With finite lists, this is shown nicely:
~~~~{.ghci}
Prelude> [ (x,y) | x <- [1..3], y <- [1..3] ]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]
~~~~
Note that the use of list comprehensions with a single list is nothing
more than a map:
~~~~{.haskell}
map f xs = [ f x | x <- xs ]
~~~~
and its other component is called a filter:
~~~~{.haskell}
filter p xs = [ x | x <- xs, p x ]
~~~~
for example:
~~~~{.ghci}
Prelude> [ x | x <- [1..5], even x ]
[2,4]
~~~~
Local Definitions
-----------------
What we can see in addition to that is the `where` keyword. With the
help of that, a local variable or function can be added. Local
definitions always see the names and other local definitions of the host
definition (as they are global to them). This way local functions can
share information with their host that allows for a better definition,
for example, for `foldr`:
~~~~{.haskell}
foldr f z xs = go xs
where
go [] = z
go (x:xs) = x `f` (go xs)
~~~~
But the `where` keyword can be combined with pattern matching that makes
possible to write functions like `partition`, that returns multiple
results as an n-tuple. The `partition` function takes a predicate and
splits a list into two based on that:
> partition :: (a -> Bool) -> [a] -> ([a], [a])
> partition p [] = ([], [])
> partition p (x:xs)
> | p x = (x:xs1,xs2)
> | otherwise = (xs1,x:xs2)
> where
> (xs1,xs2) = partition p xs
~~~~{.ghci}
Prelude> partition even [1..5]
([2,4],[1,3,5])
~~~~
Finally, `where` is suitable for telling the compiler that we would like
to cache the result, so it should not be computed more than once.
Because our expressions are pure, this is feasible. In the definition
of `averageDeviation`, the `mean` variable is treated exactly like that.
One final remark: note that in the type, there are multiple contexts
present. They are inherited from the functions used in the
implementation: the `Fractional` restriction is the result of the use of
the `average` function, while `Ord` is the consequence of `abs`. That
is, we can conclude here that restrictions may implicitly add up through
function compositions.
Interaction with the World Outside
----------------------------------
Even after all of these adventures, a thrilling question may still
remain: how do pure functional programs interact with the outside world,
how they do I/O? A pure approach to this problem would be to work with
a start expression that receives the environment of the program and it
passes down to any of the function needs that. That is path what was
taken in the programming language Clean, but in Haskell, this problem
gave rise to the inclusion of computational contexts, monads.
We will not go into the details for now, but the key to the solution is
a type called `IO` that is to mark that a given function has a side
effect. Consider the following pure function, a constant:
~~~~{.ghci}
Prelude> :t "foobar"
"foobar" :: String
Prelude> "foobar"
"foobar"
Prelude> "foobar"
"foobar"
~~~~
Every time we evaluate this constant, it will give the very same result,
which is expected from a pure value. But now let us take an effectful
constant:
~~~~{.ghci}
Prelude> :t getLine
getLine :: IO String
Prelude> getLine
hello!
"hello!"
Prelude> getLine
yello!
"yello!"
~~~~
This happens because `getLine` itself is not a function but an "action",
namely an IO action. Actions are different from functions in a sense
that they have to be "run" in order to get their results. GHCi has an
internal support for running IO actions, it automatically passes them
the (invisible) program environment, for example the state of the buffer
of the standard input — so the user is requested for some input
first, then it is passed to the `getLine` action.
`IO` is a type similar to the list, it has a parameter that
characterizes the type of the result that is returned after the effect
of the action was played. This also makes possible for the type system
to clearly separate the pure and the effectul parts of the program, as
they cannot be mixed and matched: there is no function to play the
effects of an I/O action inside the program. That will be done by the
run-time system for the whole start expression at once, which is called
`main` in Haskell.
For example, assume that we want to wrap a program around our earlier
functions that gets integers from the command-line as arguments,
converts them to numbers (from strings) and determines the average
deviation for them, finally prints the result to the standard output:
> main :: IO ()
> main = do
> args <- getArgs
> let xs = [ read x | x <- args ]
> print (averageDeviation xs)
There we can see that the type of `main` is `IO ()` that means that it
is an effectful function that does not return any value. Well, it
returns some, denoted by `()` but it is a type that has only a single
element, `()`. It can be considered as the `void` type in other
languages, so `main` is basically a `void` function, a procedure.
The body of `main` is elaborated with a special syntax, named `do`
notation. Jokingly, its purpose is to make imperative programmers
believe that imperative-style code blocks can be embedded into Haskell
programs. This goes well with the current situation, but in reality, it
is much more than meets the eye: the `do` notation provides a way to
compose effectful actions into blocks, so the values they compute in
each of the steps can be passed to other actions. Mind that the
indentation matters again, as elements of the same block should start at
the same column. In return, we are saved from adding semicolons at each
line's end, and also saved from adding block markers.
To bind a name to a result of some computation in the block, the `<-`
(bind) operator is used. This can be thought of a variable assignment
operator, like for example `:=`. So the first line of the block may be
interpreted with an imperative mindset in the following way:
~~~~
args := getArgs
~~~~
The `getArgs` action is another effectful constant that returns the
command-line arguments of the invoked program, and it can be found in
the `System.Environment` module. Modules can be imported by the
`import` keyword in the beginning of every Haskell source file or in
GHCi:
~~~~{.haskell}
import System.Environment
~~~~
This assignment automatically creates a variable named `args` so we do
not have to bother with declaring it — its type will be
automatically inferred in the way that we have seen already before.
Note that pure functions cannot be added to such effectful blocks
directly, as their type will not match. For that reason, there is a
function, `return` that "lifts" pure functions into the context of some
computation. (The name `return` is not a very good choice of name, some
people say that `lift` or `inject` would have been a better one.)
~~~~{.ghci}
Prelude> :t return
return :: Monad m => a -> m a
~~~~
In this type, the `Monad` describes the notion of effect, and `return`
can be used to turn any pure value into an effectful one.
~~~~{.haskell}
main :: IO ()
main = do
args <- return []
print args
return ()
~~~~
This version of `main` works with a (purely) constant list of arguments
and it explicitly returns void. Note that the last line may be actually
omitted as the `print` action itself returns a void value, that already
satisfies our so-far-hidden expectation that the last expression in the
block should be of type of that of the block.
Playing the game of binding a lifted pure value to a name looks awkward,
so the next line in the original block has a new keyword, `let` that
implements binding pure values to names, in a similar vein to `where`.
In the block, `let` is used to create a fresh variable, `xs` that
contains the integer version of the command-line arguments. The
conversion itself is done by a list comprehension that iterates over the
elements of the `args` list and maps each of them to an integer from a
string. This is achieved by the polymorphic `read` function:
~~~{.ghci}
Prelude> :t read
read :: Read a => String -> a
~~~
The `read` function is overloaded, and it works for any type that is a
member of the `Read` type class: it takes a string and parses as it was
a textual representation of some value of that type. Note that, being a
polymorphic function in its return type, `read` always needs type
information on its return value:
~~~~{.ghci}
Prelude> read "42"
*** Exception: Prelude.read: no parse
~~~~
In GHCi, it works out badly, because due to the defaulting mechanism of
overloaded functions, the return type was chosen to be `()`. But in
compile time, this gives a type error message, as the value of the type
variable cannot made monomorphic therefore it becomes ambiguous.
~~~~{.ghci}
Prelude> read "()"
()
~~~~
Of course, the situation can be saved by type specialization:
~~~~{.ghci}
Prelude> read "42" :: Integer
42
~~~~
On that note, it is worthwhile to mention that the standard instances of
`Read` are quite smart, so we can easily parse more complicated values
like that one:
~~~~{.ghci}
Prelude> read "([(1, True)], [('x', 3.14)])" :: ([(Integer,Bool)], [(Char,Double)])
([(1,True)],[('x',3.14)])
~~~~
The `main` function can be invoked from GHCi directly with the `:main`
command:
~~~~{.ghci}
Prelude> :main 1 2 3 4 5 6 7 8 9 10
2.5
~~~~
But it also possible to compile the full source down to a native binary
with GHC itself. GHC has support for building multi-module programs as
it can automatically discover the involved modules, given that they
maintain a hierarchy like for Java programs: for example, the module
named `A.B.C` must be in the `A/B/C.hs` file relative to the directory
of the compiler's invocation.
The `-main-is` flag is used to tell the compiler that our main module is
not called `Main` (which is assumed by default) but `Tutorial`.
~~~~{.sh}
$ ghc Tutorial.lhs -main-is Tutorial
[1 of 1] Compiling Tutorial ( Tutorial.lhs, Tutorial.o )
Linking Tutorial ...
$ ./Tutorial 1 2 3 4 5 6 7 8 9 10
2.5
~~~~
*Exercise.* Add some basic error handling such as check for presence
of command-line arguments.
Closing
-------
Well, that is all for folks! :-) The source of this tutorial can be
[downloaded](Tutorial.lhs) and a [key to the solutions](Solutions.hs) is
also available. Feedbacks and bug reports for this tutorial are more
than welcome. All the contents may be used under the terms of [Creative
Commons Attribution-Noncommercial 3.0
License](http://creativecommons.org/licenses/by-nc/3.0/).
[Haskell]: http://haskell.org/
[Hoogle]: http://haskell.org/hoogle/