-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlec-higher-order-2.lhs
508 lines (359 loc) · 11.8 KB
/
lec-higher-order-2.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
---
title: Higher-Order Programming II
---
Recursive Types
===============
Recall that Haskell allows you to create brand new data types like this one
from [lecture 1](lec1.html).
\begin{code}
data Shape = Rectangle Double Double
| Polygon [(Double, Double)]
\end{code}
Quiz
----
What is the type of `Rectangle` ?
a. `Shape`
b. `Double`
c. `Double -> Double -> Shape`
d. `(Double, Double) -> Shape`
e. `[(Double, Double)] -> Shape`
Values of this type are either two doubles *tagged* with `Rectangle`
~~~~~{.haskell}
ghci> :type (Rectangle 4.5 1.2)
(Rectangle 4.5 1.2) :: Shape
~~~~~
or a list of pairs of `Double` values *tagged* with `Polygon`
~~~~~{.haskell}
ghci> :type (Polygon [(1, 1), (2, 2), (3, 3)])
(Polygon [(1, 1), (2, 2), (3, 3)]) :: Shape
~~~~~
One can think of these values as being inside special "boxes"
with the appropriate tags.
![Datatypes are Boxed-and-Tagged Values](/static/lec4_boxed.png)
However, Haskell (and other functional languages), allow you to define
datatypes *recursively* much like functions can be defined recursively.
For example, consider the type
\begin{code}
data IntList = IEmpty
| IOneAndMore Int IntList
deriving (Show)
\end{code}
(Ignore the bit about `deriving` for now.) What does a value of type `IntList`
look like? As before, you can *only* obtain them through the constructors.
Here, we have the constructor `IEmpty` takes no arguments and returns an
`IntList`, that is
~~~~~{.haskell}
ghci> :type IEmpty
IEmpty :: IntList
~~~~~
Now, that we have at least one value of type `IntList` we can make more by
using the other constructor
\begin{code}
is1 = IOneAndMore 1 IEmpty
is2 = IOneAndMore 2 is1
is3 = IOneAndMore 3 is2
\end{code}
and so on. Suppose we ask Haskell to *show* us `is3` we get
~~~~~{.haskell}
ghci> is3
IOneAndMore 3 (IOneAndMore 2 (IOneAndMore 1 IEmpty))
ghci> :type is3
is3 :: IntList
~~~~~
However, the presence of recursion does not change what the values really
are, now you simply have boxes within boxes.
![Recursively Nested Boxes](/static/lec4_nested.png)
Of course, there is no reason why the type definition must have only one
*recursive* occurrence of the type. You can encode general *trees* like
\begin{code}
data IntTree = ILeaf Int
| INode IntTree IntTree
deriving (Show)
\end{code}
Here, each value is either a simple *leaf* which is a box containing an
`Int` labeled `ILeaf`, such as each of
\begin{code}
it1 = ILeaf 1
it2 = ILeaf 2
\end{code}
or an *internal node* which is a box containing two trees, a left and right
tree, and marked with a tag `INode`, such as each of
\begin{code}
itt = INode (ILeaf 1) (ILeaf 2)
itt' = INode itt itt
itt'' = INode itt' itt'
\end{code}
Now, if we ask Haskell
~~~~~{.haskell}
ghci> itt'
INode (INode (ILeaf 1) (ILeaf 2)) (INode (ILeaf 1) (ILeaf 2))
ghci> :type itt''
itt' :: IntTree
~~~~~
Needless to say, you can have multiple branching factors, for example
you can define [2-3 trees](http://en.wikipedia.org/wiki/2-3_tree) over
integer values as
\begin{code}
data Int23T = ILeaf0
| INode2 Int Int23T Int23T
| INode3 Int Int23T Int23T Int23T
deriving (Show)
\end{code}
An example value of type Int23T would be
\begin{code}
i23t = INode3 0 t t t
where t = INode2 1 ILeaf0 ILeaf0
\end{code}
which looks like
![Integer 2-3 Tree](/static/lec4_int23t.png)
and has the type
~~~~~{.haskell}
ghci> :type i23t
i23t :: Int23T
~~~~~~
Parameterized Types
===================
We could go on and define versions of `IntList` that stored, say, `Char`
and `Double` values instead.
\begin{code}
data CharList = CEmpty
| COneAndMore Char CharList
deriving (Show)
data DoubleList = DEmpty
| DOneAndMore Char CharList
deriving (Show)
\end{code}
and we could go on and on and do the same for trees and 2-3 trees and
so on. But that would be truly lame, because we would (mostly) repeating
ourselves. This looks like a job for abstraction!
The song remains the same: as when finding abstract *computation*
patterns, we can find abstract *data* patterns by identifying the
bits that are different and turning them into parameters. Here, the
bit that is different is the underlying *base* data stored in each
box of the structure. So, we will turn that base datatype into a
*type parameter* that is passed as input to the type constructor.
\begin{code}
data List a = Empty
| OneAndMore a (List a)
deriving (Show)
\end{code}
Now, as before, we may define each of the types as simply *instances*
of the above parameterized type
~~~~~{.haskell}
type IntList = List Int
type CharList = List Char
type DoubleList = List Double
~~~~~
Similarly, it is instructive to look at the types of the constructors
~~~~~{.haskell}
ghci> :type Empty
List a
~~~~~
That is, the `Empty` tag is a value of *any* kind of list, and
~~~~~{.haskell}
ghci> :type OneAndMore
a -> List a -> List a
~~~~~
That is, the constructor two arguments: of type `a` and `List a` and
returns a value of type `List a`. Here's how you might construct values
of type `List Int` (note that you can use the binary constructor function
in infix form)
\begin{code}
l1 = OneAndMore 'a' `OneAndMore` Empty
l2 = OneAndMore 'b' `OneAndMore` l1
l3 = OneAndMore 'c' `OneAndMore` l2
\end{code}
Of course, this is pretty much how the "built-in" lists are defined
in Haskell, except that `Empty` is called `[]` and `OneAndMore` is
called `:`.
One can use parameterized types to generalize the definition of the
other data structures that we saw. For example, trees
\begin{code}
data Tree a = Leaf a
| Node (Tree a) (Tree a)
deriving (Show)
\end{code}
and *2-3* trees
\begin{code}
data Tree23 a = Leaf0
| Node2 (Tree23 a) (Tree23 a)
| Node3 (Tree23 a) (Tree23 a) (Tree23 a)
deriving (Show)
\end{code}
Kinds
-----
The `Tree a` corresponds to trees of values of type `a`. If `a` is the
*type parameter*, then what is `Tree` ? A *function* that takes a type `a`
as input and returns a type `Tree a` as output! But wait, if `List` is a
"function" then what is its type? A *kind* is the "type" of a type.
~~~~~{.haskell}
ghci> :kind Int
Int :: *
ghci> :kind Char
Char :: *
ghci> :kind Bool
Bool :: *
~~~~~
Thus, `List` is a function from any "type" to any other "type, and so
~~~~~{.haskell}
ghci> :kind List
List :: * -> *
~~~~~
Quiz
----
What is the *kind* of `->`? That, is what does GHCi say if we type
~~~~~{.haskell}
ghci> :kind (->)
~~~~~
a. `*`
b. `* -> *`
c. `* -> * -> *`
We will not dwell too much on this now. As you might imagine, they allow
for all sorts of abstractions over how to construct data.
See this for more information about [kinds](http://en.wikipedia.org/wiki/Kind_(type_theory)).
Computing Over Trees
====================
We have (and will continue) to see that trees are crucial pillar upon which
many data structures are built in Haskell. The workhorse, lists, are infact
a kind of tree with exactly one child. Lets write some functions over
trees.
First, here is a function that computes the *height* of a tree.
~~~~~{.haskell}
height :: Tree a -> Int
height (Leaf _) = 0
height (Node l r) = 1 + max (height l) (height l)
~~~~~
The readability of the above, makes an English description redundant!
Good old recursion. You have a base case for the `Leaf` pattern (namely
`0`) and a recursive or inductive case for the `Node` pattern (namely the
larger of the recursively computed heights of the left and right subtrees.)
Lets give it a whirl
\begin{code}
st1 = Node (Leaf "cat") (Leaf "doggerel")
st2 = Node (Leaf "piglet") (Leaf "hippopotamus")
st3 = Node st1 st2
\end{code}
~~~~~{.haskell}
ghci> height st1
1
ghci> height st3
2
~~~~~
How do we compute the *number* of leaf elements in the tree?
~~~~~{.haskell}
size :: Tree a -> Int
size (Leaf _) = 1
size (Node l r) = (size l) + (size l)
~~~~~
How about a function that gathers all the elements that occur
as leaves of the tree:
~~~~~{.haskell}
toList :: Tree a -> [a]
toList (Leaf x) = [x]
toList (Node l r) = (toList l) ++ (toList r)
~~~~~
Computation Pattern: Tree Fold
------------------------------
Did you spot the pattern? What are the different bits? The
base value and the operation? So we can write a tree folding
routine as
~~~~~{.haskell}
treeFold op b (Leaf x) = b
treeFold op b (Node l r) = (treeFold op b l) `op` (treeFold op b r)
~~~~~
Does that work? Well, we can easily check that
Quiz
----
What does `treeFold (+) 1 t` return?
a. `0`
b. the *largest* element in the tree
c. the *height* of the tree
d. the *number-of-leaves* of the tree
e. type *error*
Quiz
----
What does `treeFold max 0 t` return?
a. `0`
b. the *largest* element in the tree
c. the *height* of the tree
d. the *number-of-leaves* of the tree
e. type *error*
But what about `toList` ? Urgh. Does. Not. Work. We painted ourselves into
a corner. For `size` and `height` the base value is a constant, but for `toList`
the base value *depends on* the actual datum at the leaf! Thus, we need
*two* operations: one to combine the results of the left and right
subtrees, and another to give us the value at a leaf. In other words, the
base `b` needs to be a *function* that takes as input the leaf value and
returns as output the result of folding over the leaf.
\begin{code}
treeFold op b (Leaf x) = b x
treeFold op b (Node l r) = (treeFold op b l)
`op`
(treeFold op b r)
\end{code}
Now we can write the `size` as
\begin{code}
size = treeFold (+) (const 1)
\end{code}
How would you write `height` ?
\begin{code}
height = treeFold undefined undefined
\end{code}
where `const 0` and `const 1` are the respective base functions that
ignore the leaf value and just always return `0` and `1` respectively.
Can you guess the definition of `const` ?
Quiz
----
Which of the following implements `toList :: Tree a -> [a]`
a. `toList = treeFold (:) []`
b. `toList = treeFold (++) []`
c. `toList = treeFold (:) (\x -> [x])`
d. `toList = treeFold (++) (\x -> [x])`
e. none of the above.
What about the equivalent of `map` for `Tree`s ? We could write
the recursive definition
~~~~~{.haskell}
treeMap :: (a -> b) -> Tree a -> Tree b
treeMap f (Leaf x) = Leaf (f x)
treeMap f (Node l r) = Node (treeMap f l) (treeMap f r)
~~~~~
but recursion is **HARD TO READ** do we really have to use it ?
**DO IN CLASS**
Sweet! see how we just used the constructors as functions!
Lets take it out for a spin.
~~~~~{.haskell}
ghci> st1
Node (Leaf "cat") (Leaf "doggerel")
ghci> treeMap length st1
Node (Leaf 3) (Leaf 8)
ghci> st2
Node (Leaf "piglet") (Leaf "hippopotamus")
ghci> treeMap reverse st2
Node (Leaf "telgip") (Leaf "sumatopoppih")
~~~~~
Spotting Patterns In The "Real" World
=====================================
It was all well and good to see the patterns in tiny "toy" functions.
Needless to say, these patterns appear regularly in "real" code, if only
you know to look for them. Next, we will develop a small library that
"swizzles" text files.
1. We will start with a beginner's version that is
riddled with [explicit recursion](swizzle-v0.html).
2. Next, we will try to spot the patterns and eliminate
recursion using [higher-order functions](swizzle-v1.html).
3. Finally, we will [parameterize](swizzle-v2.html) the code so that we can
both "swizzle" and "unswizzle" without duplicate code.
Exercise
--------
Needless to say, the code can be cleaned up even more,
and I encourage you to do so. For example, rewrite the
code that swizzles `Char` so that instead of using
association lists, it uses the more efficient `Map k v`
(maps from keys `k` to values `v`) type in the standard
library module `Data.Map`.
<div class="hidden">
\begin{code}
main :: IO ()
main = return ()
\end{code}
</div>