Take me home

# The amazingly amazing amaze of lazy data structures

Published December 14, 2019

This post is part of a series: Advent Calendar 2019

Two things makes me keep coming back to Clojure: Persistent data structures and lazy sequences.

This post is about lazy sequences.

## Lazy fibonacci

To demonstrate lazy sequences, let's implement the Fibonacci sequence.

Here's a JavaScript version of the Clojure code to follow, so you have something to compare it to.

``````// JavaScript version, for good measure
function fib(n, xs) {
xs = xs || [0, 1]

const c = xs.length

if (c < n) {
return fib(n, xs.concat([
xs[xs.length - 1] + xs[xs.length - 2]
]))
} else {
return xs;
}
}

fib(10)
// [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]``````

Here's the Clojure version of the same code.

``````; "Plain" version
(defn fib
([n]
(fib n [0 1]))
([n xs]
(let [c (count xs)]
(if (< c n)
(fib n
(conj xs
(+ (nth xs (- c 1))
(nth xs (- c 2)))))
xs))))

(fib 10)
; [0 1 1 2 3 5 8 13 21 34]``````

All good, right?

Well, it's not the prettiest code in the world.

Let's fix that, by making literally the prettiest code in the world.

``````; Lazy version - literally the prettiest
; code in the world!
(defn fib
([]
(fib 0 1))
([a b]
(lazy-seq (cons a (fib b (+ a b))))))

(take 10 (fib))
; (0 1 1 2 3 5 8 13 21 34)``````

Oh yass.

First of all, that's much less code.

Second of all, you can make it use BigDecimals and get `(fib 100000)` super fast, with good memory usage and no stack overflow and all that jazz.

``````; Clojure uses long by default, so make it use
; bigdec for arbitrarily sized numbers
(defn fib
([]
(fib (bigdec 0) (bigdec 1)))
([a b]
(lazy-seq (cons a (fib b (+ a b))))))

(time (last (take 100000 (fib))))
; "Elapsed time: 299.1678 msecs"
; 1605285768272[...20880 digits (yes,
; really) cut for brevity]790626M``````

That's pretty cool.

By the way, the threading macro is awesome. You use it to write statements in the order that they are called.

``````; Plain version
(time (last (take 100000 (fib))))

(->> (fib)
(take 100000)
(last)
(time))``````

So what actually is this laziness?

Well, the whole idea is that you separate two concepts: generating data, and extracting/using data.

See how the first version had both iteration, data generation, checking if our max `n` was reached etc in the same code?

We don't need to do that with laziness.

With the lazy version, we just define what the data structure looks like. You get a "lazy sequence" back, and you can think of it as a view of an infinitely sized Fibonacci sequence, that is not actually computed until you actually ask for some data from the sequence.

So it's the call to `(take)` that actually makes something happen.

``````; "iterate" is also lay
(take 10 (iterate inc 0))
;  (0 1 2 3 4 5 6 7 8 9)

; So is "range"
(take 10 (range -50 9000 3))
; (-50 -47 -44 -41 -38 -35 -32 -29 -26 -23)``````

## Lazy map

This code prints absolutely nothing.

``````(map (fn [n] (print n ""))
[1 2 3 4 5 6 7 8 9 10])

; Nothing.. Absolutely nothing``````

That's because `(map)` is also lazy. We have to actually get some items from it for the map function to be called.

``````; Wait what, now?
(->> (map (fn [n] (print n "") n)
[1 2 3 4 5 6 7 8 9 10])
(take 3))

; prints
; 1 2 3 4 5 6 7 8 9 10

; returns
; (1 2 3)``````

Surprised? We ask for 3 items, but all 10 items seems to be printed! Is it not lazy after all?

It turns out that under the hood, Clojure will some times do a performance optimization and realize lazy sequences in batches of 32.

``````; Batching
(->> (map (fn [n] (print n "") n)
[1 2 3 4 5 6 7 8 9 10 11 ... 34 35 36])
(take 3))

; prints
; 1 2 3 4 5 6 7 8 9 10 11 ... 30 31 32

; returns
; (1 2 3)``````

Note: it will run the mapping function 32 times and "pre-cache" the lazy results - but you'll still get a sequence with 3 elements in it returned from `(take)`.

But, if your source is a lazy sequence and not a vector, it won't do that.

``````; No batching when source is lazy
(->> (map (fn [n] (print n "") n)
(iterate inc 1))
(take 3))

; prints
; 1 2 3

; returns
; (1 2 3)``````

## Generating PIN codes

This one is pretty cool. Let's say you want to generate N random 4 digit pin codes.

You can generate an infinite lazy sequence of random numbers:

``````; Possible duplicates!
(->> (repeatedly #(rand-int 9999))
(take 5))

; (643 6000 483 8668 7493)``````

But this list can contain duplicates! The chances are low when you just get 5 items obviously, but that doesn't matter - our requirement was that the list should be unique.

Enter the awesomeness of `(distinct)`. That function will take a list, and return a list with no duplicates - and it is lazy!

``````; No duplicates!
(->> (repeatedly #(rand-int 9999))
(distinct)
(take 5)
; Also format it, for good measure
(map #(format "%04d" %)))

; ("6856" "3461" "8833" "4884" "0004")``````

So good. So, so very good.