Collections
Clojure is a functional programming language in which we focus on building the computations of our programs in terms of the evaluation of functions, rather than building custom data types and their associated behaviors. In the other dominant programming paradigm, object-oriented programming, programmers define the data types and the operations available on them. Objects are supposed to encapsulate data and communicate with each other by passing messages around. But there is an unfortunate tendency to create classes and new types of objects to customize the shape of the data, instead of using more generic data structures, which cascades into creating specific methods to access and modify the data. We have to come up with decent names, which is difficult, and then we pass instances of objects around in our programs. We create new classes all the time, and more code means more bugs. It is a recipe for disaster: an explosion of very specific code that benefits from little reuse.
Of course, it is not like that everywhere, and you can write clean object-oriented code, with objects being the little black boxes of functionality they were designed for. However, as programmers, whether it's through using other libraries or maintaining a legacy code base, we spend most of our time working with other people's code.
In functional programming, and more specifically, in Clojure, we tend to work with just a few data types. Types that are generic and powerful, types that every other "Clojurian" already knows and has mastered.
Collections are data types that can contain more than one thing and describe how those items relate to each other. The four main data structures for collections that you should know about are Maps, Sets, Vectors, and Lists. There are more available, including the data structures offered by your host platform (for example, Java or JavaScript) or by other libraries, but those four are your bread and butter for doing things in Clojure.
"Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." - Rob Pike's Rule #5 of programming.
Maps
A Map is a collection of key-value pairs. Clojure provides – in a persistent and immutable fashion – the usual HashMap but also a SortedMap.
HashMaps are called "Hash" because they create a hash of the key and map it to a given value. Lookups, as well as other common operations (insert and delete), are fast.
HashMaps are used a lot in Clojure, notably for representing entities where we need to associate some attributes with some values. SortedMaps are different because they keep their entries sorted by key; otherwise, they have the same interface and are used in the same way as HashMaps. SortedMaps are not very common, so let's focus on HashMaps.
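To make the HashMap/SortedMap distinction concrete, here is a minimal sketch you can paste into a REPL. It relies only on clojure.core; sorted-map is a core function that this chapter doesn't otherwise cover. Both maps answer lookups the same way, but only the sorted one guarantees that its keys come back in order.

```clojure
;; The same three entries, built once as a HashMap and once as a SortedMap
(def hm (hash-map :c 3 :a 1 :b 2))
(def sm (sorted-map :c 3 :a 1 :b 2))

;; Lookups work identically on both
(println (get hm :a) (get sm :a)) ; prints: 1 1

;; Only the sorted map guarantees its keys come back in sorted order
(println (keys sm)) ; prints: (:a :b :c)

;; Maps holding the same entries are equal, whatever their implementation
(println (= hm sm)) ; prints: true
```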
You can create a HashMap with the literal notation using curly braces. Here is a Map with three key-value pairs, with the keys being the :artist, :song, and :year keywords:
user=> {:artist "David Bowtie" :song "The Man Who Mapped the World" :year 1970}
{:artist "David Bowtie", :song "The Man Who Mapped the World", :year 1970}
You might have noticed in the preceding example that the key-value pairs in the map are separated by a space, but Clojure evaluates it and returns a Map with key-value pairs separated by a comma. As with other collections, you can choose to use a space or a comma to separate each entry. For maps, there's no established convention: if you think commas improve a map's readability, use them; otherwise, simply omit them. You can also separate entries with new lines.
Here's another map written with comma-separated entries:
user=> {:artist "David Bowtie", :song "Comma Oddity", :year 1969}
{:artist "David Bowtie", :song "Comma Oddity", :year 1969}
Notice that the values can be of any type, and not only simple values such as strings and numbers, but also vectors and even other maps, allowing you to create nested data structures and structure information as follows:
user=> {"David Bowtie" {"The Man Who Mapped the World" {:year 1970, :duration "4:01"}
                        "Comma Oddity" {:year 1969, :duration "5:19"}}
        "Crosby Stills Hash" {"Helplessly Mapping" {:year 1969, :duration "2:38"}
                              "Almost Cut My Hair" {:year 1970, :duration "4:29", :featuring ["Neil Young", "Rich Hickey"]}}}
{"David Bowtie" {"The Man Who Mapped the World" {:year 1970, :duration "4:01"}, "Comma Oddity" {:year 1969, :duration "5:19"}}, "Crosby Stills Hash" {"Helplessly Mapping" {:year 1969, :duration "2:38"}, "Almost Cut My Hair" {:year 1970, :duration "4:29", :featuring ["Neil Young" "Rich Hickey"]}}}
Keys can be of different types too, so you could have strings, numbers, or even other types as a key; however, we generally use keywords.
Another way of creating a map is by using the hash-map function, passing in pairs of arguments as follows:
user=> (hash-map :a 1 :b 2 :c 3)
{:c 3, :b 2, :a 1}
Prefer the literal notation with curly braces when possible, but when HashMaps are generated programmatically, the hash-map function can come in handy.
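A minimal sketch of what "programmatically generated" can look like, assuming the keys and values are only known at runtime (the field-names and field-values vectors are hypothetical). It also uses two core functions not yet covered in this chapter: interleave, which alternates items from two collections, and apply, which spreads a sequence as the arguments of a function.

```clojure
;; Hypothetical keys and values computed at runtime
(def field-names [:artist :song :year])
(def field-values ["David Bowtie" "Comma Oddity" 1969])

;; interleave yields (:artist "David Bowtie" :song "Comma Oddity" :year 1969);
;; apply spreads that sequence as the key-value arguments of hash-map
(def song (apply hash-map (interleave field-names field-values)))

(println (:song song)) ; prints: Comma Oddity
(println (= song {:artist "David Bowtie" :song "Comma Oddity" :year 1969})) ; prints: true
```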
Map keys are unique:
user=> {:name "Lucy" :age 32 :name "Jon"}
Syntax error reading source at (REPL:6:35).
Duplicate key: :name
An exception was thrown because the :name key was present twice in the preceding literal map.
However, different keys can have the same value:
user=> {:name "Lucy" :age 32 :number-of-teeth 32}
{:name "Lucy", :age 32, :number-of-teeth 32}
Notice that both :age and :number-of-teeth have the same value, which is perfectly valid.
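The duplicate-key error above comes from the reader rejecting a literal map. As a small sketch of the contrast, the hash-map function accepts a repeated key and simply lets the later pair overwrite the earlier one, while different keys holding equal values are never a problem.

```clojure
;; Unlike a literal map, hash-map accepts a repeated key:
;; each later pair overwrites the earlier one
(def person (hash-map :name "Lucy" :age 32 :name "Jon"))
(println (:name person)) ; prints: Jon
(println (count person)) ; prints: 2

;; Different keys sharing the same value are perfectly fine
(def lucy {:name "Lucy" :age 32 :number-of-teeth 32})
(println (= (:age lucy) (:number-of-teeth lucy))) ; prints: true
```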
Now that you know how to create maps, it is time for a bit of practice.
Exercise 2.02: Using Maps
In this exercise, we will learn how to access and modify simple maps:
- Start your REPL and create a map:
user=> (def favorite-fruit {:name "Kiwi", :color "Green", :kcal_per_100g 61 :distinguish_mark "Hairy"})
#'user/favorite-fruit
- You can read an entry from the map with the get function. Try to look up a key or two, as follows:
user=> (get favorite-fruit :name)
"Kiwi"
user=> (get favorite-fruit :color)
"Green"
- If the value for a given key cannot be found, get returns nil, but you can specify a fallback value with a third argument to get:
user=> (get favorite-fruit :taste)
nil
user=> (get favorite-fruit :taste "Very good 8/10")
"Very good 8/10"
user=> (get favorite-fruit :kcal_per_100g 0)
61
- Maps and keywords have the special ability to be used as functions. When positioned in the "operator position" (as the first item of the list), they are invoked as a function that looks up a value in a map. Try it now by using the favorite-fruit map as a function:
user=> (favorite-fruit :color)
"Green"
- Try to use a keyword as a function to look up a value in a Map:
user=> (:color favorite-fruit)
"Green"
As with the get function, these ways of retrieving a value return nil when the key cannot be found, and you can pass an extra argument to provide a fallback value.
- Provide a fallback value for a key that doesn't exist in the favorite-fruit map:
user=> (:shape favorite-fruit "egg-like")
"egg-like"
- We would like to store this value in the map. Use assoc to associate a new key, :shape, with a new value, "egg-like", in our map:
user=> (assoc favorite-fruit :shape "egg-like")
{:name "Kiwi", :color "Green", :kcal_per_100g 61, :distinguish_mark "Hairy", :shape "egg-like"}
The assoc operation returns a new map containing our previous key-value pairs as well as the new association we've just added.
- Evaluate favorite-fruit and notice that it remains unchanged:
user=> favorite-fruit
{:name "Kiwi", :color "Green", :kcal_per_100g 61, :distinguish_mark "Hairy"}
Because a map is immutable, the value bound to the favorite-fruit symbol has not changed. By using assoc, we have created a new version of the map.
Now, the F3C ("Funny Fruity Fruits Consortium") have reverted their previous ruling and determined during their quarterly review of fruit specifications that the color of the kiwi fruit should be brown and not green. To make sure that your application is F3C compliant, you decide to update your system with the new value.
- Change the color of favorite-fruit by associating a new value with the :color key:
user=> (assoc favorite-fruit :color "Brown")
{:name "Kiwi", :color "Brown", :kcal_per_100g 61, :distinguish_mark "Hairy"}
assoc replaces the existing value when a key already exists, because HashMaps cannot have duplicate keys.
- If we wanted to add more structured information, we could add a map as a value. Add production information as a nested map in our favorite-fruit map:
user=> (assoc favorite-fruit :yearly_production_in_tonnes {:china 2025000 :italy 541000 :new_zealand 412000 :iran 311000 :chile 225000})
{:name "Kiwi", :color "Green", :kcal_per_100g 61, :distinguish_mark "Hairy", :yearly_production_in_tonnes {:china 2025000, :italy 541000, :new_zealand 412000, :iran 311000, :chile 225000}}
Having nested maps or other data types is commonly used to represent structured information.
New research has found that the kiwi contains fewer calories than previously thought, and to stay compliant, the F3C requires organizations to reduce the current value of kcal per 100 g by 1.
- Decrement kcal_per_100g with the assoc function, as follows:
user=> (assoc favorite-fruit :kcal_per_100g (- (:kcal_per_100g favorite-fruit) 1))
{:name "Kiwi", :color "Green", :kcal_per_100g 60, :distinguish_mark "Hairy"}
Great! It works, but there is a more elegant way to deal with this type of operation. When you need to change a value in a map based on a previous value, you can use the update function. While the assoc function lets you associate a completely new value with a key, update allows you to compute a new value based on the previous value of a key. The update function takes a function as its third parameter.
- Decrement kcal_per_100g with the update function and dec, as follows:
user=> (update favorite-fruit :kcal_per_100g dec)
{:name "Kiwi", :color "Green", :kcal_per_100g 60, :distinguish_mark "Hairy"}
Notice how the value of :kcal_per_100g changed from 61 to 60.
- You can also pass extra arguments to the function provided to update; for example, if we wanted to lower :kcal_per_100g by 10 instead of 1, we could use the subtract function, -, and write the following (update calls - with the current value followed by the extra argument, that is, (- 61 10)):
user=> (update favorite-fruit :kcal_per_100g - 10)
{:name "Kiwi", :color "Green", :kcal_per_100g 51, :distinguish_mark "Hairy"}
Like assoc, update does not change the immutable map; it returns a new map.
This example illustrates the power of functions being "first-class citizens": we treat them like typical values; in this case, a function was passed as an argument to another function. We will elaborate on this concept in the next chapter while diving into functions in more depth.
- Finally, use dissoc (as in "dissociate") to remove one or multiple elements from a map:
user=> (dissoc favorite-fruit :distinguish_mark)
{:name "Kiwi", :color "Green", :kcal_per_100g 61}
user=> (dissoc favorite-fruit :kcal_per_100g :color)
{:name "Kiwi", :distinguish_mark "Hairy"}
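To recap the exercise, here is a short sketch that chains the same operations with let. Every intermediate binding is a new map, and favorite-fruit itself never changes. The binding names are illustrative only.

```clojure
(def favorite-fruit {:name "Kiwi", :color "Green", :kcal_per_100g 61, :distinguish_mark "Hairy"})

;; Each step returns a new map; favorite-fruit itself is never modified
(def updated
  (let [with-shape (assoc favorite-fruit :shape "egg-like")
        browned    (assoc with-shape :color "Brown")
        lighter    (update browned :kcal_per_100g - 10)]
    (dissoc lighter :distinguish_mark)))

(println (:kcal_per_100g updated))  ; prints: 51
(println (:color updated))          ; prints: Brown
(println (:color favorite-fruit))   ; prints: Green (the original is unchanged)
(println (:distinguish_mark updated)) ; prints: nil
```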
Well done! Now that we know how to use maps, it is time to move on to the next data structure: sets.
Sets
A set is a collection of unique values. Clojure provides HashSet and SortedSet. Hash Sets are implemented as Hash Maps, with the key and the value of each entry being identical.
Hash Sets are fairly common in Clojure and have a literal notation: a hash sign followed by curly braces, #{}, for example:
user=> #{1 2 3 4 5}
#{1 4 3 2 5}
Notice in the preceding expression that when the set is evaluated, it does not return the elements of the set in the order in which they were defined in the literal expression. This is because of the internal structure of the HashSet: each value is transformed into a hash, which allows fast access but does not keep the insertion order. If you care about the order in which the elements are added, you need to use a different data structure, for example, a sequence such as a vector (which we will soon discover). Use a HashSet to represent elements that logically belong together, for example, an enumeration of unique values.
As with maps, sets cannot have duplicate entries:
user=> #{:a :a :b :c}
Syntax error reading source at (REPL:135:15).
Duplicate key: :a
Hash Sets can be created from a list of values by passing those values to the hash-set function:
user=> (hash-set :a :b :c :d)
#{:c :b :d :a}
Hash Sets can also be created from another collection with the set function. Let's create a HashSet from a vector:
user=> (set [:a :b :c])
#{:c :b :a}
Notice that the order defined in the vector was lost.
The set function does not throw an error when the source collection contains duplicate values, which can be useful for deduplicating values:
user=> (set ["No" "Copy" "Cats" "Cats" "Please"])
#{"Copy" "Please" "Cats" "No"}
Notice how one of the duplicate strings, "Cats", was silently removed to create a set.
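If you need to deduplicate but also care about order, a set is not enough, since it forgets the original order. As a sketch, the core function distinct (not otherwise covered in this chapter) also removes duplicates but keeps the first occurrence of each value in its original position, returning a sequence rather than a set.

```clojure
(def words ["No" "Copy" "Cats" "Cats" "Please"])

;; A set removes the duplicate but does not keep the original order
(println (count (set words))) ; prints: 4

;; distinct removes duplicates while keeping first occurrences in order
(prn (distinct words)) ; prints: ("No" "Copy" "Cats" "Please")
```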
A Sorted Set can be created with the sorted-set function and, unlike a Hash Set, has no literal syntax:
user=> (sorted-set "No" "Copy" "Cats" "Cats" "Please")
#{"Cats" "Copy" "No" "Please"}
Notice that they are printed in the same way as Hash Sets, only the order looks different. Sorted Sets are sorted based on the natural order of elements they contain rather than the order of the arguments provided upon creation. You could instead provide your own sorting function, but we will focus on Hash Sets as they are far more common and useful.
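As a brief sketch of both orderings: sorted-set uses the natural order of its elements, while the core function sorted-set-by takes a comparator as its first argument. Here, > is passed as the comparator to get numbers in descending order.

```clojure
;; Natural order: strings sort alphabetically, and duplicates are dropped
(def natural (sorted-set "No" "Copy" "Cats" "Cats" "Please"))
(prn (seq natural)) ; prints: ("Cats" "Copy" "No" "Please")

;; Custom order: sorted-set-by takes a comparator first; here, descending numbers
(def descending (sorted-set-by > 3 1 2 3))
(prn (seq descending)) ; prints: (3 2 1)
```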
Exercise 2.03: Using Sets
In this exercise, we will use a Hash Set to represent a collection of supported currencies:
Note
A Hash Set is a good choice of data structure for a list of currencies because we typically want to store a collection of unique values and efficiently check for containment. Also, the order of the currencies probably doesn't matter. If you wanted to associate more data to a currency (such as ISO codes and countries), then you would more likely use nested Maps to represent each currency as an entity, keyed by a unique ISO code. Ultimately, the choice of the data structure depends on how you plan to use the data. In this exercise, we simply want to read it, check for containment, and add items to our set.
- Start a REPL. Create a set and bind it to the supported-currencies symbol:
user=> (def supported-currencies #{"Dollar" "Japanese yen" "Euro" "Indian rupee" "British pound"})
#'user/supported-currencies
- As with maps, you can use get to retrieve an entry from a set, which returns the entry passed as a parameter when present in the set. Use get to retrieve an existing entry as well as a missing entry:
user=> (get supported-currencies "Dollar")
"Dollar"
user=> (get supported-currencies "Swiss franc")
nil
- It is likely that you just want to check for containment, and contains? is, therefore, semantically better. Use contains? instead of get to check for containment:
user=> (contains? supported-currencies "Dollar")
true
user=> (contains? supported-currencies "Swiss franc")
false
Notice that contains? returns a Boolean and that get returns the lookup value or nil when not found. There is the edge case of looking up nil in a set, which returns nil both when it is found and when it is not. In that case, contains? is naturally more suitable.
- As with maps, sets and keywords can be used as functions to check for containment. Use the supported-currencies set as a function to look up a value in the set:
user=> (supported-currencies "Swiss franc")
nil
"Swiss franc" isn't in the supported-currencies set; therefore, the preceding return value is nil.
- If you tried to use the "Dollar" string as a function to look itself up in the set, you would get the following error:
user=> ("Dollar" supported-currencies)
Execution error (ClassCastException) at user/eval7 (REPL:1).
java.lang.String cannot be cast to clojure.lang.IFn
We cannot use strings as a function to look up a value in a set or a Map. That's one of the reasons why keywords are a better choice in both sets and maps when possible.
- To add an entry to a set, use the conj function, as in "conjoin":
user=> (conj supported-currencies "Monopoly Money")
#{"Japanese yen" "Euro" "Dollar" "Monopoly Money" "Indian rupee" "British pound"}
- You can pass more than one item to the conj function. Try to add multiple currencies to our Hash Set:
user=> (conj supported-currencies "Monopoly Money" "Gold dragon" "Gil")
#{"Japanese yen" "Euro" "Dollar" "Monopoly Money" "Indian rupee" "Gold dragon" "British pound" "Gil"}
- Finally, you can remove one or more items with the disj function, as in "disjoin":
user=> (disj supported-currencies "Dollar" "British pound")
#{"Japanese yen" "Euro" "Indian rupee"}
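The exercise steps can be gathered into one small script. This sketch also makes the nil edge case from earlier concrete, using a hypothetical set that actually contains nil: get returns nil either way, while contains? tells the two situations apart.

```clojure
(def supported-currencies #{"Dollar" "Japanese yen" "Euro" "Indian rupee" "British pound"})

;; A set used as a function returns the element when present, nil otherwise
(println (supported-currencies "Euro"))          ; prints: Euro
(println (contains? supported-currencies "Gil")) ; prints: false

;; The nil edge case: get cannot distinguish "found nil" from "not found"
(def with-nil #{nil "Euro"})
(println (get with-nil nil))       ; prints: nil
(println (contains? with-nil nil)) ; prints: true

;; conj and disj return new sets; the original set is untouched
(def extended (conj supported-currencies "Gil"))
(def trimmed (disj extended "Dollar"))
(println (count supported-currencies) (count extended) (count trimmed)) ; prints: 5 6 5
```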
That's it for sets! If you ever need to, you can find more functions for working with sets in the clojure.set namespace (such as union and intersection), but this is more advanced usage, so let's move on to the next collection: vectors.
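For the curious, here is a minimal sketch of the union and intersection functions from the clojure.set namespace mentioned above. The two currency sets are made up for illustration. The results are compared with = because Hash Sets don't print in a predictable order.

```clojure
(require '[clojure.set :as set])

;; Two hypothetical currency sets
(def european #{"Euro" "British pound" "Swiss franc"})
(def supported #{"Dollar" "Euro" "British pound"})

;; intersection keeps only the values present in both sets
(println (= #{"Euro" "British pound"} (set/intersection european supported))) ; prints: true

;; union combines both sets; shared values appear only once
(println (count (set/union european supported))) ; prints: 4
```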
Vectors
A vector is another type of collection that is widely used in Clojure. You can think of vectors as powerful immutable arrays. They are collections of values efficiently accessible by their integer index (starting from 0), and they maintain the order of item insertion as well as duplicates.
Use a vector when you need to store and read elements in order, and when you don't mind duplicate elements. For example, a web browser history could be a good candidate, as you might want to easily go back to the recent pages but also remove older elements using a vector's index, and there would likely be duplicate elements in it. A map or a set wouldn't be of much help in that situation, as you don't have a specific key to look up a value with.
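A minimal sketch of the browser history idea, assuming a history stored oldest page first. It uses three core vector functions that this chapter doesn't otherwise cover: peek (the last element of a vector), pop (the vector without its last element), and subvec (the elements from a given index onward).

```clojure
;; A hypothetical browser history, oldest page first; duplicates are allowed
(def history ["home" "news" "home" "blog"])

(println (peek history))        ; most recent page, prints: blog
(println (pop history))         ; history after going back, prints: [home news home]
(println (conj history "shop")) ; visiting a page appends it, prints: [home news home blog shop]
(println (subvec history 2))    ; dropping the two oldest entries, prints: [home blog]
```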
Vectors have a literal notation with square brackets ([]):
user=> [1 2 3]
[1 2 3]
Vectors can also be created with the vector function followed by a list of items as arguments:
user=> (vector 10 15 2 15 0)
[10 15 2 15 0]
You can create a vector from another collection using the vec function; for example, the following expression converts a Hash Set to a vector:
user=> (vec #{1 2 3})
[1 3 2]
As with other collections, vectors also can contain different types of values:
user=> [nil :keyword "String" {:answers [:yep :nope]}]
[nil :keyword "String" {:answers [:yep :nope]}]
We can now start practicing.
Exercise 2.04: Using Vectors
In this exercise, we will discover different ways of accessing and interacting with vectors:
- Start a REPL. You can look up values in a vector using their index (that is, their position in the collection) with the get function. Try to use the get function with a literal vector:
user=> (get [:a :b :c] 0)
:a
user=> (get [:a :b :c] 2)
:c
user=> (get [:a :b :c] 10)
nil
Because vector indexes start at 0, :a is at index 0 and :c is at index 2. When the lookup fails, get returns nil.
- Let's bind a vector to a symbol to make the practice more convenient:
user=> (def fibonacci [0 1 1 2 3 5 8])
#'user/fibonacci
user=> (get fibonacci 6)
8
- As with maps and sets, you can use the vector as a function to look up items, but for vectors, the parameter is the index of the value in the vector:
user=> (fibonacci 6)
8
- Add the next two values of the Fibonacci sequence to your vector with the conj function:
user=> (conj fibonacci 13 21)
[0 1 1 2 3 5 8 13 21]
Notice that the items are added to the end of the vector, and the order of the sequence is kept the same.
- Each item in the Fibonacci sequence corresponds to the sum of the previous two items. Let's dynamically compute the next item of the sequence:
user=> (let [size (count fibonacci)
             last-number (last fibonacci)
             second-to-last-number (fibonacci (- size 2))]
         (conj fibonacci (+ last-number second-to-last-number)))
[0 1 1 2 3 5 8 13]
In the preceding example, we used let to create three local bindings and improve readability. We used count to calculate the size of the vector, last to retrieve its last element, 8, and finally, we used the fibonacci vector as a function to retrieve the element at index "size - 2" (which is the value 5 at index 5).
In the body of the let block, we used the local bindings to add the sum of the last two items to the end of the Fibonacci sequence with conj, which returns a new vector ending with 13 (which is, indeed, 5 + 8).
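The let expression above can be wrapped into a reusable function. This is a sketch: next-fibonacci is a hypothetical name, and it assumes the vector already holds at least two numbers. Each call returns a new vector with one more item, so calls can be nested to grow the sequence.

```clojure
(defn next-fibonacci
  "Returns v with the sum of its last two elements appended.
  Assumes v is a vector holding at least two numbers."
  [v]
  (let [size (count v)]
    ;; The vector is used as a function to read the last two items by index
    (conj v (+ (v (- size 1)) (v (- size 2))))))

(def fibonacci [0 1 1 2 3 5 8])

(println (next-fibonacci fibonacci))                  ; prints: [0 1 1 2 3 5 8 13]
(println (next-fibonacci (next-fibonacci fibonacci))) ; prints: [0 1 1 2 3 5 8 13 21]
```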
Lists
Lists are sequential collections, similar to vectors, but items are added to the front (at the beginning). Also, they don't have the same performance properties, and random access by index is slower than with vectors. We mostly use lists to write code and macros, or in cases when we need a last-in, first-out (LIFO) type of data structure (for example, a stack), which can arguably also be implemented with a vector.
We create lists with the literal syntax, (), but to differentiate lists that represent code from lists that represent data, we need to use the single quote, ':
user=> (1 2 3)
Execution error (ClassCastException) at user/eval211 (REPL:1).
java.lang.Long cannot be cast to clojure.lang.IFn
user=> '(1 2 3)
(1 2 3)
user=> (+ 1 2 3)
6
user=> '(+ 1 2 3)
(+ 1 2 3)
In the preceding examples, we can see that a list that is not quoted with ' throws an error unless the first item of the list can be invoked as a function.
Lists can also be created with the list function:
user=> (list :a :b :c)
(:a :b :c)
To read the first element of a list, use first:
user=> (first '(:a :b :c :d))
:a
The rest function returns the list without its first item:
user=> (rest '(:a :b :c :d))
(:b :c :d)
We will not talk about iterations and recursion yet, but you could imagine that the combination of first and rest is all you need to "walk" or go through an entire list: simply call first on the rest of the list over and over again until there's no rest.
You cannot use the get function to retrieve an item from a list by index (it simply returns nil). You could use nth, but it is not efficient, as the list is iterated or "walked" until it reaches the desired position:
user=> (nth '(:a :b :c :d) 2)
:c
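Here is a small sketch of walking a list by hand, without any iteration or recursion: each extra rest moves one step further, and first reads the item there. It also shows that get returns nil for a list, whereas nth finds the item, and that converting the list with vec gives cheap indexed access.

```clojure
(def letters '(:a :b :c :d))

;; Walking by hand: first of the rest of the rest...
(println (first letters))               ; prints: :a
(println (first (rest letters)))        ; prints: :b
(println (first (rest (rest letters)))) ; prints: :c

;; get has no index-based lookup for lists and simply returns nil
(println (get letters 2)) ; prints: nil
;; nth walks the list up to position 2
(println (nth letters 2)) ; prints: :c
;; When indexed access matters, a vector is a better fit
(println ((vec letters) 2)) ; prints: :c
```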
Exercise 2.05: Using Lists
In this exercise, we will practice using lists by reading and adding elements to a to-do list.
- Start a REPL and create a to-do list with a list of actions that you need to do, using the list function as follows:
user=> (def my-todo (list "Feed the cat" "Clean the bathroom" "Save the world"))
#'user/my-todo
- You can add items to your list by using the cons function, which operates on sequences:
user=> (cons "Go to work" my-todo)
("Go to work" "Feed the cat" "Clean the bathroom" "Save the world")
- Similarly, you can use the conj function, which works because a list is a collection:
user=> (conj my-todo "Go to work")
("Go to work" "Feed the cat" "Clean the bathroom" "Save the world")
Notice how the order of the parameters is different. cons is available on lists because a list is a sequence, and conj is available to use on lists because a list is a collection. conj is, therefore, slightly more "generic" and also has the advantage of accepting multiple elements as arguments.
- Add multiple elements at once to your list by using the conj function:
user=> (conj my-todo "Go to work" "Wash my socks")
("Wash my socks" "Go to work" "Feed the cat" "Clean the bathroom" "Save the world")
- Now it's time to catch up with your tasks. Retrieve the first element in your to-do list with the first function:
user=> (first my-todo)
"Feed the cat"
- Once done, you can retrieve the rest of your tasks with the rest function:
user=> (rest my-todo)
("Clean the bathroom" "Save the world")
You could imagine then having to call first on the rest of the list (if you had to develop a fully blown to-do list application). Because the list is immutable, if you keep calling first on the same my-todo list, you will end up with the same element, "Feed the cat", over and over again, and also with a happy but very fat cat.
- Finally, you can also retrieve a specific element from the list using the nth function:
user=> (nth my-todo 2)
"Save the world"
However, remember that retrieving an element at a specific position in a list is slower than with vectors because the list has to be "walked" until the nth element. In that case, you might be better off using a vector. One final note about nth is that it throws an exception when there is no element at position n, unless you pass a fallback value as a third argument.
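Since a list grows at the front, it behaves naturally as the last-in, first-out stack mentioned earlier. A minimal sketch using the to-do list: conj pushes a task, and the core functions peek and pop (not otherwise covered in this chapter) read and remove the front item of a list. The last line shows nth with a fallback value instead of an exception.

```clojure
(def my-todo (list "Feed the cat" "Clean the bathroom" "Save the world"))

;; conj pushes onto the front of a list, like a stack
(def todo-with-work (conj my-todo "Go to work"))
(println (peek todo-with-work)) ; prints: Go to work
(prn (pop todo-with-work))      ; prints: ("Feed the cat" "Clean the bathroom" "Save the world")

;; nth throws for a missing position unless a fallback value is supplied
(println (nth my-todo 10 "Nothing left to do")) ; prints: Nothing left to do
```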
That is all you need to know about lists for now and we can move on to the next section about collection and sequence abstractions.
Collection and Sequence Abstractions
Clojure's data structures are implemented in terms of powerful abstractions. You might have noticed that the operations we used on collections are often similar, but behave differently based on the type of the collection. For instance, get retrieves items from a map with a key, but from a vector with an index; conj adds elements to a vector at the back, but to a list at the front.
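As a compact illustration of that paragraph, here is the same pair of functions applied to different collection types.

```clojure
;; get: by key for a map, by index for a vector
(println (get {:a 1 :b 2} :b)) ; prints: 2
(println (get [:x :y :z] 1))   ; prints: :y

;; conj: at the back of a vector, at the front of a list
(println (conj [1 2 3] 4))  ; prints: [1 2 3 4]
(println (conj '(1 2 3) 4)) ; prints: (4 1 2 3)
```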
A sequence is a collection of elements in a particular order, where each item follows another. Maps, sets, vectors, and lists are all collections, but only vectors and lists are sequential, although we can easily obtain a sequence from a map or a set.
Let's go through a few examples of useful functions to use with collections. Consider the following map:
user=> (def language {:name "Clojure" :creator "Rich Hickey" :platforms ["Java" "JavaScript" ".NET"]})
#'user/language
Use count to get the number of elements in a collection. Each element of this map is a key-value pair; therefore, it contains three elements:
user=> (count language)
3
More obviously, the following empty set contains no elements:
user=> (count #{})
0
We can test whether a collection is empty with the empty? function:
user=> (empty? language)
false
user=> (empty? [])
true
A map is not sequential because there is no logical order between its elements. However, we can convert a map to a sequence using the seq function:
user=> (seq language)
([:name "Clojure"] [:creator "Rich Hickey"] [:platforms ["Java" "JavaScript" ".NET"]])
It yielded a sequence of vectors, or tuples, which means that there is now a logical order and we can use sequence functions on this data structure:
user=> (nth (seq language) 1)
[:creator "Rich Hickey"]
A lot of functions just work on collections directly because they can be turned into a sequence, so you could omit the seq step and, for example, call first, rest, or last directly on a map or a set:
user=> (first #{:a :b :c})
:c
user=> (rest #{:a :b :c})
(:b :a)
user=> (last language)
[:platforms ["Java" "JavaScript" ".NET"]]
The value of using sequence functions such as first or rest on maps and sets seems questionable, but treating those collections as sequences means that they can then be iterated. Many more functions are available for processing each item of a sequence, such as map, reduce, filter, and so on. We have dedicated entire chapters to learning about those in the second part of the book so that we can stay focused on the other core functions for now.
into is another useful function that puts the elements of one collection into another collection. The first argument to into is the target collection:
user=> (into [1 2 3 4] #{5 6 7 8})
[1 2 3 4 7 6 5 8]
In the preceding example, each element of the #{5 6 7 8} set was added to the [1 2 3 4] vector. The resulting vector is not in ascending order because Hash Sets are not sorted. Conversely, we can put the elements of a vector into a set:
user=> (into #{1 2 3 4} [5 6 7 8])
#{7 1 4 6 3 2 5 8}
In the preceding example, the elements of the [5 6 7 8] vector were added to the #{1 2 3 4} set. Once again, Hash Sets do not keep insertion order, and the resulting set is simply a logical collection of unique values.
For example, to deduplicate a vector, just put its elements into a set:
user=> (into #{} [1 2 3 3 3 4])
#{1 4 3 2}
To put items into a map, you would need to pass a collection of tuples representing key-value pairs:
user=> (into {} [[:a 1] [:b 2] [:c 3]])
{:a 1, :b 2, :c 3}
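Tuples are exactly what seq produced from the language map earlier, so seq and into {} form a round trip. A short sketch, which also shows that when a key appears twice, the later tuple wins, as it would with repeated calls to conj or assoc:

```clojure
(def language {:name "Clojure" :creator "Rich Hickey" :platforms ["Java" "JavaScript" ".NET"]})

;; seq turns the map into a sequence of key-value tuples...
(def entries (seq language))
;; ...and into {} puts those tuples back into a map
(println (= language (into {} entries))) ; prints: true

;; Plain two-element vectors work as entries; a repeated key keeps the later value
(println (into {:a 1} [[:b 2] [:a 10]])) ; prints: {:a 10, :b 2}
```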
Each item is "conjoined" into the collection, so it follows the semantics of the target collection for inserting items with conj. Elements are added to a list at the front:
user=> (into '() [1 2 3 4])
(4 3 2 1)
To help you understand (into '() [1 2 3 4]), here is a step-by-step representation of what happened:
user=> (conj '() 1)
(1)
user=> (conj '(1) 2)
(2 1)
user=> (conj '(2 1) 3)
(3 2 1)
user=> (conj '(3 2 1) 4)
(4 3 2 1)
If you want to concatenate collections, concat might be more appropriate than into. See how they behave differently here:
user=> (concat '(1 2) '(3 4))
(1 2 3 4)
user=> (into '(1 2) '(3 4))
(4 3 1 2)
A lot of Clojure functions that operate on sequences will return sequences no matter what the input type was. concat is one example:
user=> (concat #{1 2 3} #{1 2 3 4})
(1 3 2 1 4 3 2)
user=> (concat {:a 1} ["Hello"])
([:a 1] "Hello")
sort is another example. sort rearranges a collection to order its elements, and it makes it slightly more obvious why you would want a sequence as a result:
user=> (def alphabet #{:a :b :c :d :e :f})
#'user/alphabet
user=> alphabet
#{:e :c :b :d :f :a}
user=> (sort alphabet)
(:a :b :c :d :e :f)
user=> (sort [3 7 5 1 9])
(1 3 5 7 9)
But what if you wanted a vector as a result? Well, now you know that you could use the into function (in the REPL, *1 refers to the result of the previous expression):
user=> (sort [3 7 5 1 9])
(1 3 5 7 9)
user=> (into [] *1)
[1 3 5 7 9]
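Outside the REPL, *1 is not available, so a script would nest the calls instead. A minimal sketch showing that sort returns a sequence even for a vector input, and that into [] and the vec function introduced earlier both turn the result back into a vector:

```clojure
(def numbers [3 7 5 1 9])

;; sort always returns a sequence, whatever the input collection
(println (vector? (sort numbers))) ; prints: false

;; Two equivalent ways to get a vector back
(println (into [] (sort numbers))) ; prints: [1 3 5 7 9]
(println (vec (sort numbers)))     ; prints: [1 3 5 7 9]
```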
It is interesting to note that conj can also be used on maps. For its arguments to be consistent with other types of collections, the new entry is represented by a tuple:
user=> (conj language [:created 2007])
{:name "Clojure", :creator "Rich Hickey", :platforms ["Java" "JavaScript" ".NET"], :created 2007}
Similarly, a vector is an associative collection of key-value pairs where the key is the index of the value:
user=> (assoc [:a :b :c :d] 2 :z)
[:a :b :z :d]
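Because a vector's keys are its indexes, assoc on a vector is limited to indexes that exist, plus the position just past the end. A short sketch of the three cases:

```clojure
(def letters [:a :b :c :d])

;; Replacing the value at an existing index
(println (assoc letters 2 :z)) ; prints: [:a :b :z :d]

;; Associating at index (count letters) appends, like conj
(println (assoc letters 4 :e)) ; prints: [:a :b :c :d :e]

;; Any index further out is an error
(println (try (assoc letters 10 :x)
              (catch IndexOutOfBoundsException _ "index out of bounds"))) ; prints: index out of bounds
```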
Exercise 2.06: Working with Nested Data Structures
For the purpose of this exercise, imagine that you are working with a little shop called "Sparkling," whose business is to trade gemstones. It turns out that the owner of the shop knows a bit of Clojure, and has been using a Clojure REPL to manage the inventory with some kind of homemade database. However, the owner has been struggling to work with nested data structures, and they require help from a professional: you. The shop won't share their database because it contains sensitive data – they have just given you a sample dataset so that you know about the shape of the data.
The shop owner read a blog post on the internet saying that pure functions are amazing and make for good quality code. So, they asked you to develop some pure functions that take their gemstone database as the first parameter of each function. The owner said you would only get paid if you provide pure functions. In this exercise, we will develop a few functions that will help us understand and operate on nested data structures.
Note
A pure function is a function where the return value is only determined by its input values. A pure function does not have any side effects, which means that it does not mutate a program's state nor generate any kind of I/O.
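To illustrate the definition, here is a sketch contrasting a pure and an impure function. Both restock and restock-and-log are hypothetical names, and the ruby map is a trimmed-down stand-in for an entry of the shop's database. The second function computes the same result but also prints, which is a side effect (I/O).

```clojure
;; Pure: the result depends only on the arguments, and nothing else changes
(defn restock [gem amount]
  (update gem :stock + amount))

;; Impure: same result, but printing to the console is a side effect
(defn restock-and-log [gem amount]
  (println "Restocking" (:name gem))
  (update gem :stock + amount))

(def ruby {:name "Ruby" :stock 480})

(println (:stock (restock ruby 20))) ; prints: 500
(println (:stock ruby))              ; prints: 480 (the input map is unchanged)
(restock-and-log ruby 20)            ; prints: Restocking Ruby
```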
- Open up a REPL and create the following Hash Map representing the sample gemstone database:
repl.clj
(def gemstone-db {
    :ruby {
        :name "Ruby"
        :stock 480
        :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712]
        :properties {
            :dispersion 0.018
            :hardness 9.0
            :refractive-index [1.77 1.78]
            :color "Red"
        }
    }
The complete code for this snippet can be found at https://packt.live/3aD8MgL
One of the most popular questions the shop gets from its customers is about the durability of a gem. This can be found in the properties of a gem, at the :hardness key. The first function that we need to develop is durability, which retrieves the hardness of a given gem.
- Let's start by using a function we already know, get, with the :ruby gem as an example:
user=> (get (get (get gemstone-db :ruby) :properties) :hardness)
9.0
It works, but nesting get is not very elegant. We could use the map or keywords as functions and see how it improves the readability.
- Use the keywords as functions to see how it improves the readability of our code:
user=> (:hardness (:properties (:ruby gemstone-db)))
9.0
This is slightly better. But it's still a lot of nested calls and parentheses. Surely, there must be a better way!
When you need to fetch data in a deeply nested map such as this one, use the get-in function. It takes a vector of keys as a parameter and digs into the map with just one function call.
- Use the get-in function with the [:ruby :properties :hardness] vector of keys to retrieve the deeply nested :hardness key:
user=> (get-in gemstone-db [:ruby :properties :hardness])
9.0
Great! The vector of keys reads left to right and there is no nested expression. It will make our function a lot more readable.
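The path reads left to right because of how get-in is built. In clojure.core, the two-argument get-in is a reduction of get over the vector of keys. The sketch below shows the three equivalent forms side by side on a cut-down, hypothetical sample-db.

```clojure
(def sample-db
  {:ruby {:properties {:hardness 9.0}}})

;; get-in walks the path one key at a time, left to right...
(get-in sample-db [:ruby :properties :hardness])        ;; => 9.0

;; ...which amounts to feeding each key to get in turn.
(reduce get sample-db [:ruby :properties :hardness])    ;; => 9.0

;; The same lookup, spelled out as the nested calls from earlier.
(get (get (get sample-db :ruby) :properties) :hardness) ;; => 9.0
```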
- Create the durability function, which takes the database and the gem keyword as parameters and returns the value of the :hardness property:
user=> (defn durability [db gemstone]
         (get-in db [gemstone :properties :hardness]))
#'user/durability
- Test your newly created function to make sure that it works as expected:
user=> (durability gemstone-db :ruby)
9.0
user=> (durability gemstone-db :moissanite)
9.5
Great! Let's move on to the next function.
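durability returns nil when the gem is missing, because get-in returns nil for a path that does not exist. get-in also accepts an optional third argument that is returned instead. The sketch below shows both behaviors on a small, hypothetical sample-db; durability-or is a hypothetical variant, not something the owner asked for.

```clojure
(def sample-db
  {:ruby {:properties {:hardness 9.0}}})

(defn durability [db gemstone]
  (get-in db [gemstone :properties :hardness]))

(durability sample-db :ruby) ;; => 9.0
(durability sample-db :opal) ;; => nil, :opal is not in the database

;; Hypothetical variant: get-in's optional third argument
;; is returned instead of nil when the path is missing.
(defn durability-or [db gemstone not-found]
  (get-in db [gemstone :properties :hardness] not-found))

(durability-or sample-db :opal :unknown) ;; => :unknown
```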
Apparently, a ruby is not simply "red" but "Near colorless through pink through all shades of red to a deep crimson." Who would have thought? The owner is now asking you to create a function to update the color of a gem, because they might want to change some other colors too, for marketing purposes. The function needs to return the updated database.
- Let's try to write the code to change the color property of a gem. We can try to use assoc:
user=> (assoc (:ruby gemstone-db) :properties {:color "Near colorless through pink through all shades of red to a deep crimson"})
{:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:color "Near colorless through pink through all shades of red to a deep crimson"}}
It seems to work, but all the other properties are gone! We replaced the existing Hash Map at the :properties key with a new Hash Map that contains only one entry: the color.
- We could use a trick. Do you remember the into function? It takes the values of one collection and puts them into another collection, like this:
user=> (into {:a 1 :b 2} {:c 3})
{:a 1, :b 2, :c 3}
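The trick works because of what into does when a key is already present. Each entry of the second map is conj-ed onto the first, so an incoming value replaces an existing one while the other keys are kept. update then passes the current value at the key as the first argument to into. The properties map in the sketch below is a cut-down, hypothetical version of the ruby's properties.

```clojure
(def properties {:hardness 9.0 :color "Red"})

;; into conj-es every entry of the second map onto the first;
;; when a key already exists, the incoming value replaces it.
(into properties {:color "Pink"})
;; => {:hardness 9.0, :color "Pink"}

;; update calls (into <current :properties value> {:color "Pink"})
;; and stores the result back under :properties, keeping :hardness.
(update {:name "Ruby" :properties properties} :properties into {:color "Pink"})
;; => {:name "Ruby", :properties {:hardness 9.0, :color "Pink"}}
```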
If we use the update function combined with into, we could obtain the desired result.
- Try to use update combined with into to change the :color property of the ruby gem:
user=> (update (:ruby gemstone-db) :properties into {:color "Near colorless through pink through all shades of red to a deep crimson"})
{:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:dispersion 0.018, :hardness 9.0, :refractive-index [1.77 1.78], :color "Near colorless through pink through all shades of red to a deep crimson"}}
That's great, but there are two problems with this approach. First, the combination of update and into is not very readable or easy to understand. Second, we wanted to return the entire database, but we only returned the "Ruby" entry. We would have to add another operation to update this entry in the main database, perhaps by nesting another into, reducing readability even further.
As with get-in, Clojure offers a simpler way of dealing with nested maps: assoc-in and update-in. They work like assoc and update, but take a vector of keys (like get-in) as a parameter, instead of a single key.
You would use update-in when you want to update a deeply nested value with a function (for example, to compute the new value from the previous value). Here, we simply want to replace the color with an entirely new value, so we should use assoc-in.
- Use assoc-in to change the color property of the ruby gem:
user=> (assoc-in gemstone-db [:ruby :properties :color] "Near colorless through pink through all shades of red to a deep crimson")
{:ruby {:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:dispersion 0.018, :hardness 9.0, :refractive-index [1.77 1.78], :color "Near colorless through pink through all shades of red to a deep crimson"}}, :emerald {:name "Emerald", :stock 85, :sales [6605 2373 104 4764 9023], :properties {:dispersion 0.014, :hardness 7.5, :refractive-index [1.57 1.58], :color "Green shades to colorless"}}, :diamond {:name "Diamond", :stock 10, :sales [8295 329 5960 6118 4189 3436 9833 8870 9700 7182 7061 1579], :properties {:dispersion 0.044, :hardness 10, :refractive-index [2.417 2.419], :color "Typically yellow, brown or gray to colorless"}}, :moissanite {:name "Moissanite", :stock 45, :sales [7761 3220], :properties {:dispersion 0.104, :hardness 9.5, :refractive-index [2.65 2.69], :color "Colorless, green, yellow"}}}
Notice how gemstone-db was returned in its entirety. Can you spot the value that has changed? There is a lot of data, so it is not very obvious. You can use the pprint function to "pretty print" the value.
Use pprint on the last returned value to improve readability and make sure that our assoc-in expression behaved as expected. In a REPL, the last returned value can be obtained with *1:
user=> (pprint *1)
The pretty-printed output is much more readable. We will not use pprint everywhere as it takes a lot of extra space, but you should use it.
- Create the change-color pure function, which takes three parameters: a database, a gemstone keyword, and a new color. This function updates the color in the given database and returns the new value of the database. Note that it must use its db parameter, not the global gemstone-db, to stay pure:
user=> (defn change-color [db gemstone new-color]
         (assoc-in db [gemstone :properties :color] new-color))
#'user/change-color
- Test that your newly created function behaves as expected:
user=> (change-color gemstone-db :ruby "Some kind of red")
{:ruby {:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:dispersion 0.018, :hardness 9.0, :refractive-index [1.77 1.78], :color "Some kind of red"}}, :emerald {:name "Emerald", :stock 85, :sales [6605 2373 104 4764 9023], :properties {:dispersion 0.014, :hardness 7.5, :refractive-index [1.57 1.58], :color "Green shades to colorless"}}, :diamond {:name "Diamond", :stock 10, :sales [8295 329 5960 6118 4189 3436 9833 8870 9700 7182 7061 1579], :properties {:dispersion 0.044, :hardness 10, :refractive-index [2.417 2.419], :color "Typically yellow, brown or gray to colorless"}}, :moissanite {:name "Moissanite", :stock 45, :sales [7761 3220], :properties {:dispersion 0.104, :hardness 9.5, :refractive-index [2.65 2.69], :color "Colorless, green, yellow"}}}
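To recap the difference between assoc-in and update-in, here is a small sketch on a hypothetical, cut-down gems map:

- assoc-in sets a value at the end of a path.
- update-in applies a function to the value at the end of a path, passing any extra arguments on to that function.
- assoc-in also creates intermediate maps that do not exist yet.

```clojure
(def gems
  {:ruby {:stock 480 :properties {:color "Red"}}})

;; assoc-in: replace the value at the end of the path.
(assoc-in gems [:ruby :properties :color] "Pink")
;; => {:ruby {:stock 480, :properties {:color "Pink"}}}

;; update-in: compute the new value from the old one;
;; extra arguments are passed on to the function, here (+ 480 20).
(update-in gems [:ruby :stock] + 20)
;; => {:ruby {:stock 500, :properties {:color "Red"}}}

;; assoc-in also creates any intermediate maps that are missing.
(assoc-in gems [:opal :properties :color] "White")
;; => {:ruby {:stock 480, :properties {:color "Red"}},
;;     :opal {:properties {:color "White"}}}
```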
The owner would like to add one last function to record the sale of a gem and update the inventory accordingly.
When a sale occurs, the shop owner would like to call the sell function with the following arguments: a database, a gemstone keyword, and a client ID. The client-id will be inserted into the sales vector, and the stock value for that gem will be decreased by one. As with the other functions, the new value of the database will be returned so that the client can handle the update themselves.
- We can use the update-in function in combination with dec to decrement (decrease by one) the stock. Let's try it with the diamond gem:
user=> (update-in gemstone-db [:diamond :stock] dec)
{:ruby {:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:dispersion 0.018, :hardness 9.0, :refractive-index [1.77 1.78], :color "Red"}}, :emerald {:name "Emerald", :stock 85, :sales [6605 2373 104 4764 9023], :properties {:dispersion 0.014, :hardness 7.5, :refractive-index [1.57 1.58], :color "Green shades to colorless"}}, :diamond {:name "Diamond", :stock 9, :sales [8295 329 5960 6118 4189 3436 9833 8870 9700 7182 7061 1579], :properties {:dispersion 0.044, :hardness 10, :refractive-index [2.417 2.419], :color "Typically yellow, brown or gray to colorless"}}, :moissanite {:name "Moissanite", :stock 45, :sales [7761 3220], :properties {:dispersion 0.104, :hardness 9.5, :refractive-index [2.65 2.69], :color "Colorless, green, yellow"}}}
The output is not very readable, and it is hard to verify that the value was correctly updated. Another useful setting to improve readability in the REPL is the *print-level* option, which limits the depth of the data structure printed to the terminal.
- Use the *print-level* option to set the depth level to 2, and observe how the result is printed:
user=> (set! *print-level* 2)
2
user=> (update-in gemstone-db [:diamond :stock] dec)
{:ruby {:name "Ruby", :stock 120, :sales #, :properties #}, :emerald {:name "Emerald", :stock 85, :sales #, :properties #}, :diamond {:name "Diamond", :stock 9, :sales #, :properties #}, :moissanite {:name "Moissanite", :stock 45, :sales #, :properties #}}
The diamond stock has indeed decreased by 1, from 10 to 9.
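set! works here because the REPL binds *print-level* for its thread. Outside a REPL, the usual approach is binding, which changes the setting only for the code it wraps. The companion var *print-length* limits how many items of each collection are printed. The nested map and vector in the sketch below are hypothetical.

```clojure
(def nested {:a {:b {:c 1}}})

;; binding sets *print-level* only for the enclosed code,
;; so nothing needs to be reset afterwards.
(binding [*print-level* 2]
  (pr-str nested))
;; => "{:a {:b #}}"

;; *print-length* truncates long collections instead of deep ones.
(binding [*print-length* 3]
  (pr-str [1 2 3 4 5]))
;; => "[1 2 3 ...]"

;; Outside the binding, the default (nil, no limit) applies,
;; assuming it has not been changed with set!.
(pr-str nested)
;; => "{:a {:b {:c 1}}}"
```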
- We can use the update-in function again, this time in combination with conj and a client-id to add to the sales vector. Let's try an example with the diamond gem and client-id 999:
user=> (update-in gemstone-db [:diamond :sales] conj 999)
{:ruby {:name "Ruby", :stock 120, :sales #, :properties #}, :emerald {:name "Emerald", :stock 85, :sales #, :properties #}, :diamond {:name "Diamond", :stock 10, :sales #, :properties #}, :moissanite {:name "Moissanite", :stock 45, :sales #, :properties #}}
It might have worked, but we cannot see the sales vector because the data has been truncated by the *print-level* option.
- Set *print-level* to nil to reset the option, and reevaluate the previous expression:
user=> (set! *print-level* nil)
nil
user=> (update-in gemstone-db [:diamond :sales] conj 999)
{:ruby {:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:dispersion 0.018, :hardness 9.0, :refractive-index [1.77 1.78], :color "Red"}}, :emerald {:name "Emerald", :stock 85, :sales [6605 2373 104 4764 9023], :properties {:dispersion 0.014, :hardness 7.5, :refractive-index [1.57 1.58], :color "Green shades to colorless"}}, :diamond {:name "Diamond", :stock 10, :sales [8295 329 5960 6118 4189 3436 9833 8870 9700 7182 7061 1579 999], :properties {:dispersion 0.044, :hardness 10, :refractive-index [2.417 2.419], :color "Typically yellow, brown or gray to colorless"}}, :moissanite {:name "Moissanite", :stock 45, :sales [7761 3220], :properties {:dispersion 0.104, :hardness 9.5, :refractive-index [2.65 2.69], :color "Colorless, green, yellow"}}}
Notice that our diamond sales vector now contains the value 999.
- Now let's write our pure function, which combines the two operations (updating the stock and the clients):
(defn sell [db gemstone client-id]
  (let [clients-updated-db (update-in db [gemstone :sales] conj client-id)]
    (update-in clients-updated-db [gemstone :stock] dec)))
- Test your newly created function by selling a :moissanite to client-id 123:
user=> (sell gemstone-db :moissanite 123)
{:ruby {:name "Ruby", :stock 120, :sales [1990 3644 6376 4918 7882 6747 7495 8573 5097 1712], :properties {:dispersion 0.018, :hardness 9.0, :refractive-index [1.77 1.78], :color "Red"}}, :emerald {:name "Emerald", :stock 85, :sales [6605 2373 104 4764 9023], :properties {:dispersion 0.014, :hardness 7.5, :refractive-index [1.57 1.58], :color "Green shades to colorless"}}, :diamond {:name "Diamond", :stock 10, :sales [8295 329 5960 6118 4189 3436 9833 8870 9700 7182 7061 1579], :properties {:dispersion 0.044, :hardness 10, :refractive-index [2.417 2.419], :color "Typically yellow, brown or gray to colorless"}}, :moissanite {:name "Moissanite", :stock 44, :sales [7761 3220 123], :properties {:dispersion 0.104, :hardness 9.5, :refractive-index [2.65 2.69], :color "Colorless, green, yellow"}}}
Notice that the sales vector of the moissanite entity now contains the value 123, and its stock has decreased from 45 to 44.
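The let binding in sell exists only to pass the intermediate database to the second update-in. Clojure's -> (thread-first) macro expresses the same pipeline without naming the intermediate value. The sketch below is an alternative way of writing the same function, not the exercise's required solution. It is tried on a hypothetical one-gem sample.

```clojure
;; Equivalent to sell above: -> inserts db as the first argument
;; of the first update-in, then that result as the first argument
;; of the second update-in.
(defn sell [db gemstone client-id]
  (-> db
      (update-in [gemstone :sales] conj client-id)
      (update-in [gemstone :stock] dec)))

(def sample-db
  {:moissanite {:stock 45 :sales [7761 3220]}})

(sell sample-db :moissanite 123)
;; => {:moissanite {:stock 44, :sales [7761 3220 123]}}
```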
In this exercise, we did not really "update" data but merely derived new data structures from others because of their immutability. Even if we work mostly with immutable data types, Clojure offers simple mechanisms that allow you to persist information. In the following activity, you will create a database that can be read and updated with the techniques acquired in this chapter, and we will even provide a helper function to make the database persistent.
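The sketch below, on a hypothetical two-gem map, shows what "deriving" a new value means in practice. The original value is left untouched, and the parts that did not change are shared between the old and new values rather than copied.

```clojure
(def db {:ruby    {:color "Red"}
         :emerald {:color "Green"}})

(def new-db (assoc-in db [:ruby :color] "Pink"))

;; The original value is untouched...
(get-in db [:ruby :color])     ;; => "Red"
(get-in new-db [:ruby :color]) ;; => "Pink"

;; ...and the parts that did not change are shared, not copied.
(identical? (:emerald db) (:emerald new-db)) ;; => true
```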
Activity 2.01: Creating a Simple In-Memory Database
In this activity, we are going to create our own implementation of an in-memory database. After all, if the "Sparkling" shop owner was able to do it, then it shouldn't be a problem for us!
Our database interface will live in the Clojure REPL. We will implement functions to create and drop tables, as well as to insert and read records.
For the purposes of this activity, we will provide a couple of helper functions to maintain the state of the database in memory:
(def memory-db (atom {}))
(defn read-db [] @memory-db)
(defn write-db [new-db] (reset! memory-db new-db))
We use an atom, but you don't need to understand how atoms work for now, as they are explained in great detail later in the book. You just need to know that the atom keeps a reference to our database in memory, and that the two helper functions, read-db and write-db, read and persist a Hash Map in memory.
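Before starting, it can help to try the helpers on their own. The sketch below repeats them so that it runs by itself and stores a hypothetical :counter map. It shows the pattern every activity function will follow: read the current value, derive a new one with a pure function, then write it back.

```clojure
;; The helpers from above, repeated so this snippet runs on its own.
(def memory-db (atom {}))
(defn read-db [] @memory-db)
(defn write-db [new-db] (reset! memory-db new-db))

;; write-db replaces the stored value and returns the new value.
(write-db {:counter 1}) ;; => {:counter 1}

;; Read the current value, derive a new one, write it back.
(write-db (update (read-db) :counter inc)) ;; => {:counter 2}

(read-db) ;; => {:counter 2}
```

This read-then-write sequence is not atomic. That is acceptable for a single-user REPL session, and atoms provide better tools for concurrent updates, which the book covers later.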
As guidance, we would like the data structure to have this shape:
{:table-1 {:data [] :indexes {}}
 :table-2 {:data [] :indexes {}}}
For example, if we used our database in a grocery store to save clients, fruits, and purchases, we could imagine it containing data like this:
{
  :clients {
    :data [{:id 1 :name "Bob" :age 30} {:id 2 :name "Alice" :age 24}]
    :indexes {:id {1 0, 2 1}}
  },
  :fruits {
    :data [{:name "Lemon" :stock 10} {:name "Coconut" :stock 3}]
    :indexes {:name {"Lemon" 0, "Coconut" 1}}
  },
  :purchases {
    :data [{:id 1 :user-id 1 :item "Coconut"} {:id 2 :user-id 2 :item "Lemon"}]
    :indexes {:id {1 0, 2 1}}
  }
}
Storing data and indexes separately allows multiple indexes to be created without having to duplicate the actual data.
For each indexed field, the indexes map associates every value of that field with the position of its record in the data vector. In the fruits table, "Lemon" is the first record of the data vector, so its value in the :name index is 0.
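To see how the two parts of a table work together, the sketch below looks up a fruit through its index on a copy of the fruits table from the example above. It relies on the fact that vectors are associative (position to element), so a position can be used as the last key of a get-in path.

```clojure
(def db
  {:fruits {:data    [{:name "Lemon" :stock 10}
                      {:name "Coconut" :stock 3}]
            :indexes {:name {"Lemon" 0, "Coconut" 1}}}})

;; Step 1: find the position of "Coconut" in the :name index.
(get-in db [:fruits :indexes :name "Coconut"]) ;; => 1

;; Step 2: use that position to fetch the record from :data.
(get-in db [:fruits :data 1]) ;; => {:name "Coconut", :stock 3}
```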
These steps will help you perform the activity:
- Create the helper functions. You can get the Hash Map by executing the read-db function with no arguments, and write to the database by executing the write-db function with a Hash Map as an argument.
- Start by creating the create-table function. This function should take one parameter: the table name. It should add a new key (the table name) at the root of our Hash Map database, and the value should be another Hash Map containing two entries: an empty vector at the :data key and an empty Hash Map at the :indexes key.
- Test that your create-table function works.
- Create a drop-table function that also takes one parameter: the table name. It should remove a table, including all its data and indexes, from our database.
- Test that your drop-table function works.
- Create an insert function. This function should take three parameters: table, record, and id-key. The record parameter is a Hash Map, and id-key corresponds to a key in the record map that will be used as a unique index. For now, we will not handle cases where a table does not exist or where an index key already exists in a given table.
  Try to use a let block to divide the work of the insert function into multiple steps:
  - In a let statement, create a binding for the value of the database, retrieved with read-db.
  - In the same let statement, create a second binding for the new value of the database (after adding the record to the data vector).
  - In the same let statement, retrieve the index at which the record was inserted by counting the number of elements in the updated data vector and subtracting one.
  - In the body of the let statement, update the index at id-key and write the resulting map to the database with write-db.
- To verify that your insert function works, try using it multiple times to insert new records.
- Create a select-* function that returns all the records of a table passed as a parameter.
- Create a select-*-where function that takes three arguments: table-name, field, and field-value. The function should use the index map to retrieve the index of the record in the data vector and return the element.
- Modify the insert function to reject any duplicate index key. When a record with the same id-key value already exists in the indexes map, we should not modify the database; instead, we should print an error message to the user.
On completing the activity, the output should be similar to this:
user=> (create-table :fruits)
{:clients {:data [], :indexes {}}, :fruits {:data [], :indexes {}}}
user=> (insert :fruits {:name "Pear" :stock 3} :name)
Record with :name Pear already exists. Aborting
user=> (select-* :fruits)
[{:name "Pear", :stock 3} {:name "Apricot", :stock 30} {:name "Grapefruit", :stock 6}]
user=> (select-*-where :fruits :name "Apricot")
{:name "Apricot", :stock 30}
In this activity, we have used our new knowledge about reading and updating both simple and deeply nested data structures to implement a simple in-memory database. This was not an easy feat. Well done!
Note
The solution for this activity can be found via this link.