Simple Data Types
A data type designates what kind of value a piece of data holds; it is a fundamental way of classifying data. Different types allow different kinds of operations: we can concatenate strings, multiply numbers, and perform Boolean algebra on Booleans. Clojure is dynamically typed and puts a strong emphasis on practicality, so we don't explicitly assign types to values, but those values still have a type.
Clojure is a hosted language with three major implementations, on Java, JavaScript, and .NET. Being hosted is a useful trait: it allows Clojure programs to run in different environments and take advantage of the host's ecosystem. For data types, it means that each implementation has different underlying types, but those are mostly implementation details. In day-to-day programming, the differences rarely matter: if you know how to do something in Clojure, you likely know how to do it in, say, ClojureScript.
In this topic, we will go through Clojure's simple data types, listed below. Note that all of these types are immutable:
- Strings
- Numbers
- Booleans
- Symbols
- Keywords
- Nil
Strings
Strings are sequences of characters representing text. We have been using and manipulating strings since the first exercise of Chapter 1, Hello REPL.
You can create a string by simply wrapping characters with double quotes ("):
user=> "I am a String"
"I am a String"
user=> "I am immutable"
"I am immutable"
String literals are only created with double quotes, and if you need to use double quotes in a string, you can escape them with the backslash character (\):
user=> (println "\"The measure of intelligence is the ability to change\" - Albert Einstein")
"The measure of intelligence is the ability to change" - Albert Einstein
nil
Strings cannot be changed; they are immutable. Any function that appears to transform a string returns a new value:
user=> (def silly-string "I am Immutable. I am a silly String")
#'user/silly-string
user=> (clojure.string/replace silly-string "silly" "clever")
"I am Immutable. I am a clever String"
user=> silly-string
"I am Immutable. I am a silly String"
In the preceding example, calling clojure.string/replace on silly-string returned a new string with the word "silly" replaced with "clever". However, when evaluating silly-string again, we can see that the value has not changed. The function returned a different value and did not change the original string.
Although a string is usually treated as a single unit of data representing text, strings are also collections of characters. In the JVM implementation of Clojure, strings are of the java.lang.String Java type and can be treated as collections of values of the java.lang.Character Java type. For example, the following expression returns a character:
user=> (first "a collection of characters")
\a
user=> (type *1)
java.lang.Character
first returns the first element of a collection. Here, the literal notation of a character is \a. The type function returns the data type of a given value, which the REPL prints by name. Remember that we can use *1 to retrieve the last returned value in the REPL, so *1 evaluates to \a.
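To make the "string as a collection" idea concrete, here is a minimal sketch for the JVM implementation of Clojure. The names text, first-char, last-char, length, and back-to-string are hypothetical and exist only for illustration; first, last, count, and str are standard core functions.

```clojure
;; A string can be handed to functions that work on collections.
(def text "a collection of characters")

(def first-char (first text))   ; \a, a java.lang.Character
(def last-char  (last text))    ; \s
(def length     (count text))   ; 26 characters, spaces included

;; str turns characters back into a string.
(def back-to-string (str first-char last-char)) ; "as"
```

Note that text itself is never modified: each call returns a new value.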
It is interesting to note that, in ClojureScript, strings are collections of one-character strings, because there is no character type in JavaScript. Here is a similar example in a ClojureScript REPL:
cljs.user=> (last "a collection of 1 character strings")
"s"
cljs.user=> (type *1)
#object[String]
As with the Clojure REPL, type returns the data type of the value. This time, in ClojureScript, the value returned by the last function (which returns the last character of a string) is of the #object[String] type, which means a JavaScript string.
You can find a few common functions for manipulating strings in the core namespace, such as str, which we used in Chapter 1, Hello REPL!, to concatenate (combine multiple strings together into one string):
user=> (str "That's the way you " "con" "ca" "te" "nate")
"That's the way you concatenate"
user=> (str *1 " - " silly-string)
"That's the way you concatenate - I am Immutable. I am a silly String"
Most functions for manipulating strings can be found in the clojure.string namespace. Here is a list of them using the REPL dir function:
user=> (dir clojure.string)
blank?
capitalize
ends-with?
escape
includes?
index-of
join
last-index-of
lower-case
re-quote-replacement
replace
replace-first
reverse
split
split-lines
starts-with?
trim
trim-newline
triml
trimr
upper-case
As a reminder, this is how you can use a function from a specific namespace:
user=> (clojure.string/includes? "potatoes" "toes")
true
We will not cover all the string functions, but feel free to try them out now. You can always look up the documentation of a string function from the preceding list with the doc function.
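As a starting point for experimenting, here is a small sketch that combines a few of the functions from the preceding list. The input raw-title and the names clean-title, words, and dashed are made up for illustration, and the string alias is just one possible choice.

```clojure
(require '[clojure.string :as string])

;; A hypothetical messy input, used only for illustration.
(def raw-title "  the clojure workshop  ")

;; trim removes surrounding whitespace; upper-case returns a new string.
(def clean-title (string/upper-case (string/trim raw-title)))

;; split breaks a string on a pattern and returns a vector of strings.
(def words (string/split (string/trim raw-title) #" "))

;; join glues the pieces back together with a separator.
(def dashed (string/join "-" words))

(println clean-title) ; THE CLOJURE WORKSHOP
(println dashed)      ; the-clojure-workshop
```

Because strings are immutable, raw-title still holds its original value after all of these calls.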
Numbers
Clojure has good support for numbers and you will most likely not have to worry about the underlying types, as Clojure will handle pretty much anything. However, it is important to note that there are a few differences between Clojure and ClojureScript in that regard.
In Clojure, by default, integers are implemented as the java.lang.Long Java type unless the number is too big for Long. In that case, it is typed clojure.lang.BigInt:
user=> (type 1)
java.lang.Long
user=> (type 1000000000000000000)
java.lang.Long
user=> (type 10000000000000000000)
clojure.lang.BigInt
Notice, in the preceding example, that the last number was too big to fit in the java.lang.Long Java type and, therefore, was implicitly typed clojure.lang.BigInt.
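The implicit typing above applies to literals. As a hedged aside that goes slightly beyond the text, the sketch below shows what happens at the edge of the Long range during arithmetic on the JVM. The names biggest-long, promoted, and small-bigint are hypothetical; Long/MAX_VALUE and the +' function are part of Java and clojure.core respectively.

```clojure
;; Long/MAX_VALUE is the largest value a java.lang.Long can hold.
(def biggest-long Long/MAX_VALUE)

;; The plain + function would throw an ArithmeticException (integer overflow)
;; here. The +' variant (note the quote) promotes to BigInt instead.
(def promoted (+' biggest-long 1))

;; The N suffix is the literal notation for a BigInt.
(def small-bigint 1N)

(println (type biggest-long)) ; java.lang.Long
(println (type promoted))     ; clojure.lang.BigInt
```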
Exact ratios are represented by Clojure as "Ratio" types, which have a literal representation. 5/4 cannot be reduced to a whole number, so it evaluates to the ratio itself:
user=> 5/4
5/4
The result of dividing 3 by 4 can be represented by the ratio 3/4:
user=> (/ 3 4)
3/4
user=> (type 3/4)
clojure.lang.Ratio
4/4 is equivalent to 1 and is evaluated as follows:
user=> 4/4
1
Decimal numbers are "double" precision floating-point numbers:
user=> 1.2
1.2
If we take our division of 3 by 4 again, but this time mix in a "Double" type, we will not get a ratio as a result:
user=> (/ 3 4.0)
0.75
This is because floating-point numbers are "contagious" in Clojure. Any operation involving floating-point numbers will result in a float or a double:
user=> (* 1.0 2)
2.0
user=> (type (* 1.0 2))
java.lang.Double
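The following sketch contrasts exact ratio arithmetic with the "contagious" behavior of doubles on the JVM. The names exact, approx, and converted are hypothetical; double is a standard core function that coerces a number to a double.

```clojure
;; Arithmetic between ratios stays exact.
(def exact (+ 1/2 1/4))        ; 3/4, a clojure.lang.Ratio

;; Mixing in a single double makes the whole result a double.
(def approx (+ 1/2 0.25))      ; 0.75, a java.lang.Double

;; A ratio can also be converted explicitly.
(def converted (double 3/4))   ; 0.75
```

Keeping values as ratios avoids rounding until a decimal representation is actually needed.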
In ClojureScript, however, numbers are just "JavaScript numbers," which are all double-precision floating-point numbers. JavaScript does not define different types of numbers like Java and some other programming languages do (for example, long, int, and short):
cljs.user=> 1
1
cljs.user=> 1.2
1.2
cljs.user=> (/ 3 4)
0.75
cljs.user=> 3/4
0.75
cljs.user=> (* 1.0 2)
2
Notice that, this time, every operation returns a floating-point number. The fact that 1 and 2 are printed without a decimal point is just a formatting convenience.
We can make sure that all those numbers are JavaScript numbers (double-precision, floating-point) by using the type function:
cljs.user=> (type 1)
#object[Number]
cljs.user=> (type 1.2)
#object[Number]
cljs.user=> (type 3/4)
#object[Number]
If you need to do more than simple arithmetic, you can use the Java or JavaScript math libraries, which are similar apart from a few minor differences.
You will learn more about host platform interoperability (how to interact with the host platform and its ecosystem) in Chapter 9, Host Platform Interoperability with Java and JavaScript, but the following examples will get you started with more complicated math and with using the math library.
Reading a value from a constant can be done like this:
user=> Math/PI
3.141592653589793
And calling a function, like the usual Clojure functions, can be done like this:
user=> (Math/random)
0.25127992428738254
user=> (Math/sqrt 9)
3.0
user=> (Math/round 0.7)
1
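Constants and function calls from the math library can be combined inside ordinary Clojure functions. The sketch below targets the JVM; hypotenuse and circle-area are hypothetical helper names, while Math/sqrt, Math/pow, and Math/PI come from java.lang.Math.

```clojure
;; Length of the hypotenuse of a right triangle with sides a and b.
(defn hypotenuse [a b]
  (Math/sqrt (+ (Math/pow a 2) (Math/pow b 2))))

;; Area of a circle of radius r, reading the Math/PI constant.
(defn circle-area [r]
  (* Math/PI (Math/pow r 2)))

(println (hypotenuse 3 4)) ; 5.0
```

Both functions return doubles, even when called with integers, because the Java math functions operate on doubles.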
Exercise 2.01: The Obfuscation Machine
You have been contacted by a secret government agency to develop an algorithm that encodes text into a secret string that only the owner of the algorithm can decode. Apparently, they don't trust other security mechanisms such as SSL and will only communicate sensitive information with their own proprietary technology.
You need to develop an encode function and a decode function. The encode function should replace letters with numbers that are not easily guessable. For that purpose, each letter will take the character's number value in the ASCII table, add another number to it (the number of words in the sentence to encode), and finally, compute the square value of that number. The decode function should allow the user to revert to the original string. Someone highly ranked in the agency came up with that algorithm so they trust it to be very secure.
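To make the arithmetic concrete before building anything, here is the scheme traced by hand for a single letter. This is only a sketch under one assumption: the letter a appears in a three-word sentence. The names number-of-words, letter-code, encoded, and decoded are hypothetical; int and char are core functions that convert between characters and their numeric codes.

```clojure
;; Assumption for this sketch: the sentence to encode has three words.
(def number-of-words 3)

;; The code of the letter a in the ASCII table.
(def letter-code (int \a))                                       ; 97

;; Encoding: add the word count, then square. (97 + 3)^2 = 10000
(def encoded (int (Math/pow (+ letter-code number-of-words) 2))) ; 10000

;; Decoding: square root, then subtract the word count. 100 - 3 = 97
(def decoded (char (- (Math/sqrt encoded) number-of-words)))     ; \a
```

The same letter yields a different number when the word count changes, which is what makes the output harder to guess at a glance.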
In this exercise, we will put into practice some of the things we've learned about strings and numbers by building an obfuscation machine:
- Start your REPL and look up the documentation of the clojure.string/replace function:
user=> (doc clojure.string/replace)
-------------------------
clojure.string/replace
([s match replacement])
  Replaces all instance of match with replacement in s.

   match/replacement can be:

   string / string
   char / char
   pattern / (string or function of match).

   See also replace-first.

   The replacement is literal (i.e. none of its characters are treated
   specially) for all cases above except pattern / string.

   For pattern / string, $1, $2, etc. in the replacement string are
   substituted with the string that matched the corresponding
   parenthesized group in the pattern.  If you wish your replacement
   string r to be used literally, use (re-quote-replacement r) as the
   replacement argument.  See also documentation for
   java.util.regex.Matcher's appendReplacement method.

   Example:
   (clojure.string/replace "Almost Pig Latin" #"\b(\w)(\w+)\b" "$2$1ay")
   -> "lmostAay igPay atinLay"
Notice that the replace function can take a pattern and a function of the matching result as parameters. We don't know how to iterate over collections yet, but using the replace function with a pattern and a "replacement function" should do the job.
- Try and use the replace function with the #"\w" pattern (which means word character), replace it with the ! character, and observe the result:
user=> (clojure.string/replace "Hello World" #"\w" "!")
The output is as follows:
"!!!!! !!!!!"
- Try and use the replace function with the same pattern, but this time passing an anonymous function that takes the matching letter as a parameter:
user=> (clojure.string/replace "Hello World" #"\w" (fn [letter] (do (println letter) "!")))
The output is as follows:
H
e
l
l
o
W
o
r
l
d
"!!!!! !!!!!"
Observe that the function was called for each letter, printing the match out to the console and finally returning the string with the matches replaced by the ! character. It looks like we should be able to write our encoding logic in that replacement function.
- Let's now see how we can convert a character to a number. We can use the int function, which coerces its parameter to an integer. It can be used like this:
user=> (int \a)
97
- It seems that the "replacement function" will take a string as a parameter, so let's convert our string to a character. Use the char-array function combined with first to convert our string to a character as follows:
user=> (first (char-array "a"))
\a
- Now, if we combine the previous steps and also compute the square value of the character's number, we should be approaching our obfuscation goal. Combine the code written previously to obtain a character code from a string and get its square value using the Math/pow function as follows:
user=> (Math/pow (int (first (char-array "a"))) 2)
9409.0
- Let's now convert this result to the string that will be returned from our replace function. First, let's remove the decimal part by coercing the result to an int, and put things together in an encode-letter function, as follows:
user=> (defn encode-letter
  [s]
  (let [code (Math/pow (int (first (char-array s))) 2)]
    (str (int code))))
#'user/encode-letter
user=> (encode-letter "a")
"9409"
Great! It seems to work. Let's now test our function as part of the replace function.
- Create the encode function, which uses clojure.string/replace as well as our encode-letter function:
user=> (defn encode [s] (clojure.string/replace s #"\w" encode-letter))
#'user/encode
user=> (encode "Hello World")
"518410201116641166412321 756912321129961166410000"
It seems to work, but the resulting string will be hard to decode without being able to identify each letter individually.
There is another thing that we did not take into account: the encode function should add an arbitrary number to the character code before calculating the square value.
- First, add a separator as part of our encode-letter function, for example, the # character, so that we can identify each letter individually. Second, add an extra parameter to encode-letter, whose value is added to the character code before calculating the square value:
user=> (defn encode-letter
  [s x]
  (let [code (Math/pow (+ x (int (first (char-array s)))) 2)]
    (str "#" (int code))))
#'user/encode-letter
- Now, test the encode function another time:
user=> (encode "Hello World")
Execution error (ArityException) at user/encode (REPL:3).
Wrong number of args (1) passed to: user/encode-letter
Our encode function is now failing because encode-letter is expecting an extra argument.
- Modify the encode function to calculate the number of words in the text to obfuscate, and pass it to the encode-letter function. You can use the clojure.string/split function with a whitespace pattern, as follows, to count the number of words:
user=> (defn encode [s]
  (let [number-of-words (count (clojure.string/split s #" "))]
    (clojure.string/replace s #"\w" (fn [s] (encode-letter s number-of-words)))))
#'user/encode
- Try your newly created function with a few examples and make sure it obfuscates strings properly:
user=> (encode "Super secret")
"#7225#14161#12996#10609#13456 #13689#10609#10201#13456#10609#13924"
user=> (encode "Super secret message")
"#7396#14400#13225#10816#13689 #13924#10816#10404#13689#10816#14161 #12544#10816#13924#13924#10000#11236#10816"
What a beautiful, unintelligible, obfuscated string. Well done! Notice how the numbers for the same letters are different depending on the number of words in the phrase to encode. It seems to work according to the specification!
We can now start working on the decode function, for which we will need to use the following functions:
  - Math/sqrt to obtain the square root value of a number.
  - char to retrieve a letter from a character code (a number).
  - subs, as in substring, to get a sub-portion of a string (and get rid of our # separator).
  - Integer/parseInt to convert a string to an integer.
- Write the decode-letter function using a combination of the preceding functions, to decode an obfuscated character:
user=> (defn decode-letter
  [x y]
  (let [number (Integer/parseInt (subs x 1))
        letter (char (- (Math/sqrt number) y))]
    (str letter)))
#'user/decode-letter
- Finally, write the decode function, which is similar to the encode function except that it should use decode-letter instead of encode-letter:
user=> (defn decode [s]
  (let [number-of-words (count (clojure.string/split s #" "))]
    (clojure.string/replace s #"\#\d+" (fn [s] (decode-letter s number-of-words)))))
#'user/decode
- Test your functions and make sure that they both work:
user=> (encode "If you want to keep a secret, you must also hide it from yourself.")
The output is as follows:
"#7569#13456 #18225#15625#17161 #17689#12321#15376#16900 #16900#15625 #14641#13225#13225#15876 #12321 #16641#13225#12769#16384#13225#16900, #18225#15625#17161 #15129#17161#16641#16900 #12321#14884#16641#15625 #13924#14161#12996#13225 #14161#16900 #13456#16384#15625#15129 #18225#15625#17161#16384#16641#13225#14884#13456."
user=> (decode *1)
"If you want to keep a secret, you must also hide it from yourself."
In this exercise, we've put into practice working with numbers and strings by creating an encoding system. We can now move on to learning other data types, starting with Booleans.
Booleans
Booleans are implemented as Java's java.lang.Boolean in Clojure or JavaScript's "Boolean" in ClojureScript. Their value can either be true or false, and their literal notations are simply the lowercase true and false.
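As a brief illustration on the JVM, the sketch below checks the type of a Boolean literal and applies a few logic operations. The names boolean-type, both, either, negated, and comparison are hypothetical; and, or, not, and < are part of clojure.core.

```clojure
;; The type of a Boolean literal.
(def boolean-type (type true))  ; java.lang.Boolean

;; Basic Boolean algebra.
(def both    (and true false))  ; false
(def either  (or true false))   ; true
(def negated (not true))        ; false

;; Comparisons evaluate to Booleans.
(def comparison (< 1 2))        ; true
```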
Symbols
Symbols are identifiers referring to something else. We have already been using symbols when creating bindings or calling functions. For example, when using def, the first argument is a symbol that will refer to a value, and when calling a function such as +, + is a symbol referring to the function implementing the addition. Consider the following examples:
user=> (def foo "bar")
#'user/foo
user=> foo
"bar"
user=> (defn add-2 [x] (+ x 2))
#'user/add-2
user=> add-2
#object[user$add_2 0x4e858e0a "user$add_2@4e858e0a"]
Here, we have created the user/foo symbol, which refers to the "bar" string, and the add-2 symbol, which refers to the function that adds 2 to its parameter. We have created those symbols in the user namespace, hence the notation with /: user/foo.
If we try to evaluate a symbol that has not been defined, we'll get an error:
user=> marmalade
Syntax error compiling at (REPL:0:0).
Unable to resolve symbol: marmalade in this context
In the REPL Basics topic of Chapter 1, Hello REPL!, we were able to use the following functions because they are bound to a specific symbol:
user=> str
#object[clojure.core$str 0x7bb6ab3a "clojure.core$str@7bb6ab3a"]
user=> +
#object[clojure.core$_PLUS_ 0x1c3146bc "clojure.core$_PLUS_@1c3146bc"]
user=> clojure.string/replace
#object[clojure.string$replace 0xf478a81 "clojure.string$replace@f478a81"]
Those gibberish-like values are printed representations of the functions themselves: we are asking for the values bound to the symbols rather than invoking the functions (which we would do by wrapping them in parentheses).
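The difference between evaluating a symbol and invoking the function it refers to can be sketched as follows. The names the-function, result, and plus are hypothetical; fn? is a core predicate that tells whether a value is a function.

```clojure
(defn add-2 [x] (+ x 2))

;; Evaluating the symbol alone yields the function object it refers to.
(def the-function add-2)

;; Wrapping the symbol in parentheses, with an argument, invokes the function.
(def result (add-2 40))   ; 42

;; Another symbol can be bound to the same function and used the same way.
(def plus +)

(println (fn? the-function)) ; true
(println (plus 1 2))         ; 3
```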
Keywords
You can think of a keyword as some kind of a special constant string. Keywords are a nice addition to Clojure because they are lightweight and convenient to use and create. You just need to use the colon character, :, at the beginning of a word to create a keyword:
user=> :foo
:foo
user=> :another_keyword
:another_keyword
They don't refer to anything else like symbols do; as you can see in the preceding example, when evaluated, they just return themselves. Keywords are typically used as keys in a key-value associative map, as we will see in the next topic about collections.
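Here is a short sketch of keywords in use on the JVM, including a small preview of a keyword acting as a map key. The person map and the other names defined here are made up for illustration; name, keyword, and get are core functions.

```clojure
;; Keywords evaluate to themselves and have their own type.
(def keyword-type (type :foo))     ; clojure.lang.Keyword
(def same? (= :foo :foo))          ; true

;; Converting between keywords and strings.
(def keyword-name (name :foo))     ; "foo"
(def from-string (keyword "foo"))  ; :foo

;; A hypothetical map that uses keywords as keys.
(def person {:name "Ada" :language "Clojure"})
(def language (get person :language)) ; "Clojure"
```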
In this section, we went through simple data types such as strings, numbers, Booleans, symbols, and keywords. We highlighted how their underlying implementation depends on the host platform because Clojure is a hosted language. In the next section, we will see how those values can be aggregated into collections.