Mastering Julia - Second Edition
The simplest character-based variables consist of ASCII and Unicode characters.
A single character is delimited by single quotes, whereas a string uses double quotes or, in some cases, triple double quotes ("""), which are discussed in this section.
A string can be viewed as a one-dimensional array of characters and can be indexed and manipulated in a similar fashion as an array of numeric values:
julia> s = "Hi there, Blue Eyes!"
"Hi there, Blue Eyes!"
julia> length(s)
20
julia> s[11]
'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
julia> s[end]
'!': ASCII/Unicode U+0021 (category Po: Punctuation, other)
Hint: try evaluating the following list comprehension: [s[i] for i = length(s):-1:1].
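The hint can be tried out directly. The following is a minimal sketch; the names reversed_chars and reversed are illustrative only. Indexing over 1:length(s) is safe here only because s is pure ASCII: Julia string indices are byte offsets, so for strings containing multi-byte characters the built-in reverse() is the safer route.

```julia
s = "Hi there, Blue Eyes!"

# Walk the indices backwards, collecting one Char per position.
# This relies on s being ASCII, so every character occupies one index.
reversed_chars = [s[i] for i = length(s):-1:1]

# Join the characters back into a String.
reversed = join(reversed_chars)

println(reversed)
println(reversed == reverse(s))   # the built-in reverse() works for any string
```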
Observe that Julia has a built-in Char type to represent a character.
A character occupies 32 bits, not 8, which is why it can hold a Unicode character. Have a look at the following example:
# All the following represent the ASCII character capital-A
julia> c = 'A';
julia> c = Char(65);
julia> c = '\U0041'
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
Julia supports Unicode code points, as we see here:
julia> c = '\Uc041'
'': Unicode U+c041 (category Lo: Letter, other)
As such, we can output characters from a variety of different alphabets, for example, Chinese:
julia> '\U7537'
'男': Unicode U+7537 (category Lo: Letter, other)
It is possible to specify a character code such as '\Uffff', but Char conversion does not check that every value is valid. However, Julia provides an isvalid() function that can be applied to characters:
julia> c = '\Udff3'; isvalid(c)
false
Julia uses the special C-like syntax for certain ASCII control characters such as '\b', '\t', '\n', '\r', and '\f' for backspace, tab, newline, carriage-return, and form-feed, respectively.
The backslash acts as an escape character, so Int('s') => 115, whereas Int('\t') => 9.
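The points made so far about the Char type can be checked in a few lines. This is a sketch only: the variable names are invented for illustration, and Char(0xdff3) is used in place of a character literal to build the surrogate code point.

```julia
c = 'A'

# Three ways of writing the same character, as in the text.
same = (c == Char(65)) && (c == '\U0041')

# Converting a Char to an integer yields its code point.
code_s   = Int('s')    # the plain letter s
code_tab = Int('\t')   # the escaped tab character

# A Char is a 32-bit value, so sizeof reports 4 bytes.
char_bytes = sizeof(Char)

# Char() does not reject surrogate code points, but isvalid() reports them.
surrogate_ok = isvalid(Char(0xdff3))
latin_ok     = isvalid('A')

println((same, code_s, code_tab, char_bytes, surrogate_ok, latin_ok))
```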
If more than one character is supplied between the single quotes, this raises an error:
julia> 'Hello'
ERROR: syntax: character literal contains multiple characters
The type of string we are most familiar with comprises a list of ASCII characters that, as we have observed, are normally delimited with double quotes, as in the following example:
julia> s = "Hello there, Blue Eyes";
julia> typeof(s)
String
The following points are worth noting:
- String is a subtype of the AbstractString abstract type, so when defining a function expecting a string argument, you should declare the type as AbstractString in order to accept any string type
- A transcode() function can be used to convert to/from other Unicode encodings:
julia> s = "αβγ";
julia> transcode(UInt16, s)
3-element Vector{UInt16}:
 0x03b1
 0x03b2
 0x03b3
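Both points can be illustrated with a short sketch. The shout() function is a hypothetical helper written for this example; it is declared on AbstractString so that it accepts a SubString as well as a String. The second half converts to UTF-16 and back again.

```julia
# Hypothetical helper: accepts any string type, not just String.
shout(str::AbstractString) = uppercase(str) * "!"

whole = "hello there"
part  = SubString(whole, 1, 5)    # a SubString{String}, not a String

loud_whole = shout(whole)
loud_part  = shout(part)

# Round trip through UTF-16 code units and back to a UTF-8 String.
greek = "αβγ"
utf16 = transcode(UInt16, greek)
back  = transcode(String, utf16)

println((loud_whole, loud_part, utf16, back))
```

Had shout() been declared with str::String, the call on part would raise a MethodError.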
In Julia (as in Java), strings are immutable—that is, the value of a String object cannot be changed. To construct a different string value, you construct a new string from parts of other strings. Let’s look at this in more detail:
- With s as defined previously, s[14:17] # => "Blue".
- A step can be supplied: s[14:2:17] => "Bu", or reverse the slice with s[17:-1:14] => "eulB".
- s[14:end] => "Blue Eyes". The end keyword must be written explicitly; an open-ended range such as s[14:] is a syntax error in Julia.
- s[:14] is somewhat unexpected and gives the character 'B', not the string up to and including B. This is because ':' defines a "symbol", and for a literal, :14 is equivalent to 14, so s[:14] is the same as s[14] and not s[1:14].
- s[end] is equal to the 's' character.
- Strings allow for special characters such as \n, \t, and so on.
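The slicing behaviour listed above, together with immutability, can be sketched as follows. The variable names are illustrative; the try/catch block is only there to show that assigning to a string index fails rather than modifying the string.

```julia
s = "Hello there, Blue Eyes"

word     = s[14:17]       # unit range
stepped  = s[14:2:17]     # every second character in the range
backward = s[17:-1:14]    # a negative step reverses the slice
tail     = s[14:end]      # end must be written explicitly
single   = s[:14]         # :14 is just the literal 14, so this is a Char
last_ch  = s[end]

# Strings are immutable: assigning to an index throws an error.
mutated = try
    s[1] = 'J'
    true
catch err
    false
end

# Build a new string from parts instead.
jello = "J" * s[2:end]

println((word, stepped, backward, tail, single, last_ch, mutated, jello))
```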
If we wish to include the double quotes, we can escape them, but Julia provides a """ delimiter.
So, s = "This is the double quote \" character" and s = """This is the double quote " character""" are equivalent:
julia> s = "This is a double quote \" character."; println(s);
This is a double quote " character.
Strings also provide the "$" convention (string interpolation) for displaying the value of a variable:
julia> age = 21; s = "I've been $age for many years now!"
"I've been 21 for many years now!"
Concatenation of strings can be done using the $ convention, but Julia also uses the '*' operator (rather than '+' or some other symbol):
julia> s = "Who are you?";
julia> t = " said the Caterpillar.";
julia> s*t   # or, equivalently, "$s$t"
"Who are you? said the Caterpillar."
Note
Here’s how a Unicode string can be formed by concatenating a series of characters:
julia> '\U7537'*'\U4EBA'
"男人"
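The quoting, interpolation, and concatenation rules described in this section can be gathered into one sketch. The variable names and the "Next year" sentence are invented for illustration; string(s, t) is shown as a third way of joining strings alongside the two mentioned in the text.

```julia
# Escaped quote versus triple-quoted literal: both give the same string.
escaped = "This is the double quote \" character"
triple  = """This is the double quote " character"""

# Interpolation of a variable and of a parenthesised expression.
age      = 21
msg_now  = "I've been $age for many years now!"
msg_next = "Next year I'll be $(age + 1)."

# Three equivalent ways to concatenate.
s = "Who are you?"
t = " said the Caterpillar."
joined = s * t
same = (joined == "$s$t") && (joined == string(s, t))

# Characters can be concatenated into a string as well.
man = '\U7537' * '\U4EBA'

println((escaped == triple, msg_now, msg_next, joined, same, man))
```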
Regular expressions (regexes) came to prominence with their inclusion in Perl programming.
There is an old Perl programmer’s adage: “I had a problem and decided to solve it using regular expressions; now, I have two problems.”
Regexes are used for pattern matching; numerous books have been written on them, and support is available in a variety of programming languages post-Perl, notably Java and Python. Julia supports regexes via a special form of string prefixed with r.
Suppose we define an empat pattern as follows:
julia> empat = r"^\S+@\S+\.\S+$";
julia> typeof(empat)
Regex
The following example will give a clue to what the pattern is associated with:
julia> occursin(empat, "[email protected]")
true
julia> occursin(empat, "Fredrick [email protected]")
false
The pattern is for a valid (simple) email address. In the second case, the space in "Fredrick Flintstone" makes the address invalid, so the match fails.
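As a minimal sketch of how such a pattern might be used, the snippet below wraps it in a small predicate. The function name looks_like_email and the example.com addresses are made up for illustration and do not come from the text.

```julia
empat = r"^\S+@\S+\.\S+$"

# Hypothetical helper wrapping the pattern from the text.
looks_like_email(str::AbstractString) = occursin(empat, str)

# The addresses below are invented for illustration.
good      = looks_like_email("fred@example.com")
has_space = looks_like_email("fred flintstone@example.com")
no_dot    = looks_like_email("fred@example")

println((good, has_space, no_dot))
```

The pattern is deliberately simple: it only requires non-space text, an @, more non-space text, a dot, and further non-space text, so it accepts many strings that are not deliverable addresses.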
Since we may wish to know not only whether a string matches a certain pattern but also how it is matched, Julia has a match() function:
julia> m = match(r"@bedrock","barney,[email protected]")
RegexMatch("@bedrock")
If this matches, the function returns a RegexMatch object; otherwise, it returns nothing:
julia> m.match
"@bedrock"
julia> m.offset
14
julia> m.captures
0-element Array{Union{Nothing,SubString{String}},1}
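The captures field is empty above because the pattern has no parenthesised groups. The sketch below uses a made-up input string and a pattern with two groups to show captures being filled in, and also shows the check for a failed match; all names are illustrative.

```julia
# Made-up input; the parentheses in the pattern define two capture groups.
m = match(r"@(\w+)\.(\w+)", "contact: fred@example.com")

matched  = m.match       # the text that matched the whole pattern
position = m.offset      # index at which the match starts
groups   = m.captures    # one entry per capture group

# When nothing matches, match() returns nothing, so test before using it.
miss  = match(r"@bedrock", "no at-sign here")
found = miss !== nothing

println((matched, position, groups, found))
```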
A detailed discussion of regexes is beyond the scope of this book.
The following link provides a good online source for all things regex, including an excellent cheat sheet via the Quick Reference page: https://www.rexegg.com.
In addition, there are a number of books on the subject, and a free PDF can be downloaded from the following link:
https://www.academia.edu/22080976/Regular_expressions_cookbook_2nd_edition.
Version numbers can be expressed with non-standard string literals as v"...".
These literals create VersionNumber objects that follow the specifications of “semantic versioning” and therefore are composed of major, minor, and patch numeric values, followed by pre-release and build alpha-numeric annotations.
So, a full specification typically would be v"1.9.1-rc1", where the major version is 1, the minor version is 9, the patch level is 1, and the pre-release annotation is rc1 (release candidate 1).
Currently, only the major version needs to be provided, and the others will assume default values; for example, v"1" is equivalent to v"1.0.0".
(The release candidate has no default, so needs to be explicitly defined.)
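The fields of a VersionNumber can be inspected directly, and version numbers can be compared. The following is a small sketch with illustrative variable names; VERSION is the built-in constant holding the version of the running Julia.

```julia
v = v"1.9.1-rc1"

parts = (v.major, v.minor, v.patch)
pre   = v.prerelease            # a tuple holding the pre-release annotation

# Omitted components default to zero.
short_equal = v"1" == v"1.0.0"

# A pre-release sorts before the corresponding final release.
rc_before_final = v"1.9.1-rc1" < v"1.9.1"

println((parts, pre, short_equal, rc_before_final, VERSION >= v"1.0"))
```

Comparisons of this kind are commonly used to guard code that depends on a particular Julia release.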
Another special form is the b"..." byte array literal, which permits string notation to express arrays of UInt8 values.
These are the rules for byte array literals:
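- ASCII characters and ASCII escapes produce a single byte
- \x and octal escape sequences produce a byte corresponding to the escape value
- Unicode escape sequences produce a sequence of bytes encoding that code point in UTF-8
Consider the following two examples: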
julia> A = b"HEX:\xefcc"
7-element Base.CodeUnits{UInt8,String}:
 [0x48,0x45,0x58,0x3a,0xef,0x63,0x63]
julia> B = b"\u2200 x \u2203 y"
11-element Base.CodeUnits{UInt8,String}:
 0xe2 0x88 0x80 0x20 0x78 0x20 0xe2 0x88 0x83 0x20 0x79
In B, the first three elements are the UTF-8 encoding of \u2200, then 0x20, 0x78, 0x20 correspond to <space>x<space>, followed by three more elements for \u2203, and finally 0x20, 0x79, which represent <space>y.
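The two literals can be examined a little further. In this sketch (variable names are illustrative), the bytes of B are copied into a Vector{UInt8} and wrapped in a String to recover the text they encode, and B is compared with codeunits() applied to the equivalent string.

```julia
A = b"HEX:\xefcc"
B = b"\u2200 x \u2203 y"

# \xef consumes two hex digits; the trailing "cc" are ordinary ASCII letters.
a_len  = length(A)
a_byte = A[5]

# B holds UTF-8 bytes, so copying them into a Vector{UInt8} and wrapping
# that in a String recovers the original text.
b_len  = length(B)
b_text = String(Vector{UInt8}(B))

# A byte array literal has the same contents as codeunits() of the string.
same_bytes = B == codeunits("∀ x ∃ y")

println((a_len, a_byte, b_len, b_text, same_bytes))
```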