Book Image

Polished Ruby Programming

By : Jeremy Evans
Book Image

Polished Ruby Programming

By: Jeremy Evans

Overview of this book

Anyone striving to become an expert Ruby programmer needs to be able to write maintainable applications. Polished Ruby Programming will help you get better at designing scalable and robust Ruby programs, so that no matter how big the codebase grows, maintaining it will be a breeze. This book takes you on a journey through implementation approaches for many common programming situations, the trade-offs inherent in each approach, and why you may choose to use different approaches in different situations. You'll start by refreshing Ruby fundamentals, such as correctly using core classes, class and method design, variable usage, error handling, and code formatting. Then you'll move on to higher-level programming principles, such as library design, use of metaprogramming and domain-specific languages, and refactoring. Finally, you'll learn principles specific to web application development, such as how to choose a database and web framework, and how to use advanced security features. By the end of this Ruby programming book, you’ll be a well rounded web developer with a deep understanding of Ruby. While most code examples and principles discussed in the book apply to all Ruby versions, some examples and principles are specific to Ruby 3.0, the latest release at the time of publication.
Table of Contents (23 chapters)
1
Section 1: Fundamental Ruby Programming Principles
8
Section 2: Ruby Library Programming Principles
17
Section 3: Ruby Web Programming Principles

Understanding how symbols differ from strings

One of the most useful but misunderstood aspects of Ruby is the difference between symbols and strings. One reason for this is there are certain methods of Ruby that deal with symbols, but will still accept strings, or perform string-like operations on a symbol. Another reason is due to the popularity of Rails and its pervasive use of ActiveSupport::HashWithIndifferentAccess, which allows you to use either a string or a symbol for accessing the same data. However, symbols and strings are very different internally, and serve completely different purposes. However, Ruby is focused on programmer happiness and productivity, so it will often automatically convert a string to a symbol if it needs a symbol, or a symbol to a string if it needs a string.

A string in Ruby is a series of characters or bytes, useful for storing text or binary data. Unless the string is frozen, you append to it, modify existing characters in it, or replace it with a different string.

A symbol in Ruby is a number with an attached identifier that is a series of characters or bytes. Symbols in Ruby are an object wrapper for an internal type that Ruby calls ID, which is an integer type. When you use a symbol in Ruby code, Ruby looks up the number associated with that identifier. The reason for having an ID type internally is that it is much faster for computers to deal with integers instead of a series of characters or bytes. Ruby uses ID values to reference local variables, instance variables, class variables, constants, and method names.

Say you run Ruby code as follows:

foo.add(bar)

Ruby will parse this code, and for foo, add, and bar, it will look up whether it already has an ID associated with the identifier. If it already has an ID, it will use it; otherwise, it will create a new ID value and associate it with the identifier. This happens during parsing and the ID values are hardcoded into the VM instructions.

Say you run Ruby code as follows:

method = :add
foo.send(method, bar)

Ruby will parse this code, and for method, add, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, or create a new ID value to associate with the identifier if it does not exist. This approach is slightly slower as Ruby will create a local variable and there is additional indirection as send has to look up the method to call dynamically. However, there are no calls at runtime to look up an ID value.

Say you run Ruby code as follows:

method = "add"
foo.send(method, bar)

Ruby will parse this code, and for method, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, also creating the ID if it doesn't exist. However, during parsing, Ruby does not create an ID value for add because it is a string and not a symbol. However, when send is called at runtime, method is a string value, and send needs a symbol. So, Ruby will dynamically look up and see whether there is an ID associated with the add identifier, raising a NoMethodError if it does not exist. This ID lookup will happen every time the send method is called, making this code even slower.

So, while it looks like symbols and strings are as interchangable as the method argument to send, this is only because Ruby tries to be friendly to the programmer and accept either. The send method needs to work with an ID, and it is better for performance to use a symbol, which is Ruby's representation of an ID, as opposed to a string, which Ruby must perform substantial work on to convert to an ID.

This not only affects Kernel#send but also affects most similar methods where identifiers are passed dynamically, such as Module#define_method, Kernel#instance_variable_get, and Module#const_get. The general principle when using these methods in Ruby code is always to pass symbols to them, since it results in better performance.

The previous examples show that when Ruby needs a symbol, it will often accept a string and convert it for the programmer's convenience. This allows strings to be treated as symbols in certain cases. There are opposite cases, where Ruby allows symbols to be treated as strings for the programmer's convenience.

For example, while symbols represent integers attached to a series of characters or bytes, Ruby allows you to perform operations on symbols such as <, >, and <=>, as if they were strings, where the result does not depend on the symbol's integer value, but on the string value of the name attached to the symbol. Again, this is Ruby doing so for the programmer's convenience. For example, consider the following line of code:

object.methods.sort

This results in a list sorted by the name of the method, since that is the most useful for the programmer. In this case, Ruby needs to operate on the string value of the symbol, which has similar performance issues as when Ruby needs to convert a string to a symbol internally.

There are many other methods on Symbol that operate on the internal string associated with the symbol. Some methods, such as downcase, upcase, and capitalize, return a symbol by internally operating on the string associated with the symbol, and then converting the resulting value back to a symbol. For example, symbol.downcase basically does symbol.to_s.downcase.to_sym. Other methods, such as [], size, and match, operate on the string associated with the symbol, such as symbol.size being shorthand for symbol.to_s.size.

In all of these cases, it is possible to determine what Ruby natively wants. If Ruby needs an internal identifier, it will natively want a symbol, and only accept a string by converting it. If Ruby needs to operate on text, it will natively want a string, and only accept a symbol by converting it.

So, how does the difference between a symbol and string affect your code? The general principle is to be like Ruby, and use symbols when you need an identifier in your code, and strings when you need text or data. For example, if you need to accept a configuration value that can only be one of three options, it's probably best to use a symbol:

def switch(value)
  case value
  when :foo
    # foo
  when :bar
    # bar
  when :baz
    # baz
  end
end

However, if you are dealing with text or data, you should accept a string and not a symbol:

def append2(value)
  value.gsub(/foo/, "bar")
end

You should consider whether you want to be as flexible as many Ruby core methods, and automatically convert a string to a symbol or vice versa. If you are internally treating symbols and strings differently, you should definitely not perform automatic conversion. However, if you are only dealing with one of the types, then you have to decide how to handle it. Automatically converting the type is worse for performance, and results in less flexible internals, since you need to keep supporting both types for backward compatibility. Not automatically converting the type is better for performance, and results in more flexible internals, since you are not obligated to support both types. However, it means that users of your code will probably get errors if they pass in a type that is not expected. Therefore, it is important to understand the trade-off inherent in the decision of whether to convert both types. If you aren't sure which trade-off is better, start by not automatically converting, since you can always add automatic conversion later if needed.

In this section, you learned the important difference between symbols and strings, and when it is best to use each. In the next section, you'll learn how best to use Ruby's core collection classes.