Redis Stack for Application Modernization

By: Luigi Fugaro, Mirko Ortensi

Overview of this book

In modern applications, efficiency in both operational and analytical aspects is paramount, demanding predictable performance across varied workloads. This book introduces you to Redis Stack, an extension of Redis, and guides you through its broad data modeling capabilities. With practical examples of real-time queries and searches, you’ll explore Redis Stack’s new approach to providing a rich data modeling experience all within the same database server. You’ll learn how to model and search your data in the JSON and hash data types and work with features such as vector similarity search, which adds semantic search capabilities to your applications to search for similar texts, images, or audio files. The book also shows you how to use the probabilistic Bloom filters to efficiently resolve recurrent big data problems. As you uncover the strengths of Redis Stack as a data platform, you’ll explore use cases for managing database events and leveraging stream processing features. Finally, you’ll see how Redis Stack seamlessly integrates into microservices architectures, completing the picture. By the end of this book, you’ll be equipped with best practices for administering and managing the server, ensuring scalability, high availability, data integrity, stored functions, and more.
Table of Contents (18 chapters)
Part 1: Introduction to Redis Stack
Part 2: Data Modeling
Part 3: From Development to Production

From key-value to multi-model real-time databases

The core data structures that are available out of the box in the Redis server solve a variety of problems when it comes to mapping entities and relationships. To start with concrete examples of modeling using Redis, the usual option to store an object is the Hash data structure, while collections can be stored using Sets, Sorted Sets, or Lists (among other options because a collection can be modeled in several other ways). In this section, we will introduce the multi-model features of Redis Stack using a comprehensive approach, which may be useful for those who are used to storing data using the relational paradigm, which implies organizing the data in rows and columns of a table.

Consider the requirement to model a list of cities. Using the relational data model, we can define a table using the SQL data definition language (DDL) instruction CREATE TABLE as follows:

CREATE TABLE `city` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`)
)

This table definition defines attributes for the city entity and specifies a primary key on an integer identifier (a surrogate key in this case, because none of the attributes of the city entity is guaranteed to be unique). The DDL command also defines an index on the CountryCode attribute. Data encoding, collation, and the specific technology adopted as the storage engine are not relevant in this context. We are focused on understanding the model and the ability that we have to query it.

Primary key lookup

Primary key lookup is the most efficient way to access data in a relational table. Filtering the table on the primary key attribute is as easy as executing the SQL SELECT statement:

SELECT * FROM city WHERE ID=653;
+-----+--------+-------------+----------+------------+
| ID  | Name   | CountryCode | District | Population |
+-----+--------+-------------+----------+------------+
| 653 | Madrid | ESP         | Madrid   |    2879052 |
+-----+--------+-------------+----------+------------+
1 row in set (0.00 sec)

Modeling a city using one of the Redis core data structures leads to mapping the data in the SQL table to Hashes, so we can store the attributes as field-value pairs, with the key name including the primary key:

127.0.0.1:6379> HSET city:653 Name "Madrid" CountryCode "ESP" District "Madrid" Population 2879052

The HGETALL command can be used to retrieve the entire hash with minimal overhead (HGETALL has direct access to the value in the Redis keyspace):

HGETALL city:653
1) "Name"
2) "Madrid"
3) "CountryCode"
4) "ESP"
5) "District"
6) "Madrid"
7) "Population"
8) "2879052"

In addition, we can limit the bandwidth usage caused by the entire row transfer to the client and select only specific attributes. The SQL syntax is as follows:

SELECT Name, Population FROM city WHERE ID=653;
+--------+------------+
| Name   | Population |
+--------+------------+
| Madrid |    2879052 |
+--------+------------+
1 row in set (0.00 sec)

In this analogy between the relational model and Redis, the command is HGET (or HMGET for multiple values):

127.0.0.1:6379> HMGET city:653 Name Population
1) "Madrid"
2) "2879052"
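The mapping from a table row to a Hash can be sketched in a few lines of Python. This is only an illustration: the row_to_hash helper is a hypothetical name, not part of any library. With the redis-py client, the result could then be written with client.hset(key, mapping=fields).

```python
def row_to_hash(table, row, pk="ID"):
    """Map a relational row to a Redis key name and Hash fields.

    The primary key becomes part of the key name (for example "city:653");
    every other column becomes a field-value pair of the Hash.
    """
    key = f"{table}:{row[pk]}"
    fields = {column: value for column, value in row.items() if column != pk}
    return key, fields


madrid = {"ID": 653, "Name": "Madrid", "CountryCode": "ESP",
          "District": "Madrid", "Population": 2879052}
key, fields = row_to_hash("city", madrid)
print(key)     # city:653
print(fields)  # the four remaining columns as field-value pairs
```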

As long as we only need to retrieve data by its primary key identifier, the solution is at hand in both the relational database and Redis. Things get more complicated if we want to perform lookup and search queries on the dataset. In the next examples, we’ll see how the complexity and performance of such operations may vary substantially.

Secondary key lookup

Primary key lookups are efficient: after all, the primary key is an index, and it guarantees direct access to the table row. But what if we want to search for cities by filtering on an attribute? Let’s try an indexed search against our relational database over the CountryCode column, which has a secondary index:

mysql> SELECT Name FROM city WHERE CountryCode = "ESP";
+--------------------------------+
| Name                           |
+--------------------------------+
| Madrid                         |
| Barcelona                      |
| [...]                          |
+--------------------------------+
59 rows in set (0.02 sec)

This is an efficient search because the table defines an index on the CountryCode column. To continue the comparison of the relational database versus Redis, we will need to execute the same query against the stored Hashes. For this demonstration, we will assume that we have migrated the city table to Hashes in the Redis server. By design, Redis has no secondary indexing feature for any of the core data structures, which means that we should scan all the Hashes prefixed by the “city:” namespace, then read the country code from every Hash and check whether it matches our search term. The following example performs a non-blocking scan of the keyspace, filtering on the key name (“city:*”) in batches of configurable size (three, in the example; the COUNT option is a hint, so a batch may contain more or fewer keys):

127.0.0.1:6379> SCAN 0 MATCH city:* COUNT 3
1) "512"
2) 1) "city:4019"
   2) "city:9"
   3) "city:103"

The client should now extract the CountryCode value from every city, compare it to the search term, and repeat until the scan is concluded. This is obviously a time-consuming and expensive approach. There are ways to improve the efficiency of such batched operations. We will explore three standard options and then show how to resolve the problem using the Redis Stack capabilities:

  • Pipelining
  • Using functions
  • Using indexes
  • Redis Stack capabilities

We will look at these in detail next.
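Before looking at the optimizations, the naive client-side loop described above can be sketched in Python. The sketch assumes a client object that behaves like a redis-py client created with decode_responses=True (scan_iter and hmget are redis-py methods); the function name cities_by_country is hypothetical.

```python
def cities_by_country(client, country_code, batch=100):
    """Client-side search: scan the keyspace and filter on CountryCode.

    `client` is assumed to behave like a redis-py client. Every key costs
    one extra roundtrip (the HMGET), which is what makes this approach slow.
    """
    matches = []
    # scan_iter hides the cursor handling of SCAN ... MATCH ... COUNT.
    for key in client.scan_iter(match="city:*", count=batch):
        name, ccode = client.hmget(key, ["Name", "CountryCode"])
        if ccode == country_code:
            matches.append(name)
    return matches
```

With a real server, this would be called as cities_by_country(redis.Redis(decode_responses=True), "ESP").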

Pipelining

The first approach to reducing the overhead of the search operation is to use pipelining, which is supported by all major client libraries. With pipelining, the client sends a batch of commands to the server without waiting for each individual reply, and then reads all the replies together before returning the results to the application. This option dramatically reduces the latency of the overall operation, as it saves on the roundtrip time to the server (an analogy that works is going to the supermarket once to purchase 30 items rather than going 30 times and purchasing one item on every visit). The pros and cons of pipelining are as follows:

  • Pros: Saves on roundtrip time and does not block the server, as the server executes a batch of commands and returns the results to the client. Therefore, it increases overall system throughput. Pipelining is especially useful when batching operations.
  • Cons: The complexity of the operation is proportional to the number and complexity of the operations in the pipeline that are executed by the server. This may increase the memory usage on the server, as it keeps the intermediate results in memory until all commands in the pipeline are processed. The client manages multiple responses, which adds complexity to its business logic, especially when it has to deal with errors of some operations in the pipeline.
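As a minimal sketch of pipelining applied to the country search, the following Python function sends the HMGET commands in batches instead of one by one. It assumes a redis-py style client (scan_iter, pipeline, and the pipeline's hmget and execute are redis-py methods); the function name and the batch size are illustrative choices.

```python
def cities_by_country_pipelined(client, country_code, batch=100):
    """Search by country code, reading the Hashes in pipelined batches."""
    keys = list(client.scan_iter(match="city:*", count=batch))
    matches = []
    for start in range(0, len(keys), batch):
        # transaction=False: plain pipelining, no MULTI/EXEC wrapper.
        pipe = client.pipeline(transaction=False)
        for key in keys[start:start + batch]:
            pipe.hmget(key, ["Name", "CountryCode"])
        # A single roundtrip returns one reply per queued command.
        for name, ccode in pipe.execute():
            if ccode == country_code:
                matches.append(name)
    return matches
```

The batch size bounds how many replies the server buffers at once, which is the memory trade-off mentioned in the cons above.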

Using functions

Lua scripting and functions (functions were introduced in Redis 7.0 and represent an evolution of Lua scripting for remote server execution) help to offload the client and remove network latency. The search is local to the server and close to the data (equivalent to the concept of stored procedures). The following function is an example of local search:

#!lua name=mylib

-- Return the names of all cities whose CountryCode equals args[1].
local function city_by_cc(keys, args)
   local match, cursor = {}, "0"
   repeat
      -- Fetch the next batch of key names matching the "city:*" pattern.
      local ret = redis.call("SCAN", cursor, "MATCH", "city:*", "COUNT", 100)
      local cities = ret[2]
      for i = 1, #cities do
         local keyname = cities[i]
         -- fields[1] is the Name, fields[2] is the CountryCode.
         local fields = redis.call("HMGET", keyname, "Name", "CountryCode")
         if fields[2] == args[1] then
            match[#match + 1] = fields[1]
         end
      end
      cursor = ret[1]
   until cursor == "0"   -- a cursor of "0" means the scan is complete
   return match
end

redis.register_function('city_by_cc', city_by_cc)

In this function, we do the following:

  1. We perform a scan of the entire keyspace, filtering by the “city:*” prefix, which means that we will iterate through all the keys in the Redis server database.
  2. For every key returned by the SCAN command, we retrieve the name and CountryCode of the city using the HMGET command.
  3. If CountryCode matches our search filter, we add the city to an output array.
  4. When the scan is completed, we return the array to the client.

Type the code into the mylib.lua file and import the library as follows:

cat mylib.lua | redis-cli -x FUNCTION LOAD

The function can be invoked using the following command:

127.0.0.1:6379> FCALL city_by_cc 0 "ESP"
 1) "A Coru\xf1a (La Coru\xf1a)"
 2) "Almer\xeda"
[...]
59) "Barakaldo"

The pros and cons of using functions are as follows:

  • Pros: The operation is executed on the server, and the client does not experience any overhead.
  • Cons: The complexity of the operation is linear, and the function (like any other Lua script or function) blocks the server. Any other concurrent operation must wait until the execution of the function is completed. Long scans make the server appear stuck to other clients.

Using indexes

Data scans, wherever they are executed (client or server side), are slow and ineffective in satisfying real-time requirements. This is especially true when the keyspace stores millions of keys or more. An alternative approach for search operations using the Redis core data structures is to create a secondary index. There are many options to do this using Redis collections. As an example, we can create an index of Spanish cities using a Set as follows:

SADD city:esp "Sevilla" "Madrid" "Barcelona" "Valencia" "Bilbao" "Las Palmas de Gran Canaria"

This data structure has interesting properties for our needs. We can retrieve all the Spanish cities in a single command:

127.0.0.1:6379> SMEMBERS city:esp
1) "Madrid"
2) "Sevilla"
3) "Valencia"
4) "Barcelona"
5) "Bilbao"
6) "Las Palmas de Gran Canaria"

Or we can check whether a specific city is in Spain using SISMEMBER, a constant time-complexity command:

127.0.0.1:6379> SISMEMBER city:esp "Madrid"
(integer) 1

And we can even search the index for cities having a name that matches a pattern:

127.0.0.1:6379> SSCAN city:esp 0 MATCH B*
1) "0"
2) 1) "Barcelona"
   2) "Bilbao"

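We can refine our search requirements and design an index that considers the population. In such a case, we could use a Sorted Set and set the population as the score (the city:esp key created earlier must be deleted first, or a different key name used, because it holds a Set and a key can hold only one data type):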

127.0.0.1:6379> ZADD city:esp 2879052 "Madrid" 701927 "Sevilla" 1503451 "Barcelona" 739412 "Valencia" 357589 "Bilbao" 354757 "Las Palmas de Gran Canaria"
(integer) 6

The main feature of the Sorted Set data structure is that its members are stored in an ordered tree-like structure (Redis uses a skiplist data structure), and with that, it is possible to execute low-complexity range searches. As an example, let’s retrieve Spanish cities with more than 2 million inhabitants:

127.0.0.1:6379> ZRANGE city:esp 2000000 +inf BYSCORE
1) "Madrid"

We can also check whether a city belongs to the index of Spanish cities:

127.0.0.1:6379> ZRANK city:esp Madrid
(integer) 5

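In the previous example, the ZRANK command informs us that the city Madrid belongs to the index and returns its zero-based rank in ascending score order: a rank of 5 means that Madrid is the last of the six members, that is, the city with the highest population. This solution resolves the overhead caused by having to scan the entire keyspace looking for matches.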
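To see why an ordered index makes range searches cheap, here is a rough Python model of the Sorted Set above. It is a sketch only: it keeps the members in a sorted list and uses binary search, whereas Redis uses a skiplist combined with a hash table, and the linear rank lookup below is a simplification. The populations are the ones used in the ZADD example.

```python
import bisect

# (score, member) pairs kept sorted by score, like the Sorted Set above.
index = sorted([
    (2879052, "Madrid"), (701927, "Sevilla"), (1503451, "Barcelona"),
    (739412, "Valencia"), (357589, "Bilbao"),
    (354757, "Las Palmas de Gran Canaria"),
])
scores = [score for score, _ in index]


def range_by_score(low, high=float("inf")):
    """Members with low <= score <= high, found with two binary searches."""
    start = bisect.bisect_left(scores, low)
    end = bisect.bisect_right(scores, high)
    return [member for _, member in index[start:end]]


def rank(member):
    """Zero-based position in ascending score order, or None if absent."""
    for position, (_, name) in enumerate(index):
        if name == member:
            return position
    return None


print(range_by_score(2000000))  # ['Madrid']
print(rank("Madrid"))           # 5
```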

The drawback of such a manual approach to indexing the data is that indexes need to reflect the data at all times. When we want to add or remove a city from our database, we need to change both the city Hash and the index atomically; for a removal, this means deleting the city Hash and removing the city from the index. We can use a Redis transaction to perform atomic changes on both the data and the index:

127.0.0.1:6379> MULTI
OK
127.0.0.1:6379(TX)> DEL city:653
QUEUED
127.0.0.1:6379(TX)> ZREM city:esp "Madrid"
QUEUED
127.0.0.1:6379(TX)> EXEC
1) (integer) 1
2) (integer) 1

Custom secondary indexes come at a price, though, because complex searches become hard to manage using multiple data structures. Indexes must be maintained, and the complexity of such solutions may get out of hand, putting the consistency of search operations at risk. The pros and cons of using indexing are as follows:

  • Pros: Simple and fast search operations are possible using Redis core data structures to create a secondary index
  • Cons: The secondary index needs to be maintained, and search operations on multiple fields (what is called a composite index in relational databases) are not immediate and need thoughtful planning, implementation, and maintenance
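From application code, the same transactional removal could look like the following sketch. It assumes a redis-py style client, where pipeline(transaction=True) wraps the queued commands in MULTI and EXEC; the remove_city helper is a hypothetical name.

```python
def remove_city(client, city_id, index_key, name):
    """Delete a city Hash and its index entry in one MULTI/EXEC block.

    Returns True only if both the Hash and the index member existed.
    """
    pipe = client.pipeline(transaction=True)
    pipe.delete(f"city:{city_id}")   # DEL city:<id>
    pipe.zrem(index_key, name)       # ZREM <index> <name>
    deleted, removed = pipe.execute()
    return deleted == 1 and removed == 1
```

For the example above, the call would be remove_city(client, 653, "city:esp", "Madrid"). Every additional index on the same data adds one more command that must not be forgotten, which is the maintenance cost described in the cons.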

Next, we will examine the capabilities of Redis Stack.

Redis Stack capabilities

Caching is one of the frequent use cases for which Redis shines as the best-in-class storage solution. This is because it stores data in memory, and offers real-time performance. It is also lightweight, as data structures are optimized to consume little memory. Redis does not need any complex configuration or maintenance and it is open source, so there is no reason not to give it a try. As a real-time data storage, it seems plausible that complex search operations may not be the primary use case users are interested in when using Redis. After all, fast retrieval of data by key is what made Redis so versatile as a cache or as a session store.

However, if in addition to the ability to use core data structures to store the data, we ensure that fast searches can be performed (besides primary key lookup), it is possible to think beyond the basic caching use case and start looking at Redis as a full-fledged database, capable of high-speed searches.

So far, we have presented simple and common search problems, together with solutions based on the traditional SQL approach and on possible data modeling strategies using Redis core data structures. In the following sections, we will show how Redis Stack resolves query and search use cases and extends the core features of Redis with an integrated modeling and development experience. We will introduce the following capabilities:

  • Querying, indexing, and searching documents
  • Time series data modeling
  • Probabilistic data structures
  • Programmability

Let’s discuss each of these capabilities in detail.

Querying, indexing, and searching documents

Redis Stack complements Redis with the ability to create secondary indexes on Hashes or JSON documents, the two document types supported by Redis Stack. The search examples seen so far can be resolved with the indexing features. To perform an indexed search, we create an index against the hashes modeling the cities using the following syntax:

FT.CREATE city_idx
ON HASH
PREFIX 1 city:
SCHEMA Name AS name TEXT
CountryCode AS countrycode TAG SORTABLE
Population AS population NUMERIC SORTABLE

The FT.CREATE command instructs the server to perform the following operations:

  1. Create an index named city_idx for the selected fields of the Hash documents.
  2. Scan the keyspace and retrieve the documents whose key name is prefixed by the “city:” string.
  3. Index the selected fields of every matching Hash according to the field types declared in the SCHEMA clause. The field types defined in this example are the following:
    • TEXT, which enables full-text search on the Name field
    • TAG SORTABLE, which enables an exact-match search against the CountryCode field and enables high-performance sorting by the value of the attribute
    • NUMERIC SORTABLE, which enables range queries against the Population field and enables high-performance sorting by the value of the attribute

As soon as the indexing operation against the relevant data (all the keys prefixed by “city:”) is completed, we can execute the queries and searches seen so far, and more. The syntax in the following example executes a search of all the cities with the value “ESP” in the countrycode TAG field and returns only the names of the cities, sorted in lexicographical order. The LIMIT option restricts the output to the first three results. Note that this query is executed against the new city_idx index, and not directly against the data:

127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' RETURN 1 name SORTBY name LIMIT 0 3
1) (integer) 59
2) "city:670"
3) 1) "name"
   2) "A Coru\xc3\xb1a (La Coru\xc3\xb1a)"
4) "city:690"
5) 1) "name"
   2) "Albacete"
6) "city:687"
7) 1) "name"
   2) "Alcal\xc3\xa1 de Henares"

It is possible to combine several textual queries/filters in the same index. Using exact-match and full-text search, we can verify whether Madrid is a Spanish city:

127.0.0.1:6379> FT.SEARCH city_idx '@name:Madrid @countrycode:{ESP}' RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"

In a previous example, the range search was executed using the ZRANGE command against a Sorted Set. Using the indexing capability of Redis Stack, we can execute range searches using the NUMERIC field type. So, if we want to retrieve the Spanish cities with more than 2 million inhabitants, we will write the following search query:

127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
1) (integer) 1
2) "city:653"
3) 1) "name"
   2) "Madrid"

Redis Stack offers flexibility and concise syntax to combine several field types, of which we have seen only a limited but representative number of examples. Once the index is created, the user can go ahead and use it, and add new documents or update existing ones. The database keeps the indexes up to date synchronously as soon as documents are created or changed.
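The idea of an index that is updated synchronously on every write can be illustrated with a toy model in Python. This is not how Redis Stack is implemented; the CityStore class and its methods are hypothetical names used only to show why a search no longer needs to scan all the documents.

```python
from collections import defaultdict


class CityStore:
    """Toy document store that updates a tag index on every write."""

    def __init__(self):
        self.docs = {}                       # key -> document
        self.by_country = defaultdict(set)   # country code -> set of keys

    def put(self, key, doc):
        # Remove the stale index entry first if the document already exists.
        old = self.docs.get(key)
        if old is not None:
            self.by_country[old["CountryCode"]].discard(key)
        self.docs[key] = doc
        self.by_country[doc["CountryCode"]].add(key)

    def search(self, country_code, min_population=0):
        # Only the keys listed in the index are visited, never the whole store.
        keys = self.by_country.get(country_code, set())
        return sorted(self.docs[k]["Name"] for k in keys
                      if self.docs[k]["Population"] >= min_population)


store = CityStore()
store.put("city:653", {"Name": "Madrid", "CountryCode": "ESP", "Population": 2879052})
store.put("city:5", {"Name": "Amsterdam", "CountryCode": "NLD", "Population": 731200})
print(store.search("ESP", min_population=2000000))  # ['Madrid']
```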

Besides full-text, exact-match, and range searches, we can also perform data aggregation (as we would in a relational database using the GROUP BY statement). If we would like to retrieve the three most populated countries, sorted in descending order, we would solve the problem in SQL as follows:

SELECT CountryCode,
SUM(Population) AS sum
FROM city
GROUP BY CountryCode
ORDER BY sum DESC
LIMIT 3;
+-------------+-----------+
| CountryCode | sum       |
+-------------+-----------+
| CHN         | 175953614 |
| IND         | 123298526 |
| BRA         |  85876862 |
+-------------+-----------+
3 rows in set (0.01 sec)

We can perform complex aggregations with the FT.AGGREGATE command. Using the following command, we can perform a real-time search and aggregation to compute the total population of the top three countries by summing up the inhabitants of the cities per country:

127.0.0.1:6379> FT.AGGREGATE city_idx * GROUPBY 1 @countrycode REDUCE SUM 1 @population AS sum SORTBY 2 @sum DESC LIMIT 0 3
1) (integer) 232
2) 1) "countrycode"
   2) "chn"
   3) "sum"
   4) "175953614"
3) 1) "countrycode"
   2) "ind"
   3) "sum"
   4) "123298526"
4) 1) "countrycode"
   2) "bra"
   3) "sum"
   4) "85876862"
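The stages of this aggregation pipeline (GROUPBY, REDUCE SUM, SORTBY, and LIMIT) can be mirrored in plain Python to make the order of operations explicit. The sketch below runs on a handful of cities taken from the examples in this chapter, so its totals are not comparable with the full dataset; the top_countries function is a hypothetical name.

```python
from collections import defaultdict

# (name, country code, population) for a few cities used in this chapter.
cities = [
    ("Madrid", "ESP", 2879052), ("Barcelona", "ESP", 1503451),
    ("Amsterdam", "NLD", 731200), ("Tel Aviv-Jaffa", "ISR", 348100),
]


def top_countries(rows, limit=3):
    totals = defaultdict(int)
    for _name, country, population in rows:
        totals[country] += population              # GROUPBY + REDUCE SUM
    ranked = sorted(totals.items(),
                    key=lambda item: item[1],
                    reverse=True)                  # SORTBY @sum DESC
    return ranked[:limit]                          # LIMIT 0 3


print(top_countries(cities))
# [('ESP', 4382503), ('NLD', 731200), ('ISR', 348100)]
```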

To summarize this brief introduction to the search and aggregation capabilities, it is worth mentioning that there are further types of searches and features, such as phonetic matching, auto-completion suggestions, geo searches, and a spellchecker, that help design great applications. We will cover them in depth in Chapter 5, Redis Stack as a Document Store.

Besides modeling objects as Hash, it is possible to store, update, and retrieve JSON documents. The JSON format needs no introduction, as it permeates data pipelines including heterogeneous subsystems, protocols, databases, and so on. Redis Stack delivers this capability out of the box and manages JSON documents in a similar way to Hashes, which means that it is possible to store, index, and search JSON objects and work with them using JSONPath syntax:

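  1. To illustrate the syntax to store, search, and retrieve JSON data along the lines of the previous examples, let’s store city objects formatted as JSON (if the city Hashes from the previous examples are still in the database, delete them first, because a key can hold only one data type):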
    JSON.SET city:653 $ '{"Name":"Madrid", "CountryCode":"ESP", "District":"Madrid", "Population":2879052}'
    JSON.SET city:5 $ '{"Name":"Amsterdam", "CountryCode":"NLD", "District":"Noord-Holland", "Population":731200}'
    JSON.SET city:1451 $ '{"Name":"Tel Aviv-Jaffa", "CountryCode":"ISR", "District":"Tel Aviv", "Population":348100}'
  2. We don’t need anything else to start working with the JSON documents stored in Redis Stack. We can then perform basic retrieval operations on entire documents:
    127.0.0.1:6379> JSON.GET city:653
    "{\"Name\":\"Madrid\",\"CountryCode\":\"ESP\",\"District\":\"Madrid\",\"Population\":2879052}"
  3. We can also retrieve the desired property (or multiple properties at once) stored on a certain path, with fast access guaranteed, because the document is stored in a tree structure:
    127.0.0.1:6379> JSON.GET city:653 $.Name
    "[\"Madrid\"]"
    127.0.0.1:6379> JSON.GET city:653 $.Name $.CountryCode
    "{\"$.Name\":[\"Madrid\"],\"$.CountryCode\":[\"ESP\"]}"
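  4. As we have seen for Hash documents, we can index JSON documents using a similar syntax and perform search operations. The following command creates an index for all the JSON documents with the city: prefix in the database (if the city_idx index created earlier for the Hashes still exists, drop it first with FT.DROPINDEX or choose a different index name, because index names must be unique):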
    FT.CREATE city_idx ON JSON PREFIX 1 city: SCHEMA $.Name AS name TEXT $.CountryCode AS countrycode TAG SORTABLE $.Population AS population NUMERIC SORTABLE
  5. And using the FT.SEARCH command with an identical syntax as seen for the Hash documents, we can perform search operations:
    127.0.0.1:6379> FT.SEARCH city_idx '@countrycode:{ESP}' FILTER population 2000000 +inf RETURN 1 name
    1) (integer) 1
    2) "city:653"
    3) 1) "name"
       2) "Madrid"

Unlike Hash documents, JSON documents support nested levels (up to 128) and can store properties, objects, arrays, and geographical locations at any level in a tree-like structure, so the JSON format opens up a variety of use cases using a compact and flexible data structure.
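On the client side, the JSON.GET replies shown in the steps above are plain JSON strings, so they can be decoded with any JSON parser. The following Python sketch decodes the three reply shapes seen earlier; note that a JSONPath expression always yields a list of matches, even for a single value. The reply strings are copied from the examples above.

```python
import json

# Reply of JSON.GET city:653 (the whole document).
whole = json.loads('{"Name":"Madrid","CountryCode":"ESP",'
                   '"District":"Madrid","Population":2879052}')

# Reply of JSON.GET city:653 $.Name (one path: a list of matches).
single_path = json.loads('["Madrid"]')

# Reply of JSON.GET city:653 $.Name $.CountryCode (one list per path).
multi_path = json.loads('{"$.Name":["Madrid"],"$.CountryCode":["ESP"]}')

print(whole["Population"])            # 2879052
print(single_path[0])                 # Madrid
print(multi_path["$.CountryCode"][0]) # ESP
```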

Time series data modeling

Time series do not need a long introduction: they are data structures that store data points recorded at a certain time, indicated by a Unix timestamp expressed in milliseconds, with an associated numeric data value, typically with double precision. This data structure applies to many use cases, such as monitoring entities over time or tracking user activities for a given service. Redis Stack has an integrated time series database that offers many useful features to manage the data points, for querying and searching, and provides convenient formatting commands for data processing and visualization. Getting started with time series modeling is straightforward:

  1. We can create a time series from the command-line interface (or from any of the client libraries that support time series):
    TS.CREATE "app:monitor:temp"
  2. Storing samples into the time series can be done with the TS.ADD command. If we would like to store the temperature measured by the sensor of a meteorological station captured every few seconds, the commands would be as follows:
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632813307
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632818179
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632824174
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20.1"
    (integer) 1675632829519
    127.0.0.1:6379> "TS.ADD" "app:monitor:temp" "*" "20"
    (integer) 1675632835052
  3. We are instructing the database to insert the sample at the current time, so we specify the * argument. We can finally retrieve the samples stored in the time series for the desired interval:
    127.0.0.1:6379> "TS.RANGE" "app:monitor:temp" "1675632818179" "1675632829519"
    1) 1) (integer) 1675632818179
       2) 20
    2) 1) (integer) 1675632824174
       2) 20
    3) 1) (integer) 1675632829519
       2) 20.1

We have just scratched the surface of using time series with Redis Stack, because data may be aggregated, down-sampled, and indexed to address many different uses.
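As a minimal sketch of what a range query and a simple aggregation do, the following Python code models the time series above as a list of (timestamp, value) pairs sorted by time. It only illustrates the semantics of an inclusive range query and of an average; the ts_range and average helpers are hypothetical names and say nothing about how Redis Stack stores the samples.

```python
import bisect

# The five samples stored with TS.ADD above: (timestamp in ms, value).
samples = [
    (1675632813307, 20.0), (1675632818179, 20.0), (1675632824174, 20.0),
    (1675632829519, 20.1), (1675632835052, 20.0),
]
timestamps = [ts for ts, _ in samples]


def ts_range(start, end):
    """Samples with start <= timestamp <= end (both bounds inclusive)."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_right(timestamps, end)
    return samples[lo:hi]


def average(points):
    """Mean value of the given samples, or None if there are none."""
    if not points:
        return None
    return sum(value for _, value in points) / len(points)


window = ts_range(1675632818179, 1675632829519)
print(len(window))      # 3
print(average(window))  # roughly 20.03
```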

Probabilistic data structures

Deterministic data structures – all those structures that store and return the same data that was stored (such as Strings, Sets, Hashes, and the rest of Redis structures) – are a good solution for standard amounts of data, but they may become inadequate due to the constantly growing volumes of data that systems must handle. Redis offers several options to store and present data to extract different types of insights. Strings are an example because they can be encoded as integers and used as counters:

127.0.0.1:6379> INCR cnt
(integer) 1
127.0.0.1:6379> INCRBY cnt 3
(integer) 4

Strings can also be managed down to the bit level with the BITFIELD command, which stores multiple integer counters of variable length at different offsets of a single string to reduce storage overhead:

127.0.0.1:6379> BITFIELD cnt INCRBY i5 0 5
1) (integer) 5
127.0.0.1:6379> BITFIELD cnt INCRBY i5 0 5
1) (integer) 10
127.0.0.1:6379> BITFIELD cnt GET i5 0
1) (integer) 10
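The i5 type in these commands is a signed 5-bit integer, so the counter can only hold values from -16 to 15. The following Python sketch models the arithmetic of such a counter, assuming the wrap-around behavior that BITFIELD applies by default on overflow; the incr_signed helper is a hypothetical name.

```python
def incr_signed(value, delta, bits=5):
    """Add delta to a signed integer of the given width, wrapping on overflow."""
    span = 1 << bits   # 32 distinct values for 5 bits
    half = span >> 1   # the signed range is [-16, 15]
    return (value + delta + half) % span - half


counter = 0
counter = incr_signed(counter, 5)   # 5
counter = incr_signed(counter, 5)   # 10
print(counter)                      # 10
print(incr_signed(15, 5))           # -12: the value wrapped around
```

Choosing the narrowest width that fits the expected values is what saves memory, at the price of having to reason about overflow.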

Regular counters, sets, and hash tables return exact answers for any amount of data, but their memory requirements grow with the data, so handling large amounts of data makes it challenging to scale the resources of the machine where Redis Stack is running.

Deterministic data structures have given way to probabilistic data structures because of the need to scale up to large quantities of data and give a reasonably approximated answer to questions such as the following:

  • How many different pages has the user visited so far?
  • What are the top players with the highest score?
  • Has the user already seen this ad?
  • How many unique values have appeared so far in the data stream?
  • How many values in the data stream are smaller than a given value?

In the attempt to give an answer to the first question in the list, we could calculate the hash of the URL of the visited page and store it in a Redis collection, such as a Set, and then retrieve the cardinality of the structure using the SCARD command. While this solution works very well (and is deterministically exact), scaling it to many users and many visited pages represents a cost.

Let’s consider an example with a probabilistic data structure. HyperLogLog estimates the cardinality of a set using only a fraction of the memory and CPU that an exact count would require, at the cost of a small error in the result (the Redis implementation has a standard error of 0.81%), so you would count the visited pages and get an estimation as follows:

127.0.0.1:6379> PFADD pages "https://redis.com/" "https://redis.io/docs/stack/bloom/" "https://redis.io/docs/data-types/hyperloglogs/"
(integer) 1
127.0.0.1:6379> PFCOUNT pages
(integer) 3

Redis reports the following memory usage for HyperLogLog:

127.0.0.1:6379> MEMORY USAGE pages
(integer) 96

Attempting to resolve the same problem using a Set and storing the hashes for these URLs would be done as follows:

127.0.0.1:6379> SADD hashpages "522195171ed14f78e1f33f84a98f0de6" "f5518a82f8be40e2994fdca7f71e090d" "c4e78b8c136f6e1baf454b7192e89cd1"
(integer) 3
127.0.0.1:6379> MEMORY USAGE hashpages
(integer) 336

Probabilistic data structures trade accuracy for time and space efficiency, and they answer this and other questions by addressing several data analysis problems against large amounts of data efficiently.
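To make the trade-off concrete, here is a minimal Bloom filter sketch in Python, a structure suited to questions such as “Has the user already seen this ad?”. It is an illustration under simple assumptions (a fixed size of 1,024 bits and three hash positions derived from SHA-256), not the implementation used by Redis Stack, which exposes this structure through commands such as BF.ADD and BF.EXISTS.

```python
import hashlib


class BloomFilter:
    """Answers "possibly seen" or "definitely not seen" in fixed memory."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive one bit position per seed from a SHA-256 digest.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False is always correct; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen_ads = BloomFilter()
seen_ads.add("ad:1")
seen_ads.add("ad:2")
print(seen_ads.might_contain("ad:1"))  # True
```

The filter never stores the items themselves, which is why its memory footprint stays constant, and why a positive answer can only be “probably”.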

Programmability

Redis Stack embeds a serverless engine for event-driven data processing allowing users to write and run their own functions on data stored in Redis. The functions are implemented in JavaScript and executed by the engine upon user invocation or in response to events such as changes to data, execution of commands, or when events are added to a Redis Stream data structure. It is also possible to configure timed executions, so periodical maintenance operations can be scheduled.

Redis Stack minimizes the execution time by running the functions as close as possible to the data, improving data locality, minimizing network congestion, and increasing the overall throughput of the system.

With this capability, it is possible to implement event-driven data flows, thus opening the doors to many use cases. The following steps walk through a basic example:

  1. A basic library including a function can be implemented in text files, as in the following snippet:
    #!js api_version=1.0 name=lib
    redis.registerFunction('hello', function(){
        return 'Hello Gears!';
    });
  2. The lib.js file containing this function can then be imported into Redis Stack:
    redis-cli -x TFUNCTION LOAD < ./lib.js
  3. It can then be executed on demand from the command-line interface:
    127.0.0.1:6379>  TFCALL lib.hello 0
    "Hello Gears!"
  4. Things become more interesting if we subscribe to data changes as follows:
    redis.registerKeySpaceTrigger("key_logger", "user:", function(client, data){
        if (data.event == 'del'){
            client.call("INCR", "removed");
            redis.log(JSON.stringify(data));
            redis.log("A user has been removed");
        }
    });

    In this function, we do the following:

    • We are subscribing to events against the keys prefixed by the “user:” namespace
    • We check the command that triggered the event, and if it is a deletion, we act and specify what’s going to happen next
    • The triggered action will be the increment of a counter, and it will also write a message into the server’s log
  5. To test this function, we proceed to create and delete a user profile:
    127.0.0.1:6379> HSET user:123 name "John" last "Smith"
    (integer) 2
    127.0.0.1:6379> DEL user:123
    (integer) 1
  6. A quick check of the server’s log verifies that the condition has been met, and the information logged:
    299:M 05 Feb 2023 19:13:09.004 * <redisgears_2> {"event":"del","key":"user:123","key_raw":{}}
    299:M 05 Feb 2023 19:13:09.005 * <redisgears_2> A user has been removed

    And the counter has increased:

    127.0.0.1:6379> GET removed
    "1"

Through this book, we will come to understand the differences between Lua scripts, Redis functions, and JavaScript functions, and we will explore the many possible programmability features along with proposals to resolve challenging problems with simple solutions.
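The event-driven flow of the keyspace trigger shown earlier in this section can be modeled with a toy in-memory store in Python. This is only a conceptual sketch of the pattern (register a callback for a key prefix, fire it when a matching key changes); the KeyspaceTriggers class and its methods are hypothetical and unrelated to the actual engine.

```python
class KeyspaceTriggers:
    """Toy key-value store that notifies callbacks registered by key prefix."""

    def __init__(self):
        self.data = {}
        self.triggers = []  # (prefix, callback) pairs

    def register(self, prefix, callback):
        self.triggers.append((prefix, callback))

    def _notify(self, event, key):
        for prefix, callback in self.triggers:
            if key.startswith(prefix):
                callback(self, {"event": event, "key": key})

    def hset(self, key, **fields):
        self.data.setdefault(key, {}).update(fields)
        self._notify("hset", key)

    def delete(self, key):
        existed = self.data.pop(key, None) is not None
        if existed:
            self._notify("del", key)
        return int(existed)


removed = {"count": 0}


def key_logger(store, event):
    # React only to deletions, like the JavaScript trigger above.
    if event["event"] == "del":
        removed["count"] += 1


store = KeyspaceTriggers()
store.register("user:", key_logger)
store.hset("user:123", name="John", last="Smith")
store.delete("user:123")
print(removed["count"])  # 1
```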

So, what is Redis Stack?

Redis Stack combines the speed and stability of the Redis server with a set of well-established capabilities and integrates them into a compact solution that is easy to install and manage – Redis Stack Server. The RedisInsight desktop application is a visualization tool and data manager that complements Redis Stack Server with a set of functionalities useful for visualizing data stored by different models as well as providing interactive tutorials with popular examples, and more.

To complete the picture, the Redis Stack Client SDK includes the most popular client libraries to develop against Redis Stack in the Java, Python, and JavaScript programming languages.

Figure 1.1 – The Redis Stack logo

Redis Stack empowers users with the liberty to use it for free in development and production environments and merges the open source BSD-licensed Redis with search and query capabilities, JSON support, time series handling, and probabilistic data structures. It is available under a dual license, specifically the Redis Source Available License (RSALv2) and the Server Side Public License (SSPL).

So, in a few examples, we have introduced new possibilities to modernize applications, and now we owe you an answer to the original question, “What is Redis Stack?”

Key-value storage

To define what Redis Stack is, we need to go back for a moment to its origins, because Redis is the backbone of Redis Stack. Redis was born as in-memory storage to accelerate massive amounts of queries and achieve sub-millisecond latency while optimizing memory usage and maximizing the ease of adoption and administration. It appeared at the same time as other solutions taking part in the NoSQL wave and deviating from relational modeling. While the key-value Memcached store was an already established solution, Redis became popular too as a type of key-value storage. So, we can surely say that Redis Stack can be used as a key-value store.

Data structure server

However, considering Redis Stack as a simple key-value data store is reductive. Redis is best known for its flexibility in storing collections such as Hashes, Sets, Sorted Sets or Lists, Bitmaps and Bitfields, Streams, HyperLogLog probabilistic data structures, and geo indexes. And, together with data structures, its efficient low-complexity algorithms make storing and searching data a joy for developers. We can certainly say that Redis Stack is also a data structure server.

Multi-model database

The features introduced so far are integrated into Redis Stack Server and extend the Redis server, turning the data structure server into a multi-model database. This provides a rich data modeling experience where multiple heterogeneous data structures such as documents, vectors, and time series coexist in the same database. Software architects will appreciate the variety of possibilities for designing new solutions without multiple specialized databases and software developers will be empowered with a rich set of client libraries that improve the ease of software design. Database administrators will discover how shallow the learning curve is to learn to administer a single database rather than installing, configuring, and maintaining several data stores.

Data platform

The characteristics discussed so far, together with stream processing and the possibility to execute JavaScript functions for event-driven development, push Redis Stack beyond the boundaries of the multi-model database definition. Combining Redis, the key-value data store that is popular as a cache, with advanced data structures and multi-model design, and with the capability of a message broker with event-driven programming features, turns Redis Stack into a powerful data platform.

We have completed the Redis Stack walk-through, and to conclude this chapter, we will briefly discuss how to install it using different methods.