Reactive Programming for .NET Developers


Overview of this book

Reactive programming is an innovative programming paradigm focused on time-based problem solving. It makes your programs perform better, scale more easily, and behave more reliably. Want to create fast-running applications that handle complex logic and huge datasets for financial and big-data challenges? Then you have picked up the right book! Starting with the principles of reactive programming and unveiling the power of push-based programming, this book is your one-stop solution for gaining a deep, practical understanding of reactive programming techniques. You will gradually learn all about reactive extensions; programming, testing, and debugging observable sequences; and integrating events from CLR data-at-rest or event sources. Finally, you will dive into advanced techniques such as manipulating time in dataflow, customizing operators and providers, and exploring functional reactive programming. By the end of the book, you'll know how to apply reactive programming to solve complex problems and build efficient programs with reactive user interfaces.

Dataflow programming


Changing the application state is not wrong in itself, but there are programming approaches that may produce better results and give the developer a more pleasant working experience. A typical case arises when we deal with in-move data (also called living data or a data stream), which is any kind of data stream, such as a video stream or an application-insights stream. Here, interaction-logic constructs that change state, such as if, for, and so on, are a poor-performing choice as well as a poor design. Because a data stream is intrinsically stateless, it is clear that a stateless programming approach offers better results than a state-driven one.

We are used to dealing with static data, such as a variable, a database, or any other binary- or string-based data. All such data is data-at-rest, static data, or simply data.

As an example, if we execute a SELECT statement against a relational database, we always get a result set containing exactly the values stored in the database table at the specific time we executed the query. A second later, the table could experience an UPDATE statement that changes any row's data, without the first client (the one that executed the SELECT) being notified of the change. To address these kinds of data changes without running into conflicts between different relational database clients, there are optimistic and pessimistic concurrency checks (a topic slightly outside the scope of this book). Obviously, the less we need to synchronize code that accesses a concurrent resource, the better our code performs.

In imperative programming, control flow is responsible for the correct execution of the application. Such a flow is usually made of multiple lines of code that do something with input/output ports and somehow alter the application's state until the desired result is achieved. In dataflow programming, by contrast, data flows in and out of the different stages of a flowchart, much as it does in a workflow.

Obviously, these different styles of programming greatly change the developer's experience and what the language can express. It is very difficult (and conceptually somewhat wrong) to compute something by executing interaction logic in dataflow programming, because that kind of logic simply falls outside the core design of the paradigm.

A practical example can be seen in data-integration Extract, Transform, and Load (ETL) workflows, such as those available in SQL Server Integration Services (SSIS), as shown in the following screenshot. An ETL workflow has the task of reading (extracting) data from a data source (relational or not) and mapping (transforming) that data by grouping or aggregating it with other data sources or by executing transformation functions. The data then flows (loads) into a target data store (relational or not) for simplified future access. SSIS is the tool for designing these kinds of workflows within the SQL Server Business Intelligence suite.

An SSIS dataflow task performing some transformations on data from a relational database

Generally speaking, outside the SQL Server-oriented implementation of SSIS, dataflow programming replaces a huge base of high-level code with something like a data workflow: a digraph (directed graph), an ordered version of the usual flowchart. Here is an example:

A simple representation of a dataflow digraph made of three recurring stages
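In .NET, the stages of such a digraph can be sketched with the TPL Dataflow library (the System.Threading.Tasks.Dataflow package). The following is a minimal, hypothetical three-stage pipeline (not an example from this book): each block observes the previous one, transforms incoming messages, and pushes the results downstream.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static async Task Main()
    {
        // Stage 1: parse raw input into a typed value.
        var parse = new TransformBlock<string, int>(s => int.Parse(s));
        // Stage 2: transform the value.
        var square = new TransformBlock<int, int>(n => n * n);
        // Stage 3: consume the result (the "load" end of the flow).
        var print = new ActionBlock<int>(n => Console.WriteLine(n));

        var opts = new DataflowLinkOptions { PropagateCompletion = true };
        parse.LinkTo(square, opts);
        square.LinkTo(print, opts);

        foreach (var s in new[] { "1", "2", "3" })
            parse.Post(s);

        parse.Complete();          // signal that no more data will flow in
        await print.Completion;    // wait for the whole graph to drain
    }
}
```

Each block may run on its own thread; linking with PropagateCompletion lets the completion signal flow through the graph just as the data does.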

Within the Microsoft universe, the only fully dataflow-compliant programming language is the Microsoft Visual Programming Language, available in the Microsoft Robotics Developer Studio environment for robotics programming. SSIS, by contrast, simply uses dataflows to handle data integration between databases.

Statelessness

The unavailability of state, a key concept of dataflow programming, is the opposite of what happens in all imperative or object-oriented applications. This behavior drastically changes the programming experience.

A stateless design never stores (temporarily or persistently) application or user data that must change over time for computational needs.

We cannot use temporary variables to store changeable values, such as the running total of an invoice. We cannot use an index to jump around a collection or an array, and we cannot iterate over it. In other words, we cannot use variables that act as business-logic state persistence.

When we write a function, the data will simply have an origin, a target, and one or multiple transformations in multiple stages.
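As a sketch of this difference, here is the invoice-total case in both styles, using a hypothetical InvoiceLine type (not a type defined in this book): the imperative version keeps a mutable accumulator, while the declarative version lets the data flow through a transformation and an aggregation.

```csharp
using System;
using System.Linq;

class InvoiceLine
{
    public int Quantity;
    public decimal UnitPrice;
}

class Program
{
    static void Main()
    {
        var lines = new[]
        {
            new InvoiceLine { Quantity = 2, UnitPrice = 10m },
            new InvoiceLine { Quantity = 1, UnitPrice = 5m },
        };

        // Imperative, stateful: a mutable variable carries intermediate state.
        decimal totalImperative = 0m;
        foreach (var l in lines)
            totalImperative += l.Quantity * l.UnitPrice;

        // Declarative, stateless from our code's perspective:
        // data has an origin (lines), a transformation, and a target (total).
        decimal totalDeclarative = lines.Sum(l => l.Quantity * l.UnitPrice);

        Console.WriteLine(totalImperative);
        Console.WriteLine(totalDeclarative);
    }
}
```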

Thanks to this stateless design of the whole application, it is easy to see that each stage can run on a different thread, each input or output endpoint can run on yet another thread, and so on. The stateless design is what lets the application scale out well: in the terms of Amdahl's Law, removing shared state maximizes the parallelizable portion of the work.

Besides performance benefits, a stateless design brings higher testability of the whole application (think of ASP.NET WebForms versus ASP.NET MVC), together with a more modern programming style that avoids imperative loops such as for, foreach, and their relatives.

The data-driven approach

The latest evolution of the imperative programming paradigm is object-oriented programming. This paradigm requires that we model our business world in a high-level domain model. This means that, in our code, we will find an object representing each living entity of our business model; an invoice or a customer are examples of such objects. These models drive the business logic. They do not need to be persisted in a one-to-one mapping from the model to the persistence store (usually, a relational database). This approach is called domain-driven design. Its opposite is data-driven design, which acts directly on data without any real discrimination between data and business logic.

Because of the intrinsic behavior of dataflow programming, a data-driven design is the natural choice when designing a solution based on such programming paradigms.

However, in the modern .NET programming style, using business-related entities in the various stages of the dataflow execution is both possible and recommended.

Data streams

A data stream is the flow of some data in time, usually in a single format, that is available to one or multiple readers.

Examples are television video streams, YouTube video streams, Twitter or RSS feeds, Azure Event Hubs, and so on. Those used to C# programming will remember the System.IO namespace, which contains classes made for stream programming, such as BinaryReader/BinaryWriter, which can stream any low-level CLR type, or StreamReader/StreamWriter, which can stream text in various encodings from ASCII to UTF-32.
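As a small illustration of these System.IO types (a minimal sketch using an in-memory stream rather than a file or network source), the following writes two lines of UTF-8 text into a stream and reads them back:

```csharp
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main()
    {
        using var ms = new MemoryStream();

        // Write text into the stream; leaveOpen keeps the stream alive
        // after the writer is disposed.
        using (var writer = new StreamWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            writer.WriteLine("hello");
            writer.WriteLine("stream");
        }

        // Rewind and read the data back as it flows out of the stream.
        ms.Position = 0;
        using var reader = new StreamReader(ms, Encoding.UTF8);
        string line;
        while ((line = reader.ReadLine()) != null)
            Console.WriteLine(line);
    }
}
```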

In other words, a data stream is some data at a specific instant in time. Time is the key concept for understanding a data stream; it is all about running data, or in-move data. Without the time component, data can never flow in a stream.

Depending on the stream, it may support seeking operations, that is, the ability to move forward and backward along the stream so that data starts flowing exactly at the desired time. Television video streams do not support such a feature. Microsoft Azure Event Hubs (a data stream), by contrast, supports it within a configured retention window, usually of some hours.
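In System.IO terms, seekability is exposed through Stream.CanSeek and Stream.Seek. Here is a minimal sketch with an in-memory stream, which, unlike a live broadcast, is fully seekable:

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        using var ms = new MemoryStream(new byte[] { 10, 20, 30, 40 });

        // In-memory streams report that they support seeking.
        Console.WriteLine(ms.CanSeek);

        // Jump to the third byte and start reading from there,
        // like rewinding or fast-forwarding a recorded stream.
        ms.Seek(2, SeekOrigin.Begin);
        Console.WriteLine(ms.ReadByte());
    }
}
```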

Azure Event Hubs is a partitioned streaming service for streaming any kind of data, usually used with Internet of Things (IoT) devices, for telemetry values, or for diagnostic purposes such as application-insights collection. A similar choice within the Azure offering is the IoT Hub, another streaming service entirely oriented to IoT devices, which supports more device-specific protocols.

Observer pattern

The Observer pattern is a publish-subscribe-style pattern; it defines the ability to register for data changes or event signaling. Although the name may be new to some readers, the observer pattern has been heavily used in the event-driven programming paradigm of Microsoft-oriented languages for the last 20 years.

When we handle a Button's Click event, whether in Visual Basic (from version 1 to 6) or in modern .NET Windows Forms, WPF, or even ASP.NET applications, we are simply using an implementation of the observer pattern.
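The following is a minimal sketch of that idea with a hypothetical Button class (not the Windows Forms one), showing that a CLR event is exactly this register-and-notify mechanism:

```csharp
using System;

class Button
{
    // The subject: exposes an event, the CLR's built-in
    // implementation of the observer pattern.
    public event EventHandler Click;

    // Stands in for a real UI click for demonstration purposes.
    public void SimulateClick() => Click?.Invoke(this, EventArgs.Empty);
}

class Program
{
    static void Main()
    {
        var button = new Button();

        // The observer: an event handler registered with the subject.
        button.Click += (sender, e) => Console.WriteLine("Button clicked");

        button.SimulateClick();
    }
}
```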

The pattern defines an event generator, also known as the subject, that fires the event, and one or multiple event listeners, or observers (in .NET, also known as event handlers), that do something in reaction to the data-state change or event signaling.

When dealing with dataflow programming, the observer pattern is responsible for acknowledging new data between stages. Each stage informs the following stages of newly available data with a signal. Stages do not know about the overall design of the digraph; they simply signal the new-data-availability event, and all subsequent stages observing the previous one for new data are then notified. This design removes the need for any overall data state, so the design is stateless. Each stage observes or is being observed; that is all. The data flowing between stages is a data stream.
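In .NET, this contract is captured by the IObservable&lt;T&gt;/IObserver&lt;T&gt; interfaces. The following hand-rolled subject is a minimal sketch of the register/signal mechanics only; Reactive Extensions provides production-grade subjects.

```csharp
using System;
using System.Collections.Generic;

// A minimal subject: keeps a list of observers and signals each one.
class Subject<T> : IObservable<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        observers.Add(observer);
        return new Unsubscriber(observers, observer);
    }

    // Push a new value to every registered observer.
    public void Publish(T value)
    {
        foreach (var o in observers)
            o.OnNext(value);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly List<IObserver<T>> observers;
        private readonly IObserver<T> observer;

        public Unsubscriber(List<IObserver<T>> observers, IObserver<T> observer)
        {
            this.observers = observers;
            this.observer = observer;
        }

        // Disposing the subscription removes the observer.
        public void Dispose() => observers.Remove(observer);
    }
}

class ConsoleObserver : IObserver<int>
{
    public void OnNext(int value) => Console.WriteLine($"received {value}");
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

class Program
{
    static void Main()
    {
        var subject = new Subject<int>();
        using (subject.Subscribe(new ConsoleObserver()))
        {
            subject.Publish(1);
            subject.Publish(2);
        }
    }
}
```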