
Storm Real-time Processing Cookbook

By: Quinton Anderson

Overview of this book

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm Real-time Processing Cookbook contains basic to advanced recipes for real-time computation with Storm.

The book begins with setting up the development environment and then teaches log stream processing. This is followed by a real-time payments workflow, distributed RPC, integrating Storm with other software such as Hadoop and Apache Camel, and more.
Table of Contents (16 chapters)
Storm Real-time Processing Cookbook
Credits
About the Author
About the Reviewers
www.packtpub.com
Preface
Index

Unit testing a bolt


Unit testing is an essential part of any delivery; the logic contained in the bolts must be unit tested as well.

Getting ready

Unit testing often involves a process called mocking, which lets you substitute dynamically generated fake instances of objects for a class's real dependencies, so that the class can be tested in isolation. This book illustrates unit testing using JUnit 4 and JMock. Please take the time to read up on JMock's recipes online at http://jmock.org/cookbook.html.
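JMock generates such fake objects for you at runtime. To make the idea concrete, here is a hand-rolled, plain-Java sketch of what a mock does: stub a return value, then verify that the expected call happened exactly once. The Tuple interface here is a simplified stand-in for Storm's, not the real API:

```java
import java.util.HashMap;
import java.util.Map;

public class MockSketch {

    // Simplified stand-in for Storm's Tuple interface
    interface Tuple {
        String getStringByField(String field);
    }

    // A hand-rolled mock: returns stubbed values and counts calls,
    // which is roughly what JMock generates dynamically for you
    static class MockTuple implements Tuple {
        private final Map<String, String> stubs = new HashMap<String, String>();
        private final Map<String, Integer> calls = new HashMap<String, Integer>();

        void willReturn(String field, String value) {
            stubs.put(field, value);
        }

        public String getStringByField(String field) {
            Integer n = calls.get(field);
            calls.put(field, n == null ? 1 : n + 1);
            return stubs.get(field);
        }

        // Verification: fail unless the field was read exactly once
        void assertCalledOnce(String field) {
            Integer n = calls.get(field);
            if (n == null || n != 1) {
                throw new AssertionError(field + ": expected 1 call, got " + n);
            }
        }
    }

    public static void main(String[] args) {
        MockTuple tuple = new MockTuple();
        tuple.willReturn("ip", "192.168.33.100");

        // The class under test would receive the mock as a plain Tuple
        String ip = tuple.getStringByField("ip");
        System.out.println(ip); // prints 192.168.33.100

        tuple.assertCalledOnce("ip"); // passes: read exactly once
    }
}
```

JMock's `oneOf(...)` and `will(returnValue(...))` express the same stub-and-verify contract declaratively, without you having to write such classes by hand.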

How to do it…

  1. In the src/test/java folder, create the storm.cookbook package and create the StormTestCase class. This class is a simple abstraction of some of the initialization code:

    import org.jmock.Mockery;
    import org.jmock.lib.legacy.ClassImposteriser;

    import backtype.storm.tuple.Tuple;

    public class StormTestCase {

        // The ClassImposteriser allows JMock to mock concrete
        // classes as well as interfaces
        protected Mockery context = new Mockery() {{
            setImposteriser(ClassImposteriser.INSTANCE);
        }};

        protected Tuple getTuple() {
            final Tuple tuple = context.mock(Tuple.class);
            return tuple;
        }
    }
  2. Then create the TestRepeatVisitBolt class that extends from StormTestCase, and mark it with the parameterized runner annotation:

    @RunWith(value = Parameterized.class)
    public class TestRepeatVisitBolt extends StormTestCase {
  3. The test case logic of the class is contained in a single execute method (the parameterized runner passes each data row to the class's constructor, which stores the values in the ip, clientKey, url, and expected fields):

    @Test
    public void testExecute() {
        jedis = new Jedis("localhost", 6379);
        RepeatVisitBolt bolt = new RepeatVisitBolt();
        Map config = new HashMap();
        config.put("redis-host", "localhost");
        config.put("redis-port", "6379");
        final OutputCollector collector = context.mock(OutputCollector.class);
        bolt.prepare(config, null, collector);

        assertEquals(true, bolt.isConnected());

        final Tuple tuple = getTuple();
        context.checking(new Expectations() {{
            oneOf(tuple).getStringByField(Fields.IP);
            will(returnValue(ip));
            oneOf(tuple).getStringByField(Fields.CLIENT_KEY);
            will(returnValue(clientKey));
            oneOf(tuple).getStringByField(Fields.URL);
            will(returnValue(url));
            oneOf(collector).emit(new Values(clientKey, url, expected));
        }});

        bolt.execute(tuple);
        context.assertIsSatisfied();

        if (jedis != null)
            jedis.disconnect();
    }
  4. Next, the parameters must be defined:

    @Parameterized.Parameters
    public static Collection<Object[]> data() {
        Object[][] data = new Object[][] {
            { "192.168.33.100", "Client1", "myintranet.com", "false" },
            { "192.168.33.100", "Client1", "myintranet.com", "false" },
            { "192.168.33.101", "Client2", "myintranet1.com", "true" },
            { "192.168.33.102", "Client3", "myintranet2.com", "false" } };
        return Arrays.asList(data);
    }
  5. The base provisioning of the values must be done before the tests using Redis:

    @BeforeClass
    public static void setupJedis() {
        Jedis jedis = new Jedis("localhost", 6379);
        jedis.flushDB();
        Iterator<Object[]> it = data().iterator();
        while (it.hasNext()) {
            Object[] values = it.next();
            if (values[3].equals("false")) {
                String key = values[2] + ":" + values[1];
                jedis.set(key, "visited"); // unique, meaning it must exist
            }
        }
    }

    Tip

    It is always useful to leave data in the database after the test completes in order to review and debug, clearing it again only on the next run.
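The effect of setupJedis() can be checked without a Redis connection. The sketch below applies the same key scheme (values[2] + ":" + values[1]) to the parameter rows, keeping only those whose expected result is "false":

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SeedKeys {

    // Build the url:clientKey keys that setupJedis() pre-seeds into Redis;
    // only the rows whose expected result is "false" are stored
    static Set<String> seedKeys(Object[][] data) {
        Set<String> keys = new LinkedHashSet<String>();
        for (Object[] values : data) {
            if (values[3].equals("false")) {
                keys.add(values[2] + ":" + values[1]);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // The same parameter rows as data(): ip, clientKey, url, expected
        Object[][] data = new Object[][] {
            { "192.168.33.100", "Client1", "myintranet.com", "false" },
            { "192.168.33.100", "Client1", "myintranet.com", "false" },
            { "192.168.33.101", "Client2", "myintranet1.com", "true" },
            { "192.168.33.102", "Client3", "myintranet2.com", "false" } };

        System.out.println(seedKeys(data));
        // [myintranet.com:Client1, myintranet2.com:Client3]
    }
}
```

Note that the two duplicate Client1 rows collapse into a single key, and the Client2 row (expected "true") is deliberately not seeded, so the bolt sees it as a first visit.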

How it works…

Firstly, the unit test works by defining a set of test data, which allows us to test many different cases without unnecessary abstraction or duplication. Before the tests execute, the static data is populated into the Redis DB, allowing the tests to run deterministically. The test method is then executed once per row of parameterized data, so many different cases are verified.

JMock provides mock instances of the collector and of the tuples consumed by the bolt. The expected behavior is then defined in terms of these mocked objects and their interactions:

context.checking(new Expectations() {{
    oneOf(tuple).getStringByField(Fields.IP);
    will(returnValue(ip));
    oneOf(tuple).getStringByField(Fields.CLIENT_KEY);
    will(returnValue(clientKey));
    oneOf(tuple).getStringByField(Fields.URL);
    will(returnValue(url));
    oneOf(collector).emit(new Values(clientKey, url, expected));
}});

Although these are separate lines of code, within the bounds of the expectations they should be read declaratively: I expect the tuple's getStringByField method to be called exactly once with the field name Fields.IP, and the mock object must then return the value ip to the class under test.

This mechanism provides a clean way to exercise the bolt.

Tip

There are many different kinds of unit test; it often becomes necessary to test against a DB in this manner. If you can avoid it, however, rather mock out all of the class's dependencies and implement a true unit test. This would be possible with the geography bolt, thanks to its resolver abstraction.
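As a sketch of that approach: if a bolt depends on a small interface rather than on Redis directly, the test can inject an in-memory fake and never touch a database. The VisitStore and InMemoryVisitStore names below are hypothetical, for illustration only; they are not part of the book's code:

```java
import java.util.HashMap;
import java.util.Map;

public class TrueUnitTestSketch {

    // Hypothetical abstraction over the Redis lookup the bolt performs
    interface VisitStore {
        boolean hasVisited(String key);
    }

    // In-memory fake: a true unit test needs no running Redis
    static class InMemoryVisitStore implements VisitStore {
        private final Map<String, String> entries = new HashMap<String, String>();

        void put(String key) {
            entries.put(key, "visited");
        }

        public boolean hasVisited(String key) {
            return entries.containsKey(key);
        }
    }

    // Simplified stand-in for the bolt's core decision logic:
    // a previously seen url:clientKey pair is not a unique visit
    static String isUniqueVisit(VisitStore store, String url, String clientKey) {
        return store.hasVisited(url + ":" + clientKey) ? "false" : "true";
    }

    public static void main(String[] args) {
        InMemoryVisitStore store = new InMemoryVisitStore();
        store.put("myintranet.com:Client1");

        System.out.println(isUniqueVisit(store, "myintranet.com", "Client1"));  // false
        System.out.println(isUniqueVisit(store, "myintranet1.com", "Client2")); // true
    }
}
```

The trade-off mirrors the tip above: the Redis-backed test exercises the real integration, while the fake-backed test is faster, deterministic, and runnable anywhere.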