Client Libraries for Stroom

June 21, 2016 Kasper Jeppesen

The web interface provided with the Stroom reference implementation is by far the quickest way to prototype simple flows of stream-based data. If you need to do serious number crunching, or want to use extensive libraries for data processing tasks such as NLP, you will probably want to write a standalone service that reads and processes data from one or more streams in your Stroom installation. For this purpose, you can either access Stroom manually through the HTTP-based API or use the client library introduced in the v0.1.3 release.

This first version of the client library is written in Java 8 and is aimed specifically at server-side JVM usage. Let’s have a look at a very simple code sample that instantiates the Stroom client library, gets the count of documents in a stream, and appends a document.
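As a rough sketch of the operations just described (counting the documents in a stream and appending one), the following uses a hypothetical DocumentStream interface backed by an in-memory list. The names DocumentStream, InMemoryStream, getCount, append, and get are assumptions made for illustration and are not taken from the Stroom client library; a real program would obtain its stream from the library rather than construct one locally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Entry point: exercises the hypothetical stream interface defined below.
class StreamBasicsSketch {
    public static void main(String[] args) {
        // Assumption: a real program would get this from the client library.
        DocumentStream stream = new InMemoryStream();
        System.out.println("count before append: " + stream.getCount());

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("text", "hello stroom");
        long location = stream.append(document);

        System.out.println("appended at location " + location);
        System.out.println("count after append: " + stream.getCount());
    }
}

// Hypothetical interface, NOT the real Stroom client API.
interface DocumentStream {
    long getCount();                              // number of documents in the stream
    long append(Map<String, Object> document);    // returns the new document's location
    Map<String, Object> get(long location);       // reads a document back
}

// In-memory stand-in so the sketch runs without a Stroom installation.
class InMemoryStream implements DocumentStream {
    private final List<Map<String, Object>> documents = new ArrayList<>();

    @Override
    public long getCount() {
        return documents.size();
    }

    @Override
    public long append(Map<String, Object> document) {
        documents.add(new LinkedHashMap<>(document)); // defensive copy
        return documents.size() - 1;
    }

    @Override
    public Map<String, Object> get(long location) {
        return documents.get((int) location);
    }
}
```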

The interface for manipulating a stream is straightforward; it is in fact the exact same interface we use inside the Stroom reference implementation to implement JavaScript services. You could use this interface to implement your own continuously running filter or aggregate service, but the client library already includes functionality for writing these kinds of services. The next sample demonstrates this by using a very simple lambda function to set up a filter service that continuously filters data as it is added to the input stream.
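The idea of a lambda-based filter can be sketched as follows. This is a minimal illustration under stated assumptions: streams are modelled as plain lists, the helper names runFilter and doc are hypothetical, and returning null is assumed to mean "drop this document". Unlike the service described above, the sketch processes one batch and returns instead of running continuously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FilterSketch {
    // Hypothetical helper: applies fn to each input document in order.
    // Assumption: a null result means the document is filtered out.
    static void runFilter(List<Map<String, Object>> input,
                          List<Map<String, Object>> output,
                          Function<Map<String, Object>, Map<String, Object>> fn) {
        for (Map<String, Object> document : input) {
            Map<String, Object> result = fn.apply(document);
            if (result != null) {
                output.add(result);
            }
        }
    }

    // Convenience for building a single-field document.
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> document = new HashMap<>();
        document.put(key, value);
        return document;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(doc("value", 3));
        input.add(doc("value", 12));
        input.add(doc("value", 40));

        List<Map<String, Object>> output = new ArrayList<>();

        // Keep only documents whose "value" field exceeds 10.
        runFilter(input, output, d -> ((Integer) d.get("value")) > 10 ? d : null);

        System.out.println("kept " + output.size() + " of " + input.size() + " documents");
    }
}
```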

Just as in the JavaScript services, your filter functions can return new objects, or even an array of objects. The following sample reads documents from an input stream, each of which has a text field. It splits this text and sends a separate document for each word to the output stream.
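A one-to-many filter of this kind might look like the sketch below. It again models streams as lists and uses hypothetical names (splitWords, runFilter); the "text" field comes from the description above, while the "word" output field and the choice to split on whitespace are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SplitWordsSketch {
    // Turns one document with a "text" field into one document per word.
    static List<Map<String, Object>> splitWords(Map<String, Object> document) {
        List<Map<String, Object>> words = new ArrayList<>();
        Object raw = document.get("text");
        if (raw == null) {
            return words; // nothing to split
        }
        String text = raw.toString().trim();
        if (text.isEmpty()) {
            return words;
        }
        for (String word : text.split("\\s+")) {
            Map<String, Object> wordDocument = new HashMap<>();
            wordDocument.put("word", word);
            words.add(wordDocument);
        }
        return words;
    }

    // Hypothetical helper: every document returned by fn is appended to output.
    static void runFilter(List<Map<String, Object>> input,
                          List<Map<String, Object>> output,
                          Function<Map<String, Object>, List<Map<String, Object>>> fn) {
        for (Map<String, Object> document : input) {
            output.addAll(fn.apply(document));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> document = new HashMap<>();
        document.put("text", "to be or not to be");

        List<Map<String, Object>> input = new ArrayList<>();
        input.add(document);
        List<Map<String, Object>> output = new ArrayList<>();

        runFilter(input, output, SplitWordsSketch::splitWords);

        for (Map<String, Object> wordDocument : output) {
            System.out.println(wordDocument.get("word"));
        }
    }
}
```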

Aggregates are equally simple to implement using the Java 8 lambda syntax. The following sample reads the word output from the previous sample and maintains an aggregate with a simple count.
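An aggregate can be thought of as a fold: a function takes the current aggregate and the next document and returns the updated aggregate. The sketch below counts occurrences per word under that assumption. The aggregate and countWords helpers are hypothetical, the "word" field name is assumed, and whether the original sample kept a per-word count or a single total is not stated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class WordCountSketch {
    // Hypothetical helper: folds every document into the aggregate, in order.
    static <A> A aggregate(List<Map<String, Object>> input,
                           A initial,
                           BiFunction<A, Map<String, Object>, A> fn) {
        A current = initial;
        for (Map<String, Object> document : input) {
            current = fn.apply(current, document);
        }
        return current;
    }

    // Counts how often each value of the "word" field occurs.
    static Map<String, Long> countWords(List<Map<String, Object>> words) {
        return aggregate(words, new HashMap<String, Long>(), (counts, document) -> {
            counts.merge((String) document.get("word"), 1L, Long::sum);
            return counts;
        });
    }

    public static void main(String[] args) {
        List<Map<String, Object>> words = new ArrayList<>();
        for (String word : new String[] {"to", "be", "or", "not", "to", "be"}) {
            Map<String, Object> document = new HashMap<>();
            document.put("word", word);
            words.add(document);
        }

        Map<String, Long> counts = countWords(words);
        System.out.println("to=" + counts.get("to") + ", or=" + counts.get("or"));
    }
}
```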

All of these filters and aggregates maintain their state in Stroom in exactly the same way as the JavaScript services defined through the web interface: by maintaining state in a “.state” stream. This means that you can start and stop your services without any problems, because they will automatically continue where they left off. If you want to reprocess a stream, you can call reset on the service before calling start.
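The resume-and-reset behaviour can be illustrated with a small sketch in which a map stands in for the “.state” stream and stores the next input location to process. Everything here is an assumption for illustration: the class name, the "location" key, and the decision that reset clears only the saved position and leaves existing output alone. The start method also processes what is available and returns, where a real service would keep running.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ResumableServiceSketch {
    private final List<String> input;
    private final List<String> output;
    private final Map<String, Long> state; // stands in for the ".state" stream

    ResumableServiceSketch(List<String> input, List<String> output, Map<String, Long> state) {
        this.input = input;
        this.output = output;
        this.state = state;
    }

    // Processes everything after the saved location, saving progress as it goes.
    void start() {
        long next = state.getOrDefault("location", 0L);
        while (next < input.size()) {
            output.add(input.get((int) next).toUpperCase());
            next++;
            state.put("location", next);
        }
    }

    // Forgets the saved location so the next start() begins from the first document.
    void reset() {
        state.remove("location");
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("a");
        input.add("b");
        List<String> output = new ArrayList<>();
        Map<String, Long> state = new HashMap<>();

        new ResumableServiceSketch(input, output, state).start();
        System.out.println("after first run: " + output.size() + " outputs");

        // Simulate a restart: a new service instance sharing the same state.
        input.add("c");
        ResumableServiceSketch restarted = new ResumableServiceSketch(input, output, state);
        restarted.start();
        System.out.println("after restart: " + output.size() + " outputs");

        // Reprocess the whole input from the beginning.
        restarted.reset();
        restarted.start();
        System.out.println("after reset: " + output.size() + " outputs");
    }
}
```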

We look forward to adding more features to this Java-based client library as well as introducing new client libraries for many other languages currently in use at DoubleDutch. Got a specific language you would love to be writing your Stroom services in? Let us know in our Google group: https://groups.google.com/forum/#!forum/stroomdata

As always, follow us on Twitter @StroomData to stay up-to-date on releases, screencasts, and other Stroom-related news! Have a look at the GitHub repo https://github.com/DoubleDutch/StroomData for more information and to download the latest release!
