tl;dr Stroom Data comes with a set of command line tools that lets you experiment with data held in Stroom using your preferred languages and frameworks.
At the core of Stroom Data lies the idea of the JSON document stream. It is a concept that through its simplicity both sets limitations and opens up possibilities. If you work in a team of data engineers that is planning a 12-month-long project to build a platform capable of ingesting terabytes of data daily for a customer-facing product, then you may find the limitations to be a hindrance. But, if you are a single engineer needing to ingest gigabytes of data daily in order to make them available for future exploratory work that is not yet well understood, then the possibilities should excite you!
If you want to export a stream so you can attack it with your current toolset—whether it be Python, R, Julia, Mathematica or some other awesome system we haven’t even heard of—all you need to do is use the sd_dump tool to write the contents of a stream to a file and sd_load to load your output file back into Stroom.
The following sample shows how simple it is to load a file of raw JSON into a stream.
$ sd_load metrics output.json
100.0% |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| 1000K docs, 220MB data [23229 docs/s, 4.9 MB/s]
Completed in 43 seconds
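What happens between the dump and the load is entirely up to you. As a rough sketch in Python, assuming the dumped file holds one JSON document per line (the post only states this layout for sd_read, so treat it as an assumption), the step in the middle could look something like this. The function name `transform_dump`, the file names, and the `platform` field are all hypothetical and only serve the illustration.

```python
import json


def transform_dump(src_path, dst_path, keep):
    """Copy documents from src_path to dst_path, keeping those where keep(doc) is true.

    Assumes one JSON document per line in the source file (an assumption
    about the dump format). Returns the number of documents written.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            doc = json.loads(line)
            if keep(doc):
                # Write the document back out as a single line of JSON.
                dst.write(json.dumps(doc) + "\n")
                written += 1
    return written
```

Calling `transform_dump("dump.json", "output.json", lambda doc: doc.get("platform") == "iOS")` would then leave a file in output.json that can be handed to sd_load as in the sample above.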
For some use cases, you don’t even need to extract the data in bulk. Instead, you can use the sd_read and sd_write tools to pipe a stream through your own tools and back into a stream in Stroom. The following sample shows how even grep can be used to filter the contents of a stream and write the output to another stream.
$ sd_read metrics | grep "iOS" | sd_write ios_metrics
In this sample, sd_read reads a stream of JSON documents called “metrics” and writes each document as a line of text to standard output. grep filters the lines and passes the matches on to sd_write, which reads its input from standard input, collects it into batches, and appends it to a stream called ios_metrics.
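Since grep matches on raw text, it would also keep any document that merely mentions “iOS” somewhere in a string. If you would rather match on a specific field, a small script can take grep's place in the pipeline. Here is a minimal sketch in Python that relies only on the one-document-per-line format described above; the `platform` field name and the `filter_docs` helper are hypothetical, so substitute whatever your documents actually contain.

```python
import json
import sys


def filter_docs(lines, key, value):
    """Yield each input line whose JSON document has doc[key] == value."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(doc, dict) and doc.get(key) == value:
            yield line


def main():
    # "platform" is a hypothetical field name used for illustration.
    for line in filter_docs(sys.stdin, "platform", "iOS"):
        sys.stdout.write(line + "\n")


if __name__ == "__main__":
    main()
```

Saved as, say, filter_ios.py (a made-up name), it would slot in where grep was:

$ sd_read metrics | python filter_ios.py | sd_write ios_metrics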
The following screencast shows the usage of these tools in further detail.
Got a specific language you would love to be writing your Stroom services in? Let us know in our Google Group: https://groups.google.com/forum/#!forum/stroomdata
As always, follow us on Twitter @StroomData to stay up-to-date on releases, screencasts, and other Stroom-related news! Have a look at the GitHub repo https://github.com/DoubleDutch/StroomData for more information and to download the latest release!