Storm programming model
Storm runs on the JVM and is written in Clojure, but developers can write their applications in many languages, including Java, Groovy, Ruby, Python, JavaScript, Perl, and PHP. Storm is designed to work with existing queuing and database technologies.
At the highest level, a Storm application is made up of topologies. A topology is a graph of computation: each node contains processing logic, and each edge between nodes indicates how data is passed from one node to the next.
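To make the graph idea concrete, here is a minimal sketch in plain Python. It does not use Storm's API; the component names (`sentence_spout`, `split_bolt`, `count_bolt`) and the helper `downstream_order` are hypothetical and chosen only for illustration. The topology is modeled as an adjacency map, and the helper walks it to show the order in which data would reach each node.

```python
from collections import deque

# Toy model of a topology: each key is a node, each value lists the nodes
# it sends data to. All names here are made up for illustration.
topology = {
    "sentence_spout": ["split_bolt"],
    "split_bolt": ["count_bolt"],
    "count_bolt": [],
}

def downstream_order(graph, source):
    """Return the nodes reachable from `source` in breadth-first order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(downstream_order(topology, "sentence_spout"))
# ['sentence_spout', 'split_bolt', 'count_bolt']
```

A real topology is declared through Storm's own builder classes and runs distributed across a cluster; this sketch only captures the shape of the graph.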
Inside topologies you have networks of streams, which are unbounded sequences of tuples. Storm provides a mechanism to transform streams into new streams using spouts and bolts. Spouts generate streams: a spout can, for example, pull data from a site like Twitter or Facebook and publish it in an abstract format. Bolts consume input streams, process them, and optionally generate new streams.
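The relationship between spouts, bolts, and unbounded streams can be sketched with Python generators. This is an in-process analogy under simplifying assumptions, not Storm code: `number_spout` and `square_bolt` are hypothetical names, and a real spout would read from an external source rather than a counter.

```python
import itertools

def number_spout():
    """Spout: an unbounded source of one-field tuples."""
    for n in itertools.count(1):
        yield (n,)

def square_bolt(stream):
    """Bolt: consumes an input stream and emits a new two-field stream."""
    for (n,) in stream:
        yield (n, n * n)

# The stream never ends, so take only the first three tuples to inspect it.
first_three = list(itertools.islice(square_bolt(number_spout()), 3))
print(first_three)
# [(1, 1), (2, 4), (3, 9)]
```

Because generators are lazy, the bolt processes tuples one at a time as they arrive, which loosely mirrors how a bolt handles a stream that has no end.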
A simple stream transformation may need only a single bolt, while a more complex one may involve multiple streams and hence multiple bolts. A bolt can do almost anything, including running functions, filtering tuples, aggregating streams, joining streams, or even executing calls to a database.
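Three of the bolt roles listed above (filtering, aggregation, and joining) can be illustrated with a small plain-Python sketch. The function names and sample data are hypothetical, and the streams are short finite lists so the result can be inspected. In particular, the join here buffers one whole input, which only works for finite data; joining truly unbounded streams needs some bounding strategy such as windowing.

```python
from collections import Counter

def filter_bolt(stream, predicate):
    """Pass through only the tuples that satisfy `predicate`."""
    for tup in stream:
        if predicate(tup):
            yield tup

def count_bolt(stream):
    """Aggregate: emit a running (word, count) tuple for every input word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

def join_bolt(left, right):
    """Join two finite streams of (key, value) tuples on their key."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    for key, value in left:
        for other in index.get(key, []):
            yield (key, value, other)

# Filter out one-letter words, then keep a running count per word.
words = [("storm",), ("a",), ("bolt",), ("storm",)]
running = list(count_bolt(filter_bolt(words, lambda t: len(t[0]) > 1)))
print(running)   # [('storm', 1), ('bolt', 1), ('storm', 2)]

# Join a stream of users with a stream of page views on user id.
users = [(1, "ann"), (2, "bob")]
clicks = [(1, "/home"), (1, "/docs"), (3, "/faq")]
joined = list(join_bolt(users, clicks))
print(joined)    # [(1, 'ann', '/home'), (1, 'ann', '/docs')]
```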
Storm's data model is represented by tuples. A tuple is a named list of values of any type. Storm supports all primitive types, strings, and byte arrays, and you can build your own serializer if you want to use your own object types. Your spouts "emit" tuples and your bolts consume them. Bolts may also emit tuples if their output is destined to be processed by another bolt downstream. In short, emitting tuples is the mechanism for passing data from a spout to a bolt, or from one bolt to another.
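The phrase "named list of values" can be pictured as a list of values paired with the field names its emitter declared. The sketch below is a toy model in plain Python, not Storm's tuple class: `StreamTuple`, `get_by_field`, and `emit` are hypothetical names introduced only for this example.

```python
class StreamTuple:
    """Toy tuple: a list of values plus the field names declared for them."""

    def __init__(self, fields, values):
        if len(fields) != len(values):
            raise ValueError("each declared field needs exactly one value")
        self.fields = list(fields)
        self.values = list(values)

    def get_by_field(self, name):
        """Look up a value by its field name."""
        return self.values[self.fields.index(name)]

def emit(fields, *values):
    """Build the tuple that a spout or bolt would pass downstream."""
    return StreamTuple(fields, values)

# A tuple mixing a string, an integer, and a byte array.
t = emit(["word", "count", "raw"], "storm", 2, b"\x00\x01")
print(t.get_by_field("word"), t.get_by_field("count"))
# storm 2
```

A downstream consumer can then read values by name instead of relying on their position, which is the practical benefit of declaring field names up front.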


