Node.js modules you should know about: jsonstream

2012-01-04 10:42

by Peteris Krumins

at 2012-01-04 02:42:47

original http://feedproxy.google.com/~r/catonmat/~3/X7r_RjoOs6M/nodejs-modules-jsonstream

Hello everyone! This is the thirteenth post in the node.js modules you should know about article series.

The first post was about dnode - the freestyle rpc library for node; the second was about optimist - the lightweight options parser for node; the third was about lazy - lazy lists for node; the fourth was about request - the swiss army knife of HTTP streaming; the fifth was about hashish - hash combinators library; the sixth was about read - easy reading from stdin; the seventh was about ntwitter - twitter api for node; the eighth was about socket.io - the module that makes websockets and realtime possible in all browsers; the ninth was about redis - the best redis client API library for node; the tenth was about express - an insanely small and fast web framework for node; the eleventh was about semver - a node module that takes care of versioning; and the twelfth was about cradle - a high-level, caching, CouchDB client for node.

This time I'll introduce you to a very awesome module called JSONStream. JSONStream is written by Dominic Tarr and it parses streaming JSON.

Here is an example. Suppose you have a CouchDB view like this:

{"total_rows":129,"offset":0,"rows":[
  { "id":"change1_0.6995461115147918"
  , "key":"change1_0.6995461115147918"
  , "value":{"rev":"1-e240bae28c7bb3667f02760f6398d508"}
  , "doc":{
      "_id":  "change1_0.6995461115147918"
    , "_rev": "1-e240bae28c7bb3667f02760f6398d508","hello":1}
  },
  { "id":"change2_0.6995461115147918"
  , "key":"change2_0.6995461115147918"
  , "value":{"rev":"1-13677d36b98c0c075145bb8975105153"}
  , "doc":{
      "_id":"change2_0.6995461115147918"
    , "_rev":"1-13677d36b98c0c075145bb8975105153"
    , "hello":2
    }
  },
  ...
]}

And you want to extract only the doc values from the rows. You can do it easily with JSONStream this way:

var JSONStream = require('JSONStream');
var parser = JSONStream.parse(['rows', /./, 'doc']);

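This creates a stream that emits the doc property of every element of rows, that is, rows.*.doc. The regular expression /./ matches any non-empty key, so it acts as the wildcard.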
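
To make the path pattern more concrete, here is a minimal sketch of how a pattern like this can be checked against the key path of a value inside the document. This is not JSONStream's actual code. matchesPath is a hypothetical helper written only for illustration, and it assumes that string elements must equal the key exactly while regular expressions are tested against it.

```javascript
// Hypothetical helper, for illustration only (not part of JSONStream).
// pattern: array of strings or RegExps, e.g. ['rows', /./, 'doc']
// path:    array of concrete keys leading to a value, e.g. ['rows', 0, 'doc']
function matchesPath(pattern, path) {
  if (pattern.length !== path.length) return false;
  for (var i = 0; i < pattern.length; i++) {
    var expected = pattern[i];
    var key = String(path[i]); // array indexes are compared as strings
    if (expected instanceof RegExp) {
      if (!expected.test(key)) return false;
    } else if (expected !== key) {
      return false;
    }
  }
  return true;
}

var pattern = ['rows', /./, 'doc'];

console.log(matchesPath(pattern, ['rows', 0, 'doc']));   // true
console.log(matchesPath(pattern, ['rows', 1, 'doc']));   // true
console.log(matchesPath(pattern, ['rows', 0, 'value'])); // false
console.log(matchesPath(pattern, ['total_rows']));       // false
```

Only values whose full path matches every element of the pattern would be emitted; everything else in the document is skipped.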

Since it's a stream you have to feed it data and then have it output the data somewhere. You can do it very nicely and idiomatically in node this way:

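// parser emits parsed JavaScript objects; print each doc as it arrives
req.pipe(parser).on('data', function (doc) { console.log(doc); });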

Here is the output:

{
  _id: 'change1_0.6995461115147918',
  _rev: '1-e240bae28c7bb3667f02760f6398d508',
  hello: 1
}
{
  _id: 'change2_0.6995461115147918',
  _rev: '1-13677d36b98c0c075145bb8975105153',
  hello: 2
}

Here req is the request to the CouchDB view and parser is the JSONStream parser; each doc is written to standard output as soon as it has been parsed. The output, as you can see, contains only rows.*.doc. That is a really easy way to parse a JSON stream without reading the whole JSON into memory.
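
For comparison, here is a rough sketch of the buffered approach that JSONStream lets you avoid. It uses only Node built-ins. The collectDocs helper and the sample data are made up for illustration, and Readable.from (which needs a reasonably recent Node) stands in for the real request to CouchDB.

```javascript
var Readable = require('stream').Readable;

// Hypothetical helper: buffers the whole input in memory, parses it
// in one go with JSON.parse, and only then picks out the docs.
function collectDocs(input, callback) {
  var chunks = [];
  input.on('data', function (chunk) { chunks.push(Buffer.from(chunk)); });
  input.on('error', callback);
  input.on('end', function () {
    var body;
    try {
      body = JSON.parse(Buffer.concat(chunks).toString('utf8'));
    } catch (err) {
      return callback(err);
    }
    callback(null, body.rows.map(function (row) { return row.doc; }));
  });
}

// Made-up sample data standing in for the CouchDB response.
var sample = JSON.stringify({
  total_rows: 2,
  offset: 0,
  rows: [
    { id: 'a', key: 'a', doc: { _id: 'a', hello: 1 } },
    { id: 'b', key: 'b', doc: { _id: 'b', hello: 2 } }
  ]
});

collectDocs(Readable.from([sample]), function (err, docs) {
  if (err) throw err;
  console.log(docs);
});
```

Nothing can be printed here until the last byte has arrived, and memory use grows with the size of the view. The streaming parser hands over each doc as soon as it is complete.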

You can install JSONStream through npm as always:

npm install JSONStream

JSONStream on GitHub: https://github.com/dominictarr/JSONStream.


Enjoy!
