Node.js for general scripting

I have been fiddling around with Node lately, and I really, really like it. A lot. It is intended for building fast, scalable network applications and data-intensive real-time applications that run across distributed devices, but it is also very handy for one-off scripting.

I have a ~2,000 line JSON file for some variable meta that looks like this x 82:

    "p1": {
      "db": "population",
      "field": "p1",
      "category": "character",
      "title": "Population",
      "description": "Total Neighborhood Profile Area (NPA) population.",
      "importance": "Total population is a baseline measure that indicates the number of people living in an NPA. It is used to calculate density, and other per capita data.Trends in population show where the community is growing and where it is decreasing, which can help the City, County and Towns as well as other services providers know where infrastructure and other services may be needed.The 2010 Census reported 308.7 million people in the United States, a 9.7 percent increase from the Census 2000 population of 281.4 million.",
      "tech_notes": "Provides the population based on the 2010 Census for each NPA using Block Group geography. Census block groups are the second smallest unit of measure used in the decennial Census. Only the census block is smaller. A block group is a cluster of census blocks within the same census tract. The average block group contains 39 blocks comprising between 600 and 3,000 people, with an optimum size of 1,500 people.",
      "source": "U.S. Census, 2010",
      "links": "<a href='http://www.census.gov/2010census/data/'>2010 U.S. Census</a><br><a href='http://charmeck.org/city/charlotte/growthstrategy/Pages/default.aspx'>Charlotte-Mecklenburg's growth strategy</a>",
      "style": {
        "breaks": [
          0,
          1500,
          2500
        ],
        "colors": [
          "#D8F2ED",
          "#2ca79e",
          "#154F4A"
        ]
      }
    }

This seemed like a powerful-smart idea at the time. How often do you need to edit that kind of stuff?

If you’re on the Quality of Life project, it turns out the answer is every day for a year. And the people doing the editing aren’t developers. Suddenly putting the meta in JSON is a terrible idea. I didn’t have time to fix it back then, but for the next go-round most of this stuff is getting crammed into 82 markdown files. Now a little one-off script to do that is in order.

Normally I reach for Python for this kind of stuff, and that’s fine - json.load() on the opened file and you have a Python data structure you can iterate through. But for funsies I thought I’d give it a go in Node instead.

    // One-off script: split metrics.json into one markdown file per metric.
    var start = new Date().getTime(),
        _ = require('./underscore-min.js'),
        meta = require('./metrics.json'),  // Node parses .json files on require
        toMarkdown = require('./to-markdown.js').toMarkdown,
        out = "",
        fs = require('fs');

    _.each(meta, function(item) {
        // Build the markdown for this metric.
        out = "## " + item.title + "\n";
        out += item.description + "\n\n";
        out += "### Why is this important?\n";
        out += item.importance + "\n\n";
        out += "### About the Data\n";
        out += item.tech_notes + "\n\n";
        out += "_Source: " + item.source + "_\n\n";
        out += "### Additional Resources\n";
        // Links are HTML anchors separated by <br>; each becomes a list item.
        _.each(item.links.split("<br>"), function(link) {
            out += "+ " + toMarkdown(link) + "\n";
        });
        // The ./meta/ directory has to exist already.
        fs.writeFile("./meta/" + item.field + ".md", out, function(err) {
            if (err) {
                console.log(err);
            }
        });
    });

    // fs.writeFile is asynchronous, so this reports the time to build the
    // strings and queue the writes, not necessarily until they all finish.
    console.log("done in " + Math.abs(start - new Date().getTime()) / 1000 + " seconds");

I know, I didn’t need to use Underscore just for this, plain old JavaScript would have been fine. Shut up.
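For the record, here is roughly what the Underscore-free version could look like. Treat it as a sketch, not the script I actually ran: `linkToMarkdown` is a made-up stand-in for the to-markdown library that only copes with simple anchors like the ones in my JSON, `buildMarkdown` and `buildAll` are names invented for this example, and the sample record is trimmed-down, made-up data in the same shape as the real thing.

```javascript
// Hypothetical stand-in for the to-markdown library: handles only simple
// <a href='...'>text</a> anchors like the ones in the sample JSON.
function linkToMarkdown(html) {
    var match = /<a\s+href=['"]([^'"]*)['"]\s*>(.*?)<\/a>/i.exec(html);
    return match ? "[" + match[2] + "](" + match[1] + ")" : html;
}

// Build the markdown for one metric using only built-in array methods.
function buildMarkdown(item) {
    var links = item.links.split("<br>").map(function(link) {
        return "+ " + linkToMarkdown(link);
    });
    return [
        "## " + item.title,
        item.description,
        "",
        "### Why is this important?",
        item.importance,
        "",
        "### About the Data",
        item.tech_notes,
        "",
        "_Source: " + item.source + "_",
        "",
        "### Additional Resources"
    ].concat(links).join("\n") + "\n";
}

// Object.keys() + forEach stands in for _.each over the top-level object.
// Returns a map of file name -> markdown instead of writing to disk.
function buildAll(meta) {
    var files = {};
    Object.keys(meta).forEach(function(key) {
        files[meta[key].field + ".md"] = buildMarkdown(meta[key]);
    });
    return files;
}

// Usage with a made-up record shaped like the real JSON.
var sample = {
    p1: {
        field: "p1",
        title: "Population",
        description: "Total population.",
        importance: "Baseline measure.",
        tech_notes: "Based on the 2010 Census.",
        source: "U.S. Census, 2010",
        links: "<a href='http://www.census.gov/2010census/data/'>2010 U.S. Census</a>"
    }
};
console.log(buildAll(sample)["p1.md"]);
```

Keeping the string building separate from the file writing is just a convenience here: it makes the output easy to eyeball before anything touches the disk.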

One of the great things about Node is you can usually use the same libraries you use in client-side browser scripting. Aside from Underscore (shut up) I’m grabbing a nifty HTML to Markdown converter for the links. I iterate through the JSON, build a markdown string, and write it to a file named after the variable. It outputs files like this:

p1.md
## Population
Total Neighborhood Profile Area (NPA) population.

### Why is this important?
Total population is a baseline measure that indicates the number of people living in an NPA. It is used to calculate density, and other per capita data.Trends in population show where the community is growing and where it is decreasing, which can help the City, County and Towns as well as other services providers know where infrastructure and other services may be needed.The 2010 Census reported 308.7 million people in the United States, a 9.7 percent increase from the Census 2000 population of 281.4 million.

### About the Data
Provides the population based on the 2010 Census for each NPA using Block Group geography. Census block groups are the second smallest unit of measure used in the decennial Census. Only the census block is smaller. A block group is a cluster of census blocks within the same census tract. The average block group contains 39 blocks comprising between 600 and 3,000 people, with an optimum size of 1,500 people.

_Source: U.S. Census, 2010_

### Additional Resources
+ [2010 U.S. Census](http://www.census.gov/2010census/data/)
+ [Charlotte-Mecklenburg's growth strategy](http://charmeck.org/city/charlotte/growthstrategy/Pages/default.aspx)

That I can give to a non-programmer to edit. And they will. Every day. For a year.

Node reads ~2,000 lines of JSON, loops 164 times (82 times + an internal loop in each), and outputs 82 markdown files, in ~0.05 seconds. Yes, that’s 5 hundredths of a second. I wasn’t exactly in a hurry for this, but daaaaaaaaaammmn.
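One caveat on that number: fs.writeFile is asynchronous, so a timer that stops right after the loop may not include the writes themselves. If you want the clock to cover them too, something like the following might do. It is a minimal sketch under my own assumptions: `writeAllTimed` is a name invented for this example, it takes a plain object mapping file names to contents, and the usage line writes to a throwaway temp folder instead of the real ./meta/ directory.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write every entry of `files` (file name -> contents) into `dir` and return
// the elapsed time in seconds, measured after the last write call returns.
function writeAllTimed(files, dir) {
    var start = process.hrtime();            // [seconds, nanoseconds]
    fs.mkdirSync(dir, { recursive: true });  // create the folder if needed
    Object.keys(files).forEach(function(name) {
        // Synchronous write: the loop does not move on until this returns.
        fs.writeFileSync(path.join(dir, name), files[name]);
    });
    var diff = process.hrtime(start);        // elapsed time since `start`
    return diff[0] + diff[1] / 1e9;
}

// Usage with a single made-up file in a temp folder.
var elapsed = writeAllTimed(
    { "p1.md": "## Population\n" },
    path.join(os.tmpdir(), "meta-sketch")
);
console.log("done in " + elapsed + " seconds");
```

Synchronous writes are usually frowned on in a server, but for a one-off script that does nothing else they keep the timing honest and the code short.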

For general automation/ETL type stuff and one-off scripts like this, I’m adding Node to my arsenal.