Data Team

Updates to the Dailygraphics Rig
Adding support for Google Docs, CSV files, and a new JavaScript bundler

I didn’t choose to join Chalkbeat solely because the team was already using the open-source tools I had written at NPR, but I have to admit that it did make the transition a lot easier. One of the virtues of a successful open-source project is that it makes knowledge transferable. While the NPR Interactive Template and Dailygraphics Rig are not de facto standards, they’re common enough throughout the industry (and similar enough to related tools from other newsrooms) that learning them has benefits.

Of course, the other virtue of open-source software is that improvements don’t have to be limited to just the originating team — they can also come from users, and then make their way out to the broader community when they’re merged upstream. To that end, we’ve been working at Chalkbeat on adding new capabilities to Dailygraphics Next, making it an even more powerful tool for building charts and graphs on the web.

Supporting Google Docs and ArchieML

Historically, the Dailygraphics workflow has loaded text from spreadsheets along with any other data. This works for fairly small and uniform pieces of content, like headlines and chatter, but it’s awkward for any substantial amount of text.

Here at Chalkbeat, we do most of our traditional charts through Datawrapper, falling back on the rig for graphics that don’t fit into a standard template, prioritize more interaction, or (in the case of our NYC COVID policy flowchart) both. That means it has become more common for us to need structured text, such as in our school board voter guides.

We added Docs support (with built-in ArchieML parsing via Betty) as a “secret” feature. It’s not built into the default UI the way that Sheets is, but it can be added to a template or a graphic manifest to enable a similar “open document” button on any graphic. Once that’s in place, you can access the raw file via TEXT.raw in your HTML templates, and the parsed ArchieML object will be available as TEXT.parsed.
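
To make the difference between the two properties concrete, here is a rough sketch of what a template helper might do with them. Only TEXT.raw and TEXT.parsed come from the rig. The document contents, the field names (title, candidates, name, bio), and the helper functions are invented for illustration, and the TEXT object is built by hand here rather than loaded from a real Doc.

```javascript
// Hypothetical stand-in for the TEXT object the rig passes to templates.
// The sample document and its field names are made up for this sketch.
const TEXT = {
  // The unparsed ArchieML source, as it might be typed in a Google Doc.
  raw: [
    "title: School board voter guide",
    "[candidates]",
    "name: Candidate A",
    "bio: Former teacher.",
    "name: Candidate B",
    "bio: Parent organizer.",
    "[]"
  ].join("\n"),
  // The structure an ArchieML parser would produce from that source.
  parsed: {
    title: "School board voter guide",
    candidates: [
      { name: "Candidate A", bio: "Former teacher." },
      { name: "Candidate B", bio: "Parent organizer." }
    ]
  }
};

// Escape text before interpolating it into markup.
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the nested, parsed document into markup for the embed.
function renderGuide(doc) {
  const items = doc.candidates
    .map(c => `<li><b>${escapeHTML(c.name)}</b>: ${escapeHTML(c.bio)}</li>`)
    .join("\n");
  return `<h1>${escapeHTML(doc.title)}</h1>\n<ul>\n${items}\n</ul>`;
}

console.log(renderGuide(TEXT.parsed));
```

The nested candidates list is the kind of structure that is awkward to express in a grid of cells but falls out naturally from a document.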

There are, of course, advantages in terms of structure: ArchieML makes it possible to build out freeform and nested data structures in a way that a spreadsheet can’t. It also gives us better options for migrating graphics to or from the Interactive Template, which added Docs support a few years back. But the real win for this feature is how much easier it is to edit and collaborate, since reporters and editors can add specific comments or track changes much more easily in a document than in a cramped grid cell. If you’re doing any kind of narrative presentation in a graphics embed, it’s worth checking out!

Infrastructure upgrades

Last year, new users of Dailygraphics started to notice warnings in the console for the mold-source-map package. They weren’t fatal, but they were alarming. This package turns out to be part of the venerable Browserify bundler, which the rig used to combine and compile client-side scripts. One of the earliest script compilation tools of the modern JavaScript tooling era, Browserify was clearly suffering from developer neglect compared to newer tools like Webpack or Rollup.

The Dailygraphics rig does have slightly different needs from most front-end projects, which means we can’t just pull in off-the-shelf bundler configuration. For one thing, it’s building an arbitrary set of bundles based on the contents of the graphics repo, not a single well-known output. It also needs to compile dependencies from modules in a non-standard location, since the graphics are kept outside the rig itself. And it should be able to do so entirely in-memory, without producing files on disk, since Dailygraphics Next doesn’t store any of its compiled artifacts locally.
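
Those three constraints can be sketched in a few lines. This is not the rig’s actual code: the in-memory repo listing, the graphic.js entry-point convention, the function names, and the placeholder compile step are all assumptions made for illustration, and no real bundler is called.

```javascript
// Hypothetical listing of a graphics repo: folder name -> files it contains.
// Treating "graphic.js" as the entry script is an assumption for this sketch.
const graphicsRepo = {
  "covid-flowchart": ["index.html", "graphic.js", "graphic.less"],
  "voter-guide": ["index.html", "graphic.js", "helpers.js"],
  "static-table": ["index.html"]
};

// Derive one bundle job per graphic that has an entry script,
// instead of listing outputs in a fixed config file.
function planBundles(repo, graphicsRoot) {
  const jobs = [];
  for (const [slug, files] of Object.entries(repo)) {
    if (!files.includes("graphic.js")) continue;
    jobs.push({
      slug,
      entry: `${graphicsRoot}/${slug}/graphic.js`,
      // Dependencies resolve relative to the graphics repo, not the rig.
      resolveFrom: graphicsRoot
    });
  }
  return jobs;
}

// Keep compiled output in a Map rather than writing files to disk.
// compile() is a placeholder for the real bundler call.
function buildAll(jobs, compile) {
  const cache = new Map();
  for (const job of jobs) cache.set(job.slug, compile(job));
  return cache;
}

const jobs = planBundles(graphicsRepo, "/work/graphics");
const bundles = buildAll(jobs, job => `/* bundle for ${job.entry} */`);
console.log([...bundles.keys()]);
```

The point of the sketch is that the list of outputs is computed from whatever is in the graphics folder at the moment, which is why a static, off-the-shelf bundler config doesn’t fit.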

After some experimentation, we were able to move pretty seamlessly from Browserify to Rollup as a bundler, without breaking compatibility with older graphics at either Chalkbeat or NPR. This change should put the rig onto firmer ground moving forward, including creating an easier path for moving to the newest versions of D3 (which is only distributed using the ES module import syntax that Rollup natively supports).

While we were in the guts of the rig’s dependencies, we were also able to update some other infrastructure: it now uses the newest version of the AWS client libraries, which we hope will make it more reliable when publishing and synchronizing graphics.

CSV and offline support

If you’ve ever had to set up a fresh newsroom dataviz toolchain, you know that one of the most painful parts of the process is getting authorization right. While the Google integration of the Dailygraphics rig is one of its biggest selling points, it’s also a cryptic trudge through OAuth tokens, environment variables, and a constantly changing service console. The need to authorize against a Google account also means that even with aggressive caching, it’s often difficult to use the rig if you’re traveling or on a less-reliable network connection.

Spurred by a summer of travel and relocation, team member Kae Petrin and I decided to take this opportunity to give the rig the ability to work with local data instead of pulling from Sheets. We did so by adding CSV support to its data pipeline, enabled through a new manifest key.

If you want, you can now create graphics templates that never talk to Google at all, just by swapping references to the COPY variable in the HTML layer over to the new CSV object. This also plays nicely with the rig’s synchronization support: large data files can be excluded from source control, synchronized to S3, processed externally, and accessed as local CSV.
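
For a sense of what “local CSV” data looks like once it reaches a template, here is a minimal parsing sketch. It is not the rig’s own parser, and it doesn’t model the exact shape of the CSV object. The parseCSV function and the sample data are assumptions for illustration. It only shows the usual step of turning a file’s text into an array of row objects keyed by the header row.

```javascript
// Parse CSV text into an array of row objects keyed by the header row.
// Handles quoted fields, escaped quotes ("") and commas inside quotes.
function parseCSV(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote is a literal quote character
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else if (ch !== "\r") {
      field += ch;
    }
  }
  // Flush the last line if the file does not end with a newline.
  if (field !== "" || row.length) {
    row.push(field);
    rows.push(row);
  }
  const [header, ...body] = rows;
  return body.map(cells =>
    Object.fromEntries(header.map((key, i) => [key, cells[i] ?? ""]))
  );
}

// Invented sample data standing in for a file kept next to the graphic.
const sample =
  'school,district,note\n' +
  '"PS 1","District 2","Opens ""late"", per DOE"\n' +
  'PS 9,District 3,\n';

const rows = parseCSV(sample);
console.log(rows);
```

Because the rows are plain objects, a template written against spreadsheet rows should need little more than a change of variable name to read from a file instead.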

Hand in hand with this functionality, there’s now a new --offline flag for the rig that disables the authentication checks it normally performs. This “airplane mode” will still reach out to the Google Drive API if you load a graphic that includes a "sheet" or "doc" key in its manifest, but it won’t ever redirect you to the “authorize account” screen if your connection drops momentarily. That makes it a handy option for developers who want a fully local workflow, or who need the rig to just back off a little bit while on the road.
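
The behavior described above boils down to a small decision table, sketched here. The function names, option names, and return labels are invented for illustration. Only the --offline flag and the "sheet" and "doc" manifest keys come from the rig, and the assumption that a failed check in normal mode always leads to the authorize screen is ours.

```javascript
// Does this graphic's manifest ask for anything from Google?
function usesGoogle(manifest) {
  return "sheet" in manifest || "doc" in manifest;
}

// Hypothetical model of what happens when a graphic is loaded.
function planRequest(manifest, { offline = false, authCheckPassed = false } = {}) {
  const wantsDrive = usesGoogle(manifest);
  if (offline) {
    // Auth checks are skipped entirely, so there is never a redirect.
    // Drive is still contacted if the manifest calls for it.
    return wantsDrive ? "contact-drive" : "local-only";
  }
  // Normal mode: a failed check sends the user to the authorize screen.
  if (!authCheckPassed) return "redirect-to-authorize";
  return wantsDrive ? "contact-drive" : "local-only";
}

// Placeholder sheet ID, not a real document.
const sheetGraphic = { sheet: "PLACEHOLDER_ID" };
const localGraphic = {};

console.log(planRequest(sheetGraphic, { offline: true }));
console.log(planRequest(localGraphic, { offline: true }));
console.log(planRequest(sheetGraphic, { offline: false, authCheckPassed: false }));
```

In this model, a graphic with neither key never touches the network in offline mode, which is the fully local workflow.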

Join us on the cutting edge (or not)

Do these additions sound interesting to you? If so, feel free to pull from our Dailygraphics fork, which is under active development. However, if you’d prefer a little more vetting, many of these features have already been merged upstream into the main NPR version of the rig, where they’re tested by the team there before being enabled. At the time of this writing, the Rollup bundler and AWS upgrades have been merged in, and Docs support is in a branch undergoing testing.