The Astropy Problem

This came across my “professional” social feeds today (meaning the Facebook groups frequented by people who are astronomers and/or scientists).

The paper itself is something that one might expect to actually show up as an opinion piece in Nature or Science, though I doubt it will. It’s a little too biographical for those places.

The general gist of things is that this white paper is calling out what most people already know: code costs money and providing good code to the scientific community is costing some people in terms of their careers.

 

But, open source code is free, right?

This has been the comment that I’ve heard for years now. There are really awesome pieces of code that I use nearly every day that are produced by people who don’t even give me the option to give them money for the amount of time that they save me. And uniformly these people have a day job in the sciences.

When I’ve brought this up to people (especially those of older generations), the response is usually “Well, those people would be writing code anyways…”, which is just bonkers.

The Astropy Way

I’m going to assume that you’ve taken the ten minutes to go read the paper I linked above. If not, go do this now.

I’ll wait, seriously.

TL;DR? Fine.

The short version is that the coding community that supports a vast majority of the world’s scientific research is dependent upon an (unpaid) workforce of people. This is unsustainable in an environment where faculty jobs at academic institutions are biased against people whose efforts allow other people to publish but who rarely get the citation record that they deserve. We also appear to have funding agencies who are perfectly happy to take advantage of these resources but are unwilling to devote any money to their development.

Pass the hat, but everyone must contribute (kind of).

The solution put forward in the paper is to introduce a mandatory line item for funding a dedicated on-site programmer once a project reaches a financial watermark.

On the surface, I think this makes a bit of sense: a project that is big enough will take significant advantage of the code (and ask the developers to support its particular needs), so it should help pay for that support. And while I admire the thought of “If a project is big enough, then it can afford to pay…”, I think that in practice the paper doesn’t go far enough to democratize the system.

It also got me wondering about the raw numbers. Are there really enough whale projects out there that could support several (many?) professional scientist/programmers? So I went digging.

The National Science Foundation (NSF) is pretty awesome, in that it actually publishes all of its yearly awards. You can download a list of everything that you (here assuming you’re an American taxpayer) helped fund this year, and everything is stored as pretty easy-to-parse XML. NASA tends to post long PDFs of the various items that it has funded over the year, but that doesn’t really help me out all that much in terms of ease of access to data.

I pulled down the NSF 2016 data and wrote a little Python notebook to run through all of it (using the handy dandy XML parser library that comes standard in Python as well as some date handling routines in Astropy) to see how many projects the NSF funds to the tune of more than $1m per year. If you figure that the average faculty member might make $100k and university overhead rates are close to 100%, then each person costs about $200k per year, so $1m basically covers five people working full time on a project per year.
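For the curious, here is a rough sketch of that kind of calculation. It is not my actual notebook: it swaps the Astropy date routines for the standard library `datetime` module so it runs anywhere, and the tag names (`Award`, `AwardAmount`, `AwardEffectiveDate`, `AwardExpirationDate`) and the MM/DD/YYYY date format are assumptions about the layout of the NSF files. The sample record is made up, and treating any award shorter than a year as a one-year award is also just a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SALARY = 100_000        # assumed average faculty salary (from the text)
OVERHEAD_RATE = 1.0     # assumed ~100% university overhead (from the text)
THRESHOLD = 1_000_000   # dollars per year that counts as a "whale"


def people_supported(annual_budget, salary=SALARY, overhead=OVERHEAD_RATE):
    """Full-time people a yearly budget covers once overhead is added."""
    return annual_budget / (salary * (1 + overhead))


def annual_rate(award_xml):
    """Average dollars per year for one award record (assumed tag names)."""
    root = ET.fromstring(award_xml)
    # The record may be wrapped in an outer element; accept either layout.
    award = root if root.tag == "Award" else root.find("Award")
    amount = float(award.findtext("AwardAmount"))
    start = datetime.strptime(award.findtext("AwardEffectiveDate"), "%m/%d/%Y")
    end = datetime.strptime(award.findtext("AwardExpirationDate"), "%m/%d/%Y")
    # Assumption: awards shorter than one year are counted as one year.
    years = max((end - start).days / 365.25, 1.0)
    return amount / years


def count_large_awards(xml_docs, threshold=THRESHOLD):
    """Count award records whose yearly rate exceeds the threshold."""
    return sum(1 for doc in xml_docs if annual_rate(doc) > threshold)


# A made-up record for illustration only; not a real NSF award.
SAMPLE = """<rootTag><Award>
<AwardTitle>Hypothetical Example Award</AwardTitle>
<AwardEffectiveDate>01/01/2016</AwardEffectiveDate>
<AwardExpirationDate>12/31/2018</AwardExpirationDate>
<AwardAmount>4500000</AwardAmount>
</Award></rootTag>"""

if __name__ == "__main__":
    print(people_supported(THRESHOLD))   # people covered by $1m per year
    print(annual_rate(SAMPLE))           # dollars per year for the sample
    print(count_large_awards([SAMPLE]))  # sample awards over the threshold
```

To run this over a real download, you would read each XML file in the unzipped archive into a string and pass the list to `count_large_awards`. Awards with missing dates or amounts would need extra handling that this sketch leaves out.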

How many would you guess?

I know from my experience that even space projects like the one that I work on typically operate on a shoestring budget (hence the fancy 2009 VW Golf that I drive…). But maybe there are large collaborations out there that are pulling in far more money…

Wrong!

I would have bet on more than three… wow. And really only two, since one of the awards was actually bridge funding for the NSF management between fiscal years.

 

I think maybe this is hiding some of the cost of these projects, but the idea that any of the grants that are currently being funded by NSF could easily cover an additional $150k in developer costs seems to stretch the imagination. Oh well.

But a good start…

However, I think the one point that this paper does make very nicely is that it’s going to be essential to get some thinking along these lines into the next decadal survey.

If you’re not familiar with the term, a decadal survey (like this one) is where the science community gets together to decide what the priorities should be for the next decade. NASA and NSF usually act on these recommendations, and every large mission (>$1bn) that NASA has produced in the last forty years has been the result of a recommendation out of a decadal survey.

I think the idea here would be to start some kind of groundswell of support for data scientists moving forward. Though I think maybe the target needs to be the NASA centers themselves rather than the broader “academic” science community.

I know that in heliophysics there is a cadre of programmers who maintain the software that a large fraction of the community uses. It would be interesting to know how they’re funded and see if that sort of model could address similar needs in astrophysics. The one problem with this (as the paper points out) is that funding for a specific project or satellite usually comes with the caveat that producing anything other than mission-specific code for the data processing pipeline is off the table.

Which brings us all the way back to the beginning again. It’s a problem, and I’m not sure what the solution could be.
