I've been thinking a little about how to make d8taplex more accessible. One of the challenges is that users don't necessarily know what data, or what variables, are available in the million-plus time series in the system. One idea to help surface this sort of information is to mine the tables for concepts in the labels given to the variables.
I've done a little of this and while there is a long way to go, I can see a couple of trends.
Firstly, country names and the names of other geographic and political areas are extremely common. Thus it would seem appealing to provide some sort of location-based pivot to the data.
Secondly, some of the rarer concepts that appear in multiple data sets (from different sources) might be worth surfacing. For example, crack cocaine, debt redemption, manslaughter, road users, nitrogen dioxide, etc.
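As a rough illustration of what "appears in multiple data sets from different sources" could mean in practice, here is a minimal Python sketch. It is not how d8taplex does it: the rows, source names, and the choice of two-word phrases as a stand-in for "concepts" are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical (source, variable label) pairs; not real d8taplex data.
LABELED = [
    ("source_a", "Crack cocaine seizures"),
    ("source_b", "Arrests for crack cocaine possession"),
    ("source_a", "Nitrogen dioxide emissions"),
    ("source_c", "Annual mean nitrogen dioxide"),
    ("source_b", "Wheat exports"),
]


def ngrams(label, n=2):
    """Return lowercase word n-grams from a label (a crude proxy for concepts)."""
    words = re.findall(r"[a-z]+", label.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def shared_concepts(rows, n=2, min_sources=2):
    """Return phrases that occur in labels from at least `min_sources` sources."""
    sources = defaultdict(set)
    for source, label in rows:
        for gram in ngrams(label, n):
            sources[gram].add(source)
    return sorted(g for g, s in sources.items() if len(s) >= min_sources)


print(shared_concepts(LABELED))
```

Counting distinct sources rather than raw occurrences is the point here: a phrase repeated many times within one data set is less interesting than one that crosses sources.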
Thirdly, the use of parentheticals is very interesting. This may indicate abbreviations, notes, units of measure, magnitudes (e.g., thousands), etc.
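To make the parenthetical idea concrete, here is a small Python sketch that pulls parentheticals out of labels and makes a very crude guess at what each one is. The labels, the list of magnitude words, and the "short and all caps means abbreviation" rule are assumptions for illustration only; real labels would need far more careful handling.

```python
import re

# Hypothetical variable labels; not taken from d8taplex.
LABELS = [
    "Gross domestic product (GDP)",
    "Population (thousands)",
    "Nitrogen dioxide (NO2) concentration (micrograms per cubic metre)",
]

# Matches non-nested parentheticals and captures their contents.
PAREN = re.compile(r"\(([^()]*)\)")

# Assumed, incomplete list of magnitude words.
MAGNITUDES = {"thousands", "millions", "billions"}


def classify(text):
    """Guess the role of a parenthetical with simple heuristics."""
    t = text.strip()
    if t.lower() in MAGNITUDES:
        return "magnitude"
    if t.isupper() and len(t) <= 6:
        return "abbreviation"
    return "other"  # could be a unit of measure, a note, etc.


def parentheticals(label):
    """Return (text, guessed_kind) for each parenthetical in a label."""
    return [(m.strip(), classify(m)) for m in PAREN.findall(label)]


for label in LABELS:
    print(label, "->", parentheticals(label))
```

Anything the heuristics can't place falls into "other", which is probably where units and notes would need a proper lookup table.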
Providing this conceptual angle looks like a useful area of investment, so I will continue to explore.
Thanks to David Joerg of Data Collective for the discussion.

