Data sources

As far as possible, we have used data from official sources such as ONS and Eurostat. One reason for doing this is to make our analysis reproducible and to remove the tool's reliance on proprietary sources.

That said, in a small number of instances we have used proprietary data sources, such as Crunchbase for the analysis of venture capital investment.


We use NUTS2 regions as our geographical unit of analysis. This has allowed us to collect data about regional R&D activity, which is only available at that level. Where possible, we have also calculated indicators at a higher level of granularity (NUTS3) and for policy-relevant LEP boundaries. These will be released when the tool is published later in 2020.

In many cases we have reverse geocoded observations available at a high level of geographical resolution (for example, the geographical coordinates of a higher education institution) using NUTS2 boundary files available from Eurostat. When doing this, we have assigned each observation to a region in the NUTS2 version that was in use when the data were collected or when the events captured in the data took place.
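The assignment step described above can be illustrated with a minimal sketch. It is not the pipeline used for the tool: the boundary sets, region codes, version years and function names below are all hypothetical toy values, and a plain ray-casting test stands in for whatever geospatial library would be used with the real Eurostat boundary files.

```python
# Toy boundary sets keyed by the first year in which each (hypothetical)
# boundary version applies. Each region is a simple polygon given as a
# list of (lon, lat) vertices. Real boundary files are far more detailed.
BOUNDARIES = {
    2010: {"A": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    2016: {
        "A1": [(0, 0), (1, 0), (1, 2), (0, 2)],
        "A2": [(1, 0), (2, 0), (2, 2), (1, 2)],
    },
}


def point_in_ring(lon, lat, ring):
    """Ray-casting test: cast a ray east from the point and count crossings."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def version_for_year(year):
    """Pick the most recent boundary version already in use in `year`."""
    eligible = [v for v in BOUNDARIES if v <= year]
    if not eligible:
        raise ValueError(f"no boundary version available for {year}")
    return max(eligible)


def assign_region(lon, lat, year):
    """Return (boundary version, region code), or (version, None) if unmatched."""
    version = version_for_year(year)
    for code, ring in BOUNDARIES[version].items():
        if point_in_ring(lon, lat, ring):
            return version, code
    return version, None
```

In this toy setup the same coordinates map to region "A" for an observation dated 2012 and to "A2" for one dated 2018, which is the behaviour the version-matching rule is meant to produce.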

Data processing

In general, we have avoided complex data processing beyond what was required to aggregate data at our preferred level of geographical resolution. There are, however, a few exceptions:

  • We have calculated indices of economic complexity for UK regions using the algorithm developed by Hidalgo and Hausmann (2009)

  • We have measured levels of employment in entertainment and cultural sectors using an industrial segmentation based on the methodology developed by Delgado et al. (2015)

  • We have identified UKRI-funded research projects in STEM disciplines using a machine learning analysis of project descriptions presented in Mateos-Garcia (2017)
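To give a sense of the first item in the list above, the following is a minimal sketch of the "method of reflections" from the 2009 economic complexity paper. How the algorithm was adapted to UK regions is not specified here, so the sketch assumes a region-by-sector employment matrix that is binarised with a location quotient threshold of 1. The function names, the threshold and the number of iterations are illustrative assumptions.

```python
def location_quotients(employment):
    """employment[r][s] -> LQ: sector share in the region / sector share overall."""
    n_r, n_s = len(employment), len(employment[0])
    region_totals = [sum(row) for row in employment]
    sector_totals = [sum(employment[r][s] for r in range(n_r)) for s in range(n_s)]
    total = sum(region_totals)
    return [
        [
            (employment[r][s] / region_totals[r]) / (sector_totals[s] / total)
            for s in range(n_s)
        ]
        for r in range(n_r)
    ]


def binarise(lq, threshold=1.0):
    """Mark a sector as present in a region when its LQ reaches the threshold."""
    return [[1 if value >= threshold else 0 for value in row] for row in lq]


def method_of_reflections(m, iterations=2):
    """Iterate region and sector scores on a binary region-by-sector matrix.

    Order 0 is diversity (regions) and ubiquity (sectors). Each further
    order averages the previous-order scores of the connected nodes on
    the other side of the matrix.
    """
    n_r, n_s = len(m), len(m[0])
    diversity = [sum(row) for row in m]
    ubiquity = [sum(m[r][s] for r in range(n_r)) for s in range(n_s)]
    if 0 in diversity or 0 in ubiquity:
        raise ValueError("drop empty regions and sectors before iterating")
    k_r = [float(d) for d in diversity]
    k_s = [float(u) for u in ubiquity]
    for _ in range(iterations):
        new_k_r = [
            sum(m[r][s] * k_s[s] for s in range(n_s)) / diversity[r]
            for r in range(n_r)
        ]
        new_k_s = [
            sum(m[r][s] * k_r[r] for r in range(n_r)) / ubiquity[s]
            for s in range(n_s)
        ]
        k_r, k_s = new_k_r, new_k_s
    return k_r, k_s
```

After one iteration, a region's score is the average ubiquity of the sectors in which it is specialised, so lower values point to regions specialised in rarer sectors. Published complexity indices usually rank or standardise scores from a higher-order iteration; that step is left out of this sketch.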


Where relevant, we flag indicators that are based on experimental methodologies or data sources.