One reason for doing this is to make our analysis reproducible and to remove the tool's reliance on proprietary sources.
Having said this, in a small number of instances we have used proprietary data sources, such as Crunchbase.
We use NUTS2
In many cases we have reverse geocoded observations available at a high level of geographical resolution (for example, the geographical coordinates of a higher education institution) using NUTS2 boundary files available from Eurostat. When doing this, we have assigned observations to regions in the NUTS2 version that was in use when the data were collected or when the events captured in the data took place.
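The assignment step described above can be illustrated with a minimal sketch. It is not the pipeline used for this analysis: the boundary polygons, region codes and version years below are hypothetical toy values, and the point-in-polygon test is a plain ray-casting routine written with the standard library only. In practice the boundaries would be read from the Eurostat files and the lookup would more likely be done with a spatial library.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # ring of vertices, in order

# Hypothetical toy boundaries: boundary version year -> {region code: polygon}.
# Real boundaries would come from the Eurostat NUTS2 boundary files.
BOUNDARIES: Dict[int, Dict[str, Polygon]] = {
    2010: {"XX01": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    2016: {
        "XX11": [(0, 0), (2, 0), (2, 4), (0, 4)],
        "XX12": [(2, 0), (4, 0), (4, 4), (2, 4)],
    },
}


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def boundary_version_for(year: int) -> int:
    """Pick the most recent boundary version dated on or before the given year."""
    candidates = [version for version in BOUNDARIES if version <= year]
    if not candidates:
        raise ValueError(f"no boundary version available for {year}")
    return max(candidates)


def assign_region(point: Point, year: int) -> Optional[str]:
    """Return the code of the region containing the point, or None if unmatched."""
    version = boundary_version_for(year)
    for code, polygon in BOUNDARIES[version].items():
        if point_in_polygon(point, polygon):
            return code
    return None
```

Calling `assign_region((3, 1), 2018)` looks the point up in the later toy version, while the same call with `2012` uses the earlier one, so an observation is always matched against the boundaries that applied when it was recorded.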
In general, we have avoided complex data processing beyond what was required to aggregate data at our preferred level of geographical resolution. There are, however, a few exceptions to this:
We have calculated indices of economic complexity for UK regions using the algorithm developed by Hidalgo and Hausmann (2009)
We have measured levels of employment in entertainment and cultural sectors using an industrial segmentation based on the methodology developed by Delgado et al (2015)
We have identified UKRI-funded research projects in STEM disciplines using a machine learning analysis of project descriptions presented in Mateos-Garcia (2017)
We flag indicators that are based on experimental methodologies or data sources where relevant.
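To give a sense of the first of these exceptions, the sketch below applies the iterative "method of reflections" from Hidalgo and Hausmann (2009) to a region-by-sector matrix. It is an illustration under stated assumptions rather than the exact procedure used here: the function names are hypothetical, and binarising the matrix with a location quotient threshold of 1 is our assumption about how regional specialisation might be defined.

```python
from typing import List, Tuple


def location_quotients(employment: List[List[float]]) -> List[List[float]]:
    """Share of each sector in a region, relative to its share across all regions."""
    region_totals = [sum(row) for row in employment]
    sector_totals = [sum(col) for col in zip(*employment)]
    grand_total = sum(region_totals)
    return [
        [
            (value / region_totals[r]) / (sector_totals[s] / grand_total)
            for s, value in enumerate(row)
        ]
        for r, row in enumerate(employment)
    ]


def binarise(lq: List[List[float]], threshold: float = 1.0) -> List[List[int]]:
    """Assumption: a region is 'specialised' in a sector if its LQ is at least 1."""
    return [[1 if value >= threshold else 0 for value in row] for row in lq]


def method_of_reflections(
    binary: List[List[int]], iterations: int = 2
) -> Tuple[List[float], List[float]]:
    """Iterate region and sector scores, each averaging the other's previous scores.

    Assumes every region and every sector has at least one nonzero entry.
    """
    diversity = [sum(row) for row in binary]        # k_{r,0}: sectors per region
    ubiquity = [sum(col) for col in zip(*binary)]   # k_{s,0}: regions per sector
    k_region = [float(d) for d in diversity]
    k_sector = [float(u) for u in ubiquity]
    for _ in range(iterations):
        # k_{r,N}: mean of k_{s,N-1} over the sectors the region specialises in
        new_region = [
            sum(m * k for m, k in zip(row, k_sector)) / diversity[r]
            for r, row in enumerate(binary)
        ]
        # k_{s,N}: mean of k_{r,N-1} over the regions specialised in the sector
        new_sector = [
            sum(m * k for m, k in zip(col, k_region)) / ubiquity[s]
            for s, col in enumerate(zip(*binary))
        ]
        k_region, k_sector = new_region, new_sector
    return k_region, k_sector
```

A typical use would chain the three steps, for example `method_of_reflections(binarise(location_quotients(employment)))`, where `employment` is a hypothetical table of employment counts with one row per region and one column per sector. Scores are usually standardised across regions before being compared.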