GII 2020 Special Section: Matching Science & Technology (S&T) Clusters to Population

Tuesday, October 20, 2020

As in previous years, the Global Innovation Index 2020 features a special section ranking the top 100 science and technology (S&T) clusters in the world. In this year’s edition, we adjust the methodology to aggregate population for each cluster, which improves how an S&T “cluster” is defined geographically.


Using population data to compare clusters substantially improves our analysis. Unfortunately, our “bottom up” clusters align poorly with typical population statistics. They almost never conform to the standard administrative boundaries for which population statistics are published (for example, census blocks in the U.S. or NUTS-2/3 regions in the European Union). In addition, finding consistent administrative population data across multiple countries proved difficult.


To address these issues, we turned to the European Commission’s Global Human Settlement population distribution data. This data provides an estimate of population for every 250–300 square meters. Because it disaggregates census population data based on satellite imagery, we are able to plot population based on where people actually live, rather than on arbitrary political boundaries. Having the population distribution at such a high level of detail allows us to reaggregate population into custom geographies (i.e., our clusters). Thus, just like our geocoded inventor and author locations, this population data allows us to define total population from the bottom up.


Matching the population data with our clusters is done geographically by capturing all pixels that are contained within a cluster’s area. For the purposes of aggregating population, we defined a cluster’s area as all space within 0.05 degrees of each inventor’s location. Once the buffer radius was applied, we combined all of a cluster’s buffered areas into one final polygon. We obtained the total population by summing the values of all the population pixels contained in that final cluster polygon.
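
The buffer-and-sum step described above can be sketched in a few lines of Python. This is a simplified illustration, not the workflow actually used for the report: it assumes each population pixel is represented by its centre coordinate, treats a pixel as inside the cluster when that centre falls within the buffer radius of at least one point, and measures distance directly in degrees. The function name `cluster_population` and the toy coordinates and population values are hypothetical.

```python
from math import hypot


def cluster_population(points, cells, radius_deg=0.05):
    """Sum the population of grid cells covered by the union of point buffers.

    points: iterable of (lon, lat) tuples for the geocoded locations.
    cells:  iterable of (lon, lat, population) tuples, one per pixel centre.
    """
    total = 0.0
    for cell_lon, cell_lat, population in cells:
        # A cell centre inside at least one circular buffer lies inside the
        # union of all buffers, so it is counted exactly once.
        if any(
            hypot(cell_lon - lon, cell_lat - lat) <= radius_deg
            for lon, lat in points
        ):
            total += population
    return total


# Toy data invented purely for illustration (not real coordinates or counts).
points = [(0.00, 0.00), (0.08, 0.00)]
cells = [
    (0.01, 0.01, 1000.0),  # close to the first point
    (0.04, 0.00, 500.0),   # inside both buffers, still counted once
    (0.12, 0.00, 2000.0),  # within 0.05 degrees of the second point only
    (0.30, 0.30, 9000.0),  # far from every point, excluded
]

print(cluster_population(points, cells))
```

Testing each pixel against the individual buffers gives the same result as first merging the buffers into one polygon and then selecting the pixels inside it, which is why no explicit polygon union appears in the sketch.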


We preferred a buffer to possible alternative methods because it captures nearby population pockets. For example, if we had limited our cluster area to edges defined only by our cluster points, we might have missed dense population areas that were just next to one of our points. This would have caused an underestimation of the population. As can be seen in Figure A-1, if we had used only our cluster points to define the edges of San Jose-San Francisco, we would have missed the dense urban area of Concord, California. The use of buffers also minimizes errors that could occur from overreliance on imprecise geolocation. For example, our scientific publication data is geocoded only at the city level. Thus, using a buffer for these points more appropriately reflects the limited precision of some of our geolocated points.


Figure A-1: Comparing buffer radius to define a science and technology “cluster”

Source: WIPO Statistics Database, March 2020; Schiavina et al., 2019.


Buffers require a choice of radius size, that is, how much area around each point should be included. Similar to choosing the radius and density parameters used for DBSCAN, we chose a buffer radius that minimizes the potential for false negatives (not capturing population areas that should be included in the cluster) and false positives (capturing areas that should not be included). Increasing the buffer radius decreases the risk of underestimating the population but increases the risk of overestimating it. If we had used 0.01 degrees as the radius, we would not have captured Concord, causing an underestimation. However, if we had chosen 0.10 degrees, we would have captured the city of Antioch, California, which is in the next valley over from Concord. This would have caused an overestimation of the population. Therefore, we calculated population using a number of different buffer radii and looked at the changes in the population estimates, preferring the radius that minimized large shifts. When compared to other distances, a radius of 0.05 degrees minimized large shifts in the total population calculated across all clusters and also minimized the maximum population shift of any one cluster.
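
One way to picture this sensitivity check is to compute the relative change in total population between consecutive candidate radii and look for the radius whose neighbouring shifts are smallest. The sketch below is only an assumed reading of "minimized large shifts"; the text does not give the exact criterion that was applied. The function names `relative_shifts` and `steadiest_radius` are hypothetical, and the population totals are made-up numbers, not results from the report.

```python
def relative_shifts(totals_by_radius):
    """Relative change in population between consecutive candidate radii."""
    radii = sorted(totals_by_radius)
    shifts = {}
    for r_prev, r_next in zip(radii, radii[1:]):
        prev_total = totals_by_radius[r_prev]
        shifts[(r_prev, r_next)] = (totals_by_radius[r_next] - prev_total) / prev_total
    return shifts


def steadiest_radius(totals_by_radius):
    """Pick the interior radius whose larger adjacent shift is smallest.

    Assumed selection rule for illustration; needs at least three radii.
    """
    radii = sorted(totals_by_radius)
    shifts = relative_shifts(totals_by_radius)

    def worst_adjacent_shift(i):
        # Compare the jump into this radius with the jump out of it.
        into = abs(shifts[(radii[i - 1], radii[i])])
        out_of = abs(shifts[(radii[i], radii[i + 1])])
        return max(into, out_of)

    best_index = min(range(1, len(radii) - 1), key=worst_adjacent_shift)
    return radii[best_index]


# Invented totals (radius in degrees -> population), for illustration only.
toy_totals = {0.01: 4.0e6, 0.03: 5.5e6, 0.05: 5.8e6, 0.07: 6.0e6, 0.10: 7.5e6}

for (r_prev, r_next), shift in relative_shifts(toy_totals).items():
    print(f"{r_prev:.2f} -> {r_next:.2f}: {shift:+.1%}")
print("steadiest radius:", steadiest_radius(toy_totals))
```

In practice the same comparison would be run per cluster as well as on the overall total, since the text describes both the total shift across all clusters and the maximum shift of any single cluster.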


This adapted excerpt comes from the Global Innovation Index 2020: Top 100 Science and Technology Clusters special section, authored by Kyle Bergquist and Carsten Fink, both from the World Intellectual Property Organization (WIPO).
