Resource allocation for biomedical research: analysis of investments by major funders


For the first time, data from major funders of biomedical research are collated in a harmonised and standardised way through the World RePORT platform, allowing valuable information on what is being funded, by whom and where to be analysed and shared on a yearly basis and at a global level. This fills an important knowledge gap: previously, this type of information was available only for some diseases or countries [1–6].

The World RePORT platform is hosted by the United States of America’s National Institutes of Health (NIH) and represents a coordinated and collaborative data-sharing effort among 10 major funders of health research that are members of the Heads of International Research Organizations group. Collectively, 8 of the 10 funders that have reported since 2012 account for approximately 76% of the annual health research expenditure of 41 major public and philanthropic funders of health research, as reported by Viergever and Hendriks in 2015.

The specific objectives of this study are to explore how the biomedical research investments of the 10 funders that reported data in 2016 were allocated among recipient countries and organisations, and to develop a method using text data-mining techniques to classify these grants into health categories. This analysis allows an assessment of what is being funded, both broadly and in particular health areas of global importance, such as research grants for neglected diseases and for pathogens on the research and development (R&D) blueprint list, which WHO has identified as priority pathogens because of their expected highly infectious nature [9, 10].

This analysis is part of the work of the World Health Organization Global Observatory on Health Research and Development, whose overall goal is to enable evidence-informed deliberations and decisions on priorities for new investments in health R&D.

Data source

Grant data for 2016 were collected using the export function of the World RePORT online platform. Where available, these data were complemented with grant abstracts collected directly from each funder’s website, which were matched to the exported World RePORT records using the unique grant identifier number.
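The abstract-matching step can be sketched in Python as follows. This is a minimal illustration under assumed conditions, not the authors' code: the Grant record layout, the normalise_id rule and the attach_abstracts function are hypothetical, and the real World RePORT export columns may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Grant:
    # Hypothetical record layout; the real World RePORT export columns may differ.
    grant_id: str
    funder: str
    title: str
    abstract: Optional[str] = None


def normalise_id(grant_id: str) -> str:
    """Assumed normalisation so that the same identifier matches across sources."""
    return grant_id.strip().upper()


def attach_abstracts(grants: List[Grant], scraped: Dict[str, str]) -> int:
    """Copy abstracts collected from funders' websites onto exported grants.

    Matching uses only the grant identifier. Grants without a match keep
    abstract=None. Returns the number of grants that received an abstract.
    """
    # Index the scraped abstracts by normalised ID, skipping empty texts.
    lookup = {normalise_id(k): v.strip() for k, v in scraped.items() if v and v.strip()}
    attached = 0
    for grant in grants:
        text = lookup.get(normalise_id(grant.grant_id))
        # Never overwrite an abstract that is already present.
        if grant.abstract is None and text is not None:
            grant.abstract = text
            attached += 1
    return attached
```

For example, a scraped entry keyed " abc-001 " is attached to the exported grant ABC-001, while grants without a matching identifier keep abstract=None and can only be classified from their titles.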

The World RePORT data include information on direct (primary) grants provided to recipient institutions as well as collaborations with other institutions resulting from these grants (indirect grants administered by recipient institutions).

Data analysis

The analysis first explored the distribution of direct grants according to the parameters below and then explored the nature of the collaborations between institutions that resulted from those direct grants. The following questions were explored (the analysis is also available as interactive data visualisations from the WHO Global Observatory on Health R&D, which allow several of these parameters to be explored in relation to each other [12, 13]):

- Distribution of grants by:
  - funder
  - grant recipients’ region, income group, country and institutions
  - type of grant (e.g. research, training)
  - health category: disease or condition
- Average grant duration
- Nature of collaborations between recipients of direct grants and institutions they collaborated with

Data on funding amounts for 2016 were also explored but, because they were not yet complete or harmonised, they were not included in this analysis.

Consistency and internal validity checks were performed using Microsoft Excel. These included checks for a valid range of years and for uniform country names.
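The checks themselves were done in Excel. For readers who want to script them, a minimal Python sketch of the same two kinds of check (a valid range of years and uniform country names) might look like this. The year bounds, the alias table and the field names start_year, end_year and country are assumptions for illustration.

```python
from typing import Dict, List

# Assumed plausibility bounds for grant start/end years (not from the study).
MIN_YEAR, MAX_YEAR = 1980, 2040

# Hypothetical alias table mapping variant spellings to one canonical country name.
COUNTRY_ALIASES: Dict[str, str] = {
    "usa": "United States of America",
    "u.s.a.": "United States of America",
    "united states": "United States of America",
    "uk": "United Kingdom",
    "united kingdom of great britain and northern ireland": "United Kingdom",
}


def canonical_country(name: str) -> str:
    """Collapse whitespace and case, then map known variants to a canonical name."""
    key = " ".join(name.lower().split())
    return COUNTRY_ALIASES.get(key, " ".join(name.split()))


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of consistency problems found in one grant record."""
    problems: List[str] = []
    years: Dict[str, int] = {}
    for field in ("start_year", "end_year"):
        raw = str(record.get(field) or "").strip()
        if not raw.isdigit():
            problems.append(f"{field} is not a valid year: {raw!r}")
            continue
        years[field] = int(raw)
        if not MIN_YEAR <= years[field] <= MAX_YEAR:
            problems.append(f"{field} outside {MIN_YEAR}-{MAX_YEAR}: {raw}")
    # Only compare years when both parsed successfully.
    if len(years) == 2 and years["start_year"] > years["end_year"]:
        problems.append("start_year is after end_year")
    if not str(record.get("country") or "").strip():
        problems.append("country is missing")
    return problems
```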

Classification of grants by region and income group

Regional classification follows the WHO regional groupings. Country income group classification is based on the World Bank’s World Development Indicators. When a country or area was not included in the World Bank income classification list (2% of the data), we searched online for the most recent reliable data on gross domestic product per capita for that area and applied the cut-off points for income groupings proposed by the World Bank to classify it into one of the four income groups.
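The fallback classification for areas missing from the World Bank list amounts to comparing a per-capita value against three ascending cut-offs. The sketch below assumes that each cut-off is the inclusive upper bound of a group. The EXAMPLE_CUTOFFS values are illustrative placeholders in the range of World Bank thresholds of that period, not figures taken from this paper, and should be replaced with the official cut-offs for the year analysed.

```python
from bisect import bisect_left
from typing import Sequence

INCOME_GROUPS = ("low", "lower-middle", "upper-middle", "high")

# Illustrative upper bounds (US$ per capita) for the low, lower-middle and
# upper-middle groups. Assumed values: substitute the official World Bank
# cut-offs for the year being analysed.
EXAMPLE_CUTOFFS = (1005.0, 3955.0, 12235.0)


def income_group(per_capita_usd: float, cutoffs: Sequence[float] = EXAMPLE_CUTOFFS) -> str:
    """Map a per-capita value to one of the four income groups.

    Each cut-off is treated as the inclusive upper bound of a group, so a value
    equal to a cut-off falls into the lower group.
    """
    if len(cutoffs) != 3 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("expected three ascending cut-off values")
    # bisect_left returns the index of the first cut-off >= the value,
    # which is exactly the index of the matching income group.
    return INCOME_GROUPS[bisect_left(cutoffs, per_capita_usd)]
```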

Classification of grants by type

To determine the type of grant, we searched the websites of major health research funders for existing taxonomies, glossaries or categories of grant types (such as the National Science Foundation’s glossary and the NIH’s glossary and acronym list) and contacted the focal points of each of the World RePORT platform funders for any unpublished sources. The lists we retrieved generally consisted of long keyword lists that did not appear to reflect a deliberate classification of projects by type (e.g. outcomes, software, database, evaluation, anthropology). We therefore developed our own synonyms list to capture the various terms used to refer to the following categories that emerged from the data: core institutional funding, training (e.g. postgraduate degrees), capacity strengthening (e.g. fellowship, prize), and meetings and networking. All other grants were classified as research. The categories and their synonym lists were refined and expanded over several iterations during data cleaning and analysis by reviewing the grant titles and searching, in a snowball manner, for other ways of expressing each category, including language variations. The search continued until no further synonyms were found.
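The grant-type rule described above (assign a non-research category if a synonym matches, otherwise default to research) can be sketched as a whole-word, case-insensitive keyword search over grant titles. The synonym lists below are a hypothetical, abridged stand-in for the study’s lists, and testing categories in a fixed order is an assumed tie-breaking rule.

```python
import re
from typing import Dict, List

# Hypothetical, heavily abridged synonym lists; the study's own lists were
# built iteratively from grant titles and are far longer.
GRANT_TYPE_SYNONYMS: Dict[str, List[str]] = {
    "core institutional funding": ["core funding", "institutional support"],
    "training": ["phd", "doctoral", "postgraduate", "studentship"],
    "capacity strengthening": ["fellowship", "prize"],
    "meetings and networking": ["conference", "symposium", "workshop",
                                "meeting", "meetings", "networking"],
}


def _compile(terms: List[str]) -> re.Pattern:
    # Whole-word, case-insensitive match; longer terms first so they win ties.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# Categories are tested in dictionary order, an assumed precedence rule.
_PATTERNS = {category: _compile(terms) for category, terms in GRANT_TYPE_SYNONYMS.items()}


def grant_type(title: str) -> str:
    """Return the first matching non-research category, else 'research'."""
    for category, pattern in _PATTERNS.items():
        if pattern.search(title):
            return category
    return "research"
```

With these lists, "International Symposium on Malaria Control" is classified as meetings and networking, while "Mechanisms of tau aggregation" falls through to research. Because of the word boundaries, "Postdoctoral" does not match "doctoral"; variants like this would need to be added during iterative refinement of the lists.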

Sensitivity analysis for the health category classification approach

To assess the accuracy of the disease categorisation algorithm, we first stratified the data by funder and calculated each funder’s percentage contribution to the total number of direct grants in 2016. We then drew a random sample aiming for 100 records, representing a confidence level of 95%. The sample was weighted by each funder’s contribution; after rounding up, this resulted in 107 records. Indirect grants (resulting from collaborations with primary grant recipients) were excluded from this analysis because they had the same title and abstract as the corresponding direct grants. The sample was drawn from the whole dataset, including grants that were ultimately not classified.
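The funder-weighted sample can be sketched as a proportional allocation in which each funder’s share of the target is rounded up, which is why a target of 100 records can end up larger (107 in this study). The function names, the fixed random seed and the (grant_id, funder) input format are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def allocate(counts: Dict[str, int], target: int = 100) -> Dict[str, int]:
    """Proportional allocation of `target` records across funders, rounding each share up."""
    total = sum(counts.values())
    # Integer ceiling of target * n / total avoids floating-point surprises.
    return {funder: (target * n + total - 1) // total for funder, n in counts.items()}


def stratified_sample(direct_grants: Sequence[Tuple[str, str]], target: int = 100,
                      seed: int = 2016) -> List[str]:
    """Draw a funder-stratified random sample of grant IDs.

    direct_grants holds (grant_id, funder) pairs; indirect grants are assumed
    to have been removed beforehand, as described in the text.
    """
    by_funder: Dict[str, List[str]] = defaultdict(list)
    for grant_id, funder in direct_grants:
        by_funder[funder].append(grant_id)
    sizes = allocate({f: len(ids) for f, ids in by_funder.items()}, target)
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    sample: List[str] = []
    for funder, ids in by_funder.items():
        # Sample without replacement within each funder's stratum.
        sample.extend(rng.sample(ids, min(sizes[funder], len(ids))))
    return sample
```

For instance, shares of 76.24%, 12% and 11.76% of a 100-record target become allocations of 77, 12 and 12, i.e. 101 records in total.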

Two authors (AHR and TA) independently reviewed the sample. At the end of the process, the reviewers’ codings were compared, and any discrepancies were resolved by consensus. The following process was used:

1. If a classification was available, record (yes or no) whether the disease categorisation is accurate.
2. For inaccurate or no classification, classify the reasons into the following categories:
   - Use of unspecific or highly technical language without reference to a disease (e.g. molecular biology, cell biology, biochemistry, basic sciences)
   - General topics with no disease focus, including non-research types of grants such as training or core funding
   - New synonyms discovered
   - The disease was not the first mentioned close to the beginning of the text field
   - The topic of the grant was on more than one disease
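The two-reviewer protocol can be represented as structured records, so that disagreements are easy to list before the consensus step. In this hypothetical sketch, accurate=None stands for "no classification", the Reason values mirror the categories listed above, and discrepancies returns the grants coded differently by the two reviewers (or coded by only one of them).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Reason(Enum):
    """Reasons for an inaccurate or missing classification, as listed in the text."""
    UNSPECIFIC_OR_TECHNICAL = "unspecific or highly technical language, no disease"
    GENERAL_TOPIC = "general topic with no disease focus"
    NEW_SYNONYM = "new synonym discovered"
    NOT_FIRST_MENTIONED = "disease not the first mentioned"
    MULTIPLE_DISEASES = "grant topic covers more than one disease"


@dataclass(frozen=True)
class Review:
    grant_id: str
    accurate: Optional[bool]        # None means the algorithm assigned no category
    reason: Optional[Reason] = None

    def __post_init__(self) -> None:
        # A reason is required exactly when the classification is missing or wrong.
        if (self.accurate is not True) != (self.reason is not None):
            raise ValueError(f"{self.grant_id}: reason must be given iff not accurate")


def discrepancies(a: Dict[str, Review], b: Dict[str, Review]) -> List[str]:
    """Grant IDs coded differently by the two reviewers, or coded by only one."""
    differing = {gid for gid in a.keys() & b.keys() if a[gid] != b[gid]}
    return sorted(differing | (a.keys() ^ b.keys()))
```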

Distribution of grants by funder, type of grant and average grant duration

As shown in Table 1, a total of 69,420 grants were provided by the 10 funding organisations in 2016. The United States of America’s NIH funded the greatest number of grants (52,928; 76%) and had the longest average grant duration (6 years and 10 months). Of all grants, 70.4% (48,879) were for research, followed by training (13,008; 18.7%) and meetings (2907; 4.2%) (Fig. 1).
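Counts, percentage shares and average durations such as those reported for Table 1 and Fig. 1 can be tabulated with a few lines of Python. This is an illustrative sketch: durations are assumed to be stored in months, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def share_table(counts: Mapping[str, int]) -> List[Tuple[str, int, float]]:
    """(label, count, percentage of total rounded to one decimal), largest first."""
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, n, round(100 * n / total, 1)) for label, n in rows]


def mean_duration_months(grants: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Average grant duration per funder from (funder, duration_in_months) pairs."""
    sums: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for funder, months in grants:
        sums[funder] += months
        counts[funder] += 1
    return {f: sums[f] / counts[f] for f in sums}


def as_years_months(months: float) -> str:
    """Render a duration in months as e.g. '6 years and 10 months'."""
    years, rest = divmod(round(months), 12)
    return f"{years} years and {rest} months"
```

Passing the counts reported above, with the remaining 4626 grants (69,420 minus research, training and meetings) grouped as "all other types", to share_table gives 70.4%, 18.7% and 4.2% for research, training and meetings, and as_years_months(82) returns "6 years and 10 months".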

Distribution of grants by recipients’ region, income group, country and institution

By income group, grant recipients in high-income countries received 98.9% of all grants, whereas those in low-income countries received only 0.2% (165) (Table 2). Among the 450 grants received by African countries (Table 3), South Africa (an upper–middle-income country) received the highest number (156; 34.7%) and was fifth on the list of the top 10 countries receiving the highest number of grants. The remaining nine countries on that list were in the European Region (7) and the Region of the Americas (2) (Table 3).

Distribution of grants by health category

Of the grants that could be assigned to a health category, almost three-quarters were for non-communicable diseases (72%; 40,035), followed by communicable, maternal, perinatal and nutritional conditions (20%; 11,123) and injuries (6%; 3056) (Table 4, Fig. 2).

Among non-communicable diseases, 24% (9483 grants) were for malignant neoplasms, followed by mental and substance use disorders (15%; 5945), neurological conditions (12%; 4981), and cardiovascular diseases (11%; 4473). Among communicable, maternal, perinatal and nutritional conditions, nearly 80% of grants (8826) were for infectious and parasitic diseases, followed by respiratory infections (7%; 738), nutritional deficiencies (6%; 651) and neonatal conditions and maternal conditions (both at 4%; 496 and 412, respectively) (Table 4).

Looking at selected health areas of global importance, grants for neglected tropical diseases represented 1.1% (792) of all grants; dengue (16%; 125 grants) and leishmaniasis (13%; 102 grants) were the two individual diseases that received the highest number of grants. Similarly, 0.4% (274) of all grants were for one of the priority diseases on the WHO list of highly infectious pathogens (R&D blueprint pathogens); 83% of these were for Ebola virus disease (43%; 117), Zika virus disease (32%; 89) and severe acute respiratory syndrome (8%; 21).

Nature of collaborations resulting from direct grants

Around 10% (6918) of direct grants resulted in collaborations with other institutions, although these collaborations did not always involve a transfer of funds from the primary recipient to the collaborating institutions. Of these direct grants, 96.4% (6669) had been awarded to recipients in high-income countries (Table 5), and 75.8% (14,619) of the resulting collaborations were with institutions in high-income countries. In fact, for each income group, collaborations were most likely to be with institutions in the same income group, followed by institutions in high-income countries. For example, grant recipients in low-income countries (66) collaborated most with institutions in low-income countries (88), followed by institutions in high-income (78), lower–middle-income (11) and upper–middle-income (8) countries (Table 5).
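The income-group pattern described above is a cross-tabulation of collaborations by the primary recipient’s income group and the collaborator’s income group. A minimal sketch, assuming one (recipient group, collaborator group) pair per collaborating institution; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def collaboration_matrix(links: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Cross-tabulate collaborations by income group.

    Each link is (primary recipient's income group, collaborator's income group),
    one entry per collaborating institution.
    """
    matrix: Dict[str, Counter] = {}
    for recipient_group, partner_group in links:
        matrix.setdefault(recipient_group, Counter())[partner_group] += 1
    return matrix


def ranked_partners(matrix: Dict[str, Counter], recipient_group: str) -> List[Tuple[str, int]]:
    """Partner income groups for one recipient group, most frequent first."""
    return matrix.get(recipient_group, Counter()).most_common()
```

Feeding in the low-income row as reported above (88, 78, 11 and 8 collaborations) makes ranked_partners(m, "low") list the low, high, lower-middle and upper-middle income groups, in that order.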

Sensitivity analysis

Table 6 describes the sample size for the sensitivity analysis and each funder’s percentage contribution to the total number of direct grants (69,420) in 2016. After each funder’s allocation was rounded up, the sample consisted of 107 records.

Table 7 shows that, of the random sample of 107 grants, 81% were assigned to a health category and, in 91% of these cases, the classification was accurate. Classification accuracy was 98% when the title was used, compared with 84% when the abstract was used. However, around 50% of classified grants were classified from the abstract, which shows its usefulness. In 40% of the cases in which a grant was not classified, no abstract was available. In the 28 cases where grants were either not classified or inaccurately classified, the main reasons were unspecific or very technical language with no disease mentioned (11; 39%), a general topic not linked to a specific disease (7; 25%), or the presence of new synonyms that would have allowed a classification to be made (9; 32%).

Overall, a data-mining algorithm that selects the first mention of a disease in the title or, failing this, in the abstract appears to yield reliable results: in only 1% of classified grants (1/87) was the primary disease not the first one mentioned in the title or abstract. In that case, the attributed disease was associated with the primary disease topic of the research.
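The first-mention rule can be sketched as a single regular-expression search: the leftmost synonym in the title determines the disease, and the abstract is searched only when the title contains none. This is an illustrative reconstruction, not the authors’ implementation; the synonym dictionary, the disease labels and the function names are hypothetical, and the real synonym list (available on the Observatory website) is far larger.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical, abridged synonym list mapping each synonym (lower case) to a
# disease label. Inflected forms (e.g. plurals) need their own entries.
SYNONYMS: Dict[str, str] = {
    "malaria": "Malaria",
    "plasmodium falciparum": "Malaria",
    "dengue": "Dengue",
    "ebola": "Ebola virus disease",
    "zika": "Zika virus disease",
    "cancer": "Malignant neoplasms",
    "tumour": "Malignant neoplasms",
    "tumor": "Malignant neoplasms",
}

# One case-insensitive, whole-word alternation. re.search returns the leftmost
# match; listing longer synonyms first makes the longest one win at that position.
_TERMS = sorted(SYNONYMS, key=len, reverse=True)
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _TERMS)) + r")\b", re.IGNORECASE)


def first_mention(text: Optional[str]) -> Optional[str]:
    """Disease label of the earliest synonym in `text`, or None."""
    if not text:
        return None
    match = PATTERN.search(text)
    return SYNONYMS[match.group(1).lower()] if match else None


def classify(title: str, abstract: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (disease, field used): the title is tried first, then the abstract."""
    for field, text in (("title", title), ("abstract", abstract)):
        disease = first_mention(text)
        if disease is not None:
            return disease, field
    return None, None
```

For example, classify("Zika and dengue co-infection in pregnancy", None) returns ("Zika virus disease", "title") because Zika is mentioned first, whereas a title with no known synonym falls back to the abstract. Whole-word matching means "Tumours" is not matched by "tumour", so plural forms must be listed as separate synonyms.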


The analysis presented in this paper provides, for the first time, a comprehensive overview of what is being funded, by whom and where, among major international funders of biomedical research, globally and for all disease areas.

The analysis highlights important findings on current resource allocation decisions and on the nature and reach of research collaborations across regions. These include the large share (72%) of non-communicable diseases among grants assigned to a health category, the very small proportion of direct grants reaching low-income countries (0.2%), and the fact that neglected diseases, such as those on the WHO list of neglected tropical diseases, remain very neglected in terms of R&D investment (only 1.1% of all grants).

These findings are consistent with a recent analysis of health products in the pipeline, from discovery to market launch, for all diseases globally, which showed that 87% of products are for non-communicable diseases and less than 0.5% are for one of the diseases on the WHO list of neglected tropical diseases.

Additional details and further breakdowns of the analysis presented in this paper can be explored on the WHO Global Observatory on Health R&D website, where several questions can be examined together (by funder, disease, institution, etc.) [12, 13].

This information will help funders of health research explore how best to increase efficiency, coordinate investments, contribute to capacity for health research and focus on areas of need and gaps. It will also help researchers identify areas where research among these funders is lacking or abundant, the topics of interest and expertise of research institutions for possible future collaborations, and the main areas of interest of these funders.

The Observatory will continue to update this analysis with new data, which will, over time, allow trends in research allocation and collaboration to be analysed, including the extent to which areas where the public health needs of low- and middle-income countries are greatest receive research funding, and the extent to which research institutions in these countries benefit from these grants.

This paper also makes an important contribution to automated data-mining methodologies applied to health data by developing and testing the hypothesis that the primary disease focus of a submission is most likely to be the disease mentioned first, closest to the beginning of the text field. That this also held for the abstract is very encouraging, as almost 50% of classified grants were classified using the abstract field, which allowed a higher proportion of grants to be classified. That said, the title was the most accurate field for text data mining when it was comprehensively written.

Overall, and considering the results of the sensitivity analysis, this method provides a reasonable way to categorise and analyse many databases by health category, which is important for monitoring and setting priorities for new investments in health research and development. The health category and synonyms lists are available on the Observatory website and will be periodically updated with new synonyms to encourage further data analysis and knowledge-sharing in this field.

As with any analysis of this type, there are several limitations, including the small number of funders included, the likelihood that some grants were inaccurately classified by health category or type, and the fact that some funders could not account for all the collaborations resulting from their primary grants because they lacked information on them.

That said, the funders included in this analysis are estimated to contribute a high proportion of annual investments in health research globally, and the results of the sensitivity analysis of the data-mining method yielded very encouraging results. Therefore, these findings can be considered a reasonable indication of what is being funded by these funders and can serve as a basis for the expansion of this analysis and further improvement in funder and research grant databases. Most importantly, the findings presented here provide various insights on important resource allocation questions that we hope will assist in informing future investment decisions.

Areas for improvement in the development and maintenance of research grant databases include adding a health category field that applicants can use to categorise their submission, ideally as a drop-down menu to avoid the inconsistent entries of free-text fields, and a field to categorise the type of grant as research (with its subcategories) or non-research. Such fields would contribute greatly to better coordination and monitoring of capacity-strengthening initiatives worldwide.
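A drop-down menu corresponds to a field restricted to a fixed set of allowed values. The hypothetical schema below sketches how a grant database could enforce this for both grant type and health category; the class names are assumptions, the grant types are the categories used in this paper, and the health categories shown are only the three top-level groups reported above.

```python
from dataclasses import dataclass
from enum import Enum


class GrantType(Enum):
    # Categories used in this paper; research subcategories are omitted here.
    RESEARCH = "research"
    CORE_FUNDING = "core institutional funding"
    TRAINING = "training"
    CAPACITY_STRENGTHENING = "capacity strengthening"
    MEETINGS_NETWORKING = "meetings and networking"


# Top-level health categories reported in this paper; a production list would
# carry the full disease/condition hierarchy.
HEALTH_CATEGORIES = frozenset({
    "Communicable, maternal, perinatal and nutritional conditions",
    "Non-communicable diseases",
    "Injuries",
})


@dataclass(frozen=True)
class GrantSubmission:
    # Hypothetical schema illustrating controlled-vocabulary fields.
    grant_id: str
    title: str
    grant_type: GrantType
    health_category: str

    def __post_init__(self) -> None:
        # Reject free-text values, mirroring what a drop-down menu enforces.
        if not isinstance(self.grant_type, GrantType):
            raise TypeError("grant_type must be a GrantType member")
        if self.health_category not in HEALTH_CATEGORIES:
            raise ValueError(f"unknown health category: {self.health_category!r}")
```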


The findings presented here provide a cross-sectional view of investment decisions by 10 major international funders of health research. Their value extends beyond the information itself to stimulating further thinking about key elements, trends and tendencies in global resource allocation for R&D in general. More importantly, they highlight persistently low investment in important public health areas such as neglected diseases (1.1% of grants) and the very small share of international research grants going to low-income countries (0.2%). These findings, and the many other combinations of questions that can be explored through the Observatory’s data visualisations, provide new knowledge and insights, as well as many opportunities to test different patterns and relationships for all diseases or R&D areas, thus maximising what can be learned from previously unexploited data.