
Chinese Social Media Reaction to Information about 42 Notifiable Infectious Diseases


Social media have become increasingly useful in digital disease detection and surveillance, such as Twitter data analysis in experimental digital surveillance of influenza and cholera. Social media are also used as communication tools in public health, such as health communications via Twitter on breast cancer and diabetes.

Weibo, the Chinese equivalent of Twitter (which is blocked by the Chinese authorities), is a popular microblogging service in China. By the end of 2012, Sina Weibo, the leading service provider, claimed to have more than 500 million registered users. Weibo users discuss a variety of social issues, including post-disaster management and politics. The Chinese health authorities also use Weibo as a communication tool. Recently, scientists began analyzing Weibo data in light of its implications for public health. For example, Weibo users’ reactions to a self-presented suicide attempt and to the outbreaks of Middle East Respiratory Syndrome coronavirus (MERS-CoV) and avian influenza A(H7N9) merited our attention. A recent study also revealed the great variety of health-related keywords used in Weibo posts.

While outbreak detection and incidence prediction are among the goals of digital epidemiology, a good theoretical understanding of why people tweet about diseases or health conditions is largely lacking. Our understanding of online reaction to media exposure of disease information should go beyond seeing it as social media “chatter” or as spikes of search queries that reduce the predictive power of digital epidemiological tools. A better understanding will allow public health agencies to improve their health communication strategies and digital epidemiologists to better understand and model the relationship between time series of social media trends and disease incidence trends.

In this article, we extend our previous study to cover keywords associated with all the infectious diseases that were notifiable in mainland China in 2012. Qualitative content analysis was performed to identify the news and information that triggered an elevated volume of microblog traffic (peaks in daily Weibo count on certain keywords). Our results will allow public health practitioners to better formulate their health communication messages and shed light upon the underlying assumption of digital epidemiology that an increase in disease incidence may trigger an increase in social media messages related to the disease.

Data acquisition and sampling

Our Weibo data were collected via the Weiboscope project, maintained by KWF and his team at the University of Hong Kong, as described elsewhere. Initially, a list of about 350,000 indexed microbloggers was generated by systematically searching the Sina Weibo user population using the Sina Weibo Application Programming Interface (API) (Fig 1). The users were selected based on an inclusion criterion of having 1,000 followers or more when the project began data collection in 2011. The rationale for our high-follower-count sample was two-fold. First, social media users with high follower counts are more influential than those with fewer followers and usually attract disproportionately large public attention. Second, this criterion can exclude the spam accounts that are very common on Chinese social media.

As previously described, the raw Chinese microblog data were acquired in Comma-Separated Values (CSV) format and sorted by week. In the CSV files, metadata, such as the post content, the created date and user ID, are available for secondary analysis. We de-identified the user IDs by converting them into a different string of characters (known as “hashing”). The user identity codes were obfuscated using an algorithm similar to base64 to generate a unique but anonymized identifier. The screen names referenced in the posts were replaced with the corresponding obfuscated user identity codes. Each file begins with its properties in the first line, followed by the records of the posts. For the purpose of this study, we limited our time frame to the period from January 1, 2012 to December 31, 2012.
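The de-identification step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used by the Weiboscope project: the source only says that IDs were hashed with an algorithm similar to base64, so the keyed SHA-256 hash, the 12-byte truncation, the `SECRET_KEY` value, the `@name` mention pattern and the function names are all hypothetical choices made for this example.

```python
import base64
import hashlib
import hmac
import re

# Hypothetical secret; a real pipeline would keep its key private.
SECRET_KEY = b"replace-with-a-private-key"

def obfuscate_user_id(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a user ID to a stable, anonymized identifier (illustrative only)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).digest()
    # URL-safe base64 keeps the code printable; 12 bytes encode to 16 characters.
    return base64.urlsafe_b64encode(digest[:12]).decode("ascii")

# Assumed mention syntax: "@" followed by word characters or hyphens.
MENTION = re.compile(r"@([\w\-]+)")

def replace_mentions(text: str, screen_name_to_id: dict) -> str:
    """Replace @screen_name mentions with the obfuscated code of that user."""
    def swap(match):
        user_id = screen_name_to_id.get(match.group(1))
        if user_id is None:
            return match.group(0)  # unknown screen name: leave it untouched
        return "@" + obfuscate_user_id(user_id)
    return MENTION.sub(swap, text)
```

The same input ID always yields the same code, so posts by one user can still be linked to each other without exposing the original ID.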

Keyword detection and data analysis

Similar to our previous study, we prescribed a list of keywords for keyword search in the dataset. They were based on the list of notifiable infectious diseases in mainland China, as specified by the Law of the People’s Republic of China on the Prevention and Treatment of Infectious Diseases, as of 2012 (Table 1; viral hepatitis is counted as one disease in the law, but we counted Hepatitis A, B, C and E separately following the disease reporting scheme of the Chinese Center for Disease Control and Prevention). We then obtained a time series of aggregated daily counts of Weibo posts containing a specified keyword for each disease. We selected the day of the year with the largest number of posts pertinent to each disease and, using an online platform that we developed, obtained the content of the Chinese posts that were generated on that day. If there were two peaks of equal magnitude, we obtained the content of both. We obtained the microblog content of the whole year of 2012 for typhus (n = 14) and leishmaniasis (n = 24), given their small numbers. Three co-authors manually read the retrieved microblog content, grouped the posts by topic (e.g. sharing the same piece of news), performed preliminary coding, and counted them. A representative post was selected for each group. The co-authors then manually searched online and identified the relevant news, events or information that triggered the elevated microblog traffic. Afterwards, the first author manually performed a quality check by selecting the microblog posts of about one-third of the list of diseases and performing the grouping himself. He revised the groupings if necessary. He then reviewed all the groupings, the preliminary coding of the microblog contents, and their representative posts for emerging themes in order to develop the following coding scheme.
For the major news, events or information that triggered the highest peak of daily microblog post count for each notifiable disease in mainland China in 2012, the first author coded them into the following categories (Table 1):

The representative post for each group was then translated into English to be presented in S1 File. Please refer to S1 File for detailed content analysis for the peak of daily microblog post count for each disease in 2012.
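The peak-selection step described in this section (daily counts per keyword, then the day or days with the highest count) can be sketched as follows. This is an illustrative sketch rather than the authors' platform: the column names `created_at` and `text`, the timestamp format, the function names and the toy rows are assumptions made for the example, and the rows are not study data.

```python
import csv
import io
from collections import Counter
from datetime import datetime

def daily_counts(rows, keyword):
    """Count posts per calendar day whose text contains the keyword."""
    counts = Counter()
    for row in rows:
        if keyword in row["text"]:
            # Assumed timestamp format; adjust to the actual CSV export.
            day = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S").date()
            counts[day] += 1
    return counts

def peak_days(counts):
    """Return every day that ties for the highest count, in date order."""
    if not counts:
        return []
    top = max(counts.values())
    return sorted(day for day, n in counts.items() if n == top)

# Toy rows for illustration only (hypothetical, not from the dataset).
sample_csv = """created_at,text
2012-02-20 09:15:00,hepatitis C outbreak reported
2012-02-20 21:40:00,repost: hepatitis C outbreak reported
2012-02-21 10:05:00,more on hepatitis C
2012-02-21 11:30:00,hepatitis C clinic news
2012-02-22 12:00:00,unrelated post about weather
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
counts = daily_counts(rows, "hepatitis C")
print(peak_days(counts))  # both tied days are returned
```

Returning a list rather than a single date mirrors the rule that the content of both days is retrieved when two peaks are of equal magnitude.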

Ethics Statement

The protocol of data processing and anonymization was approved by the Human Research Ethics Committee for Non-Clinical Faculties, The University of Hong Kong, and by the Institutional Review Board, Georgia Southern University (H14167). This paper is the report of a secondary data analysis of the WeiboScope dataset of 2012. All the Weibo posts in the original WeiboScope dataset were publicly available posts. No attempt was made to inform Weibo users of the current study. The data had been de-identified before the current analysis began.

1(a) News (outbreaks)

Chinese microbloggers reacted to news about outbreaks of cholera, hepatitis C, influenza A(H1N1) and anthrax in different parts of China (Table 2). An example was the news of an outbreak of hepatitis C infection in Zijin County, Guangdong Province (Fig 2; Table 3). Allegedly, syringes in the health clinic in the township were used repeatedly. Over 90% of the posts in the peak were news posts (or re-posts). The remaining posts were personal comments, including the personal experience of a journalist who investigated the case and who overcame many hurdles before the report was published (see S1 File for details).

1(b) News (cases)

News of cases of human infection with avian influenza, epidemic hemorrhagic fever (in the United Kingdom), epidemic encephalitis B, scarlet fever, malaria, echinococcosis and diarrhea also attracted microblog users’ attention (Table 2). One example was a severe malaria infection in a Chinese worker who had returned from Africa; this case attracted a lot of attention (Fig 2; Table 3), including newspaper coverage (see S1 File for details).

2. Health education / information

Health promotion campaigns organized by the Chinese health authorities, such as World AIDS Day (Fig 2) and National Immunization Day for poliomyelitis, could attract Weibo users’ attention. Weibo users could also share health information from other sources, as in the case of hepatitis A, hepatitis E, measles, typhoid, pertussis, diphtheria, tetanus, scarlet fever, brucellosis, leptospirosis, mumps, rubella, conjunctivitis and filariasis (Table 2). One example is the recommendation made by the Advisory Committee on Immunization Practices of the U.S. Centers for Disease Control and Prevention that all pregnant women should receive the Tdap 3-in-1 vaccine, which was circulated online by certain microbloggers in China (Fig 2; Table 3). Both peaks of posts about diphtheria and pertussis were online conversations about this recommendation. However, not all health information came from formal sources. Much of it was circulated as health advice or information provided by certain popular websites or Weibo users. For example, the peaks for measles and mumps were about “Four diseases that have symptoms similar to the common cold” (see S1 File for details).

3. Alternative health information / Traditional Chinese Medicine

Alternative health information or knowledge of Traditional Chinese Medicine may attract Weibo users’ attention, as in the case of dysentery. The peak for “dysentery” was generated by a post (and its reposts) about “food that cannot be consumed together with chicken meat”, including penis et testis canis, a type of traditional Chinese medicine (Fig 2; Table 3). According to that post, consuming it together with chicken meat will lead to dysentery (see S1 File for details).

4. Commercial advertisement / entertainment

Commercial advertisements and the entertainment industry might also lead to an increase in Weibo posts that mentioned certain diseases, as in the case of plague, dengue, influenza and leprosy (Table 2). An example was an advertisement about holidays in Thailand. The travel agency gave customers mosquito repellent cream as a gift to prevent dengue (Fig 2; Table 3; see S1 File for details).

5. Social issues

Weibo users discussed various social issues, such as gender equality (gonorrhea and syphilis), discrimination against carriers of certain viruses (hepatitis B), the loss of trust between the government and the citizens (SARS), and the loss of trust between the medical profession and the patients (meningitis, schistosomiasis and hand-foot-and-mouth disease) (Fig 2; Table 2). One example was a petition to end discrimination against hepatitis B virus (HBV) carriers, addressed to the Chinese People’s Congress and the Chinese People’s Political Consultative Conference (Fig 2; Table 3). A civil rights advocate who was also an HBV carrier used Weibo to contact delegates and asked if they would submit the proposal on behalf of China’s 100 million HBV carriers (see S1 File for details).

6. Others

News and online discussions that did not pertain to infectious diseases might mention infectious diseases (rabies and tuberculosis) for other reasons. An example was the so-called “Miami cannibal attack” in the United States, which led to a peak on the keyword for rabies (Fig 2; Table 3; see S1 File for details).

Strengths and limitations

Our study is the first to categorize Chinese microblog contents on 42 infectious diseases and also identify the related news and/or information. While another study compared Chinese and US social media use in health communication at the onset of the H1N1 pandemic, no Weibo data were analyzed therein.

In our study, we manually coded Chinese microblog posts to obtain a better understanding of the actual contents. Some studies have used computerized machine learning methods or keyword search to analyze health-related Twitter contents, while others have manually coded Twitter data on childhood obesity, influenza A(H1N1), and antibiotics.

The Chinese microblogger community is a fraction of the 1.3 billion people living in mainland China. According to a random sampling study, Weibo users are more likely to be male and living in provinces/regions that are more economically developed (with some exceptions). Compared to the younger generation, older Chinese people are in general less likely to use the internet to seek health information. Furthermore, our dataset comprises the 350,000 users who had 1,000 followers or more. Of these 350,000 users, 5,000 were Chinese dissident writers, journalists and scholars, and another 38,000 were users with an authenticated (also known as VIP) status with more than 10,000 followers, as discussed in the original study. While our sample constituted less than 1% of all registered users of Sina Weibo, this sampling strategy allowed us to avoid many spam accounts. The Weibo universe is highly heterogeneous. A random sampling study found that over 50% of Sina Weibo users have never posted anything, whereas 80% of the original posts were created by 5% of the users. Hence, our sample represents the most influential Chinese microbloggers who contributed a majority of Weibo contents and drew the most attention as far as reposts and comments were concerned. Nonetheless, our findings may not be generalizable to samples obtained through other sampling strategies. Our operational sampling parameters were not determined to optimize data collection specific to any disease. Future research designs that are customized for specific epidemiologic research may help reconfirm our findings.

In the absence of content analysis of Weibo posts for each disease keyword at the baseline, and given the diversity of information that triggered the elevated Weibo activities, we did not attempt to compare across the categories of information that triggered the elevated Weibo activities. Future research along that direction is warranted.

Our analysis is only a small step towards a better understanding of how the Chinese online community understands and communicates about health issues. Many questions remain unresolved. For example, how can we predict users’ reactions to different types of news, events or information? What, if any, behavioral changes will health information on social media bring about? What is the relationship between healthcare-seeking behavior and tweeting behavior? Our study provides the groundwork for future research that will shed light upon these issues.


Infectious disease-related information prompted Chinese social media users to create and repost microblog posts. Our content analysis grouped this information into 5 categories, namely, news of an outbreak or a case, health education / information, alternative health information (or Traditional Chinese Medicine), commercial advertisement / entertainment, and social issues. Our study opens the door towards a better understanding of online health information diffusion and health information seeking behavior that will better equip digital epidemiologists to fine-tune their prediction and detection tools, and will enable health communicators to fine-tune their social media messages.

The identification of news and information pertaining to infectious diseases that trigger elevated microblog activity, and of the contents thereof, has utility and significance in health communications and disease surveillance. Our study covers 42 infectious diseases. We demonstrated how information and issues about these diseases drew social media users’ attention and led to increased social media traffic. Such findings, in turn, will pave the way for future studies about online health communications. This study contributes to the literature by analyzing data from Sina Weibo, the leading microblogging platform in China, where Twitter is blocked. We analyze the reactions of a sample of influential users in a national online community to health-related news, information and social issues. This helps facilitate better social media health communication and may shed light on the “false positives” in digital disease detection. The lessons learned in our study may be transferable to other middle-income countries.