Dataset: 11.1K articles from the COVID-19 Open Research Dataset (PMC Open Access subset)
All articles are made available under a Creative Commons or similar license. Specific licensing information for individual articles can be found in the PMC source and CORD-19 metadata
More datasets: Wikipedia | CORD-19

Logo Beuth University of Applied Sciences Berlin

Made by DATEXIS (Data Science and Text-based Information Systems) at Beuth University of Applied Sciences Berlin

Deep Learning Technology: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers and Alexander Löser. Learning Contextualized Document Representations for Healthcare Answer Retrieval. The Web Conference 2020 (WWW'20)

Funded by The Federal Ministry for Economic Affairs and Energy; Grant: 01MD19013D, Smart-MD Project, Digital Technologies

Imprint / Contact

Highlight for Query ‹COVID-19 screening

Incidence and comparison of retrospective and prospective data on respiratory and gastrointestinal infections in German households


Acute respiratory infections (ARI) and acute gastrointestinal infections (AGI) are the most common childhood infections, and they cause significant burden and costs in the health care system [1, 2]. Other household members and especially parents are at risk to frequently acquire these infections from their children. In consequence, households including children have a higher risk of suffering from infectious diseases, especially in winter seasons. Roy et al. published a systematic review of 33 studies with various designs on incidence of AGI in developed countries and reported an incidence, which varies between 0.1 and 3.2 infections per person year. For ARI, an Australian study has reported an incidence of 3.2 cases per person year in the general population. Also, ARI frequency is known to be higher in households with children aged younger than 5 years than in other households.

Data on both types of infections can either be collected prospectively or retrospectively. Whereas the latter approach may be biased due to recall problems, prospective data collection can be very complex, time consuming, and costly. Besides, loss to follow up is a further disadvantage of prospective studies, which may occur already at 1 month of follow up. Recently, Viviani et al. compared the incidence estimates of AGI in their retrospective telephone survey with another study with prospective data collection during the same time period, and found higher incidence estimates in the retrospective data collection for the general population. But publications explicitly comparing prospective and retrospective data collection for AGI in the households with children attending day care are scarce. Furthermore, no publications comparing both forms of data collection in households with children are available for ARI. Therefore, the aim of this study was to estimate the incidence of respiratory and gastrointestinal episodes in German households with children attending day care and to compare prospective and retrospective data collection in this setting.

Recruitment and study population

We conducted a 4 month prospective cohort study in the winter period 2014/2015 and recruited participants in 75 of 151 day care centres (DCCs) in Braunschweig, Lower Saxony, Germany. Initially, we contacted all DCCs in Braunschweig via a letter and a subsequent call. Around 50% of the contacted DCCs agreed to participate. In some of these DCCs, we got permission to invite parents to our study when they came to pick up their children from the DDC or during organisational meetings of the parents; in others, we only provided them flyers and these were distributed by the child care workers. We invited parents of children aged 0–6 years with their whole households for participation. Data on incidence of infections was collected between November 2014 and March 2015 for all household members. Participants in one study arm used a prospective health diary for documentation of infections for the whole study period, while those in the second study arm reported infections retrospectively. Participants could opt for participation in either study arm; the prospective data collection additionally included collection of nasal swabs in case of respiratory symptoms.


In the prospective study arm, one parent filled infection episodes for all household members in the diary, including start and end of each episode. Participants in the retrospective study arm reported the total number of infections for every household member in the last 2 months in questionnaires provided at 2 months and 4 months after beginning of the study. Definitions of ARI (based on Lambert et al. [12, 13]) and AGI (based on WHO definition of diarrhoea) were provided in both groups at the start of the study (see Additional file 1 for details). The definition of a new episode required at least 3 consecutive days without symptoms following the previous episode. Socioeconomic data on household level and demographic data on all household members were also collected.

Ethics statement

The study protocol was approved by the Ethics Committee of Hannover Medical School (No. 2380–2014) and reviewed by the Federal Commissioner for Data Protection and Freedom of Information. Written informed consent was obtained from all participants.

Statistical analysis

We used tabulation to describe the study population and performed chi-squared test as well as Wilcoxon rank-sum test to assess differences between the two study groups (p-value < 0.05). Poisson regression was used to model monthly (30 days) incidence rates with an offset (person days observed) to adjust the denominator in the rate calculations and allowing for over-dispersion (using quasipoisson distribution for the generalized linear model (GLM)). The covariates of interest were the variables indicating the month and the study arm. The offset for the prospective group were the actual number of days observed for the individual in a given month. In contrast, for the retrospective group, total days in the respective month were used as offset. In the retrospective group, the exact date of onset of the disease was not known. Therefore, for the analysis considering specific months, we had to randomly assign the disease episodes to the 2 months before the return of the questionnaire. We chose an allocation scheme giving equal probability weights for 2 months, if the questionnaire was received in the first half of the month (½ and ½ for the two previous months respectively). When the questionnaire was received in the second half of the month, we allocated the episodes across 3 months giving double probability weight to the middle month, and equal weight to the remaining two half months in consideration (¼, ½, and ¼ for the 3 months, respectively). We also allocated questionnaires that arrived in the second half of April to 3 months (¼ February, ½ March and ¼ April), but since the prospective study was only conducted until March, we deleted the observations for April (10 ARI episodes and 2 AGI episodes). Based on observations from previous studies that participants might be more likely to join the study if they currently have an infection, we removed initial observations in the prospective arm, if the participant reported symptoms for these days, so that every individual started the observation with a healthy day. In total, this led to a decrease of 19% in the number of ARI episodes and 2% in the number of AGI episodes for November and December. Since infection episodes might also have triggered participation in the retrospective arm of the study, we removed the same fraction of infection episodes as in the prospective study arm. This was done by removing randomly the episodes from the first questionnaires, which would be potentially allocated to November and December in the retrospective group. The removal was done according to the age specific reduction that was observed in the prospective group due to starting with a healthy day for every individual. For ARI, the age group specific episode reduction was 9%, 11%, 13%, and 9% in the age groups 0–3 years, 4–6 years, 7–15 years and over 15 years, respectively. For AGI, the age group specific episode reduction was 4% and occurred only in the group of 0–3 years.

In a sensitivity analysis, equal probability weights (1/3, 1/3, and 1/3) were used for all 3 months when the questionnaire arrived in the second half of a month. When the questionnaire arrived in the first half of the month, the initially used probability weights remained unchanged. In a further sensitivity analyses, we removed the observations for the first 5 days for every individual in the prospective group, similar to Bayer and colleges, and proportionally decreased the infection episodes from the retrospective group in a similar way as described for the original comparison. Incidence rates per person year have additionally been calculated for comparison with existing literature.

Study population

Approximately 4300 children attended the 75 DCCs included in the study, but due to different recruitment strategies, we do not know the exact number of parents invited to participate. In total, 341 families indicated interest to participate in our study. Of those, 95 families finally agreed to participate in the prospective group and 108 families in the retrospective group. Health diaries were returned from 77 households (with 282 household members), and retrospective questionnaires on the frequency of infections from 100 households (404 household members). In the prospective group, participants provided data for 32,739 days (nearly 100%) out of 32,798 expected days; in the retrospective group, 180 questionnaires were returned (90% of first questionnaires, 86% of second questionnaires). The median household size was four, and in most of these households at least one parent had a higher education (university degree). There were no differences in terms of parental education or age and sex composition of the households between the prospective and the retrospective study groups (Table 1).

Incidence of ARI and AGI episodes

Altogether, 1320 episodes of ARI were reported: 558 in the prospective and 762 in the retrospective group. For AGI, there were in total 331 episodes reported: 146 episodes in the prospective group and 185 in the retrospective group. After excluding episodes at the beginning of the records, 507 ARI episodes and 144 AGI episodes remained in the prospective group. After applying the proportional correction in the retrospective group, there were 691 ARI episodes and 183 AGI episodes remaining (Table 2).

The incidence for ARI varied by calendar month and ranged from 0.44 per person month to 0.54 in the retrospective group and from 0.36 to 0.63 in the prospective group. The mean monthly incidence for the retrospective group was 0.52 as compared to 0.47 for the prospective group. Incidence of ARI showed a decrease with increasing age of participants for both study arms, but was still substantial among adults (Table 2). After adjusting for the calendar month, the retrospective study arm was associated with a similar incidence as the prospective study arm (incidence rate ratio (IRR) for retrospective versus prospective data collection 1.11 [95% confidence interval (CI) 0.99; 1.24], p = 0.42) (Fig. 1).

The incidence for AGI also varied across the calendar months, ranging from 0.10 to 0.20 per person month in the retrospective study arm, and 0.08 to 0.22 in the prospective study arm. The mean monthly incidence for the retrospective group was 0.14 as compared to 0.13 in the prospective group. Also for AGI, the age specific incidence showed a general decrease with increasing age of children, with still substantial incidence among adults (Table 2). After adjusting for the calendar month, the prospective study arm was associated with a similar incidence as in the retrospective study arm (IRR 1.10, [95% CI 0.89; 1.37], p = 0.26) (Fig. 1).

Interestingly, the incidence estimates for all age groups were comparable between the prospective and the retrospective study group, except for participants in the age group “7–15 years”. They had an approximately 50% lower incidence in the prospective group compared to the retrospective group for ARI as well as for AGI, but the result was only statistically significant for ARI (p = 0.02).

Young children (0–3 years) had the highest incidence of infection (for both AGI and ARI), with similar results in the prospective and the retrospective group (IRR for the retrospective group with respect to the prospective group, adjusted for the age groups, was 1.11 ([95% CI 0.96; 1.28], p = 0.26) for ARI, and 1.07 ([95% CI 0.87; 1.31], p = 0.59) for AGI).

For all sensitivity analyses, the results with respect to the data collection mode were similar for both, ARI and AGI, and did not show a statistically significant difference between the two study groups (prospective versus retrospective).


In a longitudinal study on ARI and AGI in households with preschool children comparing prospective and retrospective data collection, we demonstrated that for short recall periods (2 months) both designs produce similar incidence estimates.

As expected, AGI had a lower incidence in our cohort than ARI. While children had a higher incidence for ARI and AGI, there was also a substantial burden among older household members. For AGI, our incidence estimates over all age groups are consistent with the previously reported data by Roy et al. (0.1–3.2 per person year). In our study, we did not include fever as a symptom for ARI, what might lead to lower overall incidence estimates. Different disease definitions for ARI are used in literature, e.g. including fever as a defining symptom [12, 13, 18, 19] or not [20–22]. Use of different definitions can lead to considerable differences in the reported incidence. Our results on the incidence (per person year) of ARI (5.7 in the prospective and 6.5 in the retrospective arm) were higher than obtained in an Australian study by Chen and Kirk (3.2) in the general population. The higher incidence in our study can be explained by a higher proportion of families with at least one child attending day care, because these children have a higher risk for respiratory infections [24–26]. It was also observed that households with children have a general higher risk for respiratory infections. Moreover, we conducted our study during the main season for respiratory infections in the Northern hemisphere (November–April).

Studies focusing on the comparison between prospective and retrospective data collection are scarce. In 2014, van der Steen and colleagues called for more methodologic studies in the context of dementia research, which compare prospective and retrospective data collection. The recent study by Viviani et al. found higher incidence estimates for gastrointestinal infections in the retrospective design in studies using recall periods of 7 and 28 days, compared to a prospective study during the same time period. Recall intervals in our study were even longer (2 months) than in the study by Viviani et al. (7 or 28 days) and might therefore not be comparable. We used a more specific definition for AGI in our study (Additional file 1) than the one in the UK, additionally contributing to the insufficient comparability. But despite these differences, we did not find a lower overall incidence of gastrointestinal infections in our study. However, as mentioned in the context of ARI, we conducted our study during the winter months and included only households with children attending day care, elucidating higher overall incidences. Regarding the recall intervals, Viviani et al. also found that the frequency of disease reports was a lot lower in their 28-days recall group than in their 7-days recall group (rate ratio comparing incidence in 7-day and 28-day recall groups: 2.9). This difference might be explained by people forgetting illness with time, as it was already shown by others. As we used a recall period of 2 months, it might explain why our retrospective results are not significantly higher than our prospective results. We did not find any comparable study which compares the mode of data collection in terms of ARI. In general, the incidence of ARI is a lot higher than for AGI, but AGI episodes result in a greater burden of health care resources. But this is another comparison, requiring a separate investigation, which was not conducted in our study.

To our knowledge, our study was the first of its kind, investigating the differences between prospective and retrospective data collection for both, respiratory or gastrointestinal infectious diseases in the household setting. However, our study was subject to some limitations: Participants in our study were not randomly allocated, but could opt for either the retrospective or the prospective mode of data collection. Since the prospective data collection included sampling of biomaterials, one could expect that those more dedicated opted for the prospective data collection. Still we did not observe a higher incidence in this group and did not find differences between the two study groups regarding socio-demographic characteristics. Another point is that the participants in our study knew in advance that they will receive a questionnaire on the frequency of infections after 2 and after 4 months of the study period, which could enhance their reporting. Results might be different in simple one time surveys, where participants are asked to report previous episodes without announcing the inquiry beforehand, because they might remember previous diseases as more recent and report them incorrectly in the enquired time period. Furthermore, we did not study the question if symptom severity can be also recalled over the recall period, so if this information is required, additional investigation is necessary.


In conclusion, retrospective (over 2 months) and prospective data collection in our study resulted in similar incidence estimates in case of ARI and AGI. Considering that participants will be informed about coming future recall periods, retrospective data collection might be an adequate method in the investigated setting, if the recall intervals are not too long and no need for bio sampling or details on the severity of diseases exists.