
Musings on privacy issues in health research involving disaggregate geographic data about individuals

Definitions: the security-confidentiality-privacy triad

In micro-scale geographical analyses involving data about specific individuals, data security, confidentiality and privacy form an intertwined triad. A recent US CDC (Centers for Disease Control and Prevention) foundation course on public health law defines privacy as the "individual's right to control the acquisition, use and disclosure of their identifiable health information". The same course goes on to define confidentiality as the "privacy interests that arise from specific relationships (e.g., doctor/patient, researcher/subject) and corresponding legal and ethical duties", and then describes security as the "technological or administrative safeguards or tools to protect identifiable health information from unwarranted access, use, or disclosure". To explain the relationships between the three terms, the course quotes a key sentence from Ware: "If the security safeguards in an automated system fail or are compromised, a breach of confidentiality can occur and the privacy of data subjects invaded".

Mainstream media and public interest in the subject

Actual or potential breaches (technological or legal) of data security and confidentiality, and the subsequent actual or potential invasions of individuals' privacy, are quite commonly reported in the mainstream media. For example, in March 2009, the Joseph Rowntree Reform Trust published its 'Database State' report on the legality, safety and effectiveness of the British government's major database systems. Of the 46 databases assessed in the report, only six were found to have a proper legal basis for any privacy intrusions and were deemed proportionate and necessary in a democratic society. The report's authors concluded that two NHS (National Health Service) systems, the Detailed Care Record (DCR) and the Secondary Uses Service (SUS), were almost certainly illegal, and that a number of others, including the Summary Care Record (SCR), would be legal only with patient consent; given the absence of an effective opt-out at the time, the SCR too was almost certainly illegal.

There is also the story of a Canadian girl whose death was associated with a prescribed acne drug. Although the prescription data set released to the media had been de-identified, she was eventually identified by journalists who compared it against published obituaries. The comparison narrowed the search down to four possible girls; contacting the four families then identified the right one.
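To make the mechanics of such a linkage attack concrete, here is a minimal Python sketch that joins an invented 'de-identified' prescription extract against invented public obituary records on shared quasi-identifiers (sex, age and year of death). All field names, values and records are hypothetical illustrations, not the actual Canadian data.

# Hypothetical illustration of a linkage (re-identification) attack:
# joining a "de-identified" data set against public records on shared
# quasi-identifiers. All names, fields and values are invented.

deidentified_rx = [
    # (record_id, sex, age_at_death, year_of_death, drug)
    ("rx-101", "F", 17, 2006, "acne drug"),
    ("rx-102", "M", 45, 2006, "acne drug"),
]

public_obituaries = [
    # (name, sex, age_at_death, year_of_death)
    ("Jane Doe", "F", 17, 2006),
    ("John Roe", "M", 45, 2006),
]

def link(rx_rows, obituary_rows):
    """Return candidate (name, record_id) pairs whose quasi-identifiers
    (sex, age at death, year of death) match exactly."""
    matches = []
    for record_id, sex, age, year, _drug in rx_rows:
        for name, o_sex, o_age, o_year in obituary_rows:
            if (sex, age, year) == (o_sex, o_age, o_year):
                matches.append((name, record_id))
    return matches

print(link(deidentified_rx, public_obituaries))
# Even a handful of candidate matches, as in the Canadian case above,
# can be resolved by simply contacting the families concerned.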

High-profile security breaches (e.g., loss or theft of ill-protected confidential data) are also not uncommon. For example, it was reported in May 2009 that a laptop containing unencrypted data (names, addresses, dates of birth, employers, national insurance numbers, salary information, and bank details) of 109,000 UK pensioners had been stolen. The data were merely password-protected, possibly without any safeguards for data self-destruction in case of brute-force password attacks. Passwords can be recovered in a short time using common hardware, e.g., NVIDIA CUDA GPUs (Compute Unified Device Architecture Graphics Processing Units), and readily available software, or bypassed altogether to directly access the underlying unencrypted data.
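A back-of-the-envelope calculation in Python shows why password protection alone offers little defence. The guess rate below is an assumed, illustrative figure for a commodity GPU, not a measured benchmark.

# Worst-case time to exhaust a password space by brute force.
# GUESS_RATE is an assumed, illustrative figure, not a benchmark.

def exhaustive_search_seconds(alphabet_size, length, guesses_per_second):
    """Seconds needed to try every password of the given length."""
    return alphabet_size ** length / guesses_per_second

GUESS_RATE = 1e9  # assumed: ~10^9 guesses/s on a consumer CUDA GPU

for length in (6, 8, 10):
    seconds = exhaustive_search_seconds(26, length, GUESS_RATE)
    print(f"{length} lowercase letters: ~{seconds:,.1f} s")

# Six letters fall in a fraction of a second and eight letters in a few
# minutes; only encryption with a strong key, not a password prompt,
# protects the underlying data.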

In public health worldwide, any public identification of an individual's health status and address, regardless of contagion level or risk, is usually prohibited. But individual privacy rights must also be balanced against legitimate public concerns and interests. The publicly accessible, online mapping of SARS (Severe Acute Respiratory Syndrome) cases in Hong Kong a few years ago, using disaggregate case data at the level of individual infected buildings in near real time, was one of the notable exceptions to this well-established public health confidentiality rule.

Research literature: location privacy concerns and solutions

The biomedical and public health literature on geographic information systems (GIS) and spatio-temporal analyses features a large number of research papers mentioning or addressing location privacy. A must-read paper (not specifically health-related) dating back to 1994 shows how long-standing privacy issues are in GIS research. Some papers identified privacy as a potential or actual concern (e.g., in reproductive health research; in birth defects surveillance and research; in research relevant to policy on diet, physical activity, and weight; in environmental health research; and in health and social care planning), while others went a step further by suggesting comprehensive solutions, workarounds, or frameworks and principles of practice to mitigate or resolve these privacy concerns.

A number of confidentiality-preserving statistical and epidemiological data processing methods (data aggregation and transformations) have been proposed that can be applied to original location data to preserve individuals' privacy while maintaining an acceptable level of data usefulness for geographical analyses. But precise addresses will continue to be needed in many cases to improve data analysis results, or to make them possible at all: John Snow's famous map of the 1854 cholera outbreak in London solved the problem only because the exact locations of individual cases were known. There will always be an implicit trade-off between privacy concerns (e.g., ease of re-identification) and the types and accuracy of the geographical health analyses that are possible with a given data set (original, unaltered vs. transformed or aggregated data). And that is where software agents can offer a potential solution that preserves the full fidelity of the original data.
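As a concrete illustration of one commonly proposed transformation (our choice for this sketch, not a method endorsed above), the following Python snippet applies random-direction, bounded-distance perturbation to a point location, sometimes called 'donut' geomasking. The distance bounds and coordinates are illustrative only; appropriate values depend on population density and the analysis at hand.

import math
import random

def donut_geomask(lat, lon, min_m=50.0, max_m=500.0):
    """Displace a point by a random bearing and a random distance
    between min_m and max_m metres ('donut' masking). The bounds
    here are illustrative, not recommendations."""
    bearing = random.uniform(0.0, 2.0 * math.pi)
    distance = random.uniform(min_m, max_m)
    # Rough metres-to-degrees conversion; adequate for small offsets.
    dlat = (distance * math.cos(bearing)) / 111_320.0
    dlon = (distance * math.sin(bearing)) / (
        111_320.0 * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon

random.seed(42)
# An illustrative point in central London (not a real case location).
print(donut_geomask(51.5134, -0.1366))

The larger the displacement, the stronger the protection but the poorer the spatial fidelity, which is precisely the trade-off described above.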

Moving beyond conventional GIS research and geographical analyses, mobile phones and other electronic gadgets are rapidly gaining location awareness and wireless Web connectivity, promising new spatial technology applications and services that will yield vast amounts of spatial information and online maps, some of which can even reveal users' whereabouts in real time. These novel spatial tools and services are certainly opening many new useful possibilities, but they are not without challenging security and privacy concerns.

The 'missing ring': data security

Data security is relatively under-mentioned in discussions of confidentiality-preserving solutions for location data, despite its key importance in the aforementioned security-confidentiality-privacy triad. Consider the following scenario: a health GIS researcher has legitimate, IRB (institutional review board)-approved access to patient data containing precise geographical identifiers for analysis and reporting purposes, with full patient consent. The reporting is done in ways that do not identify individual patients in any publicly accessible (online) results and maps. If the reporting must be made at a level of detail or granularity that could potentially identify individual patients, the results are shared only within approved, small teams of users with legitimate access rights and a 'need to know'. The whole scenario seems fine as far as the protection of individuals' privacy is concerned: IRB approval has been obtained, adequate reporting methods and policies are in place to prevent the disclosure of any confidential data to non-authorised parties, and we even have the patients' explicit consent to conduct the study. However, without appropriate additional security safeguards, there remain many unmitigated risks of data theft or loss and of unwanted data disclosure to non-authorised or non-authenticated parties, all of which can compromise the privacy of the data subjects. (Ideally, IRBs should also scrutinise the security component before granting approvals.)

A carefully blended, purpose-built combination of overlapping security measures is always the solution, depending on the type, sensitivity, value, and risks/costs assessment of the data to be protected. Various types of advanced cryptography, multimodal biometrics, and other methods can be combined as necessary; data access can also be controlled or restricted in such a way that two or more persons must be physically present and authenticated (e.g., via biometrics) each time the data are unlocked. Security measures include, among other things: ensuring physical building security; using computer security cable locks; using computers with a built-in TPM (Trusted Platform Module) chip for cryptographic functions; performing full disk encryption with TPM support (e.g., using BitLocker); implementing brute-force password attack protection (data are automatically erased after a pre-set number of failed access attempts); using hardware/software firewalls and other forms of network security; implementing adequate access policies and authentication at BIOS (Basic Input/Output System), operating system, and application levels; considering Multilevel Security (MLS); using biometrics (e.g., fingerprint readers and facial recognition); using secure USB flash drives with military-grade hardware encryption instead of ordinary flash drives; keeping detailed data inventories and electronic audit trails of all accesses and transactions; blanking the computer display and locking or auto-logging-off any machine left unattended; and securely decommissioning and discarding old equipment and data storage media (e.g., using software utilities like SDelete) to prevent recovery of supposedly deleted data. Equally important are staff training and the development of a 'security culture' in the organisation, e.g., guided by ISO/IEC 27002:2005 (formerly ISO/IEC 17799:2005), an ISO (International Organization for Standardization) standard for information security and a code of practice for information security management.
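To make one item on this list concrete, here is a minimal Python sketch of a tamper-evident electronic audit trail: each access record is chained to its predecessor by a cryptographic hash, so retrospective edits to the log become detectable. It is a toy illustration under our own assumptions (invented user and file names), not a vetted security product.

import hashlib
import json
import time

def append_entry(trail, user, action, resource):
    """Append an access record chained to the previous entry's hash."""
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    entry = {
        "ts": time.time(),
        "user": user,
        "action": action,
        "resource": resource,
        "prev": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    trail.append(entry)

def verify(trail):
    """Recompute the hash chain; return False if any entry was altered."""
    prev = "0" * 64
    for e in trail:
        body = {k: v for k, v in e.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

trail = []
append_entry(trail, "analyst01", "read", "patient_locations.csv")
append_entry(trail, "analyst02", "export", "aggregated_map.png")
print(verify(trail))  # True while the log is intact; any edit breaks it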

Personal information and privacy legislation

A discussion of location privacy solutions for health research would be incomplete without reflecting on some of the underlying reasons that necessitate their development. The very notion of privacy is itself a complex fabric of interwoven philosophical and psychosocial threads; perhaps this is why the associated bureaucratic and legal landscape is as complex as it is, and why it is so often blamed. A large majority of public health professionals consider privacy to be an obstacle to public health; when asked for the underlying reasons, survey respondents in Canada and the UK most commonly identified bureaucracy and legislation.

There is no universal legislation to guide and govern the activities of public health professionals, particularly where issues of privacy are concerned. Instead, nations have their own constraining or enabling privacy and data protection laws, some of which are such a maze of cross-referenced "legalese" that familiarising oneself with them, let alone gaining a thorough understanding of them, becomes a daunting task. 'Additional file 1' provides a brief compilation and comparison of relevant personal information and privacy legislation in Canada and the UK, with particular focus on location and public health as seen and understood by an epidemiologist.

Some recent research projects and events about the subject

Location privacy was also the subject of GeoPKDD (Geographic Privacy-Aware Knowledge Discovery and Delivery), a three-year EU-funded project completed in November 2008. GeoPKDD's main research question was 'how to discover useful knowledge about human movement behaviour from mobility data (e.g., location data from mobile phones), while preserving the privacy of the people under observation?' The project attempted to develop new privacy-preserving methods for extracting knowledge from large amounts of raw data about individuals referenced in space and time. GeoPKDD organised the 'First Interdisciplinary Workshop on Mobility, Data Mining and Privacy: Preserving anonymity in geographically referenced data' on 14 February 2008 in Rome, Italy.

Another research activity worth mentioning in our context is the proposal by Helen Chen at Agfa HealthCare in Canada and her colleagues at the World Wide Web Consortium (W3C) Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG) to explore Semantic Web solutions for patient data security, confidentiality, consent and privacy (in general; they are not focusing on location privacy, but their proposal is still broadly relevant to our topic). De-identification techniques that were previously sufficient can be rendered inadequate because it is now possible to re-identify individuals via inference on the Web, and Semantic Web technology is enabling even more powerful data links, connections and inferences of this type. In the healthcare domain, this very success of the technology puts individuals' privacy at much greater risk. Chen's idea is to develop novel privacy-preserving solutions by harnessing the very same Semantic Web technology that can exacerbate these risks.

From 5–8 June 2009, the Urban and Regional Information Systems Association (URISA), a non-profit American association of professionals using GIS and other information technologies to solve challenges in state/provincial and local government agencies and departments, held its Second GIS in Public Health Conference in Providence, Rhode Island, USA. One of the pre-conference workshops, held on 5 June 2009, focused on 'Protecting Privacy and Confidentiality of Geographic Data in Health Research'. Selected highlights from this workshop are presented in the remainder of this article.

Emerging issues

There are three emerging spatial confidentiality topics of concern. The first involves Google Street View, an excellent research tool that allows us to "see" areas that are described or mapped in publications. Its implication for reengineering (re-identification) is the ability to view potential candidate locations within an area. Think again of the "bull's eye" effect within a smoothed surface: if the area has been driven by the Google Street View team (and, thankfully, sparsely populated areas currently tend to be the least covered), one could literally inspect each option within the central pixel until a house match is found. Even with multiple alternatives, it might be possible to spatially prioritise the candidate buildings based on characteristics of the health conditions, or other information gleaned from the paper. For example, is the disease more typically associated with a multi-family unit than with a single residence?

The second area of concern involves biometric sensors synced to a GPS (Global Positioning System) unit. This field of research offers great potential for linking health outcomes to the fine-scale built environment. However, a fear expressed at the URISA workshop was that output from these devices, usually shown as a series of dots on an aerial photograph, would begin to accompany research papers. Sure enough, within one day of the workshop, a new issue of a GIS journal published exactly this kind of output. The underlying aerial photograph makes reengineering from the image extremely easy, and the point concentrations from the GPS unit correspond to the areas of highest activity, including the home. This is not a good situation, especially when the participants are part of a vulnerable population, such as children.

Finally, we are worried about the current trend among social scientists of including spatial data in their research, especially those who use mixed methods, i.e., approaches combining qualitative and quantitative data. For example, spatial video data of the recovering neighbourhoods of New Orleans, LA, USA, are currently being collected. These data are extracted from the video as three-dimensional surfaces that can be mapped or analysed for recovery or abandonment. At the same time, recorded narratives of the neighbourhood participants add further commentary to these surfaces, such as why a building has not been returned to. Many of these comments contain sensitive information, such as the health of an owner. If we map this information, others could easily disseminate it through online consumer geoinformatics services like Google Earth and Google Maps, and even link it, using suitable geo-mashups, to other readily available online information about the individuals concerned (e.g., on social networking sites), thus revealing a more detailed picture about them. Do our subjects really know all that could be exposed through such mapping? (One should also consider the difference between what is technically possible and what is practically likely to happen, i.e., will there really be someone with the motive, will and ability to carry out these privacy-threatening Web inferencing and mapping exercises in each and every case? A risks-costs-benefits assessment might help in such situations.) Although these situations may not fall foul of any HIPAA (Health Insurance Portability and Accountability Act) standard, nor probably concern an IRB, we are now at a point where changing geospatial technologies must stimulate debate that goes beyond the normal community participatory ethical standards used by researchers.

Because of the widespread adoption of GIS-light Internet applications and of cheap, easy-to-use mobile mapping devices (for example, ones that can tag pictures with coordinates), health-related spatial confidentiality is no longer the concern only of geographic information scientists, or even of GIS users, but of a far broader range of academics and other people.

Conclusion

Although the general public's concerns about privacy in research have sometimes been exaggerated by the scientific community (and by a few vocal privacy advocates in the media, who do not adequately represent the position of the wider public), we believe there are still many cases where these concerns are real and legitimate, and where data subjects need to be protected (e.g., from identity theft). A 'one-size-fits-all' privacy-preserving solution is unlikely to succeed, because it cannot capture and properly address the complex requirements, which may also vary from country to country, of the many aspects involved in this area of research, each of which needs to be considered on a case-by-case basis: (i) user roles, with different access privileges and 'needs to know' in relation to various input and output data types; (ii) intra- and extra-mural data sharing arrangements, especially when data must move across heterogeneous organisations; (iii) governing legislation and policies; (iv) the possible forms of data inputs that can be released for research and the associated conditions; (v) health study types and goals, data analysis methods, and the data requirements of each; (vi) the possible forms of reporting and publishing study outputs/results (closed or public); (vii) situation-specific security risks; and (viii) risks-costs-benefits assessments.

Different privacy-preserving solutions can be applied, concurrently or singly, to various elements of this complex chain, e.g., to input data prior to release to researchers (aggregation or transformations) and/or to the research outputs (access restriction or masking), depending on the specific situation at hand; a comprehensive, context-aware approach is therefore needed to assist researchers in choosing and applying the right solution(s) in each case.

Kamel Boulos (unpublished research notes, 2008–2009) proposed the development of a case-based reasoning software framework (cf. case law) that covers, and continuously "learns from", the growing body of possible and emerging health research scenarios and applications involving precise geographical identifiers about individuals. The goal of such a 'one-stop shop' framework would be to streamline the provision of clear, individualised guidance for the design and approval of new research projects, including crisp recommendations on which specific privacy-preserving solution(s) and approach(es) would suit each case. This would spare researchers and IRBs the need to 'reinvent the wheel' with each new study, saving them precious time and effort otherwise spent investigating the same issues every time, and preventing avoidable errors and omissions along the way. The decision framework should ideally have an easy-to-use, wizard-based visual front-end that guides users through the whole process of describing and diagnosing their needs and proposes (with appropriate explanations/justifications) suitable solutions to address them.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MNKB conceived and drafted the manuscript, and conducted a mini-survey of the literature on the subject. AJC provided material on URISA's workshop held on 5 June 2009 about location privacy issues in health research. PA contributed material on 'personal information and privacy legislation', including 'Additional file 1'. All authors read and approved the final manuscript.