Privacy and Health Research in a Data-Driven World

An Exploratory Workshop Sponsored by the Office for Human Research Protections (OHRP), Department of Health and Human Services (HHS)
September 19, 2019

Contents

Welcome and Introduction
Session I: Is Privacy a Casualty of Advancing Research?
    Session I Introduction
    Unexpected Forms of Risk in Data Science/Artificial Intelligence Research
    Striking a Balance: Benefit-Risk Analysis for Big Data Research
    Understanding Individual Privacy
    Panel Discussion for Session I
Session II: Approaches to Protecting Privacy & Confidentiality
    Session II Introduction
    Transforming Health Measurement and Care Delivery Through Patient-Generated, Permissioned Data from Daily Life
    The Vivli Experience in Sharing Clinical Trial Data Globally
    The Use of “Differential Privacy” as a Statistical Method for Protecting Confidentiality in Data Publications
    Panel Discussion for Session II
Session III: Protecting Privacy & Confidentiality: A Shared Responsibility
    Session III Introduction
    IRBs and Big Data Research
    A Framework for Ethics Committees for Reviewing Research Protocols with Privacy and Confidentiality-Related Risks in Electronic Environments
    Facing the Future: Operational Solutions to the Regulatory Challenges of Big Data Research
    Ethical Considerations for the Review of Big Data Research Beyond the Common Rule
    Shared Responsibility in Ethical Big Data Research
    Panel Discussion for Session III
References
Online Resources

Welcome and Introduction
• Jerry Menikoff, M.D.; Director, Office for Human Research Protections (OHRP)
• Yvonne Lau, MBBS, MBHL, Ph.D.; Director, Division of Education and Development, OHRP
Dr. Menikoff, Director of the U.S. Department of Health and Human Services (HHS) Office for Human Research Protections (OHRP), welcomed everyone to the meeting and thanked OHRP staff from its Division of Education and Development (DED) for planning the workshop. He explained that OHRP’s goal is to provide an open forum to explore critical issues. Today’s world is one in which unimaginable amounts of data are used for research purposes, including data collected in connection with clinical care. Members of the public also generate large amounts of data. There is “tremendous potential to use these data to benefit everyone,” including the opportunity to discover new insights into treatment and innovative ways to support health. However, there is no consensus about the best ways to collect, store, and share these data ethically. OHRP staff are here as listeners and to collaborate with colleagues to find the best ways to support ethical research.
Dr. Lau, Director of DED, also welcomed everyone to the workshop. She explained that DED’s mission is to conduct public outreach and education. The workshop is on a timely topic and features a panel composed of diverse experts. It begins with an exploration of the landscape of big data research and the challenge of promoting public good while respecting privacy concerns. The second session provides an opportunity to hear how various entities are addressing these challenges. The workshop concludes by highlighting roles and responsibilities of different entities in identifying and responding to challenges. Each session features an hour of discussion involving all members of the panel.
Note: This report provides only highlights of speakers’ presentations. For an in-depth look at what each speaker had to say, please see the presenters’ slides.
Session I: Is Privacy a Casualty of Advancing Research?
• Moderator: Jodi Daniel, J.D., M.P.H.; Crowell & Moring LLP
Session I Introduction
• Jodi Daniel, J.D., M.P.H.; Crowell & Moring LLP
The goal of this session was to explore the problem of privacy protection in a data-rich world and to consider the tensions between the societal good that could come from big data research and the real and perceived risks to individuals, as well as the public’s perspectives on broad data sharing. Ms. Daniel explained that speakers would consider the current landscape and the privacy and ethical considerations raised by present opportunities to use “big data,” including concepts of privacy, data ownership, the types and goals of data research, and how to maintain public trust.

Unexpected Forms of Risk in Data Science/Artificial Intelligence Research
• Jacob Metcalf, Ph.D.; Data & Society Research Institute
Dr. Metcalf explained that “big data” is a vague industry term that is neither helpful nor accurate; he prefers the term “pervasive data.” These data are not just bigger. They bridge multiple dimensions of a person’s life. They can be collected in real time and coordinated among sensors, mix varieties of datasets (public/private, research/commercial, identifiable/de-identified), and reveal intimate details about individual lives. Machines, he observed, do not respect boundaries between our different selves, but will take and analyze everything they can access. It is the nature of machine learning to “jump” domains, applying what it learns in one domain to make inferences in another.
Most pervasive data science research uses methods and types of data that exempt it from the regulatory framework, including the requirement that it be reviewed by an Institutional Review Board (IRB). It typically uses pre-existing “public” datasets, including data borrowed/bought/gleaned from Internet services. These data are usually sufficiently de-identified to qualify as “exempt” under the Common Rule. The risks posed by these data are generally downstream, unlike the harms that IRBs operating under the Common Rule are used to considering. Also, pervasive data research does not require an “intervention” as defined by the Common Rule and involves the use of tools that are so prolific that literally everyone may be a research subject and anyone can be a researcher. Further complicating the issue of oversight is the fact that such researchers may not be associated with institutions that could provide mechanisms for ethical review of their proposed activities.
The heart of pervasive data research is the development of models that can predict behavior. The “creepy” factor, as many see it, is that these models can also be used to influence behavior. The findings do not apply only to the knowing or unknowing “subjects” whose data were used in developing the predictions, but also to others who never contributed data and can now be targeted with specific messages, such as political or sales pitches.
Dr. Metcalf reviewed the genesis and evolution of the “research ethics scandal” involving Cambridge Analytica (CA). Briefly, the research made use of viral quizzes on Facebook – presented as fun personality tests – to collect the “likes” not only of participants who took the quiz, but of all their friends. This information could be used to predict voting preference, among other traits. No academic research was ever published as a result of this research. The proposal for the research was approved within a day by a university IRB, pointing to the need for better understanding of the implications of this type of research among IRB members. The models developed through this research – highly portable and economically valuable – were used by Cambridge Analytica in the 2016 campaign and are still held today by CA’s successor company, Emerdata.
The Cambridge Analytica episode has many hallmarks of a research ethics scandal in the age of pervasive data, which Dr. Metcalf enumerated as follows:
● Metrics jumping between domains, e.g., psychiatry to social media profiles to electoral data,
● Research that is exempt under the Common Rule for narrow technical reasons,
● Blurred lines between academic and commercial research,
● Use of Application Program Interface (API) tools intended for commercial and advertising purposes to gather data for academic research,
● Abuse of mTurk workers (workers accessed through an Amazon crowdsourcing mechanism),
● Deceptive/opaque recruiting tactics for human subjects – a strong signal of unethical research,
● Predictive population models that, as research output, become tools for intervention in individual lives, and
● Downstream effects nearly impossible to imagine because the models are highly portable and far more valuable than the actual data.
Striking a Balance: Benefit-Risk Analysis for Big Data Research
• Brenda Leong, CIPP/US; Future of Privacy Forum
Ms. Leong’s remarks explored the tension between the need for beneficial research using big data and the responsibility of investigators to respect privacy and maintain public trust. There is a growing awareness of the “creepy factor” alluded to by Dr. Metcalf, and new ways of restricting such research are being explored. For example, the California legislature made several amendments to the California Consumer Privacy Act (CCPA) in September 2019, including expanding the definition of “data broker.” A data broker, “a business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship,” must register annually with the state Attorney General, and increased monitoring and controls by the state are possible.
In this new context, what is research? In contrast to traditional research, big data research often involves the use of data that may be collected as we live our lives rather than through a defined intervention. It is more difficult to draw a line between research and product development (for example, tools to influence behavior based on profiles). When do various legislative requirements come into play? Does this depend on the source of data, where the research is conducted, or other factors?
Ms. Leong enumerated the wide variety of harms that could result from this type of research. Individual harms (as opposed to collective or societal harms) may include loss of opportunities for employment, social benefits, insurance, housing, or education. For example, algorithms may be used to show females or other specific groups different types of job opportunities. Economic opportunities may vary, with algorithms employed in the service of credit discrimination and differential pricing. Social detriments might include dignitary harms, even surveillance and loss of liberty.
Harms may also result from unintended leakage of information. A model’s output may be used to recreate the model and discern the identity of specific individuals. Behavioral harms or attacks could involve manipulating the model directly to discriminate against groups based on specific factors. Collective harms may create risks and impacts for people whose data were never included in developing the model; nevertheless, information about them can be inferred. Increasingly powerful predictions can affect those outside the model’s original “ecosystem” (e.g., Facebook). For example, an application might be developed to evaluate any person’s emotional state in a non-research setting without their knowing participation. “Consent” is no longer a sufficient control. A new framework is needed to assess risks. (See, for example, a 2018 white paper developed by the Future of Privacy Forum: Beyond Explainability: A Practical Guide to Managing Risk in Machine Learning Models.)
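
To make the mechanism behind such collective harms concrete, the following minimal sketch (toy data and a generic scikit-learn classifier, invented for illustration rather than drawn from any project discussed at the workshop) shows a model fit on data from consenting contributors being applied to someone who never contributed data at all.

```python
# A minimal, hypothetical sketch of the inference pattern described above: a model
# trained on consenting contributors' data is applied to a non-contributor.
# All data, features, and labels here are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy "contributor" data: each row is one consenting user's binary engagement
# signals (e.g., liked page A, liked page B, ...); the label is a self-reported trait.
X_contributors = rng.integers(0, 2, size=(200, 5))
y_trait = X_contributors[:, 0] & X_contributors[:, 3]  # contrived relationship

model = LogisticRegression().fit(X_contributors, y_trait)

# Publicly observable signals for someone who never joined the study or gave consent.
non_contributor = np.array([[1, 0, 0, 1, 1]])

# The model still produces a prediction about that person's trait.
print(model.predict_proba(non_contributor)[0, 1])
```

The specific algorithm is beside the point; once the model exists, the original contributors’ consent no longer bounds the population to whom it can be applied.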
Analysis of specific projects involves examining complex tradeoffs of potential harms and benefits for individuals, groups, and society. It is necessary to document initial objectives and underlying assumptions and identify undesired outcomes. Models must be evaluated to determine their purpose and assess the accuracy, transparency, fairness, and potential applications of algorithms. Review must also attempt to foresee the significance or personal impact of deploying the model on the organizations that use it, end users, third parties, individuals, and social systems. Another important consideration is whether the data produced are really accurate and reliable, since use of the models in some contexts may have serious consequences (for example, facial recognition models used to provide evidence of criminal activity).

Tools do exist for facilitating this type of analysis and protecting unwitting “subjects.” Privacy Enhancing Technologies (PETs) can, for example, help people protect personally identifiable information (PII) online. “White Hat/Red Team” hacking exercises may be useful in identifying data vulnerabilities. Business processes can establish multiple lines of defense within organizations. Law and policy can also provide some controls on egregious misuse of models, as well as guidelines for what is acceptable; for example, the U.S. Department of Homeland Security has developed Fair Information Practice Principles (FIPPs) to guide its privacy program. Ms. Leong assured listeners that people are working hard to monitor trends, identify concerns, and find ways to allow legitimate profits and benefits from this type of research without enabling exploitation.

Understanding Individual Privacy
• Cinnamon Bloss, Ph.D.; University of California, San Diego

Dr. Bloss’s presentation focused on how people think about privacy. She noted that the concept of privacy rights has been around a long time and was embodied in laws in the late 1800s, partly as a result of concerns about the work of photographers and newspapers. Although we often discuss privacy as if we all understand what it means, she maintained that it really means different things to different people. Individuals differ in what information they are willing to share, with whom, and for what reasons.
To get a better understanding of how people think about privacy, the National Human Genome Research Institute funded a study with the following specific aims:
• To refine a conceptual model of privacy through literature review, individual interviews, focus groups, consultation with experts, and analyses of preliminary data;
• To develop a psychometrically sound instrument to measure individual Privacy Affinities and Privacy Environment Responses to personal health data technologies; and
• To administer the scale in a larger population and use it to explore the relationship between privacy and other factors, including the propensity to adopt Personal Health Data (PHD) technologies, the propensity to share PHD for research, and disease type and stage.
Dr. Bloss reported preliminary findings from this study. The qualitative approach used 44 interviews and 9 focus groups to explore attitudes to privacy and sharing personal health information. Some of the values that emerged reflected cultural ways of thinking about privacy that are poor predictors of individual behavior because they are too widely accepted. These included:
• “Moral right”: Individuals should have control over information about themselves because that is the correct or moral arrangement.
• “Personal responsibility”: Maintaining privacy is something that people must work at.
• “Tradeoff”: Privacy or personal information may be traded strategically for benefits, goods, or services.
• “Nothing to hide”: Privacy is only necessary to protect information that is sensitive or stigmatizing.
• “Fatalism”: Privacy is already lost or does not exist.
Individual privacy values are more useful in predicting behavior. They differentiate people and show how they think about sharing or protecting information. Dr. Bloss envisioned these on a grid in which attitudes are grouped in four categories – open, intimate, anonymous, or reserved – and paired with four different ways of thinking about sharing personal health information that might be associated with each category in different circumstances (the grid appears in her presentation slides).

Currently, the project team is in the process of developing a psychometric assessment tool with four subscales based on this analysis. The hypothesis is that subscale profiles will result in combinations that may be described as Privacy Types (similar to personality types) that differ in their openness to data sharing.
In conclusion, Dr. Bloss suggested different reasons we might seek to understand and measure individual attitudes to privacy. These included:
• To promote rigorous research on an ill-defined topic,
• To understand people’s privacy-related behaviors (for example, why people might say one thing and do another),
• To enable safe data sharing for biomedical research (for example, by tailoring informed consent or developing decision aids),
• To enhance patients’ control of personal health data,
• To develop approaches for addressing privacy concerns in clinical settings (for example, by tailoring telehealth interventions), and
• To promote user-centered design of health technologies and information technology.

Panel Discussion for Session I
What does giving permission mean? Ms. Daniel observed that personal preferences and profiles stored on online devices provide a framework for using and protecting data, but people do not understand the downstream risks associated with their data. Patients may not understand what giving permission to use their data actually means in this context. How do we address this problem?
Dr. Bloss called this a “huge challenge” that speaks to the limitations of the concept of informed consent in this arena. There is no easy answer. One approach is the use of decision aids to help people who may not understand intricate issues related to data management but can certainly link certain values to decisions and outcomes. Trusted entities could develop such tools.
Dr. Metcalf called, instead, for collective protections. He held that researchers should not rely on individual decision making to protect us as a community. While considering potential harms to individuals is “incredibly important,” there are certain choices individuals can’t make. The public health analogy is helpful here: for example, children should be vaccinated to have access to public schools. The potential for collective harms calls for collective solutions.
The challenge of unregulated research. In devising strategies to prevent harms from big data research and protect privacy, Mr. Barnes highlighted a “jurisdictional question” related to the First Amendment of the Constitution. Some research can be regulated because the Food and Drug Administration (FDA) can address commerce that crosses state lines or because the research is funded by the federal government. Even when the research is subject to the Common Rule, in some circumstances investigators can legally do what they want with their data. How do we square open and transparent use of public data with the objective of preventing harms?
In response, Dr. Metcalf observed that the fact that the Common Rule mandates the use of IRBs when federal funds are used does not mean that IRBs cannot be used in other contexts. Institutions can control the use of their own resources. They can certainly tell researchers that the researchers cannot use the institution’s labs unless their research is reviewed. Beneficence and justice still matter.
When consumer data are at issue, Ms. Leong said that consumer protection laws allow companies to use data for secondary purposes with permission, but consumers are given some protection from exploitation through the agreements they make with business entities.
Does HIPAA help restrict the use of big data? Ms. Daniel observed that the Health Insurance Portability and Accountability Act (HIPAA) offers rules about data use that take collective harms and benefits into account. Certain uses of data are allowed, and the individual doesn’t get to say yes or no. When this law was passed, she said, we made a set of choices as a country about what uses of data are permissible. However, the law is now 20 years old, and the issues we face regarding big data were not apparent when it was passed.
The HIPAA law, Mr. Barnes observed, is based on commerce law and governs the electronic submission of data. However, business also has rights, and the jurisdiction of the Food and Drug Administration (FDA) to suppress speech by companies about their products is currently under attack. If contracts are established and violated by a company, that is fraud – but if data are truly in the public realm, it isn’t clear how we get the right to control how these data are used in all contexts. We are just not set up as a society to do this.

How do we bring local context into the consideration of privacy issues? Dr. Buchanan wondered how local context can be addressed as privacy issues are considered. In general, local context is addressed through IRB review, which takes into account the norms and values of different subpopulations. Groups may view privacy differently. How do we ensure IRBs are prepared to address varying expectations about privacy?
Dr. Bloss suggested that diversity among investigators and IRB members is part of the solution to addressing cultural issues related to privacy, as well as those that arise in other areas. Empirical research can also refine how we approach the issue. However, more work is needed in this area.
How should we think about “privacy”? While we use the term “privacy” a lot, Ms. Daniel observed, we have heard that it means different things. How should we think about it? Are there different constructs to address it? What’s the right framework for thinking about risks, benefits, and trust? Do individual differences make it difficult to address this?
Dr. Zimmer cautioned against “falling into the consent trap.” Once collected, data are removed from context and the value of informed consent is limited. Instead, he suggested, we need to give more attention to the life cycle of the data and what happens to them over time. Since intentions change, Dr. Kilpatrick suggested, a sequential consent process is more appropriate in this type of research. She found Dr. Bloss’s work in this area “compelling.”
Ms. Leong held, however, that “we are way beyond the point that any individual can control the chain of consequences” once data are made available. Millions of people may be involved, and consent is no longer the main defense against misuse. Socially defined purposes are more important at this scale.
Dr. Metcalf agreed with Ms. Leong. Once artificial intelligence “goes out and does something in the world,” controlling “who knows what” is very difficult. The best approach is to focus not on individual control at the beginning of the chain, but on social control at the output stage. Spelling out “who can do what to us” through a clear social contract has more potential. Mr. Gupta agreed that policies related to the use of data are more important than a focus on consent, especially since people tend not to understand what they are consenting to when they give out data, or even that they are giving data at all (for example, every time they use a cell phone).
What is the right approach to “consent” in this context? Given the complexities just explored around the traditional consent process as applied to research using big data, Ms. Daniel asked, what is the right framework for thinking about the possible benefits and harms of such research?
One of the challenges in addressing this issue, Mr. Barnes noted, is that no data are ever really de-identified. Do we dispense with the notion of exemption on this basis? It is often said, and has been said here, that we should not ban the research but should instead try to ban bad uses of the data; such uses are much more tangible. Mr. Gupta agreed, noting that “the data are not necessarily bad” in themselves – it is the ways they are used that may cause harm. However, Dr. Zimmer observed that we cannot confidently predict harms where pervasive data are involved.
Because of her work with rare disease foundations, Dr. Li saw this problem “through a different lens.” Many people with such diseases feel strongly that data should be shared. She doubted that it was possible to clearly differentiate among types of data to understand which should be protected.

Dr. Garfinkel asked Mr. Barnes if banning misuse of data would violate the First Amendment. He responded that misuse could be prohibited, but you probably could not ban the reidentification of public data.
How do we think about harms? How do we even know what they are? Ms. Daniel said it was apparent that setting rules about how data are used seems difficult unless we are able to determine up front what the potential harms might be and how to prevent them. Also, we’ve heard that people think about harms differently. How do we address this challenge?
Dr. Garfinkel observed that no one has yet mentioned the usefulness of integrity models, which offer well-developed frameworks for analyzing sender and recipient interactions. One challenge in applying the models, however, is the lack of transparency and visibility around how data are collected. We need to begin by making passive collection of data visible so we have a better idea what corporations are taking and how it might be used.
Dr. Metcalf agreed that an integrity framework is useful for looking at informational harms and can be implemented so as to make it hard to make a bad decision. Platforms typically “don’t report back”; if data were collected years ago and were unlabeled, the platform no longer knows to whom they once belonged. However, we are starting to see platforms that have the capacity to present metadata to scientists to enable good intent. It would help to think not just about banning inappropriate uses of data, but also about facilitating the best intent. For example, if protected health information is used, all features should travel with the metadata.
“What about reidentification?” the moderator asked. Dr. Garfinkel said deidentification does not work; anonymization does not produce anonymized data.
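
The point about de-identification is commonly illustrated with linkage attacks: a “de-identified” table can often be re-linked to named records when both share quasi-identifiers such as ZIP code, birth date, and sex. The sketch below uses toy, invented data and column names (not any dataset discussed at the workshop) to show the basic join.

```python
# A toy illustration (invented data and column names) of the linkage problem
# behind the claim that de-identification does not work: records stripped of
# names can be re-linked through quasi-identifiers shared with public data.
import pandas as pd

# A "de-identified" health table: names removed, quasi-identifiers retained.
deidentified = pd.DataFrame({
    "zip": ["53703", "53703", "10013"],
    "birth_date": ["1985-02-14", "1990-07-01", "1985-02-14"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["asthma", "diabetes", "hypertension"],
})

# A public roster (e.g., a voter file) with names and the same quasi-identifiers.
public_roster = pd.DataFrame({
    "name": ["A. Jones", "B. Smith"],
    "zip": ["53703", "10013"],
    "birth_date": ["1985-02-14", "1985-02-14"],
    "sex": ["F", "F"],
})

# An ordinary join on the quasi-identifiers re-attaches names to "anonymous" records.
reidentified = public_roster.merge(deidentified, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

This is the intuition behind skepticism that removing direct identifiers, by itself, provides meaningful protection.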
Mr. Barnes wondered what the real concern was regarding algorithms: that they are used to make decisions at all, the possibility that they are wrong – or that they actually work.
How do we provide training and promote cultural change among researchers? Dr. Bloss stressed the importance of training data scientists to “first do no harm.” We currently have few resources to help train researchers about the specific ethical considerations for research with big data, Ms. Kasimatis Singleton said, and we need to think through what would be useful. Mr. Barnes observed that cultural change among such researchers, if it can be achieved, would be much more long-lasting than trying to find a legal solution.
Could we be overestimating the potential harms? An audience member wondered whether we might be overestimating the potential harm from big data and associated privacy violations. Perhaps there is a cultural shift in the way people think about privacy that we should take into account.
When people release data about themselves on the Internet, Dr. Garfinkel observed, they are usually taking advantage of a platform and assuming it has certain controls. They may not realize their data are being archived, and they typically can neither see nor control the flow of information. There is a role for education and a role for regulations, but there is also a system development issue. Mr. Gupta noted that some people share too much, while others do not even know they are sharing. Ms. Leong added that people who never supplied their data are also potentially exploited and harmed. Dr. Li reminded panelists that Dr. Metcalf’s presentation showed how our own choices may have implications for potential harms to our friends. Science tends to race ahead of regulatory frameworks, and that is happening now.
