Data management case: large scale survey

Cécile van Heukelom

2 years ago

Disclaimer: The use case presented below is inspired by a real-world study in which research teams faced specific data-related challenges concerning storage and processing, privacy and GDPR compliance, and ethical considerations and institutional ethical review. We hope this case study will give you insight into what may be required for your own work and trigger interesting discussions on these key aspects of research activities. However, the advice provided here should not be applied directly to any other research project. Please consult the relevant experts at your institution.

Project background

The research team is investigating people’s preferences regarding potential public policy changes. The objective is to obtain the perspective of the general population on specific health-related public measures.

To ensure that a representative sample of the population will participate in the survey, the research team will outsource the execution of the survey to a private company specialized in this domain. The research team and the survey company have agreed on a protocol ensuring that the research team will not receive personal data from survey participants.

The target population is that of the European country in which the research team is located.

The scientific publication concluding this research will be supported by a dataset containing the questionnaire itself with its opening statement, an aggregated version of the survey data, and the R script used to produce the aggregated results.

Research data management considerations

Data collection and storage

The research team expects between one thousand and two thousand responses. To collect data about people’s preferences, survey respondents are asked to rate statements on a scale, for example: ‘On a scale of 1 to 5, to what extent do you agree with the following statement…’

The data collected via the survey includes a section on the demographic profile of the respondents. The demographic information collected is:

Education
Income range
Age range

The survey tool and the responses reside within the researchers’ institutional storage to ensure data security. The raw survey data is not shared externally and is accessible only to the research team.

The company disseminating the survey can only view the survey questions and not the responses. Only the aggregated survey data is made publicly available upon completion of the research.

The full process for collection of the survey data is as follows:

The researchers create the survey using their internal tool
Once ready, the link is shared with the survey company
The survey company then invites the participants to its web page and redirects them to the researchers’ survey page (this is how the company keeps track of the respondents and responses so far)
The data is collected in the researchers’ institutional storage
All data processing is internal to the research institute, so any approved institutional resource can be used.

After collection, the survey data is processed with an R script to obtain descriptive statistics.
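The team’s R script is not reproduced in this use case. As a rough illustration of the kind of descriptive statistics that can be derived from 1-to-5 ratings, here is a minimal sketch in Python using only the standard library. The statement names (q1, q2), the sample ratings and the describe function are all invented for the example and are not taken from the study.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical raw responses: one dict per completed survey.
# Statement names and ratings are invented for illustration.
responses = [
    {"q1": 4, "q2": 2},
    {"q1": 5, "q2": 3},
    {"q1": 3, "q2": 2},
    {"q1": 4, "q2": 1},
]


def describe(responses, statement, scale=range(1, 6)):
    """Return count, mean, median and per-rating frequencies for one statement."""
    ratings = [r[statement] for r in responses]
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "mean": mean(ratings),
        "median": median(ratings),
        # Report every point on the scale, including ratings nobody chose.
        "frequencies": {point: counts.get(point, 0) for point in scale},
    }


# The aggregated view: one summary per statement, no individual rows.
summary = {s: describe(responses, s) for s in ("q1", "q2")}
for statement, stats in summary.items():
    print(statement, stats)
```

Only the summary dictionary, not the list of individual responses, corresponds to the aggregated data that would be made public.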

Key considerations on data collection and storage

The survey tool should be approved by the institutional privacy and ethics team.
A secure storage location should be accessible by the research team (e.g., an institutional cloud storage solution).
The survey answers should not be shared beyond the research team (third parties do not have access to survey answers).
Only aggregated research data is shared.
The raw data (survey answers) will be deleted after the associated publication is accepted. In this instance, given how little demographic information is collected (age range, income range and highest diploma or education level) and the size of the target population (all adults in a country), the complete set of answers could be published.

Ethics

Informed consent is required from respondents who participate in the survey. Respondents are made aware that the survey results will be made publicly available in an aggregated format upon completion of the research. It is important to note that respondents are not asked about their state of health but about their preferences regarding health-related public measures, which is deemed less sensitive. Informed consent is obtained by presenting an opening statement to the participants as the first question of the survey; they must agree in order to continue. Only completed surveys are considered; interrupted or incomplete survey results are discarded.
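The rule that only consented and completed surveys are considered can be expressed as a simple filter applied before any analysis. The following Python sketch is one possible way to do it; the field names (consent, q1, q2) and the convention that None marks an unanswered question are assumptions made for the example, not details of the team’s survey tool.

```python
# Hypothetical field names; None marks a question left unanswered.
STATEMENTS = ("q1", "q2")


def is_usable(response):
    """Keep a response only if consent was given and every statement was rated."""
    if not response.get("consent"):
        return False
    return all(response.get(s) is not None for s in STATEMENTS)


raw = [
    {"consent": True, "q1": 4, "q2": 2},         # complete
    {"consent": True, "q1": 5, "q2": None},      # interrupted
    {"consent": False, "q1": None, "q2": None},  # declined the opening statement
]

usable = [r for r in raw if is_usable(r)]
print(len(usable), "of", len(raw), "responses kept")
```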

Key considerations on ethics

Participants may be compensated by the survey company for participating in the survey. It is the research team’s ethical responsibility to choose a reputable company for this.
The survey’s opening statement should include an explanation of the relationship between the research team and the survey company and clarify that there is no personal data transferred between the two parties.
The opening statement of the survey should also make clear that, beyond the demographic data collected, no other personal data is collected by the research team; the survey is considered anonymous by design.
All input will be treated with strict confidentiality.

Privacy & legal aspects

The survey responses contain the respondents’ demographic profile; however, this data does not allow for re-identification of respondents. Because of the high number of survey respondents and the limited demographic information being collected, the privacy of respondents is not considered to be at risk, and the survey can be considered anonymous.
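One way to sanity-check this kind of reasoning on a collected dataset is to count how many respondents share each combination of demographic answers, in the spirit of a k-anonymity check. The use case does not say that the team ran such a check; the Python sketch below is only an illustration, and the field names, categories and sample rows are invented.

```python
from collections import Counter

# Hypothetical names for the three demographic fields collected.
QUASI_IDENTIFIERS = ("education", "income_range", "age_range")


def smallest_group(responses, keys=QUASI_IDENTIFIERS):
    """Return (size, combination) for the rarest demographic combination."""
    groups = Counter(tuple(r[k] for k in keys) for r in responses)
    combination, size = min(groups.items(), key=lambda item: item[1])
    return size, combination


# Invented sample rows, for illustration only.
responses = (
    [{"education": "secondary", "income_range": "low", "age_range": "18-34"}] * 3
    + [{"education": "tertiary", "income_range": "middle", "age_range": "35-54"}] * 2
    + [{"education": "tertiary", "income_range": "high", "age_range": "55+"}]
)

size, combination = smallest_group(responses)
print("Rarest combination:", combination, "shared by", size, "respondent(s)")
```

A small group in the sample is not by itself a re-identification risk when the underlying population is a whole country, but very small groups are a useful prompt to discuss the dataset with the institutional privacy team before publishing anything beyond aggregates.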

Since no personal data flows between the research institution and the survey company, the research team does not need to address GDPR-related questions regarding data transfer.

Key considerations on privacy and legal aspects

By design, no personal data is transferred between the research team and survey company.
The company should be certified or otherwise competent to perform this distribution function.
A contract between the company and the research team (institution) should clearly lay out the company’s responsibilities and obligations.
If compensation is provided for participation in the survey, the company should compensate the participants in a legally permissible manner.

Acknowledgement: this use case was written jointly by data stewards from TU Eindhoven, University of Twente, and TU Delft. We would like to thank the data stewards of these universities who contributed to the use case.