Data management case: large-scale survey

Disclaimer: The use case presented below is inspired by a real-world study in which research teams faced specific data-related challenges: storage and processing, privacy and compliance with the GDPR, and ethical considerations and institutional ethical review. We hope this case study will give you insight into what may be required for your own work and trigger useful discussions on these key aspects of research activities. However, the advice provided here should not be applied directly to any other research project. Please consult the relevant experts at your institution.

Project background

The research team is investigating people’s preferences regarding potential public policy changes. The objective is to obtain the perspective of the general population on specific health-related public measures.

To ensure that a representative sample of the population participates in the survey, the research team will outsource the execution of the survey to a private company specialized in this domain. The research team and the survey company have agreed on a protocol ensuring that the research team will not receive personal data from survey participants.

The population in question lives in the European country where the research team is located.

The scientific publication concluding this research will be supported by a dataset containing the questionnaire itself (including the opening statement), an aggregated version of the dataset, and the R script used to produce the aggregated results.

Research data management considerations

Data collection and storage

The research team expects between one thousand and two thousand responses. To collect data about people’s preferences, respondents are asked to rate statements on a scale, for example: ‘On a scale of 1 to 5, to what extent do you agree with the following statement…’

The survey also includes a section on the demographic profile of the respondents. The demographic information collected is:

  • Education level (highest diploma)
  • Income range
  • Age range

The survey tool and the responses reside within the researchers’ institutional storage to ensure data security. The raw survey data is not shared externally; it is accessible only to the research team.

The company disseminating the survey can view only the survey questions, not the responses. Only the aggregated survey data is made publicly available upon completion of the research.

The full process for collecting the survey data is as follows:

  • The researchers create the survey using their internal survey tool.
  • Once the survey is ready, the link is shared with the survey company.
  • The survey company then invites participants to its own web page and redirects them to the researchers’ survey page (this is how the company keeps track of responses and respondents so far).
  • The data is collected in the researchers’ institutional storage.
  • All data processing takes place within the research institute, where all institutionally approved resources can be used.

After collection, the survey data is processed with an R script to obtain descriptive statistics.
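The case study does not include the R script itself. As a rough illustration only, the Python sketch below computes the kind of per-statement descriptive statistics such a script might produce: the number of answers, the frequency of each scale point, the mean and the median. The statement IDs (S1, S2), the in-memory record format and the sample ratings are hypothetical, and Python stands in for R here.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical raw responses: each maps a statement ID to a 1-5 rating.
RESPONSES = [
    {"S1": 4, "S2": 2},
    {"S1": 5, "S2": 3},
    {"S1": 4, "S2": 1},
    {"S1": 2, "S2": 3},
]

# Points of the 1-5 agreement scale used in the questionnaire.
SCALE = range(1, 6)


def describe(responses, statement):
    """Return descriptive statistics for one statement's ratings."""
    ratings = [r[statement] for r in responses]
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        # Frequency of every scale point, including points nobody chose.
        "counts": {point: counts.get(point, 0) for point in SCALE},
        "mean": round(mean(ratings), 2),
        "median": median(ratings),
    }


if __name__ == "__main__":
    for statement in ("S1", "S2"):
        print(statement, describe(RESPONSES, statement))
```

Reporting the full frequency distribution alongside the mean is a common choice for ordinal 1 to 5 ratings, since the mean alone hides how answers are spread. Tables of this kind contain no individual-level records.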

Key considerations on data collection and storage

  • The survey tool should be approved by the institutional privacy and ethics team.
  • A secure storage location should be accessible to the research team (e.g., an institutional cloud storage solution).
  • The survey answers should not be shared beyond the research team (third parties do not have access to the survey answers).
  • Only aggregated research data is shared.
  • The raw data (survey answers) will be deleted after the associated publication is accepted. That said, in this instance, given how little demographic information is collected (age range, income range and highest diploma/education level) and the size of the target population (all adults in a country), the complete set of answers could also be published.

Ethics

Informed consent is required from respondents who participate in the survey. Respondents are made aware that the survey results will be made publicly available in aggregated form upon completion of the research. Note that respondents are not asked about their state of health but about their preferences regarding health-related public measures, a topic deemed less sensitive. Informed consent is obtained by presenting an opening statement to participants as the first question of the survey; participants must agree to it in order to continue. Only completed surveys are considered: interrupted or incomplete responses are discarded.
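The inclusion rule described above (consent given, survey completed) can be expressed as a simple filter applied before any analysis. This is a minimal sketch that assumes each raw record stores the consent answer under a hypothetical `consent` key and each statement’s rating under a hypothetical ID such as `S1`, with `None` marking an unanswered item; the real survey tool’s export format will differ.

```python
# Hypothetical statement IDs; the real questionnaire defines its own.
REQUIRED_STATEMENTS = ("S1", "S2", "S3")


def is_usable(record):
    """Keep a record only if consent was given and every statement was answered."""
    if record.get("consent") is not True:
        return False
    # A missing key or a None value both count as an unanswered statement.
    return all(record.get(s) is not None for s in REQUIRED_STATEMENTS)


def filter_responses(records):
    """Drop non-consenting and incomplete (interrupted) responses."""
    return [r for r in records if is_usable(r)]


if __name__ == "__main__":
    raw = [
        {"consent": True, "S1": 4, "S2": 2, "S3": 5},     # complete: kept
        {"consent": True, "S1": 3, "S2": None, "S3": 1},  # incomplete: dropped
        {"consent": False},                               # declined: dropped
        {"consent": True, "S1": 1, "S2": 1},              # interrupted: dropped
    ]
    kept = filter_responses(raw)
    print(f"kept {len(kept)} of {len(raw)} responses")
```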

Key considerations on ethics

  • Participants may be compensated for taking part in the survey via the survey company. The research team has an ethical responsibility to choose an appropriate company for this.
  • The survey’s opening statement should explain the relationship between the research team and the survey company and clarify that no personal data is transferred between the two parties.
  • The opening statement should also make clear that, beyond the demographic data, no other personal data is collected by the research team; the survey is considered anonymous by design.
  • All input will be treated with strict confidentiality.

Privacy & legal aspects

The survey responses contain the respondents’ demographic profile; however, these data do not allow respondents to be re-identified. Because of the large number of respondents and the limited demographic information collected, any given combination of education level, income range and age range is expected to be shared by many people. The privacy of respondents is therefore not considered to be at risk, and the survey can be considered anonymous.
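One way to sanity-check this anonymity argument is to count how many respondents share each combination of demographic values and flag the combinations that are rare. This is the idea behind k-anonymity. The case study does not state that such a check was performed, and the threshold `k=10`, the field names and the category labels below are assumptions for illustration.

```python
from collections import Counter

# Demographic fields collected by the survey (quasi-identifiers); names are hypothetical.
QUASI_IDENTIFIERS = ("education", "income_range", "age_range")


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many respondents share each combination of demographic values."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def small_groups(records, k, keys=QUASI_IDENTIFIERS):
    """Return demographic combinations shared by fewer than k respondents."""
    return {combo: n for combo, n in group_sizes(records, keys).items() if n < k}


if __name__ == "__main__":
    # Hypothetical records; real category labels come from the questionnaire.
    records = (
        [{"education": "bachelor", "income_range": "30-50k", "age_range": "30-39"}] * 12
        + [{"education": "doctorate", "income_range": ">100k", "age_range": "18-24"}] * 2
    )
    risky = small_groups(records, k=10)  # k=10 is an assumed threshold
    print(risky)  # {('doctorate', '>100k', '18-24'): 2}
```

If `small_groups` returns nothing for the chosen k, every respondent shares all three demographic values with at least k - 1 others. Any combination it does return points to answers that may deserve coarser categories before the complete set of answers is published.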

Since no personal data flows between the research institution and the survey company, the research team does not need to address GDPR-related questions regarding data transfer.

Key considerations on privacy and legal aspects

  • By design, no personal data is transferred between the research team and the survey company.
  • The company should be certified or otherwise competent to perform this distribution function.
  • A contract between the company and the research team’s institution clearly lays out the company’s responsibilities and obligations.
  • If compensation is provided for participation in the survey, the company should compensate participants in a legally permissible manner.

Acknowledgement: this use case was written jointly by data stewards from TU Eindhoven, University of Twente, and TU Delft. We would like to thank the data stewards of these universities who contributed to the use case.

Discover more from 4TU.ResearchData
