Data Management

Advantages of a repository

Why would you choose to deposit your data into a repository? A repository is helpful because it:

  • Provides a metadata structure for you to fill in
  • Serves as a backup for your data
  • May preserve your data for the future
  • Makes sharing your data easy
  • May increase citations of your research
  • May provide computational or online analysis tools that others can use with your data
  • Publishes your data by assigning the dataset a unique persistent identifier, such as a DOI

Selecting a data repository

There are some things to keep in mind when selecting a repository. Data in a repository should be:

  • Persistent (not likely to be modified)
  • Searchable and browsable
  • Easy to retrieve or download
  • Citable

A wide variety of institution-based and discipline-specific repositories exist for digital data. The repository itself should be: 

  • Appropriate for the type of data you generate
  • Appropriate for the audience of the repository (so they will make use of your data!)
  • Open access

If both a discipline-specific repository and an institution-based one exist for your data, then consider depositing in both locations to maximize discovery and safety of the data. 

NYU's Guide to selecting a repository.

NIH's guide.

A good comparison of various popular general repositories. 

Taylor and Francis

Locating a Data Repository

Data repositories

Many more data repositories are available online than can be listed here. Consult re3data.org, an external resource, for an extensive list of discipline-specific repositories.

FAIRsharing.org is a curated, searchable registry of metadata standards; databases and repositories; and funder and journal policies that are relevant to specific domains or types of data.

OpenDOAR is a global Directory of Open Access Repositories. You can search and browse thousands of registered repositories by features such as location, software, or type of material held.

Repository Finder (beta) is another useful search tool to help you find the most appropriate repository for your data.

A list of repositories broken down by discipline.

CUNY Academic Works accepts all data formats, and is dedicated to collecting and providing access to the research, scholarship, and creative and pedagogical work of the City University of New York.

OSF (Open Science Framework) supports the collection, analysis, publishing, and discovery of projects and their associated data.

FigShare allows you to share all of your data, negative results and unpublished figures.

The Dataverse Network Project (DVN) is an application to publish, share, reference, extract, and analyze research data. It makes data available to others and allows researchers to replicate others' work. Researchers and data authors, publishers and distributors, and affiliated institutions all receive credit.

Inter-university Consortium for Political and Social Research (ICPSR) – The world’s largest archive of digital social science data. ICPSR staff can guide you in preparing your data for archiving and distribution.

GitHub is generally suitable for software projects.

Zenodo is increasingly becoming a standard choice, and each deposit is assigned a DOI.

Open Context reviews, edits, annotates, publishes and archives research data and digital documentation.

Confidentiality

It is vital to maintain the confidentiality of research subjects, both for ethical reasons and to ensure continued participation in research. At the same time, data on research subjects can be shared if proper steps are taken to protect participant confidentiality:

Informed consent should make a provision for data sharing: When obtaining informed consent from study participants, ensure confidentiality while also leaving open the option of data sharing. Even if you are not certain that you will share your research data with others, you should seek consent for data sharing when you obtain informed consent at the outset. For examples of informed consent forms that allow for data sharing, see the U.K. Data Archive guide to consent or the ICPSR Confidentiality Language for Informed Consent Agreements.

Evaluate the sensitivity of your data: Researchers should consider whether their data contain direct identifiers, or indirect identifiers that could be combined with other public information to identify research participants. If so, these identifiers should be removed or masked in public-use data files.

Obtain a confidentiality review: A benefit to depositing your data with some archives, such as ICPSR, is that their staff will review your data for the presence of confidential information.

Comply with CUNY regulations: Grad Center researchers concerned about confidentiality issues with their data should consult the CUNY Human Research Protections Program (HRPP).

Comply with regulations for health research: see the HIPAA Privacy Rule: Information for Researchers.

Enable restricted use of your data: Do you want to make your data available in a more restricted, limited-access manner? The ICPSR DSDR program has resources for data producers including a tool for Designing a Restricted Data Use Contract.
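For researchers who prepare public-use files themselves, the "Evaluate the sensitivity of your data" step can be sketched in a few lines of Python. This is a minimal illustration, not a confidentiality standard: the field names (name, email, phone, zip, age) and the masking rules (truncating ZIP codes to three digits, grouping ages into 10-year bands) are hypothetical choices, and these steps alone do not guarantee that participants cannot be re-identified. A confidentiality review, as described above, is still the safer route.

```python
# Hypothetical field names; adapt to your own codebook.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def generalize_age(age: int) -> str:
    """Replace an exact age with a 10-year band, e.g. 37 -> '30-39'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def mask_record(record: dict) -> dict:
    """Return a copy with direct identifiers dropped and indirect ones coarsened."""
    # Drop direct identifiers entirely.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in cleaned:
        # Illustrative masking rule: keep only the first three ZIP digits.
        cleaned["zip"] = str(cleaned["zip"])[:3] + "XX"
    if "age" in cleaned:
        # Illustrative generalization rule: exact age becomes an age band.
        cleaned["age"] = generalize_age(int(cleaned["age"]))
    return cleaned


if __name__ == "__main__":
    raw = [
        {"id": 1, "name": "Participant A", "email": "a@example.org",
         "zip": "10016", "age": 37, "score": 12},
        {"id": 2, "name": "Participant B", "email": "b@example.org",
         "zip": "11201", "age": 52, "score": 9},
    ]
    for row in raw:
        print(mask_record(row))
```

Which variables count as indirect identifiers, and how coarsely to generalize them, depends on the study population and on what other public information exists.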

CITI Training

All CUNY faculty members, postdoctoral scholars, and graduate and undergraduate students involved in human subjects research as key personnel must complete the applicable Basic Course in the protection of human subjects (e.g., HSR for Social & Behavioral Faculty, Graduate Students, & Postdoctoral Fellows) before the Institutional Review Board (IRB) approves their protocol. More information here.

Rebecca Banchik is the GC's Director of the Human Research Protection Program.

Adrienne Klein is the GC's Director of Special Projects and Research Integrity Officer (Research and Sponsored Programs).

Citing Data

When writing a paper or giving a presentation, it is important to cite not only the literature consulted but also the data files used, even if you produced those data files yourself.

Citing data is important in order to:

  • Give the data producer appropriate credit
  • Enable readers of your work to access the data, for their own use or to replicate your results

Elements of a citation include:

  • Author(s)
  • Title
  • Year of publication: the date when the dataset was published or released (rather than the collection or coverage date)
  • Publisher: the data center or repository
  • Any applicable identifier (including edition or version)
  • Availability and access: URL or other location information for the data

Examples:

Bachman, Jerald G., Lloyd D. Johnston, and Patrick M. O'Malley. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 1998 [Computer file]. Conducted by University of Michigan, Survey Research Center. ICPSR02751-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 2006-05-15. http://dx.doi.org/10.3886/ICPSR02751.

ASTER Global Digital Elevation Model, version 1, ASTGTM_N11E122_num.tif, ASTGTM_N11E123_num.tif, Ministry of Economy, Trade, and Industry (METI) of Japan and NASA, downloaded from https://wist.echo.nasa.gov/api/, October 27, 2009
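If you manage many dataset citations, it can help to store the elements listed above as structured fields and assemble the citation text from them. The Python sketch below shows one possible ordering (Authors (Year). Title. Version. Publisher. Identifier. URL); the DatasetCitation class and this ordering are hypothetical illustrations, so follow whatever citation style your journal or repository requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the dataset citation elements."""
    authors: List[str]
    title: str
    year: int                         # year the dataset was published or released
    publisher: str                    # the data center or repository
    identifier: Optional[str] = None  # e.g. an accession number or DOI
    version: Optional[str] = None     # edition or version, if any
    url: Optional[str] = None         # where the data can be accessed

    def format(self) -> str:
        """Join the elements as: Authors (Year). Title. Version. Publisher. Identifier. URL"""
        parts = [f"{'; '.join(self.authors)} ({self.year}).", f"{self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        if self.identifier:
            parts.append(f"{self.identifier}.")
        if self.url:
            parts.append(self.url)
        return " ".join(parts)


if __name__ == "__main__":
    # Fields taken from the ICPSR example above.
    mtf = DatasetCitation(
        authors=["Bachman, Jerald G.", "Johnston, Lloyd D.", "O'Malley, Patrick M."],
        title="Monitoring the Future: A Continuing Study of American Youth "
              "(12th-Grade Survey), 1998",
        year=2006,
        publisher="Inter-university Consortium for Political and Social Research",
        identifier="ICPSR02751-v1",
        url="http://dx.doi.org/10.3886/ICPSR02751",
    )
    print(mtf.format())
```

Keeping the elements separate makes it easy to reformat the same citation for a different style later.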

Related links:

ICPSR: Why and how should I cite data?

DataCite