Data Management

This guide outlines the hows and whys of managing research data at the CUNY Graduate Center.

Data Management Instruction

I can assist you in improving the capture, organization, management, storage and preservation, presentation, and dissemination of your research data.

Virtual Office Hours

Do you have reams of research data that you need to organize and document? Do you need to ensure that it is accessible to the public and/or preserved for the long term? Are you applying for a grant that requires you to create a data management plan? If so, then this drop-in session is for you.

Stephen Klein will help you navigate the world of data management during drop-in video consultations on the second Tuesday of each month from 2-3pm. Advance registration is possible but not required.

Click here to join on Tuesdays (the third Tuesday of the month), 2-3pm. If another person is being assisted, you'll be kept in the "waiting room" until the librarian is available. Or email Stephen for an alternative meeting time if Tuesday afternoons do not work.

Librarian

Stephen Klein
Contact:
212 817 7074

CUNY Links

CUNY Computing and Information Services: Security Policies & Procedures

CUNY Computing and Information Services: Endpoint Encryption Best Practices

CUNY Academic Commons: Data Management Tools (Note: Some of the tools listed on this page may not be appropriate for data management plans or long-term data management.)

Office of Institutional Research and Assessment (OIRA)'s Guiding Questions and General Tips for Working with Data for Program Reviews handout.

All CUNY faculty members, postdoctoral scholars, graduate and undergraduate students involved in human subjects research as key personnel must complete the applicable Basic Course (e.g. HSR for Social & Behavioral Faculty, Graduate Students, & Postdoctoral Fellows) in the protection of human subjects prior to Institutional Review Board (IRB) approval of their protocol.  More info here.

Rebecca Banchik is the GC's Director of the Human Research Protection Program.

Adrienne Klein is the GC's Director of Special Projects and Research Integrity Officer for Research and Sponsored Programs.

Essentials to a Data Management Plan (DMP)

Data Management Plans include information on:

DataONE's "Primer on Data Management: What you always wanted to know" is a comprehensive guide that helps users become familiar with the most relevant steps in the data lifecycle.

What counts as data?

Observational: data captured in real time, usually irreplaceable (e.g., sensor data, telemetry, survey data, sample data, neuroimages)

Integrated and transformed: data drawn from different sources and transformed so that the disparate data are compatible (document provenance, workflows, and changes)

Experimental: data from lab equipment, often reproducible, but can be expensive (e.g., gene sequences, chromatograms, toroid magnetic field data)

Simulation: data generated from test models where model and metadata (inputs) are more important than output data (e.g., climate models, economic models)

Derived or compiled: data that is reproducible, but very expensive to reproduce (e.g., text and data mining, compiled databases, 3D models, data gathered from public documents)

DMPTool

DMPTool Blog | Guidance & resources for your data management plan

The DMPTool is an online tool that includes data management plan templates for many of the large funding agencies that require such plans. The tool includes general guidance, links to helpful documentation, issues to consider, and specific questions to think about as you prepare your data management plan. Space is provided to compose a response for each of the main areas that your funding agency would like for you to address in your plan. You can save and come back to your plan as often as you like. When you are finished, you can export your plan in plain text format and insert it into your grant proposal.

Evaluate your data needs

 

MIT's Data Management Overview

Data description

  1. What type of data will be produced? Will it be reproducible? What would happen if it got lost or became unusable later?
  2. How much data will there be? How quickly will it grow? How often will it change? Once archived/stored, what kind of access will be needed to use it?
  3. Who will use the data now, and in the future?
  4. Who controls the data (PI, student, lab, CUNY, funding agency)? What intellectual property considerations might apply?
  5. How long should the data be retained? How long would you expect it to be useful, e.g. through the end of grant/experiment, 3-5 years, 10-20 years, permanently?
  6. What is Data?
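
The volume and growth questions above can be turned into a rough storage estimate. The sketch below is only an illustration: the function name, the linear growth model, and the default of three copies (one working copy plus two backups) are assumptions, not requirements from this guide or from any funder.

```python
def projected_storage_gb(initial_gb, monthly_growth_gb, months, copies=3):
    """Estimate total storage after `months`, counting every copy kept.

    Assumes linear growth and that each copy (working copy and backups)
    holds the full dataset. Both are simplifying assumptions.
    """
    return (initial_gb + monthly_growth_gb * months) * copies


# Hypothetical project: starts at 50 GB, grows 10 GB per month, retained 3 years.
print(projected_storage_gb(initial_gb=50, monthly_growth_gb=10, months=36))
```

Replace the example figures with your own project's numbers before quoting an estimate in a data management plan.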

Standards

  1. Is there good project and data documentation?
  2. What directory and file naming conventions will be used?
  3. What project and data identifiers will be assigned?
  4. What file formats are used? Are they standards-based or proprietary?
  5. Are there tools or software needed to create/process/visualize the data? Are the tools or software proprietary?
  6. Is there an ontology or other community standard for data sharing/integration?
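
One way to answer the file naming question above is to generate names from a single rule rather than typing them by hand. The sketch below assumes a hypothetical convention, project_YYYYMMDD_description_vNN.ext; the pattern and the function name are illustrative choices, not a standard prescribed by this guide.

```python
import re
from datetime import date


def build_filename(project, collected, description, version, extension):
    """Assemble a name of the form project_YYYYMMDD_description_vNN.ext."""

    def slug(text):
        # Lowercase the text and collapse anything that is not a letter
        # or digit into single hyphens, trimming hyphens at the ends.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{collected:%Y%m%d}_{slug(description)}"
        f"_v{version:02d}.{extension.lower().lstrip('.')}"
    )


# Hypothetical example values.
print(build_filename("Soil Survey", date(2024, 3, 5), "pH readings (raw)", 1, "CSV"))
# prints soil-survey_20240305_ph-readings-raw_v01.csv
```

Putting the date in YYYYMMDD order and zero-padding the version number means files sort chronologically and by version in an ordinary directory listing.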

Access, Sharing, and Re-use

  1. Any special privacy or security requirements? e.g., personal data, high-security data
  2. Any sharing requirements? e.g., funder data sharing policy
  3. Any other funder requirements? e.g., data management plan in grant proposals
  4. What is your storage and backup strategy?
  5. When will it be shared and where? How broadly will it be shared? Are there I/O throughput issues with respect to the size of the datasets?
  6. Who in the research group will be responsible for data management?
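
As part of a storage and backup strategy, it can help to confirm that a backup copy still matches the original. The sketch below is one possible approach, assuming files are compared by SHA-256 checksum; the function names and the directory paths in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(original, backup):
    """List files that are missing from the backup or whose checksum differs."""
    return sorted(
        name for name, checksum in original.items()
        if backup.get(name) != checksum
    )
```

For example, changed_files(build_manifest("data"), build_manifest("/backup/data")) would return the relative paths that need to be copied again. Both paths here are placeholders.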