Dissertations & Theses

Depositing Digital Work

Please consult the information below if you are submitting work beyond a PDF. Note that additional preliminary pages may be required in your project narrative PDF. See the library's guidelines for documenting digital projects for deposit.

Websites

If you are submitting a website, please fill out the digital component submission form so the library can archive it. Because there is no single authoritative method of archiving and preserving a website, the library preserves websites in the following ways:

  1. Crawling your site with Archive-It, which adds your site to the Internet Archive. Please read the "Best Practices for Developing Your Website" section below. 
  2. Asking you to submit a Web Archive (WARC) file created using WebRecorder. Please review how to use WebRecorder to create a WARC file.
  3. If preserving your website's interactivity is not important to you, you can instead use a tool such as HTTrack to capture a static version of the site and then upload a zip of the capture. Note: Please test your capture before uploading it. 
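One way to test a static capture before uploading it is to look for links that point at files missing from the capture. The sketch below is illustrative only and assumes the capture (for example, from HTTrack) sits in a local folder; the folder name `my-site-capture` is a hypothetical placeholder, and the script uses only Python's standard library, not any HTTrack feature.

```python
# Minimal sketch: report links in a static site capture (e.g. from HTTrack)
# that point to local files missing from the capture folder.
# "my-site-capture" is a hypothetical folder name; replace it with your own.
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collects every href and src attribute value in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_local_links(capture_dir):
    """Return (page, link) pairs whose local target does not exist on disk."""
    broken = []
    for root, _dirs, files in os.walk(capture_dir):
        for filename in files:
            if not filename.lower().endswith((".html", ".htm")):
                continue
            page = os.path.join(root, filename)
            collector = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            collector.close()
            for link in collector.links:
                parts = urlsplit(link)
                # Skip external URLs, mailto: links, and fragment-only links.
                if parts.scheme or parts.netloc or not parts.path:
                    continue
                # Root-relative paths (e.g. /css/site.css) will not resolve
                # when the capture is opened from disk, so flag them too.
                if parts.path.startswith("/"):
                    broken.append((page, link))
                    continue
                target = os.path.join(root, unquote(parts.path))
                if not os.path.exists(target):
                    broken.append((page, link))
    return broken


if __name__ == "__main__":
    for page, link in find_broken_local_links("my-site-capture"):
        print(f"{page}: missing {link}")
```

This only checks that linked files exist; it cannot confirm that pages look or behave correctly, so opening the capture in a browser is still worthwhile.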

Supplemental Files

If you are submitting supplemental files (still images, etc.) independent of a website (as opposed to embedded on a website), please consult the list of recommended formats for archiving. When you deposit to Academic Works, you will have an opportunity to upload supplemental files. If you are submitting numerous supplemental files, please combine them into one .zip or .tar file.
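If you have many supplemental files, a short script can bundle them and confirm the archive is readable before you upload it. This is a minimal sketch using Python's standard `shutil.make_archive` and `zipfile`; the folder and archive names are hypothetical.

```python
# Minimal sketch: combine a folder of supplemental files into a single .zip
# and check that the archive reads back cleanly before uploading it.
# Folder and archive names in the usage note are hypothetical.
import shutil
import zipfile
from pathlib import Path


def bundle_supplemental_files(source_dir, archive_base):
    """Zip everything under source_dir into <archive_base>.zip; return its path."""
    archive_path = Path(shutil.make_archive(archive_base, "zip", root_dir=source_dir))
    # testzip() reads every member and returns the first corrupt name, or None.
    with zipfile.ZipFile(archive_path) as zf:
        bad_member = zf.testzip()
    if bad_member is not None:
        raise RuntimeError(f"corrupt member in {archive_path}: {bad_member}")
    return archive_path
```

For example, `bundle_supplemental_files("supplemental_files", "supplemental_files_bundle")` writes `supplemental_files_bundle.zip`; keep the archive outside the folder being zipped. For a .tar file instead, `shutil.make_archive(archive_base, "tar", root_dir=source_dir)` works the same way (without the zip check).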

If audio, video, or other large media files make up all or part of your submission, the preferred hosting solution is to create an account on the Internet Archive (archive.org) and upload the content there (please review the list of recommended formats for archiving). After uploading, obtain the permanent URL for each file: click the 'Show All' link, right-click the preferred format (Control-click on a Mac), select 'Copy link address', and then paste the link into your website as an embedded link. More information on permanent URLs in archive.org can be found here. We strongly prefer this process to hosting submissions on YouTube, Vimeo, or SoundCloud.
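If you prefer to assemble the embed code yourself, the sketch below builds a file link and an HTML5 player tag. It assumes the permanent file URL follows the `https://archive.org/download/<identifier>/<filename>` pattern; the identifier and filename shown are hypothetical, and the URL you copy from the 'Show All' page remains the authoritative one.

```python
# Minimal sketch: build an archive.org file URL and an HTML5 player tag for it.
# Assumes the https://archive.org/download/<identifier>/<filename> pattern;
# the identifier and filename in the usage note are hypothetical.
from html import escape
from urllib.parse import quote


def archive_download_url(identifier, filename):
    """Return the download URL for one file inside an archive.org item."""
    return f"https://archive.org/download/{quote(identifier)}/{quote(filename)}"


def embed_tag(url, media="video"):
    """Return an HTML <video> or <audio> element that plays the given URL."""
    if media not in ("video", "audio"):
        raise ValueError("media must be 'video' or 'audio'")
    return f'<{media} controls src="{escape(url, quote=True)}"></{media}>'
```

For example, `embed_tag(archive_download_url("my-thesis-interviews", "interview 1.mp3"), media="audio")` returns `<audio controls src="https://archive.org/download/my-thesis-interviews/interview%201.mp3"></audio>`; note that spaces in filenames are percent-encoded.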


Applications Independent of Websites

If you are submitting an application (desktop or mobile) rather than a website, you are welcome to provide a link to your GitHub repository, but please also include:

  • A zip of all your source code. Note: Some people run file format identification before preparing files for preservation. Although identification is not required, here is some information regarding format identification.  
  • A zip of the backend database (if the application is database-driven).
  • A rudimentary README file explaining the software requirements (if relevant, e.g., OS, Apache, MySQL, PHP, Python, and their versions), so the project can be reproduced.
  • A screencast showing how the application works. Note: This is more difficult on phones. See:
      • Windows
      • iPhone
      • Android
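A README does not need to be elaborate. The sketch below drafts one under stated assumptions: it detects only the operating system and the Python version of the machine running it, while everything else (web server, database, versions) is entered by hand. The project name and version numbers in the example call are hypothetical.

```python
# Minimal sketch: draft a rudimentary README that records software requirements.
# Only the OS and the Python version running this script are detected;
# everything else (web server, database, versions) is listed by hand.
# The project name and versions in the example call are hypothetical.
import platform


def draft_readme(project_name, requirements):
    """Return README text; `requirements` maps software names to versions."""
    lines = [
        project_name,
        "=" * len(project_name),
        "",
        "Software requirements",
        "---------------------",
        f"- Developed on: {platform.system()} {platform.release()}",
        # Records the Python version running this script; delete this line
        # if the project itself does not use Python.
        f"- Python: {platform.python_version()}",
    ]
    lines += [f"- {name}: {version}" for name, version in requirements.items()]
    lines += [
        "",
        "How to run",
        "----------",
        "Describe the steps needed to install and start the project.",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(draft_readme("My Dissertation App", {"MySQL": "8.0", "Apache": "2.4"}))
```

Edit the generated text by hand afterward; the "How to run" steps matter most for anyone trying to reproduce the project.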

Please upload all of these files when you deposit to Academic Works.

Note: Please contact the library at libraryweb@gc.cuny.edu if you have any questions or need help using WebRecorder.

Best Practices for Developing Your Website

The Archive-It crawler has limited functionality: most interactive elements, such as search and sophisticated JavaScript (including most embedded media and timelines), do not archive. For the best archival results, we suggest the following practices:

  • If possible, please contact the library via libraryweb@gc.cuny.edu early in the project’s development.

  • Build the site with a clear architecture in which each page has its own unique URL.

  • Whenever possible, host content locally rather than pointing to third-party sites. This includes video, audio, and code (scripts, CSS, etc.).

  • If you are unable to host media locally, please host it on archive.org (see the Supplemental Files section above).

  • Please delete or modify your site's robots.txt file to allow crawling. You can develop and test the file using Google's robots.txt tester.

  • Note: Websites with nested JavaScript generally do not archive well.

  • Note: Audio and video streamed over the Real-time Transport Protocol (RTP) do not archive well.

  • If you embed streaming video, please embed only YouTube videos, and include each video only once on the entire site; otherwise the crawler will not capture it. Vimeo embeds are crawled, but only one Vimeo video can be embedded per page.

  • ArcGIS and StoryMaps do not archive.

  • Scalar sites do not archive.

  • Do not use Flash.

  • Search is not captured. Note: If archiving searches is important, you can collect the URLs of what you expect to be the most popular search result pages and link to them from a page (or pages) on your site; the crawler might then be able to capture those searches. Test these search result URLs in a different browser. 

  • Interactivity is often not preserved in the Internet Archive, so you may not want it to be the primary focus of your website. Note: If interactivity is important, consider building a static rather than dynamic site, recording a screen capture video of the interactive features, and posting that video on the site. However, do not let the Archive-It crawler's limited functionality constrain you, because WebRecorder may be able to capture interactivity that Archive-It cannot.

  • When you deposit your project and graduate, the library will ask you to stop working on your project so it can capture a snapshot of the project as of the day you deposited the written component of your work. The library will let you know once the capture is complete, after which you can continue working on your digital project if you wish.
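To find content that should be hosted locally, you can list the resources a page loads from other hosts. The sketch below is an illustration, not a complete audit: it inspects only the tags listed, cannot see content that scripts fetch at runtime, and the page URL in the example is hypothetical.

```python
# Minimal sketch: list resources an HTML page loads from other hosts, as
# candidates for hosting locally. Only the tags below are checked; scripts
# that fetch content at runtime will not be detected. Hostnames are compared
# exactly, so www.example.org and example.org count as different hosts.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Which attribute holds the resource URL for each tag we inspect.
RESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "iframe": "src",
    "link": "href",  # stylesheets, icons, fonts, etc.
}


class ResourceCollector(HTMLParser):
    """Collects resource URLs from the tags listed in RESOURCE_ATTRS."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.refs.append(value)


def third_party_resources(html_text, page_url):
    """Return absolute URLs of resources not hosted on page_url's host."""
    site_host = urlsplit(page_url).hostname
    collector = ResourceCollector()
    collector.feed(html_text)
    collector.close()
    external = []
    for ref in collector.refs:
        absolute = urljoin(page_url, ref)  # resolve relative paths
        host = urlsplit(absolute).hostname
        if host and host != site_host:
            external.append(absolute)
    return external
```

Pass in the HTML of each page (for example, saved from your browser) together with that page's address; anything returned is a candidate for moving onto your own server or to archive.org.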
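Alongside an online robots.txt tester, Python's standard `urllib.robotparser` module can check locally whether a set of robots.txt rules blocks a crawler from particular pages. In this sketch the rules and URLs are hypothetical, and the user-agent string `archive.org_bot` is an assumption about how the Archive-It crawler identifies itself; rules under `User-agent: *` apply to any crawler not named explicitly.

```python
# Minimal sketch: check which pages a set of robots.txt rules would block.
# The rules and URLs are hypothetical; "archive.org_bot" is an assumed
# user-agent string for the Archive-It crawler.
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="archive.org_bot"):
    """Return the URLs from `urls` that the given rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


RULES = """\
User-agent: *
Disallow: /drafts/
"""

PAGES = [
    "https://example.org/",
    "https://example.org/chapter-1/",
    "https://example.org/drafts/notes.html",
]

if __name__ == "__main__":
    for url in blocked_urls(RULES, PAGES):
        print("blocked:", url)
```

Any page this reports as blocked will likely be skipped by a crawler that obeys robots.txt, so adjust or remove the matching `Disallow` line before the library crawls your site.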
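To give the crawler a path to popular searches, you can generate a simple page of links to search result URLs and link to it from your site. This sketch assumes your search results live at a URL such as `https://example.org/search?q=term`; the search path and the `q` parameter are hypothetical, so copy the real pattern from your browser's address bar after running a search.

```python
# Minimal sketch: generate a plain HTML page linking to search-result URLs so
# a crawler can discover them. The search path and the "q" parameter are
# hypothetical; copy the real pattern from your site's address bar.
from html import escape
from urllib.parse import urlencode


def search_links_page(search_base, queries, param="q"):
    """Return an HTML page with one link per query, e.g. /search?q=term."""
    items = []
    for query in queries:
        url = search_base + "?" + urlencode({param: query})
        items.append(f'  <li><a href="{escape(url)}">{escape(query)}</a></li>')
    return "\n".join([
        "<!DOCTYPE html>",
        "<html><head><title>Popular searches</title></head><body>",
        "<h1>Popular searches</h1>",
        "<ul>",
        *items,
        "</ul>",
        "</body></html>",
    ])
```

For example, `search_links_page("https://example.org/search", ["labor history", "1968"])` produces links such as `https://example.org/search?q=labor+history`. As noted above, capture of these pages is not guaranteed.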

More reading:

Stanford Libraries Best Practices

Columbia University's Best Practices

LOC Guide

Smithsonian Guide

5 Tips for Creating Preservable Websites