Digital Preservation: Archiving Common Web Platforms and Software

Archiving Websites and Software Projects

Note: If your software is unique (not WordPress, Omeka, etc.), create a .zip or .tar file containing the source code and a readme file that explains the software requirements (e.g. OS, Apache, MySQL, PHP, and Python versions) and gives build instructions. This .zip or .tar file can supplement your thesis or dissertation deposit.
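As a minimal sketch of the note above, the following Python function packages a project directory into a .tar file. The function name `bundle_source` and the check for a top-level readme are illustrative assumptions, not part of any deposit requirement.

```python
import tarfile
from pathlib import Path


def bundle_source(project_dir, archive_path):
    """Write a project directory (source code plus readme) into a .tar file."""
    project = Path(project_dir)
    # Assumption: the readme sits at the top level of the project directory.
    if not any(p.name.lower().startswith("readme") for p in project.iterdir()):
        raise FileNotFoundError("Add a readme file before archiving.")
    # Mode "w" writes a plain, uncompressed tar archive.
    with tarfile.open(archive_path, "w") as tar:
        # arcname stores paths relative to the project directory name.
        tar.add(project, arcname=project.name)
    return Path(archive_path)
```

For example, `bundle_source("myproject", "myproject.tar")` would produce an archive whose entries all sit under `myproject/`. Write the archive outside the project directory so that it does not include itself.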

If you want to archive your website more fully than a WARC file alone allows, for better reproducibility, the lists below name the files and directories we suggest archiving for the most common platforms.
 

WordPress:

  • Download the MySQL database
  • Download the /wordpress/wp-content/themes/ directory
  • Download the /wordpress/wp-content/uploads/ directory
  • Make a list of all plugins (with version numbers and hyperlinks) and include it in the documentation
  • Include the MySQL, PHP, and WordPress versions in the documentation (for WordPress versions, see https://wordpress.org/about/history/)
  • Zip the database, the themes and uploads directories, and the documentation
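The final zipping step can be sketched in Python roughly as follows. This assumes the database has already been exported to a dump file and the directories have already been downloaded; `bundle_site` and all file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_site(zip_path, database_dump, directories, documentation):
    """Zip a database dump, content directories, and a documentation file."""
    # ZIP_STORED (the default) packages files without compressing them.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(database_dump, arcname=Path(database_dump).name)
        zf.write(documentation, arcname=Path(documentation).name)
        for directory in map(Path, directories):
            for path in sorted(directory.rglob("*")):
                if path.is_file():
                    # Store each file under its directory name, e.g. themes/...
                    inner = Path(directory.name) / path.relative_to(directory)
                    zf.write(path, arcname=inner.as_posix())
    return Path(zip_path)
```

A call might look like `bundle_site("site-archive.zip", "database.sql", ["themes", "uploads"], "documentation.txt")`. The same function would apply to the other platforms below by passing their directories instead.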

Omeka:

  • Download the MySQL database
  • Download the /omeka/themes/ directory
  • Download the /omeka/files/ directory
  • Make a list of all plugins (with version numbers and hyperlinks) and include it in the documentation
  • Include the MySQL, PHP, and Omeka versions in the documentation (for Omeka versions, see https://github.com/omeka/Omeka/releases)
  • Zip the database, the themes and files directories, and the documentation

 

Scalar:

  • Download the MySQL database
  • Download the /scalar/yourdirectory/media directory
  • Include the MySQL, PHP, and Scalar versions in the documentation
  • Zip the database, the media directory, and the documentation

 

HTML:

  • Zip all HTML, CSS, and media files (jpg, wav, mp4, etc.)
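One way to gather those files is to walk the site directory and keep only files with known extensions. The sketch below is illustrative: the `SITE_EXTENSIONS` set is an assumed starting list, and `zip_static_site` is a hypothetical name.

```python
import zipfile
from pathlib import Path

# Assumed extension list; extend it to match the files your site actually uses.
SITE_EXTENSIONS = {".html", ".htm", ".css", ".jpg", ".jpeg", ".png",
                   ".gif", ".wav", ".mp3", ".mp4"}


def zip_static_site(site_dir, zip_path, extensions=SITE_EXTENSIONS):
    """Zip every file under site_dir whose extension is in `extensions`."""
    site = Path(site_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in sorted(site.rglob("*")):
            # Compare extensions case-insensitively so clip.MP4 is included.
            if path.is_file() and path.suffix.lower() in extensions:
                name = path.relative_to(site).as_posix()
                zf.write(path, arcname=name)
                archived.append(name)
    # Return the archived paths so skipped files are easy to spot.
    return archived
```

Comparing the returned list against the site directory is a quick check that nothing was skipped because of an unlisted extension, such as JavaScript or font files.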


Mobile (tablets and phones):

  • Create a screencast or recording showing how the application works to use as a supplemental file. See:
  • Windows
  • iPhone
  • Android

 

Software Standards:

Successful archiving of software and data depends on good data management practices. Here are a few things to remember:

  • Every project should have its own root directory

  • Separate folders for data, images, and scripts

  • Comment the code. Include a readme with a file manifest (a simple listing of files and directories) and a codebook, as well as installation instructions with requirements, dependencies, operating instructions, copyright and licensing info, contact info, known bugs, troubleshooting, acknowledgments, and news.

  • Include a Codebook  with the following info: 

  • Copyright considerations: If hosting or distributing, ensure that you own the work or have permission (e.g. through a Creative Commons license) to use others' work.

  • Do not encrypt or compress files.
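The file manifest mentioned in the list above can be generated rather than typed by hand. The following is a minimal sketch, assuming a plain-text manifest at the project root; the name `MANIFEST.txt` and the function `write_manifest` are hypothetical choices.

```python
from pathlib import Path


def write_manifest(root_dir, manifest_name="MANIFEST.txt"):
    """Write a simple listing of files and directories under root_dir."""
    root = Path(root_dir)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.name == manifest_name:
            continue  # skip any file named like the manifest itself
        rel = path.relative_to(root).as_posix()
        # Mark directories with a trailing slash to tell them from files.
        lines.append(rel + "/" if path.is_dir() else rel)
    (root / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Running `write_manifest("myproject")` writes one relative path per line, which can then be pasted into or referenced from the readme.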