ISPS strives to provide members of the scholarly community with access to files associated with scholarly studies for the purpose of replication (hereafter, Replication Files), for all studies conducted by ISPS-affiliated researchers. Access to these files allows members of the scholarly community to validate the existence of a specific set of data, to gain access to a specific set of data (when permitted), to replicate analyses, and to view additional materials associated with a given study, including high-quality metadata.
Access to the ISPS Data Archive is provided at no cost and is granted for scholarship and research purposes only.
ISPS is committed to preserving your privacy when you use this website, whether you are a depositor or an end-user. Any information we collect in the course of your use of the ISPS website is used solely for purposes of the functioning of the ISPS Data Archive.
ISPS operates in accordance with the prevailing standards and practices of the digital preservation community including the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003) and the Data Documentation Initiative (DDI) standard. Accordingly, ISPS supports digital life-cycle management, interoperability, and preferred methods of preservation. Within ISPS, the Director, the Associate Director for Research, and the Data Archive staff contribute to the management of digital content at ISPS.
The majority of digital content in the ISPS Data Archive currently consists of social science research data from experiments, program files with the code for analyzing these data, requisite documentation to use and understand the data, and associated files. For ease of use, files in the ISPS Data Archive are organized by study. For example, all the relevant files for Gerber, Green, and Larimer’s (2008) APSR article, “Social Pressure and Voter Turnout: Evidence from a Large Scale Field Experiment” are grouped together and include datasets (Stata and Excel CSV), program files (Stata and R), output files (Stata and R), treatment materials (PDF), codebook files (XML), and metadata record (PDF).
ISPS wishes to acknowledge the considerable support from the Office of Digital Assets and Infrastructure at Yale, especially for providing consultation and guidance throughout the development of the ISPS Data Archive and for supporting the technical infrastructure through the Yale Digital Commons services.
Publicly available digital content in the ISPS Data Archive is licensed under a Creative Commons Attribution-Non Commercial-No Derivatives 3.0 United States Public License 2010, Yale University, New Haven, Connecticut, United States. Exclusions:
- Restricted-access files held in the ISPS Data Archive are excluded from this license.
- Third party content that is used, with permission, by ISPS, is excluded from this license.
The ISPS Data Archive respects the intellectual property rights and other proprietary rights of others. The ISPS Data Archive may, in appropriate circumstances and at its discretion, remove certain files or disable access to files that appear to infringe the copyright or other intellectual property rights of others. Notice to Copyright Holders: The creators of the ISPS website have made every effort to secure permission to use the works of others on the Research section of the ISPS website. Any use of others’ works on this site is the result of either explicit permission from the copyright owner, a good faith belief (following investigation) that the work is in the public domain, or a fair use for purposes of research and scholarship under copyright laws. See 17 U.S. Code §107. Our goal is to make information and resources available to the community; we have no intent to offend anyone’s ownership rights in intellectual property. If you are a copyright claimant with regard to any work on this site, and you object to our use of it, please contact Limor Peer, Associate Director for Research at the Institution for Social and Policy Studies.
Q: Why are some files inaccessible? A: Most files in the ISPS Data Archive are public-use files with no restrictions on their access. Replication Files are made available when (1) the authors have allowed publication, (2) the files contain no confidential or identifying information, and (3) there are no additional restrictions by funders or other entities. Researchers may request access to restricted files by contacting ISPS.
Q: What software do I need to view data and replicate analyses? A: Datasets are available in ASCII format and program files are available in R for universal, non-proprietary access. We also identify software and version for each file, including:
- Stata (10.0)
- Microsoft Office Excel Worksheet and 97-2003 Worksheet
- R (2.9.1) — download packages (see more on R)
- Adobe Acrobat (8.1)
- XML (1.1)
Q: Are all browsers equally suited to view data files? A: Generally, yes; but note that Internet Explorer is proven to be best suited for opening .csv files in the correct format.
Q: Why can’t I see the title of some publication or data records? A: In Internet Explorer, you may need to clear your cache in order for records to display properly.
Q: Does the ISPS Data Archive keep the original version of the files that I submit for archiving? A: Yes. The Archive keeps all original files offsite, as submitted by the data contributor.
Q: I understand that in the process of preparing files for distribution on the ISPS website, some changes may be made to the Replication Files. Will I be notified about these changes? A: Yes. In most cases, the distributed Replication Files are essentially identical to the original files deposited. When appropriate, the ISPS Data Archive team converts files to other formats such as ASCII, R, and Portable Document Format (PDF), completes variable and label information, and recodes variables to ensure respondents’ anonymity. Our staff will generally contact you regarding any suggested changes after an initial assessment of your data collection. Regardless of changes made, the Archive also keeps copies of all files in the form in which they were submitted.
Q: How does the ISPS Data Archive prepare a data collection for public release? A: The data archiving process at ISPS consists of a series of steps. Upon receipt of a deposit, the ISPS Data Archive team processes the submitted files to ensure that confidential information has not been included in the data, fill gaps in documentation, prepare a record of preservation actions over time, and produce distribution versions of the files, which are disseminated via the ISPS website. Specifically, the ISPS Data Archive team will (a) create copies of the files and standardize file names according to ISPS Data Archive naming conventions, (b) clean data files, including adding or clarifying variable and value labels and ensuring that confidential information is not included in the data, (c) confirm replication of published results, (d) convert data files to ASCII when needed, (e) convert program files to R when needed, (f) create a codebook with variable and value labels in XML format, (g) create a metadata file describing the study and associated files, and (h) invite you to review the files. See more in the How to Archive section.
Q: Why archive? A: See Why Archive?
Q: Who can archive with the ISPS Data Archive? A: See Who can Archive?
Q: What to archive in the ISPS Data Archive? A: See What to Archive?
Open access to data has recently been on the agenda in the scientific and research community. For example, Science Commons, the Berlin Declaration, the A2K movement, and the OECD Principles and Guidelines for Access to Research Data From Public Funding all indicate that data should be shared as openly as possible.
General benefits of archiving and disseminating data with the ISPS Data Archive:
- Enable new discoveries and encourage open scientific inquiry by making data available for use by others.
- Promote new research and allow for the testing of new or alternative methods.
- Preserve valuable data for the long term.
- Satisfy funder or institutional requirements for data sharing and retention.
- Enhance the competitiveness of grant proposals and impact of research by sharing data.
- Enable researchers to demonstrate continued use of the data after the original research is completed, which can influence funding agencies to provide further research money.
- Reduce costs by avoiding duplicate data collection efforts.
- Provide an important resource for training in research and teaching.
- Allow investigators and data owners to avoid the administrative tasks associated with external users and their queries.
- Replication Files are linked to a detailed description of the author’s study on the ISPS website.
- ISPS Data Archive staff prepares data and documentation files for dissemination in user-friendly formats and updates these formats as appropriate.
- ISPS Data Archive staff maintains permanent backups of the digital content of the Replication Files.
- ISPS Data Archive staff reviews files to determine whether any issues of confidentiality exist.
- ISPS Data Archive staff further reviews program files and creates identical files in R.
- ISPS Data Archive staff prepares metadata records, including searchable fields, to assist in locating Replication Files within the ISPS Data Archive.
- ISPS publicly announces the availability of data on the ISPS website and elsewhere.
The ISPS Data Archive is intended for use by social science researchers, policy-makers, and practitioners who are conducting or analyzing field (and other) experiments in various social science disciplines. Currently, Replication Files originate with ISPS-affiliated scholars. For inquiries please contact ISPS.
ISPS-affiliated authors and PIs are expected to provide raw data and other information related to ISPS-supported research (e.g., instructions, treatment manuals, questionnaires, software, and details of procedures). Deposits should include all data and documentation necessary to independently read and interpret the data collection. To use the ISPS Data Archive, authors and PIs are *required* to deposit the following types of files:
- Data File(s)
- Program File(s)
- Publication Citation
- Link to publication
Other types of files are *encouraged* but not required:
- Output File(s)
- Study metadata
- Treatment Materials
- Supplementary Materials
Datasets are accorded a high priority for inclusion in the ISPS Data Archive when:
- The data are not available anywhere else, or are not likely to be available elsewhere in the future.
- The data are in the public domain.
- Copyright is clear.
- Copyright owners agree to ISPS Data Archive dissemination policies.
- The dataset adheres to standards for privacy and confidentiality.
- The technical documentation is complete.
- The data are in a format that facilitates ease of use.
Key instructions for preparing data and documentation are in the How to Archive section. For a discussion of best practice in preparing data for sharing, please refer to the Best Practices and Tips section; or contact us directly.
The original deposit includes the original files, the submitted Deposit Agreement Form <link>, and the submitted Study-Level and File-Level Metadata Form <link> received from depositors. Upon receipt of a deposit, the ISPS Data Archive team processes the submitted files to ensure that confidential information has not been included in the data, fill gaps in documentation, prepare a record of preservation actions over time, and produce distribution versions of the files, which are disseminated via the ISPS website. To prepare a data file for public access, authors and PIs should remove personal identifiers, that is, variables that allow direct or indirect identification of individuals. These include:
- Addresses, including ZIP codes
- Telephone numbers, including area codes
- Social Security numbers
- Other linkable numbers such as driver license numbers, certification numbers, etc.
- Detailed geographic information (e.g., state, county, or census tract of residence)
- Organizations (to which the respondent belongs)
- Educational institutions (from which the respondent graduated and year of graduation)
- Exact occupations
- Place where respondent grew up
- Exact dates of events (birth, death, marriage, divorce)
- Detailed income
- Offices or posts held by respondent
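As a rough illustration of removing direct identifiers, the following Python sketch drops identifier columns from a comma-delimited file. The column names and sample rows are hypothetical, not ISPS variables, and the function name is made up for this example. Dropping columns addresses only direct identifiers; indirect identifiers such as detailed geography or exact dates may need recoding or review instead.

```python
import csv
import io

# Hypothetical column names; replace with the identifier variables in your own data.
DIRECT_IDENTIFIERS = {"address", "zip", "phone", "ssn"}

def strip_identifiers(csv_text, identifiers=DIRECT_IDENTIFIERS):
    """Return CSV text with the listed identifier columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name.lower() not in identifiers]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns that are not in `kept`.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Made-up sample rows: a study-assigned id, two identifiers, and two analysis variables.
raw = (
    "id,address,zip,treatment,voted\n"
    "1,12 Elm St,06520,1,1\n"
    "2,98 Oak Ave,06511,0,0\n"
)
public = strip_identifiers(raw)
print(public)
```

The original file is left untouched, which fits the practice of depositing the full data file (restricted if necessary) alongside the public version.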
To archive with the ISPS Data Archive, please follow these steps:
- Contact Limor Peer, Associate Director for Research, for instructions on how to transfer files to the ISPS Data Archive (limor.peer(at)yale(dot)edu).
- Fill out the Study-Level and File-Level Metadata Form <link>.
- The ISPS Data Archive team will (a) create copies of the files and standardize file names according to ISPS Data Archive naming conventions, (b) clean data files, including adding or clarifying variable and value labels and ensuring that confidential information is not included in the data, (c) confirm replication of published results, (d) convert data files to ASCII when needed, (e) convert program files to R when needed, (f) create a codebook with variable and value labels in XML format, (g) create a metadata file describing the study and associated files, and (h) invite you to review the files.
- Sign the ISPS Data Deposit Agreement Form <link>.
The ISPS Data Archive team creates a metadata record describing the Replication Files as a whole as well as the individual files. Metadata is the documentation required to describe and understand data and other files; it allows general use, cross-collection discovery, and interoperability. ISPS metadata records conform to the Data Documentation Initiative (DDI) requirements and include a minimal set of the internationally used Dublin Core metadata elements.
Metadata collected at the individual file level:
- Data file number: ISPS file ID
- Description: A summary of the content of the file
- File size
- File format
- Access: Whether users may download the file
The ISPS Data Archive currently allows only two levels of access: Public and Private. Public access means that any user can download a given file. Private access means that no user (including the data owner, author, or PI) can access a given file. Responsibility regarding the restriction of access to sensitive data ultimately resides with the data owner. Authors and PIs should notify the ISPS Data Archive team if a data file contains confidential information or otherwise requires restricted access <link to deposit agreement>. The ISPS Data Archive team will then take steps to archive that data file with restricted access.
By default, newly created metadata records in the ISPS Data Archive are publicly viewable, while the data files themselves are either public or restricted, depending on author or PI request. Publicly accessible metadata records are discoverable by search engines and are linked to from other parts of the ISPS website. Public data in the ISPS Data Archive are available for download directly from the ISPS website, in accordance with any access constraints set by the data owner.
ISPS takes the confidentiality of individuals whose personal information may be part of archived data very seriously, and takes steps to protect confidential information, including:
- Instructions to authors and contributors regarding how to prepare a data file for public access.
- Rigorous review of all datasets to assess disclosure risk.
- Recoding of variables, if necessary, to protect respondents’ confidentiality.
- Limiting access to datasets where risk of disclosure remains high.
- Training of staff and consultation with data producers to reduce disclosure risk (the possibility that a data record in a study can be linked to a specific person thereby revealing information about that person that otherwise would not be known); see more resources at ICPSR website.
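Recoding, one of the steps listed above, usually means replacing a precise value with a coarser one. The Python sketch below shows the idea for an exact birth date and a detailed income. The variable names, income bands, and sample record are assumptions made for illustration; they are not ISPS rules, and the right level of coarsening depends on the study.

```python
from datetime import date

# Hypothetical income bands as (upper bound, label); choose bands suited to your study.
INCOME_BANDS = [
    (25000, "under 25,000"),
    (50000, "25,000-49,999"),
    (100000, "50,000-99,999"),
]

def income_band(income):
    """Map a detailed income to a broad band."""
    for upper_bound, label in INCOME_BANDS:
        if income < upper_bound:
            return label
    return "100,000 or more"

def recode(record):
    """Return a public copy of a record with exact values coarsened."""
    public = dict(record)  # copy, so the original record is left untouched
    public["birth_year"] = date.fromisoformat(public.pop("birth_date")).year
    public["income_band"] = income_band(public.pop("income"))
    return public

# Made-up record for illustration only.
original = {"id": 17, "birth_date": "1975-03-14", "income": 61250, "voted": 1}
print(recode(original))
```

Because `recode` works on a copy, the original variables remain available for a restricted-access version of the file.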
Please submit the complete and up-to-date data file(s) that you used to generate results in your paper. Please also include weights and constructed variables if applicable.
- ASCII format is preferable (system files created in older versions of statistical packages may have limited readability and usability in the future). This format maximizes the potential for use across different software packages, as well as prospects for long-term preservation.
- A comma delimited file is easy to create – use StatTransfer or similar software to convert to an Excel CSV file.
- If you have a dataset in Stata or similar, please include it as well.
- If you’re working in R with an R dataset, please also generate a comma delimited file.
File naming conventions: The contents of the file should be easily identified from its name. The data file name should identify the author, publication (e.g., name of journal), and year. For example: “Gerber_Green_Larimer_APSR_2008.dat”.
- If you have more than one data file (for example, if you have a data set for each experiment, or from various sources), please name each file clearly identifying either the number or type of experiment, geographic location, date, or data source in the name. For example “Gerber_Huber_APSR_2009_ExperimentA.dat”.
- Variable labels and value labels should clearly describe the information or question recorded in that variable (see more in “Codebook” below).
- When applicable, all identifying information should be removed from the records to ensure confidentiality.
- Please submit the relevant program file(s) that accompany the data file(s). Make sure you include all syntax that produces the tables and figures that appear in the published manuscript.
- R format is preferable.
- If you have a syntax file in another program (e.g., a Stata .do file), please forward that as well.
Naming conventions: The contents of the file should be easily identified from its name. If you have one program file, please name it to identify the author, publication (e.g., name of journal), and year. For example: “Gerber_Green_Larimer_APSR_2008.do”.
- Make sure this name corresponds to the data file.
- If you have more than one program file (for example, if you have separate .do files for each table), please name each file clearly using the main name (as above) and short identifier (e.g., “Gerber_Huber_APSR_2009_table1.do”).
- Please submit an output file showing the results of using the program and data files.
- Please also include summary statistics (frequency distributions, means, etc.) of all variables. Unweighted frequency distributions should show both valid and missing cases.
- .txt or .log formats are preferable.
- The codebook is critical to the interpretation of your data and output files. The codebook should provide information about each variable, including variable label and value label (see more below). Each factor variable in the data collection should have a set of exhaustive, mutually-exclusive, and clearly defined codes.
- .txt format is preferable; other formats are acceptable.
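The naming conventions and preferred formats described above for data, program, output, and codebook files can be checked mechanically before a deposit. The Python sketch below is one possible reading of the convention (author names, publication, year, an optional short identifier, then the extension). The regular expression and the function names are assumptions for illustration; ISPS staff apply their own naming conventions after deposit.

```python
import re

# Assumed reading of the convention: Author(s)_Publication_Year, an optional
# short identifier, then the extension. Illustrative only, not an ISPS rule.
NAME_PATTERN = re.compile(
    r"^(?P<stem>[A-Za-z]+(?:_[A-Za-z]+)+_\d{4})"  # authors, publication, year
    r"(?:_(?P<part>[A-Za-z0-9]+))?"                # optional identifier, e.g. table1
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)

# Preferred formats stated in the text for output and codebook files.
PREFERRED_EXTENSIONS = {"output": {"txt", "log"}, "codebook": {"txt"}}

def parse_name(filename):
    """Return the stem, identifier, and extension, or None if the name does not fit."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None

def corresponds(program_file, data_file):
    """True when both names parse and share the same Author_Publication_Year stem."""
    program, data = parse_name(program_file), parse_name(data_file)
    return bool(program and data and program["stem"] == data["stem"])

def is_preferred(kind, filename):
    """True when the file's extension is a preferred format for that kind of file."""
    parsed = parse_name(filename)
    return bool(parsed) and parsed["ext"].lower() in PREFERRED_EXTENSIONS[kind]

print(parse_name("Gerber_Huber_APSR_2009_table1.do"))
print(corresponds("Gerber_Green_Larimer_APSR_2008.do",
                  "Gerber_Green_Larimer_APSR_2008.dat"))
```

A name that fails to parse is not necessarily wrong; it only signals that a person should look at it.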
For each variable, the following information should be provided:
- Location in the data file. Ordinarily, the order of variables in the documentation will be the same as in the file; if not, the position of the variable within the file should be indicated.
- Variable name and label. For example, “g2004: Voted in the general elections of 2004.”
- The exact question wording or the exact meaning of the datum. Sources should be cited for questions drawn from previous surveys or published work. For example, “q2: political leaning (exact Q wording: “Do you lean more toward the Democratic or Republican party?” source: ANES)”
- Universe information, i.e., from whom the data were actually collected. If this is a survey, documentation should indicate exactly who was asked the question. If a filter or skip pattern indicates that data on the variable were not obtained for all observations, that information should appear together with other documentation for that variable.
- Value labels. A clear label to interpret each of the codes assigned to the variable. For example, “g2004: 1=yes, 0=no.”
- Missing data codes. Codes assigned to represent data that are missing. Different types of missing data should have distinct codes. For example: “g2004: 9=system missing.”
- Imputation and editing information. Documentation should identify data that have been estimated or extensively edited.
- Details on constructed and weight variables. Datasets often include variables constructed using other variables. Documentation should include “audit trails” for such variables, indicating exactly how they were constructed, what decisions were made about imputations, and the like. Ideally, documentation would include the exact programming statements used to construct such variables. Detailed information on the construction of weights should also be provided.
- Variable groupings. For large datasets, it is useful to categorize variables into conceptual groupings.
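To make the variable-level items above concrete, the Python sketch below represents one codebook entry as a small data structure, renders it as plain text, and uses it to tabulate an unweighted frequency distribution that reports valid and missing cases separately. The variable name, labels, and missing code come from the examples above; the class, its method names, and the list of observations are hypothetical and only illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CodebookEntry:
    name: str
    label: str
    value_labels: dict  # code -> label for valid values
    missing_codes: dict = field(default_factory=dict)  # code -> type of missing data

    def describe(self):
        """Render the entry as plain text, e.g. for a .txt codebook."""
        values = ", ".join(f"{code}={text}" for code, text in self.value_labels.items())
        missing = ", ".join(f"{code}={text}" for code, text in self.missing_codes.items())
        return f"{self.name}: {self.label}\n  values: {values}\n  missing: {missing}"

    def frequencies(self, observations):
        """Unweighted counts, reported separately for valid and missing codes."""
        counts = Counter(observations)
        valid = {text: counts.get(code, 0) for code, text in self.value_labels.items()}
        missing = {text: counts.get(code, 0) for code, text in self.missing_codes.items()}
        return {"valid": valid, "missing": missing}

g2004 = CodebookEntry(
    name="g2004",
    label="Voted in the general elections of 2004",
    value_labels={1: "yes", 0: "no"},
    missing_codes={9: "system missing"},
)

# Made-up observations for illustration only.
table = g2004.frequencies([1, 0, 1, 9, 1, 0])
print(g2004.describe())
print(table)
```

Keeping distinct codes for each type of missing data, as recommended above, only requires adding further entries to `missing_codes`.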
Treatment and study materials:
- Electronic copies of materials used to administer the intervention (treatment). For example, mailings, transcripts of robo-calls, summary of curriculum, TV ads, audio files. Also include original instructions. The instructions should be presented in a way that, together with the design summary, conveys the protocol clearly enough that the design could be replicated by a reasonably skilled experimentalist.
- .txt or PDF formats are preferable. For multimedia formats, please consult with Limor Peer (limor(dot)peer(at)yale(dot)edu).
Other supplementary documents:
- .txt or PDF formats are preferable.
This may include:
- Survey questionnaires, self-administered questionnaires
- Interview schedules
- Interviewer and coder instructions
- Data collection forms for transcribing information from records
- Paper tests and scales
- Screening forms
- Call-report forms
- Final project report, project summary, or other description of the project
- Informed Consent Statement
- Deposit the original full data file; make it restricted if you need to.
- Allow public access to relevant data files.
- Label all variables and values in the data file.
- Keep all original variables and recode variables in the syntax to create public datasets, or sub-datasets.
- Only include data in a data file; include figures or analyses in additional files.
- Consider aggregating data into fewer, larger files, rather than many small ones. It is more difficult and time consuming to manage many small files and easier to maintain consistency across data sets with fewer, larger files. It is also more convenient for other users to select a subset from a larger data file than it is to combine and process several smaller files. Very large files, however, may exceed the capacity of some software packages. In such cases, files might be grouped by data type, site, time period, measurement platform, investigator, method, or instrument. Alternatively, files can be compressed. Please contact Limor Peer at limor(dot)peer(at)yale(dot)edu.
- File names should be meaningful, and ideally, describe content, date range, geographic location, and version information. ISPS staff will modify file names to adhere to ISPS Data Archive naming conventions. These file names, in most cases, include the original file name.
- See Cornell University’s DataStaR document on preparing data for archiving.
- See University of Michigan’s ICPSR document on preparing data for archiving
- American Statistical Association, Data Access and Personal Privacy: Appropriate Methods of Disclosure Control
- Confidentiality and Data Access Committee (CDAC) forum for staff members of Federal statistical agencies
- Census Bureau Standard for Disclosure Review
- ICPSR, University of Michigan
- DataStaR, Cornell University
ISPS Data Archive Deposit Agreement Form
ISPS Data Archive Study-level and file-level Metadata Form <coming soon>