Data FAQs (preliminary version, Ackert 9-19-10, with menu; Mindy updated links 1-11-11)
Contents:
+ How are variable names constructed?
+ Were the same questions asked every year? How do I know which questions were asked during a specific year?
+ What are the Carnegie Classifications?
+ What is the National Student Clearinghouse? Which variables does the BHS obtain from the Clearinghouse?
+ What are the SEI and Industry Codes?
+ What is the College Success Foundation?
+ Are there any recoded variables in the BHS data?
+ What is the “Variable Lookup Table”?
+ Who has access to the “Variable Lookup Table”?
+ Are any of the data suppressed?
+ How are write-in responses handled?
+ Where do I find information about the data, including sampling frame, sample size, and response rates?
How are variable names constructed?
The BHS data set contains data from multiple sources.
Much of the data is taken from three major survey instruments: the student
survey, given to students in their senior year; the parent survey, given
to parents of high school seniors; and the follow-up survey, given to
students one year after their senior year in high school. Variables
created from these three surveys are given a prefix that indicates the
source of the data. The following three prefixes are used for variables
that come from survey questionnaires:
s- student survey
p- parent survey
f- follow-up survey
These prefixes are followed by a number that indicates
the corresponding question number on the survey instrument. For
instance, “s002” refers to question 2 on the student survey. Some
questions contain multiple parts. For example, “s005_b” refers
to part b of question 5.
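As a rough illustration of this convention, the Python sketch below splits a survey variable name into its source, question number, and part letter. It assumes question numbers are always three digits and parts are a single lowercase letter after an underscore, as in the examples above; the function and type names are hypothetical and not part of any BHS tooling. Names that do not fit the pattern, such as recoded variables like racewhit, return None.

```python
import re
from typing import NamedTuple, Optional

# Survey prefixes as described in this FAQ.
SOURCES = {"s": "student survey", "p": "parent survey", "f": "follow-up survey"}

# Assumed pattern: one prefix letter, a three-digit question number,
# and an optional "_<letter>" part suffix (e.g. "s002", "s005_b").
NAME_RE = re.compile(r"^([spf])(\d{3})(?:_([a-z]))?$")


class SurveyVariable(NamedTuple):
    source: str
    question: int
    part: Optional[str]


def parse_variable_name(name: str) -> Optional[SurveyVariable]:
    """Split a survey variable name into source, question number, and part.

    Returns None for names that do not follow the survey pattern,
    such as recoded variables (e.g. "racewhit").
    """
    match = NAME_RE.match(name.lower())
    if match is None:
        return None
    prefix, number, part = match.groups()
    return SurveyVariable(SOURCES[prefix], int(number), part)
```

For example, `parse_variable_name("s005_b")` returns `SurveyVariable(source='student survey', question=5, part='b')`.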
Other variables were created using information from the
Carnegie Classifications, the National Student Clearinghouse, and the
College Success Foundation. For information on the naming conventions
for Carnegie Classification codes for college plans and college
attendance from the one-year follow-up and Clearinghouse data, please
consult the appropriate memo. Variables from the College Success Foundation
contain the prefix “gates” (there are only four College Success
Foundation variables).
See FAQs below for more information about the
Carnegie Classifications and Clearinghouse data.
Were the same questions asked every year? How do I know
which questions were asked during a specific year?
The student questionnaire was initially administered in
2000. After that year, the questionnaire was modified slightly: some
questions from the 2000 survey were retained in the 2002-2005
surveys, while others were dropped or added. In 2005, additional questions
were added to the survey; these do not appear in earlier questionnaires.
An overview of question comparability between survey years is available.
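If several survey years are stored in one combined file, one rough way to check which questions carry data in a given year is to look for variables with at least one non-missing response in that year. The sketch below assumes records are Python dictionaries with a hypothetical "year" field and that unasked questions are stored as None. A question that was asked but left blank by every respondent would be missed, so the official comparability overview remains the authoritative source.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def questions_by_year(records: Iterable[Dict[str, Any]],
                      year_field: str = "year") -> Dict[Any, Set[str]]:
    """Map each survey year to the variables that have any non-missing value.

    Assumes questions not asked in a given year are stored as None.
    """
    asked: Dict[Any, Set[str]] = defaultdict(set)
    for record in records:
        # Touching the key records the year even if every value is missing.
        variables_this_year = asked[record[year_field]]
        for variable, value in record.items():
            if variable != year_field and value is not None:
                variables_this_year.add(variable)
    return dict(asked)
```

With the made-up records `[{"year": 2000, "s002": 1, "s005_b": None}, {"year": 2005, "s002": 2, "s005_b": 3}]`, only s002 is reported for 2000, while both variables are reported for 2005.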
What are the Carnegie Classifications?
The Carnegie Classifications are a set of codes that
offer a succinct description of the characteristics of higher education
institutions in the United States. As the Carnegie Classification
website states, “…the Carnegie Classification has been the leading
framework for describing institutional diversity in U.S. higher
education. It has been widely used in the study of higher education,
both as a way to represent and control for institutional differences,
and also in the design of research studies to ensure adequate
representation of sampled institutions, students, or faculty.” In the
BHS data, the Carnegie Classifications are used to code post-secondary
institutions. The Carnegie Classifications were applied to write-in
data from the follow-up survey, such as college names, and to data
obtained from the National Student Clearinghouse.
For more on the Carnegie
Classifications, consult the Carnegie Classification website.
What is the National Student Clearinghouse? Which
variables does the BHS obtain from the Clearinghouse?
The National Student Clearinghouse is a nonprofit
organization that provides information on post-secondary enrollment and
degree completion. The Clearinghouse provides an educational record
verification service that is used by major employers, student service
providers, insurance companies, credit issuers, and the U.S. Department
of Education. The Clearinghouse data allows the BHS team to examine
post-secondary enrollment patterns and graduation rates among the
survey respondents. All Clearinghouse data that refers to
post-secondary institutions is coded using the Carnegie Classifications.
Clearinghouse website and
What is the College Success Foundation?
The College Success Foundation (CSF) administers the Washington State
Achievers (WSA) Scholarship program. The College Success Foundation
provided information on applications to the WSA program, receipt of the
scholarship, and attendance at a WSA high school. These variables are
labeled with the prefix “gates.”
Are there any recoded variables in the BHS data?
The majority of variable recoding in the BHS data is
left to the analyst. The BHS team has posted memos with relevant SPSS
syntax for the most commonly recoded variables. Researchers working
with BHS data are strongly encouraged to consult these memos.
The BHS team has recoded some variables in
the BHS data. These
include race variables (racewhit, raceafam, racenatv, raceoasn,
raceeasn, racecamb, raceviet, racefili, racenhopi, racehisp, racemexi,
racemiss) and immigrant generation
variables (gen1st, gen2nd, gen3rd). For more information on these
variables, please consult the corresponding memos.
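For readers who want to see how generation indicators like gen1st, gen2nd, and gen3rd can be derived from recoded birthplace categories, here is a minimal sketch. It uses a common textbook definition (first generation = foreign-born student; second = U.S.-born student with at least one foreign-born parent; third = U.S.-born student with U.S.-born parents). That definition, the boolean inputs, and the function name are assumptions; the BHS memo's actual rules may differ.

```python
from typing import Dict, Optional

GEN_VARS = ("gen1st", "gen2nd", "gen3rd")


def immigrant_generation(student_us_born: Optional[bool],
                         mother_us_born: Optional[bool],
                         father_us_born: Optional[bool]) -> Dict[str, Optional[int]]:
    """Return 0/1 indicators named after the BHS variables gen1st, gen2nd, gen3rd.

    Assumed textbook definition, not necessarily the BHS memo's rule.
    All indicators are None when the inputs cannot determine the generation.
    """
    if student_us_born is None:
        return dict.fromkeys(GEN_VARS)
    if not student_us_born:
        generation = "gen1st"
    elif mother_us_born is False or father_us_born is False:
        generation = "gen2nd"
    elif mother_us_born and father_us_born:
        generation = "gen3rd"
    else:
        # U.S.-born student, no known foreign-born parent, but a parent's birthplace is missing.
        return dict.fromkeys(GEN_VARS)
    return {name: int(name == generation) for name in GEN_VARS}
```

Returning missing values, rather than guessing third generation, keeps undetermined cases visible to the analyst.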
What is the “Variable Lookup Table”?
The Variable Lookup Table is similar to tools that
can be found on
the websites of large-scale surveys. It has a search function that
allows the researcher to look up variables that are pertinent to a
specific area of inquiry. The Variable Lookup Table also contains a
description of each variable and the variable’s frequency distribution.
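A minimal sketch of the two features described above, a keyword search over variable descriptions and a frequency distribution, might look like the following. The descriptions dictionary and both function names are hypothetical; this is not how the actual Variable Lookup Table is implemented.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def search_variables(descriptions: Dict[str, str], keyword: str) -> List[str]:
    """Return variable names whose name or description contains the keyword (case-insensitive)."""
    key = keyword.lower()
    return sorted(name for name, text in descriptions.items()
                  if key in name.lower() or key in text.lower())


def frequency_distribution(values: Iterable[Hashable]) -> List[Tuple[Hashable, int, float]]:
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, 100.0 * n / total) for value, n in counts.most_common()]
```

For example, with made-up descriptions `{"racewhit": "Race: White (recoded)", "s002": "Student survey question 2"}`, searching for "recoded" returns `["racewhit"]`, and `frequency_distribution([1, 0, 1, 1])` returns `[(1, 3, 75.0), (0, 1, 25.0)]`.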
Who has access to the “Variable Lookup Table”?
Are any of the data suppressed?
Some of the BHS data is suppressed in order to protect respondent
anonymity. Any question involving a write-in field is recoded into a
categorical variable rather than released as text. In addition, some
questions did not become variables because they would identify the
respondent or had cell sizes that were too small.
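The small-cell rule can be illustrated with a short sketch: tabulate a variable, blank out any category with too few respondents, and flag the variable as unreleasable if any observed category falls below the cutoff. The threshold of 5 and the function names are hypothetical; this FAQ does not state the cutoff the BHS team used.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional

MIN_CELL_SIZE = 5  # hypothetical threshold; the BHS cutoff is not stated in this FAQ


def suppress_small_cells(values: Iterable[Hashable],
                         min_cell: int = MIN_CELL_SIZE) -> Dict[Hashable, Optional[int]]:
    """Tabulate values, replacing any count below min_cell with None (suppressed)."""
    counts = Counter(values)
    return {value: (n if n >= min_cell else None) for value, n in counts.items()}


def is_releasable(values: Iterable[Hashable], min_cell: int = MIN_CELL_SIZE) -> bool:
    """True when every observed category has at least min_cell respondents."""
    return all(n >= min_cell for n in Counter(values).values())
```

Note that a single suppressed count can sometimes be recovered by subtracting the other cells from a published total, which is one reason a variable may be withheld entirely rather than released with blanks.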
How are write-in responses handled?
Any write-in text field in the
questionnaire has been
recoded to protect respondent anonymity. Post-secondary institutions
were recoded using Carnegie Classifications, and occupational
information was recoded using census Occupation and Industry codes or
SEI codes. Write-ins for race were used to create racial codes and
classifications (see the FAQ above regarding recoded variables). Place of
birth was also a write-in response and was recoded into categories
that were used to create immigrant generation variables. To identify
which questions had write-in responses, please consult the survey instruments.
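A common way to recode free-text answers such as college names is to normalize the text and look it up in a table of known names, flagging anything unmatched for manual review. The sketch below assumes that approach; the lookup entry "Example State University" and the code "CODE_A" are placeholders, not real Carnegie Classification codes, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

UNMATCHED = "unmatched"  # hypothetical flag for answers needing manual review


def normalize_write_in(text: str) -> str:
    """Lower-case the text, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def build_lookup(raw: Dict[str, str]) -> Dict[str, str]:
    """Normalize the keys of a name-to-code table so they match normalized write-ins."""
    return {normalize_write_in(name): code for name, code in raw.items()}


def recode_write_in(text: Optional[str], lookup: Dict[str, str]) -> Optional[str]:
    """Return the category code for a write-in, None if blank, or UNMATCHED."""
    if text is None or not text.strip():
        return None
    return lookup.get(normalize_write_in(text), UNMATCHED)
```

With `lookup = build_lookup({"Example State University": "CODE_A"})`, the write-in "  example state university! " recodes to "CODE_A", while the abbreviation "Example State Univ." is flagged as unmatched. Only the resulting codes, never the raw text, would be released.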