Dr. Stanislav (Stas) Kolenikov
A survey is a data product with a clearly specified target population (such as civilian noninstitutionalized adults 18+), a sampling frame (such as phone numbers or street addresses), a sampling method (random digit dialing or stratified address-based sampling), an instrument (a set of questions asked of the survey participant), and in most cases, an interaction with the sampled members of the population to receive answers to these questions.
The primary, if not the only, purpose of surveys is observation. Survey researchers aim to observe health behaviors and outcomes in situ, and to minimize the effect that the mere fact of participating in a study may have on the participants’ health behaviors. (Cf. clinical trials, where the first action by the researcher is to intervene in a health condition by assigning treatment; cf. also mobile health work, where the goal is often to shift participants towards leading a healthier lifestyle.)
Most if not all of the U.S. health surveys are government surveys. The federal surveys, available from the National Center for Health Statistics (https://www.cdc.gov/nchs) website, include:
- National Health Interview Survey (NHIS), an annual face-to-face survey of approximately 35,000 households containing about 87,000 persons (both adults and children)
- National Health and Nutrition Examination Survey (NHANES), a continuous in-person survey with a face-to-face interview and biological specimen data collection of about 5,000 persons each year, collected in about 15 locations across the U.S. each year
- National Immunization Survey (NIS), a survey that combines extensive phone screening for rare populations of children 19-35 months (about 8 million phone numbers are dialed per year to yield about 22,000 interviews) and adolescents 13-17 years of age, and administrative records (state Immunization Information Systems, IIS)
- Medical Expenditure Panel Survey (MEPS), sponsored by and available from AHRQ, a set of in-person interviews of families and individuals, phone or medical records abstraction data collection from the household survey participants’ medical providers, and mail/web interview with the participants’ insurance providers, including employers. MEPS draws its sample from the NHIS respondents, resulting in about 13,000 families and 33,000 individuals sampled from NHIS (about 1/3 of the original sample)
- National Survey of Drug Use and Health (NSDUH), an annual in-person survey of about 100,000 individuals on topics of illegal and prescription drug, alcohol and tobacco use, mental disorders and treatment, collected for the Substance Abuse and Mental Health Services Administration (SAMHSA)
Besides the continuously and regularly collected surveys, federal agencies have other data collection efforts that are infrequent or one-off in nature. For example, the U.S. Department of Veterans Affairs conducted the Comprehensive Health Assessment Interview (CHAI) in 2018 on the effects of military service, deployment, and combat on the health and well-being of Veterans who served during Operation Enduring Freedom (OEF), Operation Iraqi Freedom (OIF), and Operation New Dawn (OND). CHAI included a self-report web survey of about 15,000 veterans and about 4,500 comparison civilians, and a neurocognitive assessment of a subsample of veterans.
At the state level, the major health survey data collection effort is the Behavioral Risk Factor Surveillance System (BRFSS), co-sponsored by the U.S. states / District of Columbia / U.S. territories, the Centers for Disease Control and Prevention (CDC), and other federal agencies. BRFSS surveys are collected by phone; data collection is contracted out by the states, which often add their own modules, such as childhood asthma, HPV, diabetes, or other health topics of interest to the states.
Several large cities have created their own health surveys that are largely modeled after BRFSS in terms of study design, including phone mode of data collection, and major topics and survey questions. These include the Los Angeles County Health Survey, the New York City Community Health Survey, and the Healthy Chicago Survey.
Standalone in the health policy arena are surveys of public opinion and public knowledge and awareness of health policies. Kaiser Family Foundation, which is independent from Kaiser Permanente and Kaiser Industries, has been running a tracking poll on the public’s opinions, knowledge, and experiences with the U.S. health care system. About 1,200 people are interviewed over the phone each month for this poll.
Population surveys may be one of the few ways to reach and study the “last mile” individuals who are less attached to the major health delivery systems, e.g., the low double digit percentage of the adult population who don’t have insurance, or the single digit percentage of children who are not vaccinated. Unfortunately, the problems are often confounded, in that those individuals tend to be of lower socioeconomic status, and may not have the ways and means to respond to a survey (e.g., they may lack internet access to respond to web surveys, or even lack phone service and be unreachable by surveys like BRFSS or NIS). They may also be less trusting of the “big government”, and be more likely to refuse to participate in those surveys.
One of the most statistically interesting – and challenging – aspects of health surveys is that their data are collected in ways that violate the i.i.d. assumptions that are taught in most statistics classes. Complex survey designs exist largely to save costs, given that no register of the U.S. population exists that one could sample from, and/or given that interviewers often need to be sent to collect the data – or, as is the case with NHANES, two freight trucks with the lab equipment need to be stationed, and survey respondents need to be brought in for specimen data collection. All of those travel costs have to be factored in.
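The cost logic behind clustered designs can be illustrated with a standard textbook approximation for two-stage samples. The sketch below is not a description of how any of the surveys above were actually designed; the cost figures and the intraclass correlation are hypothetical values chosen for illustration, and the function names are invented for this example. It assumes equal-sized takes per cluster, a fixed cost of opening each cluster (travel, setup), and a per-interview cost.

```python
import math


def optimal_cluster_take(cost_per_cluster, cost_per_interview, rho):
    """Approximate cost-optimal number of interviews per sampled cluster.

    Textbook result for a two-stage design with equal takes:
    b_opt = sqrt((cost_per_cluster / cost_per_interview) * (1 - rho) / rho),
    where rho is the intraclass correlation of the outcome within clusters.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must be strictly between 0 and 1")
    cost_ratio = cost_per_cluster / cost_per_interview
    return math.sqrt(cost_ratio * (1 - rho) / rho)


def design_effect(take, rho):
    """Approximate variance inflation due to clustering: 1 + (b - 1) * rho."""
    return 1 + (take - 1) * rho


# Hypothetical inputs: opening a cluster costs 16 times as much as one interview.
take = optimal_cluster_take(cost_per_cluster=1600, cost_per_interview=100, rho=0.2)
deff = design_effect(take, rho=0.2)
print(f"interviews per cluster: {take:.1f}, design effect: {deff:.2f}")
```

The more expensive it is to open a cluster relative to a single interview, the more interviews per cluster the formula suggests, at the price of a larger design effect, i.e., less information per interview than under simple random sampling.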
Minimizing the variance of estimates subject to cost constraints has been the driving force of survey statistics since Neyman’s seminal 1934 paper. In face-to-face surveys like NHIS and NHANES, samples are collected in multi-stage designs, where hierarchies of geographies are sampled in turn (e.g., counties, census tracts, housing units, households, and individuals in households). Statistical inference needs to be corrected for the clustered nature of the data, as well as for unequal probabilities of selection. Even if families were to be sampled with the same probabilities, if one person is selected from each family, as is done in NHIS, then individuals in one-person families will be overrepresented relative to those in two- or three-person families. In phone surveys, individuals with both landline and cell phones, or with multiple cell phones, will be overrepresented vs. those who only have one cell phone and no landline. Most modern statistical software packages implement the appropriate methods for analyzing survey data and provide correct finite population inference, including library(survey) in R, the svy suite in Stata, and the PROC SURVEY procedures in SAS. Support in SPSS is limited, and is only available in a separately purchased module. The current author is not aware of a full implementation of survey procedures in Python.
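To make the one-person-per-family example concrete, here is a minimal sketch in plain Python. The data are made up, the function names are hypothetical, and the sketch assumes households were sampled with equal probability so that the only source of unequal selection is the within-household draw. It computes only a weighted point estimate; correct standard errors still require the design-based methods available in the packages listed above.

```python
def selection_weight(adults_in_household):
    """Inverse of the 1/k chance of being the one adult picked in a k-adult household."""
    if adults_in_household < 1:
        raise ValueError("a sampled household must contain at least one adult")
    return float(adults_in_household)


def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of weights."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical respondents: (adults in household, 1 if the outcome is reported else 0).
respondents = [(1, 1), (1, 1), (2, 0), (3, 0)]

outcomes = [y for _, y in respondents]
weights = [selection_weight(k) for k, _ in respondents]

unweighted = sum(outcomes) / len(outcomes)
weighted = weighted_mean(outcomes, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy sample the outcome is concentrated among people living alone, so the unweighted mean overstates its prevalence; weighting each respondent by the number of adults they represent pulls the estimate down.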
In terms of health policy statistics, which is the focus of HPSS, health surveys provide indispensable information regarding variation in space, time, and across demographic groups that can still inform health policy. Government agencies use this information to allocate the existing resources across these dimensions, and will continue relying on health surveys in the foreseeable future. At ICHPS 2018, three major federal surveys (MEPS, NHANES and MCBS) presented their products, and explained how these survey data both inform current decision making and provide the backbone for academic researchers’ own substantive work.
This brings us to the current challenges that surveys as data products face. One highly visible challenge is that of growing nonresponse. Response rates have been going down in literally each and every survey that has been collected over a sufficiently long period. What used to be the lowest possible bar on response rates in the 1980s or 1990s, like 80%, is now a barely attainable goal that requires extensive and costly efforts. Currently, in-person surveys have response rates in the range from 50% to 80%; phone surveys, in the range from about 5% to about 30%; and mail surveys, from about 2% to 40% for the best designed surveys with multiple mailings. Response rates depend heavily on the topic and the length of the survey; the survey sponsor and the brand name of the organization that collects the data on the sponsor’s behalf; call protocols; survey languages and cultural accommodations; and many other study design components. Phone surveys, in particular, have faced a precipitous drop in response rates in the past two or so years, attributed to call blocking and screening on cell phones that is implemented at the call level by telecom providers and mobile operating systems.
Another challenge that health surveys are facing is that of human capital and staffing. The workforce of federal statistical agencies is ageing and retiring, but there is no new generation of survey statisticians to step in, as very few survey statisticians are being trained. We are fairly close to a very inverted age pyramid, in which the number of ASA lifetime award winners affiliated with survey statistics, such as the Founders Award (about 2 per year) and ASA Fellows (ranging from 3 to 8 per year), is comparable to the number of Ph.D.s trained in survey statistics (who tend to come from only four programs: the Joint Program in Survey Methodology at the University of Maryland; the Program in Survey Methodology at the University of Michigan; and two strong academic departments of statistics at Iowa State and Colorado State).
Demand for survey statisticians is nearly insatiable, with federal agencies alone seeking to fill about 50 vacancies per year according to www.usajobs.gov. Given a roughly equal split of the survey statistics world between government, industry and academia, a similar number of vacancies should be expected to exist in each of these other sectors, too. Organizations collecting survey data find themselves hiring statisticians trained in other fields, as well as quantitative social scientists, and retraining them to become survey experts, which is a long and arduous process. So – consider a career in survey statistics – you will find a plethora of opportunities and a chance to make a real impact.
For additional reading, see Korn & Graubard (1999), which covers issues of statistical analysis and inference, and the edited volume by Johnson (2015), which covers all aspects of survey design, including sampling, measurement, field issues, special populations, and analysis.
References: Korn, E. L., and Graubard, B. I. (1999). Analysis of Health Surveys. Wiley: New York, NY.
Johnson, T. P. (Ed.) (2015). Handbook of Health Survey Methods. Wiley Handbooks in Survey Methodology, Wiley: Hoboken, NJ.
Dr. Stas Kolenikov is Principal Survey Scientist at Abt Associates. He specializes in methods for the sampling design and analysis of complex survey data, with special emphasis on survey weighting. (The opinions and views represented here are the author’s own and do not reflect any group for which the author has an association.)