Perspectives on Data and Statistics in Health Policy

Jun 9

Data and statistics are central to effective health policy because they inform evidence-based decision-making and strategic planning. They can reveal disease patterns, utilization trends, and outcomes across populations to guide priority setting and resource allocation. Statistical analysis can evaluate interventions, compare policy options, and measure cost-effectiveness and value. Timely data can enable surveillance, early warning, and rapid response to emerging health threats. Transparent metrics strengthen accountability and public trust, and continuous monitoring can support learning and adjustment. Without rigorous data collection and sound statistical methods, health policies risk being inaccurate, inefficient, or biased, with unintended consequences for the public. For further insights, we interviewed Dr. Summer Han from Stanford University.

Robert: Could you start off by telling us a little bit about yourself and your day-to-day work at Stanford? What do you enjoy most about your job?

Summer: I am an Associate Professor at Stanford University with appointments in Medicine, Neurosurgery, and Epidemiology & Population Health, and I serve as the Director of the Cancer Data Science Shared Resources Core at the Stanford Cancer Institute. In this capacity, I lead the Han Lab, where we focus on the intersection of data science and oncology – specifically developing statistical and machine learning methods to improve cancer screening, risk prediction, and health policy. My day-to-day is a dynamic mix of methods development and clinical collaboration. One hour I might be working with a PhD student on a theoretical framework for handling missing data in multi-modal electronic health records (EHRs), and the next I am meeting with oncologists to discuss how to translate those findings into clinical guidelines. What I enjoy most is the “translation gap.” As statisticians, we often work in the abstract; but in my role, I get to see how rigorous quantitative methods can directly influence patient care and public health guidelines. There is a unique satisfaction in seeing a mathematical model evolve into a policy that actually saves lives.

Robert: What opportunities does Stanford offer for students or others interested in health policy?

Summer: Stanford offers a uniquely porous ecosystem for health policy. Because the School of Medicine, the Department of Statistics, and the Computer Science Department are all in close proximity, students can easily work at the intersection of these fields. For those interested in policy specifically, we have the Department of Health Policy and the Stanford Health Policy (SHP) Institute, which are hubs for interdisciplinary research. My own lab, for instance, sits in the School of Medicine, allowing trainees to access massive real-world datasets and work directly with clinicians who are asking policy-relevant questions. We encourage students to not just “crunch numbers,” but to understand the clinical and economic context of the data they are analyzing.

Robert: How would you describe the role of statistics in shaping and evaluating health policies?

Summer: Statistics provides a framework for asking and answering “what if” questions about policies. We can use designs and analytic methods to estimate causal effects of interventions, compare alternative policy scenarios through decision models, and monitor outcomes and disparities once policies are implemented. In my view, statistics is not just a set of tools for analyzing data; it structures policy debates around uncertainty, trade-offs between benefits and harms, and distributional impacts across different populations.

Robert: From your experience, can you give me an example where statistical analyses significantly influenced a health policy decision?

Summer: A prime example from my own field is the evolution of National Lung Cancer Screening guidelines. While large randomized trials – specifically the National Lung Screening Trial (NLST), which established that low-dose CT screening reduces lung cancer mortality – provided proof of principle, they are inherently limited. A single trial cannot empirically test every combination of age, smoking history, and screening frequency. To bridge this gap, the US Preventive Services Task Force (USPSTF) complemented this trial evidence with microsimulation modeling. In our work, we used these models to simulate the natural history of lung cancer across millions of virtual life histories. This allowed us to evaluate the counterfactual effects of hundreds of different screening strategies – essentially asking, “What would the population-level outcomes be if we implemented policy X versus policy Y?” – that were not observed in the actual trials. This application of causal inference via simulations provided critical additional evidence to support expanding eligibility in the 2013 and 2021 USPSTF recommendations, demonstrating how advanced statistical modeling can directly shape national coverage policy.

Robert: What are some effective strategies to communicate complex statistical results to non-expert stakeholders, such as policymakers and the general public?

Summer: I find that using “toy examples” combined with clear visualizations is the most effective strategy. For instance, when explaining causal inference methods like Inverse Probability of Treatment Weighting (IPTW) to the general public, I avoid the heavy math. Instead, I use a visual analogy: I explain that we are creating a “pseudo-population” to make fair comparisons. If we are comparing Treatment A to Treatment B, but the patients in Group A are older and sicker, we can’t compare them directly. I explain IPTW as “re-weighting” the data – giving more weight to the few young/healthy people in Group A and less weight to the overrepresented older people – so that the two groups look identical on the scale. Once they understand the concept of “balancing the scale” through a simple diagram, the statistical results become more intuitive.
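For readers who want to see the “balancing the scale” idea in numbers, here is a minimal sketch of IPTW on a made-up toy dataset. It is not taken from Dr. Han's work: the patient counts, recovery rates, and the helper names `make_rows`, `naive_mean`, and `iptw_mean` are all invented for illustration, and the propensity score is estimated simply as the observed share of each age group that received a given treatment. In the toy data, recovery depends only on age, so the true treatment effect is zero.

```python
from collections import Counter

def make_rows(treatment, age_group, n, n_recovered):
    """Build n toy patient records: (treatment, age_group, recovered 0/1)."""
    return [(treatment, age_group, 1 if i < n_recovered else 0) for i in range(n)]

# Invented data: Group A is mostly older patients, Group B mostly younger.
# Recovery is 40% for older and 80% for younger patients under BOTH treatments.
rows = (
    make_rows("A", "old", 80, 32)
    + make_rows("A", "young", 20, 16)
    + make_rows("B", "old", 20, 8)
    + make_rows("B", "young", 80, 64)
)

def naive_mean(rows, treatment):
    """Unweighted recovery rate among patients who got this treatment."""
    outcomes = [y for t, _, y in rows if t == treatment]
    return sum(outcomes) / len(outcomes)

def iptw_mean(rows, treatment):
    """Recovery rate after weighting each patient by 1 / P(treatment | age group)."""
    n_in_group = Counter(g for _, g, _ in rows)
    n_treated_in_group = Counter((t, g) for t, g, _ in rows)
    weighted_sum = 0.0
    total_weight = 0.0
    for t, g, y in rows:
        if t != treatment:
            continue
        propensity = n_treated_in_group[(t, g)] / n_in_group[g]
        weight = 1.0 / propensity  # rare patients in a group count for more
        weighted_sum += weight * y
        total_weight += weight
    return weighted_sum / total_weight

print("naive:", naive_mean(rows, "A"), naive_mean(rows, "B"))
print("IPTW: ", iptw_mean(rows, "A"), iptw_mean(rows, "B"))
```

In this toy setup the naive comparison makes Treatment A look worse (about 0.48 versus 0.72) purely because Group A is older. After re-weighting, both groups come out at about 0.60, which matches the fact that the treatments were constructed to be equally effective. Real analyses would estimate propensities from many covariates with a fitted model rather than a single age group.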

Robert: From your perspective, what are the most common and reliable sources of health data for policymakers today?

Summer: We are in a transition period. The “gold standard” sources remain the large, curated population-based registries (like SEER for cancer) and national surveys (like NHANES), as they are designed to be representative. However, we are increasingly relying on real-world data (RWD) derived from EHRs and claims data. While these sources offer massive sample sizes and granularity, I hesitate to call them “reliable” without qualification, as they are prone to selection bias, missingness, and coding errors. The role of the statistician is becoming less about finding the data and more about cleaning and de-biasing these messy data sources so they can be reliably used for policy.

Robert: What ethical considerations arise when using data and statistical techniques to inform health policy decisions?

Summer: While we often think of ethics in terms of privacy, for a statistician, I believe reproducibility and rigor are ethical imperatives. When our models inform national guidelines that affect millions of lives, we have an ethical duty to ensure our findings are robust. This means performing diligent sensitivity analyses to check every assumption and evaluating how robust the findings are to changes in parameters. Furthermore, there is an ethical obligation to make code open and available. If a policy is based on a “black box” model that cannot be reproduced by other scientists, that is a failure of scientific integrity. Ensuring transparency is how we maintain the ethical trust placed in us by the public. Another pressing ethical issue today is algorithmic fairness and health equity. Models trained on historical data often encode historical biases. For example, in lung cancer risk prediction, many older models were developed using cohorts of heavy smokers who were predominantly white males. When we apply these models to diverse populations – such as Asian females who may develop lung cancer despite being non-smokers – the models often fail, leading to under-screening and delayed diagnosis for those groups. As statisticians, we have an ethical obligation to audit our models for performance disparities across race, gender, and socioeconomic status before suggesting they be used for policymaking. We must ensure that “optimizing population health” does not come at the expense of marginalized subgroups.
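As a rough illustration of what auditing a model for performance disparities can look like, the sketch below computes sensitivity (the share of people who developed the disease that the model had flagged as high-risk) separately for each subgroup. The records, the subgroup labels `group_1` and `group_2`, and the function name `sensitivity_by_group` are hypothetical and chosen only to make the idea concrete. A real audit would use actual cohort data and would likely examine several metrics, not just sensitivity.

```python
from collections import defaultdict

# Invented toy records: (subgroup, flagged_high_risk, developed_cancer)
records = (
    [("group_1", True, True)] * 45      # cases the model caught
    + [("group_1", False, True)] * 5    # cases the model missed
    + [("group_1", False, False)] * 50  # non-cases, not flagged
    + [("group_2", True, True)] * 10
    + [("group_2", False, True)] * 40
    + [("group_2", False, False)] * 50
)

def sensitivity_by_group(records):
    """Return {subgroup: fraction of true cases that were flagged high-risk}."""
    cases = defaultdict(int)
    flagged_cases = defaultdict(int)
    for group, flagged_high_risk, developed_cancer in records:
        if developed_cancer:
            cases[group] += 1
            if flagged_high_risk:
                flagged_cases[group] += 1
    return {group: flagged_cases[group] / cases[group] for group in cases}

audit = sensitivity_by_group(records)
for group, sensitivity in sorted(audit.items()):
    print(f"{group}: sensitivity = {sensitivity:.2f}")
```

With these made-up numbers, the model catches 90% of cases in `group_1` but only 20% in `group_2`. A gap of that kind is what an audit is meant to surface before a model is proposed for policymaking.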

Robert: How is the increasing surge of big data, machine learning (ML), and artificial intelligence (AI) impacting the use of statistics in health policy today?

Summer: It is shifting the paradigm from “one-size-fits-all” to “precision policy.” Traditionally, policy has relied on broad averages (e.g., “screen everyone over age 50”). With AI and multi-modal data (combining imaging, genetics, and EHR), we can identify risk with much higher granularity. This allows for risk-stratified policies where resources are allocated more efficiently to those who need them most. However, this surge brings challenges. AI models are often “black boxes.” For policy, interpretability is crucial – we need to know why a model is flagging a population as high-risk to design the right intervention. The intersection of causal inference and machine learning is the new frontier – using ML to handle high-dimensional data while using statistical principles to ensure we are making valid causal claims.

Robert: What improvements would you like to see in the way statistics are integrated into the health policymaking process?

Summer: I would like to see statisticians involved at the conceptualization phase, not just the analysis phase. Often, we are brought in after the data is collected or the policy is drafted to “run the numbers.” If statisticians are involved from the start, we can design better data collection mechanisms that minimize bias and ensure the appropriate variables are captured to answer the policy question. Furthermore, I advocate for greater transparency and reproducibility in policy modeling. The code and assumptions behind major health policy decisions should be open-source and auditable, allowing the broader statistical community to validate and improve upon the evidence base.
