Applying endogenous switching regression using machine learning first stages to correct for selection in program evaluations.
Endogenous switching regression offers a robust path to address selection in evaluations; integrating machine learning first stages refines propensity estimation, improves outcome modeling, and strengthens causal claims across diverse program contexts.
August 08, 2025
In program evaluation, selection bias arises when treated and untreated groups differ in unobserved ways, leading to biased estimates of an intervention’s impact. Endogenous switching regression (ESR) provides a structured way to model this selection process by allowing the outcome equation to depend on a latent treatment choice, thereby capturing the interdependence between selection and outcomes. The classic ESR approach uses instrumental variables or exclusion restrictions to identify switching behavior. However, real-world data often offer only weak instruments and involve complex nonlinearities. Introducing machine learning first stages helps relax parametric assumptions, uncover richer predictors, and yield more accurate propensity scores that feed ESR estimation with sharper separation between treated and control potential outcomes.
The core idea is to blend flexible predictive models with structural equations that reflect economic decision processes. In the first stage, machine learning algorithms — such as gradient boosting, random forests, or neural nets — predict the probability of receiving the program while incorporating a broad set of covariates, including interactions and nonlinearities. This generated propensity score serves as an input to the second stage, where ESR translates observed choices into corrected outcome differences. The key is to maintain interpretability by constraining the machine learning layer to supply inputs to the selection mechanism rather than final effect estimates, thereby preserving the causal framework of the ESR specification.
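As a rough illustration of that first stage, the sketch below assumes a data frame df with a binary treated column and a list of covariate names; it uses scikit-learn's gradient boosting with cross-fitting, although any sufficiently flexible classifier could take its place.

```python
# Minimal sketch of an ML first stage for ESR. The data frame `df`, its "treated"
# column, and the `covariates` list are illustrative assumptions, not a fixed API.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

def estimate_propensity(df: pd.DataFrame, covariates: list[str]) -> np.ndarray:
    """Cross-fitted probability of program participation."""
    X = df[covariates].to_numpy()
    t = df["treated"].to_numpy()

    model = GradientBoostingClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)

    # Out-of-fold predictions keep each unit's propensity from being fit on its own
    # row, which limits overfitting and leakage into the second-stage ESR.
    p_hat = cross_val_predict(model, X, t, cv=5, method="predict_proba")[:, 1]

    # Clip away from 0 and 1 so downstream correction terms stay finite.
    return np.clip(p_hat, 0.01, 0.99)
```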
High-dimensional prediction strengthens the selective participation framework with richer signals.
When applying ESR with ML-driven first stages, researchers must guard against overfitting and ensure that the predicted propensity captures genuine decision drivers rather than spurious correlations. Cross-validation, out-of-sample testing, and regularization help prevent leakage from the outcome model into the selection mechanism. Additionally, careful feature engineering—such as domain-specific proxies, policy eligibility indicators, and time-varying controls—can reveal the nuanced choices individuals make about participation. The resulting ESR then interprets the residual differences in outcomes after accounting for selection, enabling more credible counterfactual comparisons between treated and untreated groups under plausible assumptions about the latent switching process.
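A simple way to document that out-of-sample behavior, continuing the hypothetical names from the sketch above (p_hat for the cross-fitted propensities, t for the treatment indicator), is to report discrimination and calibration metrics:

```python
# Quick out-of-sample checks on the first stage: discrimination via AUC, calibration
# via the Brier score. Weak values on either suggest the propensity model is not
# capturing genuine decision drivers.
from sklearn.metrics import roc_auc_score, brier_score_loss

auc = roc_auc_score(t, p_hat)
brier = brier_score_loss(t, p_hat)
print(f"out-of-fold AUC = {auc:.3f}, Brier score = {brier:.3f}")
```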
A critical benefit of this hybrid approach is resilience to model misspecification. Traditional ESR may falter when the switching mechanism interacts with unobservables in ways a simple linear specification cannot capture. By letting ML first stages model complex relationships, the estimator accommodates nonlinearity, heterogeneous effects, and high-dimensional covariates. The challenge is to maintain a coherent structural interpretation: the machine learning step informs the likelihood of treatment, while the ESR component translates this into corrected outcome estimates under a recognizable economic model of participation. Practitioners should report both predictive performance and structural diagnostics to demonstrate the robustness of their conclusions.
Heterogeneous switching insights guide targeted policy design and evaluation.
In practice, the first-stage model outputs must be aligned with the ESR’s identification strategy. If the same covariates influence both participation and outcomes, or if instruments are weak, the ESR estimates may still be biased. To mitigate this, researchers can employ orthogonalization techniques, where the ML predictions are residualized before entering the ESR equations. This step reduces the contamination of the outcome model by predictive features not orthogonal to treatment status. Sensitivity analyses, such as placebo tests or falsification checks, further validate that the estimated switching effect reflects the data-generating process rather than incidental correlations.
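A minimal version of that residualization step, keeping the hypothetical names from the earlier sketches, might look as follows; the linear partialling out is only illustrative, and a flexible learner could be substituted:

```python
# Sketch of the orthogonalization described above: partial the observed covariates X
# out of the ML propensity so that only variation not already explained by X enters
# the ESR selection term.
import numpy as np
from sklearn.linear_model import LinearRegression

def orthogonalize_propensity(p_hat: np.ndarray, X: np.ndarray) -> np.ndarray:
    lin = LinearRegression().fit(X, p_hat)
    return p_hat - lin.predict(X)  # component of p_hat orthogonal to X
```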
Another practical consideration is the interpretation of treatment effects across subgroups. ML first stages often reveal heterogeneous participation patterns, suggesting that ESR should allow for subgroup-specific switching equations. By estimating separate ESR components for distinct populations, analysts can uncover differential selection dynamics and varying returns to the program. This granularity informs policy design, indicating not only whether a program works on average but for whom it is most effective. Transparent reporting of subgroup results, along with confidence intervals and falsification tests, helps ensure findings are actionable and credible to stakeholders.
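One simplified way to operationalize subgroup-specific estimation, again under hypothetical names (a subgroup column region, an outcome column y), is to refit the first stage within each subgroup and run a control-function regression for the treated regime; a full ESR would also model the untreated regime and, ideally, estimate both jointly.

```python
# Hedged sketch of subgroup-specific second stages. The "region" and "y" columns and
# the inverse-normal mapping from the ML propensity to a Mills-ratio term are
# illustrative assumptions, not a complete ESR implementation.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

subgroup_fits = {}
for name, sub in df.groupby("region"):
    p = estimate_propensity(sub, covariates)       # subgroup-specific first stage
    z = norm.ppf(p)                                # latent index implied by p
    treated = (sub["treated"] == 1).to_numpy()

    lam1 = norm.pdf(z[treated]) / p[treated]       # correction term, treated regime

    X1 = sm.add_constant(
        np.column_stack([sub.loc[treated, covariates].to_numpy(), lam1])
    )
    # The coefficient on the last column estimates the selection covariance for
    # this subgroup's treated regime.
    subgroup_fits[name] = sm.OLS(sub.loc[treated, "y"].to_numpy(), X1).fit()
```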
Careful specification and validation ensure credible causal inferences in practice.
The theoretical underpinnings of ESR with ML first stages rest on simultaneous equations that acknowledge mutual dependence between treatment choice and outcomes. Conceptually, the model allows an individual’s outcome to reflect both the treatment’s direct effect and the selection process that led to treatment. Practically, researchers estimate a system where the first equation describes the probability of participation via ML predictions, while subsequent equations model outcome differentials conditional on the predicted participation. This approach yields corrected treatment effects that reflect what would happen if participation were altered, holding other factors constant and accounting for selection.
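Written out under the textbook normality assumptions, the two-regime system takes roughly the following form, with notation introduced here for illustration; in the ML-assisted variant the probit probability Φ(Z_i γ) is replaced by the cross-fitted propensity.

```latex
% Selection equation, two outcome regimes, and the selection-corrected means
\begin{aligned}
T_i^{*} &= Z_i \gamma + u_i, \qquad T_i = \mathbf{1}\{T_i^{*} > 0\} \\
y_{1i} &= X_i \beta_1 + \varepsilon_{1i} \quad \text{(observed if } T_i = 1\text{)} \\
y_{0i} &= X_i \beta_0 + \varepsilon_{0i} \quad \text{(observed if } T_i = 0\text{)} \\
\mathbb{E}[y_{1i} \mid T_i = 1, X_i, Z_i] &= X_i \beta_1 + \sigma_{1u}\,\frac{\phi(Z_i \gamma)}{\Phi(Z_i \gamma)} \\
\mathbb{E}[y_{0i} \mid T_i = 0, X_i, Z_i] &= X_i \beta_0 - \sigma_{0u}\,\frac{\phi(Z_i \gamma)}{1 - \Phi(Z_i \gamma)}
\end{aligned}
```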
To implement this approach rigorously, one must specify the ESR structure carefully. The model should include a robust set of covariates that captures observed determinants of participation, as well as plausible exclusion restrictions that justify the latent switching mechanism. Diagnostic checks, such as balance tests and placebo outcomes, help confirm that the first-stage predictions balance covariates across treated and untreated groups after controlling for predicted participation. Ultimately, the ESR estimates illuminate the net effect of the program by adjusting for selection biases that standard regression or naïve comparisons overlook.
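One concrete balance diagnostic, assuming the same hypothetical df, covariates, and cross-fitted p_hat as before, compares standardized mean differences between participants and non-participants within propensity strata:

```python
# Minimal balance check: standardized mean differences (SMDs) within strata of the
# predicted propensity. Absolute SMDs near zero in every stratum suggest the first
# stage balances the observed covariates.
import numpy as np
import pandas as pd

def stratified_smd(df: pd.DataFrame, covariates: list[str],
                   p_hat: np.ndarray, n_strata: int = 5) -> pd.DataFrame:
    d = df.copy()
    d["stratum"] = pd.qcut(p_hat, q=n_strata, labels=False, duplicates="drop")

    rows = []
    for s, g in d.groupby("stratum"):
        t, c = g[g["treated"] == 1], g[g["treated"] == 0]
        for cov in covariates:
            pooled_sd = np.sqrt((t[cov].var() + c[cov].var()) / 2.0)
            smd = (t[cov].mean() - c[cov].mean()) / pooled_sd if pooled_sd > 0 else np.nan
            rows.append({"stratum": s, "covariate": cov, "smd": smd})
    return pd.DataFrame(rows)
```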
Transparent assumptions and careful communication underpin credible results.
Beyond estimation, researchers should emphasize generalizability. Endogenous switching models gain external value when applied to diverse contexts, populations, and program types. Cross-country or cross-sector applications test the resilience of the ML-informed ESR against different data-generating processes. When results replicate across settings, policymakers gain confidence that the method captures a stable mechanism by which selection biases distort measured effects. Documentation of data sources, model choices, and diagnostic outcomes is essential for replication, enabling other analysts to verify findings or adapt the approach to new evaluation challenges.
The interpretive burden also includes communicating assumptions clearly. Endogenous switching regression relies on the notion that unobserved factors influence both participation and outcomes in a systematic way. While ML stages reduce reliance on rigid functional forms, they do not eliminate the need for plausible economic reasoning. Analysts should articulate their exclusion restrictions, justify instrument choices, and describe how the latent switching mechanism maps onto real-world decision processes. Clear articulation of these elements strengthens the credibility of the causal claims drawn from ESR with machine-learned first stages.
Finally, the integration of ML and ESR invites a rigorous uncertainty assessment. Standard errors may need adjustment to reflect the two-stage estimation, and bootstrap methods can provide finite-sample refinements. Researchers should report variance decompositions to show how much of the uncertainty stems from prediction error in the first stage versus the structural ESR parameters. Monte Carlo simulations tailored to the data context help illustrate finite-sample properties and potential biases under misspecification. By presenting a transparent uncertainty profile, analysts offer a more nuanced interpretation of the corrected treatment effects and their policy implications.
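A pairs bootstrap over the entire two-stage pipeline is one way to make that uncertainty explicit; in the sketch below, fit_pipeline is a hypothetical stand-in for a function that reruns both stages on a resampled data set and returns the corrected treatment effect.

```python
# Hedged sketch of a pairs bootstrap so reported uncertainty reflects first-stage
# prediction error as well as the second-stage ESR fit. `fit_pipeline` is a
# hypothetical stand-in supplied by the analyst.
import numpy as np
import pandas as pd

def bootstrap_effect(df: pd.DataFrame, fit_pipeline, n_boot: int = 500, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        boot = df.sample(n=len(df), replace=True, random_state=int(rng.integers(1_000_000)))
        draws.append(fit_pipeline(boot))          # corrected effect on the resample
    return np.asarray(draws)                      # e.g., np.percentile(draws, [2.5, 97.5])
```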
As evaluation practice evolves, the combination of endogenous switching with machine learning first stages stands out for its balance of flexibility and rigor. It respects the theory-driven need to model selection while embracing data-driven tools to capture complex patterns. When implemented with careful design, validation, and transparent reporting, this approach yields robust, policy-relevant estimates of program impact across heterogeneous environments. The result is a more credible evidence base that supports informed decision-making and fosters trust in causal conclusions derived from observational data.