Useful resources on methods

Introductory Texts

These are some of the texts that helped me get started with quant methods. I still return to them regularly when I want a refresher. Essential.

Wooldridge, J. M. (2015). Introductory Econometrics: A modern approach. Nelson Education.

Angrist, J. D., & Pischke, J. S. (2014). Mastering ’metrics: The path from cause to effect. Princeton University Press.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

Imai, K. (2017). Quantitative social science: an introduction. Princeton University Press.

Gerber, A. S., & Green, D. P. (2012). Field experiments: Design, analysis, and interpretation. WW Norton.

Causal Inference

Rosenbaum, P. R. (2005) Observational study. In Everitt, B., & Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 1809–1814). Link here.

Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A, 171(2), 481-502.

Shadish, W. R. (2010). Campbell and Rubin: A primer and comparison of their approaches to causal inference in field settings. Psychological Methods, 15(1), 3.

Imbens, G. W., & Rubin, D. B. (2015). A Classification of Assignment Mechanisms. In their book Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.

Imbens, G. W., & Rubin, D. B. (2015). A Brief History of the Potential Outcomes Approach to Causal Inference. In their book Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.

Rubin, D. B. (2008). For objective causal inference, design trumps analysis. The Annals of Applied Statistics, 808-840.

Steiner, P. M., Kim, Y., Hall, C. E., & Su, D. (2017). Graphical models for quasi-experimental designs. Sociological Methods & Research, 46(2), 155-188.

Surveys & Tutorials: Statistics & Modelling

Textbooks are long and expensive. Sometimes you just want a clear overview.

Steve Strand’s open access course on using regression methods in educational research. Link here.

McNeish, D., & Kelley, K. (2018). Fixed effects models versus mixed effects models for clustered data: Reviewing the approaches, disentangling the differences, and making recommendations. Psychological Methods.

Visualising Hierarchical Models. Link here.

Kreuter, F., & Valliant, R. (2007). A survey on survey statistics: What is done and can be done in Stata. Stata Journal, 7(1), 1.

Waldmann, E. (2018). Quantile regression: A short story on how and why. Statistical Modelling, 18(3-4), 203-218.

Landau, S. (2002). Using survival analysis in psychology. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 1(4), 233-270.

Lei, P. W., & Wu, Q. (2007). Introduction to structural equation modeling: Issues and practical considerations. Educational Measurement: issues and practice, 26(3), 33-43.

Sterba, S. K. (2009). Alternative model-based and design-based frameworks for inference from samples to populations: From polarization to integration. Multivariate behavioral research, 44(6), 711-740.

Lee, Y. R., & Pustejovsky, J. E. (2023). Comparing random effects models, ordinary least squares, or fixed effects with cluster robust standard errors for cross-classified data. Psychological Methods.

Surveys & Tutorials: Evaluation methods

Deaton, A., & Cartwright, N. (2017). Understanding and misunderstanding randomized controlled trials. Social Science & Medicine.

St Clair, T., & Cook, T. D. (2015). Difference-in-differences methods in public finance. National Tax Journal, 68(2), 319.

Olden, A., & Møen, J. (2022). The triple difference estimator. The Econometrics Journal, https://doi.org/10.1093/ectj/utac010

Hallberg, K., Williams, R., Swanlund, A., & Eno, J. (2018). Short Comparative Interrupted Time Series Using Aggregate School-Level Data in Education Research. Educational Researcher, 47(5), 295-306.

Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2019). A practical introduction to regression discontinuity designs: Foundations. Cambridge University Press.

Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2019). A practical introduction to regression discontinuity designs: Extensions. Cambridge University Press.

Lousdal, M. L. (2018). An introduction to instrumental variable assumptions, validation and estimation. Emerging Themes in Epidemiology, 15(1), 1-7.

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399-424.

This survey article is twinned with a case study and tutorial article:

Austin, P. C. (2011). A tutorial and case study in propensity score analysis: an application to estimating the effect of in-hospital smoking cessation counseling on mortality. Multivariate Behavioral Research, 46(1), 119-151.

Richwine, C., Luo, Q. E., Thorkildsen, Z., Chong, N. J., Morris, R., Barnow, B. S., & Pandey, S. K. (2022). Defining and assessing the value of canonical mixed methods research designs in public policy and public administration. Journal of Policy Analysis and Management.

Randomisation inference

Senn, S. (2012). Tea for three: of infusions and inferences and milk in first. Significance, 9(6), 30-33.

Rosenbaum, P. (2017). Observation and experiment. Harvard University Press. Part 1.

Rosenberger, W. F., Uschner, D., & Wang, Y. (2019). Randomization: The forgotten component of the randomized clinical trial. Statistics in Medicine, 38(1), 1-12.

Keele, L., McConnaughy, C., & White, I. (2012). Strengthening the experimenter’s toolbox: Statistical estimation of internal validity. American Journal of Political Science, 56(2), 484-499.

Young, A. (2019). Channeling Fisher: Randomization tests and the statistical insignificance of seemingly significant experimental results. The Quarterly Journal of Economics, 134(2), 557-598.

Heß, S. (2017). Randomization inference with Stata. The Stata Journal, 17(3), 630-651.

Rosenberger, W. F., & Lachin, J. M. (2015). Randomization in clinical trials: theory and practice. John Wiley & Sons.

Effect sizes

Baguley, T. (2009). Standardized or simple effect size: what should be reported? British Journal of Psychology, 100(3), 603–617.

Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational Researcher, 49(4), 241–253.

Simpson, A. (2017). The misdirection of public policy: comparing and combining standardised effect sizes. Journal of Education Policy, 32(4), 450–466.

Simpson, A. (2018). Princesses are bigger than elephants: effect size as a category error in evidence-based education. British Educational Research Journal, 44(5), 897–913.

Foster, C. (2023). A quotient effect size for educational interventions. International Journal of Research & Method in Education, 1-10. https://doi.org/10.1080/1743727X.2023.2182877

Ost, B., Gangopadhyaya, A., & Schiman, J. C. (2017). Comparing standard deviation effects across contexts. Education Economics, 25(3), 251-265.

Within-Study Comparisons

There are two ways to think about whether a research design will give you a causal effect. One is to use econometric theory to think about the identifying assumptions. This is currently the dominant approach. The other is to conduct empirical studies evaluating whether particular research designs reproduce the results from randomised controlled trials (RCTs). The latter approach is now beginning to provide substantive guidance on when observational studies do isolate causal effects. Having a good pretest, for example, is worth its weight in gold; using “focal, local” matches helps; stable pretest trends make the analyst’s life much easier; and regression discontinuity designs (RDDs) are in many ways as good as RCTs.
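
As a rough illustration of the logic of a within-study comparison, the sketch below simulates an experimental arm and an observational arm from the same data-generating process, then checks how far each observational estimate falls from the experimental benchmark. Everything in it is an assumption made for illustration rather than something taken from the papers listed here: the true effect of 0.3, the pretest coefficient of 0.8, the selection rule, and the helper names `simulate`, `diff_in_means`, `residualise` and `pretest_adjusted` are all hypothetical. Real within-study comparisons use actual trial data and a separately drawn comparison group.

```python
import random
import statistics


def simulate(n, assign, true_effect=0.3, seed=0):
    """Simulate n units; assign(x) gives P(treated) for pretest score x.

    All numbers here (0.3, 0.8, 0.6) are illustrative assumptions.
    """
    rng = random.Random(seed)
    pre, trt, out = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                        # pretest score
        t = 1 if rng.random() < assign(x) else 0   # treatment indicator
        y = 0.8 * x + true_effect * t + rng.gauss(0, 0.6)
        pre.append(x)
        trt.append(t)
        out.append(y)
    return pre, trt, out


def diff_in_means(trt, out):
    """Unadjusted treated-minus-control difference in mean outcomes."""
    treated = [y for t, y in zip(trt, out) if t == 1]
    control = [y for t, y in zip(trt, out) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def residualise(v, x):
    """Residuals from a simple regression of v on x (with intercept)."""
    mv, mx = statistics.fmean(v), statistics.fmean(x)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mv) for a, b in zip(x, v)) / sxx
    return [b - mv - slope * (a - mx) for a, b in zip(x, v)]


def pretest_adjusted(pre, trt, out):
    """Treatment coefficient from OLS of outcome on treatment and pretest.

    Computed by partialling the pretest out of both variables first
    (the Frisch-Waugh-Lovell result), which avoids any matrix algebra.
    """
    rt, ry = residualise(trt, pre), residualise(out, pre)
    return sum(a * b for a, b in zip(rt, ry)) / sum(a * a for a in rt)


# Experimental arm: coin-flip assignment supplies the causal benchmark.
pre, trt, out = simulate(20000, lambda x: 0.5, seed=1)
benchmark = diff_in_means(trt, out)

# Observational arm: units with above-average pretests select into treatment.
pre, trt, out = simulate(20000, lambda x: 0.8 if x > 0 else 0.2, seed=2)
naive = diff_in_means(trt, out)
adjusted = pretest_adjusted(pre, trt, out)

print(f"RCT benchmark:          {benchmark:.3f}")
print(f"Naive observational:    {naive:.3f} (gap {naive - benchmark:+.3f})")
print(f"Pretest-adjusted:       {adjusted:.3f} (gap {adjusted - benchmark:+.3f})")
```

In this toy setup selection depends only on the pretest, so adjusting for it should bring the observational estimate close to the benchmark, while the unadjusted comparison should not. That is a best case, not a general result.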

Ferraro, P. J., & Miranda, J. J. (2017). Panel data designs and estimators as substitutes for randomized controlled trials in the evaluation of public programs. Journal of the Association of Environmental and Resource Economists, 4(1), 281-317.

Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within‐study comparisons. Journal of Policy Analysis and Management, 27(4), 724-750.

Fortson, K., Gleason, P., Kopa, E., & Verbitsky-Savitz, N. (2015). Horseshoes, hand grenades, and treatment effects? Reassessing whether nonexperimental estimators are biased. Economics of Education Review, 44, 100-113.

Hallberg, K., Wong, V. C., & Cook, T. D. (2016). Evaluating Methods for Selecting School-Level Comparisons in Quasi-Experimental Designs: Results from a Within-Study Comparison.

St. Clair, T., Hallberg, K., & Cook, T. D. (2016). The validity and precision of the comparative interrupted time-series design: three within-study comparisons. Journal of Educational and Behavioral Statistics, 41(3), 269-299.

Hallberg, K., Wing, C., Wong, V., & Cook, T. (2014). Clinical trials and regression discontinuity designs. The Oxford Handbook of Quantitative Methods.

Chaplin, D. D., Cook, T. D., Zurovac, J., Coopersmith, J. S., Finucane, M. M., Vollmer, L. N., & Morris, R. E. (2018). The internal and external validity of the regression discontinuity design: A meta-analysis of 15 within-study comparisons. Journal of Policy Analysis and Management, 37(2), 403-429.

Weidmann, B., & Miratrix, L. (2020). Lurking inferential monsters? Quantifying selection bias in evaluations of school programmes. Journal of Policy Analysis and Management. Link here.

Coopersmith, J., Cook, T. D., Zurovac, J., Chaplin, D., & Forrow, L. V. (2022). Internal and external validity of the comparative interrupted time‐series design: A meta‐analysis. Journal of Policy Analysis and Management, 41(1), 252-277.

Common Errors and Misinterpretations

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350.

Wang, L. L., Watts, A. S., Anderson, R. A., & Little, T. D. (2013). Common fallacies in quantitative research methodology. In Masyn, K. E., Nathan, P., & Little, T. (Eds.). The Oxford Handbook of Quantitative Methods, Vol. 2: Statistical Analysis, 718.

Rutkowski, L., Gonzalez, E., Joncas, M., & Von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142-151.

Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641-651.

Senn, S. (2013). Seven myths of randomisation in clinical trials. Statistics in Medicine, 32(9), 1439-1450.

Berk, R. (2010). What you can and can’t properly do with regression. Journal of Quantitative Criminology, 26(4), 481-487.

Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation models. In Handbook of causal analysis for social research (pp. 301-328). Springer, Dordrecht.

@MaartenvSmeden has an epic Twitter thread on misconceptions here.

Trafimow, D. (2022). A New Way to Think About Internal and External Validity. Perspectives on Psychological Science, 17456916221136117.

Measurement

A distinctive feature of the social sciences is that many variables of interest are not directly observable. Unless you are willing to restrict yourself to only conducting policy evaluations, this necessitates careful engagement with the science of measurement.

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071.

McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23(3), 412.

Spector, P. E. (2014). Survey design and measure development. The Oxford Handbook of Quantitative Methods, Vol. 1, 170.

Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding statistics: Statistical issues in psychology, education, and the social sciences, 2(1), 13-43.

Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304-321.

Sass, D. A., & Schmitt, T. A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45(1), 73-103.

Favero, N., & Bullock, J. B. (2014). How (not) to solve the problem: An evaluation of scholarly responses to common source bias. Journal of Public Administration Research and Theory, 25(1), 285-308.

Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: the state of the art and future directions for psychological research. Developmental Review, 41, 71-90.

Weidman, A. C., Steckler, C. M., & Tracy, J. L. (2017). The jingle and jangle of emotion assessment: Imprecise measurement, casual scale usage, and conceptual fuzziness in emotion research. Emotion, 17(2), 267-295.

Cheema, J. R. (2014). A review of missing data handling methods in education research. Review of Educational Research, 84(4), 487-508.

White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine, 30(4), 377-399.

The Role of Theory

Theory is essential for any evaluation that is not an RCT, because you need it to assess the plausibility of the identifying assumptions. A strong theory of selection can also help you model the assignment mechanism properly when using propensity score matching. Theory is also important for generalising the findings from RCTs, which generally have weak external validity.

Clarke, B., Gillies, D., Illari, P., Russo, F., & Williamson, J. (2014). Mechanisms and the evidence hierarchy. Topoi, 33(2), 339-360.

Illari, P. M., & Williamson, J. (2012). What is a mechanism? Thinking about mechanisms across the sciences. European Journal for Philosophy of Science, 2(1), 119-135.

Cook, T. D. (2014). Generalizing causal knowledge in the policy sciences: External validity as a task of both multiattribute representation and multiattribute extrapolation. Journal of Policy Analysis and Management, 33(2), 527-536.

Healy, K. (2017). Fuck nuance. Sociological Theory, 35(2), 118-127.

Van Lange, P. A. (2013). What we should expect from theories in social psychology: Truth, abstraction, progress, and applicability as standards (TAPAS). Personality and Social Psychology Review, 17(1), 40-55.

Cartwright, N. D. (2013). Evidence: for policy and wheresoever rigor is a must. Link here.

Michie, S., Richardson, M., Johnston, M., Abraham, C., Francis, J., Hardeman, W., … & Wood, C. E. (2013). The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine, 46(1), 81-95.

Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (in press). Why hypothesis testers should spend less time testing hypotheses. Perspectives on Psychological Science.

Trafimow, D. (2022). A New Way to Think About Internal and External Validity. Perspectives on Psychological Science, 17456916221136117.

Stats Visualisations

Sampling Distributions. Link here.

Visualising Hierarchical Models. Link here.

Random Assignment. Links here and here.

Seeing Statistical Theory. Link here.

Leckie, G., Charlton, C., & Goldstein, H. (2016). Communicating uncertainty in school value-added league tables. Centre for Multilevel Modelling, University of Bristol. URL: http://www.cmm.bris.ac.uk/interactive/uncertainty/.

Causal Inference Animated Plots. Link here.

R Psychologist stats visualizations. The Cohen’s d page is brilliant. Link here.

Regression and causality. Link here.

Marc Lajeunesse here

Sam Sims Quantitative Education Research