Three Smart Ways to Fix Missing Data (and Why One Wins)
Ever tried running an analysis only to find half your spreadsheet is blank? You’re not alone—missing data haunts nearly every researcher, data scientist, and machine learning model. But a new study from Northwestern University suggests a surprisingly powerful fix: borrow a trick from psychology called Item Response Theory (IRT).
Here are the top 3 takeaways from the research—each one worth a post of its own.
1. IRT Thinks Like a Psychologist — and That’s Its Secret
Instead of guessing what’s missing by copying nearby values (like k-nearest neighbors) or running regression chains (like MICE), IRT looks for patterns of behavior across all your variables—like a teacher spotting which questions a student is likely to miss. It then uses probability curves to fill in the blanks without “peeking” at the outcome variable (avoiding the circular bias many machine learning imputations make).
For Categorical Data, It Outperforms the Big Names
Across three test datasets—diamonds, rental housing, and heart disease—IRT beat or matched the leading methods (MICE, k-NN, and DataWig) in most conditions.
- Ordinal data (like rankings): IRT and DataWig tied for top performance.
- Nominal data (like city names): IRT led the pack.
- Binary data (like yes/no health indicators): All methods performed similarly, but IRT stayed consistent even when half the data went missing.
It’s Not Just Accurate—It’s Honest
Unlike deep learning models that rely on massive training data and sometimes use the outcome itself to predict what’s missing (a big no-no), IRT stays theory-driven. By grounding imputation in latent traits (think of a hidden “profile” behind the data), it keeps predictions transparent and replicable—a huge win for scientific reproducibility.
The Big Picture:
As machine learning dives deeper into healthcare, social science, and policy, the way we handle missing data can make or break the truth we uncover. This study shows that old-school psychometrics can outthink even modern AI when it comes to filling in the blanks responsibly.


