Cognitive science for AI diagnosis

Cognitive science has spent a century developing paradigms that turn opaque minds into measurable behavior: implicit-association tests, serial reproduction, rational analysis, psychophysics, theory-of-mind probes, and Marr’s levels of analysis. We adapt those instruments and run them on AI systems, producing diagnostics that are interpretable, comparable to human baselines, and sensitive to subtle failures.
Representative work. “Explicitly unbiased LLMs still form biased associations” (Bai et al., PNAS 2025) adapts the implicit-association test (IAT) to LLMs. “LLMs surpass human experts in predicting neuroscience results” (Luo et al., Nature Human Behaviour 2024) uses rational-analysis-style benchmarking. “Failing to falsify” (2026) tests confirmation bias in language-model rule discovery. “What is a Number, That a Large Language Model May Know It?” (TMLR 2025) probes numeric representation.