Interests

I believe an AI system can’t be called “intelligent” unless it can correctly handle the multitude of ways in which human users can interact with it. The road to this point is a long one, and it requires understanding where and when an AI system will fail, so that we can identify potential mistakes before they happen. This includes developing methods for uncovering bias in both models and datasets, techniques for generating challenging test cases, algorithms for uncovering annotation mistakes, and better evaluation metrics.

Publications

1. Towards Fair Pay and Equal Work: Imposing View Time Limits in Crowdsourced Image Classification

Gordon Lim, Stefan Larson, Yu Huang, Kevin Leach
FLAIRS 2026

2. Spurious Cues in RVL-CDIP and Tobacco3482 Document Classification: The Case of ID Codes

Stefan Larson, Sharad Duwal, Brian Vilnrotter, Gayatri Chakkithara, Vedant Pedwal, Kevin Leach
ACM Symposium on Document Engineering (DocEng) 2025

3. Document Type Classification using File Names

Zhijian Li, Stefan Larson, Kevin Leach
ACM Symposium on Document Engineering (DocEng) 2025

4. Label Errors in the Tobacco3482 Dataset

Gordon Lim, Stefan Larson, Kevin Leach
WACV VisionDocs Workshop 2025

5. Robust Testing for Deep Learning using Human Label Noise

Gordon Lim, Stefan Larson, Kevin Leach
DeepTest 2025

6. De-Identification of Sensitive Personal Data in Datasets Derived from IIT-CDIP

Stefan Larson, Nicole Cornehl Lima, Santiago Pedroza Diaz, Amogh Manoj Joshi, Siddharth Betala, Jamiu Tunde Suleiman, Yash Mathur, Kaushal Kumar Prajapati, Ramla Alakraa, Junjie Shen, Temi Okotore, Kevin Leach
EMNLP 2024

7. Generating Hard-Negative Out-of-Scope Data with ChatGPT for Intent Classification

Zhijian Li, Stefan Larson, Kevin Leach
LREC-COLING 2024

8. Augraphy: A Data Augmentation Library for Document Images

Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
ICDAR 2023

9. On Evaluation of Document Classification using RVL-CDIP

Stefan Larson, Gordon Lim, Kevin Leach
EACL 2023

10. Evaluating Out-of-Distribution Performance on Document Image Classifiers

Stefan Larson, Gordon Lim, Yutong Ai, David Kuang, Kevin Leach
NeurIPS D&B 2022

11. Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset

Stefan Larson, Kevin Leach
SIGDIAL 2022

12. Exploring Out-of-Distribution Generalization in Text Classifiers Trained on Tobacco-3482 and RVL-CDIP

Stefan Larson, Navtej Singh, Saarthak Maheshwari, Shanti Stewart, Uma Krishnaswamy
Document Images and Language Workshop (DIL) at ICDAR 2021

13. LSOIE: A Large-Scale Dataset for Supervised Open Information Extraction

Jacob Solawetz, Stefan Larson
EACL 2021

14. Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods

Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach, Jonathan K. Kummerfeld
COLING 2020

15. Iterative Feature Mining for Constraint-Based Data Collection to Increase Data Diversity and Model Robustness

Stefan Larson, Anthony Zheng, Anish Mahendran, Rishi Tekriwal, Adrian Cheung, Eric Guldan, Kevin Leach, Jonathan K. Kummerfeld
EMNLP 2020

16. Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Datasets

Stefan Larson, Eric Guldan, Kevin Leach
LREC 2020

17. An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Kevin Leach, Jonathan K. Kummerfeld, Michael A. Laurenzano, Lingjia Tang, Jason Mars
EMNLP 2019

18. Outlier Detection for Improved Data Quality and Diversity in Dialog Systems

Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars
NAACL 2019

Preprints

1. ShabbyPages: A Reproducible Document Denoising and Binarization Dataset

Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
arXiv preprint, 2023

2. A Survey of Datasets for Intent Classification and Slot-Filling for Task-Oriented Dialog

Stefan Larson, Kevin Leach
arXiv preprint, 2022

Datasets

OOD data for RVL-CDIP

This collection is a companion dataset for RVL-CDIP, a popular document image classification benchmark. RVL-CDIP-N includes in-domain, out-of-distribution data. RVL-CDIP-O includes out-of-domain, out-of-distribution data. Both the -N and -O datasets consist of documents found on DocumentCloud and through web search (e.g., Google and Bing).

OOS Intent Classification Dataset

This dataset targets the task of intent classification. It contains 150 “in-scope” system-supported intents across 10 domain areas, and notably includes a substantial number of “out-of-scope” samples to test out-of-distribution detection performance.