I believe an AI system can’t be called “intelligent” unless it can correctly handle the multitude of ways in which human users can interact with it. The road to this point is a long one. It requires understanding where and when an AI system will fail, so that we can identify potential mistakes before they happen. That means developing methods for uncovering bias in both models and datasets, techniques for generating challenging test cases, algorithms for uncovering annotation mistakes, and better evaluation metrics, among other things.


PhD Computer Science - Vanderbilt University (ongoing)

MSE Electrical and Computer Engineering - University of Michigan

BSE Computer Engineering - University of Michigan


1. ShabbyPages: A Reproducible Document Denoising and Binarization Dataset

Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
arXiv preprint, 2023

2. A Survey of Datasets for Intent Classification and Slot-Filling for Task-Oriented Dialog

Stefan Larson, Kevin Leach
arXiv preprint, 2022


1. Generating Hard-Negative Out-of-Scope Data with ChatGPT for Intent Classification

Zhijian Li, Stefan Larson, Kevin Leach

2. Augraphy: A Data Augmentation Library for Document Images

Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
ICDAR 2023

3. On Evaluation of Document Classification using RVL-CDIP

Stefan Larson, Gordon Lim, Kevin Leach
EACL 2023

4. Evaluating Out-of-Distribution Performance on Document Image Classifiers

Stefan Larson, Gordon Lim, Yutong Ai, David Kuang, Kevin Leach
NeurIPS D&B 2022

5. Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset

Stefan Larson, Kevin Leach

6. Exploring Out-of-Distribution Generalization in Text Classifiers Trained on Tobacco-3482 and RVL-CDIP

Stefan Larson, Navtej Singh, Saarthak Maheshwari, Shanti Stewart, Uma Krishnaswamy
Document Images and Language Workshop (DIL) at ICDAR 2021

7. LSOIE: A Large-Scale Dataset for Supervised Open Information Extraction

Jacob Solawetz, Stefan Larson
EACL 2021

8. Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods

Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach, Jonathan K. Kummerfeld

9. Iterative Feature Mining for Constraint-Based Data Collection to Increase Data Diversity and Model Robustness

Stefan Larson, Anthony Zheng, Anish Mahendran, Rishi Tekriwal, Adrian Cheung, Eric Guldan, Kevin Leach, Jonathan K. Kummerfeld
EMNLP 2020

10. Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Datasets

Stefan Larson, Eric Guldan, Kevin Leach
LREC 2020

11. An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Kevin Leach, Jonathan K. Kummerfeld, Michael A. Laurenzano, Lingjia Tang, Jason Mars
EMNLP 2019

12. Outlier Detection for Improved Data Quality and Diversity in Dialog Systems

Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars
NAACL 2019


OOD data for RVL-CDIP

This collection is a companion dataset for RVL-CDIP, a popular document image classification benchmark. RVL-CDIP-N includes in-domain, out-of-distribution data. RVL-CDIP-O includes out-of-domain, out-of-distribution data. Both the -N and -O datasets consist of documents found on DocumentCloud and through web search (e.g., Google and Bing).

OOS Intent Classification Dataset

This dataset targets the task of intent classification. It contains 150 “in-scope” system-supported intents across 10 domain areas, and notably includes a substantial number of “out-of-scope” samples to test out-of-distribution detection performance.


1. Systems and Methods Implementing Data Query Language and Utterance Corpus Implements for Handling Slot-Filling and Dialogue Intent Classification Data in a Machine Learning Task-Oriented Dialogue System

US Patent No. 11,183,175

2. Systems and Methods for Mixed Setting Training for Slot Filling Machine Learning Tasks in a Machine Learning Task-Oriented Dialogue System

US Patent No. 11,043,208; 2021

3. Systems and Methods for Automatically Detecting and Repairing Slot Errors in Machine Learning Training Data for a Machine Learning-Based Dialogue System

US Patent No. 10,929,761; 2021

4. Systems and Methods for Constructing an Artificially Diverse Corpus of Training Data Samples for Training a Contextually-Biased Model for a Machine Learning-Based Dialogue System

US Patent No. 10,796,104; 2020

5. Systems and Methods for Automatically Configuring Training Data for Training Machine Learning Models of a Machine Learning-Based Dialogue System Including Seeding Training Samples or Curating a Corpus of Training Data Based on Instances of Training Data Identified as Anomalous

US Patent No. 10,679,150; 2020