Matthew Singer, Srijan Sengupta, Karl Pazdernik
Named Entity Recognition (NER) serves as a foundational component in many natural language processing (NLP) pipelines. However, current NER models typically output a single predicted label sequence without any accompanying measure of uncertainty, leaving downstream applications vulnerable to cascading errors. In this paper, we introduce a general framework for adapting sequence-labeling-based NER models to produce uncertainty-aware prediction sets. These prediction sets are collections of full-sentence labelings that are guaranteed to contain the correct labeling with a user-specified confidence level. This approach serves a role analogous to confidence intervals in classical statistics by providing formal guarantees about the reliability of model predictions. Our method builds on conformal prediction, which offers finite-sample coverage guarantees under minimal assumptions. We design nonconformity scoring functions that yield efficient, well-calibrated prediction sets and support both unconditional and class-conditional coverage. The framework accounts for heterogeneity across sentence length, language, entity type, and number of entities within a sentence. Empirical experiments on four NER models across three benchmark datasets demonstrate the broad applicability, validity, and efficiency of the proposed methods.
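As a rough illustration of the conformal prediction idea mentioned above, the following is a minimal sketch of generic split conformal prediction over candidate sentence labelings. It is not the paper's scoring functions: the nonconformity score (one minus the model's probability for a labeling), the helper names conformal_threshold and prediction_set, and all numbers are assumptions made only for this example.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Finite-sample split-conformal threshold.

    cal_scores: nonconformity scores of the TRUE labelings on a held-out
                calibration set (assumed here: 1 - model probability).
    alpha:      target miscoverage, e.g. 0.2 for 80% coverage.
    """
    n = len(cal_scores)
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every candidate.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidates, threshold):
    """Keep every candidate labeling whose score is within the threshold.

    candidates: list of (labeling, probability) pairs for one sentence,
                e.g. from a hypothetical k-best decoder.
    """
    return [labels for labels, prob in candidates if 1.0 - prob <= threshold]


# Hypothetical calibration scores (illustrative values only).
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.02, 0.45]
threshold = conformal_threshold(cal_scores, alpha=0.2)

# Hypothetical candidate labelings for a three-token sentence.
candidates = [
    (("B-PER", "I-PER", "O"), 0.62),
    (("B-PER", "O", "O"), 0.30),
    (("O", "O", "O"), 0.05),
]
kept = prediction_set(candidates, threshold)
print(threshold, kept)
```

Under the usual exchangeability assumption between calibration and test sentences, a set built this way contains the true labeling with probability at least 1 - alpha, provided the true labeling is among the scored candidates. Class-conditional coverage would require calibrating separately per group, which this sketch does not attempt.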