paper

The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin

Rong Jiang, Cong Ma

The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin

Name: The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin
Author: Rong Jiang, Cong Ma

Rong Jiang, Cong Ma

paper2025-11-05English

Start Reading

Description

We study batched nonparametric contextual bandits under a margin condition when the margin parameter $α$ is unknown. To capture the statistical cost of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle knowing $α$. We show that the optimal regret inflation grows polynomially with the horizon $T$, with exponent given by the value of a convex optimization problem that depends on the dimension, smoothness, and number of batches $M$. Moreover, the minimizer of this optimization problem directly prescribes the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to polylogarithmic factors. These results reveal a new adaptivity barrier: under batching, adaptation to an unknown margin parameter inevitably incurs a polynomial penalty, sharply characterized by a variational problem. Remarkably, this barrier vanishes once the number of batches exceeds order $\log \log T$; with only a doubly logarithmic number of updates, one can recover the oracle regret rate up to polylogarithmic factors.