סמינר של הפקולטה להנדסה ע"ש איבי ואלדר פליישמן

EE Seminar: Real price of bandit information in multiclass classification

05 במאי 2025, 13:00

חדר 011, בניין כיתות חשמל

EE Seminar: Real price of bandit information in multiclass classification

(The talk will be given in English)

Speaker: Dr. Alon Cohen

ECE, Tel Aviv University & Google

011 hall, Electrical Engineering-Kitot Building‏

Monday, May 5^th, 2025

13:00 - 14:00

Real price of bandit information in multiclass classification

Abstract

In bandit multiclass classification, examples arrive one at a time, the learner guesses their label and only observes whether the guess was correct. This limited feedback increases the sample complexity, a phenomenon termed "price of bandit information.” Classic works bound this increment by an additional factor of the number of labels, but whether this bound is tight remains an open question, with no nontrivial lower bound existing for this setting. In our work, we investigate the true price of bandit information.

Online Setting: Our first study focuses on the online setting, where the aim is to bound regret—the difference between the number of mistakes made by our algorithm and the best hypothesis in hindsight. Prior work shows that regret can be bounded by \sqrt{K T \log |H|}, where K is the number of labels, T is the time horizon and H is the finite hypothesis class. This is compared to \sqrt{T \log |H|} when labels are fully observed. We improve this bound to \min\{|H| + \sqrt{T}, \sqrt{K T \log |H|}\}, providing matching upper and lower bounds thus establishing tightness up to logarithmic factors. Our lower bounds indicate that regret scales with the number of labels K, confirming an unavoidable price of bandit information as an additional factor of K only in the non-asymptotic regime.

PAC Setting: In our second study, we address the PAC setting and propose an algorithm with a sample complexity of (poly(K) + 1/\epsilon^2)\log(|H|/\delta). We demonstrate the implementation of our algorithm in polynomial time, given an efficient empirical risk minimization algorithm over the hypothesis class. Surprisingly, we find that in the PAC setting, for sufficiently small \epsilon, there is no price for bandit information. Our result reveals a significant gap between the price of bandit feedback in terms of regret and sample complexity, challenging conventional expectations.

Short Bio

Alon Cohen is a senior lecturer at the School of EE at Tel-Aviv University as well as a research scientist in Google. He has received his PhD from IE&M at the Technion under the supervision of Prof. Tamir Hazan. His research interests revolve around reinforcement learning, online and statistical learning theory, and connections thereof.

הרישום לסמינר יבוצע באמצעות סריקת הברקוד למודל

Registration to the seminar will be done by scanning the barcode for the Moodle