EE Seminar: Regret Guarantees for Linear Contextual Stochastic Shortest Path
הרישום לסמינר יבוצע באמצעות סריקת הברקוד למודל (יש להיכנס לפני כן למודל, לא באמצעות האפליקציה)- הרישום מסתיים ב- 15:10
Registration to the seminar will be done by scanning the barcode for the Moodle (Please enter ahead to the Moodle, NOT by application)- Registration ends at 15:10
Electrical Engineering Systems Seminar
Speaker: Dor Polikar
M.Sc. student under the supervision of Dr. Alon Cohen
Sunday, 15th June 2025, at 15:00
Room 011, Kitot Building, Faculty of Engineering
Regret Guarantees for Linear Contextual Stochastic Shortest Path
Abstract
We define the problem of linear Contextual Stochastic Shortest Path (CSSP), where at the beginning of each episode, the learner observes an adversarially-chosen context that determines the MDP through a fixed but unknown linear function. The learner's objective is to reach a designated goal state with minimal expected cumulative loss, despite having no prior knowledge of the transition dynamics, loss functions, or the mapping from context to MDP.
In this work, we introduce LR-CSSP, an algorithm that achieves a regret bound of \tilde O(\sqrt{K d^4 |S| |A| B_*^3 \log(1/delta)/l_min}), where K is the number of episodes, d is the context dimension, l_min is the smallest instantaneous loss, S and A are the sets of states and actions, respectively, and B_* bounds the optimal cumulative loss.
Unlike in contextual finite-horizon MDPs, where limited knowledge primarily leads to higher losses and regret, in the CSSP setting, insufficient knowledge can also prolong episodes and may even lead to non-terminating episodes.
Our analysis reveals that LR-CSSP effectively handles continuous context spaces, while ensuring all episodes terminate within a reasonable number of time steps.