DS1 spectrogram: Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

2606.05967

Authors

Éloïse Berthier,Ziad Kobeissi

Abstract

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method.

We establish a new convergence rate, for the Mean-Square Error (MSE) on the approximated function, that is (i) fast in the sense that it admits an optimal dependency in the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning: it only depends on an initial error and modelindependent constants and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature.

We also introduce PCTD(0), a variant of TD(0), which benefits from better convergence properties under an additional assumption of strong mixing on the Markov Chain.

Resources

Stay in the loop

Every AI paper that matters, free in your inbox daily.

Details

  • takara.ai
  • Custom AI and machine learning from the Frontier Research Team.
  • © 2026 takara.ai Ltd
  • Content is sourced from third-party publications.