<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">ARTICLE</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Kuzborskij_MLJ_2016/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Fast Rates by Transferring from Auxiliary Hypotheses</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Kuzborskij, Ilja</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Orabona, Francesco</subfield>
		</datafield>
		<datafield tag="773" ind1=" " ind2=" ">
			<subfield code="p">Machine Learning</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2016</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">In this work we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks. We focus on a broad class of ERM-based linear algorithms that can be instantiated with any non-negative smooth loss function and any strongly convex regularizer. We establish generalization and excess risk bounds, showing that, if the algorithm is fed with a good combination of source hypotheses, generalization happens at the fast rate O(1/m) instead of the usual O(1/sqrt(m)). On the other hand, if the source hypotheses combination is a misfit for the target task, we recover the usual learning rate. As a byproduct of our study, we also prove a new bound on the Rademacher complexity of the smooth loss class under weaker assumptions compared to previous works.</subfield>
		</datafield>
	</record>
</collection>