Categorical Metalearning

Reformulated learner

We begin by slightly reformulating the notion of a "learner" from [Fong et al., 2017]. In that paper, a learner is a tuple of functions (I, U, R) where:

    I : P × A → B
    U : P × A × B → P
    R : P × A × B → A

In practice, the update step of many learners involves two steps:

1. The computation of the gradient with respect to the current value of P.
2. The computation of the updated value of P based on this gradient and the current value of P.

We can encode this intuition by slightly modifying the definition of a learner to be the tuple of functions (I, G, U, R) where:

    I : P × A → B
    G : P × A × B → P
    U : P × P → P
    R : P × A × B → A

In keeping with this intuition, we will refer to G as the "gradient function" and U as the "update function". Of course, G need not literally be a gradient computation.

Metalearner

We can use this additional function to extract more information from learners and to combine them in new ways. For example, we can now represent the notion of "learning to learn", or metalearning, in which the optimization function is itself learned. To do this, we define a notion of learner composition in which the composite of learner A with learner B has an inference function equivalent to that of A, and gradient/update functions defined in terms of B's inference and update functions. To be specific, given a base learner (I, G, U, R) and a metalearner (I′, G′, U′, R′):

    I : P × A → B        I′ : P′ × P → P
    G : P × A × B → P    G′ : P′ × P × P → P′
    U : P × P → P        U′ : P′ × P′ → P′
    R : P × A × B → A    R′ : P′ × P × P → P

we define their composite to be the following learner:

    I* : P′ × P × A → B
    I*(p′, p, a) = I(p, a)

    G* : P′ × P × A × B → P′ × P
    G*(p′, p, a, b) = (G′(p′, p, G(p, a, b)), I′(p′, p))

    U* : P′ × P × P′ × P → P′ × P
    U*(p′, p, g′, g) = (U′(p′, g′), U(p, g))

    R* : P′ × P × A × B → A
    R*(p′, p, a, b) = R(p, a, b)

References

[Fong et al., 2017] Fong, B., Spivak, D. I., and Tuyéras, R. (2017). Backprop as functor: A compositional perspective on supervised learning.
arXiv preprint arXiv:1711.10455.
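
The reformulated learner tuple and the metalearner composition above can be sketched in code. The following is a minimal illustration, not an implementation from [Fong et al., 2017]: the `Learner` class, the `compose` function, and the numeric example learners at the bottom are all assumed names chosen for this sketch.

```python
# A minimal sketch of the reformulated learner (I, G, U, R) and the
# metalearner composition (I*, G*, U*, R*) defined above.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Learner:
    I: Callable  # inference: P × A → B
    G: Callable  # gradient:  P × A × B → P
    U: Callable  # update:    P × P → P
    R: Callable  # request:   P × A × B → A


def compose(meta: Learner, base: Learner) -> Learner:
    """Compose a metalearner (primed functions) with a base learner.

    The composite's parameter space is the pair P′ × P, and its four
    functions follow the definitions of I*, G*, U*, R* above.
    """
    return Learner(
        # I*(p′, p, a) = I(p, a)
        I=lambda pp, a: base.I(pp[1], a),
        # G*(p′, p, a, b) = (G′(p′, p, G(p, a, b)), I′(p′, p))
        G=lambda pp, a, b: (meta.G(pp[0], pp[1], base.G(pp[1], a, b)),
                            meta.I(pp[0], pp[1])),
        # U*(p′, p, g′, g) = (U′(p′, g′), U(p, g))
        U=lambda pp, gg: (meta.U(pp[0], gg[0]), base.U(pp[1], gg[1])),
        # R*(p′, p, a, b) = R(p, a, b)
        R=lambda pp, a, b: base.R(pp[1], a, b),
    )


# Hypothetical example: a scalar linear base learner and a toy metalearner.
base = Learner(
    I=lambda p, a: p * a,                # predict b = p·a
    G=lambda p, a, b: (p * a - b) * a,   # gradient of ½(p·a − b)² w.r.t. p
    U=lambda p, g: p - 0.1 * g,          # plain gradient descent
    R=lambda p, a, b: a,
)
meta = Learner(
    I=lambda pm, p: p,                   # I′ : P′ × P → P
    G=lambda pm, p, g: pm * g,           # G′ : P′ × P × P → P′
    U=lambda pm, gm: pm - gm,            # U′ : P′ × P′ → P′
    R=lambda pm, p, g: p,
)
learner = compose(meta, base)
```

The composite learner carries the pair (p′, p) as its parameter, runs the base learner for inference and requests, and routes the base learner's gradient through the metalearner's functions, exactly as in the definitions of G* and U*.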