Minttu Alakuijala

I am a Postdoctoral Researcher at Aalto University working on language-conditioned reinforcement learning (RL). I am working with Samuel Kaski and Pekka Marttinen, and I am a part of the Probabilistic Machine Learning and Machine Learning for Health groups.

Previously, I obtained my PhD from Google Research (through the CIFRE scheme), Inria and École Normale Supérieure in Paris. My PhD research focused on RL and learning from demonstration for robotic manipulation tasks. I defended my thesis, Autonomous and Weakly-Supervised Learning for Robotic Manipulation, in December 2022, advised by Cordelia Schmid, Jean Ponce and Julien Mairal. At Inria, I was a member of the Willow and Thoth teams.

Email  |  CV  |  Google Scholar

profile photo

My objective is to advance the scalability and sample-efficiency of RL. Specific topics I am interested in include representation learning, policy priors, multi-task learning, self-supervised RL and learned reward functions. I am particularly motivated by approaches that are applicable across environments and tasks, including but not limited to robotics.

I have worked extensively with image and video modalities, both as policy state inputs and as pre-training data; I am also increasingly interested in incorporating language modelling and in building language-conditioned interactive agents.

Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
ICRA, 2023
project page | arXiv

We learn dense reward functions for robotic manipulation by estimating distances in state space from videos of humans only, using self-supervised contrastive and regression objectives.

Residual Reinforcement Learning from Demonstrations
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
RSS 2020 Workshop on Advances & Challenges in Imitation Learning for Robotics
project page | arXiv

Starting from a small number of task demonstrations on a robot arm, we learn an initial base policy and a task-relevant, low-dimensional representation space through behavioral cloning. The policy is then autonomously improved through residual reinforcement learning, using only images, proprioceptive inputs and sparse rewards.

Discovering Actions by Jointly Clustering Video and Narration Streams Across Tasks
Minttu Alakuijala, Julien Mairal, Jean Ponce, Cordelia Schmid
CVPR 2020 Workshop on Learning from Instructional Videos
video | poster

Using only weak supervision from the timing of narration, together with visual information, we segment narrated tutorial videos into k action classes or background. We use a discriminative clustering objective with an inconsistency penalty that encourages the timing and order of actions in the visual stream to match those of the narration stream in each video.

Design and source code from Jon Barron's website.