Research
My objective is to advance the scalability and sample efficiency of reinforcement learning (RL). Specific topics I am interested in include representation learning, policy priors, multi-task learning, self-supervised RL, and learned reward functions. I am particularly motivated by approaches that are applicable across environments and tasks, including but not limited to robotics.
I have worked extensively with image and video modalities, both as policy state inputs and as pre-training data; I am also increasingly interested in incorporating language modelling and building language-conditioned interactive agents.
Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
ICRA, 2023
project page | arXiv
We learn dense reward functions for robotic manipulation from videos of humans only, by estimating distances in state space with self-supervised contrastive and regression objectives.
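A minimal sketch of the distance-to-goal idea, assuming precomputed frame features (e.g. from a pretrained image encoder); the model, names, and hyperparameters below are hypothetical and the contrastive term is omitted, so this is not the paper's training code. A network regresses the number of time steps separating two frames of the same human video, and the negated predicted distance to a goal frame then serves as a dense reward for the robot.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical distance model: embeds two frames and regresses how many
# video time steps separate them (a proxy for task progress).
class DistanceModel(nn.Module):
    def __init__(self, frame_dim=2048, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, frame_a, frame_b):
        za, zb = self.encoder(frame_a), self.encoder(frame_b)
        return self.head(torch.cat([za, zb], dim=-1)).squeeze(-1)

model = DistanceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def regression_step(frame_a, frame_b, true_gap):
    # frame_a, frame_b: feature batches from the same human video;
    # true_gap: number of time steps between the two frames.
    pred = model(frame_a, frame_b)
    loss = F.mse_loss(pred, true_gap)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def reward(current_frame, goal_frame):
    # Dense reward for the robot: negative predicted distance to the goal.
    with torch.no_grad():
        return -model(current_frame, goal_frame)
```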
Residual Reinforcement Learning from Demonstrations
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
RSS 2020 Workshop on Advances & Challenges in Imitation Learning for Robotics
project page | arXiv
Starting from a small number of task demonstrations on a robot arm, we learn an initial base policy and a task-relevant, low-dimensional representation space through behavioral cloning; the base policy is then autonomously improved through residual reinforcement learning using only images, proprioceptive inputs, and sparse rewards.
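A minimal sketch of the residual control scheme, with hypothetical interfaces rather than the paper's code: the action sent to the robot is the sum of the frozen behavioral-cloning policy's action and a learned correction, so the RL agent only has to learn deviations from the base policy.

```python
import numpy as np

class ResidualController:
    """Combines a frozen base policy (from behavioral cloning) with a
    residual policy trained by RL on sparse rewards."""

    def __init__(self, base_policy, residual_policy, residual_scale=0.1):
        self.base_policy = base_policy          # state -> action (frozen)
        self.residual_policy = residual_policy  # state -> correction (trainable)
        self.residual_scale = residual_scale    # limits how far RL can deviate

    def act(self, state):
        base_action = self.base_policy(state)
        correction = self.residual_policy(state)
        return base_action + self.residual_scale * correction

# Hypothetical usage with stand-in policies for a 7-DoF arm: the base policy
# roughly solves the task, and the residual learns the remaining corrections.
base = lambda s: np.zeros(7)
residual = lambda s: np.zeros(7)
controller = ResidualController(base, residual)
action = controller.act(np.zeros(64))
```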
Discovering Actions by Jointly Clustering Video and Narration Streams Across Tasks
Minttu Alakuijala, Julien Mairal, Jean Ponce, Cordelia Schmid
CVPR 2020 Workshop on Learning from Instructional Videos
video | poster
Using only weak supervision from the timing of narration and from visual information, we segment narrated tutorial videos into k action classes or background. We use a discriminative clustering objective together with an inconsistency penalty that encourages the timing and order of actions in the visual stream to match those of the narration stream in each video.
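A toy sketch of the spirit of the timing part of such a penalty; the functional form below is hypothetical and is not the paper's discriminative clustering objective. For each action class assigned in both streams of a video, the normalized time at which it occurs in the visual stream is compared to its time in the narration stream, and disagreement is penalized.

```python
import numpy as np

def inconsistency_penalty(visual_labels, narration_labels, num_classes):
    """Hypothetical penalty: for each action class appearing in both streams,
    compare the normalized mean time at which it occurs; large timing
    differences between the visual and narration streams are penalized."""
    penalty = 0.0
    t_vis = np.linspace(0, 1, len(visual_labels))
    t_nar = np.linspace(0, 1, len(narration_labels))
    for k in range(num_classes):
        vis_times = t_vis[visual_labels == k]
        nar_times = t_nar[narration_labels == k]
        if len(vis_times) and len(nar_times):
            penalty += abs(vis_times.mean() - nar_times.mean())
    return penalty

# Toy example: class 1 occurs early in the narration but late in the video,
# so the penalty is large.
visual = np.array([0, 0, 0, 0, 1, 1])     # 0 = background, 1 = action class
narration = np.array([1, 1, 0, 0, 0, 0])
print(inconsistency_penalty(visual, narration, num_classes=2))
```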