Research
There are considerable mutual benefits to be gained from integrating ideas from reinforcement learning (RL) and language models. LLMs possess a real-world knowledge base that is vast but mostly static, and rarely specific enough for a given environment; moreover, their decoding is typically greedy, without planning or lookahead. RL, on the other hand, formalizes learning from interaction, trading off exploration against short-term reward maximization, and planning with the future in mind.
My objective is twofold. First, I aim to equip RL agents with the world knowledge and transferable skills that models such as LLMs and VLMs have learned from internet-scale text and vision corpora. Second, I aim to teach LLMs to think and plan more explicitly, using tools such as task decomposition and code, and to learn from embodied interaction as RL agents do.
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala, Reginald McLean, Isaac Woungang, Nariman Farsad, Samuel Kaski, Pekka Marttinen, Kai Yuan
CoRL 2024 Workshop on Language and Robot Learning
arXiv
We train language-conditioned robotic reward functions from actor-agnostic video data and outperform five existing methods with a novel combination of contrastive and sequential ranking objectives, which together ensure rewards that are both smooth and accurate.
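As a rough illustration of how these two objectives can be combined, here is a hypothetical PyTorch sketch; none of the names below are taken from our released code.

import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    # InfoNCE over a batch: each video should match its own task description.
    logits = video_emb @ text_emb.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: video-to-text and text-to-video.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

def sequential_ranking_loss(scores, margin=0.1):
    # In a successful video, the critic's score at a later timestep should
    # exceed its score at an earlier one, yielding an increasing reward signal.
    earlier, later = scores[:-1], scores[1:]               # adjacent timestep pairs
    return F.relu(margin + earlier - later).mean()         # hinge ranking loss

The contrastive term keeps the reward accurate about which behavior matches which instruction, while the ranking term makes the score rise over the course of a successful trajectory, which is what makes the reward smooth and dense.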
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search
Nicola Dainese, Matteo Merler, Minttu Alakuijala, Pekka Marttinen
NeurIPS 2024
arXiv
We model RL environments with code written by an LLM, propose a Monte Carlo Tree Search-guided method to improve code generation for this task, and show how to plan with the resulting code world models.
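To make the idea concrete, here is a minimal, hypothetical sketch of planning with an LLM-generated step function. Note that in the paper MCTS guides the code generation itself; the toy exhaustive planner below only consumes the finished model.

# The string stands in for an actual LLM sample; the toy world is illustrative.
GENERATED_STEP = """
def step(state, action):
    # A 1-D navigation world the LLM might synthesize from the env description.
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward, next_state == 5
"""

namespace = {}
exec(GENERATED_STEP, namespace)      # turn the generated source into a callable
step = namespace["step"]

def plan(state, actions=(0, 1), horizon=6):
    # Exhaustive rollout search inside the code world model.
    if horizon == 0:
        return 0.0, None
    best_return, best_action = float("-inf"), None
    for a in actions:
        next_state, reward, done = step(state, a)
        future = 0.0 if done else plan(next_state, actions, horizon - 1)[0]
        if reward + future > best_return:
            best_return, best_action = reward + future, a
    return best_return, best_action

print(plan(0))   # (1.0, 1): step right toward the goal at state 5

Because the world model is ordinary Python, planning reduces to calling the generated function, and model errors are inspectable as code bugs.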
Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
ICRA 2023
project page | arXiv
We learn dense reward functions for robotic manipulation by estimating distances in state space from videos of humans only, using self-supervised contrastive and regression objectives.
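As a schematic of how such a learned distance yields a dense reward at robot execution time (encoder and distance_head are hypothetical stand-ins for the trained modules):

import torch

def dense_reward(frame, goal_frame, encoder, distance_head):
    # Negated predicted distance between current and goal observations:
    # the closer the robot gets to the goal image, the higher the reward.
    with torch.no_grad():
        z_now = encoder(frame)         # embed the current camera frame
        z_goal = encoder(goal_frame)   # embed an image of the desired end state
        distance = distance_head(torch.cat([z_now, z_goal], dim=-1))
    return -distance.item()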
Residual Reinforcement Learning from Demonstrations
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
RSS 2020 Workshop on Advances & Challenges in Imitation Learning for Robotics
project page | arXiv
Starting from a small number of task demonstrations on a robot arm, we learn an initial base policy and a task-relevant, low-dimensional representation space through behavioral cloning; the policy is then autonomously improved through residual reinforcement learning, using only images, proprioceptive inputs and sparse rewards.
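The residual control rule at the heart of the method is simple to state; a minimal sketch with hypothetical policy objects:

import numpy as np

def act(obs, base_policy, residual_policy, action_low, action_high):
    a_base = base_policy(obs)       # behavioral-cloning policy, kept fixed
    a_res = residual_policy(obs)    # RL policy learning a corrective term
    # The agent executes the superposition, clipped to the valid action range.
    return np.clip(a_base + a_res, action_low, action_high)

Keeping the base policy fixed means the RL component only has to learn a correction around already-reasonable behavior, which is what makes learning from sparse rewards tractable.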
Discovering Actions by Jointly Clustering Video and Narration Streams Across Tasks
Minttu Alakuijala, Julien Mairal, Jean Ponce, Cordelia Schmid
CVPR 2020 Workshop on Learning from Instructional Videos
video | poster
Using only weak supervision from the timing of narration together with visual information, we segment narrated tutorial videos into k action classes or background. We use a discriminative clustering objective together with an inconsistency penalty that encourages the timing and order of actions in the visual stream to match those of the narration stream in each video.
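One schematic way to write this kind of objective, in my notation rather than verbatim from the paper: with visual features $X^v$, narration features $X^n$, and label assignments $Y^v, Y^n$ over the $k$ action classes plus background, we solve

$$\min_{Y^v,\,Y^n}\; f(Y^v; X^v) + f(Y^n; X^n) + \lambda\,\Omega(Y^v, Y^n),$$

where $f$ is a discriminative clustering cost measuring how well a classifier can fit the assigned labels, and $\Omega$ penalizes assignments whose action timing and ordering disagree between the two streams of the same video.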