Reinforcement Learning Demo

While the AGI system centers on reasoning, Reinforcement Learning (RL) remains essential for fine-tuning behavioral policies through trial and error.

The Problem: Grid World

The agent starts at S in the top-left corner of a 3×3 grid. It must reach the goal G in the bottom-right corner while avoiding the obstacle X in the center:

S . .
. X .
. . G
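To make the setup concrete, here is a minimal MeTTa sketch of the grid's dynamics. The encoding is an assumption for illustration only. States are `(row col)` pairs, with S at `(0 0)`, X at `(1 1)` and G at `(2 2)`. The helper names `target`, `off-board`, `blocked`, `step` and `at-goal` are hypothetical. A move that would leave the board or enter X leaves the agent where it is.

```metta
;; Hypothetical encoding of the 3x3 grid: states are (row col) pairs,
;; S = (0 0), X = (1 1), G = (2 2). Actions are Up, Down, Left, Right.

;; Cell an action points at, before any bounds or obstacle check.
(= (target ($r $c) Up)    (let $r2 (- $r 1) ($r2 $c)))
(= (target ($r $c) Down)  (let $r2 (+ $r 1) ($r2 $c)))
(= (target ($r $c) Left)  (let $c2 (- $c 1) ($r $c2)))
(= (target ($r $c) Right) (let $c2 (+ $c 1) ($r $c2)))

;; True when a cell lies outside rows/columns 0..2.
(= (off-board ($r $c))
   (or (or (< $r 0) (> $r 2))
       (or (< $c 0) (> $c 2))))

;; The obstacle X sits in the center cell.
(= (blocked $cell) (== $cell (1 1)))

;; Transition function: moving off the board or into X leaves the agent in place.
(= (step $s $a)
   (let $next (target $s $a)
     (if (off-board $next)
         $s
         (if (blocked $next) $s $next))))

;; Goal test.
(= (at-goal $s) (== $s (2 2)))
```

For example, `!(step (0 1) Down)` evaluates to `(0 1)`, because the move would enter X.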

The RL Loop in MeTTa

We store the Q-values (state-action values: the estimated long-term reward of taking a given action in a given state) as atoms in the AtomSpace, and the system updates them over time.

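;; (QValue State Action Value)
(QValue (0 0) Right 0.1)
(QValue (0 0) Down 0.05)

;; Update rule (simplified): overwrite the stored value for (state, action).
;; remove-atom needs the exact atom, so look up the old value with match first.
(= (update-q $s $a $new_v)
   (let* (($old (match &self (QValue $s $a $v) $v))
          ($_ (remove-atom &self (QValue $s $a $old))))
     (add-atom &self (QValue $s $a $new_v))))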
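The simplified rule only overwrites a stored value. It does not say how the new value is computed. Assuming standard one-step Q-learning, the new value is Q(s,a) + α·(r + γ·max_a' Q(s',a') - Q(s,a)). A minimal sketch of that arithmetic in MeTTa follows. The function name `q-learning-value` is hypothetical. Looking up max_a' Q(s',a') in the AtomSpace is left to the caller. The result is what you would pass to `update-q` as `$new_v`.

```metta
;; Hypothetical helper: standard one-step Q-learning update.
;;   Q(s,a) <- Q(s,a) + alpha * (r + gamma * maxQ(s') - Q(s,a))
;; $old      current Q(s,a)
;; $reward   reward r observed after taking a in s
;; $max-next best Q-value available from the next state s'
;; $alpha    learning rate; $gamma discount factor
(= (q-learning-value $old $reward $max-next $alpha $gamma)
   (let $target (+ $reward (* $gamma $max-next))
     (+ $old (* $alpha (- $target $old)))))
```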

Hybrid Operation

In our AGI experiments, we use reasoning to guide RL:

Instead of starting with random actions, the agent uses its Reasoning Engine to search for a likely path.
It then uses RL to optimize the "speed" and "smoothness" of that path.

graph LR
    Logic[Symbolic Plan] -->|Initial Guess| RL[RL Learner]
    RL -->|Performance Feedback| Logic
    RL -->|Fine-tuned Policy| Action[World Action]
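In Q-table terms, one way to let reasoning guide RL is to use the symbolic plan to seed initial Q-values. The learner's first greedy choices then follow the plan instead of random actions. The sketch below is a hypothetical illustration. The `PlanStep` atoms stand in for whatever the reasoning engine actually returns. The path shown is one valid route around X. The seed value 0.5 is arbitrary. The sketch also assumes the Q-table starts empty; ordinary RL updates then refine these values.

```metta
;; Hypothetical plan from the reasoning engine for the 3x3 grid:
;; S (0 0) -> (0 1) -> (0 2) -> (1 2) -> G (2 2), going around X at (1 1).
(PlanStep (0 0) Right)
(PlanStep (0 1) Right)
(PlanStep (0 2) Down)
(PlanStep (1 2) Down)

;; Seed an initial Q-value for every planned (state, action) pair.
;; 0.5 is an arbitrary illustrative prior, not a tuned value.
!(match &self (PlanStep $s $a) (add-atom &self (QValue $s $a 0.5)))
```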

Observation

By combining logic and RL, the agent reaches the goal in 50% fewer steps than a pure RL agent that has to explore the grid at random.
