From a few train grid pairs, infer a general transformation rule…
Lessons from competing in ARC AGI 2
TNG Big Techday 26
2026-05-22
Debsankha Manik

Dynamical systems, graph theory; data science × discrete optimisation.
Loves teaching.
Bernhard Altaner

Thermodynamics and information processing in complex systems.
Has local GlaDOS in his smart home.
Supported by Nicolas Berg – co-host of the AGI Munich meetup.
Thank you to Somayeh Vojdani & Tariq Baig-Meininghaus for initial discussions and continued encouragement.
Thanks to lambda.ai for providing free credits that helped support this research.
input pixel grids into corresponding output gridsFrom a few train grid pairs, infer a general transformation rule…
…and apply it to a test grid!
Raymond Catell, 1948, defined two aspects of general intelligence
General intelligence combines knowledge with adaptation!
⏺ ARC-AGI 1: Not saturated; multiple paradigms used; reasoning models give good results, but expensive
▲ ARC-AGI 2: barely any scores beyond noise (~3%) level



Given the following training input-output pairs, find the correct output for the given test input.
011 000
000 -> 000
000 011
222 000
020 -> 222
000 020
500
500 -> ?
000
Let me think step by step. The first example seems to be a reflection. The second example is not a reflection. It seems that clusters of numbers are actually moving downwards until they hit the last row. By that logic the answer is:
000
500
500
Hypothesis
Even though o3 cannot solve puzzles with ARC AGI 2 level difficulty, it might be able to create difficult puzzles.

Lesson 1
Non-reasoning models are necessary for creativity, diversity and naturalness.
Lesson 2
Tool calling (e.g. code execution) makes reasoning models more reliable.

Solve this puzzle. The training examples and test inputs are already loaded in the interpreter.
Let me think. The transformation seems to be a reflection. Let me verify with code.
def transform(grid):
...
assert transform(train0_inp) == train0_out
AssertionError
There is a mismatch. Perhaps the transformation is gravity related? Let me test.
def transform(grid):
...
assert transform(train0_inp) == train0_out
No output
Here is the transform function that solves the puzzle:
def transform(grid):
...


The power of text
Text is a surprisingly rich medium that can be used to teach about other sensory modes.
Solve this puzzle. The training examples and test inputs are already loaded in the interpreter.
Let me think. The transformation seems to be a reflection. Let me verify with code.
def transform(grid):
...
assert transform(train0_inp) == train0_out
AssertionError
There is a mismatch. Perhaps the transformation is gravity related? Let me test.
def transform(grid):
...
assert transform(train0_inp) == train0_out
No output
Here is the transform function that solves the puzzle:
def transform(grid):
...


ChatML
<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>
Harmony
<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
city: string,
unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
ChatML
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
Harmony
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
ChatML
<|im_start|>assistant
</think>
User wants Tokyo weather. Need to call get_weather function.
</think>
Harmony
<|start|>assistant<|channel|>analysis<|message|>User wants Tokyo weather. Need to call get_weather function.<|end|>
ChatML
I'll check the current weather in Tokyo for you.
Harmony
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
ChatML
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
Harmony
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
ChatML
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response>
Harmony
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
ChatML
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>
Harmony
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>
gpt-oss-120b with the right harness.
python) sandboxUsing GPT 5.5 XHigh
........WW
........WW
.......GWW
.......GWW
.......GWW
.....G.GWW
.....G.GWW
....RG.GWW
....RG.GWW
.RRRRGGGWWWhat do you think? Do LLMs have fluid intelligence?
Thank you for your attention!
Any Questions?