Welcome to Atari-GPT
Atari-GPT introduces a benchmark for evaluating multimodal large language models (LLMs) as low-level controllers in Atari games, testing how well these models perform in dynamic, visually rich environments.
Explore the Highlights
- Investigate how models like GPT-4V, Gemini, and Claude perform in Atari games.
- Learn about challenges in spatial reasoning and visual understanding.
- Discover the potential applications of LLMs beyond traditional tasks.
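The core loop the benchmark evaluates — encode a game frame, prompt a vision-capable LLM with the legal actions, and parse its reply back into a discrete action — can be sketched as below. The prompt text, action mapping, and helper names are illustrative assumptions, not the paper's exact implementation:

```python
import base64

# Minimal discrete action set for Atari Breakout in the
# Arcade Learning Environment (indices match ALE's minimal set).
ACTIONS = {"NOOP": 0, "FIRE": 1, "RIGHT": 2, "LEFT": 3}

# Hypothetical system prompt listing the legal actions for the model.
SYSTEM_PROMPT = (
    "You are playing Atari Breakout. Look at the frame and reply with "
    "exactly one action from: " + ", ".join(ACTIONS)
)

def frame_to_data_url(png_bytes: bytes) -> str:
    """Encode a PNG screenshot as a base64 data URL, the form most
    vision-capable LLM APIs accept for image input."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode()

def parse_action(reply: str) -> int:
    """Map the model's free-form text reply to a discrete action index,
    falling back to NOOP when no known action name appears."""
    upper = reply.upper()
    for name, idx in ACTIONS.items():
        if name in upper:
            return idx
    return ACTIONS["NOOP"]
```

In a real episode, `frame_to_data_url` would wrap each environment screenshot before it is sent to the model along with `SYSTEM_PROMPT`, and `parse_action` would convert the reply into the action passed to the environment's step function.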
Want to read more? You can read the full paper here.
Watch GPT-4o Play Atari!
Want to see more? Watch all the LLMs play Atari here!
GPT-4o Gameplay
How to Cite
If you find our work useful, please use the following citation:
@misc{waytowich2024atarigptinvestigatingcapabilitiesmultimodal,
  title={Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games},
  author={Nicholas R. Waytowich and Devin White and MD Sunbeam and Vinicius G. Goecks},
  year={2024},
  eprint={2408.15950},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2408.15950}
}
You can also find our paper on arXiv.