Welcome to Atari-GPT
Atari-GPT introduces a benchmark for evaluating multimodal large language models (LLMs) as low-level controllers in Atari games, testing how well these models perform in dynamic, visually rich environments.
Explore the Highlights
- Investigate how models like GPT-4V, Gemini, and Claude perform in Atari games.
- Learn about challenges in spatial reasoning and visual understanding.
- Discover the potential applications of LLMs beyond traditional tasks.
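The core loop the benchmark evaluates — encode a game frame, prompt a vision-capable LLM with the legal actions, and parse its reply back into a discrete action — can be sketched as below. The prompt text, action mapping, and helper names are illustrative assumptions, not the paper's exact implementation:

```python
import base64

# Minimal discrete action set for Atari Breakout in the
# Arcade Learning Environment (indices match ALE's minimal set).
ACTIONS = {"NOOP": 0, "FIRE": 1, "RIGHT": 2, "LEFT": 3}

# Hypothetical system prompt listing the legal actions for the model.
SYSTEM_PROMPT = (
    "You are playing Atari Breakout. Look at the frame and reply with "
    "exactly one action from: " + ", ".join(ACTIONS)
)

def frame_to_data_url(png_bytes: bytes) -> str:
    """Encode a PNG screenshot as a base64 data URL, the form most
    vision-capable LLM APIs accept for image input."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode()

def parse_action(reply: str) -> int:
    """Map the model's free-form text reply to a discrete action index,
    falling back to NOOP when no known action name appears."""
    upper = reply.upper()
    for name, idx in ACTIONS.items():
        if name in upper:
            return idx
    return ACTIONS["NOOP"]
```

In a real episode, `frame_to_data_url` would wrap each environment screenshot before it is sent to the model along with `SYSTEM_PROMPT`, and `parse_action` would convert the reply into the action passed to the environment's step function.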
Want to read more? You can read the full paper here.
Watch GPT-4o Play Atari!
Want to see more? Watch all the LLMs play Atari here!
GPT-4o Gameplay
How to Cite
If you find our work useful, please use the following citation:
@misc{waytowich2024atarigptinvestigatingcapabilitiesmultimodal,
  title={Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games},
  author={Nicholas R. Waytowich and Devin White and MD Sunbeam and Vinicius G. Goecks},
  year={2024},
  eprint={2408.15950},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2408.15950}
}
You can also find our paper on arXiv.