Building a Chess Analysis App (Part 1) - Exploring the Chess.com API with Python

January 20, 2026 3 minute read

Introduction

As part of my Streamlit Chess Analysis project, one of the first challenges was fetching and processing game data from Chess.com. While Chess.com provides Published Data API, working with it directly presents several challenges: navigating monthly archive pagination, handling edge cases in game timing data, and structuring JSON responses for analysis.

This post explores how I built the ChessDataFetcher class to handle these challenges. I’ll use real data from my account as a working example, to demonstrate the system in action.

The Chess.com API: Structure and Challenges

Chess.com organises game data into monthly archives. The ChessDataFetcher class encapsulates this pipeline:

Chess.com API -> Fetch Games -> Validate Data -> Process -> DataFrame

Architecture: From API to Analysis

The fetcher provides three key capabilities:

1. Archive Discovery

The first step is discovering what data exists:

from src.data_fetcher import ChessDataFetcher

fetcher = ChessDataFetcher()
archives = fetcher.get_available_archives("RhysLWells")

This queries https://api.chess.com/pub/player/{username}/games/archives and returns monthly archive URLs:

Month	Archive URL
2025-09	https://api.chess.com/pub/player/rhyslwells/games/2025/09
2025-10	https://api.chess.com/pub/player/rhyslwells/games/2025/10
2025-11	https://api.chess.com/pub/player/rhyslwells/games/2025/11
2025-12	https://api.chess.com/pub/player/rhyslwells/games/2025/12
2026-01	https://api.chess.com/pub/player/rhyslwells/games/2026/01

2. Bulk Fetching

Rather than manually requesting each archive, fetch_all_games() automates the process:

all_games = fetcher.fetch_all_games("RhysLWells", limit_months=12)

This method:

Calls get_available_archives() and iterates through each monthly archive
Implements rate limiting (0.5s delay between requests)
Returns a complete list of game dictionaries

Running this on my account: 264 games fetched across 123 days (2025-09-15 to 2026-01-16). The API provides metadata for each game stored in JSON, including time control classification (blitz, bullet, rapid), player ratings, results, and PGN data.

3. Data Validation

Raw API data isn’t always clean. During development, I discovered games with negative durations (derived from PGNs). This would corrupt any time-based analysis. I found three main edge cases causing this:

Midnight Crossings:Game starts at 23:50:00, ends at 00:10:00
Timezone Issues: Durations > 24 hours
Missing Data: No start time in PGN headers

The validate_game_durations() method addresses these issues.

Processing: From JSON to DataFrame

The final step is transforming raw data into a structured format suitable for analysis. The process_and_save() method transforms (validated) raw API responses into structured pandas DataFrames:

df = fetcher.process_and_save("RhysLWells", all_games, mode='json')

The process_and_save() method performs the following steps:

Processing steps:

Parse each game’s metadata (ratings, results, openings, ECO codes)
Calculate results from user’s perspective (win/loss/draw)
Validate game durations using the logic above
Remove duplicates (based on timestamp + opponent)

And builds a DataFrame with the following columns:

Extracted columns:

Category	Fields
Temporal	date, timestamp
Players	user_color, opponent, user_rating, opponent_rating
Results	result, result_label
Opening	opening, eco, eco_url
Metadata	time_control, game_url, game_duration_seconds, pgn

Performance: Processing 264 games takes approximately less that 15 seconds, primarily limited by API rate constraints.

Initial Insights

With clean, validated data, we can extract interesting patterns:

Opening Repertoire

From the opening data, we see the most frequently played openings:

Opening	Games	%
Nimzowitsch Defense 2.D4 E6 3.Nf3 D5 4.E5	15	5.7%
French Defense Knight Variation	10	3.8%
Philidor Defense 3.Bc4	8	3.0%
French Defense Advance Variation	8	3.0%
Italian Game	8	3.0%

For myself, French Defense appears frequently across multiple variations, so it’s a key part of my repertoire.

Performance by Color

From the results data, we can summarise performance by color:

Color	Win Rate	W	L	D	Total
White	45.1%	56	69	7	132
Black	49.6%	63	64	5	132

Design Principles

Two key decisions I made to ensure reliability where:

API Etiquette: We identify the application via User-Agent headers

HEADERS = {"User-Agent": "Chess Analysis Dashboard (Python/requests)"}

This helps Chess.com track API usage and contact developers if issues arise.

Rate Limiting: A 0.5-second delay between requests prevents overwhelming the API

time.sleep(0.5)  # Respect API constraints

This avoids potential IP bans while maintaining reasonable fetch times (264 games in ~15 seconds).

Conclusion

The ChessDataFetcher class solves the complexity of working with the Chess.com API by handling archive discovery, rate limiting, data validation, and structuring automatically. With this data in hand I can proceed to further analysis and prediction in the chess analysis dashboard.

Share on

Twitter Facebook LinkedIn