Building a Scalable Real-Time NBA Stats Pipeline With AWS: Unlocking Seamless Data Integration

NBA Statistics PipelineπŸ€

πŸš€ Introduction

This project is an NBA Statistics Pipeline that fetches NBA team statistics from the SportsData API and stores them in AWS DynamoDB. It also implements structured JSON logging with AWS CloudWatch, enabling efficient monitoring and debugging.

This project was built to demonstrate my proficiency in AWS services, Python, API integrations, and Infrastructure as Code (IaC).

πŸ›  Tech Stack

  • Python (Data processing, API requests, logging)

  • AWS DynamoDB (NoSQL database for storing NBA stats)

  • AWS CloudWatch (Logging & monitoring)

  • Boto3 (AWS SDK for Python)

  • Docker (Containerization)

  • EC2 Instance (Compute environment for development)

🎯 Features

  • Fetches real-time NBA statistics from the SportsData API

  • Stores team stats in AWS DynamoDB

  • Structured logging with AWS CloudWatch

  • Error handling and logging with JSON structured logs

  • Uses environment variables for sensitive credentials

  • Implements batch writing for efficiency (see the sketch below)
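
To make those features concrete, here is a minimal sketch of the core loop: fetch team stats, emit a JSON-structured log line, and batch-write the results to DynamoDB. The endpoint path, table schema, and helper names are illustrative assumptions, not the exact code from nba_stats.py.

import json
import logging
import os
from decimal import Decimal

import boto3
import requests

# Emit one JSON object per log line; on EC2 the CloudWatch agent (or a
# handler library such as watchtower) can ship these to a log group.
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("nba-stats-pipeline")

def log_json(message, **fields):
    logger.info(json.dumps({"message": message, **fields}))

def fetch_team_stats(api_key):
    # Hypothetical SportsData endpoint; check the API docs for the real path.
    url = "https://api.sportsdata.io/v3/nba/scores/json/Standings/2025"
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()

def store_team_stats(teams, table_name):
    table = boto3.resource("dynamodb").Table(table_name)
    # batch_writer() buffers put_item calls and flushes up to 25 items per request.
    with table.batch_writer() as batch:
        for team in teams:
            # DynamoDB rejects Python floats, so round-trip through JSON
            # to turn every float into a Decimal.
            batch.put_item(Item=json.loads(json.dumps(team), parse_float=Decimal))
    log_json("stored team stats", count=len(teams), table=table_name)

if __name__ == "__main__":
    teams = fetch_team_stats(os.environ["SPORTDATA_API_KEY"])
    store_team_stats(teams, os.environ["DYNAMODB_TABLE_NAME"])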

πŸ“Έ Snapshots

  • API Response Sample

  • DynamoDB Table Data

  • CloudWatch Logs (structured logs for monitoring)

  • Terminal Output (successful execution of the pipeline)

πŸ— Project Architecture

└── nba-stats-pipeline
    ├── src
    │   ├── __init__.py
    │   ├── nba_stats.py
    │   └── lambdafunction.py
    ├── requirements.txt      # Dependencies
    ├── .env                  # Environment variables
    ├── Dockerfile            # Containerization setup (if applicable)
    └── README.md             # Project documentation

πŸš€ Step-by-Step Guide to Building the NBA Stats Pipeline

1️⃣ Launch EC2 Instance and SSH Into It

ssh -i "nba-stats-pipeline.pem" ubuntu@ec2-18-212-173-76.compute-1.amazonaws.com

2️⃣ Clone the Repository

git clone https://github.com/onlyfave/nba-stats-pipeline.git
cd nba-stats-pipeline

3️⃣ Install Python3

Python3 is required to run the project.

sudo apt update
sudo apt install python3

4️⃣ Install Pip

On most systems, pip comes pre-installed with Python3. To verify, run:

pip3 --version

If you don't have pip installed, use the following command:

sudo apt install python3-pip

5️⃣ Install Dependencies

pip install -r requirements.txt
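
The pinned versions live in the repo's requirements.txt; going by the tech stack above, the list is roughly the following (an assumption, not the repo's actual file):

# requirements.txt (assumed contents based on the tech stack)
boto3
requests
python-dotenv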

6️⃣ Set Up Environment Variables

Create a .env file with the following content:

SPORTDATA_API_KEY=your_api_key
DYNAMODB_TABLE_NAME=nba-player-stats
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
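
A minimal sketch of how the script can pick these values up at startup, assuming python-dotenv is installed (plain os.environ works too if you export the variables in your shell):

# Load .env before reading credentials (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # merges key=value pairs from .env into the process environment

API_KEY = os.environ["SPORTDATA_API_KEY"]          # raises KeyError if missing
TABLE_NAME = os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats")

Boto3 reads the AWS credential variables from the environment on its own, so the script only has to read the API key and table name explicitly.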

7️⃣ cd Into the Folder Containing the Pipeline

cd src

8️⃣ Run the Pipeline

python3 nba_stats.py

πŸ“Š Sample Data Format

[
  {
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "Wins": 25,
    "Losses": 15,
    "PointsPerGameFor": 112.5,
    "PointsPerGameAgainst": 108.3
  }
]
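
To sanity-check the write path, you can read a record straight back. This assumes TeamID is the table's partition key, which matches the sample above but should be confirmed against your table definition:

import boto3

# Fetch one item by partition key; numbers come back as Decimal, not float.
table = boto3.resource("dynamodb", region_name="us-east-1").Table("nba-player-stats")
resp = table.get_item(Key={"TeamID": 1})
print(resp.get("Item"))  # None if no matching item exists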

πŸ— Deployment (Optional: Dockerized Version)

To run this project inside a Docker container:

docker build -t nba-stats-pipeline .
docker run --env-file .env nba-stats-pipeline
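
The repo's actual Dockerfile isn't reproduced here, but a minimal one matching the layout above could look like this (the base image tag is an assumption):

# Dockerfile (illustrative sketch)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
CMD ["python3", "src/nba_stats.py"]

Note that credentials stay out of the image: the --env-file .env flag in the run command above injects them at container start.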

πŸ”₯ Key Takeaways

  • AWS Expertise: Used DynamoDB & CloudWatch for data storage & monitoring

  • DevOps Skills: Managed credentials, logging, and error handling efficiently

  • Cloud-Native Thinking: Designed a cloud-based ETL pipeline

πŸ“Œ Next Steps

  • Implement Lambda Functions for automated execution (see the sketch after this list)

  • Deploy using AWS ECS or Kubernetes

  • Integrate with Grafana for real-time data visualization
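
For the first item, a thin handler around the existing pipeline is usually enough. This is an illustrative sketch: the imports from nba_stats are hypothetical helper names, and lambdafunction.py in the repo is the real entry point.

import json
import os

def lambda_handler(event, context):
    # Reuse the fetch/store flow from nba_stats.py, triggered on a schedule
    # (e.g., an EventBridge rule) instead of a manual run.
    from nba_stats import fetch_team_stats, store_team_stats  # hypothetical names

    teams = fetch_team_stats(os.environ["SPORTDATA_API_KEY"])
    store_team_stats(teams, os.environ["DYNAMODB_TABLE_NAME"])
    return {"statusCode": 200, "body": json.dumps({"teams_stored": len(teams)})}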

πŸ“’ Connect With Me

πŸš€ LinkedIn | 🐦 Twitter/X
