Building a Scalable Real-Time NBA Stats Pipeline With AWS: Unlocking Seamless Data Integration
NBA Statistics Pipeline
Introduction
This project is an NBA Statistics Pipeline that fetches NBA team statistics from the SportsData API and stores them in AWS DynamoDB. The project also implements structured logging using AWS CloudWatch, enabling efficient monitoring and debugging.
This project was built to demonstrate my proficiency in AWS services, Python, API integrations, and Infrastructure as Code (IaC).
Tech Stack
Python (Data processing, API requests, logging)
AWS DynamoDB (NoSQL database for storing NBA stats)
AWS CloudWatch (Logging & monitoring)
Boto3 (AWS SDK for Python)
Docker (Containerization)
EC2 Instance (Compute environment for development)
Features
Fetches real-time NBA statistics from the SportsData API
Stores team stats in AWS DynamoDB
Structured logging with AWS CloudWatch
Error handling and JSON-structured log output (see the logging sketch after this list)
Uses environment variables for sensitive credentials
Implements batch writing for efficiency
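The logging features above can be pictured with a short sketch. This is not the project's exact implementation, just a minimal example of emitting JSON-formatted log lines with Python's standard logging module; when the pipeline runs on EC2, the CloudWatch agent (or a library such as watchtower) can ship those lines to a CloudWatch log group.

import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line so CloudWatch can index the fields."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("nba-stats-pipeline")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Fetched team stats from the SportsData API")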
Snapshots
API Response Sample
DynamoDB Table Data
CloudWatch Logs (Structured logs for monitoring)
Terminal Output (Successful execution of the pipeline)
Project Architecture
nba-stats-pipeline
├── src
│   ├── __init__.py
│   ├── nba_stats.py
│   └── lambdafunction.py
├── requirements.txt   # Dependencies
├── .env               # Environment variables
├── Dockerfile         # Containerization setup (if applicable)
└── README.md          # Project documentation
Step-by-Step Guide to Building the NBA Stats Pipeline
Step 1: Launch an EC2 Instance and SSH Into It
ssh -i "nba-stats-pipeline.pem" ubuntu@ec2-18-212-173-76.compute-1.amazonaws.com
(Replace the key file and hostname with the values for your own instance.)
Step 2: Clone the Repository
git clone https://github.com/onlyfave/nba-stats-pipeline.git
cd nba-stats-pipeline
Step 3: Install Python 3
Python 3 is required to run the project.
sudo apt update
sudo apt install python3
Step 4: Install pip
On most systems, pip comes pre-installed with Python 3. To verify, run:
pip3 --version
If you don't have pip installed, use the following command:
sudo apt install python3-pip
Step 5: Install Dependencies
pip install -r requirements.txt
Step 6: Set Up Environment Variables
Create a .env file with the following content:
SPORTDATA_API_KEY=your_api_key
DYNAMODB_TABLE_NAME=nba-player-stats
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
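Inside the pipeline, these values can be read back with os.getenv. A minimal sketch, assuming python-dotenv is used to load the .env file (the real code may load its configuration differently):

import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # copies key=value pairs from .env into the process environment

API_KEY = os.getenv("SPORTDATA_API_KEY")
TABLE_NAME = os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")

if not API_KEY:
    raise RuntimeError("SPORTDATA_API_KEY is not set; check your .env file")

Boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment on its own, so only the API key and table name need explicit handling in the code.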
Step 7: cd Into the Folder Containing the Pipeline
cd src
Step 8: Run the Pipeline
python3 nba_stats.py
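Under the hood, the pipeline's fetch step boils down to an authenticated HTTP GET against the SportsData API. The sketch below uses requests; the endpoint path, season value, and Ocp-Apim-Subscription-Key header are assumptions based on SportsData.io's public documentation, so check the docs for your subscription before reusing it.

import os

import requests

def fetch_team_stats(season: str = "2025") -> list:
    """Fetch season-level team statistics from the SportsData API."""
    # Illustrative endpoint; adjust the path to the one your subscription exposes.
    url = f"https://api.sportsdata.io/v3/nba/scores/json/TeamSeasonStats/{season}"
    headers = {"Ocp-Apim-Subscription-Key": os.getenv("SPORTDATA_API_KEY", "")}

    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise on HTTP errors instead of failing silently
    return response.json()

if __name__ == "__main__":
    teams = fetch_team_stats()
    print(f"Fetched stats for {len(teams)} teams")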
Sample Data Format
[
  {
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "Wins": 25,
    "Losses": 15,
    "PointsPerGameFor": 112.5,
    "PointsPerGameAgainst": 108.3
  }
]
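Records shaped like the sample above can be pushed to DynamoDB in batches, which is what the batch-writing feature refers to. Here is a sketch with boto3's batch_writer, assuming TeamID is the table's partition key (DynamoDB does not accept Python floats, so numeric stats are converted to Decimal):

import os
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb", region_name=os.getenv("AWS_REGION", "us-east-1"))
table = dynamodb.Table(os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats"))

def store_team_stats(teams: list) -> None:
    """Batch-write team records; batch_writer flushes items in chunks of up to 25."""
    with table.batch_writer() as batch:
        for team in teams:
            item = {
                # Convert floats to Decimal via str to satisfy DynamoDB's type rules.
                key: Decimal(str(value)) if isinstance(value, float) else value
                for key, value in team.items()
            }
            batch.put_item(Item=item)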
Deployment (Optional: Dockerized Version)
To run this project inside a Docker container:
docker build -t nba-stats-pipeline .
docker run --env-file .env nba-stats-pipeline
Key Takeaways
AWS Expertise: Used DynamoDB & CloudWatch for data storage & monitoring
DevOps Skills: Managed credentials, logging, and error handling efficiently
Cloud-Native Thinking: Designed a cloud-based ETL pipeline
Next Steps
Implement AWS Lambda functions for automated execution (see the sketch below)
Deploy using AWS ECS or Kubernetes
Integrate with Grafana for real-time data visualization
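For the Lambda next step, the handler in lambdafunction.py could look roughly like the sketch below: a function triggered on a schedule (for example by an EventBridge rule) that fetches the latest stats and stores them. The fetch_team_stats and store_team_stats helpers are the illustrative functions from the earlier sketches, not necessarily the repository's actual API.

import json
import logging

from nba_stats import fetch_team_stats, store_team_stats  # hypothetical imports

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Entry point invoked by a scheduled EventBridge rule (e.g. once per day)."""
    teams = fetch_team_stats()
    store_team_stats(teams)

    logger.info(json.dumps({"message": "Stored team stats", "team_count": len(teams)}))
    return {"statusCode": 200, "body": f"Stored {len(teams)} team records"}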