This repository contains a full-cycle data engineering project that implements a modern ELT (Extract, Load, Transform) pipeline. The system automates the ingestion, transformation, and visualization of stock market data for NVIDIA (NVDA) and Apple (AAPL).
The goal of this project is to eliminate manual data collection and provide investors with automated, reliable financial insights. It processes 360 days of historical data to track trends, momentum, and volatility.
- Automated Data Ingestion: Orchestrated by Airflow to fetch data from the Yahoo Finance API.
- Cloud Data Warehousing: Scalable storage and processing using Snowflake.
- Data Modeling: Robust transformation layers built with dbt (Data Build Tool).
- Financial Analytics: Automatic calculation of 20/50/200-day Moving Averages and RSI (Relative Strength Index).
- Interactive BI Dashboards: Visual analysis via Apache Superset.
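The moving-average and RSI features above can be sketched in pandas. This is an illustrative sketch, not the project's actual dbt logic: it assumes a DataFrame with one row per trading day and a `close` column, and uses a simple-average variant of the 14-day RSI.

```python
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Add 20/50/200-day moving averages and a 14-day RSI to a price
    DataFrame with a 'close' column (one row per trading day)."""
    out = df.copy()

    # Rolling simple moving averages over the three window sizes
    for window in (20, 50, 200):
        out[f"ma_{window}"] = out["close"].rolling(window).mean()

    # 14-day RSI (simple-average variant): average gain vs. average loss
    delta = out["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rs = gain / loss
    out["rsi_14"] = 100 - 100 / (1 + rs)
    return out
```

With 360 days of history, the 200-day average only becomes available for roughly the last half of the series; earlier rows are NaN by construction.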
- Data Source: yfinance API
- Data Warehouse: Snowflake
- Orchestration: Apache Airflow
- Transformation: dbt (Data Build Tool)
- Visualization: Apache Superset
- Languages: Python, SQL
- Extract & Load: A Python-based Airflow DAG extracts raw stock data and loads it into the `RAW` schema in Snowflake.
- Transform (dbt):
  - Staging: Cleaning and type-casting raw data.
  - Intermediate: Calculating technical indicators (MA, RSI).
  - Analytics: Creating final views optimized for BI consumption.
- Visualize: Apache Superset connects to the Snowflake analytics layer to generate real-time charts.
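The Extract & Load step might be sketched as the reshaping helper below, which flattens a yfinance-style OHLCV frame (DatetimeIndex, Open/High/Low/Close/Volume columns) into rows ready for bulk insertion. The `RAW.STOCK_PRICES` table name and column order are assumptions for illustration, not the project's actual DDL.

```python
import pandas as pd

def to_raw_records(prices: pd.DataFrame, symbol: str) -> list:
    """Flatten a yfinance-style OHLCV frame into plain tuples suitable
    for loading into a hypothetical RAW.STOCK_PRICES table."""
    rows = []
    for ts, row in prices.iterrows():
        rows.append((
            symbol,
            ts.date().isoformat(),   # trade date as ISO string
            float(row["Open"]),
            float(row["High"]),
            float(row["Low"]),
            float(row["Close"]),
            int(row["Volume"]),
        ))
    return rows

# Inside the DAG task, the rows could then be loaded with something like:
#   cur.executemany(
#       "INSERT INTO RAW.STOCK_PRICES VALUES (%s, %s, %s, %s, %s, %s, %s)",
#       rows,
#   )
```

Keeping extraction (yfinance call) and loading (Snowflake insert) in separate task steps makes each retryable on its own, which is the usual Airflow pattern.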
The pipeline generates several key financial visualizations:
- Price vs. Moving Averages: Identifies "Golden Cross" or "Death Cross" patterns using 20, 50, and 200-day windows.
- RSI Analysis: Monitors 14-day RSI levels to identify overbought (>70) or oversold (<30) conditions.
- Volatility Analysis: Box plots showing the distribution of closing prices, including maxima, to assess market risk.
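Crossover detection of the kind the first chart visualizes can be sketched as follows (pure pandas; the function name and string labels are illustrative, not part of the project's models):

```python
import pandas as pd

def cross_signals(ma_short: pd.Series, ma_long: pd.Series) -> pd.Series:
    """Label days where a shorter moving average crosses a longer one:
    'golden_cross' when it crosses above, 'death_cross' when it crosses below."""
    above = ma_short > ma_long
    prev = above.shift(1)
    crossed_up = above & prev.eq(False)     # was below, now above
    crossed_down = ~above & prev.eq(True)   # was above, now below

    signal = pd.Series("none", index=ma_short.index)
    signal[crossed_up] = "golden_cross"
    signal[crossed_down] = "death_cross"
    return signal
```

The first day is always labeled "none", since there is no prior day to compare against.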
- Snowflake: Create a database and the necessary `RAW` and `ANALYTICS` schemas.
- Airflow:
  - Set up a Snowflake Connection (`snowflake_conn`).
  - Place the DAG scripts in your `/dags` folder.
- dbt:
  - Configure your `profiles.yml` to point to your Snowflake warehouse.
  - Run `dbt build` to execute models and tests.
- Superset:
  - Connect the Snowflake SQLAlchemy URI.
  - Import the dashboard JSON or create charts from the `TRANSFORM_STOCK_METRICS` table.
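For reference, a minimal `profiles.yml` for the dbt step above might look like this; the profile, role, database, and warehouse names are placeholders to adapt to your own Snowflake environment.

```yaml
stock_pipeline:          # must match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <your_account_identifier>
      user: <your_user>
      password: <your_password>
      role: TRANSFORMER
      database: STOCK_DB
      warehouse: COMPUTE_WH
      schema: ANALYTICS
      threads: 4
```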
- Jie Heng
- Savitha Vijayarangan
Developed as part of the Applied Data Intelligence curriculum at San Jose State University.