SunnyJaneH/lab2-StockAnalysis-dbt

Stock Price Analytics System (ELT Pipeline)

This repository contains a full-cycle data engineering project that implements a modern ELT (Extract, Load, Transform) pipeline. The system automates the ingestion, transformation, and visualization of stock market data for NVIDIA (NVDA) and Apple (AAPL).

🚀 Project Overview

The goal of this project is to replace manual data collection with a scheduled pipeline that gives investors consistent, repeatable financial indicators. It processes 360 days of historical price data to track trends (moving averages), momentum (RSI), and volatility.

Key Features

  • Automated Data Ingestion: An Airflow DAG fetches stock data from Yahoo Finance through the yfinance library.
  • Cloud Data Warehousing: Scalable storage and processing using Snowflake.
  • Data Modeling: Robust transformation layers built with dbt (Data Build Tool).
  • Financial Analytics: Automatic calculation of 20/50/200-day Moving Averages and RSI (Relative Strength Index).
  • Interactive BI Dashboards: Visual analysis via Apache Superset.

🛠️ Tech Stack

  • Data Source: Yahoo Finance (via the yfinance Python library)
  • Data Warehouse: Snowflake
  • Orchestration: Apache Airflow
  • Transformation: dbt (Data Build Tool)
  • Visualization: Apache Superset
  • Languages: Python, SQL

🏗️ Architecture

  1. Extract & Load: A Python-based Airflow DAG extracts raw stock data and loads it into the RAW schema in Snowflake.
  2. Transform (dbt):
    • Staging: Cleaning and type-casting raw data.
    • Intermediate: Calculating technical indicators (MA, RSI).
    • Analytics: Creating final views optimized for BI consumption.
  3. Visualize: Apache Superset connects to the Snowflake analytics layer and renders charts from the most recently loaded data.
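
The moving-average calculation in the Intermediate layer can be illustrated outside the warehouse. The following is a minimal Python sketch of a trailing simple moving average, not the project's dbt SQL; the function name `simple_moving_average` and the sample prices are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def simple_moving_average(closes: Sequence[float], window: int) -> List[Optional[float]]:
    """Return the trailing simple moving average for each day.

    Days with fewer than `window` observations get None because the
    window is not yet full.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(closes):
        running_sum += price
        if i >= window:
            running_sum -= closes[i - window]  # drop the value leaving the window
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


# Hypothetical closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(simple_moving_average(closes, 3))  # [None, None, 11.0, 12.0, 13.0]
```

With this definition a 200-day average has a value only from the 200th row onward, which is worth keeping in mind when the input covers 360 days.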

📊 Analytics & Insights

The pipeline generates several key financial visualizations:

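  • Price vs. Moving Averages: Plots the closing price against 20-, 50-, and 200-day moving averages to spot "Golden Cross" and "Death Cross" patterns (conventionally, the 50-day average crossing above or below the 200-day average).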
  • RSI Analysis: Monitors 14-day RSI levels to identify overbought (>70) or oversold (<30) conditions.
  • Volatility Analysis: Box plots of closing prices showing their distribution and maximum, used to assess market risk.
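
The RSI thresholds and crossover patterns above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's dbt logic: the README does not say which RSI variant is used, so the sketch uses the simple-average form (not Wilder's smoothed form), and the helper names `rsi`, `rsi_signal`, and `find_crosses` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def rsi(closes: Sequence[float], period: int = 14) -> List[Optional[float]]:
    """Simple-average RSI: 100 - 100 / (1 + avg_gain / avg_loss).

    Entry i is None until `period` price changes are available.
    """
    values: List[Optional[float]] = [None] * len(closes)
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    for end in range(period, len(changes) + 1):
        window = changes[end - period:end]
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            # No losses in the window: 100 if there were gains; a flat
            # window is treated as 50 here (an assumption of this sketch).
            values[end] = 100.0 if avg_gain > 0 else 50.0
        else:
            values[end] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return values


def rsi_signal(value: Optional[float]) -> str:
    """Label an RSI value using the 70 / 30 thresholds."""
    if value is None:
        return "n/a"
    if value > 70:
        return "overbought"
    if value < 30:
        return "oversold"
    return "neutral"


def find_crosses(short_ma: Sequence[Optional[float]],
                 long_ma: Sequence[Optional[float]]) -> List[Tuple[int, str]]:
    """Return (index, kind) wherever the short average crosses the long one."""
    crosses: List[Tuple[int, str]] = []
    for i in range(1, min(len(short_ma), len(long_ma))):
        prev_s, prev_l, cur_s, cur_l = short_ma[i - 1], long_ma[i - 1], short_ma[i], long_ma[i]
        if prev_s is None or prev_l is None or cur_s is None or cur_l is None:
            continue  # skip days where either average is not yet defined
        if prev_s <= prev_l and cur_s > cur_l:
            crosses.append((i, "golden"))
        elif prev_s >= prev_l and cur_s < cur_l:
            crosses.append((i, "death"))
    return crosses


# Hypothetical inputs, for illustration only.
print(rsi_signal(rsi([10.0, 11.0, 10.0, 11.0, 10.0], period=4)[-1]))  # neutral
print(find_crosses([1.0, 3.0, 1.0], [2.0, 2.0, 2.0]))  # [(1, 'golden'), (2, 'death')]
```

In practice `find_crosses` would receive the 50-day and 200-day series as `short_ma` and `long_ma`.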

🔧 Setup & Installation

  1. Snowflake: Create a database and the necessary RAW and ANALYTICS schemas.
  2. Airflow:
    • Set up a Snowflake Connection (snowflake_conn).
    • Place the DAG scripts in your /dags folder.
  3. dbt:
    • Configure your profiles.yml to point to your Snowflake warehouse.
    • Run dbt build to execute models and tests.
  4. Superset:
    • Add Snowflake as a database connection using its SQLAlchemy URI.
    • Import the dashboard JSON or create charts from the TRANSFORM_STOCK_METRICS table.
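
For the Superset step, the connection string follows the `snowflake://` URI format used by the snowflake-sqlalchemy dialect. The helper below is a small sketch for assembling it; `snowflake_uri` is a hypothetical name, and every value in the example (user, password, account, database, warehouse) is a placeholder rather than a setting from this project.

```python
from typing import Optional
from urllib.parse import quote_plus, urlencode


def snowflake_uri(user: str, password: str, account: str, database: str,
                  schema: str, warehouse: str, role: Optional[str] = None) -> str:
    """Build a snowflake:// SQLAlchemy URI, URL-encoding the credentials."""
    query = {"warehouse": warehouse}
    if role:
        query["role"] = role
    return (
        f"snowflake://{quote_plus(user)}:{quote_plus(password)}"
        f"@{account}/{database}/{schema}?{urlencode(query)}"
    )


# Placeholder values only; substitute your own account details.
print(snowflake_uri("analyst", "p@ss/word", "myorg-myaccount",
                    "STOCK_DB", "ANALYTICS", "COMPUTE_WH"))
# snowflake://analyst:p%40ss%2Fword@myorg-myaccount/STOCK_DB/ANALYTICS?warehouse=COMPUTE_WH
```

Encoding the password matters because characters such as `@` or `/` would otherwise be parsed as part of the URI structure.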

👥 Contributors

  • Jie Heng
  • Savitha Vijayarangan

Developed as part of the Applied Data Intelligence curriculum at San Jose State University.
