
Sangjun You

Data Engineer | Distributed Systems | Streaming & Cloud Architecture

Currently working as a Data Engineer in IT Operations at Mirae Asset Life Insurance, building and optimizing enterprise-scale data systems for business intelligence and operational decision-making.

I focus on:

  • Designing end-to-end data pipelines (Batch + Streaming)
  • Building distributed systems with Kafka & Spark
  • Cloud-native architecture on AWS
  • Production-grade BI systems for executive stakeholders
  • Data modeling, performance tuning, and reliability engineering

📌 LinkedIn
https://www.linkedin.com/in/상준-유-a29442257/


💼 Work Experience

Mirae Asset Life Insurance – IT Operations & Data Engineer

Feb 2025 – Present

Enterprise Data Architecture & BI Engineering

  • Design and maintain enterprise data marts and DW pipelines
  • Develop and optimize batch ETL workflows for insurance and financial datasets
  • Deliver BI dashboards that support C-level decision-making
  • Ensure data integrity, performance optimization, and governance compliance
  • Troubleshoot and optimize production batch jobs and database workloads

Core Focus Areas

  • Large-scale relational data modeling
  • Analytical query performance tuning
  • Secure data masking and access control
  • Production-grade pipeline reliability

🚀 Featured Project: Clinical Search Data Pipeline

Lambda Architecture | Kafka → Spark → S3 → PostgreSQL

Repository:
https://github.com/SangjunRyu/clinical-search-data-pipeline


Overview

Designed and implemented an end-to-end Lambda Architecture data pipeline processing over 5.2M clinical search log events (TripClick dataset).

The system combines:

  • 📦 Batch layer for daily, consistent analytics (T+1)
  • ⚡ Speed layer for near real-time dashboards (5-minute micro-batch)
  • ♻️ Immutable raw storage for replay and reprocessing
  • 🐳 Fully containerized distributed infrastructure (Docker-based)

This project simulates a production-style hybrid architecture used in real-world data platforms.
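The speed layer's 5-minute micro-batch aggregation amounts to tumbling-window counting. The sketch below is a minimal pure-Python illustration, assuming events arrive as `(timestamp, query)` pairs; that shape and the sample queries are hypothetical, not the repository's schema. In Spark Structured Streaming the same grouping would normally be expressed with `pyspark.sql.functions.window(col, "5 minutes")`.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # mirrors the 5-minute micro-batch cadence


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Floor a timestamp to the start of the tumbling window containing it."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)  # windows are aligned to the Unix epoch
    n_windows = (ts - epoch) // size                 # whole windows elapsed (timedelta // timedelta -> int)
    return epoch + n_windows * size


def count_per_window(events) -> Counter:
    """Count events per window; `events` holds hypothetical (timestamp, query) pairs."""
    return Counter(window_start(ts) for ts, _query in events)


events = [
    (datetime(2024, 1, 1, 10, 2), "sepsis"),
    (datetime(2024, 1, 1, 10, 4, 59), "asthma"),
    (datetime(2024, 1, 1, 10, 5), "sepsis"),
]
for start, n in sorted(count_per_window(events).items()):
    print(start, n)
# 2024-01-01 10:00:00 2
# 2024-01-01 10:05:00 1
```

Aligning windows to a fixed origin means an event always lands in the same window no matter which micro-batch processes it, which keeps counts stable across retries.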

High-Level Architecture

Ingestion

Web Servers → Kafka (Event Streaming)

Batch Layer (Accuracy)

Kafka → S3 (Archive Raw) → Spark ETL → PostgreSQL (Batch Marts)
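Because the raw archive in S3 is immutable, a T+1 batch run only needs to know which day's partition to read, and reprocessing a past day is the same call with an older run date. A minimal sketch, assuming Hive-style `dt=YYYY-MM-DD` partitioning under a hypothetical `s3://example-bucket/raw/clinical_search` prefix; the repository's actual layout is not described here.

```python
from datetime import date, timedelta

# Hypothetical archive location; the project's real bucket and prefix may differ.
RAW_PREFIX = "s3://example-bucket/raw/clinical_search"


def t_plus_1_partition(run_date: date, prefix: str = RAW_PREFIX) -> str:
    """Return the raw partition a T+1 batch run processes: the day before `run_date`."""
    target_day = run_date - timedelta(days=1)
    return f"{prefix}/dt={target_day.isoformat()}/"


def backfill_partitions(start: date, end: date, prefix: str = RAW_PREFIX) -> list:
    """List the partitions needed to replay every run date from `start` to `end` inclusive."""
    days = (end - start).days
    return [t_plus_1_partition(start + timedelta(days=i), prefix) for i in range(days + 1)]


print(t_plus_1_partition(date(2024, 3, 1)))
# s3://example-bucket/raw/clinical_search/dt=2024-02-29/
```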

Speed Layer (Low Latency)

Kafka → Spark Structured Streaming → PostgreSQL (Realtime Marts)

Serving

PostgreSQL → Apache Superset Dashboards

Orchestration

Apache Airflow (Pipeline Automation & Scheduling)
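At the serving layer, a Lambda architecture answers queries by combining both marts: batch results are authoritative for the days they cover, and the speed layer fills in whatever the T+1 batch has not reached yet. The sketch shows that precedence rule on plain dictionaries keyed by ISO date; the counts are toy numbers, and in this project the merge could just as well live in a PostgreSQL view or in Superset, which the README does not specify.

```python
def serve_daily_counts(batch_view: dict, realtime_view: dict) -> dict:
    """Merge batch and speed-layer views, letting batch values win where both exist."""
    merged = dict(realtime_view)  # low-latency values, possibly approximate
    merged.update(batch_view)     # recomputed batch values override overlapping days
    return dict(sorted(merged.items()))  # ISO date strings sort chronologically


batch = {"2024-03-01": 1200, "2024-03-02": 1310}    # toy numbers
realtime = {"2024-03-02": 1295, "2024-03-03": 410}  # toy numbers
print(serve_daily_counts(batch, realtime))
# {'2024-03-01': 1200, '2024-03-02': 1310, '2024-03-03': 410}
```

Letting batch win means any drift in the streaming counts (late events, job restarts) is corrected once the daily run lands.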


☁ Cloud Infrastructure Project: AWS 3-Tier Architecture

Repository:
https://github.com/SangjunRyu/AWS-3tier-Architecture

Designed a scalable 3-tier architecture including:

  • EC2 + Load Balancer
  • Reverse Proxy (Apache)
  • Prometheus & Grafana monitoring
  • K6 load testing
  • S3 log archiving

Validated scalability under concurrent simulated traffic.


🚒 Fire Emergency Response Data Platform

Repository:
https://github.com/SangjunRyu/Cloud9-Final-Project

  • Batch + real-time analytics on emergency response times
  • AWS Glue ETL + Lambda streaming ingestion
  • SNS alert integration
  • Data-driven optimization toward the 7-minute golden-time response target
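The 7-minute golden-time target reduces to a simple compliance metric: the share of incidents reached within seven minutes. A minimal sketch, assuming response time is measured from report to on-scene arrival; that definition and the sample timestamps are illustrative assumptions, not taken from the project.

```python
from datetime import datetime

GOLDEN_TIME_MIN = 7.0  # golden-time target in minutes


def response_minutes(reported: datetime, arrived: datetime) -> float:
    """Elapsed minutes between the (assumed) report and arrival timestamps."""
    return (arrived - reported).total_seconds() / 60


def golden_time_rate(incidents) -> float:
    """Fraction of (reported, arrived) pairs within the golden-time target."""
    times = [response_minutes(r, a) for r, a in incidents]
    if not times:
        return 0.0
    return sum(t <= GOLDEN_TIME_MIN for t in times) / len(times)


incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 5)),        # 5 min
    (datetime(2024, 5, 1, 12, 0), datetime(2024, 5, 1, 12, 7)),      # 7 min, on target
    (datetime(2024, 5, 1, 18, 0), datetime(2024, 5, 1, 18, 9, 30)),  # 9.5 min, late
]
print(f"{golden_time_rate(incidents):.1%}")
# 66.7%
```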

🛠 Technical Stack

Programming

Python, Java, SQL, C++

Data Engineering

Apache Kafka
Apache Spark (Batch & Streaming)
Apache Airflow
ETL Pipeline Design
Data Modeling
Event-Driven Architecture

Cloud & DevOps

AWS (EC2, S3, Lambda, Glue, IAM, VPC)
Docker
Kubernetes
Prometheus
Grafana
CI/CD

Databases

Oracle
PostgreSQL
MySQL
DynamoDB


🎓 Education

Bachelor of Engineering
Computer Science & Electronic Engineering
Chung-Ang University, Seoul
GPA: 4.21 / 4.5

  • TOEFL: 89 (June 2023)
  • Exchange Program: University of Turku, Finland (Dec 2023 – June 2024)
    • Participated in software development.

🌍 Career Objective

Seeking opportunities in:

  • Global Tech Companies
  • Cloud-native Data Engineering roles
  • Distributed Systems & Streaming Infrastructure teams

I aim to build scalable, fault-tolerant, and intelligent data systems at global scale.

Pinned

  1. clinical-search-data-pipeline (Public)

    Python

  2. Cloud9-Final-Project (Public)

    Data pipeline project for fire station golden-time analysis

    Jupyter Notebook

  3. AWS-3tier-Architecture (Public)

    3-Tier web service architecture built with AWS, featuring monitoring (Prometheus & Grafana), load testing (K6), and dynamic/static service separation.

  4. Use-Eyetracking-to-control-IOT-devices (Public)

    Entry in the 2023 Chung-Ang University Multidisciplinary Capstone Competition

    Python

  5. Red-Horsewagon (Public)

    Beginner ("kkokkoma") game development project built in Unity

    ShaderLab

  6. neetcode-submissions-q42irjg1 (Public)

    My NeetCode.io problem submissions

    Python