Data Engineer | Distributed Systems | Streaming & Cloud Architecture
Currently working as a Data Engineer in IT Operations at Mirae Asset Life Insurance, building and optimizing enterprise-scale data systems for business intelligence and operational decision-making.
I focus on:
- Designing end-to-end data pipelines (Batch + Streaming)
- Building distributed systems with Kafka & Spark
- Cloud-native architecture on AWS
- Production-grade BI systems for executive stakeholders
- Data modeling, performance tuning, and reliability engineering
LinkedIn
https://www.linkedin.com/in/μμ€-μ -a29442257/
Feb 2025 – Present
- Design and maintain enterprise data marts and DW pipelines
- Develop and optimize batch ETL workflows for insurance and financial datasets
- Deliver BI dashboards that support C-level decision-making
- Ensure data integrity, performance optimization, and governance compliance
- Troubleshoot and optimize production batch jobs and database workloads
- Large-scale relational data modeling
- Analytical query performance tuning
- Secure data masking and access control
- Production-grade pipeline reliability
Lambda Architecture | Kafka → Spark → S3 → PostgreSQL
Repository:
https://github.com/SangjunRyu/clinical-search-data-pipeline
Designed and implemented an end-to-end Lambda Architecture data pipeline processing over 5.2M clinical search log events (TripClick dataset).
The system combines:
- Batch layer for daily, consistent analytics (T+1)
- Speed layer for near real-time dashboards (5-minute micro-batch)
- Immutable raw storage for replay and reprocessing
- Fully containerized distributed infrastructure (Docker-based)
This project simulates a production-style hybrid architecture used in real-world data platforms.
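As a rough illustration of how the batch and speed layers above can be reconciled at query time, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `merge_views`, the dictionary-based views, and the sample counts are all hypothetical. The assumption is that the batch layer is authoritative up to a cutoff date (T+1) and the speed layer fills in anything newer.

```python
from datetime import date


def merge_views(batch_view, speed_view, batch_cutoff):
    """Combine a batch view and a speed view into one serving view.

    batch_view / speed_view: dict mapping date -> event count (hypothetical shape).
    batch_cutoff: last date fully covered by the batch layer (T+1 lag).
    """
    # Batch results win for every date the batch layer has fully processed.
    merged = {d: n for d, n in batch_view.items() if d <= batch_cutoff}
    # The speed layer only contributes dates the batch layer has not reached yet.
    for d, n in speed_view.items():
        if d > batch_cutoff:
            merged[d] = n
    return merged


# Illustrative numbers only, not real pipeline output.
batch_view = {date(2024, 1, 1): 1200, date(2024, 1, 2): 1350}
speed_view = {date(2024, 1, 2): 1338, date(2024, 1, 3): 410}
serving = merge_views(batch_view, speed_view, batch_cutoff=date(2024, 1, 2))
```

In this sketch the approximate speed-layer count for 2 January is discarded once the batch layer has covered that date, which is the usual reason for keeping immutable raw storage: the batch result can always be recomputed and trusted.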
Web Servers → Kafka (Event Streaming)
Kafka → S3 (Archive Raw) → Spark ETL → PostgreSQL (Batch Marts)
Kafka → Spark Structured Streaming → PostgreSQL (Realtime Marts)
PostgreSQL → Apache Superset Dashboards
Apache Airflow (Pipeline Automation & Scheduling)
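The real-time path in the flow above aggregates events in 5-minute micro-batches. The sketch below shows the underlying idea, tumbling-window bucketing, in plain Python rather than Spark, so it runs without a cluster. The names `window_start` and `count_per_window` and the sample events are hypothetical and are not taken from the repository; in Spark Structured Streaming the same grouping would typically be expressed with the built-in `window()` function.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # matches the 5-minute micro-batch interval
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, window=WINDOW):
    """Return the start of the tumbling window that contains timestamp ts."""
    buckets = (ts - EPOCH) // window  # whole windows elapsed since the epoch
    return EPOCH + buckets * window


def count_per_window(events):
    """Count events per window; events is an iterable of (timestamp, query)."""
    counts = Counter()
    for ts, _query in events:
        counts[window_start(ts)] += 1
    return dict(counts)


# Illustrative search-log events (hypothetical values).
events = [
    (datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc), "asthma"),
    (datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc), "diabetes"),
    (datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), "asthma"),
]
per_window = count_per_window(events)
```

Here the first two events fall into the window starting at 12:00 and the third into the window starting at 12:05, since each window includes its start time and excludes its end time.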
Repository:
https://github.com/SangjunRyu/AWS-3tier-Architecture
Designed a scalable 3-tier architecture including:
- EC2 + Load Balancer
- Reverse Proxy (Apache)
- Prometheus & Grafana monitoring
- K6 load testing
- S3 log archiving
Validated scalability under concurrent simulated traffic.
Repository:
https://github.com/SangjunRyu/Cloud9-Final-Project
- Batch + real-time analytics on emergency response times
- AWS Glue ETL + Lambda streaming ingestion
- SNS alert integration
- Data-driven optimization of 7-minute golden-time target
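One way to track the 7-minute golden-time target is as a compliance rate: the share of responses that arrive within 420 seconds. The sketch below is an assumption about how such a metric could be computed and is not code from the project. The function name `golden_time_rate` and the sample durations are hypothetical.

```python
GOLDEN_TIME_SECONDS = 7 * 60  # 7-minute target expressed in seconds


def golden_time_rate(response_seconds):
    """Return the fraction of responses at or under the golden-time target."""
    if not response_seconds:
        return 0.0  # no data: report zero rather than divide by zero
    within = sum(1 for s in response_seconds if s <= GOLDEN_TIME_SECONDS)
    return within / len(response_seconds)


# Hypothetical response times in seconds; two of the four meet the target.
sample = [300, 420, 421, 600]
rate = golden_time_rate(sample)
```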
Python, Java, SQL, C++
Apache Kafka
Apache Spark (Batch & Streaming)
Apache Airflow
ETL Pipeline Design
Data Modeling
Event-Driven Architecture
AWS (EC2, S3, Lambda, Glue, IAM, VPC)
Docker
Kubernetes
Prometheus
Grafana
CI/CD
Oracle
PostgreSQL
MySQL
DynamoDB
Bachelor of Engineering
Computer Science & Electronic Engineering
Chung-Ang University, Seoul
GPA: 4.21 / 4.5
- TOEFL: 89 (June 2023)
- Exchange Program: University of Turku, Finland (Dec 2023 – June 2024)
- Participated in software development.
Seeking opportunities in:
- Global Tech Companies
- Cloud-native Data Engineering roles
- Distributed Systems & Streaming Infrastructure teams
I aim to build scalable, fault-tolerant, and intelligent data systems at global scale.

