Data Engineering Portfolio

Dipesh Luitel

B.Sc. CSIT · Tribhuvan University · 2026 Graduate

Languages
Python · SQL (T-SQL) · Pandas
Data Engineering
Apache Kafka · ETL Pipelines · Medallion Architecture · Data Warehousing
Infrastructure
Docker · SQL Server · PostgreSQL · Git
Featured Projects
Project 01

Real-Time Crypto Streaming Platform

Live BTC & ETH price pipeline with Apache Kafka and Docker
dipeshluitel/crypto-data-platform

A production-style real-time data pipeline that streams live cryptocurrency prices from the CryptoCompare API into Apache Kafka, processes them with a Python consumer, and persists the output for downstream analysis — all containerised with Docker Compose for a one-command setup.

Pipeline Architecture
CryptoCompare API → Kafka Producer → crypto_prices topic → Kafka Consumer → JSON Files + OHLCV CSV
Kafka Message Schema
Field       Type     Description
coin        string   BTC or ETH symbol
price       float    Current USD price
timestamp   float    Unix timestamp of fetch
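
As an illustration of the schema above, here is a minimal Python sketch of one message as a typed record. The field names come from the table; the class name PriceTick, the helper functions, the positive-price check, and the choice of JSON as the wire format are assumptions made for this example, not details confirmed by the repository.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PriceTick:
    """One message on the crypto_prices topic (fields from the schema table)."""
    coin: str         # "BTC" or "ETH"
    price: float      # current USD price
    timestamp: float  # Unix timestamp of the fetch

    def __post_init__(self):
        # Validation rules are assumptions for this sketch.
        if self.coin not in ("BTC", "ETH"):
            raise ValueError(f"unsupported coin: {self.coin!r}")
        if self.price <= 0:
            raise ValueError(f"price must be positive, got {self.price}")


def encode_tick(tick: PriceTick) -> tuple[bytes, bytes]:
    """Return (key, value) bytes; the coin symbol serves as the message key."""
    key = tick.coin.encode("utf-8")
    value = json.dumps(asdict(tick)).encode("utf-8")
    return key, value


def decode_tick(value: bytes) -> PriceTick:
    """Parse a message value back into a validated PriceTick."""
    data = json.loads(value.decode("utf-8"))
    return PriceTick(
        coin=data["coin"],
        price=float(data["price"]),
        timestamp=float(data["timestamp"]),
    )


if __name__ == "__main__":
    tick = PriceTick(coin="BTC", price=64250.5, timestamp=time.time())
    key, value = encode_tick(tick)
    print(key, value)
    print(decode_tick(value) == tick)
```

Validating at both ends means a malformed record fails loudly at the producer or consumer instead of reaching the output files.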
Key Engineering Decisions
  • Messages are keyed by coin symbol, so every update for an asset lands on the same partition and stays in order, while consumers in a group can split the partitions between them
  • Producer publishes every 5 seconds — balancing API rate limits with near-real-time freshness
  • Historical OHLCV ingestion (30 days) decoupled from streaming path, allowing independent replay
  • Docker Compose bundles Kafka + Zookeeper — zero-config local setup for collaborators
  • Separate streaming/ and batch/ folders reflect clear separation of processing modes
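
The coin-keyed partitioning decision can be sketched in a few lines of Python. This is a stand-in, not the project's code: Kafka's default partitioner hashes the key with murmur2, whereas this example uses CRC32 so it needs only the standard library, and the partition count of 3 is a hypothetical value.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the crypto_prices topic


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition deterministically.

    Stand-in for Kafka's default partitioner (murmur2 hash of the key);
    CRC32 is used here only to keep the sketch stdlib-only.
    """
    return zlib.crc32(key) % num_partitions


def route(messages):
    """Group (key, value) pairs by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    stream = [
        (b"BTC", 64250.5),
        (b"ETH", 3100.2),
        (b"BTC", 64251.0),
        (b"ETH", 3100.9),
    ]
    for partition, msgs in sorted(route(stream).items()):
        print(partition, msgs)
```

Because the mapping depends only on the key, all BTC updates stay on one partition in arrival order. With only two keys, at most two partitions receive traffic, so the parallelism gained this way is bounded by the number of tracked assets.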
Technologies
Apache Kafka · Python · Docker · Zookeeper · Pandas · REST API · JSON / CSV
Project 02

SQL Data Warehouse

Medallion Architecture on SQL Server — Bronze → Silver → Gold
dipeshluitel/SQL-Data-Warehouse-Project

A production-ready data warehouse built on Microsoft SQL Server using the Medallion Architecture pattern. Raw CRM and ERP data (CSV) flows through three distinct layers — each with well-defined responsibilities — into business-ready Gold tables optimised for reporting and analytics.

Medallion Layers
Bronze · Raw → Silver · Clean → Gold · Business
Layer Responsibilities
  • Bronze — Raw ingestion from CRM & ERP CSV files; preserves source-of-truth without transformation
  • Silver — Deduplication, null handling, type corrections, and standardisation across source systems
  • Gold — Aggregated, business-logic-applied tables ready for dashboards and decision-making
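
To make the Silver responsibilities concrete, here is a small Python sketch of the same kinds of cleaning steps. The project itself implements them in T-SQL; the column names (customer_id, name, country, created_at), the defaults, and the keep-latest deduplication rule are all hypothetical choices for illustration.

```python
from datetime import date


def _text(value, default):
    """Strip whitespace; fall back to a default for missing or empty values."""
    cleaned = (value or "").strip()
    return cleaned if cleaned else default


def clean_customers(bronze_rows):
    """Turn raw string rows (as read from CSV) into typed, de-duplicated rows."""
    latest = {}
    for row in bronze_rows:
        raw_id = _text(row.get("customer_id"), "")
        if not raw_id.isdigit():
            continue  # null handling: skip rows without a usable key
        record = {
            "customer_id": int(raw_id),                         # type correction
            "name": _text(row.get("name"), "Unknown").title(),  # standardisation
            "country": _text(row.get("country"), "n/a").upper(),
            "created_at": date.fromisoformat(row["created_at"].strip()),
        }
        # Deduplication: keep the most recent record per customer_id.
        current = latest.get(record["customer_id"])
        if current is None or record["created_at"] >= current["created_at"]:
            latest[record["customer_id"]] = record
    return sorted(latest.values(), key=lambda r: r["customer_id"])


if __name__ == "__main__":
    bronze = [
        {"customer_id": " 101", "name": "  alice SMITH ", "country": "np", "created_at": "2024-01-05"},
        {"customer_id": "101", "name": "Alice Smith", "country": "NP", "created_at": "2024-03-01"},
        {"customer_id": "", "name": "ghost", "country": "US", "created_at": "2024-02-01"},
        {"customer_id": "102", "name": None, "country": None, "created_at": "2024-02-10"},
    ]
    for row in clean_customers(bronze):
        print(row)
```

The Bronze rows are never modified here, which mirrors the layer contract: raw data stays untouched and each downstream layer can be rebuilt from it.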
Engineering Highlights
  • Stored procedures (load_bronze, load_silver, load_gold) encapsulate each layer's load logic — clean and replayable
  • Consistent naming convention (sourcesystem_entity) enforced across all tables for clear data lineage
  • Full load (Truncate & Load) strategy — simple, auditable, and suitable for batch refresh cycles
  • Separate tests/ folder keeps data quality validation apart from the load logic
  • Architecture documented with Draw.io diagrams for clear onboarding and handoff
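
The full-load pattern and a basic quality check can be sketched with Python's built-in sqlite3 module as a stand-in for SQL Server. Several things here are assumptions for illustration: the table names, the two-column layout, and the check itself. SQLite has no TRUNCATE TABLE, so DELETE FROM plays that role, and the Python function names only echo the stored procedure names listed above.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical Bronze and Silver tables (sourcesystem_entity naming)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS bronze_crm_customers (customer_id TEXT, name TEXT);
        CREATE TABLE IF NOT EXISTS silver_crm_customers (customer_id INTEGER, name TEXT);
        """
    )


def load_bronze(conn, rows):
    """Full load: empty the Bronze table, then insert the raw rows as-is."""
    with conn:  # one transaction, so a failed load leaves the old data in place
        conn.execute("DELETE FROM bronze_crm_customers")  # stand-in for TRUNCATE TABLE
        conn.executemany(
            "INSERT INTO bronze_crm_customers (customer_id, name) VALUES (?, ?)", rows
        )


def load_silver(conn):
    """Full load: rebuild Silver from Bronze with trimming, typing, and deduplication."""
    with conn:
        conn.execute("DELETE FROM silver_crm_customers")
        conn.execute(
            """
            INSERT INTO silver_crm_customers (customer_id, name)
            SELECT DISTINCT CAST(TRIM(customer_id) AS INTEGER), TRIM(name)
            FROM bronze_crm_customers
            WHERE customer_id IS NOT NULL AND TRIM(customer_id) <> ''
            """
        )


def check_silver(conn):
    """Data quality check: count NULL or duplicated keys in Silver (expected 0)."""
    (bad,) = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT customer_id FROM silver_crm_customers
            GROUP BY customer_id
            HAVING COUNT(*) > 1 OR customer_id IS NULL
        )
        """
    ).fetchone()
    return bad


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    raw = [("101", "Alice "), ("101", "Alice"), ("102", "Bob"), (None, "Orphan")]
    for _ in range(2):  # running the loads twice gives the same result
        load_bronze(conn, raw)
        load_silver(conn)
    print(conn.execute("SELECT * FROM silver_crm_customers ORDER BY customer_id").fetchall())
    print("bad keys:", check_silver(conn))
```

Because each load starts by emptying its target, rerunning it does not accumulate duplicates, which is what makes the strategy replayable. The trade-off is that every refresh reprocesses the full dataset.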
Technologies
SQL Server · T-SQL · Stored Procedures · Medallion Architecture · ETL · Draw.io