real projects · hybrid · intermediate
Data Ingestion Pipelines
Move files through validation, transforms, and quarantine lanes with observability baked in.
5 weeks · 34 guided hours · weekday evenings · 16,500 THB (informational)
Tool stack
Python · pandas · pyarrow
Description
Bridgemesh pipeline labs emphasize parquet-friendly transforms, checksum gates, and bilingual column naming conventions. You will compare batch windows against Songkran holiday quiet periods to practice realistic scheduling conversations.
What is included
- Deterministic transforms with unit-tested edge cases
- Quarantine folders with human-readable reasons.json
- Checksum and row-count reconciliation reports
- Partitioning strategies for humid-market retail spikes
- Lightweight orchestration without heavyweight platforms
- Cost notes for cloud storage choices
Outcomes
- Stand up a three-stage pipeline with documented rollback
- Present metrics your data stakeholders can trust
- Ship a mentor-reviewed incident replay from a failed ingest
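A three-stage pipeline with rollback, in the lightweight single-node spirit of this track, can be sketched as below; the structure is our own illustration, not the course scaffold:

```python
from typing import Any, Callable

Stage = tuple[str, Callable[[Any], Any], Callable[[], None]]

def run_pipeline(stages: list[Stage], payload: Any) -> Any:
    """Run (name, apply, rollback) stages in order; on failure, unwind in reverse."""
    completed: list[Stage] = []
    try:
        for name, apply, rollback in stages:
            payload = apply(payload)
            completed.append((name, apply, rollback))
        return payload
    except Exception:
        # Roll back only the stages that actually ran, newest first.
        for name, _, rollback in reversed(completed):
            rollback()
        raise
```

Keeping rollback as a plain callable per stage is what makes the rollback path documentable: each stage's undo step is named and testable on its own.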
FAQ
Is Spark included?
No. This track stays within single-node Python ergonomics.
Hardware expectations?
16 GB RAM is recommended for the larger parquet labs; cloud notebooks are available with usage caps.
Honest limitation?
We do not tune warehouse engines; focus stays on ingestion hygiene.
Experience notes
“Quarantine JSON idea landed in our finance upload lane the week after capstone.”