Build a Modern Enterprise-Grade Data Lakehouse with Open Source Tools
This 3-day intensive course guides participants through the creation of a modern, enterprise-grade data lakehouse architecture using only open-source technologies. The course adopts a fully hands-on, learning-by-doing approach, where participants build a complete, end-to-end data platform from scratch.
From ingestion and transformation to data storage, quality assurance, querying, and visualization, learners will design and deploy their own modular data stack centered on data lakehouse principles, combining the scalability of data lakes with the structure and governance of data warehouses.
Throughout the course, participants will work on practical, scenario-based projects using datasets from the Luxembourg open data ecosystem, simulating real-life challenges such as combining diverse data sources, maintaining data quality, and ensuring version control.
By the end of the training, participants will have gained real-world experience with industry-leading open-source tools including Airbyte, Apache Airflow, dbt, Soda, MinIO, Apache Iceberg, Nessie, Dremio, and Apache Superset. The course emphasizes governance, scalability, and cost-efficiency, equipping learners with the skills to implement future-proof, vendor-independent data platforms.
Content
- Data ingestion with Airbyte from files, databases, REST APIs, and open data sources
- Storage using Apache Iceberg & MinIO (sketched below)
- Data modeling & transformation with dbt
- Pipeline orchestration with Apache Airflow (sketched below)
- Data quality control with Soda (sketched below)
- Data versioning and branching with Nessie
- SQL-based querying via Dremio
- Interactive analytics with Apache Superset
- Local deployment of the full stack via Docker Compose
- Hands-on project work using datasets from the open data ecosystem
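To give a flavour of the stack, here is a minimal sketch of reading an Apache Iceberg table through Nessie's Iceberg REST endpoint, with the table files stored in MinIO. The endpoints, credentials, and the `bronze.air_quality` table are illustrative assumptions for a local Docker Compose setup, not course material:

```python
# Minimal sketch: read an Iceberg table via Nessie's Iceberg REST
# endpoint, with data files in MinIO. All URIs, credentials, and the
# table name are assumptions for a local Docker Compose setup.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "nessie",
    **{
        "type": "rest",
        "uri": "http://localhost:19120/iceberg",  # Nessie's Iceberg REST endpoint (assumed port)
        "s3.endpoint": "http://localhost:9000",   # MinIO (assumed port)
        "s3.access-key-id": "minioadmin",         # MinIO dev-default credentials
        "s3.secret-access-key": "minioadmin",
    },
)

# Load a (hypothetical) table and pull a small sample into pandas
table = catalog.load_table("bronze.air_quality")
print(table.scan(limit=10).to_pandas())
```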
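Pipeline orchestration is handled by Apache Airflow. The sketch below, under assumed paths and names, chains an Airbyte sync, a dbt run, and a Soda scan into one daily DAG:

```python
# Minimal sketch of a daily Airflow DAG: ingest with Airbyte, transform
# with dbt, validate with Soda. Script paths, the dbt project directory,
# and the Soda data source name are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="lakehouse_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Trigger the Airbyte connection sync (assumed helper script)
    ingest = BashOperator(
        task_id="airbyte_sync",
        bash_command="python /opt/scripts/trigger_airbyte_sync.py",
    )

    # Build the dbt models on top of the raw tables
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/lakehouse",
    )

    # Run the SodaCL checks against the transformed tables
    quality = BashOperator(
        task_id="soda_scan",
        bash_command="soda scan -d lakehouse -c /opt/soda/configuration.yml /opt/soda/checks.yml",
    )

    ingest >> transform >> quality
```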
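Soda checks can also be driven programmatically from Python via soda-core, which is handy inside an orchestrated task. A minimal sketch, assuming a configured `lakehouse` data source and a hypothetical `air_quality` table:

```python
# Minimal sketch: run SodaCL checks with the soda-core Python API.
# The data source name, configuration file, and checked table/columns
# are illustrative assumptions.
from soda.scan import Scan

scan = Scan()
scan.set_data_source_name("lakehouse")                 # assumed data source
scan.add_configuration_yaml_file("configuration.yml")  # connection settings

# Inline SodaCL: basic completeness checks on a hypothetical table
scan.add_sodacl_yaml_str(
    """
checks for air_quality:
  - row_count > 0
  - missing_count(station_id) = 0
"""
)

exit_code = scan.execute()  # 0 means all checks passed
print(scan.get_scan_results())
```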
Learning Outcomes
Upon completion of the course, learners will be able to:
- Understand the architecture of a modern open-source data stack
- Set up and operate ingestion pipelines using Airbyte
- Build transformation workflows with dbt
- Implement data versioning using Nessie and Apache Iceberg
- Validate and test data using Soda
- Query and visualize data using Dremio and Apache Superset
- Deploy a fully functional data platform using Docker Compose
- Evaluate when and how to use these tools in a real enterprise context
Training Method
- Instructor-led, fully hands-on
- Learning-by-doing, guided project work
- A fictional enterprise scenario, powered by open data sources, serves as the backbone for building a real-world data lakehouse
Certification
Certificate of Participation
Prerequisites
- Familiarity with SQL and basic command-line usage
- (Optional) Some exposure to data pipelines and concepts such as ELT/ETL
- (Optional) Familiarity with Python and Docker is helpful
Planning and location
Day 1: 09:00 - 17:00
Day 2: 09:00 - 17:00
Day 3: 09:00 - 17:00
Your trainer(s) for this course
Bruno WOZNIAK
A seasoned tech executive turned expert in digital transformation, innovation, and data-driven sense-making, I am driven by a passion for giving back and empowering others through practical learning. I create hands-on training experiences focused on real-world application: with near-real-life cases and mini-projects, participants gain actionable insights and skills they can put into practice immediately.