Data Engineering Capstone

About this Course

In this capstone course, you'll apply current data engineering approaches by completing a hands-on data engineering project. You'll analyze and explore solutions to complex problems commonly found in real-world applications of Apache Spark’s data processing ecosystem: problems that require comprehensive, specialized knowledge and where basic techniques would be suboptimal.

WHAT YOU’LL LEARN

How to design and implement a data lake for a multichannel retail organization in Azure Data Lake and Azure Databricks using a multi-hop (medallion) architecture
Ways to efficiently ingest, transform and land big data workloads using Apache Spark
How to build a feature data set for a machine learning model
How to diagnose and fix common performance pitfalls in Spark jobs
How to design, orchestrate and curate data sets based on business requirements

GET HANDS-ON EXPERIENCE

Explore and transform semi-structured data sets at real scale in Azure Databricks using Apache Spark
Write Airflow DAGs to orchestrate common data pipeline operations
Use open-source Delta Lake to manage your data storage and perform common DDL operations

Quarter 1

Introduction to Data Engineering

Quarter 2

Building the Data Pipeline

Quarter 3

Data Engineering Capstone

Certificate in Building Modern Data Systems

Approved by the UW Paul G. Allen School of Computer Science & Engineering.
