About this Course
In this capstone course, you'll apply current data engineering approaches by completing a hands-on project. You'll analyze and explore solutions to complex problems commonly encountered when using Apache Spark’s data processing ecosystem in real-world settings: problems that require comprehensive, specialized knowledge and for which basic techniques would be suboptimal.
WHAT YOU’LL LEARN
- How to design and implement a data lake for a multichannel retail organization in Azure Data Lake and Azure Databricks using a multi-hop, medallion architecture
- Ways to ingest, transform and land big data workloads efficiently using Apache Spark
- How to build a feature data set for a machine learning model
- How to diagnose and fix common performance pitfalls in Spark jobs
- How to design, orchestrate and curate data sets based on business requirements
GET HANDS-ON EXPERIENCE
- Explore and transform semi-structured data sets at real scale in Azure Databricks using Apache Spark
- Write Airflow DAGs to orchestrate common data pipeline operations
- Use open-source Delta Lake to manage your data storage and perform common DDL operations