About this Course
This course focuses on the methods used to acquire, store and process data for downstream analysis. You’ll analyze and compare available technologies to make informed decisions as a data engineer. You’ll also explore the modern cloud data platform, building systems that carry data from ingestion through storage and processing to, ultimately, serving.
What You’ll Learn
- How a data lake (augmented with a storage layer like Delta Lake, which integrates with Spark) can enhance the usability of your organization’s data
- Batch and stream processing using Spark, Flink and other processing tools
- How to use Kafka to enable low-latency and real-time processing
- The unified log model and the ways the log abstraction recurs in building robust, fault-tolerant distributed data systems
- Data acquisition, data governance and modeling techniques
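To make the unified log idea above concrete, here is a minimal, toy sketch in plain Python (no Kafka involved) of an append-only log with independent per-consumer offsets. The class and method names are illustrative assumptions, not part of the course materials; real systems like Kafka add partitioning, durability and replication on top of this core idea.

```python
class UnifiedLog:
    """Toy append-only log: records are immutable and totally ordered."""

    def __init__(self):
        self._records = []   # the log itself: an ordered sequence of records
        self._offsets = {}   # each consumer's independent read position

    def append(self, record):
        """Producers only ever append; existing records are never modified."""
        self._records.append(record)
        return len(self._records) - 1   # offset of the new record

    def read(self, consumer, max_records=10):
        """Consumers track their own offsets, so they stay decoupled."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def replay(self, consumer):
        """Fault tolerance: a recovering consumer rewinds and reprocesses."""
        self._offsets[consumer] = 0


log = UnifiedLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

print(log.read("analytics"))   # ['signup', 'click', 'purchase']
log.replay("analytics")        # after a crash, rewind and re-read
print(log.read("analytics"))   # ['signup', 'click', 'purchase']
```

Because the log is the single source of truth and reads are just offset lookups, any number of consumers can process the same events at their own pace, and recovery is simply replay from an earlier offset.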
Get Hands-On Experience
- Organize and store data in a data lake and handle updates and changes to your data
- Use Spark to connect to different data sources and process batch and streaming data
- Design, build and integrate a complete end-to-end data pipeline to support a realistic business case
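The hands-on work above builds an ingest–process–serve pipeline with tools like Spark. As a rough shape of what that means, here is a deliberately tiny stdlib-only sketch; the CSV data, field names and `revenue_for` function are all hypothetical stand-ins, not the course's actual pipeline.

```python
import csv
import io
from collections import defaultdict

# Ingest: raw CSV arriving from a source system (inlined here for illustration).
raw = io.StringIO("order_id,country,amount\n1,DE,20.0\n2,DE,5.5\n3,US,12.0\n")

# Process: parse records, skip malformed rows, aggregate revenue per country.
revenue = defaultdict(float)
for row in csv.DictReader(raw):
    try:
        revenue[row["country"]] += float(row["amount"])
    except (KeyError, ValueError):
        continue   # a real pipeline would route bad records to a dead-letter store

# Serve: expose the aggregate to downstream consumers.
def revenue_for(country):
    return round(revenue.get(country, 0.0), 2)

print(revenue_for("DE"))   # 25.5
print(revenue_for("US"))   # 12.0
```

A production pipeline swaps each stage for a scalable component (e.g. Kafka for ingestion, Spark for processing, a warehouse or API for serving), but the stage boundaries stay the same.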