Introduction to Data Engineering

Course Details

This course can only be taken as part of the Certificate in Big Data Technologies.

About this Course


In this course, you'll get an introduction to the fundamental building blocks of big data engineering. You'll learn the foundational concepts of distributed computing, distributed data processing, data management and data pipelines. You'll also survey a variety of available data stack technologies and learn how to run a data processing workflow through a commonly used platform.

What You'll Learn

  • The fundamentals of modern big data stacks, their uses, advantages and limitations
  • How functional programming ideas help with building and using systems to store and process big data 
  • The foundations of the Hadoop ecosystem and newer successors such as Spark
  • The ins and outs of big data processing via multiple paradigms, both storage-bound and in-memory (Spark, Spark SQL, Delta Lake, Hive, SQL) 
  • The origins, uses and limitations of NoSQL stores (HBase, Redis, Elasticsearch, Cassandra, graph-processing systems, etc.) 

Get Hands-On Experience

  • Apply contemporary distributed computing frameworks to the storage, processing and analysis of large data sets 
  • Use the MapReduce model and the Spark framework on big data problems 
  • Apply principles of functional programming to data storage and analysis 
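
To give a flavor of the MapReduce model and the functional style mentioned above, here is a minimal single-machine sketch in plain Python. It is not course material: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration, and a real framework such as Hadoop or Spark would distribute these steps across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as a framework would
    # do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: fold the values for one key into a single total.
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    # Hypothetical driver: run map, shuffle and reduce in sequence.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big pipelines"]))
```

Each phase is a pure function with no shared state, which is the property that lets distributed frameworks run the map and reduce steps in parallel on separate partitions of the data.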

Program Overview

This course is part of the Certificate in Big Data Technologies.
