Data engineering course

Master the basics of data engineering: designing systems and infrastructure for effective data management.

Framework

Data engineering is one of the most important disciplines of the twenty-first century, and it is essential for building scalable, efficient, and fault-tolerant data pipelines.

Program Breakdown

Who is this for?

This course is designed for data engineers who want to master the tools and techniques used in modern data engineering. We'll cover HDFS as the foundation of big data, and data processing with Hive, Impala, and Spark. We'll then look at advanced data storage technologies such as HBase and Kudu, and finish the training with job-orchestration tools such as Airflow.

Program goal - What you will take away from the course

You'll learn how to design and build robust data pipelines that can handle large volumes of data, perform complex data transformations, and integrate with a wide range of data sources and destinations. With hands-on exercises and real-world examples, you'll gain practical experience working with these powerful tools and be ready to apply your new skills to your own data engineering projects.

Topics covered

Introduction to data engineering and big data 

Learn the fundamentals of data engineering and big data to efficiently handle data processing and analysis.

Hadoop Distributed File System (HDFS) and Hadoop ecosystem tools 

Gain knowledge about the Hadoop Distributed File System and Hadoop ecosystem tools used for managing and processing large volumes of data.

Apache Hive and Impala for SQL-based data processing 

Discover Apache Hive and Impala for enabling SQL-based data processing in big data environments.

Apache HBase and Apache Kudu for NoSQL data storage 

Learn about Apache HBase and Apache Kudu for implementing NoSQL data storage in distributed environments.

Apache Spark for distributed data processing and analytics 

Explore Apache Spark as a powerful framework for distributed data processing and analytics in big data scenarios.

Airflow for data pipeline orchestration and scheduling 

Master Airflow for efficiently orchestrating and scheduling data pipelines, ensuring smooth data flow.

Best practices for data engineering development, deployment, and monitoring 

Understand the best practices for data engineering project development, deployment, and monitoring to achieve optimal results.

Meet the Creators

Matthias Baumann

Chief Technology Officer & Principal Big Data Solutions Architect Lead, Ultra Tendency

Marvin Taschenberger

Professional Software Architect, Ultra Tendency

Hudhaifa Ahmed

Senior Lead Big Data Developer & Berlin Territory Manager, Ultra Tendency

Unlock the Ultra Tendency program to help your team deliver meaningful impact today.

Frequently Asked Questions