Big Data Systems (WT 2020/21)

Prof. Dr. Tilmann Rabl

The amount of data that can be generated and stored in academic and industrial projects and applications is increasing rapidly. Big data analytics technologies have established themselves as a solution to the scalability problems of traditional database systems. The vast amounts of newly collected data, however, are usually not as easy to analyze as curated, structured data in a data warehouse. Typically, these data are noisy, vary in format and velocity, and need to be analyzed with techniques from statistics and machine learning rather than pure SQL-like aggregations and drill-downs. Moreover, the results of the analyses are frequently models that are used for decision making and prediction. The complete process of big data analysis is described as a pipeline, which includes data recording, cleaning, integration, modeling, and interpretation.

In this lecture, we will discuss big data systems, i.e., infrastructures that are used to handle all steps in typical big data processing pipelines. We will learn about data center infrastructure and scale-out software systems. The software discussed will cover the full big data stack, i.e., distributed file systems, MapReduce, key-value stores, stream processing, graph processing, and ML systems.

Successor of this series: Big Data Systems (WS 2021/22)
Predecessor of this series: Big Data Systems (WT 2019/20)

Introduction

Welcome

Prof. Dr. Tilmann Rabl

Date: October 2, 2020
Language: English
Duration: 00:02:09

Welcome	00:02:09

Introduction

Prof. Dr. Tilmann Rabl

Date: November 3, 2020
Language: English
Duration: 01:18:38

Introduction	01:18:38

Big Data Stack

Introduction

Prof. Dr. Tilmann Rabl

Date: November 6, 2020
Language: English
Duration: 00:14:49

Introduction	00:14:49

The Big Data Stack

Prof. Dr. Tilmann Rabl

Date: November 6, 2020
Language: German
Duration: 00:10:15

The Big Data Stack	00:10:15

Google's Big Data Stack

Prof. Dr. Tilmann Rabl

Date: November 6, 2020
Language: German
Duration: 00:19:42

Google's Big Data Stack	00:19:42

Open Source Big Data Stack

Prof. Dr. Tilmann Rabl

Date: November 6, 2020
Language: German
Duration: 00:15:59

Open Source Big Data Stack	00:15:59

Further Evolution

Prof. Dr. Tilmann Rabl

Date: November 6, 2020
Language: German
Duration: 00:07:22

Further Evolution	00:07:22

MapReduce I

Introduction

Prof. Dr. Tilmann Rabl

Date: November 16, 2020
Language: English
Duration: 00:06:23

Introduction	00:06:23

Map - Sort - Reduce

Prof. Dr. Tilmann Rabl

Date: November 16, 2020
Language: English
Duration: 00:14:47

Map - Sort - Reduce	00:14:47

Sorting in Detail

Prof. Dr. Tilmann Rabl

Date: November 16, 2020
Language: English
Duration: 00:16:25

Sorting in Detail	00:16:25

MapReduce Architecture

Prof. Dr. Tilmann Rabl

Date: November 16, 2020
Language: English
Duration: 00:21:51

MapReduce Architecture	00:21:51

MR Algorithms

Prof. Dr. Tilmann Rabl

Date: November 16, 2020
Language: English
Duration: 00:11:26

MR Algorithms	00:11:26

Data Center and Cloud Computing

Introduction

Prof. Dr. Tilmann Rabl

Date: November 23, 2020
Language: English
Duration: 00:24:27

Introduction	00:24:27

Virtualization

Prof. Dr. Tilmann Rabl

Date: November 23, 2020
Language: English
Duration: 00:16:29

Virtualization	00:16:29

Scheduling

Prof. Dr. Tilmann Rabl

Date: November 23, 2020
Language: English
Duration: 00:20:53

Scheduling	00:20:53

Cloud Computing

Prof. Dr. Tilmann Rabl

Date: November 23, 2020
Language: English
Duration: 00:14:12

Cloud Computing	00:14:12

Cloud Applications

Prof. Dr. Tilmann Rabl

Date: November 23, 2020
Language: English
Duration: 00:12:00

Cloud Applications	00:12:00

Distributed File Systems

Basics of File Systems

Prof. Dr. Tilmann Rabl

Date: November 25, 2020
Language: English
Duration: 00:22:37

Basics of File Systems	00:22:37

Network File System

Prof. Dr. Tilmann Rabl

Date: November 25, 2020
Language: English
Duration: 00:16:52

Network File System	00:16:52

Google File System

Prof. Dr. Tilmann Rabl

Date: November 25, 2020
Language: English
Duration: 00:20:13

Google File System	00:20:13

Hadoop Distributed File System

Prof. Dr. Tilmann Rabl

Date: November 25, 2020
Language: English
Duration: 00:11:03

Hadoop Distributed File System	00:11:03

Erasure Coding

Prof. Dr. Tilmann Rabl

Date: November 25, 2020
Language: English
Duration: 00:09:46

Erasure Coding	00:09:46

(Big Data) File Format

Prof. Dr. Tilmann Rabl

Date: November 25, 2020
Language: English
Duration: 00:13:57

(Big Data) File Format	00:13:57

MapReduce II

Prof. Dr. Tilmann Rabl

Date: November 30, 2020
Language: English
Duration: 00:13:12

MapReduce II	00:13:12

Guest Lecture

Relational & Big-Data Processing in the Enterprise - Bridging the Gap

Dr. Alexander Böhm

Date: January 6, 2021
Language: English
Duration: 00:57:17

Relational & Big-Data Processing in the Enterprise - Bridging the Gap	00:57:17

Graph Databases

Pedro Silva

Date: January 7, 2021
Language: English
Duration: 02:59:18

Graph Databases	02:59:18

Machine Learning Systems

Introduction

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:18:29

Introduction	00:18:29

Machine Learning Models

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:15:58

Machine Learning Models	00:15:58

ML System Stack

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:28:02

ML System Stack	00:28:02

Language Abstraction & System Architectures

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:18:46

Language Abstraction & System Architectures	00:18:46

SystemML

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:28:52

SystemML	00:28:52

Execution Strategies

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:44:08

Execution Strategies	00:44:08

Data-Parallel Parameter Server

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:22:05

Data-Parallel Parameter Server	00:22:05

Federated Machine Learning

Prof. Dr. Tilmann Rabl

Date: January 18, 2021
Language: German
Duration: 00:13:48

Federated Machine Learning	00:13:48

Modern Hardware

Modern Hardware I

Prof. Dr. Tilmann Rabl

Date: January 26, 2021
Language: English
Duration: 00:19:31

Modern Hardware I	00:19:31

Data Processing on GPUs

Ilin Tolovski

Date: January 26, 2021
Language: English
Duration: 00:37:26

Data Processing on GPUs	00:37:26

Intro to Persistent Memory I

Lawrence Benson

Date: January 26, 2021
Language: English
Duration: 00:10:23

Intro to Persistent Memory I	00:10:23

Intro to Persistent Memory II

Lawrence Benson

Date: January 26, 2021
Language: English
Duration: 00:20:25

Intro to Persistent Memory II	00:20:25

A Brief Introduction to RDMAs

Pedro Silva

Date: January 26, 2021
Language: English
Duration: 00:22:00

A Brief Introduction to RDMAs	00:22:00

Benchmarking & Measurement

Introduction

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:08:49

Introduction	00:08:49

Back of the Envelope Calculation

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:12:06

Back of the Envelope Calculation	00:12:06

Measurements & Metrics

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:14:22

Measurements & Metrics	00:14:22

Some Statistics

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:47:45

Some Statistics	00:47:45

Benchmarks

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:09:28

Benchmarks	00:09:28

Sort Benchmarks

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:16:25

Sort Benchmarks	00:16:25

BigBench / TPCx-BB - Big Data Benchmark

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:13:48

BigBench / TPCx-BB - Big Data Benchmark	00:13:48

Fair Benchmarking

Prof. Dr. Tilmann Rabl

Date: February 1, 2021
Language: English
Duration: 00:12:00

Fair Benchmarking	00:12:00