Big Data Systems (WT 2019/20)

Prof. Dr. Tilmann Rabl

The amount of data that can be generated and stored in academic and industrial projects and applications is increasing rapidly. Big data analytics technologies have established themselves as a solution to the scalability problems of traditional database systems. The vast amounts of newly collected data, however, are usually not as easy to analyze as curated, structured data in a data warehouse. Typically, these data are noisy, vary in format and velocity, and need to be analyzed with techniques from statistics and machine learning rather than pure SQL-like aggregations and drill-downs. Moreover, the results of the analyses are frequently models that are used for decision making and prediction. The complete process of big data analysis is described as a pipeline, which includes data recording, cleaning, integration, modeling, and interpretation.

In this lecture, we will discuss big data systems, i.e., infrastructures that are used to handle all steps in typical big data processing pipelines.

Successor of this series: Big Data Systems (WT 2020/21)

Lectures

Introduction

Prof. Dr. Tilmann Rabl

Date: October 15, 2019
Language: English
Duration: 01:15:49

Introduction	01:15:49
Big Data	00:24:36
Data Science	00:20:58
Course Logistics	00:30:15

Database Systems Recap

Prof. Dr. Tilmann Rabl

Date: October 22, 2019
Language: English
Duration: 01:29:32

Database Systems Recap	01:29:32
Announcements	00:07:15
Relational Databases	00:21:30
ER Model, Relational Schema and Instance	00:09:41
Normal Forms	00:13:02
Delete Anomaly	00:14:43
Structured Query Language	00:23:21

RDBMS Internals

Prof. Dr. Tilmann Rabl

Date: October 24, 2019
Language: English
Duration: 01:12:54

RDBMS Internals	01:12:54
Memory Hierarchy	00:13:08
Bottom Up	00:11:26
Access Methods	00:20:32
Hashing	00:17:47
Query Processing	00:10:01

Big Data Stack

Prof. Dr. Tilmann Rabl

Date: November 5, 2019
Language: English
Duration: 01:30:37

Big Data Stack	01:30:37
Quick Survey	00:11:41
Big Data Stack and its Evolution	00:22:36
Applications	00:00:00
Big Data Processing Evolution	00:56:20

Benchmarking and Measurement

Prof. Dr. Tilmann Rabl

Date: November 12, 2019
Language: English
Duration: 01:27:04

Benchmarking and Measurement	01:27:04
Why Measure?	00:26:20
Basic Terminology	00:28:01
Sample vs Population	00:12:58
Paired Observation	00:19:45

Benchmarks

Prof. Dr. Tilmann Rabl

Date: November 14, 2019
Language: English
Duration: 01:27:59

Benchmarks	01:27:59
Announcements and Recap	00:11:14
Benchmarks	00:16:45
TPC to the rescue!	00:14:38
BigBench	00:22:05
Data Centers	00:23:17

Cloud Computing

Prof. Dr. Tilmann Rabl

Date: November 20, 2019
Language: English
Duration: 01:28:53

Cloud Computing	01:28:53
Virtualization	00:29:07
Scheduling	00:32:21
Cloud Services	00:27:25

Distributed File Systems

Prof. Dr. Tilmann Rabl

Date: November 26, 2019
Language: English
Duration: 01:29:13

Distributed File Systems	01:29:13
File Systems	00:30:33
Network File System	00:13:47
Google File System	00:22:21
Hadoop Distributed File System	00:22:32

Map Reduce

Prof. Dr. Tilmann Rabl

Date: November 28, 2019
Language: English
Duration: 01:21:21

Map Reduce	01:21:21
MapReduce	00:20:05
Shuffling / Sorting Stage	00:21:51
Multi-Phase MMS	00:21:39
Fault Tolerance	00:17:46

Map Reduce 2

Prof. Dr. Tilmann Rabl

Date: December 3, 2019
Language: English
Duration: 01:28:01

Map Reduce 2	01:28:01
Map Reduce Stack	00:22:57
SQL on MR	00:26:00
Apache Spark	00:39:04

Wide Column Stores

Prof. Dr. Tilmann Rabl

Date: December 10, 2019
Language: English
Duration: 01:26:35

Wide Column Stores	01:26:35
Key-Value Stores	00:18:37
Data Model Design Principles	00:19:03
Common Properties of KV-Stores	00:21:25
Three-Phase Commit	00:10:53
CAP Theorem - Overview	00:16:37

Key Value Stores

Prof. Dr. Tilmann Rabl

Date: December 12, 2019
Language: English
Duration: 01:30:15

Key Value Stores	01:30:15
Data Storage	00:31:13
Distributed Architecture	00:39:24
BigTable / HBase	00:19:38

Key Value Stores & Stream Processing Systems I

Prof. Dr. Tilmann Rabl

Date: December 17, 2019
Language: English
Duration: 01:25:38

Key Value Stores & Stream Processing Systems I	01:25:38
BigTable / HBase	00:35:32
Cassandra	00:55:01
Stream Processing	00:30:37

Databases On Modern Hardware

Dr. Sebastian Breß

Date: January 9, 2020
Language: English
Duration: 01:25:10

Databases On Modern Hardware	01:25:10
Traditional Database Systems	00:10:12
In-Memory Databases On Modern Hardware	00:18:54
Performance Limitation of Modern Processors	00:17:09
Hazards	00:00:00
Prediction	00:00:00

Stream Processing Systems I - Part 2

Prof. Dr. Tilmann Rabl

Date: January 9, 2020
Language: English
Duration: 01:30:26

Stream Processing Systems I - Part 2	01:30:26
Processing Windows	00:12:31
Windowed Join	00:28:34
Efficient Window Aggregation	00:16:22
What makes a system a stream processing system	00:32:59

Ad-hoc Stream Query Processing & Stream Processing Systems 1

Prof. Dr. Tilmann Rabl, Jeyhun Karimov

Date: January 14, 2020
Language: English
Duration: 01:28:26

Ad-hoc Stream Query Processing & Stream Processing Systems 1	01:28:26
Challenges	00:25:04
AJoin Architecture	00:14:59
Join Reordering	00:20:13
Apache Storm	00:28:10

Machine Learning Systems - Introduction

Prof. Dr. Tilmann Rabl

Date: January 16, 2020
Language: English
Duration: 01:24:35

Machine Learning Systems - Introduction	01:24:35
Motivation	00:26:13
ML Systems - Overview	00:37:08
Stack of ML Systems	00:21:14

Machine Learning Systems - Introduction - Part 2

Prof. Dr. Tilmann Rabl

Date: January 23, 2020
Language: English
Duration: 01:26:14

Machine Learning Systems - Introduction - Part 2	01:26:14
Announcements	00:02:27
Stack of ML Systems	00:18:42
Language Abstractions & System Architectures	00:27:20
Execution Strategies	00:03:56
Data Parallel Execution	00:20:53
Task Parallel Execution	00:06:48
Data-Parallel Parameter Server	00:06:08