5th Workshop on Big Data Benchmarking (2014)

Various Lecturers

The objective of the WBDB workshops is to make progress towards the development of industry-standard, application-level benchmarks for evaluating hardware and software systems for big data applications.

To make progress towards a big data benchmarking standard, the workshop will explore a range of issues including:

Data features: New feature sets of data, including high-dimensional data, sparse data, event-based data, and enormous data sizes.
System characteristics: System-level issues, including large-scale and evolving system configurations, shifting loads, and heterogeneous technologies for big data and cloud platforms.
Implementation options: Different implementation options such as SQL, NoSQL, Hadoop software ecosystem, and different implementations of HDFS.
Workload: Representative big data business problems and corresponding benchmark implementations. Specification of benchmark applications that represent the different modalities of big data, including graphs, streams, scientific data, and document collections.
Hardware options: Evaluation of new hardware options, including different types of HDD, SSD, and main memory, large-memory systems, and new platform options such as dedicated commodity clusters and cloud platforms.
Synthetic data generation: Models and procedures for generating large-scale synthetic data with requisite properties.
Benchmark execution rules: E.g. data scale factors, benchmark versioning to account for rapidly evolving workloads and system configurations, benchmark metrics.
Metrics for efficiency: Measuring the efficiency of the solution, e.g. based on costs of acquisition, ownership, energy and/or other factors, while encouraging innovation and avoiding benchmark escalations that favor large inefficient configurations over small efficient configurations.
Evaluation frameworks: Tool chains, suites and frameworks for evaluating big data systems.
Early implementations: Implementations of the Deep Analytics Pipeline or BigBench, and lessons learned in benchmarking big data applications.
Enhancements: Proposals to augment these benchmarks, e.g. by adding more data genres (e.g. graphs), or incorporating a range of machine learning and other algorithms, will be entertained and are encouraged.

Session 1

Welcome & Introduction to WBDB

Chaitan Baru, Dr. Matthias Uflacker

Date: August 5, 2014
Language: English
Duration: 00:13:02

Welcome & Introduction to WBDB	00:13:02
Chaitan Baru - Welcome	00:07:06
Matthias Uflacker - Introducing the HPI	00:05:56

An Approach to Benchmarking Industrial Big Data Applications

Umesh Dayal

Date: August 5, 2014
Language: English
Duration: 00:42:07

An Approach to Benchmarking Industrial Big Data Applications	00:42:07
Evolution of Big Data	00:10:04
Benchmarking Industrial Big Data Applications	00:16:16
Processing Requirements	00:11:36
Conclusion	00:04:11

In-Memory Processing in Healthcare and Life Sciences

Dominik Bertram

Date: August 5, 2014
Language: English
Duration: 00:29:08

In-Memory Processing in Healthcare and Life Sciences	00:29:08
Introducing the SAP Innovation Center	00:09:58
Selected SAP HANA Usage Scenarios	00:17:19
Integration with Electronic Medical Record	00:01:51

Session 2

Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?

Berni Schiefer

Date: August 5, 2014
Language: English
Duration: 00:22:04

Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?	00:22:04
Benchmarking	00:10:46
Conclusion	00:06:45
Summary	00:04:33

Extending The OLTP-Bench Framework for Big Data Systems

Djellel Eddine Difallah

Date: August 5, 2014
Language: English
Duration: 00:21:08

Extending The OLTP-Bench Framework for Big Data Systems	00:21:08
Motivation	00:10:10
All in All	00:10:19
Conclusion	00:00:39

Session 3

LDBC: Linked Data Benchmark Council

Andrey Gubichev

Date: August 5, 2014
Language: English
Duration: 00:28:07

LDBC: Linked Data Benchmark Council	00:28:07
What is the LDBC	00:11:01
Database Benchmark Design	00:16:05
Conclusion	00:01:01

SQL on Hadoop Benchmark

Prof. Dr. Tilmann Rabl

Date: August 5, 2014
Language: English
Duration: 00:07:01

SQL on Hadoop Benchmark	00:07:01
Motivation	00:06:19
Conclusion	00:00:42

Benchmarking Virtualized Hadoop Clusters

Todor Ivanov

Date: August 5, 2014
Language: English
Duration: 00:18:59

Benchmarking Virtualized Hadoop Clusters	00:18:59
Motivation	00:16:28
Summary	00:02:31

Towards A Complete BigBench Implementation

Prof. Dr. Tilmann Rabl

Date: August 5, 2014
Language: English
Duration: 00:15:41

Towards A Complete BigBench Implementation	00:15:41
The BigBench Proposal	00:07:10
Scaling	00:03:48
Metric	00:04:43

Session 5

A TU Delft Perspective on Benchmarking Big Data in the Data Center

Alexandru Iosup

Date: August 6, 2014
Language: English
Duration: 00:39:57

A TU Delft Perspective on Benchmarking Big Data in the Data Center	00:39:57
What is Cloud Computing	00:10:44
The Challenge	00:17:06
The data deluge	00:10:53
Conclusion	00:01:14

BW-EML SAP Standard Application Benchmark

Heiko Gerwens

Date: August 6, 2014
Language: English
Duration: 00:22:36

BW-EML SAP Standard Application Benchmark	00:22:36
SAP Standard Application Benchmarks	00:10:55
Data Model and Queries	00:09:04
Data Distribution	00:02:37

FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics

André Petermann

Date: August 6, 2014
Language: English
Duration: 00:19:03

FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics	00:19:03
Transactional Data Objects	00:07:39
FoodBroker	00:06:52
Implementation and Scaling	00:04:32