Tools and techniques for massive data analysis

Day 1 (December 15th, 2014)

Big Data Challenges
Introduction to Map-Reduce and Hadoop (Hadoop, HDFS, YARN)
Data analytics - other approaches (Pig, Hive, Pregel, etc.)

Set up the working environment (Docker, Hadoop Streaming, MRJob, NGS Example)

Day 2 (December 16th, 2014)

Day 2 Setup
In-memory Map-Reduce: the case of Apache Spark
Matrix-matrix product using Hadoop
Submitting Hadoop jobs on PICO

External links

Source code
Cineca's Docker repository
Cineca PICO's User Guide
Suggested readings