Tools and techniques for massive data analysis
Day 1 (December 15th 2014)
Big Data Challenges
Introduction to Map-Reduce and Hadoop (Hadoop, HDFS, YARN)
Data analytics - other approaches (Pig, Hive, Pregel, etc.)
Set up the working environment (Docker, Hadoop Streaming, MRJob, NGS Example)
Day 2 (December 16th 2014)
Day 2 Setup
In-memory Map-Reduce: the case of Apache Spark
Matrix-matrix product using Hadoop
Submitting Hadoop jobs on PICO
External links
Source code
Cineca's Docker repository
Cineca PICO's User Guide
Suggested readings