SIKS/BiG Grid Advanced Tutorial on Big Data
November 30 & December 1, 2011
University of Twente, The Netherlands
The School for Information and Knowledge Systems (SIKS) and the Dutch e-science grid BiG Grid are organizing a new two-day tutorial on “Big Data” at the University of Twente. The tutorial covers some exciting new developments in cloud computing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, Microsoft, and Facebook. The course is about processing terabytes of data on large clusters, and discusses several core computer science topics adapted for large data centers, such as new file systems (Google File System and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig Latin), and new ‘NoSQL’ databases (BigTable, Cassandra, and Dynamo).
A major part of the tutorial consists of hands-on experience. Students will solve real large-scale data analysis problems of their choice on a cluster of machines. Students will get access to the BiG Grid Hadoop test cluster, which provides 20 cores for MapReduce and 100 TB of disk space for HDFS. Students are encouraged to bring their own data and to present their results at the end of the second day. The organization will provide several public datasets, such as Wikipedia, the ENRON dataset, White House visitor records, genome data, a large web crawl, and more.
The tutorial will be given in English and is part of the educational program for SIKS-Ph.D. students. Although the course is primarily intended for SIKS-Ph.D. students, other participants are not excluded. However, the number of places for them will be restricted and will depend on the number of SIKS-Ph.D. students taking the course. Registration details will be posted on the SIKS Big Data course site.
Course coordinators: Djoerd Hiemstra (UT), Evert Lammerts (SARA), Arjen de Vries (CWI/TUD)
Keynote lecture by Jimmy Lin (University of Maryland)