Syllabus - CS378 - Big Data Programming
Spring 2015
MW 9:30 - 11:00 WAG 214
Unique: 52022
Description
The map-reduce programming paradigm is a fundamental tool for processing large data sets and is supported by current tools such as Hadoop and MongoDB. Apache Spark offers another programming paradigm for processing large data sets. In this course you will gain an understanding of the concepts embodied in map-reduce and investigate how map-reduce is used to address various problems in processing and analyzing large data sets. The course explores map-reduce as implemented in Hadoop, along with the associated distributed file system (HDFS). You will also gain an understanding of the concepts supported in Spark and investigate how to apply them to various problems, including those you addressed using map-reduce.
Objectives
Upon completing this course, the student will be able to design and implement map-reduce programs for various large data set processing tasks, and will be able to design and implement programs using Apache Spark.
Prerequisites
Data structures, Java programming experience.
Textbooks
Required: MapReduce Design Patterns, by Donald Miner and Adam Shook
O'Reilly Media
Print ISBN: 978-1-4493-2717-0 | ISBN 10: 1-4493-2717-6
Ebook ISBN: 978-1-4493-4197-8 | ISBN 10: 1-4493-4197-7
Required: Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia
O'Reilly Media
Print ISBN: 978-1-4493-5862-4 | ISBN 10: 1-4493-5862-4
Ebook ISBN: 978-1-4493-5860-0 | ISBN 10: 1-4493-5860-8
Recommended: Hadoop: The Definitive Guide, 3rd Edition, by Tom White
O'Reilly Media/Yahoo Press
Print ISBN: 978-1-4493-1152-0 | ISBN 10: 1-4493-1152-0
Ebook ISBN: 978-1-4493-1151-3 | ISBN 10: 1-4493-1151-2