[GitHub] Apache Spark 2 for Beginners

Lisrelchen


2017-08-22

Apache Spark 2 for Beginners


https://github.com/PacktPublishing/Apache-Spark-2-for-Beginners

This is the code repository for Apache Spark 2 for Beginners, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

##Instructions and Navigations

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

Detailed installation steps (software-wise)

The steps below prepare the system environment so that the code in the book can be tested.

###1. Apache Spark
a. Download the Spark version mentioned in the table.
b. Build Spark from source or use the binary download, following the detailed instructions at http://spark.apache.org/docs/latest/building-spark.html
c. If building Spark from source, make sure that the R profile is also built. The instructions for doing so are on the page linked in step b.

###2. Apache Kafka
a. Download the Kafka version mentioned in the table.
b. The "quick start" section of the Kafka documentation gives the instructions to set up Kafka: http://kafka.apache.org/documentation.html#quickstart
c. Apart from the installation instructions, topic creation and the other Kafka setup prerequisites are covered in detail in the corresponding chapter of the book.
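
Once both installations are done, a quick check can confirm that the tools are visible from the shell. The following is only a minimal sketch and not part of the book's repository: the helper name `check_environment` is hypothetical, and it assumes that `SPARK_HOME` has been exported and that the `bin` directories of Spark and Kafka have been added to `PATH`, which the steps above do not require.

```python
import os
import shutil


# Hypothetical helper for illustration; it is not part of the book's code.
def check_environment(env=None, which=shutil.which):
    """Report which pieces of a Spark/Kafka setup can be located."""
    env = os.environ if env is None else env
    return {
        # SPARK_HOME conventionally points at the unpacked Spark directory.
        "SPARK_HOME set": bool(env.get("SPARK_HOME")),
        # spark-submit ships in Spark's bin/ directory.
        "spark-submit on PATH": which("spark-submit") is not None,
        # These two scripts ship in Kafka's bin/ directory.
        "kafka-topics.sh on PATH": which("kafka-topics.sh") is not None,
        "kafka-console-producer.sh on PATH":
            which("kafka-console-producer.sh") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{:<36} {}".format(name, "ok" if ok else "missing"))
```

A "missing" entry does not necessarily mean the installation failed. It may only mean that the corresponding directory is not on `PATH`, in which case the scripts can still be run by their full path.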

The code will look like the following:

Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

Spark 2.0.0 or above must be installed, at minimum on a single standalone machine, to run the code samples and to experiment further with the subject. For Spark stream processing, Kafka must be installed and configured as a message broker, with its command-line producer producing messages and the application developed using Spark consuming them.
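
The version requirement can be checked programmatically. The sketch below is an illustration rather than code from the book: `parse_version` and `meets_minimum` are hypothetical helper names, and the script assumes that `pyspark` is importable from the current Python interpreter and exposes a `__version__` attribute, which may not hold for every installation.

```python
import re


def parse_version(version):
    """Turn '2.1.0' or '2.0.0-SNAPSHOT' into a tuple of ints such as (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    # A missing patch component is treated as 0; suffixes are ignored.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version, minimum="2.0.0"):
    """Return True if `version` is at least `minimum`, compared numerically."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import pyspark  # assumption: pyspark is on this interpreter's path
    except ImportError:
        print("pyspark is not importable from this interpreter")
    else:
        version = getattr(pyspark, "__version__", None)
        if version is None:
            print("could not determine the pyspark version")
        else:
            print(version, "ok" if meets_minimum(version) else "too old")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "10.0.0" sorting before "2.0.0". Suffixes such as "-SNAPSHOT" are ignored, so a pre-release build is treated the same as the release it precedes.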

##Related Products