Storm real-time data analysis platform
2014-08-10

Efficiently process unbounded streams of data in real time

Overview

  • Learn the key concepts of processing data in real time with Storm
  • Covers concepts ranging from log stream processing to mastering data management with Storm
  • Written in a Cookbook style, with plenty of practical recipes with well-explained code examples and relevant screenshots and diagrams

In Detail

Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm Real Time Processing Cookbook offers basic to advanced recipes for real-time computation with Storm.

The book begins with setting up the development environment and then teaches log stream processing. This is followed by a real-time payments workflow, distributed RPC, integrating Storm with other software such as Hadoop and Apache Camel, and more.

What you will learn from this book

  • Create a log spout
  • Consume messages from a JMS queue
  • Implement unidirectional synchronization based on a data stream
  • Execute disaster recovery on a separate AWS region

Approach

A cookbook with plenty of practical recipes covering different uses of Storm.

Who this book is written for

If you are a Java developer with basic knowledge of real-time processing and would like to learn Storm to process unbounded streams of data in real time, then this book is for you.


Product Details
  • Paperback: 254 pages
  • Publisher: Packt Publishing (May 13 2013)
  • Language: English
  • ISBN-10: 1782164421
  • ISBN-13: 978-1782164425
  • http://www.packtpub.com/big-data-and-business-intelligence/storm-real-time-processing-cookbook
  • http://www.amazon.com/Real-Time-Processing-Cookbook-Quinton-Anderson/dp/1782164421

Hidden content in this post

Packt.Storm Real-time Processing Cookbook 2013.pdf
Size: 2.03 MB

Only 20 forum coins. Download now.






All replies
2014-8-10 21:14:25

Creating an association rules model in R

Notice: the author has been banned or deleted; the content has been automatically hidden.

2014-8-10 22:25:20
Creating a Recommendation Engine

A recommendation engine, made famous by leaders such as Amazon, makes intelligent guesses as to what a customer may want to buy based on lists of products. These lists may come from the current selection within the context of the current session, from the particular customer's previous purchases, or simply from the products the customer has viewed within a given session. Whichever approach you choose, the training data and the scoring data used during the operational phase must follow the same principles.

In this recipe, we will use the association rules model from the previous recipe to create a recommendation engine. The concept behind the engine is that lists are supplied as asynchronous inputs and recommendations are forwarded as asynchronous outputs where applicable.

Tip

There are product combinations that aren't strongly supported by the model; in these cases, no recommendation is emitted. If you need a recommendation for every single input, you could choose to emit a random recommendation when there is no strongly supported recommendation, or you could choose to improve your model through better and generally larger training datasets.



How to do it…
  • Start by creating a Maven project called arules-topology and add the following dependencies:
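    A sketch of what this dependency section might look like, assuming the Storm 0.8-era artifacts and the json-simple JSON parser; the storm-r coordinates below are placeholders rather than confirmed coordinates:

        <dependencies>
          <!-- Storm core; provided by the cluster at runtime -->
          <dependency>
            <groupId>storm</groupId>
            <artifactId>storm</artifactId>
            <version>0.8.2</version>
            <scope>provided</scope>
          </dependency>
          <!-- Kafka spout integration for Storm -->
          <dependency>
            <groupId>storm</groupId>
            <artifactId>storm-kafka</artifactId>
            <version>0.8.0-wip4</version>
          </dependency>
          <!-- Parsing and producing the JSON array messages -->
          <dependency>
            <groupId>com.googlecode.json-simple</groupId>
            <artifactId>json-simple</artifactId>
            <version>1.1.1</version>
          </dependency>
          <!-- Hypothetical coordinates for the Storm-R integration -->
          <dependency>
            <groupId>storm-r</groupId>
            <artifactId>storm-r</artifactId>
            <version>0.0.1</version>
          </dependency>
        </dependencies>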

  • Next, create a main topology class called RecommendationTopology using the idiomatic Storm main method. For this recipe, we will be receiving the product list as a JSON array on a Kafka topic. We will therefore need to coerce the byte array input into a tuple containing two separate values, one being the transaction ID and the other being the list of products, as shown in the following lines of code:
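    A minimal sketch of such a coercion function, assuming each Kafka message is a JSON array whose first element is the transaction ID and whose remaining elements are the selected products; the class and field names here are illustrative:

        import java.util.ArrayList;
        import java.util.List;

        import org.json.simple.JSONArray;
        import org.json.simple.JSONValue;

        import backtype.storm.tuple.Values;
        import storm.trident.operation.BaseFunction;
        import storm.trident.operation.TridentCollector;
        import storm.trident.tuple.TridentTuple;

        public class JsonToProductList extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                // The Kafka spout delivers the raw message as a single byte[] field
                byte[] payload = (byte[]) tuple.getValue(0);
                JSONArray json = (JSONArray) JSONValue.parse(new String(payload));
                // Assumed layout: element 0 is the transaction ID,
                // the rest are the products in the customer's list
                String txId = (String) json.get(0);
                List<Object> products = new ArrayList<Object>(json.subList(1, json.size()));
                collector.emit(new Values(txId, products));
            }
        }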

  • We will also need to publish the output message using the Kafka partition persist. The recommendation and transaction ID need to be coerced into a single value consisting of a JSON array as follows:
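    A sketch of the output-side coercion under the same assumptions; the single JSON string emitted here is the value that the Kafka partition persist publishes:

        import org.json.simple.JSONArray;

        import backtype.storm.tuple.Values;
        import storm.trident.operation.BaseFunction;
        import storm.trident.operation.TridentCollector;
        import storm.trident.tuple.TridentTuple;

        public class RecommendationToJson extends BaseFunction {
            @Override
            @SuppressWarnings("unchecked")
            public void execute(TridentTuple tuple, TridentCollector collector) {
                // Fold the transaction ID and the recommendation into one
                // JSON array so they travel as a single Kafka message value
                JSONArray out = new JSONArray();
                out.add(tuple.getStringByField("txId"));
                out.add(tuple.getStringByField("recommendation"));
                collector.emit(new Values(out.toJSONString()));
            }
        }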

  • We then need to define the topology as described here:
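    A sketch of how that wiring might look against the 0.8-era Trident and storm-kafka APIs; the spout classes, topic names, and AssociationRulesFunction (the R-backed function from the previous recipe, sketched under the next step) are assumptions:

        import backtype.storm.Config;
        import backtype.storm.LocalCluster;
        import backtype.storm.generated.StormTopology;
        import backtype.storm.tuple.Fields;
        import storm.kafka.KafkaConfig;
        import storm.kafka.trident.TransactionalTridentKafkaSpout;
        import storm.kafka.trident.TridentKafkaConfig;
        import storm.trident.TridentTopology;

        public class RecommendationTopology {

            public static StormTopology buildTopology() {
                // Consume the product-selection topic; "bytes" is the spout's raw payload field
                TridentKafkaConfig spoutConfig = new TridentKafkaConfig(
                        new KafkaConfig.ZkHosts("localhost:2181", "/brokers"), "selections");
                TridentTopology topology = new TridentTopology();
                topology.newStream("selections", new TransactionalTridentKafkaSpout(spoutConfig))
                        .each(new Fields("bytes"), new JsonToProductList(),
                                new Fields("txId", "products"))
                        .each(new Fields("txId", "products"), new AssociationRulesFunction(),
                                new Fields("recommendation"))
                        .each(new Fields("txId", "recommendation"), new RecommendationToJson(),
                                new Fields("message"));
                // A partitionPersist into a Kafka-backed state would publish the
                // "message" field; omitted here because that API is version-specific
                return topology.build();
            }

            public static void main(String[] args) throws Exception {
                // Local mode, as used when testing from Eclipse
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("recommendation-topology", new Config(), buildTopology());
            }
        }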

  • The Storm-R project's standard function supports only a known input array size. This works for most use cases; however, for the association case, the input size will vary for each tuple. It is therefore necessary to override the execute function to cater for this particular case as shown here:
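    A sketch of what such an override might look like; RFunction and its callR helper are hypothetical stand-ins for the Storm-R project's actual classes, which normally assume a fixed-size input array:

        import java.util.List;

        import backtype.storm.tuple.Values;
        import storm.trident.operation.TridentCollector;
        import storm.trident.tuple.TridentTuple;

        // RFunction is a hypothetical name for the Storm-R base function class
        public class AssociationRulesFunction extends RFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                // The product list length varies per tuple, so build the R call's
                // arguments dynamically instead of relying on a fixed-size array
                @SuppressWarnings("unchecked")
                List<Object> products = (List<Object>) tuple.getValueByField("products");
                // callR is a hypothetical helper invoking the R association rules model
                Object recommendation = callR("recommend", products.toArray());
                // Emit nothing when the model has no strongly supported rule (see the Tip)
                if (recommendation != null) {
                    collector.emit(new Values(recommendation.toString()));
                }
            }
        }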

  • These elements are all that is required to create the recommendation engine. You can now start your topology in local mode from Eclipse. In order to test it, a test script named sendSelection.py is provided with the chapter code bundle. This takes a single parameter, the number of transactions to publish onto the queue, as follows:
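    For example, to publish 100 test transactions (the count here is an arbitrary choice):

        python sendSelection.py 100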

  • You can view the output recommendations by issuing the following command from the Kafka command line:
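    A sketch of such a command for a 0.8-era Kafka installation; the output topic name is an assumption:

        bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic recommendations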




2014-8-11 08:16:00
thank you for sharing~

2014-8-11 08:53:08
Thanks, taking a look.

2014-8-11 09:08:09
Already have it, thanks.
