The term “big data” refers to data sets so large and complex that traditional tools, such as relational databases, cannot
process them in an acceptable time frame or at a reasonable cost. Problems occur in sourcing, moving,
searching, storing, and analyzing the data, but with the right tools these problems can be overcome, as you’ll see in
the following chapters. A rich set of big data processing tools (provided by the Apache Software Foundation, Lucene,
and third-party suppliers) is available to assist you in meeting all your big data needs.
In this chapter, I present the concept of big data and describe my step-by-step approach for introducing each
type of tool, from sourcing the software to installing and using it. Along the way, you’ll learn how a big data system can
be built, starting with the distributed file system and moving on to areas like data capture, MapReduce programming,
moving data, scheduling, and monitoring. In addition, this chapter offers a set of requirements for big data
management that provide a standard by which you can measure the functionality of these tools and similar ones.