Online Courses: Big Data Projects and Data Science Pipelines

2016-07-18

Building Distributed Pipelines for Data Science using Kafka, Spark, and Cassandra

Learn how to introduce a distributed data science pipeline in your organization

September 19, 21 & 23, 2016

9:00AM – 11:00AM PDT
(5:00PM – 7:00PM BST)

Building a distributed pipeline is a huge and complex undertaking. Join Andy Petrella and Xavier Tordoir for this practical, hands-on course if you want to ensure that yours is scalable, has fast in-memory processing, can handle real-time or streaming data feeds with high throughput and low latency, is well suited for ad hoc queries, can be spread across multiple data centers, allocates resources efficiently, and is designed to allow for future changes.

What you’ll learn—and how you can apply it

By the end of this course, you’ll have a solid understanding of:

The most important technologies for a distributed pipeline, and when and how to use them
How to integrate scalable technologies into your company’s existing data architecture
How to build a successful, scalable, elastic, distributed pipeline using a lean approach

This course is for you if…

You’re a data scientist with experience in data modeling, business intelligence, or traditional data pipelines, and you need to deal with bigger or faster data
You’re a software or data engineer with experience architecting solutions in Scala, Java, or Python, and you need to integrate scalable technologies into your company’s architecture

Prerequisites:

Intermediate knowledge of an object-oriented language and basic knowledge of a functional programming language, as well as basic experience with a JVM
Understanding of classic web architecture and service-oriented architecture
Basic understanding of ETL, streaming data, and distributed data architectures
Intermediate understanding of Docker and UNIX, as well as some basic knowledge about networks (IP, DNS, SSH, etc.)

About your instructors

Andy Petrella is a mathematician turned distributed computing entrepreneur, as well as a Scala and Spark trainer. Andy has participated in many projects built using Spark, Cassandra, and other distributed technologies, in fields including geospatial, IoT, automotive, and smart cities.

Xavier Tordoir started his career as a researcher in experimental physics, focused on data processing. He took part in projects in finance, genomics, and software development for academic research, working on time series, prediction of biological molecular structures and interactions, and applied machine learning methodologies. He developed solutions to manage and process data distributed across data centers.