[Featured Article] Top Coursera Data Science Specializations

2016-07-15

There are more MOOC learning options for Data Scientists today than ever. Take a tour of Coursera's 8 Data Science specializations, with exclusive insight from program coordinators and course instructors.

By Matthew Mayo.

Coursera has been a favorite learning platform for aspiring and practicing data scientists for a number of years, with quality courses such as Mining Massive Datasets, Introduction to Data Science, and Machine Learning having long been standouts. In early 2014, Coursera began introducing specializations, tracks of multiple courses, in a number of areas of study, with a single data science specialization existing from the very beginning. As the number of specializations steadily increases, Fall 2015 brings a number of additional offerings in the data science realm, giving prospective learners myriad options for pursuing data science education.

This post will examine the 8 current Coursera data science and data science-related specialization offerings, and provide some additional insight directly from the specializations' instructors and coordinators. While these specializations cover more "traditional" data science topics such as the Hadoop, Python, and R ecosystems, there are now options for those interested in more niche topics, such as "executive" data science, bioinformatics, and machine learning. It should also be noted that, as with all Coursera material, the course content is freely accessible, but fees apply if you want course or specialization certificates. Capstone projects, however, are only accessible to paying students who have completed all of the prior specialization coursework.

Note: KDnuggets gets absolutely no royalties from Coursera - this list is presented only to help our readers evaluate interesting courses and specializations.

Machine Learning Specialization, University of Washington

The University of Washington's Machine Learning Specialization was developed in conjunction with Dato and got underway with its first session in September. It uses Python in all courses, and so an understanding of the language is useful prior to enrolling. A number of the common Python machine learning tools are used throughout the specialization, and there is flexibility to try out Dato's proprietary GraphLab Create in the first course, with academic licenses available for all students interested in expanding their set of tools.

The specialization consists of the following courses:

▪  Machine Learning Foundations: A Case Study Approach
▪  Regression
▪  Classification
▪  Clustering & Retrieval
▪  Recommender Systems & Dimensionality Reduction
▪  Machine Learning Capstone: An Intelligent Application with Deep Learning

The course list indicates that a solid base for machine learning is provided. A blog post outlining the specialization is a good starting point for anyone looking to better understand the approach taken.

I caught up with specialization instructors Emily Fox and Carlos Guestrin, who answered the following questions for us.

What distinguishes your data science specialization from the others currently available via Coursera?

Most machine learning courses, including the current ML courses on Coursera, take a "bottom up" approach: they start from the foundations of "what are probabilities of events?" and "how do we estimate them?", then cover basic ML models and optimization algorithms, and eventually get to more advanced ML methods. Rarely do these courses cover how these methods are used in real-world problems and the practical issues associated with them.

We take an alternative approach building on case studies. We start each section by defining an end-to-end case study of how ML has impact on real-world applications. We then dig into how ML is used in these applications. Finally, we describe the models and algorithms used to make this possible. We call this a "case study approach" that provides hands-on experience in ML. All courses include hands-on exercises involving real-world applications.

Our goal is to make the specialization accessible to folks with no ML or Stats background. We start from the basics of how ML is used. However, by the time learners get to their capstone project in the 6th course, they will build and deploy a real intelligent application that uses deep learning on image and text data to provide a whole new type of recommender system.

Relative to the Data Science Specialization on Coursera, we focus more on machine learning, which involves building intelligent applications that learn from data to form real-time predictions. Data science, on the other hand, typically focuses on analyzing a single dataset in depth. The ML Specialization also focuses on what it takes to deploy these techniques in production and at scale.

What 2 or 3 concepts or technologies does your specialization focus on the most?

Case-study approach: how ML is used in the real world and what ML techniques make it possible.
Basics to state-of-the-art ML: we cover the foundational topics, but build up to state-of-the-art methods, such as deep learning and boosted trees, using these case studies.
Deployment and scalability: ML doesn't end in a performance curve for a paper. Our learners actually build and deploy applications that use ML.

How does the specialization compare to similar course(s) at your university, if at all?

There are no courses currently at UW that cover the material in this specialization. Most of our UW courses are targeted at graduate students and advanced undergraduates looking for the theoretical foundations of ML, assuming prior background in statistics. This specialization is focused on helping learners go from only having some programming background to having a deep understanding of what it takes to build an intelligent application that uses ML in the real world.

What else would you like people to know about your specialization?

For students who already have ML background, there are probably other courses that will be more appropriate to start with. However, for those who want to take a hands-on approach and really learn ML in practice, this is the specialization for you. :)

Big Data Specialization, UCSD

The University of California, San Diego's Big Data Specialization was developed alongside Splunk. It is another new upstart specialization which got underway this Fall, and focuses mainly on what first comes to mind when you think Big Data: the Hadoop/Spark ecosystem. It does, however, have some other topics thrown in as well, including hot topics such as graph analytics and machine learning.

The specialization contains the following courses:

▪  Introduction to Big Data
▪  Hadoop Platform and Application Framework
▪  Introduction to Big Data Analytics
▪  Machine Learning With Big Data
▪  Introduction to Graph Analytics
▪  Big Data - Capstone Project

Specialization coordinator Natasha Balac was kind enough to provide some further insight for us, answering the following questions.

What distinguishes your data science specialization from the others currently available via Coursera?

This is the only Big Data-focused specialization on the platform.

What 2 or 3 concepts or technologies does your specialization focus on the most?

We teach Hadoop-based frameworks and related technologies. We cover the Hadoop "Zoo" from MapReduce to Spark, from data wrangling to predictive analytics on very large data.

How does the specialization compare to similar course(s) at your university, if at all?

There are very few Big Data courses at university level in general - more are emerging slowly. The Big Data Specialization is a set of 5 courses covering basic and advanced Big Data topics. The technology is changing so rapidly - it is almost a full-time job just to keep up!!! ;-)

What else would you like people to know about your specialization?

We take pride in presenting difficult or technically dense material in simple, easy-to-understand ways - come check it out and learn with us!

Johns Hopkins is a major player in the data science MOOC space, offering 3 specialization tracks through Coursera. Each specialization covers a particular approach to data science, and each is profiled below. You can see additional interview feedback from the programs' coordinator, Jeff Leek, further below.

Data Science Specialization, Johns Hopkins University

Johns Hopkins University's Data Science Specialization is the original flagship data science track offered by Coursera. Offered in conjunction with SwiftKey and Yelp, this specialization centers on the R programming language and its ecosystem. The program promotes practicality yet has an academic slant as well, manifested in its emphasis on the reproducibility of data science research.

Containing the following 9 courses (plus capstone), it is also the most extensive offering available.

▪  The Data Scientist's Toolbox
▪  R Programming
▪  Getting and Cleaning Data
▪  Exploratory Data Analysis
▪  Reproducible Research
▪  Statistical Inference
▪  Regression Models
▪  Practical Machine Learning
▪  Developing Data Products
▪  Data Science Capstone

Executive Data Science Specialization, Johns Hopkins University

Johns Hopkins' Executive Data Science Specialization is offered in conjunction with Zillow and DataCamp, and consists solely of one-week courses. The specialization focuses on preparing managers to leverage data science and to work with data scientists, and consists of the following courses:

▪  A Crash Course in Data Science
▪  Building a Data Science Team
▪  Managing Data Analysis
▪  Data Science in Real Life
▪  Executive Data Science Capstone

Genomic Data Science Specialization, Johns Hopkins University

Johns Hopkins' Genomic Data Science Specialization is the first foray into a biological data science specialization for both Coursera and Johns Hopkins. The program focuses on using the command line, Python, R, Bioconductor, and Galaxy, and consists of these courses:

▪  Introduction to Genomic Technologies
▪  Genomic Data Science with Galaxy
▪  Python for Genomic Data Science
▪  Command Line Tools for Genomic Data Science
▪  Algorithms for DNA Sequencing
▪  Bioconductor for Genomic Data Science
▪  Statistics for Genomic Data Science
▪  Genomic Data Science Capstone

I was able to ask Johns Hopkins specializations coordinator Jeff Leek a few questions about the entirety of their data science tracks, and he provided the following insight.

What distinguishes your data science specialization from the others currently available via Coursera?

We currently offer 3 Specializations on Coursera: Data Science, Executive Data Science, and Genomic Data Science. These programs each have unique aspects:

Data Science - This is the first, most comprehensive (9 courses), largest (2 million+ enrollers, 1,000+ completers), and most science-driven data science Specialization on Coursera.

Executive Data Science - This is the only data science Specialization specifically designed for managers of data scientists. All the courses are designed to fit into a busy manager's schedule (1-week courses) and all are on demand. We are also designing a cool interactive capstone experience with Zillow for this one.

Genomic Data Science - This is the only program covering Genomic Data Science on the Coursera platform. This is a major area of growth with interest in personalized medicine increasing by the day. The course covers the tools needed to understand and analyze data from next generation sequencing.

What 2 or 3 concepts or technologies does your specialization focus on the most?

Data Science - covers the spectrum of data science problems from Git/GitHub, to R, to specific tools/packages for data cleaning, inference, and machine learning. This course is largely R-based since R is the most widely used language for data science in the wild.

Executive Data Science - covers a crash course in the basics of what data science is, how to build a data science team, and how to manage that team to success.

Genomic Data Science - covers an introduction to genomic technologies, Python, Galaxy, R, the command line, Bioconductor, and statistics for genomics. This course is designed to get a person "up to speed" on doing genomic data science.

How does the specialization compare to similar course(s) at your university, if at all?

Parts of these courses are incorporated into programs in the Biostatistics (http://www.jhsph.edu/departments/biostatistics/), computer science (https://www.cs.jhu.edu/), biology (http://www.bio.jhu.edu/) and computational biology (https://ccb.jhu.edu/) programs at JHU. But these specializations were designed specifically for the MOOC platform to be available and fit into the schedules of people taking courses online.

What else would you like people to know about your specialization?

We are really excited about making classes available to the world and hope that they will be useful for people getting into a new field, transitioning careers, or looking for a job.

Bioinformatics Specialization, UCSD

Bioinformatics is an interdisciplinary field which uses select tools and techniques from mathematics, computer science, statistics, engineering, and other fields, to analyze biological data; from our perspective, we could say that bioinformatics is the intersection of data science and biology. UCSD's Bioinformatics Specialization is a first of its kind in the field, and looks like it could be of benefit not only to those coming from the world of biology to data science, but to the reverse as well.

This specialization is made up of the following courses:

▪  Finding Hidden Messages in DNA (Bioinformatics I)
▪  Genome Sequencing (Bioinformatics II)
▪  Comparing Genes, Proteins, and Genomes (Bioinformatics III)
▪  Deciphering Molecular Evolution (Bioinformatics IV)
▪  Genomic Data Science and Clustering (Bioinformatics V)
▪  Finding Mutations in DNA and Proteins (Bioinformatics VI)
▪  Bioinformatics Capstone: Big Data in Biology

Instructor Phillip Compeau provided us with the following detailed feedback.

What distinguishes your data science specialization from the others currently available via Coursera?

Our specialization is unique, not just as a data science specialization, but as a series of STEM MOOCs in general, in a few different ways. First, it has an enormous amount of production for a series of MOOCs, and is the result of a development team working for the past two years. For example, although our courses have lecture videos with very high production quality, the production of these lecture videos represents a very small component of our overall investment of time and resources (unlike most MOOCs, for which this is essentially the sole focus). Instead, our MOOCs are built upon the creation of an interactive textbook applying the principles of active learning. As soon as learners encounter a tricky concept, we ask them to stop and think about it before transitioning. We peppered the text with hundreds of exercises; some of these build learning, others are opportunities for learners to implement the bioinformatics algorithms that they encounter, and others allow them to apply these algorithms to real biological datasets. Each page of the interactive text is linked to its own discussion forum, and students have made thousands of posts over the last two years. Furthermore, an important part of the process of developing this interactive text was responding to student concerns. To do this, we mined through 8500 discussion forum posts and have made widescale changes to every single page of the interactive text, as well as creating FAQs and additional remedial learning modules to help address the most common errors encountered by learners. Pavel and I outlined our vision for what 21st century textbooks in STEM fields should look like in a recent Communications of the ACM Viewpoints article: http://m.cacm.acm.org/magazines/2015/10/192385-life-after-moocs/fulltext

Second, bioinformatics is inherently interdisciplinary, being at the intersection of computer science, biology, mathematics, and data science. As such, it attracts learners who arrive with varying strengths. It means that we have had to think about how to adapt the content for learners with these strengths. For example, our courses are currently divided into two main tracks: a "biologist track" and a "hacker track". All learners read the course interactive text, watch the course videos, and take quizzes. However, learners on the hacker track implement the algorithms that they encounter in the text; learners following the biologist track do not need to program but do need to learn how to apply existing software resources in bioinformatics. Accordingly, we also have a series of "Bioinformatics Application Challenges" for these learners in which they can learn how to apply some of this existing software while following a narrative that is tangential to what they have learned in the main text. For example, in the main text, learners see how researchers sequence genomes by solving a 300-year-old mathematical puzzle. The hacker track learners write their own algorithms (in the language of their choosing) to assemble genomes on their own; the biologist track learners have an Application Challenge walking them through how to use the popular SPAdes assembler to analyze the quality of an assembly for Staphylococcus.
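
To give a flavor of what a hacker-track exercise might involve, here is a minimal Python sketch of one textbook way to frame assembly as a graph puzzle: build a de Bruijn graph from k-mers and find a walk that uses every edge exactly once (an Eulerian path, the idea behind Euler's bridges problem, which may be the puzzle alluded to above). This is an illustration under strong assumptions, not the course's own code: the reads are error-free k-mers, each occurrence is supplied exactly once, and an Eulerian path exists. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it leads to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: a walk that uses every edge exactly once.

    Assumes the graph is non-empty and that such a walk exists.
    """
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1

    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise the walk is a cycle and any node with an edge will do.
    start = next(
        (node for node in graph if len(graph[node]) - in_degree[node] == 1),
        next(iter(graph)),
    )

    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the walk
    return path[::-1]


def assemble(kmers):
    """Spell the sequence along an Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_graph(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Toy 3-mers, deliberately out of order.
    print(assemble(["GTC", "ACG", "TCA", "CGT"]))
```

Real assemblers such as SPAdes must also cope with sequencing errors, uneven coverage, and repeats, none of which this sketch attempts to handle.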

What 2 or 3 concepts or technologies does your specialization focus on the most?

The greatest central theme of our Bioinformatics Specialization is the importance of being able to formulate a biological challenge as a precise computational problem. This is a skill that is often lacking in life science education. For example, when looking for the location in a bacterium's DNA where the bacterium starts replicating this DNA, we are essentially looking for a "hidden message" saying "start replication here!" But this problem makes no sense to a computer scientist, who needs to be told exactly what to look for. The interplay between learning new biological facts and using these facts to formulate increasingly robust computational problems is constant throughout our courses.
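
As a small illustration of turning "find the hidden message" into something a computer can be told to look for, one possible formalization (an assumption made here for the example, not necessarily the exact problem statement used in the course) is: given a DNA string and a length k, report the length-k substrings that occur most often. The function name below is hypothetical.

```python
from collections import Counter


def most_frequent_kmers(text, k):
    """Return, in sorted order, every length-k substring that occurs most often."""
    if k <= 0 or k > len(text):
        return []
    # Slide a window of width k across the text; overlapping windows count.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    top = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == top)


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    print(most_frequent_kmers("ACGACGTT", 3))
```

The biological question has been replaced by a precise input (a string and an integer) and a precise output (a list of substrings). Whether frequent substrings really mark a replication origin is a separate biological question that the code cannot answer.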

In terms of specific course content, each chapter addresses a central question in modern biology such as "How Do We Assemble Genomes?", "Why Do We Still Not Have an HIV Vaccine?", and "How Do We Find Disease-Causing Mutations?" We see how approaches from a variety of technical topics such as graph theory, machine learning, and standard data science methods such as clustering algorithms can be applied to solve each central question. From a biological perspective, we have a heavy focus on biological sequence analysis and the methods needed to address it.

How does the specialization compare to similar course(s) at your university, if at all?

The course content covered in the Bioinformatics Specialization is identical to some of the coursework taken by students in the renowned Bioinformatics Ph.D. program at UC San Diego. Furthermore, the print companion of the course (Bioinformatics Algorithms: An Active Learning Approach) has already been adopted in about twenty universities, some of which offer flipped classes based on the book. This is another way in which we feel that our courses are unique, as the majority of online courses do not currently have the rigor of a course that one would take at a leading offline institution.

What else would you like people to know about your specialization?

We are very proud of partnering with Illumina (the leader in Genome Sequencing) to design a really interesting Capstone project (launching in the spring) based on their BaseSpace cloud platform, and Illumina is interested in interviewing students who excel in our Specialization. More generally, we think that our Specialization has excellent potential to be adopted by many university programs in bioinformatics around the world, and that it will be a great resource for biologists, computer scientists, and data scientists alike to add an important set of knowledge to their skillset in the rapidly growing biotech market. Learners in the latter two groups may have never even realized how relevant many classic approaches in CS and data science really are to modern biology, and the enormous demand for people who can bring these skillsets to biology, and we hope that our Specialization can help bridge this divide.

Data Mining Specialization, University of Illinois, Urbana-Champaign

The University of Illinois, Urbana-Champaign's Data Mining Specialization is foundational and theoretical in nature, covering the fundamentals of data mining without consideration to specific tools or languages. The specialization is made up of the following courses:

▪  Pattern Discovery in Data Mining
▪  Text Retrieval and Search Engines
▪  Cluster Analysis in Data Mining
▪  Text Mining and Analytics
▪  Data Visualization
▪  Data Mining Capstone

Data Science at Scale Specialization, University of Washington

The University of Washington's Data Science at Scale Specialization grew out of their original Introduction to Data Science course, which has been offered a number of times over the past 3 years. The specialization covers the paradigms, the practice, and the professional aspects of performing data science. The courses included in this track are:

▪  Data Manipulation at Scale: Systems and Algorithms
▪  Practical Predictive Analytics: Models and Methods
▪  Communicating Results: Visualization, Ethics, Reproducibility
▪  Data Science at Scale - Capstone Project

With all of the data science MOOC options available today, it's difficult to know where to begin looking. We hope that this summary, at the very least, gives you some direction in narrowing down your numerous choices.

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.
