There are more MOOC learning options for Data Scientists today than ever. Take a tour of Coursera's 8 Data Science specializations, with exclusive insight from program coordinators and course instructors.
By Matthew Mayo.
The University of Washington's Machine Learning Specialization was developed in conjunction with Dato and got underway with its first session in September. It uses Python in all courses, so an understanding of the language is useful prior to enrolling. A number of the common Python machine learning tools are used throughout the specialization, and there is flexibility to try out Dato's proprietary GraphLab Create in the first course, with academic licenses available for all students interested in expanding their set of tools.
Most machine learning courses, including the current ML courses on Coursera, take a "bottom up" approach: they start from the foundations of "what are probabilities of events?" and "how do we estimate them?", then cover basic ML models and optimization algorithms, and eventually get to more advanced ML methods. Rarely do these courses cover how these methods are used in real-world problems and the practical issues associated with them.
We take an alternative approach, building on case studies. We start each section by defining an end-to-end case study of how ML has an impact on real-world applications. We then dig into how ML is used in these applications. Finally, we describe the models and algorithms used to make this possible. We call this a "case study approach" that provides hands-on experience in ML. All courses include hands-on exercises involving real-world applications.
Our goal is to make the specialization accessible to folks with no ML or Stats background. We start from the basics of how ML is used. However, by the time learners get to their capstone project in the 6th course, they will build and deploy a real intelligent application that uses deep learning on image and text data to provide a whole new type of recommender system.
Relative to the Data Science Specialization on Coursera, we focus more on machine learning, which involves building intelligent applications that learn from data to form real-time predictions. Data science, on the other hand, typically focuses on analyzing a single dataset in depth. The ML Specialization also focuses on what it takes to deploy these techniques in production and at scale.
Case-study approach: how ML is used in the real world and which ML techniques make it possible.
Basics to state-of-the-art ML: we cover the foundational topics, but build up to state-of-the-art methods, such as deep learning and boosted trees, using these case studies.
Deployment and scalability: ML doesn't end in a performance curve for a paper. Our learners actually build and deploy applications that use ML.
There are currently no courses at UW that cover the material in this specialization. Most of our UW courses are targeted at graduate students and advanced undergraduates looking for the theoretical foundations of ML, assuming prior background in statistics. This specialization is focused on helping learners go from having only some programming background to having a deep understanding of what it takes to build an intelligent application that uses ML in the real world.
For students who already have ML background, there are probably other courses that will be more appropriate to start with. However, for those who want to take a hands-on approach and really learn ML in practice, this is the specialization for you. :)
The University of California, San Diego's Big Data Specialization was developed alongside Splunk. It is another new upstart specialization which got underway this fall, and focuses mainly on what first comes to mind when you think Big Data: the Hadoop/Spark ecosystem. It does, however, have some other topics thrown in as well, including hot topics such as graph analytics and machine learning. This is the only Big Data-focused specialization on the platform.
We teach Hadoop-based frameworks and related technologies. We cover the Hadoop "Zoo" from MapReduce to Spark, from data wrangling to predictive analytics on very large data.
There are very few Big Data courses at university level in general - more are emerging slowly. The Big Data Specialization is a set of 5 courses covering basic and advanced Big Data topics. The technology is changing so rapidly - it is almost a full-time job just to keep up! ;-)
We take pride in presenting difficult or technically dense material in simple, easy-to-understand ways - come check it out and learn with us!

We currently offer 3 Specializations on Coursera: Data Science, Executive Data Science, and Genomic Data Science. These programs each have unique aspects:
Data Science - This is the first, most comprehensive (9 courses), largest (2 million+ enrollers, 1,000+ completers), and most science-driven data science Specialization on Coursera.
Executive Data Science - This is the only data science Specialization specifically designed for managers of data scientists. All the courses are designed to fit into a busy manager's schedule (1-week courses) and all are on demand. We are also designing a cool interactive capstone experience with Zillow for this one.
Genomic Data Science - This is the only program covering Genomic Data Science on the Coursera platform. This is a major area of growth, with interest in personalized medicine increasing by the day. The course covers the tools needed to understand and analyze data from next-generation sequencing.
Data Science - covers the spectrum of data science problems from Git/GitHub, to R, to specific tools/packages for data cleaning, inference, and machine learning. This course is largely R-based since R is the most widely used language for data science in the wild.
Executive Data Science - covers a crash course in the basics of what data science is, how to build a data science team, and how to manage that team to success.
Genomic Data Science - covers an introduction to genomic technologies, Python, Galaxy, R, the command line, Bioconductor, and statistics for genomics. This course is designed to get a person "up to speed" on doing genomic data science.
Parts of these courses are incorporated into the biostatistics (http://www.jhsph.edu/departments/biostatistics/), computer science (https://www.cs.jhu.edu/), biology (http://www.bio.jhu.edu/), and computational biology (https://ccb.jhu.edu/) programs at JHU. But these specializations were designed specifically for the MOOC platform, to be available to and fit into the schedules of people taking courses online.
We are really excited about making classes available to the world and hope that they will be useful for people getting into a new field, transitioning careers, or looking for a job.
Our specialization is unique, not just as a data science specialization, but as a series of STEM MOOCs in general, in a few different ways. First, it has an enormous amount of production for a series of MOOCs, and is the result of a development team working for the past two years. For example, although our courses have lecture videos with very high production quality, the production of these lecture videos represents a very small component of our overall investment of time and resources (unlike most MOOCs, for which this is essentially the sole focus). Instead, our MOOCs are built upon the creation of an interactive textbook applying the principles of active learning. As soon as learners encounter a tricky concept, we ask them to stop and think about it before moving on. We peppered the text with hundreds of exercises; some of these build learning, others are opportunities for learners to implement the bioinformatics algorithms that they encounter, and others allow them to apply these algorithms to real biological datasets. Each page of the interactive text is linked to its own discussion forum, and students have made thousands of posts over the last two years. Furthermore, an important part of the process of developing this interactive text was responding to student concerns. To do this, we mined 8,500 discussion forum posts and made wide-scale changes to every single page of the interactive text, as well as creating FAQs and additional remedial learning modules to help address the most common errors encountered by learners. Pavel and I outlined our vision for what 21st-century textbooks in STEM fields should look like in a recent Communications of the ACM Viewpoints article: http://m.cacm.acm.org/magazines/2015/10/192385-life-after-moocs/fulltext
Second, bioinformatics is inherently interdisciplinary, being at the intersection of computer science, biology, mathematics, and data science. As such, it attracts learners who arrive with varying strengths. It means that we have had to think about how to adapt the content for learners with these strengths. For example, our courses are currently divided into two main tracks: a "biologist track" and a "hacker track". All learners read the course interactive text, watch the course videos, and take quizzes. However, learners on the hacker track implement the algorithms that they encounter in the text; learners following the biologist track do not need to program but do need to learn how to apply existing software resources in bioinformatics. Accordingly, we also have a series of "Bioinformatics Application Challenges" for these learners in which they can learn how to apply some of this existing software while following a narrative that is tangential to what they have learned in the main text. For example, in the main text, learners see how researchers sequence genomes by solving a 300-year-old mathematical puzzle. The hacker track learners write their own algorithms (in the language of their choosing) to assemble genomes on their own; the biologist track learners have an Application Challenge walking them through how to use the popular SPAdes assembler to analyze the quality of an assembly for Staphylococcus.
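To give a flavor of what a hacker-track assembly exercise might look like, here is a minimal sketch, not the course's own code. It assumes the "300-year-old mathematical puzzle" refers to finding an Eulerian path (the Bridges of Königsberg problem), and it assumes an idealized setting: error-free reads, every k-mer observed exactly once, and a single linear genome. The function names are hypothetical. Real assemblers have to deal with sequencing errors and repeats, which this toy version ignores.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Build a graph whose edges are k-mers: prefix (k-1)-mer -> suffix (k-1)-mer."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return a path that uses every edge once (Hierholzer's algorithm).

    Assumes such a path exists; no validation is performed.
    """
    if not graph:
        return []
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u, targets in graph.items() if len(targets) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(kmers):
    """Spell out the string traced by an Eulerian path through the k-mers."""
    path = eulerian_path(de_bruijn(kmers))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # The 3-mers of a short toy sequence, given in scrambled order.
    print(assemble(["GCG", "ATG", "CGA", "TGC"]))
```

Each k-mer becomes an edge between its prefix and suffix, so walking every edge exactly once stitches the overlapping fragments back into one sequence.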
The central theme of our Bioinformatics Specialization is the importance of being able to formulate a biological challenge as a precise computational problem. This is a skill that is often lacking in life science education. For example, when looking for the location in a bacterium's DNA where the bacterium starts replicating this DNA, we are essentially looking for a "hidden message" saying "start replication here!" But this problem makes no sense to a computer scientist, who needs to be told exactly what to look for. The interplay between learning new biological facts and using these facts to formulate increasingly robust computational problems is constant throughout our courses.
In terms of specific course content, each chapter addresses a central question in modern biology, such as "How Do We Assemble Genomes?", "Why Do We Still Not Have an HIV Vaccine?", and "How Do We Find Disease-Causing Mutations?" We see how approaches from a variety of technical topics such as graph theory, machine learning, and standard data science methods such as clustering algorithms can be applied to solve each central question. From a biological perspective, we have a heavy focus on biological sequence analysis and the methods needed to address it.
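One possible way to turn the "hidden message" idea into something a computer can work on is to ask for the most frequent substrings of a fixed length k (k-mers) in a stretch of DNA. This particular formulation is an illustrative assumption, not something the instructors spell out here, and the function name is hypothetical. A minimal sketch:

```python
from collections import Counter


def most_frequent_kmers(text: str, k: int) -> list[str]:
    """Return every length-k substring that occurs the maximum number of times."""
    if k <= 0 or k > len(text):
        return []
    # Slide a window of width k across the text and count each substring.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)


if __name__ == "__main__":
    print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
```

The vague request "find the hidden message" has become a precise one: given a string and an integer k, return the k-mers with the highest count. Whether frequent k-mers actually mark the biological signal is a separate question that the biology has to answer.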
The course content covered in the Bioinformatics Specialization is identical to some of the coursework taken by students in the renowned Bioinformatics Ph.D. program at UC San Diego. Furthermore, the print companion of the course (Bioinformatics Algorithms: An Active Learning Approach) has already been adopted in about twenty universities, some of which offer flipped classes based on the book. This is another way in which we feel that our courses are unique, as the majority of online courses do not currently have the rigor of a course that one would take at a leading offline institution.
We are very proud of partnering with Illumina (the leader in genome sequencing) to design a really interesting Capstone project (launching in the spring) based on their BaseSpace cloud platform, and Illumina is interested in interviewing students who excel in our Specialization. More generally, we think that our Specialization has excellent potential to be adopted by many university programs in bioinformatics around the world, and that it will be a great resource for biologists, computer scientists, and data scientists alike to add an important set of knowledge to their skillset in the rapidly growing biotech market. Learners in the latter two groups may never have realized how relevant many classic approaches in CS and data science really are to modern biology, or how enormous the demand is for people who can bring these skillsets to biology. We hope that our Specialization can help bridge this divide.
