2016-08-11
2016-8-11 10:11:33
A book to learn data science, data analysis and machine learning, suitable for all ages!


What does it cover?

This book covers common aspects in predictive modeling:

  • A. Data Preparation / Data Profiling
  • B. Selecting best variables (dataviz)
  • C. Assessing model performance
  • D. Miscellaneous

It is heavily based on the funModeling package for the R language. Please install it before starting :)

install.packages("funModeling")

  • Model creation consumes around 10% of almost any predictive modeling project; funModeling tries to cover the remaining 90%.
  • It offers not only the functions themselves, but also explanations of how to interpret their results. This brings a deeper understanding of what is being done, so the knowledge can be reused in other situations regardless of the language.

Why a live book?

Hopefully this book will barely have an end; it will be updated periodically. The next planned chapter is a case study in predictive modeling. You can contribute too! See the GitHub link below.




2016-8-11 10:12:33
What is this about?

This chapter covers three types of plots that help identify which numeric variables are most correlated with a target variable.

  • Overview:
    • Analysis purpose: To identify if the input variable is a good/bad predictor through visual analysis.
    • General purpose: To explain to a non-analyst the decision to include (or exclude) a variable in a model.

Constraint: The target variable must contain only 2 values. If it has NA values, they will be removed.
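
The constraint above can be checked before plotting. Below is a minimal base R sketch, not funModeling's own code: the helper name check_binary_target and the toy data frame are hypothetical, made up only for illustration.

```r
# Hypothetical helper (not part of funModeling): drop rows whose target is NA,
# then verify that exactly two distinct values remain.
check_binary_target <- function(data, target) {
  cleaned <- data[!is.na(data[[target]]), , drop = FALSE]
  n_values <- length(unique(cleaned[[target]]))
  if (n_values != 2) {
    stop("Target '", target, "' has ", n_values, " distinct values; expected 2.")
  }
  cleaned
}

# Toy data, invented for illustration only
toy <- data.frame(
  gender = c("male", "female", "male", "female", "male"),
  has_heart_disease = c("yes", "no", NA, "no", "yes"),
  stringsAsFactors = FALSE
)

cleaned <- check_binary_target(toy, "has_heart_disease")
nrow(cleaned)  # 4: the row with the NA target was dropped
```

Failing early with stop() keeps a three-level or constant target from silently producing a misleading plot.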


2016-8-11 10:25:15


2016-8-11 10:25:49

The last two plots have the same data source, showing the distribution of has_heart_disease in terms of gender. The one on the left shows percentage values, while the one on the right shows absolute values.
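
The same two views (percentage and absolute) can be sketched numerically in base R. The data frame below is a toy example invented for illustration, not the data behind the plots, and it does not use funModeling's plotting functions.

```r
# Toy data, invented for illustration only (not the data behind the plots)
toy <- data.frame(
  gender = c("male", "male", "male", "male",
             "female", "female", "female", "female"),
  has_heart_disease = c("yes", "yes", "yes", "no",
                        "yes", "no", "no", "no")
)

# Absolute counts per gender: the "absolute value" view
counts <- table(toy$gender, toy$has_heart_disease)

# Row percentages per gender: the "percentage" view
percentages <- round(100 * prop.table(counts, margin = 1), 1)

print(counts)
print(percentages)
```

Using margin = 1 normalizes within each gender, so every row of the percentage table sums to 100 and the groups can be compared even when their sizes differ.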

How to extract conclusions from the plots? (Short version)

The gender variable seems to be a good predictor, since the likelihood of having heart disease differs between the female and male groups. It gives an order to the data.



2016-8-11 10:26:27
From the first plot (%):
  • The likelihood of having heart disease is 55.3% for males and 25.8% for females.
  • The heart disease rate for males is roughly double the rate for females (55.3% vs 25.8%, respectively).
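
The "doubles" reading comes from dividing the two rates quoted above. A quick check in R, where the variable names are only illustrative:

```r
# Rates quoted above, in percent
rate_male <- 55.3
rate_female <- 25.8

# Relative rate: how many times more likely heart disease is for males
ratio <- rate_male / rate_female
round(ratio, 2)  # 2.14

# Absolute gap between the two groups, in percentage points
gap <- rate_male - rate_female
round(gap, 1)  # 29.5
```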

