Learning objectives
Explain why programming languages such as R and Python are the preferred environment for much of data science work
Load and explore data in R
Build predictive models using R with multiple regression, logistic regression, decision trees, and random forests
Create training and test datasets
Explain how R can be used for feature engineering
Recognize when a model may be overfitted
Installing R
Learning R
R grammar
In R, the assignment operator can be written as either '=' or '<-'. The latter conveys a sense of object creation and avoids confusion with the equality test '=='.
R is case-sensitive
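A minimal sketch of the two grammar points above. The variable names are illustrative only.

```r
# Both forms bind a value to a name at the top level.
x <- 5
y = 5

# '==' tests equality; it does not assign.
same <- (x == y)
print(same)           # TRUE

# Names are case-sensitive: total and Total are different objects.
total <- 10
Total <- 20
print(total + Total)  # 30

# No object called TOTAL has been created.
print(exists("TOTAL"))
```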
Data types
-numeric/character/logical
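The three basic types can be inspected with class(). A short sketch with made-up values:

```r
n <- 3.14            # numeric
s <- "data science"  # character
b <- TRUE            # logical

print(class(n))  # "numeric"
print(class(s))  # "character"
print(class(b))  # "logical"

# Values can be converted explicitly between types.
k <- as.numeric("42")
print(k + 1)     # 43

# Logical values behave as 0/1 in arithmetic.
print(sum(c(TRUE, FALSE, TRUE)))  # 2
```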
Data structures
-vectors/lists/multi-dimensional structures (matrices, data frames)
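One way to create and index each structure in base R. The object names and values are illustrative.

```r
v  <- c(2, 4, 6)                         # vector: all elements share one type
l  <- list(name = "a", values = v)       # list: elements may differ in type
m  <- matrix(1:6, nrow = 2, ncol = 3)    # matrix: two-dimensional, one type
df <- data.frame(id = 1:3, score = v)    # data frame: columns may differ in type

print(v[2])      # 4
print(l$values)  # the vector stored in the list
print(m[2, 3])   # 6 (matrices are filled column by column)
print(dim(m))    # 2 3
print(df$score)  # the score column as a vector
str(df)          # compact description of the data frame
```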
Missing values
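R represents a missing value as NA. The sketch below uses a made-up vector to show how missing values are detected and how they affect a summary statistic.

```r
ages <- c(23, NA, 31, 40, NA)

print(is.na(ages))        # TRUE where the value is missing
n_missing <- sum(is.na(ages))
print(n_missing)          # 2

# Most summaries return NA if any input is missing...
print(mean(ages))         # NA

# ...unless missing values are removed explicitly.
m <- mean(ages, na.rm = TRUE)
print(m)                  # mean of 23, 31, 40

# Keep only the observed values.
observed <- ages[!is.na(ages)]
print(length(observed))   # 3
```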
Installing packages in R
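A common pattern is to install a package only if it is not already available, then load it. The package name rpart is just an example chosen here, and install.packages() needs a network connection to a CRAN mirror.

```r
pkg <- "rpart"  # example package; substitute the one you need

# Install only when the package is not already present.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package for the current session.
library(pkg, character.only = TRUE)
```

Installation is done once per machine; library() is called in every new session.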
Data loading
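A sketch of reading a CSV file with read.csv(). So that the example runs anywhere, it first writes the built-in mtcars dataset to a temporary file; in practice the path would point to your own data.

```r
# Write a built-in dataset to a temporary CSV file (stand-in for a real file).
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = TRUE)

# Read it back; the first column holds the row names.
cars <- read.csv(path, row.names = 1)

print(dim(cars))  # 32 11
print(head(cars, 3))
```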
Data exploration
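A first look at a dataset usually covers its size, column types, summary statistics, and missing values. The sketch below uses the built-in mtcars dataset as an example.

```r
data(mtcars)

print(dim(mtcars))      # number of rows and columns
print(head(mtcars, 3))  # first few rows
str(mtcars)             # column types
print(summary(mtcars$mpg))  # five-number summary plus mean

# Count missing values per column.
na_counts <- colSums(is.na(mtcars))
print(na_counts)
```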
Commonly used univariate analyses
-Histograms/Box plots/Q-Q plots
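The three plots can be produced with base graphics. The sketch uses the mpg column of the built-in mtcars dataset as an example variable.

```r
mpg <- mtcars$mpg

# Histogram: shape of the distribution.
h <- hist(mpg, main = "Histogram of mpg", xlab = "Miles per gallon")

# Box plot: median, quartiles, and potential outliers.
b <- boxplot(mpg, main = "Box plot of mpg", ylab = "Miles per gallon")

# Q-Q plot: points close to the line suggest approximate normality.
qqnorm(mpg)
qqline(mpg)
```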
Bivariate analyses
-Scatter plots/Scatter plot matrix
Scatter matrix
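A sketch of a single scatter plot and a scatter plot matrix in base R. The columns chosen from mtcars are examples only.

```r
# Scatter plot of two variables.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. weight")

# Correlation summarizes the linear association in one number.
r <- cor(mtcars$wt, mtcars$mpg)
print(r)

# Scatter plot matrix: every pairwise scatter plot at once.
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]
pairs(vars, main = "Scatter plot matrix")
```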
Multiple regression
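A minimal multiple regression sketch with lm(). The choice of mtcars and of wt and hp as predictors is an assumption made for illustration.

```r
# Model fuel economy as a linear function of weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)

print(summary(fit))  # coefficients, standard errors, R-squared
print(coef(fit))     # estimated intercept and slopes

# Predict for a new, hypothetical car.
new_car <- data.frame(wt = 3, hp = 120)
pred <- predict(fit, newdata = new_car)
print(pred)
```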
Checking the assumptions of multiple regression
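One common way to check the assumptions is to inspect the residuals. The sketch below refits an example model on mtcars, draws the standard diagnostic plots, and adds a normality test on the residuals. Which checks are appropriate depends on the analysis; these are a starting point.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Four diagnostic plots: residuals vs. fitted (linearity),
# normal Q-Q (normality), scale-location (constant variance),
# residuals vs. leverage (influential points).
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Shapiro-Wilk test on the residuals; a small p-value suggests non-normality.
res <- residuals(fit)
sw <- shapiro.test(res)
print(sw)

# Highly correlated predictors indicate possible multicollinearity.
print(cor(mtcars$wt, mtcars$hp))
```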
Logistic regression with R
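Logistic regression is fitted with glm() and family = binomial. The sketch below models the binary am column of mtcars (0 = automatic, 1 = manual) from weight; the dataset and predictor are chosen only for illustration.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial)

print(summary(fit))
print(coef(fit))       # coefficients on the log-odds scale
print(exp(coef(fit)))  # odds ratios

# Predicted probability of a manual transmission for a hypothetical car.
p <- predict(fit, newdata = data.frame(wt = 3), type = "response")
print(p)
```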
Assessing the logistic regression
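A sketch of one way to assess a logistic model: hold out a test set, classify at a 0.5 threshold, and compare training and test accuracy. The data are simulated here purely for illustration, and the 70/30 split and the threshold are assumptions, not fixed rules.

```r
set.seed(42)

# Simulated data: one predictor x and a binary outcome y.
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
dat <- data.frame(x = x, y = y)

# Create training (70%) and test (30%) datasets.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit <- glm(y ~ x, data = train, family = binomial)

# Classify test cases using a 0.5 probability threshold.
prob   <- predict(fit, newdata = test, type = "response")
pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(test$y, levels = c(0, 1))

# Confusion matrix and accuracy on the held-out data.
conf_mat <- table(predicted = pred, actual = actual)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)

# Training accuracy far above test accuracy can signal overfitting.
train_acc <- mean((predict(fit, type = "response") > 0.5) == (train$y == 1))
print(train_acc)
```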