C3 Data and information
LO
Explain how data and its sources are an asset to organizations, governments, and the lives of citizens
Explain the distinction between data, information, knowledge, and wisdom
Explain why data quality is important
Define and operationalize key data-quality attributes
Define attributes of datasets, such as missing values, outliers, and probability distributions.
Data growth
Data summarization
Data quality
Production view of data quality
Intrinsic data quality: accuracy, objectivity, believability, reputation
Accessibility data quality: accessibility, access security
Contextual data quality: relevancy, value-added, timeliness, completeness, amount of data
Representational data quality: interpretability, ease of understanding, concise representation, consistent representation
Data quality in six dimensions
Accuracy / completeness / timeliness / validity / integrity / consistency
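Operationalizing a dimension means turning it into a number that can be measured on a dataset. A minimal sketch for two of the six dimensions, assuming completeness is the share of non-missing values and validity is the share of present values that pass a rule. The records, the field names, and the 0 to 120 age rule are all hypothetical, chosen only for illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 3, "age": 212, "email": None},
    {"id": 4, "age": 58, "email": "d@example.com"},
]

def completeness(rows, field):
    """Share of rows in which the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, is_valid):
    """Share of non-missing values that satisfy a validity rule."""
    present = [r[field] for r in rows if r[field] is not None]
    if not present:
        return 0.0
    return sum(is_valid(v) for v in present) / len(present)

# Assumed rule for illustration: a plausible age lies between 0 and 120.
age_completeness = completeness(records, "age")
age_validity = validity(records, "age", lambda a: 0 <= a <= 120)

print(f"age completeness: {age_completeness:.2f}")
print(f"age validity:     {age_validity:.2f}")
```

Here one of four ages is missing and one of the three present ages (212) fails the rule, so the two measures differ even though they describe the same column.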
Consumption view of data quality
Data characteristics
Data types
Variables
- Binary
- Nominal
- Ordinal
- Interval
- Ratio
Cardinality
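Cardinality is taken here to mean the number of distinct values a variable takes. A minimal sketch on a hypothetical table stored as a dict of columns (the column names and values are made up for illustration):

```python
# Hypothetical table: column name -> list of values.
table = {
    "customer_id": [101, 102, 103, 104, 105, 106],
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "is_member": [True, False, True, True, False, True],
}

def cardinality(values):
    """Number of distinct values in a column."""
    return len(set(values))

cardinalities = {name: cardinality(col) for name, col in table.items()}

for name, count in cardinalities.items():
    print(f"{name}: {count} distinct values")
```

A column whose cardinality equals the row count (customer_id here) behaves like an identifier, while a cardinality of 2 points to a binary variable.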
Data distributions
The dangers of assuming normally distributed data
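One way to see the danger is to apply normal-distribution summaries to skewed data. A minimal sketch, assuming a small hypothetical income sample (in thousands) with one very large value:

```python
import statistics

# Hypothetical right-skewed sample: incomes in thousands.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 45, 500]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
sd = statistics.stdev(incomes)  # sample standard deviation

# Under a normal assumption, mean +/- 2 sd should cover about 95% of values.
lower, upper = mean - 2 * sd, mean + 2 * sd

# How many observations lie below the mean?
below_mean = sum(x < mean for x in incomes)

print(f"mean={mean:.1f}, median={median:.1f}, sd={sd:.1f}")
print(f"mean - 2 sd = {lower:.1f}, mean + 2 sd = {upper:.1f}")
print(f"{below_mean} of {len(incomes)} values are below the mean")
```

The mean sits far above the median, nine of the ten values fall below the mean, and the lower bound of the "typical" range is negative, which is impossible for an income. The median and quantiles describe this sample better than the mean and standard deviation do.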
Outliers
An outlier is an observation that is distinctly different from the other observations.
- Procedural error
- Extraordinary event
- Extraordinary observation
- Unique combination of variables
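A common screening technique, not named in these notes, is the 1.5 x IQR rule: flag values that lie more than 1.5 interquartile ranges outside the first or third quartile. A minimal sketch on hypothetical measurements; the 1.5 multiplier is a convention, not a law:

```python
import statistics

# Hypothetical measurements with one suspicious value.
values = [10, 12, 11, 13, 12, 11, 14, 95]

def iqr_outliers(data, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

outliers = iqr_outliers(values)
print(outliers)
```

The rule only flags candidates. Deciding whether a flagged value is a procedural error, an extraordinary event, an extraordinary observation, or a unique combination of variables still requires looking at how the value was produced.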
Missing data
- Missing completely at random (MCAR)
- Missing at random (MAR)
- Missing not at random (MNAR)
Missing value analysis and imputation (common skills)
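A minimal sketch of the two basic steps: measure how much is missing, then impute. Median imputation is used here only as the simplest example; the data are hypothetical, and None is assumed to mark a missing value.

```python
import statistics

# Hypothetical column with missing values marked as None.
ages = [23, 35, None, 41, None, 29, 52]

# Step 1: missing value analysis - how much is missing, and where?
was_missing = [a is None for a in ages]
missing_share = sum(was_missing) / len(ages)

# Step 2: imputation - replace missing values with the observed median.
observed = [a for a in ages if a is not None]
fill_value = statistics.median(observed)
imputed = [fill_value if a is None else a for a in ages]

print(f"missing share: {missing_share:.2f}")
print(f"fill value:    {fill_value}")
print(f"imputed:       {imputed}")
```

Keeping the was_missing flags preserves the information that a value was imputed. Filling in a single value is easiest to defend when data are MCAR; under MAR or MNAR it can bias results, because the missing values differ systematically from the observed ones.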