Syllabus objectives: Translate vague questions into analyzable ones; construct graphs; identify data types (structured/unstructured); handle missing data; apply univariate and bivariate exploration.
Visualization power: Visuals convey complex information quickly (for example, the distribution of the world's money). Exploration favors speed and iteration; communication favors clarity and professionalism.
Workflow: Import data → transform/tidy → visualize → model (split train/test) → communicate in a report.
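The train/test split in the modeling step can be sketched in standard-library Python. The function name `train_test_split`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not part of the syllabus.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))                         # stand-in for imported, tidied records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fitting on `train` and evaluating on `test` keeps the reported performance honest, because the model never sees the held-out rows.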
Graph types in ggplot2: Histograms for distribution/skew; bar plots for categories; box plots for quartiles/outliers; scatter plots for correlations and interactions.
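What a box plot encodes can be sketched numerically. This is standard-library Python rather than ggplot2, and the sample values, the helper name `box_plot_summary`, and the 1.5 × IQR outlier rule are assumptions for illustration. Plotting libraries differ in how they compute quartiles, so treat the numbers as one convention among several.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles, whisker fences, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


values = [2, 3, 4, 5, 6, 7, 8, 9, 40]          # made-up data with one extreme point
summary = box_plot_summary(values)
print(summary)
```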
Data handling: Comment on sources and ethics; handle missing values, whether missing at random (MAR), hidden, or missing not at random (MNAR), via removal, imputation, or a new factor level.
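The three remedies can be sketched on a toy table in standard-library Python. The records, the placeholder codes in `HIDDEN_CODES`, and the "unknown" level are invented for illustration. Which remedy suits which missingness mechanism is a judgment call this sketch does not settle.

```python
import statistics

# Assumption: these placeholder codes mark "hidden" missing values in this toy data.
HIDDEN_CODES = {"", "NA", -999}


def normalize_hidden(value):
    """Map placeholder codes to None so every missing value looks the same."""
    return None if value in HIDDEN_CODES else value


records = [
    {"age": 34, "region": "north"},
    {"age": -999, "region": "south"},
    {"age": 51, "region": ""},
    {"age": 45, "region": "north"},
]
cleaned = [{k: normalize_hidden(v) for k, v in r.items()} for r in records]

# 1. Removal: keep only rows with no missing field.
complete = [r for r in cleaned if all(v is not None for v in r.values())]

# 2. Imputation: fill a missing numeric field with the median of observed values.
median_age = statistics.median(r["age"] for r in cleaned if r["age"] is not None)
imputed = [
    {**r, "age": median_age if r["age"] is None else r["age"]} for r in cleaned
]

# 3. New level: give a missing categorical field its own "unknown" category.
with_level = [
    {**r, "region": "unknown" if r["region"] is None else r["region"]}
    for r in cleaned
]

print(len(complete), imputed[1]["age"], with_level[2]["region"])
```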
Feature engineering: Create new variables (logs, polynomials) for non-tree models; tree-based models split on thresholds, so monotonic transforms such as logs add little for them.
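A minimal sketch of derived variables in standard-library Python. The `income` field and the helper `add_features` are hypothetical, and `log1p` (the log of 1 + x) is an assumed choice so that zero values stay defined.

```python
import math


def add_features(row):
    """Return a copy of row with a log feature and a squared (polynomial) feature."""
    income = row["income"]
    if income < 0:
        raise ValueError("log1p feature assumes a non-negative value")
    return {
        **row,
        "log_income": math.log1p(income),   # compresses a right-skewed scale
        "income_sq": income ** 2,           # lets a linear model fit curvature
    }


rows = [{"income": 0}, {"income": 3}, {"income": 50_000}]
engineered = [add_features(r) for r in rows]
print(engineered[1])
```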
Summary stats: Use mean, median, variance for insight; align with the data dictionary.
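As an illustration in standard-library Python, with made-up values: a single extreme point pulls the mean well above the median, which is the kind of insight these statistics give before any plot is drawn.

```python
import statistics

values = [2, 3, 4, 5, 6, 7, 8, 9, 40]          # made-up data with one extreme point

mean = statistics.mean(values)
median = statistics.median(values)
sample_variance = statistics.variance(values)        # divides by n - 1
population_variance = statistics.pvariance(values)   # divides by n

print(f"mean={mean:.2f} median={median} "
      f"sample_var={sample_variance:.2f} population_var={population_variance:.2f}")
```

A mean above the median hints at right skew, which a histogram of the same variable can confirm.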
Communication tips: Clear labels, avoid ambiguity, show uncertainty, use complementary colors, persuade by showing data.
DataFest datasets for your own visualizations: huggingface.co/supersam7/datasets
"A Picture Speaks a Thousand Words."
Mastering Data Visualization, EDA, and ETL: A Complete Guide for Data Science Professionals
- Data Visualization: Transform complex data into clear insights with Tableau & Power BI
- Exploratory Data Analysis (EDA): Uncover hidden patterns and relationships
- Extract, Transform, Load (ETL): Clean and prepare real-world data efficiently
- Hands-on dashboards and visual storytelling techniques