Steps 1-2: Run EDA
1. Select Tables
Focus on critical tables (Customers, Orders) or tables with known issues.
2. Configure
- Sample Size: 10,000 rows
- Correlation Analysis: Enabled
- Advanced Stats: Enabled
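As a rough illustration of what these settings control, the sketch below draws a reproducible random sample of rows and computes a Pearson correlation between two numeric columns. It uses only the Python standard library. `EDA_CONFIG`, `sample_rows`, and `pearson` are hypothetical names chosen for this example, not options or functions of any particular EDA tool.

```python
import math
import random

# Hypothetical settings mirroring the configuration above; the key names are
# illustrative and do not correspond to a specific tool's options.
EDA_CONFIG = {
    "sample_size": 10_000,
    "correlation_analysis": True,
    "advanced_stats": True,
}


def sample_rows(rows, sample_size, seed=0):
    """Return up to sample_size rows, chosen uniformly at random without replacement."""
    rows = list(rows)
    if len(rows) <= sample_size:
        return rows
    # A fixed seed keeps the sample reproducible between runs.
    return random.Random(seed).sample(rows, sample_size)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)
```

Sampling keeps profiling fast on large tables, at the cost of possibly missing rare problems that only appear in a few rows.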
Step 3: Review Metrics
Completeness Analysis
Review the percentage of null values in each column. Example: products.description is 45% null (Major Issue).
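A minimal sketch of how a null percentage can be computed, assuming rows are available as Python dictionaries. The function name `null_percentages` and the toy `products` data are made up for illustration. Treating empty strings as null is an assumption of this example.

```python
def null_percentages(rows, columns):
    """Percentage of rows in which each column is None or an empty string."""
    total = len(rows)
    result = {}
    for col in columns:
        # Assumption: an empty string counts as missing, same as None.
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        result[col] = 100.0 * missing / total if total else 0.0
    return result


# Toy data: 20 products, 9 of them without a description.
products = [{"id": i, "description": None if i < 9 else "text"} for i in range(20)]
print(null_percentages(products, ["id", "description"]))
# {'id': 0.0, 'description': 45.0}
```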
Uniqueness Analysis
Check for duplicate values in ID columns and in any other column that is expected to satisfy a unique constraint.
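One way to sketch a duplicate check is to count each non-null value and keep those seen more than once. `duplicate_values` and the sample `users` rows are hypothetical, used only to show the idea.

```python
from collections import Counter


def duplicate_values(rows, column):
    """Return {value: count} for non-null values that occur more than once."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return {value: n for value, n in counts.items() if n > 1}


# Toy data: user_id 2 appears twice; the null is ignored here because
# missing values belong to the completeness check.
users = [{"user_id": 1}, {"user_id": 2}, {"user_id": 2}, {"user_id": None}]
print(duplicate_values(users, "user_id"))
# {2: 2}
```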
Validity Analysis
Identify format issues (invalid emails), out-of-range values, and constraint violations.
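The checks above could look roughly like the following sketch. The email pattern is deliberately simple (it only requires `something@something.tld` with no whitespace) and is not a full address validator. `invalid_emails`, `out_of_range`, and the column names are assumptions made for this example.

```python
import re

# Deliberately loose pattern: one "@", at least one "." after it, no whitespace.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def invalid_emails(rows, column="email"):
    """Rows whose non-null email value does not match EMAIL_PATTERN."""
    bad = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # nulls are a completeness issue, not a validity issue
        if not EMAIL_PATTERN.fullmatch(str(value)):
            bad.append(row)
    return bad


def out_of_range(rows, column, low, high):
    """Rows whose non-null value falls outside the inclusive range [low, high]."""
    return [
        row for row in rows
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
```

For example, `out_of_range(orders, "quantity", 1, 1000)` would return the rows of a hypothetical `orders` list whose `quantity` is below 1 or above 1000.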
Steps 4-5: Identify Issues and Create an Action Plan
| Issue | Priority | Action | Owner |
|---|---|---|---|
| Invalid emails in customers (PII) | Critical | Run validation script | Data Eng |
| Missing values in shipping address | High | Investigate source | Support |
| Duplicate User IDs | Critical | Deduplicate records | Data Eng |
| Inconsistent date formats | Medium | Standardize pipeline | Analytics |
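If the action plan is tracked as data rather than only as a table, ordering it by priority is straightforward. The sketch below encodes the rows above. The `PRIORITY_ORDER` ranking (Critical, then High, then Medium) and the function name are assumptions of this example.

```python
# Assumed ranking: lower number means more urgent.
PRIORITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2}

ISSUES = [
    {"issue": "Invalid emails in customers (PII)", "priority": "Critical",
     "action": "Run validation script", "owner": "Data Eng"},
    {"issue": "Missing values in shipping address", "priority": "High",
     "action": "Investigate source", "owner": "Support"},
    {"issue": "Duplicate User IDs", "priority": "Critical",
     "action": "Deduplicate records", "owner": "Data Eng"},
    {"issue": "Inconsistent date formats", "priority": "Medium",
     "action": "Standardize pipeline", "owner": "Analytics"},
]


def sort_by_priority(issues):
    """Return issues ordered from most to least urgent (stable within a level)."""
    return sorted(issues, key=lambda item: PRIORITY_ORDER[item["priority"]])


for item in sort_by_priority(ISSUES):
    print(f'{item["priority"]:<8} {item["issue"]} -> {item["owner"]}')
```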
Step 6: Export EDA Report
Download & Share
Export the HTML report with visualizations. Share it with stakeholders and keep it as a reference point for tracking improvements over time.
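To track improvements over time, it can help to store the headline metrics next to the exported report and compare them on the next run. This is a sketch under assumptions: the JSON file layout, the metric names, and the functions `save_metrics`, `load_metrics`, and `metric_deltas` are hypothetical and separate from whatever the report export itself produces.

```python
import json


def save_metrics(path, metrics):
    """Write a flat {metric_name: number} mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metrics, fh, indent=2, sort_keys=True)


def load_metrics(path):
    """Read a metrics mapping previously written by save_metrics."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def metric_deltas(baseline, current):
    """Change (current minus baseline) for every metric present in both runs."""
    return {
        name: current[name] - baseline[name]
        for name in baseline
        if name in current
    }
```

A negative delta for a null-percentage metric means the column became more complete since the baseline was captured.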
Success Criteria
- Data quality issues prioritized
- Action plan created with owners
- Baseline metrics captured