Exploratory Data Analysis (EDA)

Comprehensive data profiling and statistical analysis to understand your data quality before you document or migrate.

Overview

The EDA engine profiles your database tables automatically, detecting quality issues, null patterns, and distribution anomalies without manual effort.

Lightweight Profiling

Profiling runs on a sample of each table (10,000 rows by default), which yields representative statistics while keeping the load on production databases low.
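The sampling method is not specified here. As one possible illustration, a fixed-size uniform sample can be drawn from a row stream in a single pass with reservoir sampling. The function name `reservoir_sample` and its parameters are hypothetical and not part of the product.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(rows: Iterable[T], k: int = 10_000,
                     seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample
```

Because the stream is read once and only `k` rows are held at a time, memory use stays bounded regardless of table size.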

Privacy Preserving

Aggregates are calculated in-memory. Individual row data is discarded immediately after profiling.

Statistical Analysis

Column-Level Intelligence

Numeric Columns

  • Min, Max, Mean, Median
  • Standard Deviation
  • Quartiles (25%, 50%, 75%)
  • Outlier detection (IQR method)
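The numeric metrics above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the engine's implementation: the function name `profile_numeric`, the conventional 1.5 × IQR fence, and the "inclusive" quantile method are all assumptions.

```python
import statistics

def profile_numeric(values, fence=1.5):
    """Summarize a numeric column; nulls (None) are ignored."""
    data = sorted(v for v in values if v is not None)
    # Quartiles at 25%, 50%, 75% (needs at least two data points).
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    return {
        "min": data[0],
        "max": data[-1],
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "quartiles": (q1, q2, q3),
        # IQR method: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
        "outliers": [v for v in data if v < low or v > high],
    }

ages = [18, 22, 25, 30, 31, 35, 40, 84, None]
report = profile_numeric(ages)
```

With this sample, only the value 84 falls outside the IQR fences and is reported as an outlier.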

Text Columns

  • Unique value counts
  • Most frequent values
  • Length distribution
  • Format consistency checks
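A rough sketch of the text metrics follows. How the engine checks format consistency is not described, so the "shape" approach here (digits mapped to `9`, letters to `A`) is only one common technique, and `profile_text` is a hypothetical name.

```python
import re
from collections import Counter

def profile_text(values, top_n=3):
    """Summarize a text column; nulls (None) are ignored."""
    data = [v for v in values if v is not None]
    counts = Counter(data)
    # Reduce each value to a shape, e.g. "AB-123" -> "AA-999".
    shapes = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in data
    )
    return {
        "unique": len(counts),
        "most_frequent": counts.most_common(top_n),
        "length_distribution": dict(Counter(len(v) for v in data)),
        # One dominant shape suggests a consistent format.
        "shapes": dict(shapes),
    }

codes = ["AB-123", "CD-456", "EF-789", "bad", None, "AB-123"]
report = profile_text(codes)
```

Here four of the five non-null values share the shape `AA-999`, so the lone `AAA` value stands out as a likely formatting problem.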

Data Quality Metrics

Completeness

Percentage of non-null values. Spots columns that are mostly empty.
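As a small illustration, completeness is the count of non-null values divided by the total. The function name and the 50% "mostly empty" threshold below are assumptions chosen for the example.

```python
def completeness(values, mostly_empty_below=50.0):
    """Return the percentage of non-null values and a mostly-empty flag."""
    values = list(values)
    total = len(values)
    non_null = sum(1 for v in values if v is not None)
    pct = 100.0 * non_null / total if total else 0.0
    return {"percent_complete": pct, "mostly_empty": pct < mostly_empty_below}

# 197 populated rows out of 200 gives 98.5% complete.
result = completeness([42] * 197 + [None] * 3)
```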

Uniqueness

Duplicate record detection and cardinality checks. Vital for Primary Key validation.
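A minimal sketch of a uniqueness check, assuming a column is a primary key candidate only when it has no nulls and no repeated values. The name `uniqueness` and the returned fields are hypothetical.

```python
from collections import Counter

def uniqueness(values):
    """Report distinct count, cardinality ratio, and duplicated values."""
    values = list(values)
    data = [v for v in values if v is not None]
    counts = Counter(data)
    duplicates = {v: c for v, c in counts.items() if c > 1}
    return {
        "distinct": len(counts),
        "cardinality_ratio": len(counts) / len(data) if data else 0.0,
        "duplicates": duplicates,
        # Primary key candidates must be non-null and fully unique.
        "pk_candidate": len(values) > 0
                        and len(data) == len(values)
                        and not duplicates,
    }

result = uniqueness([1, 2, 3, 3, 4])  # the value 3 appears twice
```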

Validity

Regex-based format validation for emails, phone numbers, and zip codes.
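The patterns the engine uses are not listed here. The sketch below uses deliberately simplified, illustrative patterns (the ZIP pattern covers US ZIP and ZIP+4 only) to show how a validity rate could be computed; `PATTERNS` and `validity_rate` are hypothetical names.

```python
import re

# Simplified example patterns; real formats vary by country and standard.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{5,}\d"),
    "zip": re.compile(r"\d{5}(-\d{4})?"),
}

def validity_rate(values, kind):
    """Return the percentage of non-null values that fully match the pattern."""
    pattern = PATTERNS[kind]
    data = [v for v in values if v is not None]
    if not data:
        return 0.0
    valid = sum(1 for v in data if pattern.fullmatch(v))
    return 100.0 * valid / len(data)

rate = validity_rate(["a@b.com", "not-an-email", None], "email")
```

Using `fullmatch` rather than `search` ensures that a value with trailing junk is not counted as valid.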

Distribution

Skewness and kurtosis analysis to detect skewed or heavy-tailed distributions that may indicate a biased dataset.
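For reference, both measures can be computed from central moments. This sketch uses the population (biased) moment estimators and reports excess kurtosis (0 for a normal distribution); which estimators the engine uses is not stated, and `skewness_kurtosis` is a hypothetical name.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    data = [float(v) for v in values if v is not None]
    n = len(data)
    if n < 2:
        raise ValueError("need at least two non-null values")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are identical; moments are undefined")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

skew, kurt = skewness_kurtosis([1, 1, 1, 1, 10])  # long right tail
```

A positive skewness indicates a long right tail and a negative value a long left tail, while positive excess kurtosis indicates heavier tails than a normal distribution.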

Auto-Generated Visualizations

Sample Output Report

Col: age (INT)             98.5% Complete
  Min: 18    Max: 84

Col: status (VARCHAR)      4 Unique Values
  Active       65%
  Pending      20%
  Suspended    10%
  Archived      5%

Visualizations are exported in the HTML report.

Common Use Cases

Use Case              Benefit                                            Role
Migration Planning    Identify columns with bad data before moving them  Data Engineer
ML Feature Selection  Understand distributions and correlations          Data Scientist
Health Check          Monitor null rates over time                       Data Steward
Debugging             Find duplicate IDs causing constraint failures     Developer