Cardinality
Cardinality describes how unique the values in a column are. It also plays a role in defining how tables relate to each other. A high-cardinality column contains many unique values, while a low-cardinality column contains few unique values.
In depth
Think of a single column in a table. Cardinality equals its distinct value count. For example, email, UUID, and invoice ID usually have a unique value for almost every row (high cardinality). In comparison, status, plan tier, and a true/false flag have only a handful of options (low cardinality).
Cardinality also describes relationships between tables:
- One to one: Each employee has one badge ID.
- One to many: One customer, many orders.
- Many to many: Products and tags linked through a bridge table.
Many analytics tasks depend on these patterns. Grouping by a high‑cardinality column can clutter charts and slow queries. Grouping by a low‑cardinality column makes for cleaner comparisons. Join logic depends on relationship cardinality to prevent duplicate rows and inflated totals.
A simple way to compare columns is the cardinality ratio:
cardinality ratio = distinct_count / row_count
As a rule of thumb, ratios above 0.8 signal high cardinality. Ratios below 0.1 signal low cardinality.
Pro tip
Before you join tables or publish a metric, check distinct counts on the join keys. If a supposed one‑to‑many key shows many‑to‑many, fix it with a bridge table or a different grain.
Why Cardinality matters
- Query speed: High‑cardinality columns increase index size and slow performance.
- Data quality: Faulty relationship cardinality creates duplicates and causes double counting.
- Clear visuals: High‑cardinality dimensions crowd charts. Low‑cardinality dimensions make trends more obvious.
- Storage and compression: Some databases compress or index low‑cardinality columns more efficiently.
Cardinality - In practice
- High cardinality examples: “user_id”, “email”, “session_id”, “invoice_id”.
- Low cardinality examples: “status” {open, closed}, “plan” {free, pro, enterprise}, “is_active” {true, false}.
- Recommended actions:
- Use top N, binning, or an “Other” bucket when charting high‑cardinality fields.
- Use approximate distinct functions for very large datasets when exact counts are expensive and even unusable (you are trading perfect accuracy for speed and performance).
- Validate relationship cardinality after joins by counting duplicates of the primary key.
Cardinality and PowerMetrics
In PowerMetrics:
- When creating or exploring a metric, pick dimensions with manageable cardinality for default views. Save high‑cardinality fields for filters (prevent users from applying segmentation).
- Use top N and grouping to keep charts readable (especially helpful when a dimension contains thousands of unique values).
- In data models, confirm the join keys form the expected one‑to‑many pattern. If not, adjust the model or add a bridge to avoid double counting.
- Track distinct counts over time for keys like “user_id” to catch unexpected spikes.
Related terms
Member
A member, in the context of data, is a specific, unique value within a dimension that represents an individual entity, category, or attribute in your data. Think of it as one item on a long list—like “Q1 2025” in a Time dimension or “Blue T-Shirt” in a Product dimension.
Read moreData Warehouse
A data warehouse is a centralized repository that stores and organizes structured data from multiple sources. It’s optimized for reporting and analysis, enabling businesses to get a unified view of their historical and current data.
Read moreData Visualization
Data visualization is the representation of data and information as charts, diagrams, pictures, or tables so you can read patterns and spot outliers quickly, accurately, and precisely. Think of it as a map for your numbers: it makes relationships and surprises visible at a glance.
Read more