Cardinality

Cardinality describes how unique the values in a column are. It also plays a role in defining how tables relate to each other. A high-cardinality column contains many unique values, while a low-cardinality column contains few unique values.

In depth

Think of a single column in a table. Cardinality equals its distinct value count. For example, email, UUID, and invoice ID usually have a unique value for almost every row (high cardinality). In comparison, status, plan tier, and a true/false flag have only a handful of options (low cardinality).

Cardinality also describes relationships between tables:

One to one: Each employee has one badge ID.
One to many: One customer, many orders.
Many to many: Products and tags linked through a bridge table.

Many analytics tasks depend on these patterns. Grouping by a high‑cardinality column can clutter charts and slow queries. Grouping by a low‑cardinality column makes for cleaner comparisons. Join logic depends on relationship cardinality to prevent duplicate rows and inflated totals.

A simple way to compare columns is the cardinality ratio:

cardinality ratio = distinct_count / row_count

As a rule of thumb, ratios above 0.8 signal high cardinality. Ratios below 0.1 signal low cardinality.

Pro tip

Before you join tables or publish a metric, check distinct counts on the join keys. If a supposed one‑to‑many key shows many‑to‑many, fix it with a bridge table or a different grain.

Why Cardinality matters

Query speed: High‑cardinality columns increase index size and slow performance.
Data quality: Faulty relationship cardinality creates duplicates and causes double counting.
Clear visuals: High‑cardinality dimensions crowd charts. Low‑cardinality dimensions make trends more obvious.
Storage and compression: Some databases compress or index low‑cardinality columns more efficiently.

Cardinality - In practice

High cardinality examples: “user_id”, “email”, “session_id”, “invoice_id”.
Low cardinality examples: “status” {open, closed}, “plan” {free, pro, enterprise}, “is_active” {true, false}.
Recommended actions:

Use top N, binning, or an “Other” bucket when charting high‑cardinality fields.
Use approximate distinct functions for very large datasets when exact counts are expensive and even unusable (you are trading perfect accuracy for speed and performance).
Validate relationship cardinality after joins by counting duplicates of the primary key.

Cardinality and PowerMetrics

In PowerMetrics:

When creating or exploring a metric, pick dimensions with manageable cardinality for default views. Save high‑cardinality fields for filters (prevent users from applying segmentation).
Use top N and grouping to keep charts readable (especially helpful when a dimension contains thousands of unique values).
In data models, confirm the join keys form the expected one‑to‑many pattern. If not, adjust the model or add a bridge to avoid double counting.
Track distinct counts over time for keys like “user_id” to catch unexpected spikes.

Related terms

Member

A member, in the context of data, is a specific, unique value within a dimension that represents an individual entity, category, or attribute. Think of a member as an item in a list—like “Q1 2025” in a list of time dimensions or “Blue T-Shirt” in a list of product dimensions.

Data Warehouse

A data warehouse is a centralized repository that stores and organizes structured data from multiple sources. Optimized for reporting and analysis, warehouses give businesses a unified view of their historical and current data.

Data Visualization

Data visualization is the representation of data as charts, diagrams, pictures, or tables. When information is presented visually, it’s easier to see patterns and quickly spot outliers. Data visualizations are also perfect for performing comparison and forecast analyses.

Make metric analysis easy for everyone.Get Started Now