Cardinality

Cardinality describes how unique the values in a column are. It also plays a role in defining how tables relate to each other. A high-cardinality column contains many unique values, while a low-cardinality column contains few unique values.

In depth

Think of a single column in a table. Cardinality equals its distinct value count. For example, email, UUID, and invoice ID usually have a unique value for almost every row (high cardinality). In comparison, status, plan tier, and a true/false flag have only a handful of options (low cardinality).

Cardinality also describes relationships between tables:

  • One to one: Each employee has one badge ID.
  • One to many: One customer, many orders.
  • Many to many: Products and tags linked through a bridge table.

Many analytics tasks depend on these patterns. Grouping by a high‑cardinality column can clutter charts and slow queries. Grouping by a low‑cardinality column makes for cleaner comparisons. Join logic depends on relationship cardinality to prevent duplicate rows and inflated totals.

A simple way to compare columns is the cardinality ratio:

cardinality ratio = distinct_count / row_count

As a rule of thumb, ratios above 0.8 signal high cardinality. Ratios below 0.1 signal low cardinality.

Pro tip

Before you join tables or publish a metric, check distinct counts on the join keys. If a supposed one‑to‑many key shows many‑to‑many, fix it with a bridge table or a different grain.

Why Cardinality matters

  • Query speed: High‑cardinality columns increase index size and slow performance.
  • Data quality: Faulty relationship cardinality creates duplicates and causes double counting.
  • Clear visuals: High‑cardinality dimensions crowd charts. Low‑cardinality dimensions make trends more obvious.
  • Storage and compression: Some databases compress or index low‑cardinality columns more efficiently.

Cardinality - In practice

  • High cardinality examples: “user_id”, “email”, “session_id”, “invoice_id”.
  • Low cardinality examples: “status” {open, closed}, “plan” {free, pro, enterprise}, “is_active” {true, false}.
  • Recommended actions:
    • Use top N, binning, or an “Other” bucket when charting high‑cardinality fields.
    • Use approximate distinct functions for very large datasets when exact counts are expensive and even unusable (you are trading perfect accuracy for speed and performance).
    • Validate relationship cardinality after joins by counting duplicates of the primary key.

Cardinality and PowerMetrics

In PowerMetrics:

  • When creating or exploring a metric, pick dimensions with manageable cardinality for default views. Save high‑cardinality fields for filters (prevent users from applying segmentation).
  • Use top N and grouping to keep charts readable (especially helpful when a dimension contains thousands of unique values).
  • In data models, confirm the join keys form the expected one‑to‑many pattern. If not, adjust the model or add a bridge to avoid double counting.
  • Track distinct counts over time for keys like “user_id” to catch unexpected spikes.

Related terms