Counting rows in big tables is known to be slow in PostgreSQL. The MVCC model requires a full count of live rows for a precise number. There are workarounds to speed this up dramatically if the count does not have to be exact like it seems to be in your case.
(Remember that even an "exact" count is potentially dead on arrival!)
Exact count (slow for big tables):
SELECT count(*) AS exact_count FROM myschema.mytable;
Estimate (extremely fast):
SELECT reltuples AS estimate FROM pg_class where relname = 'mytable';
Typically, the estimate is very close. How close, depends on whether ANALYZE
or VACUUM
are run enough - where "enough" is defined by the level of write activity to your table.
Safer
The above ignores the possibility of multiple tables with the same name in one database - in different schemas. To account for that:
SELECT c.reltuples::bigint AS estimate
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'mytable'
AND n.nspname = 'myschema';
The cast to bigint
formats the real
number nicely, especially for big counts.
Better
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE oid = 'myschema.mytable'::regclass;
Faster, simpler, safer, more elegant. See the manual on Object Identifier Types.
Replace 'myschema.mytable'::regclass
with to_regclass('myschema.mytable')
in Postgres 9.4+ to get nothing instead of an exception for invalid table names. See:
Better yet (for very little added cost)
We can do what the Postgres planner does. Quoting the Row Estimation Examples in the manual:
These numbers are current as of the last VACUUM
or ANALYZE
on the
table. The planner then fetches the actual current number of pages in
the table (this is a cheap operation, not requiring a table scan). If
that is different from relpages
then reltuples
is scaled
accordingly to arrive at a current number-of-rows estimate.
So:
SELECT ((reltuples / relpages)
* (pg_relation_size('myschema.mytable') / current_setting('block_size')::int)
)::bigint
FROM pg_class
WHERE oid = 'myschema.mytable'::regclass;
Typically, we can replace current_setting('block_size')::int
with just 8192
to make it slightly faster. But the query only takes ~ 1 ms as it is.
More Web resources:
SELECT 100 * count(*) AS estimate FROM mytable TABLESAMPLE SYSTEM (1);
Like @a_horse commented, the added clause for the SELECT
command can be useful if statistics in pg_class
are not current enough for some reason. For example:
- No
autovacuum
running.
- Immediately after a large
INSERT
/ UPDATE
/ DELETE
.
TEMPORARY
tables (which are not covered by autovacuum
).
This only looks at a random n % (1
in the example) selection of blocks and counts rows in it. A bigger sample increases the cost and reduces the error, your pick. Accuracy depends on more factors:
- Distribution of row size. If a given block happens to hold wider than usual rows, the count is lower than usual etc.
- Dead tuples or a
FILLFACTOR
occupy space per block. If unevenly distributed across the table, the estimate may be off.
- General rounding errors.
Typically, the estimate from pg_class
will be faster and more accurate.
Answer to actual question
First, I need to know the number of rows in that table, if the total
count is greater than some predefined constant,
And whether it ...
... is possible at the moment the count pass my constant value, it will
stop the counting (and not wait to finish the counting to inform the
row count is greater).
Yes. You can use a subquery with LIMIT
:
SELECT count(*) FROM (SELECT 1 FROM token LIMIT 500000) t;
Postgres actually stops counting beyond the given limit, you get an exact and current count for up to n rows (500000 in the example), and n otherwise. Not nearly as fast as the estimate in pg_class
, though.