📢 The schedule is out 🗓 for POSETTE: An Event for Postgres 2025!
📢 The schedule is out 🗓 for POSETTE: An Event for Postgres 2025!
Written by Ozgun Erdogan
December 1, 2017
So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service–Citus Cloud—and about our newly-added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.
If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:
For all these new features in Citus 7.1, you can try them today on Citus Cloud, get started locally with our Citus open source downloads, or keep right on reading to learn more about all the new Citus things.
The new Citus 7.1 distributed transactions feature is so important to our customers that we dedicated an entire blog post to it. As well as a blog on distributed deadlocks, an important pre-requisite feature we had to deliver in Citus 7.0, before tackling the distributed transaction challenge in Citus 7.1.
We already covered the new zero-downtime shard rebalancer in the previous Citus Cloud 2 post, so we won’t dig in too much here. Just in case you missed the Citus Cloud 2 launch, bottom line, the Citus Cloud shard rebalancer now offers a fully online operation, with zero-downtime. To implement the zero-downtime shard rebalancer, we extended Postgres 10 logical replication support. Oh, and to make it easy to observe progress during the rebalancer operation, we created a new UI, which we hope you find useful:
So today we’re excited to announce Citus Cloud 2 w/ sweet new features for our cloud database | Worry-free Postgres https://t.co/vawtGRkmi9 pic.twitter.com/meFjnDT3kt
— Citus Data (@citusdata) November 16, 2017
In SQL, a window function performs a computation across a set of table rows that are somehow related to the current row. This is comparable to the type of computation that can be done with an aggregate function (sum, count, or avg). However, window functions don’t cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities.
You can read more about window functions or watch Bruce Momjian’s excellent talk on this topic.
In Citus, our customers leverage window functions to analyze their users’ actions over time. For example, a funnel query could ask: Show me users who visited web pages on a website, where the time interval between visits was less than half an hour.
SELECT
*
FROM
(
SELECT
user_id,
time,
lag(time, 1) over w1 as prev_time
FROM
web_page_visits
GROUP BY
user_id, webpage, time
WINDOW w1 as (PARTITION BY user_id, webpage ORDER BY time)
) a
WHERE
time-prev_time < INTERVAL '30 minutes'
ORDER BY 1;
Citus previously supported window functions when the computation targeted a single tenant. Citus 7.1 extends its window function support to include computations that span across tenants or users, where the window function’s PARTITION BY
clause includes the distribution column.
The DISTINCT
clause removes all duplicate rows from the result set and keeps one row for each group of duplicates. The DISTINCT ON
clause keeps only the first row of each set of rows where the given expressions evaluate to equal. You can read more about DISTINCT
clauses in PostgreSQL’s documentation.
In Citus, our users run DISTINCT
and DISTINCT ON
clauses when creating roll-up tables, removing duplicates, or finding the first or last event associated with an entity. Citus 7.1 adds support for most queries that include distinct clauses. In particular, the following syntax is now supported:
SELECT DISTINCT column_1, column_2, … FROM [distributed table | reference table];
SELECT DISTINCT ON (distribution_column), other_columns FROM [distributed table];
SELECT DISTINCT ON column_1, column_2, … FROM [distributed table | reference table]
;SELECT DISTINCT [subquery_target_columns] FROM (subquery);
Another frequent action is counting the number of unique occurrences of a column or expression in a data set. Citus 7.1 provides extends its support for exact count(distinct) clauses on distributed and reference tables, where the distinct expression doesn’t need to include the distribution column.
Citus 7.1 expands SQL support for subqueries that include reference tables. Previously, subqueries were supported when you join distributed tables with reference tables on the distribution column. Citus 7.1 now provides support for join clauses on any column or expression.
Also, Citus 7.1 introduces performance optimizations for SELECT
queries that include ORDER BY
and LIMIT
clause on the distribution column. In particular, queries that have the following structure now run much faster in Citus 7.1:
SELECT l_orderkey, count(DISTINCT l_partkey)
FROM lineitem
GROUP BY l_orderkey
ORDER BY 2 DESC LIMIT 10;
We ship new releases of Citus roughly every three months. This means every three months there are new awesome Citus features you may want to take advantage of. In Citus 7.1, we’ve added a feature that, unless you opt out, will automatically notify you of recent product updates, with messages like this in your database logs: a new minor release of Citus (X.Y.Z) is available
. This new 7.1 feature also sends anonymized database cluster information to our product team so they can keep making the product better. The anonymized data is extremely useful in understanding cluster sizes and usage patterns and scaling patterns, so that we can continue to prioritize the features you need the most in our gridding exercises.
The team at Citus is super excited to see Citus 7.1 out in the wild, helping developers like you to never have to worry about scaling again. (Or at least, to worry a heck of a lot less.) As always, Citus is available for open source downloads, as enterprise on-prem software, and via Citus Cloud, our fully-managed database as a service. If you give Citus a spin. we’d love your feedback. And if you have feature requests, we’re all ears. Feel free to join the conversation in our slack channel.