Citus 14.0 is out! Now with PG18 Support. Read all about it in Mehmet’s 14.0 blog post. 💥
In January 2019, Microsoft acquired Citus Data. For more details on the exciting news, please visit the announcements on the Official Microsoft Blog and the Citus Data Blog.
Citus extends PostgreSQL to support distributed SQL queries. On top of PostgreSQL, Citus comes with its own transparent sharding, replication, distributed query planner, and executor logic. Together, these features enable you to scale analytical workloads by parallelizing queries, and to scale transactional workloads by routing transactions across the cluster.
As you’ll learn in the Citus concepts section of the documentation, there are two ways of sharding in Citus: row-based and schema-based. In both sharding methods, Citus divides Postgres tables into multiple smaller tables, called shards. The shards are then spread across the nodes in the Citus database cluster.
In the case of row-based sharding you decide how tables are split using the create_distributed_table() function.
SELECT create_distributed_table(
'table_name',
'distribution_column');
With schema-based sharding, the schema name acts as the grouping, and you determine which schemas are distributed with the citus_schema_distribute('name') function.
SELECT citus_schema_distribute('name');
When new data is ingested or when queries come in, the Citus coordinator routes them to the correct shards based on the value of the distribution column (row-based) or the schema name (schema-based) depending on which sharding method you chose.
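The routing described above can be sketched for the row-based case. This is a minimal illustration, assuming a hypothetical `events` table distributed on `tenant_id`; the table and column names are invented for the example, while `create_distributed_table()` and `get_shard_id_for_distribution_column()` are Citus functions.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE events (
    tenant_id bigint NOT NULL,
    event_id  bigint NOT NULL,
    payload   jsonb,
    PRIMARY KEY (tenant_id, event_id)  -- unique keys must include the distribution column
);

-- Split the table into shards by hashing tenant_id.
SELECT create_distributed_table('events', 'tenant_id');

-- Ask Citus which shard holds the rows for tenant 42.
SELECT get_shard_id_for_distribution_column('events', 42);

-- A filter on the distribution column lets the coordinator route
-- this query to the one shard that holds tenant 42.
SELECT count(*) FROM events WHERE tenant_id = 42;
```

Queries without a filter on `tenant_id` still work, but they fan out to every shard of the table instead of being routed to one.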
| Citus Version | Compatible with PostgreSQL |
|---|---|
| 5.2 | 9.5 only |
| 6.x | 9.5, 9.6 |
| 7.x | 9.6, 10 |
| 8.x | 10, 11 |
| 9.0-9.4 | 11, 12 |
| 9.5 | 11, 12, 13 |
| 10.0.x | 11, 12, 13 |
| 10.1.x | 12, 13 |
| 10.2.x | 12, 13, 14 |
| 11.0.x | 13, 14 |
| 11.1.x, 11.2.x, 11.3.x | 13, 14, 15 |
| 12.0 | 14, 15 |
| 12.1 | 14, 15, 16 |
| 13.0.x, 13.1.x, 13.2.x | 15, 16, 17 |
| 14.0 | 16, 17, 18 |
Since Citus provides distributed functionality by extending PostgreSQL, it uses the standard PostgreSQL SQL constructs. Citus provides full SQL support for queries which access a single node in the database cluster. These queries are common, for instance, in multi-tenant SaaS applications where different nodes store different tenants (see When to Use Citus). When a query spans across shards, there are a few limitations which you can typically work around using other PostgreSQL functionality.
Since Citus is based on PostgreSQL, you can directly use PostgreSQL extensions such as HyperLogLog, TopN, or PostGIS with it. When using Citus with other Postgres extensions, you will first need to create the Citus extension on your PostgreSQL instance and then create the other extensions you want to use. Citus will work with tools that use standard PostgreSQL drivers such as Tableau through regular ODBC/JDBC drivers.
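As a sketch of the ordering described above, the statements below create Citus first and the other extensions afterwards. This assumes the extension packages are already installed on every node and that `citus` is listed in `shared_preload_libraries`; whether you also need to run the statements on each worker depends on your Citus version.

```sql
-- Create Citus first, then the extensions you want to use with it.
CREATE EXTENSION IF NOT EXISTS citus;
CREATE EXTENSION IF NOT EXISTS hll;      -- HyperLogLog
CREATE EXTENSION IF NOT EXISTS topn;     -- TopN
CREATE EXTENSION IF NOT EXISTS postgis;  -- PostGIS
```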
In general, you can use standard PostgreSQL drivers and language bindings with Citus, which means almost any language is supported. You can view a list of supported drivers and interfaces for PostgreSQL here.
The Citus extension to Postgres is commonly used with customer-facing applications that are growing fast, have demanding performance requirements, are starting to experience slow queries, need to plan for future scale—or all of the above. Common use cases for Citus—both self-hosted and in the cloud—include:
Citus differs from other analytics databases in several ways.
Citus achieves order-of-magnitude faster execution compared to vanilla PostgreSQL through a combination of parallelism, keeping more data in memory, higher I/O bandwidth, and simultaneous use of the multiple cores available across your Citus database cluster.
Citus enables real-time interaction with large datasets that span billions of records—and is a good fit for customer-facing workloads that often require low-latency response times. Performance increases as you add nodes to a Citus database cluster. This 15-min performance demo from SIGMOD shows how Citus speeds up Postgres, using the HammerDB benchmark. Recently GigaOm published a benchmark performance report for Citus. Find out why benchmarking databases is so hard in this blog post by the lead engineer for Citus.
Patroni is one of the most popular high availability (HA) solutions amongst Postgres open source users. As of Citus 11.2 and Patroni 3.0 there is now an integration between Citus and Patroni that enables fully declarative clustering with high availability and automatic failover.
Citus implements transparent sharding at the database layer—so if you use Citus, you do not need to manually shard your application, and you do not need to re-architect your application in order to scale out. You can read more about the Citus architecture and sharding semantics in our documentation.
Optimal shard count is related to the total number of cores on the workers. Citus partitions an incoming query into its fragment queries which run on individual worker shards. Hence, the degree of parallelism for each query is governed by the number of shards the query hits. To ensure maximum parallelism, you should create enough shards on each node such that there is at least one shard per CPU core.
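The rule of thumb above can be tried out as follows. This is a sketch under stated assumptions: the cluster is imagined to have 4 workers with 16 cores each, so 64 shards gives roughly one shard per core; the `page_views` table is hypothetical. `citus.shard_count` is the Citus setting for the shard count of newly distributed tables, and the `citus_shards` view is assumed to be available (Citus 10 or later).

```sql
-- Illustrative value: 4 workers x 16 cores = 64 cores in total.
SET citus.shard_count = 64;

-- Hypothetical table used only for this example.
CREATE TABLE page_views (
    site_id   bigint NOT NULL,
    viewed_at timestamptz NOT NULL,
    url       text
);
SELECT create_distributed_table('page_views', 'site_id');

-- Count shards per worker node to compare against the cores on each node.
SELECT nodename, nodeport, count(*) AS shards_on_node
FROM citus_shards
WHERE table_name = 'page_views'::regclass
GROUP BY nodename, nodeport
ORDER BY nodename, nodeport;
```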
The easiest way to start is schema-based sharding, which assigns each tenant to a separate schema. Citus then automatically distributes the schemas among the nodes in your cluster and routes queries accordingly. The only change you need to make in your application is to SET search_path when switching tenants. In some cases, such as with microservices, even that change may not be necessary if every microservice uses a separate user matching its schema name.
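A minimal sketch of the schema-per-tenant pattern, assuming Citus 12 or later, where the `citus.enable_schema_based_sharding` setting makes newly created schemas distributed. The schema and table names (`tenant_a`, `tenant_b`, `orders`) are hypothetical.

```sql
-- With this setting on, schemas created afterwards become distributed schemas.
SET citus.enable_schema_based_sharding TO on;

CREATE SCHEMA tenant_a;
CREATE SCHEMA tenant_b;

-- Switch tenants by changing search_path; the SQL itself stays the same.
SET search_path TO tenant_a;
CREATE TABLE orders (order_id bigint PRIMARY KEY, total numeric);
INSERT INTO orders (order_id, total) VALUES (1, 19.99);

SET search_path TO tenant_b;
CREATE TABLE orders (order_id bigint PRIMARY KEY, total numeric);
SELECT count(*) FROM orders;  -- reads tenant_b.orders only
```

For a schema that already exists, `SELECT citus_schema_distribute('tenant_a');` is the function mentioned earlier for converting it.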
If you want the best performance, use row-based sharding with a distribution column, although that sometimes requires adjusting the schema and queries.
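The kind of schema adjustment involved can be sketched as follows, assuming a hypothetical multi-tenant model in which every table carries a `tenant_id` column. The table names are invented; `colocate_with` is a parameter of `create_distributed_table()` that asks Citus to place matching tenants' rows on the same node.

```sql
-- Hypothetical tenant tables; each one includes the distribution column.
CREATE TABLE tenants (
    tenant_id bigint PRIMARY KEY,
    name      text NOT NULL
);
CREATE TABLE invoices (
    tenant_id  bigint NOT NULL,
    invoice_id bigint NOT NULL,
    amount     numeric NOT NULL,
    PRIMARY KEY (tenant_id, invoice_id)  -- key widened to include tenant_id
);

SELECT create_distributed_table('tenants', 'tenant_id');
SELECT create_distributed_table('invoices', 'tenant_id',
                                colocate_with => 'tenants');

-- Joining on tenant_id and filtering on it keeps the query on one node.
SELECT t.name, sum(i.amount) AS total_billed
FROM tenants t
JOIN invoices i USING (tenant_id)
WHERE tenant_id = 7
GROUP BY t.name;
```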
Since Citus is deployed as a Postgres extension, Postgres users can often start using Citus by simply installing the extension on their existing database. Once the extension is created, you can create and use distributed tables through standard Postgres interfaces while maintaining compatibility with existing Postgres tools. For more information, see our Migrating to Citus guide.
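As a rough sketch of that starting point, the statements below enable Citus on an existing database and distribute a table that already holds data. The host names, port, and the `existing_table` / `tenant_id` names are placeholders, and the sketch assumes the Citus package is installed with `citus` in `shared_preload_libraries` on every node.

```sql
-- Run on the node that will act as coordinator.
CREATE EXTENSION citus;

-- Placeholder host names; replace with your own.
SELECT citus_set_coordinator_host('coordinator-host', 5432);
SELECT citus_add_node('worker-1', 5432);
SELECT citus_add_node('worker-2', 5432);

-- Distribute an existing table; its current rows are moved into shards.
SELECT create_distributed_table('existing_table', 'tenant_id');
```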
The Citus server is licensed under the GNU Affero General Public License v3.0. For additional details, including answers to common questions about the AGPL, see the FAQ from the Free Software Foundation. The client drivers are licensed under the PostgreSQL license.
With this licensing structure, we looked to accomplish the following objectives:
With a significant volume of database software delivered today as a hosted service vs. distributed in binary form, GNU AGPL became the most effective license to fulfill all of the above.
Having the client drivers under the PostgreSQL license removes any ambiguity as to the extent of the server license.