Citus Blog

Articles tagged: Citus

Citus' Replication Model: Today and Tomorrow

Written by Ozgun Erdogan | December 15, 2016

Citus is a distributed database that extends (not forks) PostgreSQL. Citus does this by transparently sharding database tables across the cluster and replicating those shards.

After open sourcing Citus, one question we frequently heard from users was how Citus replicates data and automates node failovers. In this blog post, we cover the two replication models available in Citus: statement-based and streaming replication. We also describe how these models evolved over time for different use cases.

Real-time event aggregation at scale using Postgres w/ Citus

Written by Marco Slot | November 29, 2016

Citus is commonly used to scale out event data pipelines on top of PostgreSQL. Its ability to transparently shard data and parallelise queries over many machines makes it possible to have real-time responsiveness even with terabytes of data. Users with very high data volumes often store pre-aggregated data to avoid the cost of processing raw data at run-time. With Citus 6.0 this type of workflow became even easier using a new feature that enables pre-aggregation inside the database in a massively parallel fashion using standard SQL. For large datasets, querying pre-computed aggregation tables can be orders of magnitude faster than querying the facts table on demand.
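
As a rough illustration of the pre-aggregation idea described above, here is a minimal sketch that rolls raw events up into an hourly table with a plain `INSERT ... SELECT`. It uses Python's built-in `sqlite3` module as a stand-in for a Postgres/Citus cluster, so it shows only the shape of the SQL, not the parallel execution. The table and column names (`events`, `events_hourly`, `site_id`, `views`) are hypothetical and do not come from the original post.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in; a real deployment
# would run similar SQL against Postgres with Citus.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw events table and its pre-aggregated counterpart.
    CREATE TABLE events (site_id INTEGER, minute TEXT, views INTEGER);
    CREATE TABLE events_hourly (site_id INTEGER, hour TEXT, views INTEGER);

    INSERT INTO events VALUES
        (1, '2016-11-29 10:01', 3),
        (1, '2016-11-29 10:59', 4),
        (1, '2016-11-29 11:05', 5),
        (2, '2016-11-29 10:30', 7);

    -- Roll raw events up to one row per site per hour.
    -- substr(minute, 1, 13) keeps 'YYYY-MM-DD HH'.
    INSERT INTO events_hourly (site_id, hour, views)
    SELECT site_id, substr(minute, 1, 13), sum(views)
    FROM events
    GROUP BY site_id, substr(minute, 1, 13);
""")

# Dashboards would then read the small rollup table instead of raw events.
hourly = conn.execute(
    "SELECT site_id, hour, views FROM events_hourly ORDER BY site_id, hour"
).fetchall()
print(hourly)
```

Queries against `events_hourly` touch one row per site per hour rather than every raw event, which is where the speedup from pre-computed aggregation tables comes from.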

Citus 6.0 allows you to scale out your transactional relational database with minimal changes to your application, reducing complexity compared with the alternatives while still allowing you to scale. If you're building a multi-tenant application and outgrow a single-node Postgres, sharding by tenant with Citus 6.0 lets you linearly add more memory and processing power to your database without a large re-architecting of your application. You can still maintain referential integrity, and to your application it's still just standard Postgres.

How Distributed Outer Joins on PostgreSQL with Citus Work

Written by Eren Basak | October 10, 2016

SQL is a very powerful language for analyzing and reporting against data. At the core of SQL is the idea of joins and how you combine various tables together. One type of join, the outer join, is useful when we need to retain rows even if they have no match on the other side.

While the most common type of join, the inner join, against tables A and B brings back only the tuples that have a match in both A and B, outer joins let us bring back, say, all rows of table A even if they don't have a corresponding match in table B. For example, let's say you keep customers in one table and purchases in another table. When you want to see all purchases by customers, you may want to see every customer in the result even if they have not made any purchases yet. Then you need an outer join. In this post we'll look at what outer joins are, and then at how we support them in a distributed fashion on Citus.
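
To make the customers and purchases example concrete, here is a minimal sketch comparing an inner join with a left outer join. It runs on a single node using Python's built-in `sqlite3` module, so it illustrates only the join semantics, not how Citus distributes the query. The schema and sample rows are hypothetical.

```python
import sqlite3

# Single-node, in-memory stand-in; the schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);

    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    -- carol has no purchases yet.
    INSERT INTO purchases VALUES (1, 1, 'book'), (2, 1, 'pen'), (3, 2, 'lamp');
""")

# Inner join: only customers with at least one matching purchase appear.
inner_rows = conn.execute("""
    SELECT c.name, p.item
    FROM customers c
    JOIN purchases p ON p.customer_id = c.id
    ORDER BY c.id, p.id
""").fetchall()

# Left outer join: every customer appears; unmatched rows get NULL (None).
outer_rows = conn.execute("""
    SELECT c.name, p.item
    FROM customers c
    LEFT OUTER JOIN purchases p ON p.customer_id = c.id
    ORDER BY c.id, p.id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

The inner join drops `carol` entirely, while the outer join keeps her with `None` in the `item` column, which is the "retain rows even without a match" behavior described above.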

Designing your SaaS Database for Scale with Postgres

Written by Ozgun Erdogan | October 3, 2016

If you’re building a SaaS application, you probably already have the notion of tenancy built in your data model. Typically, most information relates to tenants / customers / accounts and your database tables capture this natural relation.

With smaller amounts of data (tens of GB), it's easy to throw more hardware at the problem and scale up your database. As these tables grow, however, you need to think about ways to scale your multi-tenant database across dozens or hundreds of machines.

After our blog post on sharding a multi-tenant app with Postgres, we received a number of questions on architectural patterns for multi-tenant databases and when to use which. At a high level, developers have three options:

Announcing Citus 5.2

Written by Craig Kerstiens | August 19, 2016

For years we've been focused on making Citus the best solution for scaling out your database. We've seen customers attain up to 100x performance when compared on the same hardware to vanilla Postgres. Of course you don't always need to scale out to get good performance–if you have 10 GB of data a single node Postgres can work great. But at data sizes of 100 GB and up, the need to scale out may exist.

Today, with the release of Citus 5.2, it's easier to get started early, so you don't have to worry about the moment when you can no longer scale up.

Sharding a multi-tenant app with Postgres

Written by Craig Kerstiens | August 10, 2016

Whether you’re building marketing analytics, a portal for e-commerce sites, or an application to cater to schools, if you’re building an application and your customer is another business then a multi-tenant approach is the norm. The same code runs for all customers, but each customer sees their own private data set, except in some cases of holistic internal reporting.

Early in your application's life, customer data has a simple structure that evolves organically. Typically all information relates to a central customer/user/tenant table. With a smaller amount of data (tens of GB), it's easy to scale the application by throwing more hardware at it, but what happens when you've had enough success that your data no longer fits in memory on a single box, or you need more concurrency? You scale out, often painfully.

If you're looking at Citus, it's likely you've outgrown a single-node database. In most cases your application is no longer performing as you'd like. In cases where your data is still under 100 GB, a single Postgres instance will still work well for you and is a great choice. Beyond that, Citus can help, but how you model your data has a major impact on how much performance you're able to get out of the system.
