Citus BlogCitus Blog

Thoughts about the Citus database—as well as PostgreSQL, sharding, distributed databases, and other open source extensions to Postgres.

Craig Kerstiens

PGConf EU: HyperLogLog, Eclipse, and Distributed Postgres

Written byBy Craig Kerstiens | December 11, 2017Dec 11, 2017

We're big fans of Postgres and enjoy getting around to the various community conferences to give talks on relevant topics as well as learn from others. A few months ago we had a good number of Citus team members over at the largest Postgres conference in Europe. Additionally, three of our Citus team members gave talks at the conference. We thought for those of you that couldn't make the conference you might still enjoy getting a glimpse of some of the content. You can browse the full set of talks that were given and slides for them on the PGConf EU website or flip through the presentations from members of the Citus team below.

Keep reading
Lukas Fittl

Citus warp: Database migrations without the pain

Written byBy Lukas Fittl | December 8, 2017Dec 8, 2017

We rolled out a new database migration feature for the Citus fully-managed database as a service—the Warp migration feature—as part of our Citus Cloud 2 announcement. Today I wanted to walk through Citus Cloud’s warp migration feature in more detail. Before we drill in, we should probably take a step back and look at what typically (and sometimes painfully) goes on for a large database migration.

Keep reading

So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service–Citus Cloud—and about our newly-added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.

If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:

  • Distributed transaction support
  • Zero-downtime shard rebalancer
  • Window function enhancements
  • Distinct ON/count(distinct) enhancements
  • Additional SQL enhancements
  • Checking for new software updates

Keep reading
Ozgun Erdogan

How Citus Executes Distributed Transactions on Postgres

Written byBy Ozgun Erdogan | November 22, 2017Nov 22, 2017

Distributed transactions are one of the meanest, baddest problems in relational databases. With the release of Citus 7.1, distributed transactions are now available to all our users. In this article, we are going to describe how we built distributed transaction support into Citus by using PostgreSQL modules. But first, let’s give an overview of what a distributed transaction is.

(If this sounds familiar, that’s because we first announced distributed transactions as part of last week’s Citus Cloud 2 announcement. The Citus Cloud announcement centered on other new useful capabilities —such as our warp feature to streamline migrations from single-node Postgres deployments to Citus Cloud — but it seems worthwhile to dedicate an entire post to distributed transactions.)

Keep reading
Claire Giordano

A Flywheel for SaaS Databases ft. Postgres and Citus

Written byBy Claire Giordano | November 21, 2017Nov 21, 2017

What happens to your database when your SaaS application is an overnight success? When your customer base has grown bigger than you’d always hoped but it all happened so fast and now what do you do? What happens when your SaaS application needs to scale, but your database is getting more and more sluggish?

Until recently, the conventional wisdom for SaaS startups is that you should launch your application on top of an open source relational database, like Postgres or MySQL. And that once you become an overnight success, you will need to slog your way through a period of fail whales while you re-architect your app to go to NoSQL. Or while you re-architect your app to shard at the application layer. Ick.

Keep reading

Relational databases have been a mainstay in applications for decades now. And with their dominance has come a rich set of tools: you have tools to help with monitoring, to gain insight into performance, and to operate the database in safe ways.

The knock against relational databases has long been: what happens when you need to scale? At that point, you usually have to make difficult trade-offs, like having to trade off relational semantics in order to get a database that scales out (think: NoSQL.) Or having to find a way to reduce the amount of data you need to retain, in order to continue to skate by with a single-node relational database. Or having to trade off as much as a year’s worth of application features in order to divert an engineering team away from your core business, to instead spend their time sharding the application. Bottom line: the database trade-offs to get scale can be painful.

Today I’m excited to announce Citus Cloud 2, the newest version of our cloud database. We launched the first release of Citus Cloud 18 months ago as a fully-managed database as a service that enables you to keep right on scaling your relational database. If you’re unfamiliar, Citus is an extension to Postgres that transforms your Postgres database into a distributed system under the covers, while appearing to your application as a single-node database. With Citus, you don’t need to teach your application all about distributed systems to continue scaling. We make Citus available as open source, as on-prem enterprise software, and as a fully-managed database as a service, Citus Cloud. And with Citus Cloud, you have all of your management/backups/etc. taken care of for you.

Keep reading
Sai Krishna Srirampur

Scaling out your Django Multi-tenant App

Written byBy Sai Srirampur | November 14, 2017Nov 14, 2017

There are a number of data architectures you could use when building a multi-tenant app. Some, such as using one database per customer or one schema per customer, have trade-offs when it comes to larger scale. The other option is to build the notion of tenancy directly into the logic of your SaaS application. With django-multitenant and Citus, built-in tenancy becomes much easier to put in place for your application without having to re-invent the wheel yourself.

Our django-multitenant Python library, enables easy scale out of applications that are built on top of Django and follow a multi tenant data model. This Python library has evolved from our experience working with SaaS customers, scaling out their multi-tenant apps.

Keep reading
Craig Kerstiens

Faster bulk loading in Postgres with copy

Written byBy Craig Kerstiens | November 8, 2017Nov 8, 2017

If you've used a relational database, you understand basic INSERT statements. Even if you come from a NoSQL background, you likely grok inserts. Within the Postgres world, there is a utility that is useful for fast bulk ingestion: \copy. Postgres \copy is a mechanism for you to bulk load data in or out of Postgres.

First, lets pause. Do you even need to bulk load data and what's it have to do with Citus? We see customers leverage Citus for a number of different uses. When looking at Citus for a transactional workload, say as the system of record for some kind of multi-tenant SaaS app, your app is mostly performing standard insert/updates/deletes.

But when you're leveraging Citus for real-time analytics, you may already have a separate ingest pipeline. In this case you might be looking at event data, which can be higher in volume than a standard web app workload. If you already have an ingest pipeline that reads off Apache Kafka or Kinesis, you could be a great candidate for bulk ingest.

Back to our feature presentation: Postgres \copy. Copy is interesting because you can achieve much higher throughput than with single row inserts.

Keep reading
Jason Petersen

What it means to be a Postgres extension

Written byBy Jason Petersen | October 25, 2017Oct 25, 2017

Nearly 18 months ago, we open sourced our Citus distributed database and "unforked it" from PostgreSQL by refactoring Citus into a PostgreSQL extension. Seasoned PostgreSQL users likely already know of and use popular PostgreSQL extensions, such as hstore, PostGIS, and pg_stat_statements; however, we realized some of you might appreciate a recap of our journey from fork to extension and beyond.

Keep reading
Craig Kerstiens

Monitoring your bloat in Postgres

Written byBy Craig Kerstiens | October 20, 2017Oct 20, 2017

Postgres under the covers in simplified terms is one giant append only log. When you insert a new record that gets appended, but the same happens for deletes and updates. For a delete a record is just flagged as invisible, it's not actually removed from disk immediately. For updates the old record is flagged as invisible and a new record is written. What then later happens is Postgres comes through and looks at all records that are invisible and actually frees up the disk storage. This process is known as vacuum.

There are a couple of key levels to VACUUM within Postgres:

  • VACUUM ANALYZE - This one is commonly run when you've recently loaded data into your database and want Postgres to update it's statistics about the data
  • VACUUM FULL - This will take a lock during the operation, but will scan the full table and reclaim all the space it can from dead tuples.
Keep reading

Page 19 of 32