Citus BlogCitus Blog

Thoughts about the Citus database—as well as PostgreSQL, sharding, distributed databases, and other open source extensions to Postgres.

Craig Kerstiens

A tour of Postgres Index Types

Written byBy Craig Kerstiens | October 17, 2017Oct 17, 2017

At Citus we spend a lot of time working with customers on data modeling, optimizing queries, and adding indexes to make things snappy. My goal is to be as available for our customers as we need to be, in order to make you successful. Part of that is keeping your Citus cluster well tuned and performant which we take care of for you. Another part is helping you with everything you need to know about Postgres and Citus. After all a healthy and performant database means a fast performing app and who wouldn’t want that. Today we’re going to condense some of the information we’ve shared directly with customers about Postgres indexes.

Postgres has a number of index types, and with each new release seems to come with another new index type. Each of these indexes can be useful, but which one to use depends on 1. the data type and then sometimes 2. the underlying data within the table, and 3. the types of lookups performed. In what follows we'll look at a quick survey of the index types available to you in Postgres and when you should leverage each. Before we dig in, here’s a quick glimpse of the indexes we’ll walk you through:

  • B-Tree
  • Generalized Inverted Index (GIN)
  • Generalized Inverted Seach Tree (GiST)
  • Space partitioned GiST (SP-GiST)
  • Block Range Indexes (BRIN)
  • Hash

Now onto the indexing

Keep reading
Craig Kerstiens

Indexing all the things in Postgres

Written byBy Craig Kerstiens | October 11, 2017Oct 11, 2017

Postgres indexes make your application fast. And while one option is to analyze each of your relational database queries with pg_stat_statements to see where you should add indexes... an alternative fix (and a quick one at that) could be to add indexes to each and every database table—and every column—within your database. To make this easy for you, here's a query you can run that will create the CREATE INDEX commands for every table and column in your Postgres database.

Disclaimer: Adding an index to every column of every table is not a best practice for production databases. And it’s certainly not something we recommend at Citus Data, where we take scaling out Postgres and database performance very seriously. But indexing all the things is a fun “what if” thought exercise that is well worth thinking about, to understand the value of indexes in Postgres. What if you indexed all the things?

With that disclaimer out of the way, onto my advice of how to index all the things as a way to understand your database’s performance.

Keep reading
Samay Sharma

Fermi Estimates On Postgres Performance

Written byBy Samay Sharma | September 29, 2017Sep 29, 2017

A lot of people look to Citus for a solution that scales out their Postgres database, whether on-prem or as open source or in the cloud, as a fully-managed database as a service. And yet, a common question even before looking at Citus is: "what kind of performance can I get with Postgres?" The answer is: it depends. The performance you can expect from single node Postgres comes down to your workload, both on inserts and on the query side and how large that single node is. Unfortunately, "it depends" often leaves people a bit dissatisfied.

Fortunately, there are some fermi estimates, or in laymans terms ballpark, of what performance single node Postgres can deliver. These ballparks apply both to single-node Postgres, but from there you can start to get estimates of how much further you can go when scaling out with Citus. Let's walk through a simplified guide for what you should expect in terms of the read performance and ingest performance for queries in Postgres.

Keep reading
Marco Slot

Podyn: DynamoDB to PostgreSQL replication and migration tool

Written byBy Marco Slot | September 22, 2017Sep 22, 2017

Wouldn’t it be great if you could run SQL queries on your data in DynamoDB? While this isn’t possible directly, there is an alternative: With Podyn, you can automatically replicate the schema, data, and changes in your DynamoDB tables to Postgres. Once your data is flowing into Postgres, you can start using a wide array of features including views, indexes, rollup tables, and advanced SQL queries. And if you chose Dynamo because you needed scale beyond what you thought a single Postgres instance could provide, you can replicate it into Citus.

Keep reading
Craig Kerstiens

Migrating from single-node Postgres to Citus

Written byBy Craig Kerstiens | September 20, 2017Sep 20, 2017

There are a lot of things that are everyday occurrences for engineering teams. Deploying new code, deploying a new service, it's even fairly common to deploy a net new data store or language. But migrating from one database to another is far more rare. While migrating your database can seem like a daunting task, there are lessons you can learn from others—and steps you can take to minimize risk in migrating from one database to another.

At Citus Data, we've helped many a customer migrate from single node Postgres, like RDS or Heroku Postgres, to a distributed Citus database cluster, so they can scale out and take advantage of the compute, memory, and disk resources of a distributed, scale-out solution. So we've been privy to some valuable lessons learned, and we’ve developed some best practices. Here you can find your guide for steps to follow as you start to create your migration plan to Citus.

Keep reading
Craig Kerstiens

How Citus works (a look at dynamic executors)

Written byBy Craig Kerstiens | September 15, 2017Sep 15, 2017

In the beginning there was Postgres

We love Postgres at Citus. And rather than create a newfangled database from scratch, we implemented Citus as an extension to Postgres. We've talked a lot on our blog here about you can leverage Citus, about key use cases, and different data models and sharding approaches. But we haven’t spent a lot of time explaining how Citus works. So if you want to dive deeper into how Citus works, here we're going to walk through how Citus shards the data all the way through to how the executors run queries.

Distributing data within Citus

Citus gets its benefits from sharding your data which allows us to split the data across multiple physical nodes. When your tables are significantly smaller due to sharding your indexes are smaller, vacuum runs faster, everything works like it did when your database was smaller and easier to manage.

Keep reading

"Thirty years ago, my older brother was trying to get a report on birds written that he'd had three months to write. It was due the next day.

We were out at our family cabin in Bolinas, and he was at the kitchen table close to tears, surrounded by binder paper and pencils and unopened books on birds, immobilized by the hugeness of the task ahead. Then my father sat down beside him, put his arm around my brother's shoulder, and said, 'Bird by bird, buddy. Just take it bird by bird.'"

Bird by Bird: Some Instructions on Writing and Life, by Anne LaMott

When we started working on Citus, our vision was to combine the power of relational databases with the elastic scale of NoSQL. To do this, we took a different approach. Instead of building a new database from scratch, we leveraged PostgreSQL’s new extension APIs. This way, Citus would make Postgres a distributed database and integrate with the rich ecosystem of tools you already use.

When PostgreSQL is involved, executing on this vision isn’t a simple task. The PostgreSQL manual offers 3,558 pages of features built over two decades. The tools built around Postgres use and combine these features in unimaginable ways.

After our Citus open source announcement, we talked to many of you about scaling out your relational database. In every conversation, we’d hear about different Postgres features that needed to scale out of the box. We’d take notes from our meeting and add these features into an internal document. The list would keep getting longer, and longer, and longer.

Like the child writing a report on birds, the task ahead felt insurmountable. So how do you take a solid relational database and make sure that all those complex features scale? You take it bird by bird. We broke down the problem of scaling into five hundred smaller ones and started implementing these features one by one.

Keep reading
Marco Slot

Databases and Distributed Deadlocks: A FAQ

Written byBy Marco Slot | August 31, 2017Aug 31, 2017

Since Citus is a distributed database, we often hear questions about distributed transactions. Specifically, people ask us about transactions that modify data living on different machines.

So we started to work on distributed transactions. We then identified distributed deadlock detection as the building block to enable distributed transactions in Citus.

First some background: At Citus we focus on scaling out Postgres. We want to make Postgres performance & Postgres scale something you never have to worry about. We even have a cloud offering, a fully-managed database as a service, to make Citus even more worry-free. We carry the pager so you don’t have to and all that. And because we've built Citus using the PostgreSQL extension APIs, Citus stays in sync with all the latest Postgres innovations as they are released (aka we are not a fork.) Yes, we're excited for Postgres 10 like all the rest of you :)

Back to distributed deadlocks: As we began working on distributed deadlock detection, we realized that we needed to clarify certain concepts. So we created a simple FAQ for the Citus development team. And we found ourselves referring back to the FAQ over and over again. So we decided to share it here on our blog, in the hopes you find it useful.

Keep reading
Craig Kerstiens

Five sharding data models and which is right

Written byBy Craig Kerstiens | August 28, 2017Aug 28, 2017

When it comes to scaling your database, there are challenges but the good news is that you have options. The easiest option of course is to scale up your hardware. And when you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices.

Deleting portions of your data is simple, if you can afford to do it. Regarding sharding there are a number of approaches and which one is right depends on a number of factors. Here we'll review a survey of five sharding approaches and dig into what factors guide you to each approach.

Keep reading

A key part of running a reliable database service is ensuring you have a good plan for disaster recovery. Disaster recovery comes into play when disks or instances fail, and you need to be able to recover your data. In those type of cases logical backups, via pg_dump, may be days old and in such cases not ideal for you to restore from. To remove the risk of data loss, many of us turn to the Postgres WAL to keep safe.

Years ago Daniel Farina, now a principal engineer at Citus Data, authored a continuous archiving utility to make it easy for Postgres users to prepare for and recover from disasters. The tool, WAL-E, has been used to keep millions of Postgres databases safe. Today we're excited to introduce an exciting new version of this tool: WAL-G. WAL-G, the successor to WAL-E, was created by a software engineering intern here at Citus Data, Katie Li, who is an undergraduate at UC Berkeley.

Keep reading

Page 20 of 32