Citus Blog

Articles tagged: Postgres

Ozgun Erdogan

Principles of Sharding for Relational Databases

Written byBy Ozgun Erdogan | August 9, 2017Aug 9, 2017

When your database is small (10s of GB), it's easy to throw more hardware at the problem and scale up. As these tables grows however, you need to think about other ways to scale your database.

In one way, sharding is the best way to scale. Sharding enables you to linearly scale your database’s cpu, memory, and disk resources by separating your database into smaller parts. In other ways, sharding is a controversial topic. The internet is full of advice on sharding, from "essential to scaling your database infrastructure" to "why you never want to shard". So the question is, whose advice should you take?

Keep reading
Craig Kerstiens

Database Table Types with Citus and Postgres

Written byBy Craig Kerstiens | July 27, 2017Jul 27, 2017

Citus is Postgres that scales out horizontally. We do this by distributing queries across multiple Postgres servers—and as is often the case with scale-out architectures, this scale-out approach provides some great performance gains. And because Citus is an extension to Postgres, you get all the awesome features in Postgres such as support for JSONB, full-text search, PostGIS, and more.

The distributed nature of the Citus extension gives you new flexibility when it comes to modeling your data. This is good. But you’ll need to think about how to model your data and what type of database tables to use. The way you query your data ultimately determines how you can best model each table. In this post, we'll dive into the three different types of database tables in Citus and how you should think about each.

Keep reading
Craig Kerstiens

Customizing My Postgres Shell

Written byBy Craig Kerstiens | July 16, 2017Jul 16, 2017

As a developer your CLI is your home. You spend a lifetime of person-years in your shell and even small optimizations can pay major dividends to your efficiency. For anyone that works with Postgres and likely the psql editor, you should consider investing some love and care into psql. A little known fact is that psql has a number of options you can configure it with, and these configuration options can all live within an rc file called psqlrc in your home directory. Here is my .psqlrc file, which I've customized to my liking. Let’s walk through some of the commands within my .psqlrc file:

Keep reading
Burak Yucesoy

Efficient rollup tables with HyperLogLog in Postgres

Written byBy Burak Yucesoy | June 30, 2017Jun 30, 2017

HyperLogLog is an awesome approximation algorithm that addresses the distinct count problem. I am a big fan of HyperLogLog (HLL), so much so that I already wrote about the internals and how HLL solves the distributed distinct count problem. But there’s more to talk about, including HLL with rollup tables.

Rollup Tables and Postgres

Rollup tables are commonly used in Postgres when you don’t need to perform detailed analysis, but you still need to answer basic aggregation queries on older data.

With rollup tables, you can pre-aggregate your older data for the queries you still need to answer. Then you no longer need to store all of the older data, rather, you can delete the older data or roll it off to slower storage—saving space and computing power.

Let’s walk through a rollup table example in Postgres without using HLL.

Keep reading
Craig Kerstiens

Scaling Connections in Postgres

Written byBy Craig Kerstiens | May 10, 2017May 10, 2017

There are a number of applications out there that have a high number of connections to Postgres. What's high? That all depends on your application, but generally when you get to the few hundred connection area in Postgres you're in the higher end. Anything in the thousands is definitely in the high territory, and even several hundred can put strain on your application. Generally a safe level for connections should be somewhere around 300-500 connections. This may seem low if you're already running with thousands of connections, but it's likely perfectly fine with pgBouncer taking care of the heavy lifting for you. Let's drill into why a bit further.

Keep reading
Lukas Fittl

Postgres tips for Rails developers

Written byBy Lukas Fittl | April 28, 2017Apr 28, 2017

This week at RailsConf, we found ourselves sharing a lot of tips for using PostgreSQL with Rails. We thought it might be worthwhile to write up many of these and share more broadly. Here you’ll find some tips that will help you in debugging and improving performance of your database from your Rails app.

And now, on to the code.

Keep reading

Update: This post was originally published in January 2013. We updated the post in 2017 to include the appropriate SQL commands you can use with Citus. The Citus extension to Postgres is open source and available to try via download or in the cloud via the Azure Cosmos DB for PostgreSQL managed service.

PostgreSQL's Full Text Search capability is an excellent example of a powerful, relatively new feature that Citus users are able to leverage. First introduced in PostgreSQL 8.3 and continuously improved since then, Full Text Search (FTS) provides the SQL semantics necessary to run keyword searches over a corpus of documents stored in a database. FTS executes these operations efficiently by enabling the use of GIN and GiST indexes . In addition, from our standpoint, one of the best features of FTS is that Citus makes it linearly scalable. Specifically, with Citus one can double the size of the search dataset and still satisfy the same SLAs simply by doubling the number of machines in the Citus cluster.

In this blog post we wanted to demonstrate using Full Text Search with Citus, but in order to do so we first needed to find an interesting dataset to use in our examples. After hunting around we eventually hit on the idea of using email archives from the PostgreSQL mailing lists. Besides having a nicely self-reflexive quality about it, we also thought this would provide a good opportunity to learn more about the PostgreSQL community. Early in the history of PostgreSQL project, these lists became the primary communication mechanism for both developers and users, and the collected archives in turn provide a unique opportunity to get a complete view of almost everything that has happened in the project over the past fifteen years.

Keep reading

Running SELECT COUNT(DISTINCT) on your database is all too common. In applications it's typical to have some analytics dashboard highlighting the number of unique items such as unique users, unique products, unique visits. While traditional SELECT COUNT(DISTINCT) queries works well in single machine setups, it is a difficult problem to solve in distributed systems. When you have this type of query, you can't just push query to the workers and add up results, because most likely there will be overlapping records in different workers. Instead you can do:

  • Pull all distinct data to one machine and count there. (Doesn't scale)
  • Do a map/reduce. (Scales but it's very slow)

This is where approximation algorithms or sketches come in. Sketches are probabilistic algorithms which can generate approximate results efficiently within mathematically provable error bounds. There are a many of them out there, but today we're just going to focus on one, HyperLogLog or HLL. HLL is very successfull for estimating unique number of elements in a list. First we'll look some at the internals of the HLL to help us understand why HLL algorithm is useful to solve distict count problem in a scalable way, then how it can be applied in a distributed fashion. Then we will see some examples of HLL usage.

Keep reading

Citus is a distributed database that extends (not forks) PostgreSQL for large workloads. One challenge associated with building a distributed relational database (RDBMS) is that they require notable effort to deploy and operate. To remove these operational barriers, we’ve been thinking about offering Citus as a managed database for a while now.

Naturally, we were also worried that providing a native database offering on AWS could split our startup’s focus and take up significant engineering resources. (Honestly, if the founding engineers of the Heroku Postgres team didn’t join Citus, we might have decided to wait on this.) After having Citus Cloud publicly available for eight months though, we are now more bullish on the cloud then ever.

It turns out that targeting an important use case for your customers and delivering it to them in a way that removes their pain points, matters more than anything else. In this blog post, we’ll only focus on removing operational pain points and not on use cases: Why is cloud changing the way databases are delivered to customers? What AWS technologies Citus Cloud is using to enable that in a unique way?

Keep reading
Lukas Fittl

Scale Out Multi-Tenant Apps based on Ruby on Rails

Written byBy Lukas Fittl | January 5, 2017Jan 5, 2017

Today we’re happy to announce our new activerecord-multi-tenant Ruby library, which enables easy scale-out of applications that are built on top of Ruby on Rails and follow a multi-tenant data model.

This Ruby library has evolved from our experience working with customers, scaling out their multi-tenant apps, and patching some restrictions that ActiveRecord and Rails currently have when it comes to automatic query building. It is based on the excellent acts_as_tenant library, and extends it for the particular use-case of a distributed multi-tenant database like Citus.

Keep reading

Page 13 of 15