Blog posts tagged with 'Postgres' on the Citus Blog - Page 13

Introducing WAL-G by Citus: Faster Disaster Recovery for Postgres

Written byBy Craig Kerstiens | August 18, 2017Aug 18, 2017

A key part of running a reliable database service is ensuring you have a good plan for disaster recovery. Disaster recovery comes into play when disks or instances fail, and you need to be able to recover your data. In those type of cases logical backups, via pg_dump, may be days old and in such cases not ideal for you to restore from. To remove the risk of data loss, many of us turn to the Postgres WAL to keep safe.

Years ago Daniel Farina, now a principal engineer at Citus Data, authored a continuous archiving utility to make it easy for Postgres users to prepare for and recover from disasters. The tool, WAL-E, has been used to keep millions of Postgres databases safe. Today we're excited to introduce an exciting new version of this tool: WAL-G. WAL-G, the successor to WAL-E, was created by a software engineering intern here at Citus Data, Katie Li, who is an undergraduate at UC Berkeley.

Keep reading

Principles of Sharding for Relational Databases

Written byBy Ozgun Erdogan | August 9, 2017Aug 9, 2017

When your database is small (10s of GB), it's easy to throw more hardware at the problem and scale up. As these tables grows however, you need to think about other ways to scale your database.

In one way, sharding is the best way to scale. Sharding enables you to linearly scale your database’s cpu, memory, and disk resources by separating your database into smaller parts. In other ways, sharding is a controversial topic. The internet is full of advice on sharding, from "essential to scaling your database infrastructure" to "why you never want to shard". So the question is, whose advice should you take?

Keep reading

Database Table Types with Citus and Postgres

Written byBy Craig Kerstiens | July 27, 2017Jul 27, 2017

Citus is Postgres that scales out horizontally. We do this by distributing queries across multiple Postgres servers—and as is often the case with scale-out architectures, this scale-out approach provides some great performance gains. And because Citus is an extension to Postgres, you get all the awesome features in Postgres such as support for JSONB, full-text search, PostGIS, and more.

The distributed nature of the Citus extension gives you new flexibility when it comes to modeling your data. This is good. But you’ll need to think about how to model your data and what type of database tables to use. The way you query your data ultimately determines how you can best model each table. In this post, we'll dive into the three different types of database tables in Citus and how you should think about each.

Keep reading

Customizing My Postgres Shell

Written byBy Craig Kerstiens | July 16, 2017Jul 16, 2017

As a developer your CLI is your home. You spend a lifetime of person-years in your shell and even small optimizations can pay major dividends to your efficiency. For anyone that works with Postgres and likely the psql editor, you should consider investing some love and care into psql. A little known fact is that psql has a number of options you can configure it with, and these configuration options can all live within an rc file called psqlrc in your home directory. Here is my .psqlrc file, which I've customized to my liking. Let’s walk through some of the commands within my .psqlrc file:

Keep reading

Efficient rollup tables with HyperLogLog in Postgres

Written byBy Burak Yucesoy | June 30, 2017Jun 30, 2017

HyperLogLog is an awesome approximation algorithm that addresses the distinct count problem. I am a big fan of HyperLogLog (HLL), so much so that I already wrote about the internals and how HLL solves the distributed distinct count problem. But there’s more to talk about, including HLL with rollup tables.

Rollup Tables and Postgres

Rollup tables are commonly used in Postgres when you don’t need to perform detailed analysis, but you still need to answer basic aggregation queries on older data.

With rollup tables, you can pre-aggregate your older data for the queries you still need to answer. Then you no longer need to store all of the older data, rather, you can delete the older data or roll it off to slower storage—saving space and computing power.

Let’s walk through a rollup table example in Postgres without using HLL.

Keep reading

Scaling Connections in Postgres

Written byBy Craig Kerstiens | May 10, 2017May 10, 2017

There are a number of applications out there that have a high number of connections to Postgres. What's high? That all depends on your application, but generally when you get to the few hundred connection area in Postgres you're in the higher end. Anything in the thousands is definitely in the high territory, and even several hundred can put strain on your application. Generally a safe level for connections should be somewhere around 300-500 connections. This may seem low if you're already running with thousands of connections, but it's likely perfectly fine with pgBouncer taking care of the heavy lifting for you. Let's drill into why a bit further.

Keep reading

Postgres tips for Rails developers

Written byBy Lukas Fittl | April 28, 2017Apr 28, 2017

This week at RailsConf, we found ourselves sharing a lot of tips for using PostgreSQL with Rails. We thought it might be worthwhile to write up many of these and share more broadly. Here you’ll find some tips that will help you in debugging and improving performance of your database from your Rails app.

And now, on to the code.

Keep reading

Update: This post was originally published in January 2013. We updated the post in 2017 to include the appropriate SQL commands you can use with Citus. The Citus extension to Postgres is open source and available to try via download or in the cloud via the Azure Cosmos DB for PostgreSQL managed service.

PostgreSQL's Full Text Search capability is an excellent example of a powerful, relatively new feature that Citus users are able to leverage. First introduced in PostgreSQL 8.3 and continuously improved since then, Full Text Search (FTS) provides the SQL semantics necessary to run keyword searches over a corpus of documents stored in a database. FTS executes these operations efficiently by enabling the use of GIN and GiST indexes . In addition, from our standpoint, one of the best features of FTS is that Citus makes it linearly scalable. Specifically, with Citus one can double the size of the search dataset and still satisfy the same SLAs simply by doubling the number of machines in the Citus cluster.

In this blog post we wanted to demonstrate using Full Text Search with Citus, but in order to do so we first needed to find an interesting dataset to use in our examples. After hunting around we eventually hit on the idea of using email archives from the PostgreSQL mailing lists. Besides having a nicely self-reflexive quality about it, we also thought this would provide a good opportunity to learn more about the PostgreSQL community. Early in the history of PostgreSQL project, these lists became the primary communication mechanism for both developers and users, and the collected archives in turn provide a unique opportunity to get a complete view of almost everything that has happened in the project over the past fifteen years.

Keep reading

Distributed count(distinct) with HyperLogLog on Postgres

Written byBy Burak Yucesoy | April 4, 2017Apr 4, 2017

Running SELECT COUNT(DISTINCT) on your database is all too common. In applications it's typical to have some analytics dashboard highlighting the number of unique items such as unique users, unique products, unique visits. While traditional SELECT COUNT(DISTINCT) queries works well in single machine setups, it is a difficult problem to solve in distributed systems. When you have this type of query, you can't just push query to the workers and add up results, because most likely there will be overlapping records in different workers. Instead you can do:

Pull all distinct data to one machine and count there. (Doesn't scale)
Do a map/reduce. (Scales but it's very slow)

This is where approximation algorithms or sketches come in. Sketches are probabilistic algorithms which can generate approximate results efficiently within mathematically provable error bounds. There are a many of them out there, but today we're just going to focus on one, HyperLogLog or HLL. HLL is very successfull for estimating unique number of elements in a list. First we'll look some at the internals of the HLL to help us understand why HLL algorithm is useful to solve distict count problem in a scalable way, then how it can be applied in a distributed fashion. Then we will see some examples of HLL usage.

Keep reading

How to Scale PostgreSQL on AWS–Learnings from Citus Cloud

Written byBy Ozgun Erdogan | March 10, 2017Mar 10, 2017

Citus is a distributed database that extends (not forks) PostgreSQL for large workloads. One challenge associated with building a distributed relational database (RDBMS) is that they require notable effort to deploy and operate. To remove these operational barriers, we’ve been thinking about offering Citus as a managed database for a while now.

Naturally, we were also worried that providing a native database offering on AWS could split our startup’s focus and take up significant engineering resources. (Honestly, if the founding engineers of the Heroku Postgres team didn’t join Citus, we might have decided to wait on this.) After having Citus Cloud publicly available for eight months though, we are now more bullish on the cloud then ever.

It turns out that targeting an important use case for your customers and delivering it to them in a way that removes their pain points, matters more than anything else. In this blog post, we’ll only focus on removing operational pain points and not on use cases: Why is cloud changing the way databases are delivered to customers? What AWS technologies Citus Cloud is using to enable that in a unique way?

Keep reading