Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want the comfort of knowing you can go back? Well, as of Citus 9.5, there is a new undistribute_table() function that makes it easy to revert a distributed table back to being a regular Postgres table.
If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you're running Citus open source or Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.
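For example, here is a minimal sketch of the round trip, assuming a hypothetical orders table sharded on a customer_id column:

```sql
-- Distribute a regular Postgres table across the Citus cluster,
-- sharding it on the customer_id column.
SELECT create_distributed_table('orders', 'customer_id');

-- Changed your mind? Move the data back into a regular,
-- single-node Postgres table.
SELECT undistribute_table('orders');
```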
Written by Thomas Munro | December 12, 2020
[UPDATE in Sep 2021]: This blog post was originally written during the PostgreSQL 14 development cycle. The feature discussed is now a candidate for PostgreSQL 15 and the text has been updated to reflect this.
As part of my work on the open source PostgreSQL team at Microsoft, I've been developing a new feature for PostgreSQL to track dependencies on collation versions, with help from co-author Julien Rouhaud and many others who have contributed ideas. It's taken a long time to build a consensus on how to tackle this thorny problem (work I began at EnterpriseDB and continued at Microsoft), and you can read about some of the details and considerations in the commit message below and the referenced discussion thread. We're not quite done with that yet. It was originally planned for PostgreSQL 14, but some unhandled complications arose, so this project is back in the workshop.
commit 257836a75585934cc05ed7a80bccf8190d41e056
Author: Thomas Munro <tmunro@postgresql.org>
Date: Mon Nov 2 19:50:45 2020 +1300
Track collation versions for indexes.
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index. When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.
Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
In this article I'll talk about the problem we need to solve—that PostgreSQL indexes can get corrupted by changes in collations that occur naturally over time—and how the new feature will make things better in a future version of PostgreSQL. Plus, you’ll get a bit of background on collations, too.
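To make that concrete, here is a sketch of how the feature behaves, using an illustrative table name and the built-in ICU collation "en-x-icu":

```sql
-- Build an index whose ordering depends on an ICU collation.
-- With this feature, the collation's version is recorded in
-- pg_depend when the index is created.
CREATE TABLE people (name text COLLATE "en-x-icu");
CREATE INDEX people_name_idx ON people (name);

-- If the collation library is later upgraded and its version no
-- longer matches the recorded one, using the index produces a
-- warning that it may be corrupted. Rebuilding the index restores
-- the current sort order and records the new version.
REINDEX INDEX people_name_idx;
```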
If you've built your application on Postgres, you already know why so many people love Postgres.
And if you're new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.
But what to do if your Postgres database gets very large?
Written by Marco Slot | October 31, 2020
One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.
Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:
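For instance (these sketches follow the pg_cron README; the events table is illustrative):

```sql
-- Requires the pg_cron extension.
CREATE EXTENSION pg_cron;

-- Delete old data every Saturday at 3:30am (GMT).
SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);

-- Run VACUUM every day at 10:00am (GMT), as a named job.
SELECT cron.schedule('nightly-vacuum', '0 10 * * *', 'VACUUM');
```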
This post details the improvements I recently contributed to Postgres 14 (to be released in Q3 of 2021), significantly reducing a previously identified snapshot scalability bottleneck.
As the explanation of the implementation details is fairly long, I thought it'd be more fun for you if I start with the results of the work, instead of the technical details (I'm cheating, I know ;)).
Postgres is an amazing RDBMS implementation. Postgres is open source and it's one of the most standard-compliant SQL implementations that you will find (if not the most compliant). Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you.
In this post I am excited to announce pg_auto_failover 1.4, a new release of pg_auto_failover.
pg_auto_failover is an extension to Postgres built for high availability (HA) that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity and correctness. Since pg_auto_failover is open source, you can find it on GitHub and it's easy to try out. Let's walk through what's new in pg_auto_failover and explore the new capabilities you can take advantage of.
Written by Andres Freund | October 8, 2020
For those of you who manage busy Postgres databases, and those of you who foresee being in that situation, one common challenge is that Postgres does not handle large numbers of connections particularly well.
While it is possible to have a few thousand established connections without running into trouble, there are some real and hard-to-avoid problems.
Since joining Microsoft last year in the Azure Database for PostgreSQL team—where I work on open source Postgres—I have spent a lot of time analyzing and addressing some of the issues with connection scalability in Postgres.
As part of the Citus team (Citus scales out Postgres horizontally, but that's not all we work on), I've been working on pg_auto_failover for quite some time, and I'm excited that we have now released pg_auto_failover as open source, to give you automated failover and high availability!
When designing pg_auto_failover, our goal was this: to provide an easy-to-set-up Business Continuity solution for Postgres that implements fault tolerance of any one node in the system. The documentation chapter about the pg_auto_failover architecture includes the following:
It is important to understand that pg_auto_failover is optimized for Business Continuity. In the event of losing a single node, then pg_auto_failover is capable of continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL Synchronous Replication.
Introduction to pg_auto_failover
The pg_auto_failover solution for Postgres is meant to provide an easy-to-set-up and reliable automated failover solution, one that includes software-driven decision making for when to fail over in production.
Written by Ozgun Erdogan | October 24, 2018
Today, we’re excited to announce that we have donated 1% of Citus Data’s stock to the non-profit PostgreSQL organizations in the US and Europe. The United States PostgreSQL Association (PgUS) has received this stock grant. PgUS will work with PostgreSQL Europe to support the growth, education, and future innovation of Postgres both in the US and Europe.
To our knowledge, this is the first time a company has donated 1% of its equity to support the mission of an open source foundation.
To coincide with this donation, we’re also joining the Pledge 1% movement, alongside well-known technology organizations such as Atlassian, Twilio, Box, and more.