Blog posts tagged with 'open source' on the Citus Blog - Page 7

Citus 10 brings columnar compression to Postgres

Written byBy Jeff Davis | March 6, 2021Mar 6, 2021

Citus 10 is out! Check out the Citus 10 blog post for all the details. Citus is an open source extension to Postgres (not a fork) that enables scale-out, but offers other great features, too. See the Citus docs and the Citus github repo and README.

This post will highlight Citus Columnar, one of the big new features in Citus 10. You can also take a look at the columnar documentation. Citus Columnar can be used with or without the scale-out features of Citus.

Postgres typically stores data using the heap access method, which is row-based storage. Row-based tables are good for transactional workloads, but can cause excessive IO for some analytic queries.

Columnar storage is a new way to store data in a Postgres table. Columnar groups data together by column instead of by row; and compresses the data, too. Arranging data by column tends to compress well, and it also means that queries can skip over columns they don’t need. Columnar dramatically reduces the IO needed to answer a typical analytic query—often by 10X!

Keep reading

Citus 10: Columnar for Postgres, rebalancer, single-node, & more

Written byBy Marco Slot | March 5, 2021Mar 5, 2021

Development on Citus first started around a decade ago and once a year we release a major new Citus open source version. We wanted to make number 10 something special, but I could not have imagined how truly spectacular this release would become. Citus 10 extends Postgres (12 and 13) with many new superpowers:

Columnar storage for Postgres: Compress your PostgreSQL and Citus tables to reduce storage cost and speed up your analytical queries.
Sharding on a single Citus node: Make your single-node Postgres server ready to scale out by sharding tables locally using Citus.
Shard rebalancer in Citus open source: We have open sourced the shard rebalancer so you can easily add Citus nodes and rebalance your cluster.
Joins and foreign keys between local PostgreSQL tables and Citus tables: Mix and match PostgreSQL and Citus tables with foreign keys and joins.
Functions to change the way your tables are distributed: Redistribute your tables in a single step using new alter table functions.
Much more: Better naming, improved SQL & DDL support, simplified operations.

These new capabilities represent a fundamental shift in what Citus is and what Citus can do for you.

Keep reading

Citus Tips: How to undistribute a distributed Postgres table

Written byBy Halil Ozan Akgul | February 6, 2021Feb 6, 2021

Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want to have the comfort of knowing you can go back? Well, as of Citus 9.5, now there is a new undistribute_table() function to make it easy for you to, well, to revert a distributed table back to being a regular Postgres table.

If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you’re using Citus open source or whether you’re using Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.

Keep reading

Don’t let collation versions corrupt your PostgreSQL indexes

Written byBy Thomas Munro | December 12, 2020Dec 12, 2020

[UPDATE in Sep 2021]: This blog post was originally written during the PostgreSQL 14 development cycle. The feature discussed is now a candidate for PostgreSQL 15 and the text has been updated to reflect this.

As part of my work on the open source PostgreSQL team at Microsoft, I've been developing a new feature for PostgreSQL to track dependencies on collation versions, with help from co-author Julien Rouhaud and many others who have contributed ideas. It's taken a long time to build a consensus on how to tackle this thorny problem (work I began at EnterpriseDB and continued at Microsoft), and you can read about some of the details and considerations in the commit message below and the referenced discussion thread. We're not quite done with that yet. It was originally planned for PostgreSQL 14, but some unhandled complications arose so this project is back in the workshop.

commit 257836a75585934cc05ed7a80bccf8190d41e056
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Mon Nov 2 19:50:45 2020 +1300

    Track collation versions for indexes.

    Record the current version of dependent collations in pg_depend when
    creating or rebuilding an index.  When accessing the index later, warn
    that the index may be corrupted if the current version doesn't match.

    Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
    Michael Paquier, Robert Haas, Tom Lane and others for very helpful
    discussion.

    Author: Thomas Munro <thomas.munro@gmail.com>
    Author: Julien Rouhaud <rjuju123@gmail.com>
    Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
    Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com

In this article I'll talk about the problem we need to solve—that PostgreSQL indexes can get corrupted by changes in collations that occur naturally over time—and how the new feature will make things better in a future version of PostgreSQL. Plus, you’ll get a bit of background on collations, too.

Keep reading

When to use Hyperscale (Citus) to scale out Postgres

Written byBy Claire Giordano | December 5, 2020Dec 5, 2020

If you've built your application on Postgres, you already know why so many people love Postgres.

And if you're new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

Keep reading

What’s new in the Citus 9.5 extension to Postgres

Written byBy Claire Giordano | November 14, 2020Nov 14, 2020

When I gave the kickoff talk in the Postgres devroom at FOSDEM this year, one of the Q&A questions was: “what’s happening with the Citus open source extension to Postgres?” The answer is, a lot. Since FOSDEM, Marco Slot and I have blogged about how Citus 9.2 speeds up large-scale htap workloads on Postgres, the Citus 9.3 release notes, and what’s new in Citus 9.4.

Now it’s time to walk through everything new in the Citus 9.5 open source release.

Keep reading

Evolving pg_cron together: Postgres 13, audit log, background workers, & job names

Written byBy Marco Slot | October 31, 2020Oct 31, 2020

One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.

Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:

Keep reading

Improving Postgres Connection Scalability: Snapshots

Written byBy Andres Freund | October 25, 2020Oct 25, 2020

I recently analyzed the limits of connection scalability, to understand the most effective way to improve Postgres' handling of large numbers of connections, and why that is important. I concluded that the most pressing issue is snapshot scalability.

This post details the improvements I recently contributed to Postgres 14 (to be released Q3 of 2021), significantly reducing the identified snapshot scalability bottleneck.

As the explanation of the implementation details is fairly long, I thought it'd be more fun for of you if I start with the results of the work, instead of the technical details (I'm cheating, I know ;)).

Keep reading

What's new in pg_auto_failover 1.4 for Postgres high availability

Written byBy Dimitri Fontaine | October 10, 2020Oct 10, 2020

Postgres is an amazing RDBMS implementation. Postgres is open source and it's one of the most standard-compliant SQL implementations that you will find (if not the most compliant.) Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you.

In this post I am excited to announce that a new version of pg_auto_failover has been released, pg_auto_failover 1.4.

pg_auto_failover is an extension to Postgres built for high availability (HA), that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity and correctness. Since pg_auto_failover is open source, you can find it on GitHub and it's easy to try out. Let's walk through what's new in pg_auto_failover, and let's explore the new capabilities you can take advantage of.

Keep reading

Analyzing the Limits of Connection Scalability in Postgres

Written byBy Andres Freund | October 8, 2020Oct 8, 2020

One common challenge with Postgres for those of you who manage busy Postgres databases, and those of you who foresee being in that situation, is that Postgres does not handle large numbers of connections particularly well.

While it is possible to have a few thousand established connections without running into problems, there are some real and hard-to-avoid problems.

Since joining Microsoft last year in the Azure Database for PostgreSQL team—where I work on open source Postgres—I have spent a lot of time analyzing and addressing some of the issues with connection scalability in Postgres.

Keep reading