Citus Blog

Articles tagged: Postgres

Samay Sharma

Debugging Postgres autovacuum problems: 13 tips

Written by Samay Sharma | July 28, 2022

If you've been running PostgreSQL for a while, you've heard about autovacuum. Yes, autovacuum, the thing which everybody asks you not to turn off, which is supposed to keep your database clean and reduce bloat automatically.

And yet—imagine this: one fine day, you see that your database size is larger than you expect, the I/O load on your database has increased, and things have slowed down without much change in workload. You begin looking into what might have happened. You run the excellent Postgres bloat query and you notice you have a lot of bloat. So you run the VACUUM command manually to clear the bloat in your Postgres database. Good!

But then you have to address the elephant in the room: why didn't Postgres autovacuum clean up the bloat in the first place...? Does the above story sound familiar? Well, you are not alone. 😊

Keep reading

David Rowley

Speeding up sort performance in Postgres 15

Written by David Rowley | May 19, 2022

In recent years, PostgreSQL has seen several improvements which make sorting faster. In the PostgreSQL 15 development cycle—which ended in April 2022—Ronan Dunklau, Thomas Munro, Heikki Linnakangas, and I contributed some changes to PostgreSQL to make sorts go even faster.

Each of the improvements to sort should be available when PostgreSQL 15 is out in late 2022.

Why care about sort performance? When you run your application on PostgreSQL, there are several scenarios where PostgreSQL needs to sort records (aka rows) on your behalf. The main one is for ORDER BY queries. Sorting can also be used in:

  • Aggregate functions with an ORDER BY clause
  • GROUP BY queries
  • Queries with a plan containing a Merge Join
  • UNION queries
  • DISTINCT queries
  • Queries with window functions with a PARTITION BY and/or ORDER BY clause

If PostgreSQL is able to sort records faster, then queries using sort will run more quickly.

Keep reading

One of the good things with a virtual event like Citus Con is that you have a lot of flexibility about where and when to watch the talks. From your home office, or a café, or the beach—or even the car, while you wait to pick up your kids. As long as you have an internet connection, you’re in.

But you still need to figure out which talks and livestreams you want to watch when the event goes live on Tuesday, April 12. To help you out, we’ve created this guide to Citus Con: An Event for Postgres. And just for kicks I’m calling it the “Ultimate Guide” to Citus Con. (Ha! Since this is a first-time event, maybe it will be the only guide to Citus Con. Therefore definitely “ultimate”.)

In working on this event—I’m a co-chair along with Teresa Giacomini, also head of the talk selection team—I realized I had “tagged and categorized” each and every talk both in my head and on a spreadsheet. So that’s what this blog post will give you… a framework for knowing which talks are in which categories.

Of course, if you want to see the abstracts for all the talks, just pop over to the Schedule & Sessions page for Citus Con.

Keep reading

My main advice when running performance benchmarks for Postgres is: "Automate it!"

If you're measuring database performance, you are likely going to have to run the same benchmark over and over again. Either because you want a slightly different configuration, or because you realized you used some wrong settings, or maybe some other reason. By automating the way you're running performance benchmarks, you won't be too annoyed when this happens, because re-running the benchmarks will cost very little effort (it will only cost some time).

However, building this automation for the database benchmarks can be very time-consuming, too. So, in this post I'll share the tools I built to make it easy to run benchmarks against Postgres—specifically against the Citus extension to Postgres running in a managed database service on Azure called Hyperscale (Citus) in Azure Database for PostgreSQL.

Here's your map for reading this post: each anchor link takes you to a different section. The first sections explore the different types of application workloads and their characteristics, plus the off-the-shelf benchmarks that are commonly used for each. After that you can dive into the "how to" aspects of using HammerDB with Citus and Postgres on Azure. And yes, you'll see some sample benchmarking results, too.

Keep reading

When you find yourself answering the same questions again and again, it’s a good idea to blog about it. Which is why this post about Citus Con: An Event for Postgres exists: to answer your questions, and share the news about this first-ever, inaugural event.

Citus Con: An Event for Postgres is a free and virtual developer event happening in April 2022, organized by the Postgres and Citus team here at Microsoft. Speakers will come from different parts of the Postgres ecosystem, including Postgres users, Citus open source users, Azure Database for PostgreSQL customers, and developers/experts in PostgreSQL and Postgres extensions, like Citus.

The Call for Proposals (CFP) for Citus Con is open until Feb 6th. Whether this will be your 1000th conference talk or your very 1st, we’d love to see what Postgres experiences you have to share.

Keep reading

If you’ve never done it before, you might be daunted by the idea of giving a conference talk. You know: the work involved, the butterflies, how to make it a good talk and not a boring one, the people who might judge you… And perhaps the hardest bit: choosing a topic others will find interesting.

[Updated for 2025]: For the 4th year in a row, I’m the chair of the talk selection team for a free and virtual developer conference that is now called POSETTE: An Event for Postgres, formerly called Citus Con. I’ve also served on talk selection committees for PgDaySF 2020 and PGDay Chicago 2024. Wearing my talk selection team hat, as I reached out to spread the word about open CFPs such as the CFP for POSETTE, people would sometimes ask:

Why give a talk at a Postgres conference?

This post will walk you through the ways you, your team, your project—and especially the Postgres community—can benefit from a talk you give.

Keep reading

Claire Giordano

UK COVID-19 dashboard built using Postgres and Citus for millions of users

Written by Claire Giordano & Pouria Hadjibagheri | December 11, 2021

From the beginning of the COVID-19 pandemic, the United Kingdom (UK) government has made it a top priority to track key health metrics and to share those metrics with the public.

And the citizens of the UK were hungry for information, as they tried to make sense of what was happening. Maps, graphs, and tables became the lingua franca of the pandemic. As a result, the GOV.UK Coronavirus dashboard became one of the most visited public service websites in the United Kingdom.

The list of people who rely on the UK Coronavirus dashboard is quite long: government personnel, public health officials, healthcare employees, journalists, and the public all use the same service.

Keep reading

Burak Velioglu

How to scale Postgres for time series data with Citus

Written by Burak Velioglu | October 22, 2021

Managing time series data at scale can be a challenge. PostgreSQL offers many powerful data processing features such as indexes, COPY and SQL—but the high data volumes and ever-growing nature of time series data can cause your database to slow down over time.

Fortunately, Postgres has a built-in solution to this problem: Partitioning tables by time range.

Partitioning with the Postgres declarative partitioning feature can help you speed up query and ingest times for your time series workloads. Range partitioning lets you create a table and break it up into smaller partitions, based on ranges (typically time ranges). Query performance improves since each query only has to deal with much smaller chunks. Though, you’ll still be limited by the memory, CPU, and storage resources of your Postgres server.
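To make the idea of range partitioning concrete, here is a minimal Python sketch that generates the DDL for a time-partitioned table with one partition per month. It only builds SQL strings and does not connect to a database. The table name `events`, its columns, and the helper names `month_start` and `monthly_partition_ddl` are hypothetical and chosen for illustration; only the `PARTITION BY RANGE` and `PARTITION OF ... FOR VALUES FROM ... TO` clauses come from Postgres declarative partitioning.

```python
from datetime import date


def month_start(d: date, offset: int) -> date:
    """Return the first day of the month `offset` months after d's month."""
    months = d.year * 12 + (d.month - 1) + offset
    return date(months // 12, months % 12 + 1, 1)


def monthly_partition_ddl(parent: str, start: date, count: int) -> list[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month."""
    statements = []
    for i in range(count):
        lower = month_start(start, i)      # inclusive lower bound
        upper = month_start(start, i + 1)  # exclusive upper bound
        name = f"{parent}_{lower:%Y_%m}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
        )
    return statements


# Hypothetical parent table, range-partitioned on its timestamp column.
PARENT_DDL = (
    "CREATE TABLE events (event_time timestamptz NOT NULL, payload jsonb) "
    "PARTITION BY RANGE (event_time);"
)

if __name__ == "__main__":
    print(PARENT_DDL)
    for stmt in monthly_partition_ddl("events", date(2021, 11, 1), 3):
        print(stmt)
```

Because each partition covers a half-open range (lower bound included, upper bound excluded), adjacent months never overlap, and a query that filters on a narrow time window only needs to touch the partitions whose ranges intersect that window.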

The good news is you can scale out your partitioned Postgres tables to handle enormous amounts of data by distributing the partitions across a cluster. How? By using the Citus extension to Postgres. In other words, with Citus you can create distributed time-partitioned tables. To save disk space on your nodes, you can also compress your partitions—without giving up indexes on them. Even better: the latest Citus 10.2 open-source release makes it a lot easier to manage your partitions in PostgreSQL.

Keep reading

Today, we are excited to announce PostgreSQL 14's General Availability (GA) on Azure's Hyperscale (Citus) option. To our knowledge, this is the first time a major cloud provider has announced GA for a new Postgres major version on their platform one day after the official release.

Starting today, you can deploy Postgres 14 in many Hyperscale (Citus) regions. In upcoming months, we will roll out Postgres 14 across more Azure regions and also release it with our new Flexible Server option in Azure Database for PostgreSQL.

This announcement helps us bring the latest in Postgres to Azure customers as new features become available. Further, it shows our commitment to open source PostgreSQL and its ecosystem. We choose to extend Postgres and share our contributions, instead of creating and managing a proprietary fork on the cloud.

In this blog post, you'll first get a glimpse into some of our favorite features in Postgres 14. These include connection scaling, faster VACUUM, and improvements to crash recovery times.

We'll then describe the work involved in making Postgres extensions compatible with new major Postgres versions, including our distributed database Citus as well as other extensions such as HyperLogLog (HLL), pg_cron, and TopN. Finally, you'll learn how packaging, testing, and deployments work on Hyperscale (Citus). This last part ties everything together and enables us to release new versions on Azure, with speed.

Keep reading

David Rowley

Speeding up recovery & VACUUM in Postgres 14

Written by David Rowley | March 25, 2021

One of the performance projects I’ve focused on in PostgreSQL 14 is speeding up PostgreSQL recovery and vacuum. In the PostgreSQL team at Microsoft, I spend most of my time working with other members of the community on the PostgreSQL open source project. And in Postgres 14 (due to release in Q3 of 2021), I committed a change to optimize the compactify_tuples function, to reduce CPU utilization in the PostgreSQL recovery process. This performance optimization in PostgreSQL 14 made our crash recovery test case about 2.4x faster.

The compactify_tuples function is used internally in PostgreSQL:

  • when PostgreSQL starts up after a non-clean shutdown—called crash recovery
  • by the recovery process that is used by physical standby servers to replay changes (as described in the write-ahead log) as they arrive from the primary server
  • by VACUUM

So the good news is that the improvements to compactify_tuples will: improve crash recovery performance; reduce the load on the standby server, allowing it to replay the write-ahead log from the primary server more quickly; and improve VACUUM performance.

Keep reading
