Citus Blog

Articles tagged: Postgres

Craig Kerstiens

Options for scaling from 1 to 100,000 tenants

Written byBy Craig Kerstiens | June 28, 2018Jun 28, 2018

When you first start out in building a SaaS application you talk about that day in the future when you will have scaling problems, how that'll be the day, how that would be a good problem to have. You focus on getting the first few customers, making sure they have a great experience, and suddenly you're at 10s of customers, then 100s. You've upgraded your app server to a larger one, then you've gone from one ec2 app server to multiple ones with ELB in front of things. You've upgraded your Postgres database from an r3.large on AWS, to r3.xlarge, now you're eyeing that r3.2xlarge next month. In the back of your mind though, you're starting to look at your plans for future growth of your SaaS app, and you're wondering how much larger you can keep going. Your database is performing well at 100 tenants (tenants = customers), your back of the napkin math says you'll be able to scale your app up to 1,000 tenants, but after that you know you're going to have to explore some options.

What are those options and what are the trade-offs and benefits?

Keep reading
Craig Kerstiens

Fun with SQL: Functions in Postgres

Written byBy Craig Kerstiens | June 21, 2018Jun 21, 2018

In our previous Fun with SQL post on the Citus Data blog, we covered window functions. Window functions are a special class of function that allow you to grab values across rows and then perform some logic. By jumping ahead to window functions, we missed so many of the other handy functions that exist within Postgres natively. There are in fact several hundred built-in functions. And when needed, you can also create your own user defined functions (UDFs), if you need something custom. Today we're going to walk through just a small sampling of SQL functions that can be extremely handy in PostgreSQL.

Keep reading

Many companies generate large volumes of time series data from events happening in their application. It’s often useful to have a real-time analytics dashboard to spot trends and changes as they happen. You can build a real-time analytics dashboard on Postgres by constructing a simple pipeline:

  1. Load events into a raw data table in batches
  2. Periodically aggregate new events into a rollup table
  3. Select from the rollup table in the dashboard

For large data streams, Citus (an open source extension to Postgres that scales out Postgres horizontally) can scale out each of these steps across all the cores in a cluster of Postgres nodes.

One of the challenges of maintaining a rollup table is tracking which events have already been aggregated—so you can make sure that each event is aggregated exactly once. A common technique to ensure exactly-once aggregation is to run the aggregation for a particular time period after that time period is over. We often recommend aggregating at the end of the time period for its simplicity, but you cannot provide any results before the time period is over and backfilling is complicated.

Keep reading
Craig Kerstiens

Configuring memory for Postgres

Written byBy Craig Kerstiens | June 12, 2018Jun 12, 2018

work_mem is perhaps the most confusing setting within Postgres. work_mem is a configuration within Postgres that determines how much memory can be used during certain operations. At its surface, the work_mem setting seems simple: after all, work_mem just specifies the amount of memory available to be used by internal sort operations and hash tables before writing data to disk. And yet, leaving work_mem unconfigured can bring on a host of issues. What perhaps is more troubling, though, is when you receive an out of memory error on your database and you jump in to tune work_mem, only for it to behave in an un-intuitive manner.

Keep reading
Craig Kerstiens

Citus what is it good for? OLTP? OLAP? HTAP?

Written byBy Craig Kerstiens | June 7, 2018Jun 7, 2018

Earlier this week as I was waiting to begin a talk at a conference, I chatted with someone in the audience that had a few questions. They led off with this question: is Citus a good fit for X? The heart of what they were looking to figure out: is the Citus distributed database a better fit for analytical (data warehousing) workloads, or for more transactional workloads, to power applications? We hear this question quite a lot, so I thought I'd elaborate more on the use cases that make sense for Citus from a technical perspective.

Before I dig in, if you're not familiar with Citus; we transform Postgres into a distributed database that allows you to scale your Postgres database horizontally. Under the covers, your data is sharded across multiple nodes, meanwhile things still appear as a single node to your application. By appearing still like a single node database, your application doesn't need to know about the sharding. We do this as a pure extension to Postgres, which means you get all the power and flexibility that's included within Postgres such as JSONB, PostGIS, rich indexing, and more.

Keep reading
Craig Kerstiens

Fun with SQL: Window functions in Postgres

Written byBy Craig Kerstiens | June 1, 2018Jun 1, 2018

Today we continue to explore all the powerful and fun things you can do with SQL. SQL is a very expressive language and when it comes to analyzing your data there isn't a better option. You can see the evidence of SQL's power in all the attempts made by NoSQL databases to recreate the capabilities of SQL. So why not just start with a SQL database that scales? (Like my favorites, Postgres and Citus.)

Today, in the latest post in our 'Fun with SQL' series (earlier blog posts were about recursive CTEs, generate_series, and relocating shards on a Citus database cluster), we're going to look at window functions in PostgreSQL. Window functions are key in various analytic and reporting use cases where you want to compare and contrast data. Window functions allow you to compare values between rows that are somehow related to the current row. Some practical uses of window functions can be:

  • Finding the first time all users performed some action
  • Finding how much each users bill increased or decreased from the previous month
  • Find where all users ranked for some sub-grouping

Keep reading
Craig Kerstiens

Preparing your multi-tenant app for scale

Written byBy Craig Kerstiens | May 22, 2018May 22, 2018

We spend a lot of time with companies that are growing fast, or planning for future growth. It may be you've built your product and are now just trying to keep the system growing and scaling to handle new users and revenue. Or you may be still building the product, but know that an even moderate level of success could lead to a lot of scaling. In either case where you spend your time is key in order to not lose valuable time.

As Donald Knuth states it in Computer Programming as an Art:

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

With the above in mind one of the most common questions we get is: What do I need to do now to make sure I can scale my multi-tenant application later?

We've written some before about approaches not to take such as schema based sharding or one database per customer and the trade-offs that come with that approach. Here we'll dig into three key steps you should take that won't be wasted effort should the need to scale occur.

Keep reading
Craig Kerstiens

Fun with SQL: Recursive CTEs in Postgres

Written byBy Craig Kerstiens | May 15, 2018May 15, 2018

Common Table Expressions (CTEs) are a powerful construct within SQL. In day to day conversation, you may hear CTEs referred to as WITH clauses. You can think of CTEs as similar to a view that is materialized only while that query is running and does not exist outside of that query. CTEs can be very useful building blocks for allowing your large SQL queries to be more readable. But, they can also be used recursively allowing you to create some very complex queries without having to drop down to a procedural language like plpgsql or plv8.

Keep reading
Craig Kerstiens

Raw SQL access for users with row-level-security

Written byBy Craig Kerstiens | April 4, 2018Apr 4, 2018

We talk with a lot of SaaS companies that are encountering issues with their database. The most common issue we discuss relates to performance, either a need to keep scaling or at times just dealing with really intensive data needs of only a few customers and how to handle that.

And then as you continue to scale and capture more data you want to provide more value back to your customers.

At times you might even consider giving raw SQL access to your largest and most important customers. Typically controlling what data you give them, via dashboards and canned reports is ideal–this way you can control performance impact and other risks. But, if you have extra large/important customers that require you to give them raw access to the data... then PostgreSQL and thus Citus has your answer.

Pro-tip: Don't grant access to *all** of your customers.*

Keep reading
Craig Kerstiens

Contributing to Postgres via patch review

Written byBy Craig Kerstiens | March 31, 2018Mar 31, 2018

Citus is an open source extension to Postgres that transforms Postgres into a distributed database, scaling horizontally. The fact that Citus is built on top of Postgres is a huge benefit to our users: it means that when you choose Citus, you get all the great features that are available in Postgres. And Postgres itself is an awesome database. Awesome. As a team, we value the foundation we're built on and regularly aim to contribute back to it. We have a number of developers that have contributed to Postgres over the years from features like watch, event triggers, and the PostgreSQL extension framework.

Recently a few more of our engineers expressed an interest in giving back to the PostgreSQL community. In fact it's a common question, how can we better help the PostgreSQL project? And a common answer is reviewing patches. To help kick start that process we organized a session and carved out a few days just for patch review during the most recent commitfest.

Keep reading

Page 10 of 15