Citus BlogCitus Blog

Thoughts about the Citus database—as well as PostgreSQL, sharding, distributed databases, and other open source extensions to Postgres.

Lukas Fittl

Postgres tips for Rails developers

Written byBy Lukas Fittl | April 28, 2017Apr 28, 2017

This week at RailsConf, we found ourselves sharing a lot of tips for using PostgreSQL with Rails. We thought it might be worthwhile to write up many of these and share more broadly. Here you’ll find some tips that will help you in debugging and improving performance of your database from your Rails app.

And now, on to the code.

Keep reading
Anonymous

Analyzing PostgreSQL Email Archives with PostgreSQL

Written byBy Carl Steinbach | April 20, 2017Apr 20, 2017

Update: This post was originally published in January 2013. We updated the post in 2017 to include the appropriate SQL commands you can use with Citus. The Citus extension to Postgres is open source and available to try via download or in the cloud via the Azure Cosmos DB for PostgreSQL managed service.

PostgreSQL's Full Text Search capability is an excellent example of a powerful, relatively new feature that Citus users are able to leverage. First introduced in PostgreSQL 8.3 and continuously improved since then, Full Text Search (FTS) provides the SQL semantics necessary to run keyword searches over a corpus of documents stored in a database. FTS executes these operations efficiently by enabling the use of GIN and GiST indexes . In addition, from our standpoint, one of the best features of FTS is that Citus makes it linearly scalable. Specifically, with Citus one can double the size of the search dataset and still satisfy the same SLAs simply by doubling the number of machines in the Citus cluster.

In this blog post we wanted to demonstrate using Full Text Search with Citus, but in order to do so we first needed to find an interesting dataset to use in our examples. After hunting around we eventually hit on the idea of using email archives from the PostgreSQL mailing lists. Besides having a nicely self-reflexive quality about it, we also thought this would provide a good opportunity to learn more about the PostgreSQL community. Early in the history of PostgreSQL project, these lists became the primary communication mechanism for both developers and users, and the collected archives in turn provide a unique opportunity to get a complete view of almost everything that has happened in the project over the past fifteen years.

Keep reading

One of the most common questions we get at Citus is how the rebalancer works–which is understandable. When you have an elastic scaled out database, how easy it is to scale is a key factor into how usable it will be. While we'll happily take time for anyone that's interested and live demo it, this walk through should give you a better idea of how it works for those of you that are curious.

Keep reading

Running SELECT COUNT(DISTINCT) on your database is all too common. In applications it's typical to have some analytics dashboard highlighting the number of unique items such as unique users, unique products, unique visits. While traditional SELECT COUNT(DISTINCT) queries works well in single machine setups, it is a difficult problem to solve in distributed systems. When you have this type of query, you can't just push query to the workers and add up results, because most likely there will be overlapping records in different workers. Instead you can do:

  • Pull all distinct data to one machine and count there. (Doesn't scale)
  • Do a map/reduce. (Scales but it's very slow)

This is where approximation algorithms or sketches come in. Sketches are probabilistic algorithms which can generate approximate results efficiently within mathematically provable error bounds. There are a many of them out there, but today we're just going to focus on one, HyperLogLog or HLL. HLL is very successfull for estimating unique number of elements in a list. First we'll look some at the internals of the HLL to help us understand why HLL algorithm is useful to solve distict count problem in a scalable way, then how it can be applied in a distributed fashion. Then we will see some examples of HLL usage.

Keep reading

AWS is the leader when it comes to the cloud, and for good reason. AWS is well ahead in the quality and breadth of services they offer.

However, when a service is running at the scale of AWS, it is natural to expect some failures to occur. According to AWS EBS availability is designed for 99.999%.

The annual failure rate (AFR) is 0.1% - 0.2%, where failure means a complete or partial failure. For example, if you had 1,000 EBS discs, you should expect 1 or 2 to have a failure per year. In our experience, partial failure is significantly more common than a complete loss. Even so, a partial loss can take a lot of time to resolve and can still be debilitating to a business.

Over the years, there have been some AWS failures that made news headlines due to havoc caused for both companies and their users. These incidents put a spotlight on AWS’ imperfections.

Keep reading

For many SaaS products, a common database problem is having one customer that has so much data, it adversely impacts other customers on the shared machine. This leads many to ask, “What do I do with my largest customer?”

Tenant isolation is a great way to solve this issue. Effectively it allows you to control which tenant or customer in particular you want to isolate on a completely new node. By separating a tenant, you get dedicated resources with more memory and cpu processing power.

Keep reading

Citus is a distributed database that extends (not forks) PostgreSQL for large workloads. One challenge associated with building a distributed relational database (RDBMS) is that they require notable effort to deploy and operate. To remove these operational barriers, we’ve been thinking about offering Citus as a managed database for a while now.

Naturally, we were also worried that providing a native database offering on AWS could split our startup’s focus and take up significant engineering resources. (Honestly, if the founding engineers of the Heroku Postgres team didn’t join Citus, we might have decided to wait on this.) After having Citus Cloud publicly available for eight months though, we are now more bullish on the cloud then ever.

It turns out that targeting an important use case for your customers and delivering it to them in a way that removes their pain points, matters more than anything else. In this blog post, we’ll only focus on removing operational pain points and not on use cases: Why is cloud changing the way databases are delivered to customers? What AWS technologies Citus Cloud is using to enable that in a unique way?

Keep reading
Craig Kerstiens

A multi-tenant sharding tutorial

Written byBy Craig Kerstiens | March 9, 2017Mar 9, 2017

A number of SaaS applications have data models where they want to have their customers interact with only their data. At the enterprise end you have companies like Salesforce and Workday that fall into this bucket, but we see a ton of small ones as well. If you're just getting started figuring out how you should approach your data so it can scale in the future, it doesn't have to be hard.

Here we're going to walk through an example data model that you can use as a basis for learning how you could apply the same to your own multi-tenant application.

Keep reading

Microservices and NoSQL get a lot of hype, but in many cases what you really want is a relational database that simply works, and can easily scale as your application data grows. Microservices can help you split up areas of concern, but also introduce complexity and often heavy engineering work to migrate to them. Yet, there are a lot of monolithic apps out that do need to scale. If you don't want the added complexity of microservices, but do need to continue scaling your relational database then you can with Citus. With Citus 6.1 we're continuing to make scaling out your database even easier with all the benefits of Postgres (SQL, JSONB, PostGIS, indexes, etc.) still packed in there.

With this new release customers like Heap and ConvertFlow are able to scale from single node Postgres to horizontal linear scale. Citus 6.1 brings several improvements, making scaling your multi-tenant app even easier. These include:

  • Integrated reference table support
  • Tenant Isolation
  • View support on distributed tables
  • Distributed Vaccum / Analyze

All of this with the same language bindings, clients, drivers, libraries (like ActiveRecord) that Postgres already works with.

Give Citus 6.1 a try today on Citus Cloud, our fully managed database-as-a-service on top of AWS, or read on to learn more about all that’s included in this release.

Keep reading
Craig Kerstiens

Setting up your log destination on Citus Cloud

Written byBy Craig Kerstiens | February 13, 2017Feb 13, 2017

Your database is a key part of your stack, and when things act up in your application getting insights into it are key. With Citus Cloud you have a number of dashboards with metrics you can look into as well as centralized logging. In addition to the centralized logging, you also have the ability to drain your logs to the provider of your choice. This means you can have all your Citus Cloud logs (both the coordinator and distributed nodes) integrated with the rest of your application logs.

Keep reading

Page 22 of 32