Citus Blog

Articles tagged: Citus

Craig Kerstiens

Citus Data internal hackathon roundup

Written by Craig Kerstiens | March 26, 2018

At Citus Data, we regularly get the team together, because even with an engineering team that is distributed around the globe, face-to-face time is valuable for connecting and collaborating. During our team offsites, we often organize engineering hackathons to prove out new ideas, learn new things, or just have fun. We recently completed one of our Citus hackathons and thought we'd share some of what we built.

The theme of our hackathon this time was building the ultimate dashboard for our Citus extension to Postgres. For Postgres, there are lots of options out there for capturing and displaying insights into your database. You could use New Relic, VividCortex, or something entirely open source like pghero. But we wanted to explore the question: what more could we provide?

Our two teams took two very different approaches, but each emerged with something interesting that we hope to continue to build on and productize in the future. In case you’re curious, here’s a look at each of the projects from our hackday:

Keep reading
Marco Slot

Distributed Execution of Subqueries and CTEs in Citus

Written by Marco Slot | March 9, 2018

The latest release of the Citus database brings a number of exciting improvements for analytical queries across all the data and for real-time analytics applications. Citus already offered full SQL support on distributed tables for single-tenant queries and support for advanced subqueries that can be distributed ("pushed down") to the shards. With Citus 7.2, you can also use CTEs (common table expressions), set operations, and most subqueries thanks to a new technique we call "recursive planning".

Keep reading
Joe Kutner

Using Hibernate and Spring to Build Multi-Tenant Java Apps

Written by Joe Kutner | February 13, 2018

If you're building a Java app, there's a good chance you're using Hibernate. The Hibernate ORM is a nearly ubiquitous choice for Java developers who need to interact with a relational database. It's mature, widely supported, and feature rich, as demonstrated by its support for multi-tenant applications.

Hibernate officially supports two different multi-tenancy mechanisms: separate database and separate schema. Unfortunately, both of these mechanisms come with some downsides in terms of scaling. A third Hibernate multi-tenancy mechanism, a tenant discriminator, also exists, and it’s usable—but it’s still considered a work-in-progress by some. Unlike the separate database and separate schema approaches, which require distinct database connections for each tenant, Hibernate’s tenant discriminator model stores tenant data in a single database and partitions records with either a simple column value or a complex SQL formula.

But fear not: despite the unfinished state of Hibernate's built-in support for a tenant discriminator (or in simple terms, tenant_id), it's possible to implement your own discriminator using standard Spring, Hibernate, and AspectJ mechanisms that work quite well. The Hibernate tenant discriminator model works well as you start small on a single-node Postgres database, and even better, the tenant discriminator model can continue to scale as your data grows by leveraging the Citus extension to Postgres.
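To make the tenant discriminator idea concrete, here is a minimal Python sketch of a single shared table in which every row carries a tenant_id and every read is scoped to it. This is not Hibernate, Spring, or AspectJ code; the `Row` and `TenantScopedStore` names and the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    tenant_id: int  # the discriminator column shared by all tenants
    name: str


class TenantScopedStore:
    """Hypothetical in-memory stand-in for one shared table."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(row)

    def find_all(self, tenant_id):
        # Every read filters on tenant_id, much like appending
        # "WHERE tenant_id = ?" to each query.
        return [r for r in self._rows if r.tenant_id == tenant_id]


store = TenantScopedStore()
store.insert(Row(tenant_id=1, name="alice"))
store.insert(Row(tenant_id=2, name="bob"))
store.insert(Row(tenant_id=1, name="carol"))

tenant_one_names = [r.name for r in store.find_all(1)]
print(tenant_one_names)
```

Because all tenants share one table, the tenant_id column is also a natural candidate for a distribution key once the data outgrows a single node.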

Keep reading

In both Citus Cloud 2 and in the enterprise edition of Citus 7.1 there was a pretty big update to one of our flagship features: the shard rebalancer. No, I’m not talking about our shard rebalancer visualization that reminds me of the Windows 95 disk defrag. (Side note: At one point I tried to persuade my engineering team to play Tetris music in the background while the shard rebalancer UI in Citus Cloud was running. The good news for all of you is that I was overwhelmingly vetoed by my team. Whew.) The interesting new capability in the Citus database is the online nature of our shard rebalancer.

Keep reading

Today, we’re excited to announce our latest release of our distributed database—Citus 7.2. With this release, we’re making Citus more of a drop-in replacement for your single-node Postgres database, so you don’t need to adapt your SQL for a distributed system.

For multi-tenant applications where the single-tenant queries were scoped to a single machine, Citus already provided full SQL support. The improvements in Citus 7.2 take our support for distributed SQL one big step further. With Citus database version 7.2, we now extend our distributed SQL support to queries that run on data spread across a cluster of machines. This becomes particularly important for real-time analytics workloads, where even the most complex SELECT queries need to be parallelized across machines.

If you’re into bulleted lists, here’s the quick overview of what’s new in Citus database version 7.2 for distributed queries that span across machines. For an overview of other recent Citus features check out these blogs about distributed transactions and Citus 7.1.

Keep reading

Years ago, Citus had two methods for distributing data across many nodes (we actually still support both today): hash-based partitioning and time-based partitioning. Over time we found big benefits in further enhancing the features around hash-based partitioning, which enabled us to add richer SQL support, transactions, foreign keys, and more. Thus in recent years, we put less energy into time-based partitioning. But… no one stopped asking us about time partitioning, especially for fast data expiration. All that time we were listening. We just thought it best to align our product with the path of core Postgres as opposed to branching away from it.

Postgres has had some form of time-based partitioning for years, though for much of that time it was a bit kludgy and wasn't part of core Postgres. With Postgres 10 came native time partitioning, and because Citus is an open source extension to Postgres, anyone using Citus gets to take advantage of time-based partitioning as well. You can now create tables that are distributed across nodes by ID and partitioned by time on disk.
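As a rough mental model of "distributed by ID, partitioned by time", the Python sketch below places each row on a shard by hashing its ID and into a monthly partition by its date, then expires old data by dropping whole partitions. It is only an illustration: the shard count, the CRC32 hash, the monthly granularity, and all function names are assumptions, not what Citus, Postgres, or pg_partman actually use.

```python
import zlib
from datetime import date

SHARD_COUNT = 4  # assumed number of shards, for illustration only


def shard_for(entity_id: int) -> int:
    # A stable hash of the ID picks the shard (and therefore the node).
    return zlib.crc32(str(entity_id).encode()) % SHARD_COUNT


def partition_for(day: date) -> str:
    # Within a shard, rows are grouped into monthly partitions.
    return f"{day.year:04d}_{day.month:02d}"


def place(entity_id: int, day: date):
    return shard_for(entity_id), partition_for(day)


def expire(partitions: dict, cutoff: str) -> dict:
    # Dropping whole partitions older than the cutoff avoids
    # deleting expired rows one by one.
    return {name: rows for name, rows in partitions.items() if name >= cutoff}


placement = place(42, date(2018, 1, 22))
remaining = expire(
    {"2017_11": ["old"], "2017_12": ["recent"], "2018_01": ["new"]},
    cutoff="2017_12",
)
print(placement, sorted(remaining))
```

The point of the second dimension is the `expire` step: removing a month of data becomes a cheap metadata operation on one partition rather than a large delete.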

We have found a few Postgres extensions that make partitioning much easier to use. The best in class for improving time partitioning is pg_partman and today we'll dig into getting time partitioning set up with your Citus database cluster using pg_partman.

Keep reading
Nate Barbettini

Multi-tenant web apps with ASP.NET Core and Postgres

Written by Nate Barbettini | January 22, 2018

When it comes to building large-scale, multi-tenant applications, Microsoft's ASP.NET platform is a strong choice. Like other popular web frameworks such as Express and Django, ASP.NET is used to build web applications and APIs. It's been around for a while, but don't let that fool you: ASP.NET packs some serious muscle. After all, it powers one of the biggest Q&A networks on the web: Stack Exchange!

In the past, ASP.NET apps could only run on Windows servers. That's changed with the latest version, ASP.NET Core, which is fully open source and cross-platform. ASP.NET Core runs anywhere you need it to (Windows, Mac, Linux, Docker) and features a modern middleware pipeline, a rich package ecosystem, and blazing-fast performance.

My experience working on multi-tenant enterprise apps has taught me that it's never too early to design for scale. How you architect your code matters, as does how you architect your data. In the past, the apps I worked on were designed around a database-per-tenant model—unfortunately, the database-per-tenant model didn’t scale and caused problems once our app reached thousands of customers (aka tenants). In this post, I’ll show you a different approach to scale the underlying database with ASP.NET: sharding. With sharding you can leave behind the drawbacks of the database-per-tenant model and can scale infinitely.

In this blog post, I'll show you how to build your multi-tenant app with scale in mind. You'll learn how to use ASP.NET Core's middleware pipeline plus the sharding features of Postgres and Citus to build a scalable multi-tenant application on ASP.NET Core. Along the way we’ll start to build the MVP of our very own Stack Exchange. Let's get started!

Keep reading
Craig Kerstiens

Database sharding explained in plain English

Written by Craig Kerstiens | January 10, 2018

Sharding is one of those database topics that most developers have a distant understanding of, but the details aren't always perfectly clear unless you've implemented sharding yourself. In building the Citus database (our extension to Postgres that shards the underlying database), we've followed a lot of the same principles you'd follow if you were manually sharding Postgres yourself. The main difference of course is that with Citus, we’ve done the heavy lifting to shard Postgres and make it easy to adopt, whereas if you were to shard at the application layer then there’s a good bit of work needed to re-architect your application.

I've found myself explaining how sharding works to many people over the past year and realized it would be useful (and maybe even interesting) to break it down in plain English.

Keep reading

Citus scales out Postgres for a number of different use cases, both as a system of record and as a system of engagement. One use case we're seeing implemented a lot these days: using the Citus database to power customer-facing real-time analytics dashboards, even when dealing with billions of events per day. Dashboards and pipelines are easy to handle when you’re at 10 GB of data, but as you grow, even basic operations like a count of unique users require non-trivial engineering work to perform well.

Citus is a good fit for these types of event dashboards because of Citus’ ability to ingest large amounts of data, to perform rollups concurrently, to mix raw, unrolled-up data with pre-aggregated data, and finally to support a large number of concurrent users. Adding all these capabilities together, the Citus extension to Postgres works well for end users where a data warehouse may not work nearly as well. We've talked some here about various parts of building a real-time customer-facing dashboard, but today we thought we'd go one step further and give you a guide for doing it end to end.
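For readers new to rollups, here is a small Python sketch of the general idea: raw events are pre-aggregated into per-minute counts, and a dashboard read combines those counts with any raw events that have not been rolled up yet. The event shape, the per-minute granularity, and the function names are hypothetical; in a real deployment this logic would live in SQL rather than application code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw events: (timestamp, page)
events = [
    (datetime(2018, 1, 1, 12, 0, 5), "/home"),
    (datetime(2018, 1, 1, 12, 0, 40), "/home"),
    (datetime(2018, 1, 1, 12, 1, 10), "/pricing"),
    (datetime(2018, 1, 1, 12, 1, 30), "/home"),
]


def rollup_by_minute(raw_events):
    # Pre-aggregate: one counter per (minute, page) instead of one row per event.
    counts = Counter()
    for ts, page in raw_events:
        minute = ts.replace(second=0, microsecond=0)
        counts[(minute, page)] += 1
    return counts


def views(page, rolled, recent_raw=()):
    # Dashboard read: pre-aggregated counts plus raw events not yet rolled up.
    rolled_total = sum(n for (_, p), n in rolled.items() if p == page)
    return rolled_total + sum(1 for _, p in recent_raw if p == page)


rollup = rollup_by_minute(events)
not_yet_rolled_up = [(datetime(2018, 1, 1, 12, 2, 1), "/home")]
print(views("/home", rollup, not_yet_rolled_up))
```

The dashboard query then touches a handful of rollup rows per page instead of scanning every raw event.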

Keep reading
Murat Tuncer

Distributed count distinct vs. HyperLogLog in Postgres

Written by Murat Tuncer | December 22, 2017

Citus 7.1 shipped just a few weeks back and included a number of great new features. In case you missed the details, check out Ozgun’s blog or read up on what Citus is on our site. Today, though, we want to drill further into an important area in Postgres: counting.

Getting a distinct count of some value out of your database is a common question. We've talked about how to count more quickly on our blog before, and followed that up with how you can use probabilistic algorithms like HyperLogLog to do counts faster.
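To show why a sketch like HyperLogLog suits distributed counting, here is a compact textbook-style implementation in Python. It is a sketch under stated assumptions, not the code used by any Postgres extension: the precision `P`, the SHA-1-based hash, and all function names are choices made for this example. The key property is that two sketches merge by taking the register-wise maximum, so each shard can build its own sketch and the coordinator only combines small fixed-size arrays.

```python
import hashlib
import math

P = 10       # assumed precision: 2**10 = 1024 registers
M = 1 << P


def _hash64(value) -> int:
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def new_sketch():
    return [0] * M


def add(sketch, value):
    h = _hash64(value)
    index = h >> (64 - P)                # top P bits choose a register
    rest = h & ((1 << (64 - P)) - 1)     # remaining 64 - P bits
    # rank = position of the leftmost 1-bit within the remaining bits
    rank = (64 - P) - rest.bit_length() + 1
    sketch[index] = max(sketch[index], rank)


def merge(a, b):
    # Register-wise max: the merged sketch equals the sketch of the union.
    return [max(x, y) for x, y in zip(a, b)]


def estimate(sketch) -> float:
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # small-range correction
    return raw


# Two hypothetical shards whose user IDs overlap (4000..5999 appear in both).
shard_a, shard_b = new_sketch(), new_sketch()
for user_id in range(0, 6000):
    add(shard_a, user_id)
for user_id in range(4000, 10000):
    add(shard_b, user_id)

merged = merge(shard_a, shard_b)
approx_distinct = estimate(merged)
print(round(approx_distinct))
```

Summing exact per-shard distinct counts here would give 12,000 because the overlapping IDs are counted twice, while the merged sketch estimates the true 10,000 distinct users to within a few percent.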

Keep reading
