16 Notes on Cassandra Summit Europe 2014 (Part II)


And here we are again! Transcribing my handwritten notes before I forget the interesting details from the Summit. If you’re new here I’d advise you to read the first part of this series, covering the first day at the Summit.

But before beginning I want to congratulate all the organisers for putting together such a big, high-quality event. DataStax, thank you very much, you are simply GREAT!

This second day was a talks day: lots of talks organised in groups of 4 running simultaneously, so there were hard decisions to make every 45 minutes or so.

The day began with a keynote by Billy Bosworth and Jonathan Ellis about Cassandra and the upcoming 3.0, and I took a few interesting notes:

Use DTCS (DateTieredCompactionStrategy) for time series compaction.

It optimises time range queries but not time sorting queries.

Lightweight transactions by using conditional statements

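The typical example is a balance transfer, where we update after reading. Make the write conditional on the value we read, so it is only applied if the balance has not been updated in the meantime. Keep in mind that all the conditional statements in a batch have to target the same partition.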

BEGIN BATCH
  UPDATE accounts SET balance = 180 WHERE ... IF balance = 100;
  UPDATE accounts SET balance = 100 WHERE ... IF balance = 180;
APPLY BATCH;
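To make the "update only if the value I read is still there" idea concrete, here is a toy in-memory model. It is plain Python, not Cassandra's actual implementation, and the `Accounts` class and its method names are made up for the illustration.

```python
import threading


class Accounts:
    """Toy in-memory stand-in for a table with conditional updates."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def read(self, account):
        with self._lock:
            return self._balances[account]

    def update_if(self, account, expected, new_balance):
        """Apply the write only if the stored value still equals `expected`.

        Returns True when applied and False when the balance changed in the
        meantime (similar in spirit to the [applied] column of a CQL result).
        """
        with self._lock:
            if self._balances[account] != expected:
                return False
            self._balances[account] = new_balance
            return True


accounts = Accounts({"alice": 100})
seen = accounts.read("alice")                            # we read 100
accounts.update_if("alice", 100, 150)                    # another writer gets in first
applied = accounts.update_if("alice", seen, seen + 80)   # our stale write is rejected
```

The rejected writer has to read again and retry with the fresh value, which is exactly what the condition in the CQL statement forces you to do.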

User Defined Types

CREATE TYPE address (
   ...
);

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, frozen<address>>
);

Beware that every value in the addresses map is frozen, that is, serialised as a single blob, so updating even a single field of an address means rewriting the whole address.

Collections indexing

C* 3.0 will allow us to query on collections, for example:

 SELECT ... WHERE field CONTAINS X

Counters…

Prior to v2.1 only the increment was written to the commit log, so the main problem was ensuring the commit log was never replayed. From 2.1 onwards the full counter value is also stored in the commit log, making a replay idempotent, just like any other operation.
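A tiny simulation shows why this matters. This is only a toy model of the idea as I noted it down, not how Cassandra really stores counters; the function names and the log layout are invented for the example.

```python
def replay_deltas(start, log):
    """Pre-2.1 style: every log entry is an increment to add."""
    value = start
    for delta in log:
        value += delta
    return value


def replay_values(start, log):
    """2.1+ style: every log entry carries the resulting counter value."""
    value = start
    for new_value in log:
        value = new_value
    return value


delta_log = [5, 3]   # two increments: +5, then +3
value_log = [5, 8]   # the same two increments, stored as resulting values

once_deltas = replay_deltas(0, delta_log)
twice_deltas = replay_deltas(once_deltas, delta_log)   # replaying counts twice
once_values = replay_values(0, value_log)
twice_values = replay_values(once_values, value_log)   # replaying changes nothing
```

Replaying the increments a second time doubles the counter, while replaying the stored values leaves it untouched.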

JSON integration

JSON arrives in Cassandra! C* will map a JSON-formatted string to a matching table structure and will allow something like:

INSERT INTO user JSON '{ "name": "Carlos", "surname": "Alonso", ...}';

User defined functions

Like any other stored procedure language.

Local indexes -> Global indexes

With local indexes each node indexes only the data it owns, denormalised as another table. With global indexes the index table lives in its own partition, making it faster and easier to query.

After a few good talks, the awesome Patrick McFadin came along with some good advice on performance.

Use prepared statements: they were conceived for performance.

Once the statement is prepared, executing it is a shortcut that skips all the parsing.

Use execute async whenever possible

It benefits from parallel processing. This is speed for free!!
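Both tips combine nicely. Below is a minimal sketch, assuming `session` behaves like a Session from the DataStax Python driver (it only relies on its `prepare` and `execute_async` methods); the `users` table, its columns and the `insert_users` helper are hypothetical.

```python
def insert_users(session, users):
    """Insert (id, name) pairs with one prepared statement, asynchronously.

    `session` is expected to behave like a DataStax Python driver Session;
    the `users` table and its columns are made up for this example.
    """
    # Prepared once; later executions only send the bound values.
    insert = session.prepare("INSERT INTO users (id, name) VALUES (?, ?)")

    # Fire all the requests without waiting for each response in turn.
    futures = [session.execute_async(insert, (user_id, name))
               for user_id, name in users]

    # Block until every request has completed; result() raises on failure.
    for future in futures:
        future.result()
    return len(futures)
```

In real code you would probably cap the number of in-flight requests instead of firing them all at once.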

Batches are for atomicity

And should be small!

Row cache is better than caching whole partitions

The row cache keeps just a fixed number of hot rows per partition (defined on table creation), rather than the whole partition.

And again, after a few more good talks and very close to the end, Theo Hultberg filled the last lines of my notebook.

If you need to read a very long row, split it and read parts concurrently

Sounds clever and simple, huh?
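A rough sketch of the idea, assuming the row can be addressed by an integer-like range (say, a clustering column) and that you bring your own `fetch_slice(lo, hi)` callback to run the actual range query. Both helper names are mine, not something shown in the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, end, parts):
    """Split the half-open range [start, end) into up to `parts` contiguous slices."""
    size, extra = divmod(end - start, parts)
    slices, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:          # skip empty slices when the range is tiny
            slices.append((lo, hi))
        lo = hi
    return slices


def read_concurrently(fetch_slice, start, end, parts=4):
    """Fetch every slice in its own thread and return the rows in range order."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields the results in the same order as the slices.
        chunks = pool.map(lambda bounds: fetch_slice(*bounds),
                          split_range(start, end, parts))
        return [row for chunk in chunks for row in chunk]
```

For instance, `read_concurrently(fetch_slice, 0, 1000, parts=8)` would issue eight range reads in parallel and stitch the results back together in order.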

Beware tombstones!

Don’t just assume Cassandra compacts them away quickly. Again, test!

Cassandra is written in Java

So make sure you know the differences between data types in your language and Java.
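Integers are an easy example: CQL's int and bigint are 32-bit and 64-bit signed values, like Java's int and long, while a Python integer can grow without limit. The small helpers below (their names are my own) just check whether a value fits before you try to write it.

```python
# Value ranges of the signed Java integer types behind CQL's int and bigint.
JAVA_INT_MIN, JAVA_INT_MAX = -2**31, 2**31 - 1
JAVA_LONG_MIN, JAVA_LONG_MAX = -2**63, 2**63 - 1


def fits_cql_int(value):
    """True if `value` fits a CQL int (32-bit signed, like Java's int)."""
    return JAVA_INT_MIN <= value <= JAVA_INT_MAX


def fits_cql_bigint(value):
    """True if `value` fits a CQL bigint (64-bit signed, like Java's long)."""
    return JAVA_LONG_MIN <= value <= JAVA_LONG_MAX
```

Something like `fits_cql_int(2**31)` is False, so that value needs a bigint column.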

Atomic operations?

Make sure you know what is atomic and what isn’t from the docs, otherwise you’ll soon be in trouble.

And that’s all folks!! A really nice compilation of notes, ideas and concepts from a really awesome event, have I mentioned that before?

But I don’t want to say goodbye without mentioning a very good product presented there. I didn’t make any notes on it because I’m sure I won’t forget it. It is Stratio Cassandra, and it seems to fix one of the biggest problems C* has: lack of flexibility. If you happen to require a new data access pattern, it’s very likely that your model simply doesn’t support it.

Andrés de la Peña and Daniel Higuero, from Stratio, introduced the product, which consists of indexing the data on each node using Lucene on top of Cassandra, allowing you to access the data with the flexibility Lucene provides. Simply brilliant.

And that’s all! Hope you find these useful.
