Cassandra ResultSet Paging

In mobile applications, the amount of data sent from the REST service to the app is an important consideration, so any endpoint that might return a large number of rows has to implement paging. In this post (based on the current documentation) I will show you how Cassandra handles paging and how you can implement it in your stateless REST service.

Introduction

When working with stateless (micro)services it is not practical to keep result sets in memory between requests, because the next request is unlikely to land on the same service instance. Fortunately Cassandra’s PagingState is built with stateless services in mind.

In this post I will use a simple chat application as an example. Our chat REST backend allows our users to view all the chats in a room. Since we keep one day of history for our chat, we want users to be able to page through it 20 messages at a time.

So first we define a chat table:

CREATE TABLE chat (
    room TEXT,
    timestamp TIMESTAMP,
    user TEXT,
    message TEXT,
    PRIMARY KEY((room), timestamp, user))
    WITH CLUSTERING ORDER BY (timestamp DESC, user ASC);

We have the room as the partition key and include timestamp and user as clustering columns to make the key unique. We also cluster on timestamp in descending order so that we get the latest chat message first.

Our inserts and selects are very straightforward. First our insert function:

    public void insertChat(Chat chat) {
        Insert insert = QueryBuilder.insertInto(TABLE_CHAT)
                .using(QueryBuilder.ttl(CHAT_TTL))
                .value(FIELD_ROOM, chat.getRoom())
                .value(FIELD_TIMESTAMP, new Date())
                .value(FIELD_USER, chat.getUser())
                .value(FIELD_MESSAGE, chat.getMessage());

        cassandra.execute(insert);
    }

We have set a TTL of one day. This way Cassandra automatically cleans up old chat messages for us. Of course you would not add the TTL line if you wanted to keep chats indefinitely.
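The CHAT_TTL constant itself is not shown above. Since QueryBuilder.ttl takes the TTL in seconds, one way to define a one-day value is sketched below. The holder class name ChatConstants is made up for this example.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical holder class; the post does not show where CHAT_TTL is defined.
final class ChatConstants {
    // Cassandra TTLs are expressed in seconds, so one day is 86400.
    static final int CHAT_TTL = (int) TimeUnit.DAYS.toSeconds(1);

    private ChatConstants() {
    }
}
```

Deriving the value from TimeUnit rather than hard-coding 86400 makes it obvious which unit the constant is in.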

Our select statement:

        Select select = QueryBuilder
                .select(FIELD_ROOM, FIELD_TIMESTAMP, FIELD_USER, FIELD_MESSAGE)
                .from(TABLE_CHAT);
        select.where(eq(FIELD_ROOM, room));
        select.setFetchSize(FETCH_SIZE);

As you can see, we instruct the driver to use a fetch size (the constant is set to 20). This means that when you iterate over the result set, the driver requests rows from Cassandra in pages of 20. Without this setting the driver would fall back to its default fetch size, which is far larger than one page of chat messages.

Note: The driver’s default fetch size is 5000. So it would be a shame to leave it at the default; you’d be throwing away 4980 results.

Adding paging

So now that we have the basics set up we can add paging. Let’s first take a look at our controller:

    @RequestMapping(value = "/{room}", method = RequestMethod.GET)
    public ChatPage getChat(
            @PathVariable String room,
            @RequestParam(value = "next", required = false) String next) {
        return chatRepository.selectChat(room, next);
    }

The GET /chat/{room} method has an optional @RequestParam called next where the client can send the next page state. Such a request would look like this:

GET /chat/cats?next=002400100000001c00080000015611882(...)

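The serialized Cassandra paging state is a long hexadecimal string that tells Cassandra where the previous page ended, so it knows which rows can be skipped. It can only be reused with the exact same query that produced it.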
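Because the next value comes straight from the client, it can be worth rejecting obviously malformed tokens before they reach the driver. Below is a minimal sketch, assuming the token is always the raw hex string shown above. The class PageTokens and the method looksLikePagingState are hypothetical names, not part of the driver or of Spring.

```java
// Hypothetical helper for a cheap sanity check on the 'next' request parameter.
final class PageTokens {
    private PageTokens() {
    }

    // Returns true when the token is a non-empty string of whole hex-encoded bytes.
    static boolean looksLikePagingState(String token) {
        if (token == null || token.isEmpty() || token.length() % 2 != 0) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean hex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

This only checks the shape of the token. PagingState.fromString remains the real validation and, as far as I know, throws a PagingStateException for input it cannot deserialize, so a controller would probably want to map both cases to a 400 response.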

So let’s now take a look at the complete select function in the repository:

    public ChatPage selectChat(String room, String page) {
        //Create select statement
        Select select = QueryBuilder
                .select(FIELD_ROOM, FIELD_TIMESTAMP, FIELD_USER, FIELD_MESSAGE)
                .from(TABLE_CHAT);
        select.where(eq(FIELD_ROOM, room));
        select.setFetchSize(FETCH_SIZE);

        //If we have a 'next' page set we deserialise it and add it to the select
        //statement
        if(page != null) {
            select.setPagingState(PagingState.fromString(page));
        }

        //Used to map rows to Chat domain objects
        CassandraConverter converter = cassandra.getConverter();

        //Execute the query
        ResultSet resultSet = cassandra.getSession().execute(select);

        //Get the next paging state
        PagingState newPagingState = resultSet.getExecutionInfo().getPagingState();
        //The number of rows that can be read without fetching
        int remaining = resultSet.getAvailableWithoutFetching();

        List<Chat> chats = new ArrayList<>(FETCH_SIZE);

        for (Row row : resultSet) {
            //Convert rows to chat objects
            Chat chat = converter.read(Chat.class, row);

            chats.add(chat);

            //If we can't move to the next row without fetching we break
            if (--remaining == 0) {
                break;
            }
        }

        //Serialise the next paging state
        String serializedNewPagingState = newPagingState != null
                ? newPagingState.toString()
                : null;

        //Return an object with a list of chat messages and the next paging state
        return new ChatPage(chats, serializedNewPagingState);
    }

You might notice that we use the raw Cassandra session here. The Spring CassandraTemplate currently does not seem to support paging that well.
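The manual iteration in selectChat does not depend on the Chat type, so it can be pulled out into a small generic helper. The sketch below is deliberately written against plain Java types so it runs without a cluster. PageReader and readPage are hypothetical names and not part of the driver or of Spring.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: reads at most 'available' rows and maps each one.
final class PageReader {
    private PageReader() {
    }

    static <R, T> List<T> readPage(Iterator<R> rows, int available, Function<R, T> mapper) {
        List<T> page = new ArrayList<>(Math.max(available, 0));
        // Stop after 'available' rows so the iterator is never asked for a row
        // that is not yet in memory (which would make the driver fetch again).
        for (int i = 0; i < available && rows.hasNext(); i++) {
            page.add(mapper.apply(rows.next()));
        }
        return page;
    }
}
```

With the driver, the call would look something like readPage(resultSet.iterator(), resultSet.getAvailableWithoutFetching(), row -> converter.read(Chat.class, row)). The paging state still has to be read from the ExecutionInfo separately, as in selectChat.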

Hopefully the inline comments explain the flow well enough. The main difference from simply mapping a result to a list of domain objects is that here you have to iterate over part of the ResultSet manually. It is, however, not hard to create a generic helper that does the same thing, so I expect this to be added to the CassandraTemplate quite soon. When the ChatPage object is returned from our API it looks something like this:

{
  "chats": [
    {
      "user": "Simon",
      "room": "cats",
      "message": "Message 4 cats rock!",
      "timestamp": "2016-07-24T14:57:37.090Z"
    },
    ...
  ],
  "next": "002500100000001d0008000001561d6953ec00000553696..."
}

Our service client can use the 'next' value as the paging state for the next page on the next request. This way all state is kept on the client.
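On the client side, following the next value boils down to a simple loop. The sketch below uses a hypothetical Page class that mirrors the JSON above and takes the HTTP call as a function, so ChatClientSketch, Page and readAll are all made-up names. It assumes that next is null on the last page, which is what selectChat returns when there is no further paging state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side sketch; the HTTP call is passed in as a function.
final class ChatClientSketch {
    // Mirrors the JSON response: a list of messages plus the next paging state.
    static final class Page {
        final List<String> chats;
        final String next; // null when there are no more pages

        Page(List<String> chats, String next) {
            this.chats = chats;
            this.next = next;
        }
    }

    private ChatClientSketch() {
    }

    // Calls fetchPage with the previous 'next' value until the server stops returning one.
    static List<String> readAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String next = null; // the first request carries no 'next' parameter
        do {
            Page page = fetchPage.apply(next);
            all.addAll(page.chats);
            next = page.next;
        } while (next != null);
        return all;
    }
}
```

A real mobile client would more likely request one page at a time as the user scrolls rather than loop to the end, but the token handling is the same: send back whatever next the previous response contained.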

Conclusion

When you’re accustomed to Spring’s CassandraTemplate class, paging through large result sets requires a substantial amount of extra work. It is, however, a powerful and solid feature that is easy to implement once you get over the hurdle of understanding it.