Common case when running a backend system is to require some kind of uniqueness of the records, based on their content. Wheneven having a transactional system, that is pumped with events from other systems/services, every transaction is with generated UUID, but this does not guarantee that the source is unique. There are many cases, where there is backend service that sends data, and if the call fails, the data sending is retried. There are plenty of ways to handle this case with SQL databases, but I would like to explain how we approach the problem in highly scalable system, relying on NoSQL database in AP segment of CAP Theorem - Cassandra.

The Problem

We would like to store our posts (postID, postText), but we want to guarantee that two requests for creating one and the same post does not result in 2 but just 1 record. (we assume that the ID is generated by our system, and there is no way to recognize if it is the same post, but only checking the content).

The solution

The functionality is covered by OSS library CUS

The solution will be relying on Cassandra as DB store, and the service will be implemented in Golang. Our service will implement the following steps:

Database schema

CREATE TABLE IF NOT EXISTS cus_data(
	obj_id TEXT PRIMARY KEY,
	checksum TEXT,
	data BLOB);

CREATE TABLE IF NOT EXISTS cus_checksum(
	checksum TEXT PRIMARY KEY,
	obj_id TEXT);

In cus_data we will store the obj_id the checksum and the real data - data, where as in table cus_checksum we are going to store the reverse index of checksum against obj_id. The crucial part is that we are going to use Lightweight Transactions (LWT), provided by Cassandra, Scylla and AWS Keyspaces, to make sure that we are consistent in inserting records, that are unique. Usually in SQL like database we would use serialized transaction isolation for this, but since we are operating in AP database like cassandra - we can do that only with LWT.

CQL Statements

Insert checksum and check with LWT if the operation succeded

INSERT INTO cus_checksum (checksum, obj_id) VALUES (?,?) IF NOT EXISTS

Insert the real data record

INSERT INTO cus_data (obj_id, checksum, data) VALUES (?,?,?)

Go implementation

If you follow the code, the most importat thing is how to use LWT with gocql driver. It can be seen in Create method of the library

row := make(map[string]interface{})
if applied, err = qry.MapScanCAS(row); err != nil {
  return nil, fmt.Errorf("failed no data available from LWT : %v", err)
}
if !applied {
  var ok bool
  var duplicateObjId string
  if duplicateObjId, ok = row["obj_id"].(string); !ok {
    return nil, fmt.Errorf("Failed to cast obj_id arg to string : %v", row)
  }
  return &duplicateObjId, ErrDuplicateObject
}

if err := s.session.Query(insertDataCQL, id, checksum, data).Exec(); err != nil {
  return nil, fmt.Errorf("failed to insert data : %v", err)
}

NOTE: Here is used MapScanCAS instead of MapScan or Scan method in the gocql driver. This is important if you are doing LWT

Source code

Cassandra Unique Store library

~Enjoy