logo

Patterns Page

Patterns Page
This page contains some very useful NoSQL patterns and is useful reading even if you are not using PlayOrm.  PlayOrm just makes some of the patterns easier to implement if you are using java or scala.
This project source code is available on github at https://github.com/deanhiller/playorm. If you find this page useful, please give us kudos and click the Star button(top right on GitHub).

FindAll

One of the most important queries in NoSql is findAll(List keys). If you are not finding multiple keys at a time in at least a few use cases, you are using NoSql completely wrong!!!! This method in PlayOrm and NoSql finds all the entities by primary key in parallel reading from many disks in parallel so it is extremely fast. When you do a OneToMany in PlayOrm, this is done under the covers for you. We avoid Joins ALOT in NoSql even with PlayOrm(we still leverage them in certain use-cases but not as much as you do with an RDBMS). We instead frequently use OneToMany which embeds foreign keys in the row(multiple foreign keys in one row AND different counts in each row…one row may have 2 and another row 100). When you do something like this

 

public User {
@NoSqlId(usegenerated=false)
private String id;

@OneToMany
private List<Shirt> shirts = new ArrayList();
}

//our client code then....(On access to the first Shirt variable like shirt.getColor
//PlayOrm does a findAll and gets all the shirts as it is faster that way and they all come back in
//parallely). If you do shirt.getId, no datastore access occurs since that is already in the User row.
for(Shirt shirt : user.getShirts()) {
//do something with the shirt
}

This is not always a good thing though. If you have join conditions like you want shirts that are all red, you may want to go this route. And if you want to do joins in a highly scalable fashion, you MUST use partitions or your system will NOT scale properly.

A ToMany Slightly Wider Row

This is a very common pattern if you are NOT using PlayOrm AND if you are using it. In PlayOrm, it is just a @NoSqlOneToMany annotation is all but is vastly different than JPA. Let’s say you have a User that has Emails and it is a oneToMany(it could easily be a many to many too). In this case, in noSQL , the extremely common way to represent a onetomany is to embed FK’s in the row itself like this row.

Column/Row Key Value
Row Key (guid) or “Dean”
Last Name Hiller
Phone Number 555-5555
Email.EmailFk56 null
Eamil.EmailFk98 null

NOTICE that the values of the Email.Email{foreignkey} columns are null as we don’t need them. We are simply using a composite column name which is very frequent in all noSQL solutions. We use this pattern a lot when we know reading in this row will not blow out memory. If we are worried about memory issues, then another pattern is the Wide Row Child pattern below which can grow to at the very least 1 million (max is less than 10 million and depends on context) and won’t blow out your memory. If you want to grow past 1 million, you need to rethink the design a bit and don’t want this pattern.  Edit: We have done tests and 10 million was fine and we know the limit is in the billions but still not sure the performance past 10 million columns in a row.

NoSqlEmbedded

A very common pattern in NoSql is to resolve things on read. We actually use @NoSqlEmbedded in one use case on our project. For security lookups and to make security lookups really really fast just by pk, we have a special table that is a duplication of data. It contains the secureresource name + key = row where the row contains a permission. Rather than contain a singe column for the permission which would require us to read in the row, it contains a List so if one part of the GUI writes the row with a List containing just admin and another one at the same time writes a row with a List containing writer. BOTH for the same user here!!!! What happens is the row ends up containing admin AND writer so when someone reads the row, in their entity, they have a getHighestPermissions() which returns the admin as they go through the list to see if there is an admin. This allows us to have NO reads on this security table and just keep writing/writing/writing to it.

public SecurityTable {
@NoSqlId(usegenerated=false)
private String resourceNameAndKey;
@NoSqlEmbedded
private List<String> permissions = new ArrayList<String>();

public SecurityTable(String key, String permission) {
addPermission(permission);
this.key = key;
}

public void addPermission(String p) {
this.add(p);
}
public String getHighestPermission() {
if(permissions.contains("admin"));
return "admin";
else if(permissions.contains("writer"));
return "writer";
else
return "reader";
}
}

//our client code then might run like this on two different servers...
SecurityTable t = new SecurityTable(key, "admin");
mgr.put(t);

//second server
SecurityTable t = new SecurityTable(key, "writer");
mgr.put(t);

//The row from the above code ends up with admin AND writer in the row so when someone reads it they
//figure out the highest level of permission at that time.
//This is VERY fast as we don't do any reads at all and just send data down.

Virtual Tables

A pattern we used for one client is having 30,000 tables in one CF since cassandra/hadoop/etc don’t cope so well with too many column families. This allows us to have data in what looks to be it’s own tables. In that case, we have 30,000 sensors and readings for those sensors. Virtual tables make it very easy (and keep the indexes very small as well) such that we can query into any one virtual table.

TimePartitioning

Combined with Virtual tables, this is extremely good as a pattern. Partitions are spread across the cassandra cluster unlike cassandra partitioning where all data is on one node(which is very bad in certain contexts where you need the parallel disk reads). With time partitioning, you may decide your partition key is the beginning of every month so all data for one month is in a single partition. We currently use time partitioning inside virtual tables(30,000 virt tables) as one client wants to store 10 years worth of research data and some is gathered every second. PlayOrm allows an unlimited number of partitions and it is all about just deciding the partition sizes. Your size may be one year, one hour or like us one month. It depends on your incoming data rates.

CustomerPartitioning

Customer partitioning is where you attach a partition id to every table like a customer id or account number. You can then join all the tables in those partitions related to that customer. A single partition can scale up as large as an RDBMS but your limits are reached where the RDBMS reaches it’s limits. Unlike an RDBMS however, you can now have an infinite amount of customers since there is no limit on the number of partitions you can have. Also data in a partition is still spread across the cluster so you get multiple disk reads when any customer accesses the system.

TimePeriod demarcation

Another way to go is rather than partition a table at all, and let’s say a User has emails based in time as the most important factor. You can have a User OneToMany with TimePeriod and a TimePeriod OneToMany with Emails. This pattern is useful in certain situations.

A Denormalized Duplicate Table

Let’s face it, security changes are usually not happening at a very rapid pace. It is a manual job of someone going in an administering a new user and adding him to a secure resource. Let’s say we have a Schema object that we give Users or Groups access to. A Group contains more Users. Let’s say a Schema contains tables we are giving the user access to. We can increase the write time or administration time but get a HUGE boost in security check time. The way you do this is when someone adds a user to a Schema, you create a row in a SecureKeyToTable table. So let’s say we have a Group like so
User Name Secure Key
dean verylongkeyA
bob verylongkeyBCD

And a schema with the following tables

Table Name
email
computers

Now, when someone adds that group to the Schema above giving users access to those tables, we insert these rows into a special table that is really a duplication of the xref table we may have already between the group and the schema like so

Row Key FK to Table
email:dean:verylongkeyA email
computers:dean:verylongkeyA computers
email:bob:verylongkeyBCD email
computers:bob:verylongkeyBCD computers

In this fashion, and this is VERY important, we can now do a findAll passing in 50 security checks if we need to and in parallel all the responses come back from different disks in the cluster. If we need to check if Bob has access to email table, computers table and whole slew of tables, we can know that within 10ms if not faster. Currently we have one client that has a design like this with 30,000 tables and we have to do around 262 security checks / second so it is extremely fast. The administration part is also not to slow even though it has to update 30,000 rows potentially if keys change.

A Wide Row Child

Another frequently used pattern may be a @OneToMany where the Many relationship may be between 0 and 1,000,000 or so other entities. In this case, PlayOrm offers the CursorToMany like so(and you can do this manually in any NoSql system without PlayOrm though it requires a bit of coding on your part).

 

public User {

@NoSqlId
private String id;

//This field is another row in another table....(it is actually a child wide row of the User row)...
@NoSqlOneToMany
private CursorToMany<Shirt> shirts = new CursorToManyImpl<Shirt>();

}

//our client code then ...
int count = 0;
while(user.getShirts().next()) {
Shirt shirt = user.getShirts().getCurrent();
count++;
//do some work here generating an entity called something...
NoSql.em().put(something); //put something
if(count % 50 == 0) {
//flush clear every 50 so we don't blow out memory....
NoSql.em().flush();
NoSql.em().clear();
}
}
What PlayOrm does behind the scenes is it has another table and creates a row with a primary key of the Table name + primary key of the user. This row is then a very wide row allowing millions of entries. This row is NOT read in unless you access the cursor in PlayOrm. Since we know this can be a very wide row and could blow out memory, you need to flush/clear every so often as you go through the list of items.

Scalable Equals Only Index

(You can do this one yourself or with PlayOrm which makes it easier). PlayOrm currently has an issue for this where you can add @NoSqlIndexed(byKeyOnly=true) on some code like so


@NoSqlQuery(name="byEmail", query="select u from TABLE as u where u.email = :email")
public User {
@NoSqlId
private String id;

//This field is another row in another table....(it is actually a child wide row of the User row)...
@NoSqlIndexed(byKeyOnly=true)
private String email;
}

This will create two tables. It creates the User table AND an emailToUser table as well. This pattern scales to an infinite amount of rows with no penalty where with other types of indexes you must play a partitioning game so they don’t become too large. The emailToUser table is very simple in that it is just a row key of email and a column with the users primary key. In this pattern, the JQL query CANNOT use >, <, etc. etc. ONLY the equals(=) is allowed. This is extremly fast read of the email to get the primary key and then get the User by the primary key. This is very useful when you know the email is allowed to change(a business requirement). Now PlayOrm does all the work of deleting the old emailToUser row and adding the new one when you modify the email. You can of course do this pattern yourself as well. Have fun with it.

Another Use-Case

Another perfect use-case for virtual databases is you are selling to small businesses so you basically design a table called Small Business and each business has tons of clients so you have a SmallBizClients table. Those clients then have relationships with other clients or products that the small business has so maybe a smallbizproducts table and clientrelationships tables. You are now good to go and we work on our partitioning of the indexes to create the virtual databases. Clearly we would want to partition the SmallBizClients table so we can have trillions of clients and billions of small busineses. We can then grab the index for that small business and start querying the smallbizclients table for that small business.

 

 << Back to Documentation Index