Cloud

Working with dynamic schema in Azure DocumentDB

dynamicAbout a week ago Azure DocumentDB, a new Microsoft’s entry into NoSql database market, become generally available. DocumentDB allows user to to store/update/delete arbitrary objects (as complex as needed, not limited by relation structure) and query these objects using a special (quite limited) flavor of SQL (yes, SQL!) or LINQ when it’s used from .NET. Although these kinds of databases are called NoSql, they should rather be called object databases, or non-relational, because, well, you can still use SQL to query them.
So, these objects that you can story in DocumentDB, they could be really anything. However, in order to effectively use DocumentDB .NET client library, you need to know the object schema in advance. For example, let’s say you have the following class:

public class Person
{
  public string FirstName { get; set; }
  public string LastName { get; set; }
}

And then you instantiating the object of this class and storing it to DocumentDB (assuming that you already created database, document connection and client):

var person = new Person() { FirstName = "John", LastName = "Doe" };
await client.CreateDocumentAsync(collection.SelfLink, person);

So far so good, then let’s retrieve this object back:

person = client.CreateDocumentQuery(collection.SelfLink)
                .Where(p => p.FirstName == "John")
                .AsEnumerable()
                .FirstOrDefault();

This works, but we knew that is a Person and we already have class for Person defined. But what if we don’t have this class defined? Let’s say that we have a set of JSON objects as our data source:

              var data = @"{
                        'FirstName': 'John',
                        'LastName': 'Doeth',
                        'Position': 'Developer',
                        'DOB': '10/12/1973'}";
                    var obj = JsonConvert.DeserializeObject(data);
                    await client.CreateDocumentAsync(collection.SelfLink, obj);

Now we don’t have our Person class any more, we have JSON. Well, we still can query the data… with dynamic SQL:

dynamic person = client.CreateDocumentQuery(collection.SelfLink, "SELECT * FROM root WHERE FirstName = 'John'").ToEnumerable().FirstOrDefault();

The person object, our result, in this case going to be a dynamic object. So, in order to access any of it’s properties we need to know them at compile time:

var firstName = person.FirstName;

But… what if we don’t know the properties as compile time? In our example data comes from JSON, so it could be truly anything. DocumentDB will happily store that “anything”, but how do we work with that data when we retrieve it?
Worse, there is one more issue: ordering. DocumentDB subset of SQL language doesn’t support ordering, so we can’t just write something like SELECT * FROM root WHERE FirstName = ‘John’ ORDER BY FirstName.
If we know the property name at compile time, ordering is not a big problem, we can do is with LINQ (on client side, unfortunately):

client.CreateDocumentQuery(collection.SelfLink, "SELECT * FROM root WHERE FirstName = 'John'").OrderBy(p => p.FirstName)

Well, there is a solution. It’s possible to work with dynamic object properties which we don’t know at compile time, although it’s not pretty. Basically, we need to case dynamic to DynamicObject and then use DynamicObject’s methods to access properties. That process also involves creating CallSite object (basically a runtime compiler services). We can encapsulate this not-that-pretty code into one convenient set of extensions.
Due to the limitations of our blogging platform which is choking on large amounts of code, I’m publishing the extensions to GitHub: https://github.com/starnovsky/DynamicExtensions
With these extensions, querying DocumentDB dynamic schema (i.e. when we don’t have schema defined at compile time) becomes a breeze:
Ordering will look like that:

var person = client.CreateDocumentQuery(collection.SelfLink, "SELECT * FROM root").ToEnumerable().OrderBy("FirstName").FirstOrefault();

We can access dynamic properties too:

DynamicExtensions.PropertyValue(person, "FirstName")

And we can also perform SQL LIKE queries (which DocumentDB SQL is also lacking):

client.CreateDocumentQuery(collection.SelfLink, "SELECT * FROM root").ToEnumerable().WhereContains("LastName", "th")

The code above will return all persons which contains “th” substring in LastName.
Summary:
Azure DocumentDB is a powerful NoSql database as a service and allows to store any object without knowing the object structure in advance (unlike relational database), however it’s hard to query dynamic schema (i.e. when object schema is not known at compile time) which is making DocumentDB less usable when it could be. Provided set of dynamic extensions is making querying dynamic schema objects much simpler.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Stan Tarnovskiy

Solutions Architect at Perficient

More from this Author

Subscribe to the Weekly Blog Digest:

Sign Up
Follow Us
TwitterLinkedinFacebookYoutubeInstagram