What to love and hate about Azure’s DocumentDB

Azure DocumentDB is Microsoft’s fully managed, document-oriented NoSQL database service, built to work within the Azure cloud ecosystem alongside services such as SQL Azure, Azure Storage, and Azure Search. DocumentDB is a relatively new player in the NoSQL world (it reached general availability in April 2015). It comes with an impressive list of features and has gone through several version updates. It also has its limits (the list hasn’t been updated for a while, so make sure you read the comments section). In this post we discuss some of the distinct features that the community loves about DocumentDB, as well as some work-in-progress features that it, more or less, hates.

What to love about Azure’s DocumentDB

Platform-as-a-Service (PaaS)

DocumentDB lives among a plethora of other services in Azure. That means, by design, DocumentDB is PaaS. Because it is cloud-based, you don’t have to manage any virtual machines yourself. Setup and initial/ongoing configuration are effortless and handled entirely through the Azure Portal, PowerShell, or the command-line interface (stick to the web interface if you can). You also control scaling up and down, replication, and further performance tuning by customizing the index policies and consistency levels for a particular application or scenario, which makes the service very flexible. Beyond initial setup and configuration, there is the great benefit of not having to maintain your servers, manage your backups, or constantly worry about security.

Scale-out & High Availability

DocumentDB is architected for high availability and scalability and puts great emphasis on performance. It has been used to power high-scale production services at Microsoft, such as the User Data Store behind the MSN suite of web and mobile apps. You can achieve near-infinite scaling (in terms of storage and throughput) for your DocumentDB application by horizontally partitioning your data, a concept commonly referred to as sharding. Learn more about scaling and partitioning data in DocumentDB.
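
To make the idea of sharding concrete, here is a minimal sketch of client-side hash partitioning: a partition key is hashed and mapped to one of several collections. The collection names and the hash function are my own assumptions for illustration and are not part of any DocumentDB SDK.

```javascript
// Hypothetical collection names, one per shard (assumption for illustration).
const collections = ["users-0", "users-1", "users-2", "users-3"];

// Simple deterministic string hash in the style of 32-bit FNV-1a.
function fnv1a(key) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < key.length; i++) {
        hash ^= key.charCodeAt(i);
        // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
}

// Map a partition key (for example a user ID) to exactly one collection.
function resolveCollection(partitionKey) {
    return collections[fnv1a(partitionKey) % collections.length];
}

console.log(resolveCollection("user-42"));
```

The same key always resolves to the same collection, so reads and writes for one user stay on one shard. Note that a plain modulo scheme like this reshuffles most keys when the number of collections changes.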

JSON & Native RESTful API

DocumentDB reuses existing languages, protocols, and formats. It represents documents as JSON, a format derived from JavaScript object syntax that is now a standard for data interchange. It also provides a native REST interface over HTTP; in fact, the client drivers are largely thin wrappers around it. Every request must include an authorization header. You probably wouldn’t use the REST interface directly unless you are writing a wrapper yourself or want to consume your documents from a client that only supports REST.

GET a document example:

GET https://contosomarketing.documents.azure.com/dbs/-yI8AA==/colls/-yI8AKNuyAA=/docs/V7tQANV3rAkDAAAAAAAAAA== HTTP/1.1
x-ms-date: Sat, 18 Apr 2015 07:39:23 GMT
authorization: type%3dmaster%26ver%3d1.0%26sig%3dJith8Kqorph9kuifIQZ2P%2fbQryAZhe%2bw62rML04YpoE%3d
x-ms-version: 2015-04-08
Host: contosomarketing.documents.azure.com

The response would look like the following:

HTTP/1.1 200 OK
{
    "id": "TestDocument",
    "book": "Autumn of the Patriarch",
    "_rid": "V7tQANV3rAkDAAAAAAAAAA==",
    "_ts": 1407830727,
    "_self": "dbs/V7tQAA==/colls/V7tQANV3rAk=/docs/V7tQANV3rAkDAAAAAAAAAA==/",
    "_etag": "6c006596-0000-0000-0000-53e9cac70000",
    "_attachments": "attachments/",
    "Price": 200
}
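
If you do write your own wrapper, the authorization header is the part that takes some work. The sketch below shows how I understand a master-key token to be built: an HMAC-SHA256 signature over the verb, resource type, resource ID, and request date, URL-encoded together with the token type and version. The payload layout is my recollection of the REST documentation, and the key is a placeholder, so treat both as assumptions and verify against the official reference.

```javascript
const crypto = require("crypto");

// Builds the value of the "authorization" header for a master-key request.
// Assumption: the signed payload is verb, resource type, resource ID and
// date, each followed by a newline, plus one trailing empty line.
function buildAuthToken(verb, resourceType, resourceId, date, masterKey) {
    const payload =
        verb.toLowerCase() + "\n" +
        resourceType.toLowerCase() + "\n" +
        resourceId + "\n" +
        date.toLowerCase() + "\n" +
        "\n";
    // The account key is base64 text; decode it before using it as the HMAC key.
    const key = Buffer.from(masterKey, "base64");
    const signature = crypto
        .createHmac("sha256", key)
        .update(payload, "utf8")
        .digest("base64");
    return encodeURIComponent("type=master&ver=1.0&sig=" + signature);
}

// Placeholder key for illustration only; a real key comes from the Azure Portal.
const fakeKey = Buffer.from("not-a-real-key").toString("base64");
// The same date string must also be sent in the x-ms-date header.
const requestDate = new Date().toUTCString();
const token = buildAuthToken(
    "GET", "docs", "V7tQANV3rAkDAAAAAAAAAA==", requestDate, fakeKey);
console.log(token);
```

Because the date is part of the signed payload, a token is only valid for the request whose x-ms-date header carries that same value.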

Asynchronous API & LINQ

You can avoid performance bottlenecks and improve the overall responsiveness of your application by using asynchronous programming. In .NET, this is done with the async and await keywords. DocumentDB’s .NET SDK fully supports the .NET async pattern. In fact, write operations only have async methods, while read operations provide both sync and async versions.

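// Assumes endpoint and authKey hold your account URI and key,
// and that this code runs inside an async method.
using (var client = new DocumentClient(new Uri(endpoint), authKey))
{
    // Create the database first, then a collection inside it.
    var database = new Database { Id = "AsyncDemo" };
    database = await client.CreateDatabaseAsync(database);

    var collection = new DocumentCollection { Id = "Families" };
    collection = await client.CreateDocumentCollectionAsync(database.SelfLink, collection);
}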

The DocumentDB .NET SDK also includes a LINQ (Language Integrated Query) provider that translates projection, filtering, traversal, and sorting operators into DocumentDB SQL. LINQ is the preferred programming model for many developers, and DocumentDB continues to expand its LINQ support.

// LINQ Query -- Id == "value" OR City == "value"
var query =
    from f in client.CreateDocumentQuery<Family>(collectionLink)
    where f.Id == "AndersenFamily" || f.Address.City == "NY"
    select new { Name = f.LastName, City = f.Address.City };

You can find more samples in the GitHub project.

JavaScript Integration

DocumentDB loves JavaScript and is designed from the ground up to make it a first-class citizen, so JavaScript developers can get productive quickly and start manipulating their data. DocumentDB borrows from the traditional relational style by allowing developers to register stored procedures, triggers, and user-defined functions (UDFs), all written natively in JavaScript. This approach of using JavaScript as the “modern day T-SQL” frees application developers from the complexities of type system mismatches and object-relational mapping technologies.

Define the stored procedure:

var helloWorldStoredProc = {
    id: "helloWorld",
    body: function () {
        var context = getContext();
        var response = context.getResponse();

        response.setBody("Hello, World");
    }
}

Register the stored procedure:

var createdStoredProcedure;
client.createStoredProcedureAsync(collection._self, helloWorldStoredProc)
    .then(function (response) {
        createdStoredProcedure = response.resource;
        console.log("Successfully created stored procedure");
    }, function (error) {
        console.log("Error", error);
    });

Execute the stored procedure and get the results:

client.executeStoredProcedureAsync(createdStoredProcedure._self)
    .then(function (response) {
        console.log(response.result); // "Hello, World"
    }, function (error) {
        console.log("Error", error);
    });
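
User-defined functions follow the same pattern of an id plus a JavaScript function body. The sketch below is a hypothetical UDF of my own (the name and the discount logic are made up for illustration). It only defines the function and exercises it locally; registration is left out.

```javascript
// Hypothetical UDF definition, mirroring the shape of the stored procedure
// definition above: an id plus a plain JavaScript function body.
const discountUdf = {
    id: "discountedPrice",
    body: function discountedPrice(price, percent) {
        if (typeof price !== "number" || typeof percent !== "number") {
            throw new Error("price and percent must be numbers");
        }
        return price * (1 - percent / 100);
    }
};

// The body is ordinary JavaScript, so it can be called locally before it is
// registered with the service.
console.log(discountUdf.body(200, 25)); // 150
```

Once registered, a UDF is called from a query with the udf. prefix, along the lines of SELECT udf.discountedPrice(b.Price, 25) FROM books b, where the collection alias and property name are again assumptions.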

Rich Querying Capabilities

DocumentDB, albeit a NoSQL database, supports querying documents with a traditional, SQL-like query language that includes hierarchical querying and the ability to execute JavaScript.

Basic query:

SELECT food.id,
    food.description,
    food.tags,
    food.foodGroup
FROM food
WHERE food.foodGroup = "Snacks" AND food.id = "19015"

Advanced query with a JOIN and built-in functions:

SELECT food.id, 
    food.commonName, 
    food.foodGroup, 
    ROUND(nutrient.nutritionValue) AS amount, 
    nutrient.units 
FROM food JOIN nutrient IN food.nutrients 
WHERE IS_DEFINED(food.commonName) 
    AND nutrient.description = "Water" 
    AND food.foodGroup IN ("Sausages and Luncheon Meats", "Legumes and Legume Products")
    AND food.id > "42178"
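
The JOIN in this query is not a join across collections. It pairs each food document with the elements of its own nutrients array. The sketch below emulates that behavior over in-memory objects to show which rows come out. The sample documents are made up, and Math.round only approximates the built-in ROUND function.

```javascript
// Made-up sample documents; the field names follow the query above.
const foods = [
    {
        id: "42179",
        commonName: "Hot dog",
        foodGroup: "Sausages and Luncheon Meats",
        nutrients: [
            { description: "Water", nutritionValue: 54.6, units: "g" },
            { description: "Protein", nutritionValue: 11.2, units: "g" }
        ]
    },
    {
        // No commonName, so IS_DEFINED(food.commonName) filters this one out.
        id: "10000",
        foodGroup: "Snacks",
        nutrients: [{ description: "Water", nutritionValue: 3.1, units: "g" }]
    }
];

const groups = ["Sausages and Luncheon Meats", "Legumes and Legume Products"];

// FROM food JOIN nutrient IN food.nutrients: pair every document with each
// element of its own nutrients array, then filter and project.
const rows = [];
for (const food of foods) {
    for (const nutrient of food.nutrients) {
        if (food.commonName !== undefined &&
            nutrient.description === "Water" &&
            groups.includes(food.foodGroup) &&
            food.id > "42178") { // string comparison, as in the query
            rows.push({
                id: food.id,
                commonName: food.commonName,
                foodGroup: food.foodGroup,
                amount: Math.round(nutrient.nutritionValue),
                units: nutrient.units
            });
        }
    }
}

console.log(rows);
```

With this sample data only the "Water" entry of the first document survives the filters, so the result is a single row.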

This might come as a surprise because of its resemblance to traditional SQL. DocumentDB offers an easy-to-use, somewhat familiar programming pattern. DocumentDB’s Query Playground is an interactive site where you can try out the rich querying capabilities and learn about the different types of queries that DocumentDB supports. You can read more about it in this tutorial, and you can also download the DocumentDB SQL cheat sheet.

What to hate about Azure’s DocumentDB

Cost

DocumentDB is an Azure service and it’s not free. Many in the community have complained about the pricing model ever since the product was in preview. I’d attribute that to the lack of flexibility and the different way that DocumentDB handles collections compared to other NoSQL offerings.

Azure’s DocumentDB is billed based on the number of collections contained in a database account. Each collection comes with 10GB of SSD storage. The cost of each collection depends on the performance level you set for it (S1, S2, or S3). If you are the type of developer who favors creating multiple collections to isolate your data, then you may want to reconsider that approach.
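
To see why isolating data in many collections gets expensive, here is a rough sketch of the per-collection arithmetic. The rates are placeholder units that I made up, not real Azure prices, and the tenant scenario is hypothetical.

```javascript
// Placeholder monthly rates per performance level, in arbitrary units.
// These are NOT real Azure prices; substitute the current price list.
const placeholderRates = { S1: 1, S2: 2, S3: 4 };

// Billing is per collection, so the total is the sum of each collection's
// performance-level rate, regardless of how much data the collection holds.
function estimateMonthlyCost(collections, rates) {
    return collections.reduce(function (total, collection) {
        if (!(collection.level in rates)) {
            throw new Error("Unknown performance level: " + collection.level);
        }
        return total + rates[collection.level];
    }, 0);
}

// Hypothetical scenario: 20 tenants with one S1 collection each...
const perTenant = Array.from({ length: 20 }, function (_, i) {
    return { id: "tenant-" + i, level: "S1" };
});
// ...versus a single shared S3 collection.
const shared = [{ id: "all-tenants", level: "S3" }];

console.log(estimateMonthlyCost(perTenant, placeholderRates)); // 20
console.log(estimateMonthlyCost(shared, placeholderRates));    // 4
```

Even a nearly empty collection is charged at its full rate, which is why one collection per tenant adds up quickly.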

The team is continuously reviewing the pricing model and I expect to see changes in the future.

Client SDKs

Azure’s DocumentDB offers a variety of APIs and SDKs including:

  • .NET SDK
  • Java SDK
  • Python SDK
  • NodeJS SDK
  • Client JavaScript SDK

If you’ve tried to use the Client JavaScript SDK before, then you know how painful that experience is. If you haven’t, then I suggest that you wait. The Client SDKs are still not mature, and the documentation looks unfinished. I suggest that you read my post on using the Client SDKs with Cordova, where I raised some of the pain points.

Documentation and Samples

Documentation is another area where DocumentDB is lacking. That is expected with any new product, and I reckon that it will improve in the near future. We are seeing more and more samples for the different SDKs, except for the Client SDK, which is not well supported.

Tooling

If you’re used to Database Management tools such as SQL Server Management Studio then you may be disappointed with the lack of such tooling for DocumentDB. You will be limited to what the Azure Portal has to offer. DocumentDB Studio is an open source project that aims to provide a viewer for the DocumentDB service. If you are considering migrating to DocumentDB, then you’re covered with an open source migration tool from Microsoft.

Conclusion

Azure’s DocumentDB comes with a unique set of features that help Microsoft compete closely with its rivals in the cloud infrastructure market. Although it is a relatively new product, it has undergone many updates and will continue to evolve, so I expect this list to change in the next few months.

3 Comments
  • John Macintyre [johnmac@MSFT]
    February 4, 2016

    Thanks for spending some time with DocumentDB George. As you note, we are actively looking at the pricing model and want to make sure lack of flexibility doesn’t hold back use of the service. We’d love to get more feedback on docs/samples you want to see. Also note that on the tooling front you can access DocumentDB from VS https://azure.microsoft.com/en-us/blog/exploring-azure-documentdb-in-visual-studio/ … and we plan to make more investments in the tooling space.

    • George Saadeh
      February 4, 2016

      Thank you for your input John. I didn’t include the management tools in Visual Studio so I am glad you mentioned it. I should add it to the list. I am impressed how fast the service is evolving and I will be writing more about it. As for the documentation, I am very pleased with the new documentation portal but it’s annoying to still find many outdated resources out there (back from the preview days). It’s also lacking samples for properly using the Client Side SDKs.

      • Mimi Gentz [mimig@MSFT]
        February 10, 2016

        George, what are the outdated resources that you’re referring to?
