Windows Azure Data Storage

The following is excerpted from my just-released book Windows Azure Data Storage (Wiley Press, Oct 2013). Because the format is eBook only, the content will be updated as new features are added to the Azure Data Storage world.

Business craves data.

As a developer, this is not news to you. The people running businesses have wanted it for years. They demand data about how many widgets have been ordered, how much inventory is available to be used in manufacturing, how many accounts are more than 45 days past due. More recently, the corporate appetite for data has spread way past these snacks. They want to store information about how individual consumers navigate through their website. They want to keep track of different metrics about the machines used in the manufacturing process. They have hundreds of MB of documents, spreadsheets, pictures, audio, and video files that need to be stored and managed. And the volume of data that is collected grows by an obscene amount every single day.

What businesses plan on doing with this information depends greatly on the industry, as well as the type and quality of the data. Inevitably, the data needs to be stored. Fortunately (or it would be an incredibly short book) Windows Azure has a number of different data storage technologies that are targeted at some of the most common business scenarios. Whether you have transient storage requirements or the need for a more permanent resting place for your data, Windows Azure is likely to have you covered.

Business Scenarios for Storage

A feature without a problem to solve is like a lighthouse on a sunny day—no one really notices and it’s not really helping anyone. To ensure that the features covered in this book don’t meet the same fate, the rest of this chapter maps the Windows Azure Data Storage components and functionality onto problems that you are likely already familiar with. If you haven’t faced them in your own workplace, then you probably know people or companies that have. At a minimum, your own toolkit will be enriched by knowing how you can address common problems that may come up in the future.


NoSQL

A style of data storage that has recently received a lot of attention in the development community is NoSQL. While the immediate impression, given the name, is that this style considers SQL to be anathema, this is not the case. The name actually means Not Only SQL.

To a certain extent, the easiest way to define NoSQL is to look at what it’s not, as well as the niche it tries to fill. There is no question that the amount of data stored throughout the world is vast. And the volume is increasing at an accelerating rate. Studies indicate that over the course of four years (2008-2012), the total amount of digital data has increased by 500 percent. While this is not quite exponential growth, it is very steep linear growth. What is also readily apparent is that this growth is not likely to plateau in the near future.

Now think for a moment about how you might model this structure using a relational database. For relational databases, you would need tables and columns with foreign key relationships. For instance, start with a page table that has a URL column in it. A second table containing the links from that page to other pages would also be created. Each record in the second table would contain the key to the first page and the key to the linked-to page. In the relational database world, this is commonly how many-to-many relationships are created. While feasible, querying against this structure would be time consuming, as every single link in the network would be stored in that one, single table. And to this point, the contents of the page have not yet been considered.
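The page and link tables described above can be sketched in a few lines. This is only an illustration, written in Python against an in-memory SQLite database; the table names, column names, and example URLs are all hypothetical and do not come from the book.

```python
import sqlite3

# Hypothetical schema: one row per page, one row per link between two pages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
    CREATE TABLE link (
        from_page INTEGER NOT NULL REFERENCES page(id),
        to_page   INTEGER NOT NULL REFERENCES page(id),
        PRIMARY KEY (from_page, to_page)
    );
""")

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
conn.executemany("INSERT INTO page (url) VALUES (?)", [(u,) for u in urls])
# Links: a -> b, a -> c, b -> c (ids are assigned 1, 2, 3 in insertion order).
conn.executemany("INSERT INTO link VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])


def linked_urls(connection, url):
    """Return the URLs that the given page links to (two joins per hop)."""
    rows = connection.execute(
        """
        SELECT target.url
        FROM page AS source
        JOIN link ON link.from_page = source.id
        JOIN page AS target ON target.id = link.to_page
        WHERE source.url = ?
        ORDER BY target.url
        """,
        (url,),
    )
    return [row[0] for row in rows]


print(linked_urls(conn, "http://example.com/a"))
```

Even this single hop needs two joins against the one link table, which is the cost the paragraph above is pointing at once that table holds every link in the network.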

NoSQL is designed to address these issues. To start, it is not a relational data store. Instead, there is no fixed schema and querying does not require any joins to be performed. At least, not in the traditional sense. Instead, NoSQL is a variation (depending on the implementation) of the key-value paradigm. In the Windows Azure world, different forms of NoSQL-style storage are provided through Tables and Blobs.

Big Data

Any discussion of NoSQL tends to lead into the topic of Big Data. As a concept, Big Data has been generating a lot of buzz over the last 12-18 months. Yet, like the cloud before it, people find it challenging to define Big Data specifically. Sure, they know it’s “Big,” and they know that it’s “Data,” but beyond that, there is not a high level of agreement or understanding of the purpose and process of collecting and evaluating Big Data.

Most frequently, you read about Big Data in the context of Business Intelligence (BI). The goal of BI is to provide decision makers with the important information they need to make the choices that are inevitable in any organization. In order to achieve this goal, BI needs to gain access to data from a variety of sources within an organization, rationalize the definitions (i.e., make sure that the definitions of common terms are the same across the different data sources), and present visualizations of the information to the user.

Based on the previous section, you might see why Big Data and NoSQL are frequently covered together. NoSQL supports large volumes of semi-structured data, and Big Data produces large volumes of semi-structured information. It seems like they are made for one another. Under the covers, they are. However, to go beyond Table and Blob Storage, the front for Big Data in Windows Azure is Apache Hadoop. Or, more accurately, the Azure HDInsight Services.

Relational Data

For the vast majority of developers, relational data is what immediately springs to mind when the term Data is mentioned. But since relational data has been intertwined with computers since early in the history of computer programming, this shouldn’t be surprising.

With Windows Azure, there are two areas where relational data can live. First, there are Windows Azure Virtual Machines (Azure VMs), which are easy to create and can contain almost any database that you can imagine. Second, there are Windows Azure SQL databases. How you can configure, access, and synchronize data with both of these modes is covered in detail in the book.


Messaging

Messaging, message queues, and service bus have a long and occasionally maligned history. The concepts behind messages and message queues are quite old (in technology terms) and, when used appropriately, are incredibly useful for implementing certain application patterns. In fact, many developers take advantage of the message pattern when they use seemingly non-messaging related technologies such as Windows Communication Foundation (WCF). If you look under the covers of guaranteed, in-order delivery over protocols that don’t support such functionality (cough…HTTP…cough), you will see a messaging structure being used extensively.

In Windows Azure, basic queuing functionality is offered through Queue Storage. It feels a little odd to think of a message queue as a storage medium, yet ultimately that’s what it is. An application creates a message and posts it to the appropriate queue. That message sits there (that is to say, is stored) until a second application decides to remove it from the queue. So, unlike the data in a relational database, which is stored for long periods of time, Queue Storage is much more transient. But it still fits into the category of storage.

Windows Azure Service Bus is conceptually just an extension of Queue Storage. Messages are posted to and popped from the Service Bus. However, it also provides the ability for messages to pass between different networks, through firewalls, and even across corporate boundaries. Additionally, there is no requirement to open up an endpoint on either side of the communications channel that would expose the participant to external attacks.


It should be apparent even from just these sections that the level of integration between Azure and the various tools (both for developers and administrators) is quite high. This may not seem like a big deal, but anything that can improve your productivity is important. And deep integration definitely fits into that category. Second, the features in Azure are priced to let you play with them at low or no cost. Most features have a long-enough trial period so that you can feel comfortable with the capabilities. Even after the trial, Azure bills based on usage, which means you would only be paying for what you use.

The goal of the book is to provide you with more details about the technologies introduced in this chapter. While the smallest detail of every technology is not covered, there is more than enough information for you to get started on the projects that you need to determine Azure’s viability in your environment.

Sometimes, Little Things Matter–Azure Queues, Poor Performance, Throttling and John Nagle

Sometimes it amazes me how much of a polyglot developers need to be to solve problems. Not really a polyglot, as that actually relates to learning multiple languages, but maybe a poly-tech.

Allow me to set the scenario. A client of ours is using Windows Azure Queue Storage to collect messages from a large number of different sources. Applications of varying types push messages into the queue. On the receiving side, they have a number of worker roles whose job it is to pull messages from the queue and process them. To give you a sense of the scope, there are around 50,000 messages per hour being pushed through the queues, and between 50 and 200 worker roles processing the messages on the other end.

For the most part, this system had been working fine. Messages come in, messages go out. Sun goes up, sun goes down. Clients are happy and worker roles are happy.

Then a new release was rolled out. And as part of that release, the number of messages that passed through the queues increased. By greater than a factor of two. Still, Azure prides itself on scalability and even at more than 100,000 messages per hour, there shouldn’t be any issues. Right?

Well, there were some issues as it turned out. The first manifested itself as an HTTP status 503. This occurred while attempting to retrieve a message from the queue. The status code 503 is used to indicate a service unavailable. Which seemed a little odd since not every single attempt to retrieve messages returned that status. Most requests actually succeeded.

Identifying the source of this problem required looking into the logs that are provided automatically by Azure. Well, automatically once you have turned logging on. A very detailed description of what is stored in these logs can be found here. The logs themselves can be found at http://<accountname>.blob.core.windows.net/$logs and what they showed was that the failing requests had a transaction status of ThrottlingError.

Azure Queue Throttling

A single Windows Azure Queue can process up to 2,000 transactions per second. The definition of a transaction is either a Put, a Get or a Delete operation. That last one might catch people by surprise. If you are evaluating the number of operations that you are performing, make sure to include the Delete in your count. This means that a fully processed message actually requires three transactions (because the Get is usually followed by a Delete in a successful dequeue function).
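To make the counting concrete, here is a rough back-of-the-envelope calculation, sketched in Python. The 2,000 transactions per second limit and the three transactions per message come from the text above. The worker count and polling rate in the second calculation are hypothetical figures chosen for illustration, as is the assumption that a Get against an empty queue counts as a transaction like any other Get.

```python
QUEUE_LIMIT_TPS = 2000        # per-queue limit quoted above
TRANSACTIONS_PER_MESSAGE = 3  # Put + Get + Delete


def message_tps(messages_per_hour):
    """Average transactions per second generated by fully processed messages."""
    return messages_per_hour * TRANSACTIONS_PER_MESSAGE / 3600


def polling_tps(workers, polls_per_worker_per_second):
    """Transactions per second from workers polling the queue.

    Assumption: every poll is a Get, whether or not a message comes back.
    """
    return workers * polls_per_worker_per_second


steady = message_tps(100_000)
polling = polling_tps(workers=200, polls_per_worker_per_second=12)  # hypothetical

print(f"messages alone: {steady:.1f} transactions/second")
print(f"polling alone: {polling} transactions/second, "
      f"over limit: {polling > QUEUE_LIMIT_TPS}")
```

Under these assumptions the messages themselves average well under the limit, so it is worth counting every request that touches the queue, and remembering that an hourly average hides per-second bursts.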

If you crack the 2,000 transactions per second limit, you start to get HTTP 503 status codes. The expectation is that your application will back off on processing when these 503 codes are received. Now the question of how an application backs off is an interesting one. And it’s going to depend a great deal on what your application is doing.
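One common way to back off is capped exponential delay with random jitter. The following is a minimal sketch in Python, not the approach the client used; `ThrottledError`, `call_with_backoff`, and the default delay values are all hypothetical names and numbers chosen for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 503 returned by the queue service."""


def backoff_delays(base=0.1, cap=5.0, attempts=6):
    """Yield capped exponential delays in seconds: base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(operation, base=0.1, cap=5.0, attempts=6,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying with jittered backoff while it is throttled."""
    for delay in backoff_delays(base, cap, attempts):
        try:
            return operation()
        except ThrottledError:
            # Jitter: sleep a random fraction of the delay so that many
            # workers do not all retry at the same instant.
            sleep(delay * rng())
    return operation()  # final attempt; any error propagates to the caller


# Usage: an operation that is throttled twice and then succeeds.
state = {"calls": 0}


def flaky_get_message():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ThrottledError()
    return "message body"


print(call_with_backoff(flaky_get_message, sleep=lambda seconds: None))
```

The `sleep` and `rng` parameters exist only so the retry logic can be exercised without real waiting; a worker would normally leave them at their defaults.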

From my perspective, one of the most effective ways to handle this type of throttling is to redesign how the application uses queues. Not a complete redesign, but a shift in the queues being used. The key is found in the idea that the transactions per second limit is on a single queue. So by creating more queues, you can increase the number of transactions per second that your application can handle.

How you want to split your queues up will depend on your application. While there is no ‘right’ way, I have seen a couple of different approaches. The first involves creating queues of different priorities. Messages are then pushed into a particular queue based on their relative priority.

A second way would be to create a queue for each type of message. This has the possibility of greatly increasing the number of queues. There are a number of benefits. The sender of the message does not have to be aware of the priority assigned to a message. They just submit a message to the queue with no concerns. That makes for a cleaner, simpler client. The worker is where control of the priority lies. The worker can pick and choose which queues to focus on based on whatever priority logic the application requires. This approach does presume that it’s easier to update the receiving workers than the clients, but you get the idea.
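The queue-per-message-type approach can be sketched as follows. This is an in-memory Python illustration only: `QueueSet` and the message type names are hypothetical, and real code would hold a reference to an actual Azure queue where this sketch uses a `deque`.

```python
from collections import deque


class QueueSet:
    """In-memory stand-in for a set of named queues, one per message type."""

    def __init__(self, worker_priority):
        # worker_priority lists queue names in the order the worker drains them.
        self.worker_priority = list(worker_priority)
        self.queues = {name: deque() for name in self.worker_priority}

    def put(self, message_type, body):
        # The sender only needs to know the message type, never the priority.
        self.queues[message_type].append(body)

    def get_next(self):
        # The worker owns the priority logic: first non-empty queue wins.
        for name in self.worker_priority:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None


queues = QueueSet(worker_priority=["order", "report"])
queues.put("report", "monthly totals")
queues.put("order", "order 1001")

print(queues.get_next())  # the order is served first, despite arriving second
print(queues.get_next())
print(queues.get_next())
```

Because each type has its own queue, each one gets its own transactions-per-second allowance, and changing the priority order is a worker-side change only.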


Now that the 503 messages were dealt with, we had to focus on what we perceived to be poor performance when retrieving messages from the queue. Specifically, we found (when we put a stopwatch around the GetMessage call) that it was occasionally taking over 1000 milliseconds to retrieve the message. And the median seemed to be someplace in the 400-500 millisecond range. This is an order of magnitude over the 50 milliseconds we were expecting.

The source of this particular problem was identified in conversation with a Microsoft support person. And when it was mentioned, our collective response was ‘of course’. The requests were Nagling.

Some background might be required. Unless you are a serious poly-tech.

Nagle’s Algorithm is a mechanism by which the efficiency of TCP/IP communication can be improved. The problem Nagle addresses arises when the data in the packets being sent is small. In that case, the size of the headers might actually be a very large percentage of the data being transmitted. The combined TCP and IP headers for a packet are 40 bytes in size. If the payload is 5 or 10 bytes, that is a lot of overhead.
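The overhead is easy to quantify. The short Python calculation below uses the 40 bytes of header and the 5 and 10 byte payloads mentioned above; the 1,460-byte payload is an added assumption, representing a full segment on a typical Ethernet link.

```python
HEADER_BYTES = 40  # header size quoted above


def header_overhead(payload_bytes):
    """Fraction of the bytes on the wire that are header rather than payload."""
    return HEADER_BYTES / (HEADER_BYTES + payload_bytes)


for payload in (5, 10, 1460):
    print(f"{payload:>5} byte payload: {header_overhead(payload):.1%} header")
```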

Nagle's algorithm combines these small outgoing messages into a single, larger message. The algorithm actually prescribes that as long as there is a sent packet for which the sender has received no acknowledgment from the recipient, the sender should keep combining payloads until a full packet’s worth is ready to be sent.

All of this is well and good. Until a sender using Nagle interacts with a recipient using TCP Delayed Acknowledgements. With delayed acknowledgements, the recipient may delay the ACK for up to 500ms to give itself a chance to actually include the response with the ACK packet. Again, the idea is to increase the efficiency of TCP by reducing the number of ‘suboptimal’ packets.

Now consider how these two protocols work in conjunction (actually, opposition) with one another. Let’s say Fred is sending data to Barney. At the very end of the transmission, Fred has less than a complete packet’s worth of data to send. As specified in Nagle’s Algorithm, Fred will wait until it receives an ACK from Barney before it sends the last packet of data. After all, Fred might discover more information that needs to be sent. At the same time, Barney has implemented delayed acknowledgements. So Barney waits up to 500ms before sending an ACK in case the response can be sent back along with the ACK.

Both sides of the transmission end up waiting for the other. It is only the delayed acknowledgement timeout that breaks this impasse. And the result is the potential for occasionally waiting up to 500ms for a response to a GetMessage call. Sound familiar? That’s because it was pretty much exactly the problem we were facing.

There are two solutions to this problem. The first, which is completely unrealistic, is to turn off TCP delayed acknowledgments in Azure. Yeah, right. The second is much, much easier. Disable Nagle’s Algorithm in the call to GetMessage. In Azure, Nagle is enabled by default. To turn it off, you need to use the ServicePointManager .NET class.

CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
ServicePoint queueServicePoint =
    ServicePointManager.FindServicePoint(account.QueueEndpoint);
queueServicePoint.UseNagleAlgorithm = false;
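For readers outside .NET, the same switch exists at the socket level as the TCP_NODELAY option. The following Python sketch is not part of the fix described here; it only shows the underlying setting being turned on and read back for a socket that is never connected.

```python
import socket

# Create a TCP socket and disable Nagle's Algorithm on it via TCP_NODELAY.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back; a non-zero value means small writes are sent
    # immediately instead of being held while an ACK is outstanding.
    nagle_disabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
finally:
    sock.close()

print(nagle_disabled)
```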

So there you go. In order to be able to figure out why a couple of issues arose within Azure Queue Storage, you needed to be aware of HTTP status codes, the throttling limitations of Azure, queue design, TCP and John Nagle. As I initially started with, you need to be a poly-tech. And special thanks to Keith Hassen, who discovered much of what appears in this blog post while in the crucible of an escalating production problem.

Taking the SuggestedValues rule one step further

Have you ever created a custom field on a TFS work item that you wanted to be free-form entry, but that saves the entries so the next person can select from the previous ones?

You could also write a custom control to do this. However, having a service in the background to manage this at the server is much easier, and it does not have to be installed on each client. You first need to create a Web Service that subscribes to TFS events using BisSubscribe.exe. There is plenty of information out there to show you the mechanics of this. Check out Ewald Hofman’s blog for the details on creating a web service to subscribe to TFS. It’s an old post but still useful, easy to understand and follow.

As an example, let’s assume the field we want to do this for is called Requested By. Users can select from the Developers or Business Users security groups, or enter a name that is not a member of any group in TFS at all. To solve this problem, we created a GlobalList called RequestedBy. Then we added a SuggestedValues rule to the field that included the Developers and Business Users groups, as well as the RequestedBy GlobalList.

The field definition looks like this.

<FIELD name="Requested By" refname="RequestedBy" type="String">
    <SUGGESTEDVALUES>
        <LISTITEM value="[Project]\Developers" />
        <LISTITEM value="[Project]\Business Users" />
        <GLOBALLIST name="RequestedBy" />
    </SUGGESTEDVALUES>
    <REQUIRED />
</FIELD>


If the user enters a value into the field that is not from one of the TFS groups or the GlobalList, the web service kicks in and adds the value to the GlobalList. So the next user that enters that name will find it in the list and is less likely to spell the name differently than the first person.

And here is the code in the web service that accomplishes that task.

public void AddToGlobalList(WorkItemStore workItemStore, string globalList, string value)
{
    if (!string.IsNullOrWhiteSpace(value))
    {
        var globalLists = workItemStore.ExportGlobalLists();
        var node = globalLists.SelectSingleNode(
            string.Format("//GLOBALLIST[@name='{0}']/LISTITEM[@value='{1}']", globalList, value));

        // Only add the value if it is not already in the list.
        if (node == null)
        {
            node = globalLists.SelectSingleNode(
                string.Format("//GLOBALLIST[@name='{0}']", globalList));
            if (node != null)
            {
                var valueAttr = globalLists.CreateAttribute("value");
                valueAttr.Value = value;
                var child = globalLists.CreateElement("LISTITEM");
                // Assumed completion (the original listing ends at the line above):
                child.Attributes.Append(valueAttr);
                node.AppendChild(child);
                workItemStore.ImportGlobalLists(globalLists.InnerXml);
            }
        }
    }
}

Microsoft Test Manager – Test Impact

I just had to share this: Test Impact at its finest.

Working with a client on a brand new application that has Test Impact coming through as an angel on our shoulders. The recommended tests are really our regression tests, no questions about it.

This is a great example of the importance of automated unit tests for legacy code. The time spent up front can save time and money down the road. There are different ways to get test impact working like this for any project. It will take initiative and creative thinking.

Samples of what we are getting in Test Impact:

Build Test Impact Summary: [image]

Click Code Changes and see this: [image]

Example of Compare Changes: [image]

Recommended Test Cases in Test Manager: [image]

Testa :)

Microsoft: watch a live stand-up and parking lot meeting with the TFS Agile Product team

See how Microsoft’s TFS Agile Team does its scrum stand-up and parking lot meetings. The short video is the stand-up; the long video is the parking lot meeting.

I like the use of the Agile Board. It is a nice visual that is missing from the standard stand-up meetings in most companies.

Scrum Stand Up

Another interesting video on using business value in a scrum project

Business Value in Scrum

Testa :)

IIS Express Default Settings

On occasion when I open a Web application in Visual Studio, I receive a message that is similar to the following:

[image]

So that the search bots can find the text, the pertinent portion reads “The following settings were applied to the project based on settings for the local instance of IIS Express”.

The message basically says that the settings on the Web application with respect to authentication don’t match the default settings in your local IIS Express. So Visual Studio, to make sure that the project can be deployed, changes the Web application settings. Now there are many cases where this is not desirable and the message nicely tells you how to change it back. What is hard to find out is how to change the default settings for IIS Express.

If you go through the “normal” steps, your first thought might be to check out IIS Express itself. But even if you change the settings for the Default Web Site (or any other Web Site you have defined), that’s not good enough.

Instead, you need to modify the ApplicationHost.Config file. You will find it in your My Documents directory under IISExpress/Config. In that file, there is an <authentication> section that determines whether each of the different authentication providers is enabled or disabled. If you modify this file to match your Web application’s requirements, you will no longer get that annoying dialog box popping up every time you load your Solution. Of course, you *might* have to change it for different projects, but that’s just the way it goes.

DevTeach Redux

Some of you might not be aware of it, but one of the premier development conferences is coming to Toronto in a few weeks (May 27-31). That conference would be DevTeach.

For the past 10 years, DevTeach has been bringing some of the best speakers from North America to Canada to talk about the thing that we’re most passionate about: development. You will hear topics covering a wide range of subjects, from Agile to Cloud, Mobile to Web development, SharePoint to SQL Server. If you are interested in hearing some of the most engaging and knowledgeable speakers, then DevTeach is the place to be.

In an earlier blog post, I mentioned that ObjectSharp will be out in force for the conference. Since then, we have added more speakers to the roster. Max Yermakhanov will be speaking on Hybrid Cloud and Daniel Crenna will expound on globalization in Web applications. Max is ObjectSharp’s resident IT guru. He is responsible for the fact that ObjectSharp’s infrastructure is as cloud-y as it can be. So he brings with him real-world experience related to seamlessly weaving together Azure and on-premises infrastructure.

Daniel is relatively new to ObjectSharp but not to the world of .NET. A former Microsoft MVP, he is responsible for a number of open source projects, including TweetSharp. His session on globalization in Web development will touch on the stuff that only comes up when you’ve gone through the crucible of actual implementation. And being in Canada, it comes up quite frequently.

ObjectSharp has been a sponsor and champion for DevTeach since its very early days. This year, the timing of the conference would have conflicted with our annual At the Movies event. So we put off At the Movies for a year. Because that’s how good this conference is.

So if you have been to one of our At The Movies events in the past, then I strongly suggest you look at DevTeach instead. Don’t worry…we’ll go back to doing At The Movies next year (it’s too much fun for us to stop). But until then, DevTeach is the place you should be at to hear the latest and greatest in development talks.

The New Model for Office and SharePoint Apps

As I write this blog entry, I’m flying to Atlanta to give the last of 13 seminars on the new App model that is available for Office 2013 and SharePoint 2013. I have taught this material to people all over North America, as well as in Paris. As a result, I have talked to a large number of people not only about the model, but also about their plans for it. This gives me a fairly unique perspective into how people are taking the new model, as well as how it will be adopted over the next 6-9 months.

What is “The New App Model”

In a nutshell, the new app model is conceptually similar for both Office 2013 and SharePoint 2013. The basic idea is that the application no longer needs to be installed on the client’s machine (or in the SharePoint farm). There is no assembly that is deployed onto the user’s system. Instead, a manifest, in the form of an XML file, is made accessible to the client software. This manifest file includes, among other things, a URL where the application lives. And that application interacts with your client software (whether it be Office or SharePoint) through a combination of JavaScript and server-side code. That’s right, the App Model allows you to create Web sites that are hosted any place you want, but appear to run inside of Office/SharePoint. And, by the way, the Web site can be constructed using any technology you want. There is no requirement that the site use the Microsoft stack. If you’d rather create Web applications using LAMP, that’s all fine and good in this model.

What’s the Benefit?

Well, for the Apps for Office model, the benefit is that you don’t have to wrestle with VSTO or MSIs to be able to deploy your applications. There are (more or less) no administrative permissions required to install an application. And there is now an Office Store where users can search for and install your application. So your ability to reach more potential clients is much higher.

For the Apps for SharePoint model, there is no need for sandbox solutions. This is not to say that you can’t still write sandbox (or farm) solutions. You can. And they still have all of the same limitations that those applications had in SharePoint 2010. But the guidance is that they should no longer be needed. The client side object model (CSOM) has been expanded to the point where farm solutions are probably not required. And if you are working in a shared hosting environment (that’s everyone in SharePoint Online, as well as a number of clients of ours), then you can be freed from the limitations of the sandbox.

And What are the Problems?

The biggest problem is that, because the model is completely new, there is no compatibility with older versions of the products. This model will not work with Office 2010 or SharePoint 2010. At all. No way, no how. If you understand the details of what’s going on, you’ll understand why this limitation exists. But the practical impact is that your only audience for any app you write and want to sell is new users. In the corporate world, this could be a few years off. For SharePoint Online, it’s a little closer, as the back-end functionality is in the process of being converted, with the user interface to be upgraded over the next 12-18 months.

Along with the need to have users on the latest version, the capability of the interface with the software seems to be a little lacking in certain areas. I found this to be particularly true in the Apps for Office model. A number of people had interesting ideas for Word or Excel applications and their first choice for a user experience ran aground on the shoals of missing capabilities. For instance, there is no way to retrieve or modify the format for a particular cell. Nor is there the ability to have the app set the currently selected cell. Is this a critical lack of functionality? Possibly. But I also know a number of people who are on the development team and they are eager to address holes in the functionality, especially if there is a compelling story around the request.

Is It Worth Using?

I think the quick answer is ‘yes’. Now it could be that I’m biased…I have been teaching this material for a while. But I like to think that talking to people about the model, hearing what they want to do and working through how it might be done has given me perspective. And I don’t have a history of liking a technology just because I teach it.

Again, dividing between Office and SharePoint, I believe that the app model for SharePoint will be transformative. In particular, if you have a Web-based application that has nothing whatsoever to do with SharePoint, it is simple to integrate the application with SharePoint. And put it into the SharePoint Store, increasing its visibility. The model also requires that people who create SharePoint applications rethink their approach. Instead of being forced to utilize SharePoint as a data store (a task for which it is not particularly well suited), you can use a real database. Yea!!!!

The app model for Office is a good one in cases where it fits. At the moment, that seems to be helper applications. Dictionaries, encyclopedias, image searching. Maybe an application that can perform calculations based on the data in the document. But at the moment, there do seem to be some pieces of functionality that I’d like to see put in place. And the model is so different from how users typically use Word/Excel that I can see it taking a little bit of time to see mass acceptance.

If you have any experience with the app model, either with Office or SharePoint, I’d like to hear how it went. What type of applications have you created? Was there missing functionality that you had to work around? I’m done with the teaching tour, but I’d still like to keep in touch with how people use the model.

ObjectSharp at DevTeach

If you are an aficionado of conferences, then odds are pretty good that you are already aware that DevTeach is coming to Toronto at the end of May (May 27-31, to be precise). If you have not attended, heard of or thought about DevTeach, then you’re in for a treat.

DevTeach is a conference. For developers. By developers. If you want to learn about the latest technology, then DevTeach is the place to be. This is true whether you are interested in developing apps, using and administering SQL Server, or working with the latest mobile technology. You will hear from industry experts, people from not only all over North America, but also locals you can chat with afterwards. And when it comes to networking, there are few conferences that offer the opportunity to hang with as many of the best and brightest.

At ObjectSharp, we are proud to be a supporter of DevTeach. And we are lucky enough to have a list of associates who are knowledgeable enough to be able (and generous enough to be willing) to share their insights and experience with others. The following is a list of the sessions that are led by one of our own. If this list isn’t enough to entice you, then check out the full schedule here. Or you can just trust me and sign up here. Take advantage of the fact that all of this talent is within your reach to hear from and talk to.

Colin Bowern

Designing with ASP.NET MVC and Web API – Tues, May 28

The State of (Corporate) HTML5 – Wed, May 29

Managing a Cross-Platform Code Base – Wed, May 29

Handing Identity Management for SaaS Apps – Thurs, May 30

Bruce Johnson

var WebDeveloper = new OfficeAndSharePointAppDev; – Wed, May 29

Using Hybrid Solutions in Windows Azure – Thurs, May 30

Advanced Windows Phone 8 (full day, pre-conference session) – Mon, May 27

Atley Hunter

Building Mobile Experiences that Don't Suck – Wed, May 29

HackTeach – Wed, May 29

MTM Test Suite–add requirement

One option in Test Manager for a Test Suite is “Add Requirement”, which adds the Requirement ID and its title as a test suite. Example below:




What happens if at a later date someone goes in and changes the title of the PBI?

First, the change to the title in the work item does not show in your Test Suite. What has been added to Test Manager is an object on its own, not a link to the actual work item. Think of it as a folder for tests related to your requirement.

What can I do?

There is the delete/add option; however, you will lose all test results associated with your test suite. When you delete a suite, all test points contained within the suite are deleted. I would only use this when test execution has not yet happened for the test suite.

The rename option on a test suite can be used. Select test suite > right-click > Rename. In this option I would copy from the actual work item so the titles match.

How do you know there has been a change?

Often you don’t, unless someone tells you, you find out by accident, or you create a query to compare against. I’d like an alert that tells me when a Requirement’s title or Iteration Path has changed. Both of these can affect the Test Plans and their Suites.

This happens in both MTM2010 and MTM2012.

Testa :)