Deserializing Objects

Another post about the deserialization of objects after a web service call.  In this particular scenario, the web service method was returning an ArrayList.  On the client side, I was expecting to see a number of different types of objects.  But one of the objects, a custom class, was being returned as an array instead of an instance of the expected class.  And inside that array was an XmlAttribute and two XmlElement objects.  These were, it turned out, the XML nodes that were being returned from the web service method.  It's just that they weren't being converted into the desired object.

After a couple of hours of digging, I found out the cause. Many of you are probably aware of the XmlInclude attribute that is used to decorate web methods to indicate that the schema for a particular class should be included in the WSDL for a web service. That is not, strictly speaking, its only purpose. To be precise, it is used by the XmlSerializer to identify the types that are recognized during the serialization and deserialization process. It's this second piece that was the solution to my problem.

Because the web service being developed is intended to be used in an intranet environment, the same assembly was available on both sides of the fence. To support this, the generated proxy class was modified to reference the shared assembly instead of using the class that is automatically included by WSDL.EXE. It also meant that we didn't want to perform an Update Web Reference when the web service was modified, as it would have meant making the manual changes to the proxy class again (Rich Turner might not remember me from the MVP Summit, but I was the one who was most excited when he described the ability to customize the proxy class generation being available come Whidbey). The downside of this manual process is that, when objects of this type were added to the ArrayList that was returned, the proxy was not updated with the appropriate XmlInclude attribute.

That's right. The proxy class needs the same XmlInclude attribute as the web method. Normally WSDL.EXE generates the class with the appropriate attributes. But if you're like me and maintain the proxy by hand, then it's something you need to be aware of the next time an array of XmlElements appears unexpectedly on the client side.
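Here is a minimal sketch of the XmlSerializer side of this, outside of any web service. The Customer and Payload classes are hypothetical stand-ins (the post doesn't show the real ones). The point is only that XmlInclude is what lets the serializer recognize a custom type hiding behind an ArrayList of plain objects, and that both the serializing and deserializing side need it.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.Serialization;

// Hypothetical custom class standing in for the type from the shared assembly.
public class Customer
{
    public string Name;
}

// Hypothetical container. XmlInclude tells the XmlSerializer that Customer
// instances may show up where the declared type is only object.
[XmlInclude(typeof(Customer))]
public class Payload
{
    public ArrayList Items = new ArrayList();
}

public static class XmlIncludeDemo
{
    // Serializes the payload to XML and reads it back with the same serializer.
    public static Payload RoundTrip(Payload payload)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Payload));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, payload);

        using (StringReader reader = new StringReader(writer.ToString()))
        {
            return (Payload)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Payload payload = new Payload();
        payload.Items.Add(new Customer { Name = "Contoso" });

        Payload copy = RoundTrip(payload);
        Console.WriteLine(copy.Items[0].GetType().FullName);
    }
}
```

In a hand-edited proxy, the equivalent step would be adding the same attribute to the proxy class that WSDL.EXE would otherwise have decorated for you.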

Moving DataSets across Web Service Boundaries

A situation last week had me looking deep into the bowels of how DataSets are marshaled across web service methods.  The situation is as follows.

For programming ease, the typed DataSet that was automatically generated was modified to include a custom object as one of the column types.  If you've ever taken a close look at the code that gets generated when a DataSet is created from an XSD diagram, you'll realize that this is not a difficult thing to accomplish.  And, for the most part, there is little reason to modify the generated code. But that being said, there are times when doing so can make the life of the DataSet user a little easier.  And this isn't really the point of the blog, only the motivation for digging into DataSet marshaling.

The modified DataSet was being returned to the client by a web service method. When the hacked column on the returned object was accessed on the client side, a StrongTypingException was thrown. Another look at the DataSet generated code showed that the cause of this exception was a casting error in the property. Further examination showed that the name of the class associated with the column was being returned. As a string. Naturally when the literal “This.Namespace.Class” is converted to an object of type This.Namespace.Class, casting exceptions are the result.

This behavior raised the question “why wasn't the object being XML serialized with the rest of the DataSet?” After all, though I haven't mentioned it yet, This.Namespace.Class is completely serializable. In particular, why was the name of the class being returned? That didn't make any sense.

At this point, I broke out .NET Reflector and took a look at the details of the DataSet. First of all, if a DataSet is returned across a web service boundary, it is not XML serialized in the same manner as other objects. Instead of converting the contents of the DataSet to XML, a diffgram of the DataSet is generated. More accurately, the WriteXml method is called with an XmlWriteMode of DiffGram. Within the DataSet class, there is a method called GenerateDiffgram. This method walks through each of the tables in the DataSet and, in turn, through the rows of each DataTable. When it gets to an individual column, it loads up a DataStorage object. More accurately, since DataStorage is an abstract class, it instantiates a DataStorage derived class, using the column type to pick the appropriate class. A veritable DataStorage factory. When the column type is one of the non-intrinsic types, the ObjectStorage class is used. On the ObjectStorage class, the ObjectToXml method is called. This is where the unexpected happens. At least, it was unexpected for me. The ObjectToXml method does not XML serialize the object!

In the case of the ObjectStorage class, what actually happens is that a check is performed to see if the object is a byte array. If it is, a base64 encoding of the array is returned. Otherwise the ToString() method on the object is called. When it comes to re-hydration on the client side, a similar process occurs. The difference is that the XmlToObject method in the ObjectStorage class instantiates the desired object by passing the string representation of the object (as emitted by the ObjectToXml method) into the constructor.

So what is the point of all this? First, it explains why the name of the class was appearing in the generated XML. Unless overridden, the output of ToString() for an arbitrary class is the name of the class. It also explains why no object was being created on the client side, as the class I was working with didn't have the appropriate constructor. My solution, which I freely admit is a bit of a hack, is to give the diffgram generation process what it's looking for. I overrode ToString() to return an XML document containing the values of the class, a poor man's version of the WriteXml method that is part of the IXmlSerializable interface. I also created a constructor that took a string as a parameter and repopulated the properties of the class (the ReadXml portion of the process). Problem solved. But still, I have to wonder why ToString was used and not the IXmlSerializable interface methods in the first place. Here's hoping someone more knowledgeable than me will provide some insight.
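To make the hack concrete, here is a rough sketch of the shape it takes. The Money class, its fields and the XML layout are all made up for illustration; the real class isn't shown here. The only parts taken from the description above are the two hooks: a ToString() override that emits the state as XML, and a constructor that accepts that same string.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical column type used only to illustrate the pattern.
public class Money
{
    public decimal Amount;
    public string Currency;

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // The "ReadXml" half: rebuild the object from the string that
    // ToString() produced on the other side of the wire.
    public Money(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        XmlElement root = document.DocumentElement;
        Amount = decimal.Parse(root.GetAttribute("amount"), CultureInfo.InvariantCulture);
        Currency = root.GetAttribute("currency");
    }

    // The "WriteXml" half: emit the state as a small XML document
    // instead of the default class name.
    public override string ToString()
    {
        XmlDocument document = new XmlDocument();
        XmlElement root = document.CreateElement("Money");
        root.SetAttribute("amount", Amount.ToString(CultureInfo.InvariantCulture));
        root.SetAttribute("currency", Currency);
        document.AppendChild(root);
        return document.OuterXml;
    }
}

public static class MoneyDemo
{
    public static void Main()
    {
        Money original = new Money(12.5m, "CAD");
        Money copy = new Money(original.ToString());
        Console.WriteLine(copy.ToString());
    }
}
```

The invariant culture is used so that the string parses the same way regardless of the regional settings on the client and server.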

Random ASP.NET AppDomain Restarts

Are you having a problem with apparently random application restarts in your ASP.NET application? Does your session information mystically disappear with no discernible pattern? Well, the following may help you out.

One of the things that ASP.NET does to help make it easier for developers to modify running web sites is to keep an eye on files that are part of the virtual directory.  If you drop a new version of a DLL into the bin directory, it takes effect from the next request on.  If you make a change to an ASPX file, it too is detected and becomes 'live' with any subsequent request.  No question that this is useful functionality.

A little known fact about this process is that ASP.NET is also aware of the amount of memory that maintaining two versions of the same assembly takes. If changes were allowed to be made without a restart, eventually it could become detrimental to the overall performance of ASP.NET. To combat this, ASP.NET also tracks the number of files that are changed and, after a certain number, performs an application restart. As with any other restart, this causes the current session information to be lost. By default, the number of changes before a restart is 15. To modify this, you can change the numRecompilesBeforeAppRestart attribute in the machine.config file.
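For reference, the attribute lives on the compilation element under system.web. The fragment below is only a sketch of where it goes; the value of 50 is an arbitrary illustration, not a recommendation.

```xml
<configuration>
  <system.web>
    <!-- Default is 15; 50 is an arbitrary example value. -->
    <compilation numRecompilesBeforeAppRestart="50" />
  </system.web>
</configuration>
```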

While all of this is useful information, there is a small twist that can make our life more difficult. The files that ASP.NET is watching while looking for a change are not limited to those with an .aspx extension. In fact, it is any file in the virtual directory. Also, the name of the setting implies that recompiles are what is being counted. They're not. What is being counted is the number of files that change. Or, more accurately, the number of changes to any file in the virtual directory.

How can this cause problems? What if you were to put a log file into the virtual directory of a web site? What if that log file were opened and kept open while the information was written to it? When that log file fills up, it is closed and a new one opened. Of course, this close counts as a file change. That would mean that after 15 log files are created (assuming no other changes to the web site), the virtual directory would automatically reset. And since the amount of information written to the log file will be different depending on which pages/web methods are called, the frequency of the reset is dependent on factors that are not immediately apparent. Imagine the joy of trying to discover why ASP.NET applications are being randomly reset without the benefit of this blog.

The moral of this story:  Don't put any non-static files into an ASP.NET virtual directory. My good deed for the day is done. 

Moving Data Across Service Boundaries

Yet another question that has come across my inbox more than once in the past week. Which is a good enough reason to become the topic of a blog. The question:

"Can a SqlDataReader be returned by a web service method?"

The short answer:  No

If you attempt to do so, you will get an InvalidOperationException indicating that the SqlDataReader could not be serialized because it doesn't have a public constructor.  And while that is the technical result, the true reason is slightly different.

The SqlDataReader class is stream based.  That is to say that the connection to the database that feeds the SqlDataReader is kept open and busy for as long as the reader is working with data. If an instance of SqlDataReader were to be returned from a web service method, the client would attempt to do a Read.  Which would use the connection to retrieve data from a database that is no longer around.  And certainly not where the connection expects it to be.  Not a situation that's amenable to success.
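One common way around this (my suggestion, not something covered above) is to drain the reader into a disconnected container on the server and return that instead. The sketch below assumes a DataSet is an acceptable return type. The ToDataSet helper is a hypothetical name, and a DataTableReader stands in for the SqlDataReader so the example runs without a database.

```csharp
using System;
using System.Data;

public static class ReaderSnapshot
{
    // Copies everything the reader can deliver into a disconnected DataSet,
    // which can then travel across the service boundary on its own.
    public static DataSet ToDataSet(IDataReader reader, string tableName)
    {
        DataSet result = new DataSet();
        DataTable table = result.Tables.Add(tableName);
        table.Load(reader);
        return result;
    }

    public static void Main()
    {
        // Stand-in data source; in a real service this would be the
        // SqlDataReader returned by SqlCommand.ExecuteReader().
        DataTable source = new DataTable("Customers");
        source.Columns.Add("Id", typeof(int));
        source.Columns.Add("Name", typeof(string));
        source.Rows.Add(1, "Contoso");
        source.Rows.Add(2, "Fabrikam");

        using (IDataReader reader = source.CreateDataReader())
        {
            DataSet snapshot = ToDataSet(reader, "Customers");
            Console.WriteLine(snapshot.Tables["Customers"].Rows.Count);
        }
    }
}
```

Because the copy happens before the method returns, the connection can be closed on the server and the client never needs it.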

When is a cache not really a cache

I spent a large portion of the day inside of the Caching Application Block.  Specifically, a colleague and I were tracking down what appeared to be a nasty threading bug caused by having the creation of two different cached objects that were related to one another.  As it turned out, the bug that existed couldn't explain away all of the behaviors that we observed.

As it turns out, there was what appears to be a poorly documented aspect of the Caching Application Block that was causing us grief. The problem is that the default number of objects that the cache can store before scavenging begins is set very low. Specifically, it is set to 5. As well, there is a UtilizationForScavenging setting (by default, it is 80) that lowers this number even more. Once the cache contains (maximum * utilization / 100) items, the scavenger starts to remove the excess items from the cache, trying to keep the number of items below the calculated value. With the default values, the calculated value is 5 * 80 / 100 = 4, which means that no more than 3 elements will be saved in the cache at a time. The scavenging class uses a least recently used algorithm; however, if you're using an absolute time expiration, there is no 'last used' information saved. So the scavenger appears to remove the last added item.

That's right. Unless you make some changes to the default config, only three elements are kept in the cache. Probably not the performance enhancer that you were looking for from a cache. Fortunately, the values can easily be changed. They can be found in the ScavengingInfo tag in app.config. And there is no reason not to set the maximum value much higher, as there are no allocations performed until actual items are cached. It was just the initial surprise (and subsequent fallout) that caused me to, once again, question how closely related the parents of the designers were. But only for a moment. ;)

As one further word of warning, if there is no ScavengingInfo tag in the config file, then the default class (the same LruScavenging class just described) is used. And instead of getting the maximum cache information from the config file, a file called CacheManagerText.resx is used. In that file, the entries called RES_MaxCacheStorageSize and RES_CacheUtilizationToScavenge are used to determine how many items to keep in the cache. Out of the box, these values are set to the same 5 and 80 that the config file contains.
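The arithmetic behind the three-item limit is simple enough to sketch. This is not the block's actual code, just the calculation as described above, with hypothetical method names and an assumption that integer division is used.

```csharp
using System;

public static class ScavengingMath
{
    // Item count at which the scavenger kicks in, per the formula
    // (maximum * utilization / 100). Integer division is an assumption.
    public static int Threshold(int maximumSize, int utilizationForScavenging)
    {
        return maximumSize * utilizationForScavenging / 100;
    }

    // The scavenger is described as keeping the count below the threshold,
    // so the number of items that survive is one less than that.
    public static int ItemsRetained(int maximumSize, int utilizationForScavenging)
    {
        return Math.Max(Threshold(maximumSize, utilizationForScavenging) - 1, 0);
    }

    public static void Main()
    {
        // Out-of-the-box values: maximum 5, utilization 80.
        Console.WriteLine(Threshold(5, 80));
        Console.WriteLine(ItemsRetained(5, 80));

        // A more generous maximum of 1000 with the same utilization.
        Console.WriteLine(ItemsRetained(1000, 80));
    }
}
```

With the defaults, Threshold comes out to 4 and ItemsRetained to 3, which matches the behavior we saw.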

Solving the "No such interface is supported" problem

I was asked a question today about a fairly common error that occurs when serviced components are used in conjunction with ASP.NET. Specifically, a COM+ component (one that is derived from ServicedComponent) was being used on an ASP.NET page.  When the page was loaded, an error of "No such interface is supported" was raised.

To understand the why of this error requires a little bit of knowledge about COM+. When a serviced component is first instantiated, the CLR checks the COM+ catalog for information about the runtime requirements of the class. But if the class had not previously been registered, then no information will be available. To correct this discrepancy, the CLR automatically creates a type library for the class and uses that information to populate the COM+ catalog. This mechanism is called lazy registration.

But wait.  It requires a privileged account in order to register a component in the COM+ catalog.  In particular, you need to be a machine admin.  And, unless you have been silly enough to grant the ASP.NET user admin rights to your machine, the update of the COM+ catalog fails.  No catalog information, no instantiation.  An unsuccessful instantiation means an exception.  An exception that includes an error message of "No such interface is supported". Go figure.

So ultimately, the solution is not to depend upon lazy registration of COM+ components that are deployed for use in ASP.NET.  Instead, perform the registration using the regsvcs.exe command manually.

Automatic mitigation for ASP.NET vulnerability

By now, most of you will have heard about the ASP.NET vulnerability that allows creatively formed URLs to bypass forms or Windows-based authentication. And while there has been a piece of code that can be added to global.asax, Microsoft has released a more easily deployed mechanism for mitigating the security risk. Check out http://www.microsoft.com/security/incident/aspnet.mspx to download an msi file that installs an HTTP Module that protects all of the sites on a web server.

WS-Security vs. SSL

A blog post by Doug Reilly brought to mind discussions that I had with some service architects earlier in the year.  The question was whether it was better to use SSL or WS-Security to secure SOAP messages as they travel from the client to the server and back again.  While using SSL is certainly the easier of the two choices, there are a number of reasons why WS-Security is generally superior.

SSL Provides In-Transit Security Only

The basic mechanism behind SSL is that the client encrypts all of the requests based on a key retrieved from a third party.  When the request is received at the destination, it is decrypted and presented to the service.  This is a well understood process.  However, when you look a little deeper, you'll begin to realize that the request is only encrypted while it is travelling between the client and the server.  Once it hits the server, it is decrypted from that moment on.

To be completely accurate, it might not even need to hit the server to be decrypted. If, for example, you have a proxy server in front of your web server, it is possible that the decryption certificate has been installed there. That way the server can examine the message to determine the correct routing. However, the message may not be re-encrypted before it is sent to the web server that will actually handle the request. So now that 'secure' request is travelling along a network in clear text. Granted, the network that it travels along is quite likely the internal one for the company hosting the server. Still, there is the possibility that sensitive data can be picked up.

Further, what if the web service logs all of the incoming requests into a database? Now not only does the request travel unencrypted across the wire, but it is also stored in a format for all to see.

WS-Security alleviates this problem by maintaining its encryption right up to the point where the request is being processed. Also, if the request is logged, the logged version will quite likely be encrypted (the logging portion of the service *could* log the message in unencrypted form, but it would have to do so explicitly).

Targeted Security

If SSL is used to encrypt a web service request, it's an all or nothing proposition. SSL secures the entire message, whether all of it is sensitive or not. WS-Security allows you to secure only that part (or parts) of the message that needs to be secured. Given that encryption/decryption is not a cheap operation, this can be a performance boost.

It is also possible with WS-Security to secure different parts of the message using different keys or even different algorithms. This allows separate parts of the message to be read by different people without exposing other, unneeded information.

Faster Routing

Although not part of the mainstream yet, look for intelligent load balancing based on the content of incoming requests in the near future.  When this does happen, wouldn't it be better not to have the router decrypt the request before determining where it should go?

So given all of this, why would you need to use SSL? Because there are still a lot of people for whom SSL is the ultimate in security over the web. Without the comfort of https, some companies feel that their information is being sent naked into the wild. Not true, but it's not always appropriate to get into screaming matches with clients. ;) Sigh. I guess more educating is in order.

Update:  My colleague, John Lam, pointed out that my comment about the key to SSL encryption being retrieved from a third party was inaccurate.  In actuality, the SSL mechanism involves the following steps (taken from here)

  1. A browser requests a secure page (usually https://).

  2. The web server sends its public key with its certificate.

  3. The browser checks that the certificate was issued by a trusted party (usually a trusted root CA), that the certificate is still valid and that the certificate is related to the site contacted.

  4. The browser then uses the public key, to encrypt a random symmetric encryption key and sends it to the server with the encrypted URL required as well as other encrypted http data.

  5. The web server decrypts the symmetric encryption key using its private key and uses the symmetric key to decrypt the URL and http data.

  6. The web server sends back the requested html document and http data encrypted with the symmetric key.

  7. The browser decrypts the http data and html document using the symmetric key and displays the information

You will notice that, while there is a third party involved in validating the certificate, the key does not come from the third party but from the server.
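Steps 4 and 5 are the heart of it, so here is a toy sketch of just that key exchange using the .NET cryptography classes. This is emphatically not a real SSL implementation: there is no certificate, no validation (step 3 is skipped entirely) and no network, and the class and method names are my own. It only illustrates that the symmetric key is generated by the client and protected with the server's public key.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HandshakeSketch
{
    // Simulates both ends of the exchange in one method and returns the
    // text as recovered by the "server".
    public static string RoundTrip(string message)
    {
        using (RSA serverKeys = RSA.Create())    // server's private/public pair
        using (RSA clientView = RSA.Create())    // will hold only the public half
        using (Aes clientCipher = Aes.Create())  // random symmetric key and IV
        using (Aes serverCipher = Aes.Create())
        {
            // Step 2: the server sends its public key to the client.
            clientView.ImportParameters(serverKeys.ExportParameters(false));

            // Step 4: the client encrypts its random symmetric key with the
            // server's public key, and the payload with the symmetric key.
            byte[] wrappedKey = clientView.Encrypt(clientCipher.Key, RSAEncryptionPadding.OaepSHA1);
            byte[] plaintext = Encoding.UTF8.GetBytes(message);
            byte[] ciphertext;
            using (ICryptoTransform encryptor = clientCipher.CreateEncryptor())
            {
                ciphertext = encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length);
            }

            // Step 5: the server recovers the symmetric key with its private
            // key and uses it to decrypt the payload. The IV is not secret,
            // so it is simply copied across here.
            serverCipher.Key = serverKeys.Decrypt(wrappedKey, RSAEncryptionPadding.OaepSHA1);
            serverCipher.IV = clientCipher.IV;
            using (ICryptoTransform decryptor = serverCipher.CreateDecryptor())
            {
                byte[] recovered = decryptor.TransformFinalBlock(ciphertext, 0, ciphertext.Length);
                return Encoding.UTF8.GetString(recovered);
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("GET /secure/page.aspx"));
    }
}
```

The expensive public-key operation only ever touches the short symmetric key; the bulk of the traffic goes through the much cheaper symmetric cipher.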

John also mentioned that the availability of SSL Accelerators makes the performance argument moot. I don't agree with this. While SSL Accelerators certainly increase the throughput of secured sites, we need to compare apples to apples. There are also XML Accelerators available in hardware to decrypt incoming requests. Certainly using hardware makes it easier to justify staying with just SSL, but all you're really doing is pushing the bottleneck further out. Ultimately, because encryption is a computationally expensive operation, the less that gets encrypted the greater the overall throughput.

Finally, there is one further reason to choose WS-Security over SSL that I forgot to mention.  SSL is closely tied to HTTP.  Which is to say that SSL can't be used if the mechanism for transporting service requests is something other than HTTP.  At the moment, this isn't the case for the vast majority of requests.  But there are already SOA examples using UDP and SMTP as the transport.  WS-Security works independently of the underlying protocol, making it much easier to adapt to whatever the future requires.

Access Denied on Web Service Calls

This question has been asked of me enough that I feel it's worth a blog. It's not that this solution is unique, but I'm hoping that Google will do its thing with respect to getting the word out to people who need it.

First of all, the symptom we're addressing is an HTTP status code of 401 (Access denied) that is returned when making a call to a web service method through a proxy class. The solution is quite simple.  Actually, to be more precise, there are two solutions, with the best one depending (of course) on your goals. First, the virtual directory in which the web service is running can be modified to allow anonymous access.  Alternatively, the credentials associated with the current user can be attached to the web service call by including the following statement prior to the method call.

ws.Credentials = System.Net.CredentialCache.DefaultCredentials;

Now that the solution has been addressed, let's take a brief look at why the problem exists in the first place. When a request is made from a browser to a web server, the server may require some form of authentication. It is possible that the initial request can include the authentication information, but if the server doesn't see it, then an HTTP 401 status code is returned. Included in the response is a WWW-Authenticate header which indicates the type of authentication that is expected and the name of the realm being accessed. The browser then reissues the request, providing an Authorization header including your current credentials. If that request fails (returns another 401 status), the browser prompts you for a set of credentials. This is a sequence of events that all of you have seen before, even if the underlying rationale is new information.

However, when you make a web method call through the proxy class, the handshaking that goes on with respect to authentication doesn't take place. It's the browser that does this magic and the proxy class doesn't include that code. The result is that the client application receives the access denied status. So you need to configure the virtual directory to not reject unauthenticated requests (by turning anonymous access on) or provide your own set of credentials with the call (by populating the Credentials property on the proxy class).
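If the current user's credentials aren't the ones you want to send, the Credentials property accepts anything that implements ICredentials. A minimal sketch follows, assuming the service expects NTLM. The helper name, the URL and the account details are all hypothetical.

```csharp
using System;
using System.Net;

public static class ServiceCredentials
{
    // Builds a credential cache scoped to one service URL and one
    // authentication scheme ("NTLM" here is an assumption about the server).
    public static CredentialCache ForService(Uri serviceUri, string user, string password, string domain)
    {
        CredentialCache cache = new CredentialCache();
        cache.Add(serviceUri, "NTLM", new NetworkCredential(user, password, domain));
        return cache;
    }

    public static void Main()
    {
        Uri serviceUri = new Uri("http://intranet.example/Orders.asmx");
        CredentialCache cache = ForService(serviceUri, "svcAccount", "not-a-real-password", "CORP");

        // The cache only hands the credential to the matching URL and scheme.
        NetworkCredential match = cache.GetCredential(serviceUri, "NTLM");
        Console.WriteLine(match.UserName);
    }
}
```

The result is assigned the same way as DefaultCredentials, for example ws.Credentials = ServiceCredentials.ForService(new Uri(ws.Url), ...), where ws is your own proxy instance.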

The Danger of using

It is well known that the C# using statement is quite handy, especially when the object included in the using statement needs to be disposed. However, there is a danger associated with having certain tasks done for you: it is easy to forget exactly what is going on.

The scenario is that an HttpModule was being created by a client of ours to log incoming requests and outgoing responses to a web service. I've done this before and it is not exceptionally difficult. We created a handler for the BeginRequest event and accessed the Request.InputStream property. This stream was used as the basis for a StreamReader, as can be seen in the following code.

HttpApplication app = (HttpApplication)sender;

using (StreamReader sr = new StreamReader(app.Request.InputStream))
{
    string content = sr.ReadToEnd();
    // Write content to a log file
}

app.Request.InputStream.Position = 0;

Seems innocuous enough. But it's not. A call to a web service in which this HttpModule is installed results in a System.Xml.XmlException. The text of the message indicates that the root element of the XML document is missing. For those unfamiliar with streaming problems, this particular error message is frequently caused by an empty stream being processed by ASP.NET.

The root cause can be found in what the using statement does. In particular, when the sr variable goes out of scope, it is disposed. This is the non-obvious behavior I'm talking about. Not that it should be surprising that using causes the object to be disposed, you understand. That's the purpose of using, after all. But because you don't see an explicit Dispose or Close method, the solution to the problem is not obvious. If you look at the definition of the Dispose method on the StreamReader, you will find that the underlying stream is closed. Not good, as it means that the remaining HttpModules and ASP.NET get nothing to work with. The solution, for those who are interested (as pointed out to me by Marc Durand at Agfa during our debugging session), is to read the bytes directly from the InputStream rather than wrapping it in a StreamReader that gets disposed. This keeps the stream open and the data flowing.
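For the curious, here is roughly what reading the bytes directly looks like. It's a sketch rather than the exact code from that debugging session: the helper name is made up, and it assumes the stream is seekable, which the Position = 0 line in the original code already relies on.

```csharp
using System;
using System.IO;
using System.Text;

public static class RequestLogging
{
    // Reads the whole stream as text without disposing it, then rewinds
    // it so that later consumers see the content from the beginning.
    public static string ReadWithoutClosing(Stream input, Encoding encoding)
    {
        input.Position = 0;
        byte[] buffer = new byte[(int)input.Length];
        int offset = 0;

        // Read may return fewer bytes than requested, so loop until done.
        while (offset < buffer.Length)
        {
            int read = input.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
            {
                break;
            }
            offset += read;
        }

        input.Position = 0;
        return encoding.GetString(buffer, 0, offset);
    }

    public static void Main()
    {
        // A MemoryStream stands in for app.Request.InputStream.
        byte[] body = Encoding.UTF8.GetBytes("<soap:Envelope />");
        using (MemoryStream stream = new MemoryStream(body))
        {
            Console.WriteLine(ReadWithoutClosing(stream, Encoding.UTF8));
            Console.WriteLine(stream.CanRead);
        }
    }
}
```

In the BeginRequest handler, the call would be something like RequestLogging.ReadWithoutClosing(app.Request.InputStream, app.Request.ContentEncoding), with no using block around the stream.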