A Spoonful of SOA's Alphabet Soup

Last year, American corporations spent over $4 billion integrating heterogeneous applications (otherwise known as Enterprise Application Integration or EAI). Why? Because one of the best ways to squeeze more profits out of a company is to reduce the costs of doing business. Among the many other possibilities of boosting the bottom line(some of which have now been identified as illegal) is finding ways to streamline the flow of information through the corporation.

But what does that salute to business motherhood and apple pie have to do with web services? Because the pressure to lower costs is the reason that almost every developer should pay attention to the innovations and standards that are at play in the web services arena. Even with all of the hype surrounding the technology, it seems likely that web services will play a large role in EAI for the next few years at least. It is for this reason that I'm creating this series of articles. My goal is to walk developers through the process of designing and implementing a commercial-grade web service and supporting architecture.

To make sure that we're starting on the same page, the rest of this article will focus on some definitions. If you're already familiar with XML, SOAP, WSDL, UDDI and other various web service acronyms, feel free to skip to the next article. Otherwise, read on to learn about the alphabet soup that surrounds web services

What is a Web Service

To get our discussion started, let's define a web service. One of the more formal definitions that I've seen is "loosely coupled, reusable software components that semantically encapsulate descrete functionality and are distributed and programmatically accessible over standard Internet protocols". If you break apart each of the terms in this complicated rambling, you'll find the kernal of what most people expect a web service to be. That is a set of functions that can be accessed remotely using TCP/IP as the transportation medium. This broad definition covers almost all of the instances for which web services are suited, allowing us to explore the most common deployment options.

Ignoring the technical for a few seconds, let's consider the problem that web services are intended to address. That would be the need to access functionality provided by a remote server through the Internet. One of the most common examples is a stock quote function. If you're developing a corporate site, you don't want to focus on the details of retrieving quote information about your company. It's not worth it to dig into stock market streaming for such non-critical information. So instead you search the Internet until you find a company that specializes in providing stock market information. Fortunately for you they have implemented a technique that allows you to retrieve the needed data by making a call to their web site, leaving the formatting of the result up to you. And you have just experienced the power of a web service.

But the utility of a web service does not require that a third party be brought into the equation at all. Many companies are starting to deploy pieces of functionality as a web service within their own walls. This allows companies to experiment with web services without becoming dependent on an outside service. Not to mention avoiding connectivity, speed and security issues.

All web services start with a request being created. The source can be a browser or an application. Regardless, the request is formatted into an XML document and transmitted across the Internet to the web server. The server has a process that listens on a given port (usually, port 80, the HTTP port). When a request arrives, the XML document is parsed to determine what components needs to be instantiated and what methods are called. Finally, the result is bundled back up into an XML document and sent back to the calling application.

Now we have glossed over many of the issues that make developing web services challenging. This includes user authentication, transactions (grouping multiple SOAP requests into a single unit) and security. We will be dealing with these areas later in our series.

SOAP

Since SOAP is one of the focal points of the hype associated with Web Services, it makes sense to start there. And as much as SOAP might appear to a complicated entity, its basic purpose is quite simple. The Simple Object Access Protocol defines the XML format for messages that are sent between two processes. That's it, that's all. Nothing magical at work here.

But as with most technical subjects, the devil is in the details. Just sending an XML message using SOAP is by no means sufficient to have two applications communicate with one another. Each application needs to understand the context of the messages in order for SOAP to be effective. And that leads to the requirement to define data types (Section 5 of the SOAP specification) or RPC function call formats (Section 7 of the specification). Not to mention the techniques that utilize the HTTP port to send messages, something that, while not in the required portion of the SOAP specification, is supported by almost every SOAP implementation.


Figure 1 - SOAP Message Structure

The format of a SOAP message is quite straightforward. The message is delivered in an envelope. The envelope contains two parts. The Header contains information that describes how the recipient should process the message. The Body contains the payload of the message (in other words, the contents) and an optional section called SOAP Fault. This optional section contains error or status information.

UDDI

As we all know, building a better mousetrap is not sufficient to get the world to beat a path to your web site. As with any idea, you need to publicize the functionality that your web service makes available. Fortunately, there is a 'yellow pages' for web services and a set of functions that can be used to create, manipulate and search the entries. So when someone needs to find that stock market web service, they could browse this repository, searching for the functionality that they require.

The format used to add entries to these yellow pages is called the Universal Description, Discovery and Integration standard or UDDI. There are currently two main repositories for this information, one at http://uddi.microsoft.com/inquire and the other at http://www-3.ibm.com/services/uddi/inquiry.api. Ultimately, the goal of this evolving standard (and, when you get right down to it, web services in general) is to eliminate the human part of business transactions. The ultimate vision is that when a customer places an order with your company through the web service whose interface you have exposed through UDDI, the service will be able to search the directory to see that client has a web service that allows an invoice to be submitted electronically. Utopian vision? Currently, yes. Pipe dream? I don't think so. Give the IT community enough time, and this functionality will make its way into enough of the mainstream to be considered a requirement for 'serious' business.

WSDL

Next up in our alphabet soup is Web Services Description Language or WSDL. For those of you like to pronounce your acronyms, that would be "whiz-dull". As you might guess, the purpose of WSDL is to describe the web service that is being exposed. More specifically, it provides the details about the functionality that is implemented, what messages are used to request the service and the sequence in which the messages are exchanged. In other words, everything that an intrepid developer needs to know in order to use our service in their application.

Now defining the WSDL file for a web service is not a requirement for it to be used. However, not having one is like selling an ActiveX component without providing documentation, on-line or otherwise. And even though the format of a WSDL document is convoluted (it is an XML-based standard, after all), most development environments (including Visual Studio) provide tools that will automatically generate WSDL for you. So there is no reason (other than laziness) for not creating a WSDL document for any web service that you develop.

XML

The idea that I can describe XML adequately in a couple of paragraphs is ludicrous. The subject is complex enough that books are devoted so just the basic structure and usage. In my opinion, the reason for this difficulty it that XML is both flexible and extensible. So it can be used in almost any situation where data needs to be stored or transferred. Still, I'm brave (or stupid) enough to give a brief definition a try.

My two second description of XML has always been 'comma-delimited files on steroids'. My reason? The purpose of a CSV file is to store information so that another application can read/import it. This works fine, so long as the receiving application is aware of the structure of the incoming file. The best that CSV files can offer in this area is to place field names at the top of each column. So the possibility for problems of interpretation (or at least situations that require programmer-to-programmer discussions) exists. XML improves on this by applying a more rigorously enforced structure to the data. The order in which the data is physically laid out in the document is irrelevant, but each data field is named, so that it is easier to find. And when you combine XML data with DTDs or Schemas, the type and range of data are now subject to validation. Certainly a superior mechanism than CSV files.

For the most part, our use of XML will be to package up the data that needs to be send to the web service for processing and to receive back the results. As such, our XML documents will be relatively straightforward. But your complexity may vary, depending on the solution that you're trying to provide.

So much for the common protocols/acronyms.  These standards are enough to get you started in the web services world.  But anything more than a cursory look at the technology reveals many more TLAs or FLAs (three- and four-letter acronyms).  For completeness (actually, there is no hope for completenesss: the number of acronyms increases too quickly), here are some additional standards that are commonly used.

WS-Security

One of the main roadblocks to more universal acceptance of web services in production applications is the concerns raised about the security of messages as they fly across the Internet.  The WS-Security standard is a set of SOAP headers which are intended (and actually succeed, when used correctly) to define the authentication, encryption and signing that are used within a particular message.

While extensible enough to allow for custom security to be implemented, WS-Security natively supports a wide variety of security models, including Public Key Infrastructure (PKI) and Kerberos tokens.  And when combined with the Web Services Enhancements (WSE) toolkit, developers can easily integrate X.509 certificates, role-based security, WS-Addressing (see below for an explanation) and DIME attachmements (again, more information below). WS-Security is both mature and versatile enough to address all but the most stringent security requirements.

DSIG - Digital Signatures

Another aspect of the security associated with web service messages is known as non-repudiation.  In plain terms, this means that the recipient of the message is absolutely certain of who sent the message. As well, the recipient is absolutely certain that the content of the message has not been changed. If both of these conditions are met, then the message has the potential (not that I say potential here, not the reality...yet) of being a legally binding document. 

The typical approach accomplishing non-repudiation in a SOAP message is through digitial signing. Without getting into the dirty details, it involves hashing the message so that the change to a single byte will be detected by the processing.  For those who are interested in the details, check out here. But ignoring the specifics, the WSE toolkit makes it simple to digitally sign SOAP messages.

DIME - Direct Internet Message Encapsulation

Using jargon, DIME is a binary message format that is designed to encapsulate one or more payloads into a single message.  In real world terminology, the DIME protocol is used to include attached files to a SOAP message.

In order for DIME to be truly useful, any type of file must be attachable to a message.  This causes some temporary concerns.  Not every file is suitable for transmission in an XML format.  Consider for a moment the problems that would arise if the attached file contained characters that looked l ike an XML tag.  Also, binary formats, such as audio, video or images, can cause a problem when embedded in an XML document.  And since DIME is associated with the XML-based SOAP message, the underlying technology used to transmit the attachments must be compatible.  This means base64. 

Base64 is a encoding mechanism that allows arbitrary binary information to be translated into a format that is appropriate for both XML documents and e-mail.  The basic approach is to break the source data into 24-bit blocks.  Each block is translated into four characters with each character representing 6-bits of the 24-bit block.  The characters (there are 64 possible choices, hence the name base64) in the translation are present in all variants of both ASCII and EBCDIC, making it completely compatible with XML.

WS-Policy

One of the issues associated with services is determining which methods are exposed and which parameters. That problem is handled through WSDL.  Next up is determining the requirements and capabilities provided by a particular web service.  That information is made available through the Web Services Policy Framework, otherwise known as WS-Policy.

A WS-Policy document defines a set of assertions about the policy supported by a particular web service.  Possible assertions include encryption methods, the types of digital signatures which are recognized, and the presense or absense of particular WS-Security information. The assertions can be queried in real-time before constructing the service request.  The requester can then tailor the encryption, authentication and signing used on the message to match the needs of the web service.

WS-Routing

Workflow is an up and coming issue associated with SOAP messages.   At the moment, many web services are simple point-to-point solutions. In that situation, web services closely resemble Remote Procedure Calls (RPC). But the real future of SOA involve the processing of documents. And when a SOAP message contains a document, there is an implication that the document will need to pass through a number of steps before it has finished processing.  Say, for example, the document is a sales order. That document will need to be processed through the incoming sales service, the manufacturing service and the shipping service. And to keep track of the services in the process flow, a standard is required.

WS-Routing is a protocol that is used to route SOAP messages from service to service in a work flow.  It supports all of the common types of flow, including request/response, peer-to-peer and one-way messaging. At the moment, there are no tools to assist with implementing WS-Routing.  But if you can wait as as yet undermined amount of time, it would appear that this functionality is addressed in Indigo, Microsoft's upcoming web services middleware application.

WS-Referral

While WS-Routing handles the basic flow of messages, the interim steps are static.  Once defined, the message does what it has been programmed to do.  The WS-Referral protocol allows for the modification of the flow while the message is in transit.  As a result, this protocol and WS-Routing are usually found hand-in-hand.  Which also means that there are no currently available tools to help with implementing the protocol.  So you will either need to grit your teeth and bear it or wait for Indigo.

WS-Addressing

The WS-Addressing protocol is used to identify the endpoints of a web service.  The endpoint of a web service looks like the URL that is specified in the proxy.  It is the place to which the service request is made. Internally (at least to IIS) it defines the server and code that implements the methods exposed by the service. 

While in most circumstances, the endpoint of a web service is static (at least, it is most of the time), there are times when the actual endpoint needs to be more flexible. This is particularly true when the request passes through various routers and gateways.The WS-Addressing specification provides a means for redefining the endpoint of a web service as the message is in transit.

GXA - Global XML Web Service Architecture

While the term GXA still appears in a great deal of the older document from Microsoft regarding web services, the term has fallen out of general use.  It referred to a collection of extensible, orthogonal specifications related to providing require functionality for web service requests.  The standards included in GXA were WS-Security, WS-Routing, and WS-Referral.

SOA - Service Oriented Architecture

Service Oriented Architecture is the next great thing in web services.  Put simply, it is a style of designing applications based on the loosely coupled architecture that is forced by using services.  A service is a piece of software that exposes a set of methods that can be invoked (called an interface).  Because the only way to interact with the service is through the interface, the actual implementation and even the location of the service is irrelevant to the calling application.

But utimately, the benefits associated with using SOA, along with the details involved with designing and implementing a SOA-based are beyond the scope of this article.  The purpose of this article was to introduce you to some of the concepts behind web services and to explain some of the acronyms that inhabit this world. Future articles will dive more deeply into the concepts, benefits, issues and implementation details of SOA in the real world. I hope that you'll follow me on my journey down the rabbit hole.