Developing Streaming Pipeline Components - Part 4

This post I will focus on BizTalk's stream design, and how it allows for the XmlReader and XmlWriter coupling detailed in Part 3.  Once this topic has been covered, we are finally at a point where all of this background can be applied to achieving actual real world tasks.  Yes I know, about time...

This post I will cover the key BizTalk pipeline stream - the XmlTranslatorStream, and therefore will be a long post.  This post covers what is probably the most crucial details for understanding the whole streaming design.

...

Where I left off in my prior post, I had discussed the issues with needing to be able to chain the XmlReader and XmlWriter together at the right time - to move data through from the inbound stream to the outbound stream on demand.  In other words, when someone further down the pipeline from us needs some data to read - this is when we want this movement of data to occur.  Also, BizTalk requires that a message part's data is actually a System.IO.Stream type.

Introducing the XmlTranslatorStream

Instead of trying to re-format what I had already written up for the Toronto BTUG presentation I did on the same topic as this series, I thought I would steal my own slide.

This slide shows the key BizTalk stream when it comes to writing streaming pipeline components - the XmlTranslatorStream, and its relation to the base System.IO.Stream.

BizTalk Stream Design

 

As you can see, the XmlTranslatorStream has a number of parent classes in the inheritance hierarchy, and at each layer a piece of functionality is added.  I will go into the specifics of what occurs in each level later in this post, and how it is implemented internally.  The end result is that the XmlTranslatorStream can be provided with an XmlReader as its source of data, and when the XmlTranslatorStream is read it internally chains the reader's data through to an XmlWriter - resulting in an output from the XmlTranslatorStream that is identical to the inbound stream that the XmlReader is reading.

To put this in relation to BizTalk pipelines, this means that in a pipeline component we can now:

  • Attach an XmlReader to the inbound message's data stream
  • Create a new XmlTranslatorStream, and provide it with the XmlReader we created
  • Set the XmlTranslatorStream to the outbound message's data stream

Note: This does not explain how our custom logic is going to be performed.  I will go into this in a future post, but it usually involves passing a custom XmlReader to the XmlTranslatorStream, or deriving from XmlTranslatorStream and overriding the functionality it provides.

Once the next component in the pipeline gets a reference to its inbound message's data, it is going to be a reference to the XmlTranslatorStream that we set in the prior component.  This component, and all other components in the pipeline perform the steps in the bullet points.  Note that this is for general cases - I'm sure there are many cases where other steps will need to be performed in the pipeline component's main method.

Therefore, at the end of the pipeline when all pipeline components have been called by BizTalk (but no data has been read yet), we have a stream structure like is depicted in the following slide:

Wrapped Streams

This slide attempts to show that we end up with a series of layered streams.  This is for a receive pipeline, and it shows (from inner circle to outer):

  • A FileStream representing the message received by the file adapter (this isn't actually the type of stream the adapter gives us, but it's one that everyone is familiar with)
  • An XmlDasmStreamWrapper which is returned by the BizTalk XML Disassembler component
  • An XmlValidatingStream which is returned by the BizTalk XML Validator component
  • An XmlTranslatorStream returned by a custom pipeline component
  • Also, note that each inner layer of stream is "attached" to the next outer layer by the technique I discussed earlier.  That is, the outer stream contains an XmlReader which is reading over the inner stream as its source of data
  • Note that XmlDasmStreamWrapper and XmlValidatingStream both derive from XmlTranslatorStream, and therefore follow the same pattern as it

Now if you refer back to the "Streaming - Processing Execution Order" slide from Part 2, and the description I gave for how it executes, things should hopefully be starting to fall into place about how all of this operates.  Now when BizTalk starts reading the stream provided by the last component in the pipeline (represented by the blue circle in the above slide), this stream has no data to provide to the read request.  As a result, it reads from its internal XmlReader.  When the read method returns, it will push the data into its internal XmlWriter, and the data that the XmlWriter generates will then be given to the stream that requested the data.  But calling read on the outermost XmlReader in the chain will cause a number of calls before it returns...

You will notice that the only stream that has a physical backing store is the innermost FileStream.  All other streams are simply a wrapper around another stream.  Therefore, as each stream is read, it has no data to provide, so it spins its XmlReader, pulling data from the stream it is wrapping.  This in turn does the same thing until the read request makes it all the way into the innermost FileStream, which fulfils the read request as it actually has data to provide.  The call then propagates back down the stack, providing the data to the wrapping streams until it makes it all the way back to the outer stream and data is returned to BizTalk.

So far, in order to make it easier to understand where the XmlTranslatorStream fits in in the big picture and to be able to apply it to something developers can related to earlier rather than later, I have skipped going into the inner workings of it.  I have focused on how it can be used within the context of a BizTalk pipeline, and how data is passed from layer to layer of streams.  Now I will crack open the XmlTranslatorStream to reveal how it works in more detail.

Inside the XmlTranslatorStream

This section will show you how a call to the stream's Read method results in the internal XmlReader being read, and how that data is returned to the caller of Read.

I will now call upon another one of my presentation's slides.  This slide also shows the class hierarchy as did the one earlier, but also explains exactly what happens at each stage.  It details the method calls that will go on the stack as the request for data is fulfilled.  This slide also had shock and bore type animations, but unfortunately you won't be able to see them in this picture.  The main items to note are the red "no smoking" sign, which represents the XmlReader; the white one which represents the XmlWriter; and the drum which represents the internal buffer (more on this after the slide):

Data Through XmlTranslatorStream

The main thing to note which I haven't really covered for the chaining of the XmlReader and XmlWriter inside the XmlTranslatorStream is that there is also an internal buffer (represented by a MemoryStream), which is what the internal XmlWriter is actually outputting to.  This buffer is provided in the base class of XmlTranslatorStream - the XmlBufferedReaderStream.

So, to talk through what the slide is showing - the steps that occur in a single Read call to the XmlTranslatorStream:

  • Someone calls Stream.Read on the XmlTranslatorStream, passing in how many bytes it would like
  • The abstract Read method is implemented up at the EventingReadStream class.  (I will detail more about this class and how its features can be useful in a future post.)  It raises an event and then calls its abstract ReadInternal method, passing in all the parameters provided to the Read call
  • The abstract ReadInternal method is implemented up at the XmlBufferedReaderStream class.  The buffer is checked for data
  • If the buffer has enough data to fulfil the request, it is taken from the buffer.  Otherwise, the buffer needs more data, so XmlBufferedReaderStream calls its abstract ProcessXmlNodes method, providing it with the number of bytes it still requires after taking what was left from the buffer (the buffer may have fulfilled some of the data request)
  • The abstract ProcessXmlNodes method is implemented up at the XmlTranslatorStream.  When it is called by the XmlBufferedReaderStream, the XmlTranslatorStream needs to push more data into the buffer via the XmlWriter.  But the XmlWriter is given its data to write from the XmlReader.  Therefore, the XmlReader is read, and the new node under the reader is copied over to the XmlWriter, hence filling the buffer.  Each time the XmlReader's node is copied to the XmlWriter, the number of bytes in the buffer is compared against the number requested in the ProcessXmlNodes call.  If the buffer is still not full enough, more read/write calls are made.  When there is either enough data in the buffer, or the reader becomes end of file (EOF) before the number of bytes can be fulfilled, the call returns, passing the number of bytes that were pumped into the buffer during the ProcessXmlNodes call.  The XmlBufferedReaderStream then returns the data to the EventingReadStream as a result of the ReadInternal call, which then returns the data to the caller of the Read method, and the call completes.

That's it!  Those steps occur every time the stream is read.  The key point to take out of this is that we are pulling data through an XmlReader every time you read the stream (well, technically, every time you read the stream and the buffer cannot fulfil the request).  This is our main extensibility point - without utilising this extensibility, all we are doing is slowing the process down by consuming and replicating the data for no reason!

In the next post I will cover how we can take advantage of this extensibility and get the XmlTranslatorStream and the XmlReader we provide it to perform our processing logic.  By doing this we are moving away from an in memory approach and towards a streaming approach, which is exactly what we are trying to achieve.  I will demonstrate this by showing how BizTalk pipeline components use this technique for their processing.

[Update 22nd May 2008 - updated incorrect spelling of XmlBufferedReadStream to XmlBufferedReaderStream. The slide deck screen shots are still incorrect.]