Developing Streaming Pipeline Components - Part 2

In Part 1 I provided a few comparisons of streaming versus in-memory approaches to pipeline component development.  In this post I will move on to detailing the concepts behind developing streaming pipeline components (i.e. how it is technically different from the point of view of a pipeline component), and the different mindset required to understand how they work.  I won't yet delve into the really technical details, as it's probably best to understand these concepts before looking at how they are achieved at a lower level.

...

One of the hardest concepts to grasp when moving from an in-memory mindset to a streaming mindset is that, in general, your pipeline component itself does not modify or process the message data.  With a streaming approach, the message is modified as it passes through a stream (or a series of chained streams).  Developers are used to looking at the pipeline designer surface in Visual Studio and thinking "component 1 will process the message, then BizTalk will pass the resulting message to component 2 ... and so on until the last component in the pipeline processes the message, and then BizTalk will put the resulting message in the message box database (or deliver it to the send adapter in the case of a send pipeline)".  This analysis is actually fairly accurate if none of the components in the pipeline process the message in a streaming manner.  However, with a pipeline utilising streaming components the behaviour is quite different.

When taking a streaming approach, we want to be able to process the message in small chunks so as not to load much data into memory.  The way this is generally achieved is by:

  • Using a custom implementation of System.IO.Stream
    • NOTE: by custom implementation, this also includes the custom implementations that BizTalk provides out of the box.  I will go into deep detail on some of these in future posts, as well as covering how to write your own
    • Custom streams wrap other streams.  The inner stream becomes the source of data for the outer stream (see the sketch after this list)
  • Using a custom implementation of System.Xml.XmlReader (the same rules regarding "custom" apply as above), coupled with a custom implementation of Stream.  I.e. you will always need a custom stream implementation, but will not always need a custom XmlReader - it depends on how/where you implement your logic
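
To make the wrapping idea concrete, here is a minimal and purely hypothetical example of such a custom stream - the class name and the upper-casing transformation are mine for illustration only.  The important points are that the inner stream is the source of the data, and that absolutely nothing happens until something reads from the outer stream:

```csharp
using System;
using System.IO;

// Hypothetical wrapper stream: upper-cases ASCII text as data is pulled
// through it.  The inner stream is the source of data, and no work happens
// until someone calls Read on this stream.
public class UpperCasingStream : Stream
{
    private readonly Stream _inner;

    public UpperCasingStream(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Pull a small chunk from the inner stream, process it in place,
        // and pass it on to whoever is reading us.
        int bytesRead = _inner.Read(buffer, offset, count);
        for (int i = offset; i < offset + bytesRead; i++)
        {
            if (buffer[i] >= (byte)'a' && buffer[i] <= (byte)'z')
                buffer[i] -= 32;   // 'a'..'z' -> 'A'..'Z'
        }
        return bytesRead;
    }

    // The remaining members simply describe a forward-only, read-only stream.
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```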

The processing that you intended your pipeline component to do in the Execute method (or whichever method applies to the type of component you are writing) is instead placed in logic contained in either the custom XmlReader or the custom Stream.

  • NOTE: Execute is called for standard pipeline components
  • For disassembler components, BizTalk calls Disassemble AND GetNext, but the equivalent of the Execute method in terms of where to set the message part's Data property is GetNext (a skeleton sketch follows this list)
  • For assembler components, BizTalk calls AddDocument AND Assemble, but the equivalent of the Execute method in terms of where to set the message part's Data property is Assemble
  • Any reference to Execute is referring to the applicable method for the component type as stated above
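
To illustrate the disassembler case, below is a bare-bones sketch that reuses the hypothetical UpperCasingStream from the earlier example.  Only the two IDisassemblerComponent members are shown - a real disassembler also implements IBaseComponent, IComponentUI and IPersistPropertyBag, and would typically produce one or more new messages rather than hand back the inbound one - but it highlights that GetNext is where the stream gets attached:

```csharp
using System.IO;
using Microsoft.BizTalk.Component.Interop;
using Microsoft.BizTalk.Message.Interop;

// Hypothetical, heavily simplified disassembler sketch.
public class StreamingDisassemblerSketch : IDisassemblerComponent
{
    private IBaseMessage _queued;

    public void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
    {
        // No reading or processing here - just remember the inbound message.
        _queued = pInMsg;
    }

    public IBaseMessage GetNext(IPipelineContext pContext)
    {
        if (_queued == null)
            return null;                 // no more messages to hand out

        IBaseMessage outMsg = _queued;
        _queued = null;

        // The equivalent of Execute: wrap the original stream and set Data.
        Stream wrapped = new UpperCasingStream(outMsg.BodyPart.GetOriginalDataStream());
        outMsg.BodyPart.Data = wrapped;
        pContext.ResourceTracker.AddResource(wrapped);

        return outMsg;
    }
}
```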

Moving this logic into the XmlReader or Stream allows the logic to execute as the message is read by BizTalk.  And no - we don't just simply move the location of the logic from the pipeline component to one of these XmlReader or Stream classes - we need to change the way we actually achieve it too, otherwise we are just moving the issue.  As stated earlier, designing and writing these classes will be covered in future posts.

To tie our custom streams in with BizTalk, the BizTalk message part exposes a Data property, which is of type Stream.  We can therefore set this property to our custom stream implementation, and as a result our processing will execute when the stream is read.
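
In code, that wiring looks roughly like the following sketch (again using the hypothetical UpperCasingStream, and only showing the IComponent member - the other pipeline component interfaces and attributes are omitted).  Adding the wrapped stream to the context's resource tracker is a common precaution so that BizTalk keeps it alive until the message has been fully processed:

```csharp
using System.IO;
using Microsoft.BizTalk.Component.Interop;
using Microsoft.BizTalk.Message.Interop;

// Hypothetical streaming component - only the IComponent member is shown.
public class UpperCasingComponent : IComponent
{
    public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
    {
        // The original inbound stream becomes the inner (source) stream.
        Stream inbound = pInMsg.BodyPart.GetOriginalDataStream();

        // Wrap it - note that no data is read and no processing happens here.
        Stream wrapped = new UpperCasingStream(inbound);

        // Hand the wrapped stream back to BizTalk via the part's Data property.
        // Our processing only runs when a later component, or the messaging
        // engine itself, reads this stream.
        pInMsg.BodyPart.Data = wrapped;

        // Ask BizTalk to keep the stream alive for the lifetime of the message.
        pContext.ResourceTracker.AddResource(wrapped);

        return pInMsg;
    }
}
```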

Therefore, when BizTalk executes a pipeline containing components designed in a streaming manner, the behaviour is as follows (NOTE: this ignores the per-instance configuration calls to Load which result from overriding properties on the send port/receive location.  I have posted here about how per-instance configuration affects calls to Load):

  1. The Load method is called by BizTalk for each component in order of pipeline location
  2. The Execute method is called on component 1 by BizTalk
  3. Component 1 creates a new instance of a custom Stream (and optionally a custom XmlReader) and sets it as the returned message part's Data property.  The source of data for the reader/stream is set to the inbound stream.  The custom reader/stream will perform the actual processing (e.g. modifications to the data, or promotions to the message context based on the data, etc.).  However, this will not occur until the stream that was passed back starts to be read by the next component in the pipeline, or by BizTalk (in the case where it was the last component in the pipeline)
  4. BizTalk calls component 2's Execute method, passing to it the message returned from component 1.  Therefore, when component 2 accesses the message's Data property, it is actually retrieving a reference to the stream attached by component 1
  5. Component 2 does the same as component 1.  Note that no data has been read yet, but each component is wrapping another layer of stream around the stream it received
  6. Steps 4 and 5 are repeated for each component in the pipeline, passing in the message returned by the prior component
  7. Depending on whether the pipeline is a receive or a send pipeline - the BizTalk messaging engine (for receive pipelines) or the BizTalk send adapter (for send pipelines) retrieves a reference to the message, which was returned by the last component in the pipeline
  8. The message's Data stream is read by BizTalk or the send adapter.  This is the outermost stream in the layers of wrapped streams.  Therefore the data output by this stream is the net effect of all modifications made by the inner streams (i.e. the modifications made to the data by each inner stream are present in the data that is read here; the sketch after this list illustrates this layering)
  9. For receive pipelines, the resulting data is pushed into the message box database.  For send pipelines, the data is sent by the adapter to the external system
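
To see why step 8 produces the net effect of all the layers, here is a small stand-alone sketch (no BizTalk involved, and both wrapper classes are hypothetical - the UpperCasingStream from earlier plus a trivial byte-counting wrapper defined below).  The two wrappers are layered without a single byte being read, and only when the outermost stream is read does the data flow through every layer:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical pass-through wrapper that counts the bytes read through it
// (the kind of value a component might promote to the context once the
// stream has been fully read).
public class ByteCountingStream : Stream
{
    private readonly Stream _inner;

    public ByteCountingStream(Stream inner) { _inner = inner; }

    public long BytesRead { get; private set; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead = _inner.Read(buffer, offset, count);
        BytesRead += bytesRead;
        return bytesRead;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public class LayeringDemo
{
    public static void Main()
    {
        // The "inbound" data, as it would arrive from the receive adapter.
        Stream inbound = new MemoryStream(Encoding.UTF8.GetBytes("<root>hello</root>"));

        // Component 1 wraps the inbound stream; component 2 wraps component 1's
        // stream.  At this point nothing has been read and nothing processed.
        var afterComponent1 = new UpperCasingStream(inbound);
        var afterComponent2 = new ByteCountingStream(afterComponent1);

        // Only now, when the outermost stream is read, does the data flow
        // through every layer - the output reflects all inner modifications.
        using (var reader = new StreamReader(afterComponent2))
        {
            Console.WriteLine(reader.ReadToEnd());    // <ROOT>HELLO</ROOT>
        }
        Console.WriteLine(afterComponent2.BytesRead); // 18
    }
}
```

In a real pipeline the reading at the end is done by the messaging engine or the send adapter rather than a StreamReader, but the layering principle is exactly the same.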

In an attempt to visualise the difference, here are a couple of screenshots from the slide deck of my presentation at the Toronto BTUG.  This first one shows a pipeline (laid out horizontally to show that execution occurs L-R) with in-memory style components.  The red cogs represent both the call to each component's Execute method and the fact that the processing logic occurs at the same time as that call (the slide was animated to show the left-most component executing, then each one individually through to the right-most one):

In Memory Execution

Next is the corresponding slide for the streaming pipeline.  This time, the yellow lightning bolts indicate the call to each component's Execute method (again, the slide was animated to show the Execute call to the left-most component through to the right-most one).  However, this time the red cogs (which represent the processing each component is doing) are separate, and don't actually appear until after the right-most component's Execute method has completed.  The database shape represents the BizTalk message box database, and the red "no smoking" sign represents fresh air for all, and also the BizTalk messaging engine which is reading the stream at the end of the pipeline.  In the slide version, which bamboozled viewers with a display of amazing PowerPoint animations never seen before at a local user group, once the last Execute method had run, the no smoking sign spun (indicating it was reading the outermost stream, which would have been set by the last component in the pipeline), pulling data through all the wrapped streams that had been wired up by the pipeline components.  This resulted in the cogs spinning as well (indicating the processing was taking place in the streams as the data was read through them).  The small arrows indicate the data moving from the left side (from the receive adapter) to the right side as it is read, and then up to the message box as it is incrementally pushed into it.  I hope all who saw this wacky display of animations appreciated it, if not for the technical knowledge it was attempting to convey, then for the fact that it took me ages to figure out how to use PowerPoint animations.

Streaming Execution

Well, I think that about covers enough of the high-level details before we can dig down into the specifics of how the custom implementations of Stream and XmlReader work.  The next post will get into these details, but it may also seem a bit of a detour at first; however, it is key to understanding the overall picture.  That post will look into "chaining" the XmlReader and XmlWriter together.