Pipeline Components: Per-Instance Configuration and Load calls

As most people who use BizTalk are probably aware, BizTalk 2004 and up allow the properties of a pipeline to be overridden in each send/receive location.  In BizTalk 2006 there is a nice UI in the admin console for doing this.

However, what is NOT immediately clear to developers of pipeline components is how this affects when BizTalk calls your Load method at runtime.  Because of the way BizTalk implements per-instance configuration, your pipeline components need to explicitly handle it in the Load method.

In my opinion BizTalk handles per-instance configuration in a pretty ugly way.  When your component's Load method is first called, it is passed a property bag containing all the properties that were set in the pipeline designer (and saved via your Save method).  This property bag does NOT contain any per-instance properties.  Per-instance properties are provided to your component via a second Load call.  In my opinion BizTalk should merge the two property bags for you and simply call your Load method once, with the resulting property bag.

What makes matters worse is that when per-instance Load calls are made, they only contain the properties which were overridden.  This potentially introduces bugs into your Load method.  Consider the following code for the Load method:

public void Load(Microsoft.BizTalk.Component.Interop.IPropertyBag propertyBag, int errorLog)
{
    object temp;
    propertyBag.Read("StringToReplaceProperty", out temp, 0);
    StringToReplace = Convert.ToString(temp);
}

This code will work fine for the first Load call which contains all saved properties (assuming all properties were saved when the pipeline was created - they can be missing if you add an extra property to the component and do not re-drag your component onto the designer).

However, when a per-instance Load call is made, if the "StringToReplace" property itself was not overridden in the port but some other property of the component was, the result of the per-instance Load call is that you will have set your property back to null!

The way to avoid this is to write a helper method that lets you specify a default value, and to pass the property's current value in as the default.  Then, when the per-instance read of the property bag returns null, you simply return the default value - i.e. the value that the property was already set to by the first Load call.

I recommend writing a specific helper class and placing it in an assembly that you can use from all pipeline components to avoid having to deal with the extremely ugly IPropertyBag interface BizTalk presents to you.  Then you can add type specific read methods such as:

  • ReadStringProperty(propertyBag, propertyName, defaultValue)
  • ReadBoolProperty(propertyBag, propertyName, defaultValue)
  • ReadEnumProperty(propertyBag, propertyName, defaultValue)

This will make your code much more readable than creating variables to be used for out parameters.
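For example, here is a minimal sketch of such a helper (the class and method names are mine, and it assumes the property bag throws ArgumentException for a missing property, as the BizTalk SDK samples do):

using System;
using Microsoft.BizTalk.Component.Interop;

public static class PropertyBagHelper
{
    // Reads a named property, returning defaultValue when the property is
    // not present in the bag (e.g. a per-instance Load call where this
    // particular property was not overridden).
    public static string ReadStringProperty(IPropertyBag propertyBag,
        string propertyName, string defaultValue)
    {
        object temp = null;
        try
        {
            propertyBag.Read(propertyName, out temp, 0);
        }
        catch (ArgumentException)
        {
            // Property not in this bag - fall through to the default.
        }
        return temp == null ? defaultValue : Convert.ToString(temp);
    }
}

The Load method from earlier then becomes safe for both the design-time and per-instance calls:

StringToReplace = PropertyBagHelper.ReadStringProperty(propertyBag, "StringToReplaceProperty", StringToReplace);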

Once you override properties using per-instance properties, the sequence of Load calls changes from a single call for each component.  Summarised below is the behaviour you will now see, based on the pipeline component type.  NOTE: The easiest way to see what is going on here is to set a breakpoint in the Load method and view the call stack - right-click it and select "Show External Code" so you can see the BizTalk and System methods in the stack.  (A quick trace sketch follows the list.)

The Load behaviour you should now experience is:

  1. Design time properties are loaded when the pipeline is created.  As a result, the Load method of each component is called, from the first component in the pipeline to the last, to load the design time properties (i.e. the properties configured in the VS designer)
  2. For standard pipeline components, per-instance properties cause a Load call immediately before the Execute method is called
  3. For dis-assembler components
    • Per-instance properties cause a second Load call, which is triggered by BizTalk's possible need to call IProbeMessage.Probe (this interface can optionally be implemented by dis-assemblers, as the dis-assemble stage is "first match").  NOTE: This call is made even if the component does NOT implement the IProbeMessage interface!
    • A Load call is made before the Disassemble method
    • A Load call is made before every call to GetNext
  4. For assembler components, per-instance properties cause a second Load call to be made prior to the call to AddDocument.  Therefore there are two consecutive Load calls, then a call to AddDocument, then a call to Assemble
  5. If per-instance configuration has been specified for any properties in the pipeline, per-instance Load calls are made to every component in the pipeline, even if that specific component has no overridden properties
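
To watch these sequences for yourself, a trace line in Load works well.  A minimal sketch (reusing the hypothetical helper from above):

public void Load(Microsoft.BizTalk.Component.Interop.IPropertyBag propertyBag, int errorLog)
{
    // Each call shows up in the debugger / DebugView, making the
    // design-time versus per-instance Load sequences above visible.
    System.Diagnostics.Trace.WriteLine("Load called on " + GetType().Name);

    // Defaulting to the current value stops per-instance calls from
    // wiping out properties that were not overridden.
    StringToReplace = PropertyBagHelper.ReadStringProperty(
        propertyBag, "StringToReplaceProperty", StringToReplace);
}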

Developing Streaming Pipeline Components - Part 1

Hello thrillseekers.  Well here it is - Part 1 of ??? on developing streaming pipeline components for BizTalk Server.  My apologies for the delay of this post after causing mass hysteria with the earlier announcement of a series of posts on Developing Streaming Pipeline Components.  This photo was taken outside my apartment shortly after the announcement was made.

Mass Blogging Hysteria

In this post I will cover the basics of what a "streaming approach" actually means, and why we ideally need to take a streaming approach to pipeline component development.  I will compare an in memory approach to a streaming approach from a number of perspectives.  Then, in the follow-up posts, I will dive into the detail of how this is achieved.

...

So, when documentation etc. refers to taking a streaming approach to pipeline component development, it basically means that the pipeline components do not load the entire message into memory - they work on the message "chunk by chunk" as it moves through the pipeline.  This has the obvious benefit of consuming less memory, which becomes critical in a BizTalk system that is processing either large messages or a high volume of messages.

Below is a comparison of an "in memory" approach and a streaming approach, from various perspectives.  I will get into the details in subsequent posts, which will explain more about these areas.  There may be areas I have missed, but these are the ones that come to mind:

Memory usage per message
  • Streaming: low, regardless of message size
  • In memory: high (varies depending on message size)

Common classes used to process XML data
  • Streaming: built in and custom derivations of XmlTranslatorStream, XmlReader and XmlWriter
  • In memory: XmlDocument, XPathDocument, MemoryStream, VirtualStream

Documentation
  • Streaming: poor - many unsupported and undocumented BizTalk classes
  • In memory: very good - these are framework classes

Location of "processing logic" code
  • Streaming: "wire up" readers & streams in the Execute method; the actual execution occurs in the readers & streams as the data is read through them
  • In memory: directly in the Execute method of the pipeline component

Data
  • Streaming: re-created at each wrapping layer as data is read through it
  • In memory: read in/modified/written out at each component before the next component is called
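To make the "location of processing logic" comparison concrete, below is a minimal sketch of the streaming "wire up" pattern in an Execute method.  MyProcessingStream is a hypothetical decorator stream standing in for your own processing - the point is that Execute returns immediately and the real work happens as the stream is read:

using System.IO;
using Microsoft.BizTalk.Component.Interop;
using Microsoft.BizTalk.Message.Interop;

public IBaseMessage Execute(IPipelineContext context, IBaseMessage message)
{
    // No processing happens here - we only wrap the body stream.
    Stream original = message.BodyPart.GetOriginalDataStream();

    // Hypothetical decorator stream that transforms the data chunk by
    // chunk as downstream components (or the messaging engine) read it.
    Stream wrapped = new MyProcessingStream(original);
    message.BodyPart.Data = wrapped;

    // Let BizTalk dispose of the stream once the message is processed.
    context.ResourceTracker.AddResource(wrapped);
    return message;
}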

Now let's look at the advantages of both the streaming and in-memory approaches.

Advantages of streaming:

  • Low memory use
  • By utilising the built in BizTalk classes, some functionality exposed in the out of the box pipeline components can be embedded in your custom components
  • Easy to add/re-use functionality by utilising the decorator pattern with the Stream and XmlReader classes

Advantages of in memory:

  • Fast when message sizes are small (i.e. when server memory consumption is not stretched)
  • Developers generally have experience using these classes
  • Classes are generally fully featured - e.g. XPath and XSLT fully support the standards
  • Often quicker/easier to code
  • Developers are generally more familiar with this practice

And now for the limitations of each approach:

Limitations of streaming:

  • Scenarios that require caching large amounts of data are generally not supported, or defeat the purpose (i.e. the cache takes up memory anyway) - e.g. certain XPath expressions
  • Poor documentation, and development patterns that are unfamiliar to many developers
  • The built in BizTalk pipeline components were not designed with extensibility in mind - often a re-write is required to add a small amount of functionality

Limitations of in memory:

  • High memory use - this can cripple a server's throughput when processing large messages or a large number of messages, and is the reason a streaming approach is recommended

As stated earlier, the details in these comparisons are just what comes to mind - I may have omitted some.  But basically, when you look at the advantages and limitations of each approach (note that I have included development factors here as well as runtime performance), it looks like the in-memory approach takes the points!?

However, you will notice that a number of these points relate to documentation and developer experience.  So although for runtime performance we want to use a streaming approach, for development factors (development time and complexity) the in-memory approach appears more advantageous.  I would say that this is generally true, and it is the main reason this is the route most often taken by developers.

That said, once you become experienced at developing with a streaming "mindset", and once you build up an internal library of re-usable classes to assist you, most of the extra development time is removed and you keep the runtime benefits that the streaming approach buys you.

 

So, I have outlined the pros and cons of each approach and also provided a comparison of each approach from both a technical workings perspective and a development perspective.  Stay tuned for the next post where I will start looking into how to technically achieve a streaming approach when developing your own custom pipeline components.

Developing Streaming BizTalk Pipeline Components

I recently presented at the Toronto BizTalk User Group on the topic of "Streaming Pipeline Component Development in BizTalk".  I found this to be a hard topic to present, as the content is easier to absorb when read at your own speed, taking the time to soak it all in.  It is also a reasonably in-depth topic, and as a result assumes prior knowledge of BizTalk pipeline design.

Anyway, I plan to blog the content as a series of posts.  I hope these posts will serve to fill in the gaps of an area of BizTalk that is not well known or documented.

These posts will cover:

  • An overview of streaming design & development – benefits and design differences
  • A detailed look into the BizTalk pipeline's streaming design, and the underlying classes that BizTalk exposes to provide it
  • A look into BizTalk’s publicly available classes which are useful for pipeline components

My knowledge in this area is a result of developing a set of EDI pipeline components for BizTalk 2004 & 2006 to replace the Base EDI Adapter.  These components were developed prior to BizTalk 2006 R2, and support the original BizTalk 2000 - 2006 EDI schemas.

 

I will update this page with links to the articles once they have been posted.

[Update 28th April 2008 - links below]

Part 1 - Streaming versus In memory comparisons & pros/cons

Part 2 - Execution of custom processing logic differences

Part 3 - Coupling the XmlReader and XmlWriter

Part 4 - Deep dive into the XmlTranslatorStream

Part 5 - Implementing custom logic by sub-classing the XmlReader and XmlTranslatorStream

WMI (What Magnificent Information)

Yesterday I blogged about a class I came across called PowerStatus, exposed by the SystemInformation class. It was working great for me until I tested it on a machine with three UPSs plugged into it. The three UPSs were not for that machine of course :) they were in a rack with a bunch of controllers etc. We wanted to monitor all three UPSs and find out how much life was left in the weakest, so we can shut down the system and save all data before any of them go.

The problem with PowerStatus was that it seemed to give me an average of all three batteries, and on my laptop it got nothing. So I asked my network of developers, and Andrew pointed me to WMI (Windows Management Instrumentation). Well, there is a wealth of information in there, and it can all be called natively from .NET. I thought I would post some sample code - gives me a place to come look for it next time I need something similar.

This code gets a collection of batteries from the current machine and fills a listbox with their IDs and remaining life.

// Requires a reference to System.Management.dll and "using System.Management;"

// Connect to the local WMI repository (root\CIMV2) with the current credentials.
ConnectionOptions options = new ConnectionOptions();
options.Impersonation = ImpersonationLevel.Impersonate;
options.Authentication = AuthenticationLevel.Default;
options.EnablePrivileges = true;

ManagementScope connectScope = new ManagementScope();
connectScope.Path = new ManagementPath(@"\\" + Environment.MachineName + @"\root\CIMV2");
connectScope.Options = options;
connectScope.Connect();

// Query every battery (UPS) the machine knows about.
SelectQuery msQuery = new SelectQuery("Select * from Win32_Battery");
ManagementObjectSearcher searchProcedure = new ManagementObjectSearcher(connectScope, msQuery);

// List each battery's ID and estimated runtime (in minutes).
this.listBox1.Items.Clear();
foreach (ManagementObject item in searchProcedure.Get())
{
    this.listBox1.Items.Add(item["DeviceID"].ToString() + " - " + item["EstimatedRunTime"].ToString());
}

Notice the query in the middle of this code that is looking in Win32_Battery - take a look at the WMI documentation for the Win32 classes to see what else you can query out there.

PowerStatus

The project I am working on right now is a software factory that will eventually allow its users to build software that controls robots and other devices in a laboratory.

Today I wrote a service that monitors the power on the machine it's running on. The idea is that the service will raise an event if the machine loses power, and will then monitor the remaining life of the UPS. This will allow the system to save any data to disk before the UPS gives up the ghost.

To write this I used a class that is new to .NET 2.0 called PowerStatus. This is a very useful little class.

First of all, you can check various useful properties relating to your machine's power supply.

You can also handle the SystemEvents.PowerModeChanged event, which fires when the power state changes.  In this event you can tell whether power is connected or not, and whether the computer is about to resume or suspend.
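
For example, a minimal sketch of both - reading the current status and watching for changes (assumes references to System.Windows.Forms for SystemInformation, and Microsoft.Win32 for SystemEvents):

using System;
using System.Windows.Forms;   // SystemInformation / PowerStatus
using Microsoft.Win32;        // SystemEvents

class PowerMonitor
{
    static void Main()
    {
        // Snapshot of the current power state.
        PowerStatus status = SystemInformation.PowerStatus;
        Console.WriteLine("Line status: " + status.PowerLineStatus);            // Online / Offline / Unknown
        Console.WriteLine("Charge remaining: " + status.BatteryLifePercent);    // fraction of full charge
        Console.WriteLine("Seconds remaining: " + status.BatteryLifeRemaining); // -1 when unknown

        // Fires on suspend/resume and whenever the power status changes.
        SystemEvents.PowerModeChanged += delegate(object sender, PowerModeChangedEventArgs e)
        {
            if (e.Mode == PowerModes.StatusChange)
                Console.WriteLine("Power state changed - now " +
                    SystemInformation.PowerStatus.PowerLineStatus);
        };

        Console.WriteLine("Watching for power events; press Enter to exit.");
        Console.ReadLine();
    }
}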

There is also a great code example and sample application in the MSDN documentation for this class.

 

The DataSource Attribute

You can use the DataSource attribute to call your TestMethod multiple times with different data each time.

Let's say for example I have an order details table like the one in Northwind.

  • OrderId
  • ProductId
  • UnitPrice
  • Quantity
  • Discount

In your application somewhere you have a method that calculates the cost of an item. Given the OrderId and ProductId it will return the Price for that line item. You want to write a test for this method, easy enough.

<TestMethod()> _
Public Sub TotalPriceTest()
    Dim price As Decimal
    Dim MyDac As New MyWindowsApplication.MyDac

    MyDac.Load()
    price = MyDac.getTotalPrice(10329, 38)

    Assert.AreEqual(price, CDec(4005.2))
End Sub

The problem is this test only tests one particular row. What about the case where there is no discount? I guess we have to write another unit test with a different OrderId and ProductId. Hey, we could create a file that contains a bunch of variations and then write a loop in the unit test to go through each one.

Or...

We could use a DataSource Attribute along with the TestContext Class.

When you add the DataSource attribute to a TestMethod, it will retrieve all the rows from the specified table over that connection and call your TestMethod once for each row. Inside your TestMethod you can use the TestContext object to get information about the current test run, e.g. the current DataRow. Take a look at the example below.

<TestClass()> Public Class DataTests

  Private testContextInstance As TestContext

  '''<summary>
  '''Gets or sets the test context which provides
  '''information about and functionality for the current test run.
  '''</summary>
  Public Property TestContext() As TestContext
    Get
      Return testContextInstance
    End Get
    Set(ByVal value As TestContext)
      testContextInstance = value
    End Set
  End Property

  'DataSource attribute pulls the data from Northwind -> OrderDetailTestData
  <TestMethod()> _
  <DataSource("Data Source=localhost;Initial Catalog=Northwind;Provider=SQLOLEDB;Integrated Security=SSPI;", "OrderDetailTestData")> _
  Public Sub OrderDataTest()
    Dim price As Decimal
    Dim expectedPrice As Decimal
    Dim MyDac As New MyWindowsApplication.MyDac

    MyDac.Load()
    'Use the TestContext object to get the DataRow for the current test run
    price = MyDac.getTotalPrice(CInt(Me.TestContext.DataRow("OrderId")), CInt(Me.TestContext.DataRow("ProductId")))
    expectedPrice = CDec(Me.TestContext.DataRow("Price"))

    Assert.AreEqual(price, expectedPrice)
  End Sub

End Class

Now just populate the OrderDetailTestData table with all the different data you want to pass through your method.

Delegates Delegates

Here is something you may not have noticed before.

I'm sure you have utilized the following code to create an event handler in your code.

VB.NET  AddHandler MsgArrivedEvent, AddressOf My_MsgArrivedCallback
C#      MsgArrivedEvent += new MsgArrivedEventHandler(My_MsgArrivedEventCallback);

I have found it quite useful to create a method called AddHandlers and one called RemoveHandlers. This way I can easily turn event handling on and off. It's very nice when handling events like ColumnChanged in a DataSet. When I want to push data into the DataSet without the events firing, I can just call RemoveHandlers, then when I'm done call AddHandlers. It's analogous to EnforceConstraints.

Here is an example in VB.

Public Sub AddHandlers()
    With Me.OrderEntity.OrderData
        AddHandler .Order.ColumnChanged, AddressOf Me.onOrderColumnChanged
    End With
End Sub

Public Sub RemoveHandlers()
    With Me.OrderEntity.OrderData
        RemoveHandler .Order.ColumnChanged, AddressOf Me.onOrderColumnChanged
    End With
End Sub

If you use something like this, be careful. Why, you ask? You could inadvertently turn handlers on twice in a row.

For example:
From the ColumnChanged event handler you call CalculateTotal, which will RemoveHandlers, total the items, recalculate the tax, and then call AddHandlers.
Also from the ColumnChanged event handler you call CalculateTax, which needs to RemoveHandlers, total the taxes for all items, and then call AddHandlers.
How clever of you to reuse the code in CalculateTax. Sample below (in VB).

Private Sub CalculateTotal(ByVal amount As Decimal)
    RemoveHandlers()
    'Total values
    If taxable Then
        CalculateTax(amount)
    End If
    AddHandlers()
End Sub

Private Sub CalculateTax(ByVal amount As Decimal)
    RemoveHandlers()
    'Calculate tax
    AddHandlers()
End Sub

The problem with this scenario is that it will call RemoveHandlers twice and then AddHandlers twice. The side effects are:

  1. Two handlers get added to the event, so the handler will be fired twice each time the event fires.
  2. You may not notice the double event firing. You may call RemoveHandlers in another method and assume the handlers are turned off, but there is still one handler attached - so RemoveHandlers fixed problem one :) but your handler is not disabled like you thought.

There is one simple solution so you don't have to worry about it. Call RemoveHandlers at the beginning of AddHandlers.
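
Here is that guard as a minimal sketch (C# this time; the class and member names are just illustrative):

using System.Data;

class OrderForm
{
    private DataTable orderTable;  // stand-in for Me.OrderEntity.OrderData.Order

    public void AddHandlers()
    {
        // Guard: drop any existing subscription first, so calling
        // AddHandlers twice in a row can never attach two handlers.
        RemoveHandlers();
        orderTable.ColumnChanged += new DataColumnChangeEventHandler(OnOrderColumnChanged);
    }

    public void RemoveHandlers()
    {
        // Removing a handler that is not attached is a no-op, so this is always safe.
        orderTable.ColumnChanged -= new DataColumnChangeEventHandler(OnOrderColumnChanged);
    }

    private void OnOrderColumnChanged(object sender, DataColumnChangeEventArgs e)
    {
        // Recalculate totals, tax, etc.
    }
}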

Or be more careful.

VB vs C#

This blog entry is not a battle or comparison of the two languages. I love both of them the same.

What I did want to pass along is this great page I found. Actually, I think someone may have told me about it, but I can't remember who - if someone did point me to this, I would like to publicly thank you.

If you, like me, switch between these languages often, you find yourself sometimes trying to use the wrong keyword or needing to look up the equivalent. “I know it's internal in C#, but what is it in VB again?” I could ask my friend :) but they forget too.

This page is a very useful resource for just such an occasion. VB.net and C# Comparison. If you use both languages often you will want to make this a favorite.

Getting the DataRow that is bound to an Infragistics UltraGridRow

Here is one of those posts I make because I keep forgetting how to do it. Now that I have posted it I won't forget. Perhaps someone else will be searching for this little code snippet and I will have helped them out. :)

If you have an Infragistics Grid bound to a Dataset, and you want to get the DataRow that the UltraGridRow references. Here is what you do.

Cast the UltraGridRow's ListObject property to a DataRowView and get the DataRow from that.

I'll do the code sample in VB this time.

Dim GridRow As UltraGridRow  'Let's assume you have the GridRow, perhaps passed in an event argument
Dim OrderRow As Myapp.Orders.OrderData.OrderRow

OrderRow = CType(CType(GridRow.ListObject, DataRowView).Row, Myapp.Orders.OrderData.OrderRow)

Note that this won't work if it's an UltraGridGroupByRow - its ListObject property will be Nothing.

Tech Ed Bloggers

There are a bunch of ObjectSharp people and friends of ObjectSharp at Tech Ed this week. Their blogs make good reading - I feel like I'm there. If you wish you had gone to Tech Ed, or wonder what you are missing, check out the blogs below.

ObjectSharp People

Friends of ObjectSharp

Everyone else