Presentation -> Using BizTalk 2006

Short notice, but I am doing a presentation tomorrow (Thurs. Feb 9) at the Metro Toronto .Net Users Group.
You can read more about it here:
http://www.metrotorontoug.com/User+Group+Events/231.aspx

Demos will include dealing with binary documents (Word and Excel) in BizTalk 2006.
The demos will also include:

- The new Sharepoint and POP3 adapters for BizTalk 2006.
- Parsing Excel documents to XML in BizTalk.
- Integration of Sql Server 2005 Service Broker with BizTalk 2006.
- Some of the new and improved features of BizTalk 2006.

So if you are in the area, feel free to drop by.

BizTalk 2006 -> Example using the new POP3 and Sharepoint Adapters

Below is an example using the new POP3 and Sharepoint Adapters (Sharepoint Adapter now out of the box) for BizTalk 2006. This example demonstrates the following:

1) Processing binary documents in BizTalk.
2) A method to process multiple attachments in an incoming mail message.
3) Improvements to the Sharepoint Adapter.

You can download the code at the end of this blog.

The example goes something like this.

A candidate is applying for a job. The candidates resume (Word) and an informational spreadsheet (Excel) are attached to an email message as below:

The resume is in word format, and the excel spreadsheet is as below:

The mail message with attachments is read by BizTalk server. BizTalk server will then add a new item to a Sharepoint Document library. The Word document from the mail message is added as the item's document and the Excel spreadsheet is parsed to populate the First Name and Last Name columns as below:

The sample works as below:

1) A BizTalk Receive Port/Receive Location is configured using the POP3 receive Adapter as below:

Note: The Apply MIME Decoding property is set to false. This is because we want the raw MIME message to be delivered to an orchestration. The two attachments of the message will be dealt with in the orchestration (see below).

2) An orchestration as below will then subscribe to the incoming encoded MIME message:

3) In the orchestration, a receive pipeline (ConstructCandidateInfo Construct Shape) is executed to extract the Excel message out of the encoded MIME message and to also parse the Excel message to an XML format.

The pipeline looks as below:

Note: A MIME decoder is used in the pipeline to extract out the Excel Attachment
Note: The ODBC File decoder  is used in the pipeline to parse the Excel Attachment to an  XML message. This is so the FirstName and LastName can be used to populate the First Name and Last Name columns in the Sharepoint document library list.
Note: Executing receive pipelines in an orchestration is a new feature for BizTalk 2006.

The code to execute the pipeline in the orchestration is as below:

// Execute the Pipeline -> ReceivePipelineCandidateInfo.
// This will create a message with the XML in it
varPipelineOutPutMessages = 
Microsoft.XLANGs.Pipeline.XLANGPipelineManager.ExecuteReceivePipeline(typeof(BizTalkPop3AndSharepointExample.ReceivePipelineCandidateInfo), 
msgMimeCandidate);
msgCandidateInfo = null;
varPipelineOutPutMessages.MoveNext();
varPipelineOutPutMessages.GetCurrent(msgCandidateInfo);
 
4) In the orchestration, a receive pipeline (ConstructResume Construct Shape) is executed to extract the Word document.

The pipeline looks as below:

Note: A MIME decoder is used in the pipeline to extract out the Word Attachment

The code to execute the pipeline in the orchestration is as below:

// Execute the Pipeline -> ReceivePipelineCandidateResume.
// This will create a message with the Word Document in it
varPipelineOutPutMessages = 
Microsoft.XLANGs.Pipeline.XLANGPipelineManager.ExecuteReceivePipeline(typeof(BizTalkPop3AndSharepointExample.ReceivePipelineCandidateResume), 
msgMimeCandidate);
msgCandidateResume = null;
varPipelineOutPutMessages.MoveNext();
varPipelineOutPutMessages.GetCurrent(msgCandidateResume);

// Get the FirstName and LastName that are in the Parsed XML Excel message.
// set these into the WSS ConfigPropertiesXml
varXmlDomCandidateInfo = msgCandidateInfo;
varXmlNode = varXmlDomCandidateInfo.SelectSingleNode("//FirstName");
strFirstName = varXmlNode.InnerText;
varXmlNode = varXmlDomCandidateInfo.SelectSingleNode("//LastName");
strLastName = varXmlNode.InnerText;
// Set the first names and last names so they will
// appear in the Document Library Columns
strWSSConfigPropertiesXml = "<ConfigPropertiesXml><PropertyName1>First Name</PropertyName1><PropertySource1>" + strFirstName + 
"</PropertySource1><PropertyName2>Last Name</PropertyName2><PropertySource2>" + strLastName + "</PropertySource2></ConfigPropertiesXml>";
msgCandidateResume(WSS.ConfigPropertiesXml) = strWSSConfigPropertiesXml;
// Set the name of the File, when created in the Sharepoint Document Library
msgCandidateResume(WSS.Filename) = strFirstName + " "  + strLastName + ".doc";

Note: msgCandidateResume(WSS.ConfigPropertiesXml) = strWSSConfigPropertiesXml;
is used to set the First Name and Last Name columns in the document library.

5) The msgCandidateResume message is then sent to the WSS library via a configured
Send Port using the Sharepoint Adapter as below:

Note: That the Column 01, Column 01 Value, Column 02, Column 02 Value, FileName are not configured.
These are dynamically set by the orchestration as explained in 4).

You can download the code HERE. Also look at the ReadMe

Conclusion

1) The new adapters and adapter enhancements for BizTalk 2006 really open up the doors for integrating all sorts of different applications. For a complete list of adapters that will be shipped with BizTalk 2006, please go HERE  
2) You can use the new features of BizTalk 2006 (like executing Receive Pipelines in an Orchestration), to easily process complex messages in BizTalk.
3) Adrian Hamza and his team have developed a top notch, feature rich Sharepoint adapter for BizTalk 2006. For more information on the Sharepoint adapter, please visit Adrian's blog HERE. Adrian also has recorded a series of WebCasts on the Sharepoint Adapter:

WSS Adapter Training Videos:

WSSAdapter-PropsInOrchestration.wmv (28.58 MB)
WSSAdapter-SendReceiveCBR.wmv (10.25 MB)
WSSAdapter-SetupAndConfig-Short.wmv (9.58 MB)
WSSAdapter-InfoPathIntegration.wmv (15.11 MB)

You can download these at BetaPlace (BizTalk Server 2006 section)
http://beta.microsoft.com.

To get access to the BizTalk Server 2006 area at BetaPlace, please goto :
http://www.microsoft.com/biztalk/evaluation/bts2006beta.mspx

BizTalk 2006 and the Sql Server 2005 Service Broker Adapter (Beta 2)

Jesús Rodríguez has finished a Beta 2 version of  Service Broker Enhancements Adapter for BizTalk Server 2006. They are looking for beta testers, so please go HERE if interested. Service Broker is an exciting new feature for Microsoft SQL Server 2005. Now you can integrate BizTalk Messaging with Service Broker Messaging, using this important new adapter.

 

BizTalk, Another Custom XSLT mapping sample.

The below mapping example could most likely be accomplished using a series of BizTalk maps. In the maps,  Table Looping / Table Extractor / Looping functoids etc. could be used to perform the transformation. Sometimes it makes sense to craft your own custom XSLT to execute the complete transformation. The sample input / output messages for the transformation as below. Also included is the custom XSLT for the transformation. You can also download the source code (at end of this blog entry). 

INPUT Message:

--> For each "contenedor" node, in the output message must repeat the sequence of segments ORC + RQD (According to the standard HL7 v.2.3.1)
--> For each "prueba" node, in the output message must repeat the sequence of segments ORC + OBR
--> For each "parametro" node, in the output message must repeat the segment NTE

For the below output, the order of the nodes is also important. The order relates back to the input message.

OUTPUT Message should be:

MSH||||.|
ORC||||.|
RQD||||.|
ORC||||.|
RQD||||.|
ORC||||.|
OBR||||.|
NTE||||.|
NTE||||.|
ORC||||.|
OBR||||.|
NTE||||.|
NTE||||.|
NTE||||.|


The custom XSLT for the mapping/transformation looks like below:

The above XSLT can be placed in a scripting functoid in the map as discussed HERE
Or the custom XSLT can be placed in a separate file, with the map's property ->  Custom XSLT Path pointing to the file (see below)

Download the solution HERE

Conclusion: In some cases it is much simpler to write your own custom XSLT for a BizTalk map.
Note: You can also perform a transformation in a BizTalk map using a combination of custom XSLT and links/functoids (see below links)
Note: Using functoids or a combination of functoids in a map are sometimes not enough to perform a transformation or a portion of a transformation. Using custom XSLT for the transformation or a portion of the transformation may be the answer.

Below is more information on using custom XSLT in BizTalk maps:

Extending Mapper (BizTalk Server SDK Samples)

Creating Nodes from Flat Data in BizTalk 2004 Maps

Logical Message Splitting in BizTalk Server 2004

Implementation of Message Transformation Normaliser Pattern in the BizTalk 2004

Sorting XML Nodes with the BizTalk Mapper

Debugging XSLT

New BizTalk 2006 5 day Course -&gt; Building BizTalk 2006 Solutions Best Practices

I am in the process of writing a new BizTalk 2006 course as below:

Preamble

BizTalk is all about messaging and how messages are processed and routed between a companies internal applications (for example: Sap/Mainframe/Peoplesoft/ Database/JD Edwards/Web Services etc.) and how messages are exchanged with trading partners (B2B). BizTalk provides an excellent framework for implementing many common messaging patterns such as:
a) Scatter/Gatherer pattern
b) Splitter pattern
c) Aggregator pattern
d) Message Broker pattern
e) etc.
The objective of this course is to provide students with all the necessary tools to implement messaging patterns using BizTalk 2006.

Below are some of the topics with labs for the course:

a) Various methods to Create Schemas for XML and Flat files messages.
b) Mapping using the BizTalk Mapper, custom XSLT and other alternatives
will be discussed to perform complex transformations.
c) An introduction into building Custom Adapters
d) An introduction into building Custom Pipeline Components
e) Interacting with the BizTalk Messaging Engine
f) Creating and using Custom Pipelines
g) Orchestrations.
h) Using xpath in Orchestrations.
i) Web Services and BizTalk
j) Error Handling with BizTalk
k) Debugging BizTalk Solutions
l) Deploying BizTalk Solutions
m) Business Activity Monitoring and BizTalk
n) Business Rules Engine
o) Implementing Common Enterprise Integration Patterns with BizTalk.
Some of the patterns discussed and put into practice are:
Message Broker
Content Based Routing
Splitter Pattern
Normalizer Pattern
Aggregator Pattern
 
Additionally the new features of BizTalk 2006 will be highlighted and used in Labs.

It is quite a bit for a five day course (not sure if all of it will make it in). From my experience using the product and teaching a BTS 2004 course for the past year and a half, the above is what I would like to cover in the course.

If you have a suggestion for additional topics for the course, please reply to this entry. I would love to hear them.

Update: This course is available now. Building BizTalk Server 2006 Solutions Best Practices

 

BizTalk, MIB, Acord and 5 minutes of Marketing

My blog entries are usually technical in nature. This entry is more of a marketing entry discussing how BizTalk 2004/2006 can be used to implement the following specific business scenario:

1) An insurance underwriter requires information (for example -> medical history) about an insurance applicant.

2) An internal application captures applicant information provided by the underwriter such as:
a) Applicants First Name/Last Name/ Middle Name.
b) Birth date.
c) Birth place.

3) The internal application then sends this information to a company called MIB. The underwriters query parameters (Last Name/ First Name etc) are then used to query a data repository at MIB. A result set of the query is then returned to the internal application from MIB. This result set contains information such as the medical history of the applicant.

4) The results from MIB are then displayed to the insurance underwriter to aid in the insurance application process.

Presently many North American insurance companies send/receive information to/from MIB over a dial up connection. Flat Files are sent and received back from MIB.

By December of 2006, all users of the current dial up process are to switch over to a new process provided by MIB as below:

All insurance underwriter requests are to be transmitted to MIB using a secure HTTP connection. The requests are to be sent in XML messages that conform to the Acord schemas. A synchronous HTTP response (XML) message will then be sent back over the same secure HTTP channel as the request. This response message contains the results of the query.

Below is a description of how the above new process can be implemented by BizTalk 2004 or BizTalk 2006.

1) Create a new BizTalk project that contains the Acord XSD schemas as below:


2) Typically a mainframe application may dump the underwriter requests to disk in a flat file or different format.
A configured BizTalk FTP or File Receive Port is then configured to pick up the files.
 
Note: The original request may be delivered by a different transport or stored to intermediate location as below:


HTTP
MSMQ
MQSeries
Sql Server
Oracle
DB2
Host Applications  -> IBM, mainframe zSeries (CICS and IMS) and/or midrange iSeries (RPG)
Host Files -> Data adapter for host file systems on IBM mainframe zSeries VSAM datasets and midrange iSeries physical files
etc.


BizTalk contains many out of the box adapters that can be used to receive messages from all the above transports/locations and others not listed. If the needed adapter is not provided out of the box, then the BizTalk Adapter framework can be used to author your own custom adapter. Third party adapters can also be purchased.

3) Create internal XSD schemas to represent the internal messages.
For example, an XSD schema using flat file extensions would be created  to represent flat files produced by the mainframe application.

4) Create BizTalk map(s) to transform the internal flat files to Acord XML messages.

5) Create the necessary BizTalk orchestrations for the business process. 

6) Install a certificate on the BizTalk machine for the MIB secure HTTP requests using the Certificate Snap-in.

7) Create a BizTalk Solicit-Response Send port with a configured HTTP adapter for the MIB secure HTTP requests. The Send port is configured to use the installed certificate from step 6).

8) Create BizTalk map(s) to transform the Acord XML response messages to internal messages.

The above pretty well covers all the required BizTalk objects.

So why use BizTalk for the above MIB request/response loop or a similar business process:

1) This is what BizTalk server does. It integrates a companies internal applications together and also provides B2B capabilities.

2) A number of turnkey solutions can be purchased that do the same communication loop with MIB. These turnkey solutions are probably more expensive and proprietary than a BizTalk Solution. There is little coding involved to build the above BizTalk solution and there is complete control over the process as a developer(s) is building it and maintaining it. For changes or modifications to the BizTalk process, a configuration change is sometimes all it takes.

For BizTalk 2006 pricing go to HERE
For BizTalk 2004 pricing got to HERE

I have heard claims that it took more than 1000 man hours to complete the above process in BizTalk. This is completely untrue as it took me a fraction of 1000 hours to implement the above MIB communication loop using BizTalk.

3) The BizTalk server that hosts the above process can be used to host other developed Application Integration or
B2B business processes.

4) BizTalk server is increasing exponentially in popularity. More and more developers have BizTalk experience.

5) BizTalk provides many useful out of the box features such as:

a) Business Activity Monitoring
b) Rules Engine
c) Fault Tolerance/Load Balancing
d) Tracking
e) etc. etc.


Note: I do not mean to sugar coat the process. The BizTalk maps for the MIB requests and response messages are somewhat complex. The complexity of the maps can be attributed directly to the large size of the Acord XSD schemas. There are larger XSD schemas. In my opinion, the HL7 XSD schemas are more complex then the Acord XSD schemas.

Processing a Large Flat File Message with BizTalk and the SqlBulkInsert Adapter

Below discusses the second demo for a presentation that I did at the Business Process Integration & Workflow Conference in Redmond during the week of Oct 4.

The title of the presentation was entitled:
Handling Large Messages in BizTalk

The first demo that I did for the presentation can be found here:
Processing a large message and a faster, less CPU intensive splitter pattern.

This entry compares two different methods for processing large flat files in BizTalk.
The size of the flat files being tested ranged in size from: 8MB to 201MB.

At the end of this entry is the download for the SqlBulkInsert Adapter code and the BizTalk Test Project to
perform the tests.

A section of the flat file message being processed is as below. About half of the columns are cut off:

The above is the flat file representation of the XML Production orders message that were used in the first Demo.
Each line of the flat file message contains a production order. The production order data (columns) are separated by commas and each production order line (record) is separated by a Carriage Return/Line Feed.

To process the flat file message, two different implementations were used. In both implementations, the flat file is pre-processed so that it is in a format for further processing -> For example mapping or splitting.


Implementation One: Processing with a Flat File Schema and Custom Receive Pipeline

This implementation uses the standard method of using a XSD Schema with flat file extensions to represent the above flat file message. A custom receive pipeline utilizing a flat file disassembler transforms the flat file into an internal XML message. This XML message will be published into the MessageBox database. If you have not processed a flat file in BizTalk before, try this example in the SDK.

This is just a short recap of how this implementation works and is really for the benefit for those not familiar with BizTalk or who have never processed a flat file in BizTalk.

a) Flat File is picked up by a Receive Location.
b) Receive Location is configured to use a Custom Receive Pipeline
c) Custom Receive Pipeline will convert (disassemble) the flat file, into an internal XML message as below:


d) This XML Message is then published into the MessageBox database
e) Once the XML Message is published into the MessageBox database, any number of Orchestrations or send Ports can subscribe to this message and process this message. This message could then be mapped into another message or split into separate messages as in the first demo.


Implementation Two: Processing with the SqlBulkInsert Adapter

From a performance perspective there are three areas where the above implementation can be improved. This is especially true, if the incoming flat files messages are large greater than 20 MB.
Areas of improvement are listed below:

1) The pipeline processing time.
2) CPU utilization is at 100% for pipeline processing.
3) Flat files when converted into XML files can double or triple in size. For example when a 201MB flat file production order message is converted into an internal XML message, the size of this internal XML message expands to 767MB. This large 767MB message is then published into the MessageBox database. The size of the XML message is greater because of added tags, elements and attributes.

Therefore one approach would be to skip the pipeline processing for the large flat file message. Pipeline processing is a necessary and great feature in BizTalk, but will be skipped in this implementation to increase performance.

A Custom Adapter (SqlBulkInsert Adapter) will be used to pre-process the flat file.
This adapter can only be used with a One-Way Receive Port. The property pages of the Receive Location are as below:


How this implementation works:

a) This adapter behaves similar to the out of the box File Adapter used in a Receive Location. It will poll a directory on a Hard Drive for files to be picked up and processed.
 
b) The FileReceiveFolder Property is set to a directory on a hard drive.
For example -> C:\BTSBulkLoad\Files\ReceiveForSqlBulkLoad
Flat Files dropped into this folder will be picked up to be processed.

c) The FileMask Property is set so the adapter will only pick up only files with the mask.
For example -> *.txt

d) The FileErrorFolder Property will be populated with error files.
For example -> C:\BTSBulkLoad\Files\FileErrorFolder.
If some of the rows(data) in the flat file message are malformed, these will not be processed , but will be placed in a file for viewing. This is analogous to a recoverable interchange. Rows in the flat file that are correct in structure will be processed.

For example, if two of the rows in the flat file cannot be processed (Missing Columns, missing delimiters etc) a file is produced with the row(s) that could not be processed.

11506,10020,TST GF 01 GRD 01,7045936818561,20.25,20.25,58.25,58.250359010995972,3 Inch Core,,,
11665,10020,TST GF 01 GRD 01,36,23584,23584,1432.7045936818561,1432.7045936818561,20.25,20.25,5

Additionally another file is produced with a more detailed explanation of why the row(s) could not be processed as below:

Row 16 File Offset 3822 ErrorFile Offset 0 - HRESULT 0x80004005
Row 25 File Offset 6422 ErrorFile Offset 493 - HRESULT 0x80004005

The Max Number of rows with errors can be configured (explained below). If this number is exceeded, the operation will fail as a whole.

Once a File has been picked up, the SqlBulkInsert adapter will use the below properties to delegate the processing of the flat file to a Sql server stored procedure. The large File will not be submitted to the BizTalk engine to be processed. Therefore the BizTalk pipeline processing will be skipped for the large file.


e) The SqlConnectionString Property is set with the connection string to the sql database that will process the flat file.
For example -> packet size=4096;integrated security=SSPI;data source="(local)";persist security info=False;initial catalog=BTSSqlBulkLoad 

f) The SqlStoredProcedureName Property is set to the name of the stored procedure that will process the flat file.
For example -> LoadInsertProductionOrders

The declaration of the stored procedure is as below:

CREATE Procedure [dbo].[LoadInsertProductionOrders]

@pathAndFileName  varchar(500),
@pathAndErrorFileName varchar(500)


g) The SqlStoredProcFileNameParameter property will set with the name of  the parameter that accepts the file to be processed.
For example -> @pathAndFileName

At run time when the adapter picks up the file, it will generate the following to populate the @pathAndFileName stored procedure parameter:
For example -> C:\BTSBulkLoad\Files\ReceiveForSqlBulkLoad\FlatFileToProcess.txt

Note: Only a pointer to the flat file to be processed is passed to the stored procedure.
Note: UNC paths should also work, but have not actually tried.


h) The SqlStoredProcFileErrorParameter property is set to the name of the parameter in the stored procedure
to indicate the location to where the error files should be placed. See part d)
For example -> @pathAndErrorFileName

At run time when the adapter picks up the file, it will set the @pathAndErrorFileName stored procedure parameter to a Value.
For example -> C:\BTSBulkLoad\Files\FileErrorFolder


At run time when the File Receive portion of the adapter has picked up a file to process, the following code is then called in the adapter to invoke the stored procedure:


internal static string callBulkInsertStoredProcedure
   (string connectionString,
   int commandTimeOut,
   string storedProcedureName,
   string fileName,
   string fileErrorName,
   string fileNameParameterName,
   string fileErrorNameParameterName)
{
 System.Text.StringBuilder sbForReturnedXML = new System.Text.StringBuilder();
 SqlConnection sqlConnect = new SqlConnection();
 sqlConnect.ConnectionString = connectionString;
 SqlCommand sqlCommand = new SqlCommand();
 sqlCommand.CommandType = System.Data.CommandType.StoredProcedure;
 sqlCommand.CommandText = storedProcedureName;
 
 // Dynamically set the name of the parameters that should be called in the custom stored procedure.
 // This is because each stored procedure may have parameters with different names.
 sqlCommand.Parameters.Add(new System.Data.SqlClient.SqlParameter(fileNameParameterName, System.Data.SqlDbType.VarChar,500));
 sqlCommand.Parameters.Add(new System.Data.SqlClient.SqlParameter(fileErrorNameParameterName, System.Data.SqlDbType.VarChar, 500));
 sqlCommand.Parameters[fileNameParameterName].Value = fileName;
 sqlCommand.Parameters[fileErrorNameParameterName].Value = fileErrorName;
   
 sqlCommand.Connection = sqlConnect;
 sqlCommand.CommandTimeout = commandTimeOut;
 System.Xml.XmlReader xmlReader = null;
 sqlConnect.Open();

 try
 {
  xmlReader = sqlCommand.ExecuteXmlReader();
  xmlReader.MoveToContent();
  string xmlProductionOrderIdNode;
    
  while (!xmlReader.EOF)
  {
   xmlProductionOrderIdNode = xmlReader.ReadOuterXml();
   sbForReturnedXML = sbForReturnedXML.Append(xmlProductionOrderIdNode);
  }

  }
  catch (System.Exception ex)
  {
   System.Diagnostics.Debug.WriteLine("Error Occured in Adapter " + ex.Message);
   throw ex;
  }

  return sbForReturnedXML.ToString();
 } 
}

The above is just basic ADO.NET code to call the stored procedure that will return back a small
XML message generated by the stored procedure. This XML message can contain any information. This small XML message will
be published into the MessageBox database. (More about this below).

The stored procedure configured in the receive location is executed. This is where the real processing of the flat file takes place.
Note: This stored procedure could be altered in any way to process the flat file. The below is just an example.

CREATE Procedure [dbo].[LoadInsertProductionOrders]

@pathAndFileName  varchar(500),
@pathAndErrorFileName varchar(500)

-- This stored procedure will take the passed path of the flat file (produced by the the adapter) to process.
-- For example c:\directory\FlatFile.txt.
-- The code in this procedure will then Bulk Insert this flat file into a Sql Server table.
-- This stored procedure was originally created in Sql 2000, but was moved to Sql 2005
-- for the following reasons:

-- a) Sql 2005 has introduced a new argument for Bulk Insert Clause [ [ , ] ERRORFILE = 'file_name' ] 
-- This will place malformed rows into an error file.

-- b) Sql 2005 has introduced Try Catch Blocks in T-Sql. In Sql 2000, if there were any malformed
-- rows in the flat file, an error would be raised to the .Net Helper Component and the operation would cease.
-- With the Try Catch Block in Sql 2005, no error is raised to the .Net Helper Conponent and processing continues.


As

-- Flat File rows will be inserted into this temp table.
-- Note: Could of used a Format File  file, but this temp table matches the structure of the flat file.
-- Eventually, the rows in this temp table will be inserted into a permanent table.

 

CREATE TABLE #tempInsertNewProductionOrders (
 [trk_unit_id] [int] NOT NULL ,
 [pro_product_id] [int] NOT NULL ,
 [actual_grade] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [actual_basis_weight] [float] NULL ,
 [actual_length_at_turnup] [float] NULL ,
 [actual_length] [float] NULL ,
 [actual_weight_at_turnup] [float] NULL ,
 [actual_weight] [float] NULL ,
 [required_width] [float] NULL ,
 [actual_width] [float] NULL ,
 [required_diameter] [float] NULL ,
 [actual_diameter] [float] NULL ,
 [actual_core] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [actual_property_1] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [actual_property_2] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [actual_property_3] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [update_timechain] [smallint] NULL ,
 [update_time] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [update_user_id] [int] NULL ,
 [position_index] [int] NOT NULL ,
 [comment] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [required_length] [float] NULL ,
 [required_weight] [float] NULL ,
 [actual_mfg_grade] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [actual_moisture] [float] NULL ,
 [actual_caliper] [float] NULL ,
 [actual_colour] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [actual_finish] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 [set_number] [int] NULL ,
 [position_percent] [int] NOT NULL ,
 [tare_weight] [float] NULL ,
 [user_scale_weight] [float] NULL ,
 [wire_side] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
 [trkc_length_adjust_type_id] [int] NULL ,
 [actual_compression] [float] NULL ,
 [actual_hardness] [float] NULL ,
 [sch_prod_order_id] [int] NULL ,
 [trk_set_item_id] [int] NULL ,
 [trk_unit_id_package] [int] NULL )


declare @return int
declare @rowcount int, @error int

declare @sqlStatement varchar(8000)


-- Create the dynamic sql that contains the Bulk Insert Statement to Bulk Load the flat file into the Temp Table
-- An Argument could of been used to control the max number of error rows. As below:
-- MAXERRORS [ = max_errors ]
-- MAXERRORS, Specifies the maximum number of errors that can occur before the bulk copy operation is canceled.
-- Each row that cannot be imported by the bulk copy operation is ignored and counted as one error. If max_errors is not specified, the default is 10.


set @sqlStatement = 'BULK INSERT #tempInsertNewProductionOrders'
set @sqlStatement = @sqlStatement +    ' FROM ' +  ''''+ ltrim(rtrim(@pathAndFileName))  +''''
set @sqlStatement = @sqlStatement +  ' WITH '
set @sqlStatement = @sqlStatement +     '  ( '
set @sqlStatement = @sqlStatement +  ' FIELDTERMINATOR = '  +   '''' + ',' + '''' + ','
set @sqlStatement = @sqlStatement  + ' ROWTERMINATOR = ' +  ''''  +  '\n'   + '''' + ','
set @sqlStatement = @sqlStatement  + ' ERRORFILE = ' +  ''''  +  ltrim(rtrim(@pathAndErrorFileName ))   + ''''
set @sqlStatement = @sqlStatement  +   ')'


-- Temp Table for Sending back results

create table #results (rowsprocessed int,
           ErrorCode int,
           BatchIdentifier varchar(50))

-- Create a unique GUID, that will be stored in each row.
-- This is to differentiate the rows from other batches.
-- This GUID, will be returned back to the Adapter in a small XML message:

declare @newid uniqueidentifier
set @newid = newid()
declare @UniqueIdentifier varchar(50)
set @UniqueIdentifier = replace(convert(varchar(50), @newid),'-','')

 

Begin Try
 -- Excecute the Bulk Insert Statement
 exec (@sqlStatement)

 -- Insert the rows from the temp table into the Permanet Table.
 -- For each row, also set the Batch Guid.
 Insert into InsertNewProductionOrders
 Select  #tempInsertNewProductionOrders.*,@UniqueIdentifier from #tempInsertNewProductionOrders

 Select @rowcount = @@rowcount,@error = @@error 
 Insert into #results values(@rowcount,@error,@UniqueIdentifier)
  
End Try
Begin Catch

    -- Catch any errors and re-raise 

    DECLARE @ErrorMessage NVARCHAR(400);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;


    SELECT @ErrorMessage = ERROR_MESSAGE();
    SELECT @ErrorSeverity = ERROR_SEVERITY();
    SELECT @ErrorState = ERROR_STATE();

    RAISERROR (@ErrorMessage, -- Message text.
               @ErrorSeverity, -- Severity.
               @ErrorState -- State.
               );
End Catch;

-- Send back a small informational XML message to the Adapter.
-- This XML message will be published in the Messagebox Database:

WITH XMLNAMESPACES ( DEFAULT 'http://SqlBulkInsert')
Select rowsprocessed,  ErrorCode,BatchIdentifier,
  (Select Distinct sch_prod_order_id  ProductionOrderId
   from #tempInsertNewProductionOrders  ProductionOrder
   For XML Auto ,Type )
from #results  Results
For xml auto, ROOT('BulkInsertResults')

-- Note with Sql 2005 can:
-- a) Place a Root node around the returned XML
-- b) Specify a Target Namespace.
-- c) Do nesting with an XML Auto Statement.

GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF

A sample of the XML returned from this stored procedure is as below:

The above XML message is then passed back to the Adapter. This Small message is then published into the MessageBox Database.
Note: When authoring this stored procedure, any information can be returned in this XML message.


Once this small XML message is published into the MessageBox database, any number of orchestrations can subscribe to this message. In this particular case, the main message (Now residing in a table in a Sql Server database) is to be split into separate messages by using the distinct production orders returned back in the above
XML message. A discussion of this pattern can be found here. The productions order records in the Sql Server table could also be transformed (mapped) by using Select Statements with XML Auto and XML Explicit clauses.

Results of Tests

The following hardware was used: Laptop with 2.0 GHz processor.
Windows XP operating System hosting a Windows 2003 VPC image with 1.3 MB of memory allocated.
This VPC image hosted the BizTalk 2006 Server and a Sql Server 2005 instance hosting the BizTalk databases.
The local Sql Server instance also hosted the database where Bulk Insert operation occurred.

Test for processing one Message

Only one flat file message was processed at a time. Only one implementation was tested at a time:
Either the PipeLine (Implementation one) or the SqlBulkInsert Adapter (Impelementation Two).


Results for Pipeline Processing

Size of Flat File Time for Pipeline Processing

XML File Size (Published into the MessageBox)

Number of Rows (Production Orders)  in Message
201 MB 25 Minutes 767 MB 807428
108 MB 13 Minutes 413 MB 435076
56 MB 7 Minutes 213 MB 224130
30 MB 4 Minutes 114 MB 120234
15 MB 3 Minutes 56 MB 62234
8 MB 2 Minutes 31 MB 29123


Results for SqlBulkInsert Adapter

Size of Flat File Time For SqlBulkInsert Adapter Load Number of Rows (Production Orders)  in Message
201 MB 3 Minutes 807428
108 MB 1 Minute 15 Seconds 435076
56 MB 50 Seconds 224130
30 MB 15 Seconds 120234
15 MB 8 Seconds 62234
8 MB 5 Seconds 29123


Discussion of Results

Implementation One: Processing with a Flat File Schema and Custom Receive Pipeline

This is the usual method to process a flat file in BizTalk. Besides disassembling (converting) the flat file into an XML format, receive pipelines perform many other important operations such as:
a) Decoding
b) Disassembling
c) Validation
d) Custom operations using Custom pipeline Components. One example would be Unzipping a file.
etc.

The Good

a) It works!
b) From a design point of view, this is the preferred method as the pipeline can perform a number
of operations on the original message before the final message is published into the messagebox database.
c) Much more complicated flat files can be disassembled in a pipeline. The flat file used in the demo, is simple in structure.
An example of a more complicated delimited and positional flat file is as below, with Header, Detail and Footer Records:

PO1999-10-20
US        Alice Smith         123 Maple Street    Mill Valley    CA 90952
US        Robert Smith        8 Oak Avenue        Old Town       PA 95819
Hurry, my lawn is going wild!
ITEMS,ITEM872-AA|Lawnmower|1|148.95


The Bad

For large Flat File Messages:
a) the operation is somewhat slow (see times above),
b) CPU is pinned the entire time the flat file is being processed in the pipeline.
c) The original flat file message when converted to a XML, more than triples in Size, for example:
201 MB (flat file) transforms to 767 MB (XML). Depending on the operation being performed, a 767 MB message may be unmanageable (for example as a source message in a map).


Implementation Two : Processing with the SqlBulkInsert Adapter

This implementation was discussed in detail above.

The Good

a) Much faster in pre-processing the large messages (see times above).
b) Will not pin the CPU on the BizTalk machine (pipeline processing is skipped)
c) Large XML messages are not published into the messagebox database.
d) There really is not much to this -> A custom adapter and a stored procedure.


The Bad

a) A more complex flat file (see complex flat file message above) cannot be processed with this implementation. This is a limitation of the Bulk Insert Statement.
Other techniques could be investigated to Load structurally more complex flat files into Sql Server including:
i) DTS (Data Transformation Services). Sql 2000 and Sql 2005
ii) Sql Server Integration Services. Sql 2005
b) For the initial message the receive pipeline processing is skipped. If the incoming flat file was encoded or zipped up then the incoming flat file could be first be processed by another receive location/pipeline to decode or unzip, and then routed to a folder where the receive location is configured to use the SqlBulkInsert adapter.
c) The process (orchestration) that subscribes to the small XML message published into the MessageBox, has to be configured to connect to the database where the main message is stored in a relational sql table. This might mean just configuring Solicit - Send Send ports using a Sql Adapter
d) If using a Bulk Insert Statement to load in the flat file, Sql 2005 might have to be used. (See comments in the Stored Procedure) 


Final Thoughts

I have only spent about two hours on the adapter and it is not ready for production (It is a prototype).
I have also tested by processing two large messages simultaneously. Therefore if you are interested in 
using it, download the code and modify or just start from scratch. The code in the adapter is relatively straightforward.

The adapter was originally created with the BizTalk Server 2004 Adapter Wizard and uses the common adapter base classes and interfaces.

This technique (Adapter) could also be used to Bulk load an XML message into Sql Server.
Download the Code HERE. Read the ReadMe.Txt before installing and running.

 

 

 

 

 

Will Windows Workflow Foundation Replace BizTalk Server ?

The answer is no. Here is an excerpt from David Chappell's excellent whitepaper: Introducing Microsoft Windows Workflow Foundation: An Early Look 

Use Windows Workflow Foundation when:

An application will itself host workflows. Windows Workflow Foundation lets workflow be built into an application, allowing the workflow to be deployed and managed as a native part of the application. Because it's focused on integrating diverse applications rather than providing a general workflow framework, BizTalk Server always runs orchestrations within the BizTalk Server process.

The business process being implemented requires human workflow. BizTalk Server addresses system workflow, and so it lacks Windows Workflow Foundation's support for things such as state machine workflows and dynamic update. A scenario that requires both human workflow and more complex system integration services could be addressed by using Windows Workflow Foundation and BizTalk Server together, however. For example, the Office "12" support for document-centric workflows, based on Windows SharePoint Services, might be used for the human aspects of the problem, while BizTalk Server handles the system integration aspects. The two can interoperate using the BizTalk Server Adapter for SharePoint.
The workflow will execute on a client system. BizTalk Server is a server-focused product, and so it's less well-suited to run on desktop machines.


Use BizTalk Server when:

Solving an EAI problem that requires communication with diverse applications on diverse platforms. Because of its focus on cross-platform integration, a large set of adapters is available for BizTalk Server that allows communication with a range of other software. Windows Workflow Foundation is focused solely on workflow, not EAI, and so it doesn't provide these things.
B2B services are required. Windows Workflow Foundation doesn't address this area, while BizTalk Server provides tools for working with trading partners, accelerators for RosettaNet, SWIFT, and other industry standards, and more.
BPM services, such as Business Activity Monitoring (BAM), are required. While the Windows Workflow Foundation tracking infrastructure can be used to create these services, BizTalk Server provides important extras, such as tools that let information workers define their own BAM views.
A complete management infrastructure and support for increased scalability are required. Unlike Windows Workflow Foundation, BizTalk Server includes a full set of tools for administering and scaling a production environment.

 

 

BizTalk 2004 and 2006 -> Processing a large message and a faster, less CPU intensive splitter pattern.

The following is a comparison of splitter orchestration patterns using BizTalk 2004 SP1 and BizTalk 2006 Beta 1. You can download the samples discussed below at the end of this blog entry and install and test on a BizTalk 2004 installation and/or BizTalk 2006 installation. Included in the download is a large test message (approximate size of 133 MB).

Some details of the tests are as below:

1) All orchestration implementations split or debatch the same message.
2) The xml message to split is approximately 133 MB in size (on disk)
3) There are a total of 80570 production orders in the message.
4) An abbreviated message with just one complete production order looks like the below.

5) Each production order in the message contains a production order id. For example: sch_prod_order_id = 10031
6) There are 51 distinct production order ids in the message.
7) In each splitter pattern implementation, 51 messages are split out from the main message, as defined by the 51 distinct production order ids. Each split message contains all the productions orders with one distinct production order id.
8) The size of the split messages range in size from 111KB to 23MB.
9) Four Different Orchestration Splitter pattern implementations were tested. The patterns were tested on a BizTalk 2004 SP1 installation with 1GB of memory and a BizTalk 2006 Beta 1 installation. All BizTalk databases (i.e. MessageBox etc.) ran on the same physical machine as the BizTalk servers. My BizTalk 2006 Beta 1 installation was installed on a VPC image. The image had only 700 MB of memory assigned to it, and needed more memory for a fair comparison with the BizTalk 2004 SP1 installation. Therefore the results for the BizTalk 2006 installation are incomplete.

Implementation One : Orchestration Splitting with a Map

The details on this splitter pattern can be found HERE. To quickly recap, a large production order message is processed by an orchestration and first run through a map utilizing custom XSLT to produce an xml message with the distinct production order ids as below:

<ns0:DistinctProductionOrders xmlns:ns0="http://SqlSplitterHelper">
  <ProductionOrder ProductionOrderId="10001" />
  <ProductionOrder ProductionOrderId="10003" />
  <ProductionOrder ProductionOrderId="10002" />
</ns0:DistinctProductionOrders>
  
Each iteration of a loop in the orchestration is used to read a distinct production order id. Additionally, in each iteration of the loop, a map is used  to filter out a split message containing productions orders with that iterations distinct production order id.

Implementation Two : Orchestration Splitting using a SQL Server database

One of the problems faced with Splitting the large messsage using a map, is the CPU utilization of the BizTalk server. During the life of a splitter map orchestration instance, the CPU is maxed out at 100% (Note: This was true with the BizTalk 2004 installation but did not seem to be with the BizTalk 2006 installation). If there is only one BizTalk Server in the group, this limits the number of orchestration instances that can be running simultaneously. Therefore why not use something else to split the message and not rely on CPU resource intensive XSLT? One approach is to use a SQL server database to help split the large message. Querying 80000 or more rows is fast and efficient in Sql server and each message can easily be split using a Select Statement (with a For XML Auto or XML Explicit clause). One of the problems is how to get the 133 MB message from a BizTalk Orchestration into a Sql Server database efficiently (explained below). Below are the steps done in the orchestration to get this pattern to work:

1) The large production message comes into the Sql splitter orchestration.
2) The large message is assigned to a variable of type : System.XML.XMLDocument

varXMLDomMessageToProcess = msgProductionOrdersToSplit;
 
3) A .Net helper component was called from the orchestration passing as one of the parameters the OuterXML of System.Xml.XmlDocument variable as below:

varstrReturnedXML =  varObjSqlHelper.GetDistinctProductionOrders(varXMLDomMessageToProcess.OuterXml,varStrSqlConnectString,varGuidOrchestrationInstanceID);

Note: Ultimately a stored procedure is called by the .Net helper code to process the large production order message. A .NET helper component (called within an Atomic Scope shape) is used to call the stored procedure in place of Send/Receive shapes and a Send port using a Sql Adapter for the following reasons:
a) To cut down on the number of items that have to be persisted.
b) Passing the large string of XML to the stored procedure via a Send Port and Sql Adapter did not seem to work with the large message (Note: I did 
not try this with BizTalk 2006 only BizTalk 2004).

4) The .Net Helper method, receives the string of XML (Note: This is the entire 133 MB message) and calls a stored procedure passing as one of the parameters the string of XML (Note: The stored procedure is called only once with the entire 133MB message). The Stored procedure then inserts all the production orders into a Sql Server relational table. It also returns back a string of XML with a list of distinct production order ids. The code for the .Net helper component that is called from the orchestration is as below:

public string GetDistinctProductionOrders(string xmlProductionOrderMessage,
      string connectionString,
     System.Guid orchestrationIdentifier)
{
 string distinctOrdersXML;
 // Set up SqlConnection,SqlCommand and Parameters to call Stored procedure -> InsertProductionOrders
 SqlConnection sqlConnect = new SqlConnection();
 sqlConnect.ConnectionString = connectionString;
 SqlCommand sqlCommand = new SqlCommand();
 sqlCommand.CommandType = System.Data.CommandType.StoredProcedure;
 // Below is name of the Stored Procedure that will be called
 sqlCommand.CommandText = "[InsertProductionOrders]";
 // Add the Parameters needed to call the stored procedure
 sqlCommand.Parameters.Add(new System.Data.SqlClient.SqlParameter("@xmlString", System.Data.SqlDbType.Text));
 sqlCommand.Parameters.Add(new System.Data.SqlClient.SqlParameter("@orchestrationIdentifier", System.Data.SqlDbType.UniqueIdentifier));
 // Set the XML passed to the method to the @xmlString Parameter that will be receive by the stored proc.
 sqlCommand.Parameters["@xmlString"].Value = xmlProductionOrderMessage;
 // Set the UniqueIdentifier (Guid) so calls to splitter proc will get correct records for that orchestration instance
 sqlCommand.Parameters["@orchestrationIdentifier"].Value = orchestrationIdentifier;
 sqlCommand.Connection = sqlConnect;
 // This will take a few minutes, so set to 0 to wait indefinitely
 sqlCommand.CommandTimeout = 0;
 // open connection to DB
 sqlConnect.Open();
 // Use an XmlReader to get back the Distinct XML of ProductionOrdersIds from the Stored Procedure
 System.Xml.XmlReader xmlReader = sqlCommand.ExecuteXmlReader();
 // Now use the XmlReader to get the XML returned by the stored procedure
 xmlReader.MoveToContent();
 string xmlProductionOrderIdNode; 
 System.Text.StringBuilder sbForReturnedXML = new System.Text.StringBuilder();
 while (!xmlReader.EOF)
 {
   xmlProductionOrderIdNode = xmlReader.ReadOuterXml();
   sbForReturnedXML = sbForReturnedXML.Append(xmlProductionOrderIdNode);
 }
 // Get all the XML from the String Builder
 distinctOrdersXML = sbForReturnedXML.ToString();
 xmlReader.Close(); 
 sqlConnect.Close();
 // Return the Distinct list of Production orders in XML Format
 // Note: The calling orchestration will add the Root node and targetnamespace to this xml document.
 return distinctOrdersXML;

5) The .NET helper component from above passes the large message down to a Sql Server Stored procedure (all 133MB in one call to the stored procedure).
A Sql Server 2000 database was created with one table and two stored procedures to perform the initial insert of the XML into a relational table and to 
split out the messages. The objects inside of the database are:

Table -> InsertNewProductionOrders. To hold the contents of the production order XML in a relational table.
Stored Procedure -> InsertProductionOrders. Uses an OpenXML statement to take the XML and insert into the InsertNewProductionOrders table.
Stored Procedure -> GetSplitProductionOrderMessage. Is called to split out each individual message.

The Stored procedure (InsertProductionOrders) inserts the XML message into relational table (InsertNewProductionOrders) using an OpenXML statement. It 
then executes a select statement (with a For XML clause) to return the list of distinct orders in the table. Other benefits of using OpenXML with BizTalk can be found HERE and HERE.

Stored procedure -> InsertProductionOrders is as below:


Create Procedure InsertProductionOrders

-- Will take string of passed XML and insert into a table

@xmlString Text, -- This is the String of XML for all the production orders
@orchestrationIdentifier uniqueidentifier  -- Identifier for matching records to correct orchestration 
instance

as

DECLARE @iDocProdOrders int

-- Use OpenXML to insert the orders into the table

EXEC sp_XML_PrepareDocument @iDocProdOrders OUTPUT, @xmlString, 'xmlns:ns0="'">http://SqlSplitterHelper"/>'
if @@error <> 0 goto err_handler

Insert Into InsertNewProductionOrders
Select *, @orchestrationIdentifier 
From OpenXML (@iDocProdOrders,'//ns0:ProductionOrder',1)
              With (trk_unit_id int '@trk_unit_id',
 pro_product_id int '@pro_product_id',
 actual_grade varchar (80) '@actual_grade',
 actual_basis_weight float '@actual_basis_weight',
 actual_length_at_turnup float '@actual_length_at_turnup',
 actual_length float '@actual_length',
 actual_weight_at_turnup float '@actual_weight_at_turnup',
 actual_weight float  '@actual_weight',
 required_width float  '@required_width',
 actual_width float  '@actual_width',
 required_diameter float '@required_diameter',
 actual_diameter float  '@actual_diameter',
 actual_core varchar (80) '@actual_core',
 actual_property_1 varchar (255) '@actual_property_1',
 actual_property_2 varchar (255) '@actual_property_2',
 actual_property_3 varchar (255) '@actual_property_3',
 update_timechain smallint '@update_timechain',
 update_time datetime '@update_time',
 update_user_id int  '@update_user_id',
 position_index int  '@position_index',
 comment varchar (255) '@comment',
 required_length float '@required_length',
 required_weight float '@required_weight',
 actual_mfg_grade varchar '@actual_mfg_grade',
 actual_moisture float '@actual_moisture',
 actual_caliper float '@actual_caliper',
 actual_colour varchar '@actual_colour',
 actual_finish varchar '@actual_finish',
 set_number int  '@set_number',
 position_percent int '@position_percent',
 tare_weight float  '@tare_weight',
 user_scale_weight float '@user_scale_weight',
 wire_side char (3) '@wire_side',
 trkc_length_adjust_type_id int '@trkc_length_adjust_type_id',
 actual_compression float  '@actual_compression',
 actual_hardness float  '@actual_hardness',
 sch_prod_order_id int  '@sch_prod_order_id',
 trk_set_item_id int '@trk_set_item_id',
 trk_unit_id_package int '@trk_unit_id_package')  
if @@error <> 0 goto err_handler  

EXEC sp_XML_RemoveDocument @iDocProdOrders
if @@error <> 0 goto err_handler

-- Now return back the list of Distinct Production Orders from the
-- table. The orchestration instance id is used to return back the correct Production orders
-- if two or more orchestration instances are running simultaneously.

Select distinct sch_prod_order_id as 'ProductionOrderId'
From  InsertNewProductionOrders as ProductionOrder
Where  OrchestrationIdentifier = @orchestrationIdentifier
For XML Auto

Return 0

err_handler:

Return -1


6) As with the map splitter orchestration, each iteration of a loop shape processes a distinct ProductionOrderId. (Note: the list of distinct ProductionOrderId's was returned by the InsertProductionOrders stored procedure discussed above). In each iteration of the loop, stored procedure GetSplitProductionOrderMessage is called to retrieve a split message containing productions orders with one distinct production order id. A Send and  Receive Shape are used in the orchestration that use a Static Solicit - Response Send port to call stored procedure GetSplitProductionOrderMessage,passing it the following parameters:

@ProductionOrderID int,  -- To filter out the production orders for that message.
@orchestrationIdentifier uniqueidentifier, -- GUID of orchestration instance.
@deleteIndicator  int = 0 -- To indicate whether to delete this orchestrations instances records,
-- so as not to bloat table InsertNewProductionOrders with unecesarry records.

Stored procedure GetSplitProductionOrderMessage, then uses a For XML Auto clause to return back the split message as below:
 
Create Procedure GetSplitProductionOrderMessage

@ProductionOrderID int,
@orchestrationIdentifier uniqueidentifier,
@deleteIndicator  int = 0

as

Select *
From  InsertNewProductionOrders As  ProductionOrder (nolock)
Where sch_prod_order_id = @ProductionOrderID
and OrchestrationIdentifier = @orchestrationIdentifier
For XML Auto -- , XMLData
if @@error <> 0 goto err_handler

if @deleteIndicator = 1
Begin
 Delete InsertNewProductionOrders
 Where OrchestrationIdentifier = @orchestrationIdentifier
 if @@error <> 0 goto err_handler
End


Return 0

err_handler:

Return -1

Implementation Three and Four : Singleton Orchestrations Instances of the Map Splitter and Sql Helper Splitter

For those not familiar with BizTalk, multiple instances of the same orchestration type can be running simultaneously. For example, if the Map Splitter Orchestration subscribes to a receive port that has just accepted two large production order xml messages, two separate running instances of the Map Splitter Orchestration will be launched simultaneously to process the two messages (Note: this is not always true and is dependent on server load). Both the Splitter Orchestration using a map and Splitter Orchestration using a Sql Helper Database are resource intensive at certain point(s) during their lives as orchestration instances. If multiple orchestration instances are running simultaneously on the same BizTalk host instance machine, then at some point the host instance may run out of memory or CPU cycles. Details about when and why these patterns are resource intensive will be discussed below in the results section. To help alleviate the problem, one solution is to configure an orchestration to limit itself to one running instance. One straightforward method is as below:

1) A correlation type in an orchestration is created as below. CorrelationTypeForSingletonReceive is configured to correlate for all incoming messages on a physical receive port that an orchestrations logical receive port is bound to.

2) A Correlation Set is created in the orchestration that derives from the correlation type that was discussed above:

3) The first Receive Shape in the orchestration will initialize the above correlation set. No more orchestration instances of this type will be launched until this orchestration has completed.


4) An outer loop (Loop for Singleton) is then invoked with an expression of 1 == 1. This never ending loop will process all incoming production order messages, but will process only one at a time. Two nested loops are contained within the outer loop. Loop for Distinct Orders (collapsed in the below image) contains the necessary shapes to split and send out messages. Loop For Subsequent Messages will then pick up a new production order message that is ready for processing, but will only do so when the previous message has been completely processed and split into the 51 separate  messages. A receive shape in this loop has its Follow Correlation Set property set to the Correlation set that was derived in step 2) and initialized by the first receive shape of the orchestration. This will pick up a new production order message to process.

5) Once a new incoming message is picked up, it will then go to the top of the outer loop to be split into separate messages.

6) One problem with the above singleton orchestration, is that new production order messages may be received and queued up in the messagebox while the singleton orchestration is processing another production order message. These new production order messages are subscribed to by the running singleton orchestration instance. But if this orchestration instance fails, then these queued up unprocessed messages will be suspended, in an unconsumed state. These Zombie messages will then have be dealt with. In this implementation, a Catch block is used to catch any errors in the orchestration. Inside the catch block is a loop shape. A receive shape in this loop has its Follow Correlation Set property set to the Correlation set that was derived in step 2) and initialized by the first receive shape of the orchestration. This loop will then pick up any of the unconsumed messages. These unconsumed messages could then be submitted back to the singleton  orchestration. Additionally in BizTalk 2006, Suspended messages can be subscribed to by orchestrations, so it is possible that this technique could be used.

Results of Splitter Tests

Note: for all tests a series of System.Diagnostics.Debug.WriteLine were used in the orchestrations to capture the progress and processing times of the orchestration instances.
DebugView, was used to view the WriteLine statements in real time.

BizTalk 2004 Results

The following hardware was used: Laptop with 2.0 GHz processor and 1GB of memory.
Operating System :  Windows XP SP2
This machine hosted the BizTalk server and a Sql Server with all the relevant BizTalk databases. The SplitterHelper database that is used by the orcSplitterUsingSql.odx and orcSplitterUsingSql_Singleton.odx orchestrations was installed on the BizTalk Server machine, but separate tests were also done with a remote SplitterHelper database installed on a Laptop with a 866 MHz processor and 512MB of memory. The BizTalk server machine and the remote Sql Server database were connected over a LAN.

Tests for processing One Message:

In the below result table, only one 133 MB production order message was processed at a time. Only one orchestration was enlisted and started for each test. Total times begin when an orchestration instance starts and ends when the entire message has been split. They do not include the initial publication of the message into the messagebox. The split messages were not sent out through a send port.
Note: The orchestrations can be configured to send out the split messages.
Note: Orchestrations orcSplitterUsingMap_Singleton.odx and orcSplitterUsingSql_Singleton.odx are not included in the results because only one message  is being processed. They will have near identical results to orcSplitterUsingMap.odx and orcSplitterUsingSql.odx.

Below is a legend for the tests:

All times are in minutes:seconds, for example 3:45

Orchestration Name:
The name describes the pattern used.

Local SplitterHelper DB:
For the Map pattern orchestrations, Not Applicable
For the Sql pattern orchestrations, YES if a local SplitterHelper database was called on the BizTalk machine, NO if the SplitterHelper database was  called on the remote Sql Server SplitterHelper database.

Get Distinct Orders:
For the Map pattern orchestrations, time to generate the distinct production order xml message using a map with a Custom XSLT template.
For the Sql pattern orchestrations, time when the orchestration instance starts to when the .Net helper component returns the distinct production order xml message to the calling orchestration. This time includes the SplitterHelper Database processing.

Time to Split:
For the Map pattern orchestrations, time to split out all 51 messages using a map.
For the Sql pattern orchestrations, time to split out all 51 messages using the SplitterHelper database GetSplitProductionOrderMessage stored procedure.

Total Time:
For all patterns, total time to process the entire message.

Orchestration Name Local SplitterHelper DB Get Distinct Orders Time to Split Total Time
orcSplitterUsingMap.odx N/A 8:36 13:49 22:15
orcSplitterUsingSQL.odx Yes 3:01 3:54 6:55
orcSplitterUsingSQL.odx No 4:23 1:20 5:43


Discussion of Results

orcSplitterUsingMap
The CPU was pinned at 100% the entire time this orchestration was running. The map to get the distinct list of production orders and the map to split out the orders are responsible for the CPU utilization.

The good:
a) The whole splitting operation is contained within this orchestration. This simplifies administration and maintenance operations.

The bad :
a) With The CPU pinned at 100%, this limits and/or slows down other operations on the BizTalk server.
b) The total time for the orchestration to complete is not stellar at over 22 minutes.

orcSplitterUsingSql (local SplitterHelper DB)

While the above orcSplitterUsingMap is CPU intensive, the orcSplitterUsingSql is memory intensive, for the following reasons:
a) The operation to load the message in an XML Dom in the orchestration, uses a large amount of memory as the OuterXml is passed to the .NET helper component.
b) The OpenXML call in stored procedure -> InsertProductionOrders , uses a large amount of memory  (basically the same reason as above) and is also CPU intensive. 

The good:
a) This pattern is much faster. Taking only approximately 5-6 minutes to complete.
Note: Sometimes this pattern took longer, up to 12 minutes, because BizTalk was hogging all the memory and  would not give up enough memory to Sql Server and the SplitterHelper database. But the remote Splitter Database had much better results. See below.
b) The CPU is not pinned at 100%.

The bad:
a) Uses quite a bit of memory. If more than one orcSplitterUsingSql instance is running simultaneously on the same server instance, then there will be out of memory issues. (This will be discussed in more detail below)
b) The Sql OpenXML call is memory and CPU intensive. (But the SplitterHelper database can be installed on another machine than the BizTalk Server. See below )
c) The SplitterHelper database is just something else to maintain. Even though it is only a temporary/scratch database, for a fully fault tolerant  solution, this database should be installed in a clustered environment.
d) The structure of this production order message is simple and the Sql OpenXML call is relatively straightforward to write. But with a more complicated message, there would be more T-SQL to write in the stored procedures.
e) The split messages in this case are simple in structure and a For XML Auto clause is sufficient to return the proper xml. For more structurally complex split messages, a For XML Auto clause might not be able to return the desired XML. To get more control of the returned XML, a For XML Explicit clause can be used.   

Note: The new XML features in Sql 2005 will help tremendously in processing XML messages in Sql Server.

orcSplitterUsingSql (remote SplitterHelper DB)

The above results for orcSplitterUsingSql (local SplitterHelper DB) can be applied here also. The only difference is that the SplitterHelper database is located on a different physical server than the BizTalk Server. When the Sql  OpenXML call is executed, the high memory and CPU needs of this operation will be applied to this remote sql server instead of the BizTalk Server.  This helps the BizTalk server a great deal as this heavy Sql OpenXML processing is delegated to another server. Comparing times, orcSplitterUsingSql (remote SplitterHelper DB) in general has better times (4-5 minutes)  compared to the (5-6) minutes for the orcSplitterUsingSql (local SplitterHelper DB). Obviously having the SplitterHelper database on another physical machine helps tremendously. The Sql Server installed on the BizTalk machine has to process the persistence points and other BizTalk operations. Therefore having the SplitterHelper database installed on another machine really helps.
Note: Better times can be realized, the remote lightweight machine hosting the SplitterHelper database was a laptop with a only a 866MHz processor and 512MB of memory.

Test for processing Two Simultaneous Messages:

The same hardware set up was used as the tests for processing one message.

For more of a real world scenario, two identical 133 MB large messages were submitted simultaneously to be processed at the same time. Only one orchestration type was enlisted and started for each test. Only the total time to split the two messages are recorded in the below table. Total times begin when the first orchestration instance starts and ends when the two messages have been split. They do not include the initial publication of the message into the messagebox. The split messages were not sent out through a send port. A file receive location was used to submit the two messages at the same time. The receive location was first disabled, two 133MB messages were copied into the folder. The receive location was then enabled.

Orchestration Name

Local SplitterHelper DB

Total Time To Process
orcSplitterUsingMap.odx

N/A

45:23
orcSplitterUsingMap_Singleton.odx

N/A

42.11
orcSplitterUsingSQL.odx

Yes

22:49
orcSplitterUsingSQL.odx

No

22:46
orcSplitterUsingSQL_Singleton.odx

Yes

13:54
orcSplitterUsingSQL_Singleton.odx

No

11:38


Discussion of Results


orcSplitterUsingMap

Two instances of this orchestration were launched simultaneously. While watching the debug window, the two orchestrations ran neck and neck, basically splitting the exact same message at the same time. The memory on my machine was just about maxed out. If I added a third message to be processed, I do believe the machine would have run out of memory. The time to split two messages basically doubled, from splitting one message.

Goods and Bads: Basically the same comments as with the one message tests (see above)

orcSplitterUsingMap_Singleton

As explained above, this orchestration is configured to only ever allow one running instance of itself. Therefore it had similar results to the single message tests. The times were just doubled as two large messages were processed sequentially. From above, two instances of the orcSplitterUsingMap could be running at the same time, but any more and the orcSplitterUsingMap_Singleton will have to be used.

Goods and Bads: Basically the same comments as with the one message tests (see above)

orcSplitterUsingSql (local SplitterHelper DB)  and orcSplitterUsingSql (remote SplitterHelper DB)

Grouped these two together because they had very similar results. This was fun to watch in the debug window. Two orcSplitterUsingSql orchestration instances would be started up at almost exactly the same time. I then had debug statements to see when the .Net helper component was called by the orchestration instances. Both orchestration instances would then try to call it a the same time, but there was just not enough memory to go around, and one instance eventually won out and started to call the stored procedure on the SplitterHelper Database. In HAT, I could see the losing orchestration instance dehydrating and rehydrating for another retry at the atomic scope shape. The losing instance upon rehydration, would then start at the top of the orchestration (last persistence point). This pattern repeated a few times, with the dehydration and rehydration of the loser orchestration instance. The winning orchestration instances call to the .Net helper component completed and now the losing orchestration instance started again to call the .Net helper component. The winning orchestration instance was now in a position to start splitting messages. The winning instance would split between 0 and 17 messages, and the loser instance would repeat the cycle of trying to call the .Net Helper  component, dehydrate and then rehydrate. Eventually, nothing was happening, but then the BizTalk host instance automatically recycled itself (see below  error messages and warnings from the event log). After the host instance recycled and started up, the loser orchestration instance successfully called  the .Net Helper component, and then the two orcSplitterUsingSql instances happily started calling the Splitter Database at the same time to split their respective messages.

Below are some of the errors from the event log before the automatic recycling of the BizTalk host instance:

The Messaging Engine received an error from a transport adapter while notifying it with the BatchComplete event.

The Messaging Engine failed while writing tracking information to the database

The Messaging Engine encountered an error while deleting one or more messages

The Messaging Engine failed to retrieve the configuration from the database

There was a failure executing the receive pipeline: "Microsoft.BizTalk.DefaultPipelines.XMLReceive" Source: "XML disassembler" Receive Location:  "SQL://redlake/SplitterHelper/" Reason: Exception of type System.OutOfMemoryException was thrown

The Messaging engine failed to process a message submitted by adapter:SQL Source URL:SQL://redlake/SplitterHelper/. Details:Could not find a matching  subscription for the message. This error occurs if the subscribed orchestration schedule or send port has not been started, or if some of the message properties necessary for subscription evaluation have not been promoted. Please refer to Health and Activity Tracking tool for more detailed information on this failure

Finally two warning messages appeared in the event log, announcing the recycling of the BizTalk server instance:

All receive locations are being temporarily disabled because either the MessageBox or Configuration database is not available. When these databases become available, the receive locations will be automatically enabled.

An error has occurred that requires the BizTalk service to terminate. The most common causes are an unexpected out of memory error and an inability to connect or a loss of connectivity to one of the BizTalk databases.  The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.

Note: No split messages were lost or duplicated when the host instance was recycled. BizTalk maintains the state of the orchestration instances with  persistence points.

Note: I sometimes became impatient waiting for the host instance to recycle itself. I then went into the Windows Task Manager and clicked on the Processes Tab. I would them find BTSNTSvc.exe process, right mouse buttoned on it to launch the pop up menu and clicked on End Process to terminate  the BizTalk host instance (Note: This is a real nasty thing to do and I do not recommend it). After manually terminating the BTSNTSvc.exe process the host instance automatically restarted itself and then finished off processing the two orchestration instances.

Note: As explained above, the two orchestration instances were able to successfully complete after the host instance was recycled. One of the problems (I think), was that the winning orchestration instance was still holding onto a good chunk of memory (even after it called the .Net Helper component). But after the host instance recycled, the losing orchestration instance was able to get enough memory to process.

Note : As explained above the .Net Helper component is called within an Atomic Scope shape. A property on the Atomic Scope shape is called Batch and can be set to either True or False. If this property was set to True, and two orcSplitterUsingSql instances are running, only one instance is allowed to call the .Net Helper component and the other has to wait (I observed this by having a Debug statement inside the  .Net Helper component). If it is set to False, then by watching the debug statements, both can call it simultaneously. Either way, true of false, the host instance had the same behavior of recycling itself. Setting it to True might be more beneficial in this case. HERE is a scenario where you might want to set it to False.

Note: Sometimes both orchestration instances dehydrated and rehydrated a number of times before one successfully called the .Net Helper component. The different patterns of dehydrating and rehydrating were observed when the batch property on the atomic scope shape was changed and when the two orchestration instances were launched at different times.

Note: HERE is another scenario similar to the above on processing and mapping a large message.

The good:
a) The good news is that no messages were lost, due to persistence points, and both orchestrations instances successfully completed. But other than that, there is really nothing good here.

The bad:
a) Recycling of BizTalk Hosts instances on a regular basis, is not acceptable in a production environment.
b) This is not a good pattern for multiple simultaneous large messages to be processed. If there was multiple instances running in the Host, then they could share the load and take on one large message each. But at some point there might be more large  messages to process then available host instances (also you might run out of money buying more Enterprise versions of BizTalk Server). More memory on the BizTalk machine might help also. Moving the BizTalk databases to another physical machine would also help.
Note: One fix is to allow only one running singleton instance of this orchestration (see below).

orcSplitterUsingSql_Singleton (local SplitterHelper DB)  and orcSplitterUsingSql_Singleton (remote SplitterHelper DB)

As explained above, this orchestration is configured to only ever allow one running instance of itself. Therefore it had similar results to the single message tests. The times were just doubled as two large messages were processed sequentially.

The Good:
1) This pattern can handle simultaneous large  messages, by processing one at a time.
2) This pattern is much less CPU intensive then the map patterns.
3) Other orchestrations and operations can be running on the BizTalk server, while the sql splitter orchestration instance is splitting messages.
4) The host did not recycle, because only one orcSplitterUsingSql_Singleton orchestration instance is ever running.

The Bad:
Basically the same comments as with the one message tests (see above)

The overall winner:

If more than one large message is being submitted in the same timeframe, then the singleton orchestrations must be used. This is somewhat dependent on hardware for each BizTalk instance and the number of BizTalk instances assigned to the host running the orchestrations. The performance and time winner is clearly the orcSplitterUsingSql_Singleton (remote SplitterHelper DB), but the downside to this method is that a SplitterHelper database must be maintained and the extra code (.Net and T-Sql) that must be written to get this to work. Additionally, performance will be improved if the SplitterHelper database is installed on a dedicated machine.

BizTalk 2006 Tests

Some of the same tests were performed against the BizTalk 2006 installation. BizTalk 2006 was installed on a VPC Windows 2003 image (running on a Windows XP host) with only 700MB assigned to the VPC image. The BizTalk 2006 installation was handicapped compared to the BizTalk 2004 installation and not all the tests were run. Until the tests using BizTalk 2006 can be performed on the same hardware as the BizTalk 2004 installation, the comparisons are not accurate.

The patterns I do have results for are as below (one message being processed):

orcSplitterUsingMap -> 48:36
orcSplitterUsingSql (remote SplitterHelper DB) -> 17:01

Some of the observations I did make are:
The map to get the distinct list of production orders was much quicker on BizTalk 2006. 2 minutes compared to over 8 minutes on BizTalk 2004. When Splitting the messages using a map, the CPU was not pinned, but this mapping operation was very slow. I am attributing it to the fact that BizTalk 2006 being installed on a VPC image with only 700MB of memory assigned to it. 

Final Thoughts

From reading -> Large messages in BizTalk 2004, what's the deal? it discusses 
performance problems processing large messages with BizTalk 2004. Therefore, something to consider is using Sql Server to help with message processing in BizTalk:
1) This post has discussed, a BizTalk splitter pattern that has used Sql Server to help process a large message.
2) This POST discusses using Sql Server to help with a BizTalk aggregator pattern.
3) This POST discusses processing a large flat file using a Sql Server BCP operation to help split the message. 
4) A future title of a post could be: Extending BizTalk Mapping and Transformations using Sql Server 2000/2005. A good demo for this future post, would be passing two large XML messages down into a Sql Server stored procedure, then using T-Sql to perform the actual mapping of the messages.
5) As mentioned above. Sql 2005 has quite of number of improvements and enhancements for processing XML.

I really would like to try the tests discussed in this entry on a proper BizTalk 2006 installation (I just do not have the time or spare machine to do so at the moment).  From reading the documentation and from my few observations, there are real performance improvements with large messages and BizTalk 2006. 

Please read all the posts in the below blogs. These are written by the BizTalk team members at Microsoft. Having an understanding of how BizTalk works internally is a must before you start your first BizTalk project.
BizTalk Performance Blog:
Core Engine Blog

The below whitepaper is a good read for performance and BizTalk
Microsoft BizTalk Server 2004 Performance Characteristics Whitepaper

You can download the samples discussed above HERE LINK and try on a BizTalk 2004 and/or BizTalk 2006 installation. Read the ReadMe.Txt before installing and running.

 

 

 


 

BizTalk Schema Presentation

I am doing a BizTalk Presentation on Wed June 29 in Toronto.
Details can be found below.

Theme for the night -> BizTalk Schema DeepDive

First Presentation:

An Introduction to BizTalk Schemas

One of the first steps in designing a Messaging or Integration system is to have an understanding of the messages that will exchanged between the various participating applications.A paradigm to describe this is SOA (Service Oriented Architecture) and Contract First. In BizTalk, Schemas are used to define the XML, Flat File, EDI, and other types of messages that will be exchanged between the various applications.

Some of the topics discussed and demonstrations will include:

1) Why we need Schemas from a Business perspective and BizTalk perspective.
2) Various methods to create XML Schemas in the BizTalk development environment.
3) Using xsd schemas from other sources.
4) Uses for Schemas in BizTalk.
5) Property Schemas in BizTalk.
6) How to split messages in BizTalk using an Envelope Schema.

Second Presentation:

BizTalk Schemas and Non XML Message Formats

In Today's world, many different types of messages formats exist, for example Flat File, EDI, HL7. A .NET object can also be used to represent and store information about a particular message. In BizTalk, there are many mechanisms in place to create and consume these various types of messages.

Some of the topics discussed and demonstrations will include:

1) Creating Schemas to represent XML that will be deserialized into .NET objects and to represent xml serialized from .Net Objects
2) Flat File Schema Creation
3) Other Types of Schemas (for example HL7) that are available in BizTalk Accelerators.

To sign up please go here to the Toronto BizTalk Usergroup site:

http://www.btug.biz/