Black Box Testing, Service Boundaries and Persistence

When testing persistence, I often write an NUnit test that programmatically creates a new entity, jams some data into it (hard-coded into my test), and then calls a data access layer to persist it. Then I create a new entity and ask my DAL to load it from the database (using the same identifier I used to create it). Finally, I just compare the two entities.
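
As a rough sketch of that round trip - the Customer entity, the CustomerDac class, and its Save/Load methods are invented here for illustration:

[Test]
public void CustomerRoundTrips()
{
    // Hard-coded test data, as described above.
    Customer original = new Customer();
    original.Id = 42;
    original.Name = "Test Customer";

    // Persist through the DAL, then load a fresh copy by the same identifier.
    CustomerDac dac = new CustomerDac();
    dac.Save(original);
    Customer loaded = dac.Load(42);

    // Compare the entities field by field.
    Assert.AreEqual(original.Id, loaded.Id);
    Assert.AreEqual(original.Name, loaded.Name);
}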

A developer I work with showed me today what he does. He creates an XML file with some test data and has a generic utility class to persist that into the database. He then creates a new entity, loads it through the DAL, and compares the loaded values against the XML test data.

NUnit Testing Practices

Chicken and Egg, TDD, Class Modeling, DevDrivenTesting, ModelDrivenTesting
- Create a test first, and use it to code-generate the class you want to implement (see the sketch below).
- Create a class first, and use it to code-generate a stubbed test.
- Model a class, capture metadata about the way it's supposed to work, and then generate both the class and the unit tests.
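
As a minimal sketch of the first option, the test below would be written before any implementation exists; the Invoice stub after it is the kind of class that would then be generated (both names are invented here):

[Test]
public void NewInvoiceHasZeroTotal()
{
    // Authored first - Invoice does not exist yet at this point.
    Invoice invoice = new Invoice();
    Assert.AreEqual(0m, invoice.Total);
}

// The kind of stub the code generation would produce from the test:
public class Invoice
{
    private decimal total = 0m;

    public decimal Total
    {
        get { return total; }
    }
}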

Reusable (almost automatic) Transactions

Can't afford the high performance overhead of COM+ and the Distributed Transaction Coordinator, but still want somewhat automatic transactions? Same connection? Same transaction, but different DACs?

DacBase[] dacs = new DacBase[3];

dacs[0] = new OrderDac();
dacs[1] = new CustomerDac();
dacs[2] = new EmployeeDac();

IDbTransaction trans = DbHelper.BeginTrans();
for (int i = 0; i < dacs.Length; i++)
{
    dacs[i].Update(trans);
}
trans.Commit();

The same pattern works when each DAC takes its own entity along with the shared transaction:

orderDac.Update(orderEntity, trans);
customerDac.Update(customerEntity, trans);
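
For reference, a minimal sketch of what the supporting pieces might look like. DbHelper and DacBase are this post's hypothetical classes, not framework types, and the connection string and SQL are placeholders:

using System.Data;
using System.Data.SqlClient;

public abstract class DacBase
{
    // Every DAC works against the connection that owns the shared transaction.
    public abstract void Update(IDbTransaction trans);
}

public class DbHelper
{
    public static IDbTransaction BeginTrans()
    {
        // One connection, one transaction, shared by all participating DACs.
        SqlConnection conn = new SqlConnection("placeholder connection string");
        conn.Open();
        return conn.BeginTransaction();
    }
}

public class OrderDac : DacBase
{
    public override void Update(IDbTransaction trans)
    {
        IDbCommand cmd = trans.Connection.CreateCommand();
        cmd.Transaction = trans; // enlist in the shared transaction
        cmd.CommandText = "UPDATE Orders SET Status = 1"; // placeholder SQL
        cmd.ExecuteNonQuery();
    }
}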

Setting Boundaries for Services

So what does it mean to design an autonomous service? Based on my previous post, there are two possible issues to consider. First, the service needs to have a life outside of the client making the request. Second, the service needs to be self-healing, in that any dependence on the actual endpoint of the services it uses must be mitigated. To put this second point into an example: if Service A invokes Service B, then Service A must be capable of discovering Service B should Service B move. Service A should not be dependent on any manually updated configuration information to use Service B. Unfortunately, neither of these two considerations really helps to determine what the boundaries of an autonomous service should be.

To get a grasp on the criteria that we use for bounding a service, consider the following hierarchy.

[Service hierarchy diagram: a process service at the top invoking business entity services, which interact with the data source through a data representation layer]

Figure 1 - Service Hierarchy

The process service is a high-level interface where a single service method call invokes a series of smaller steps.  These smaller steps could be either another process or a call to a business entity service.  Eventually, at the bottom of each of the paths, there will be one or more business entity services. These business entities don't contain any data, but instead interact with a data source through a data representation layer.  Each of the blocks in the hierarchy above the level of the data source *can* be a service.  Whether they are or not is one of the questions to be answered.

Follow the data

The definition I have found most useful is that a service boundary is one across which data is passed. If there is no data moving between the caller and the callee, there is little need for a service-based implementation. Consider a service that provides nothing but functionality, with no data. One that, for example, takes a single number and returns an array of its prime factors. While such a service could certainly be created, the rationale for implementing it as a service is thin. After all, the same functionality could be embedded into an assembly and deployed with an application. Worried about being able to update it regularly? Place it onto a web server and use zero-touch deployment to allow for dynamic updating. So when trying to define the services, follow the data.
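
To make that concrete, the prime-factor functionality could be nothing more than a method like this (my sketch, not code from any real service), compiled into an assembly and deployed with the application:

using System.Collections;

public class MathUtility
{
    public static int[] PrimeFactors(int n)
    {
        // Trial division: divide out each factor as many times as it appears.
        ArrayList factors = new ArrayList();
        for (int divisor = 2; n > 1; divisor++)
        {
            while (n % divisor == 0)
            {
                factors.Add(divisor);
                n /= divisor;
            }
        }
        return (int[])factors.ToArray(typeof(int));
    }
}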

Given that little nugget of wisdom, take another look at the hierarchy in Figure 1.  For someone to call a process service, some data must be provided.  In particular, it needs to be passed sufficient information for the process to 'do its thing'.  Want to invoke the “CreateOrder” process service?  Give the service enough information to be able to create the order.  This means both customer and product details.  When defining the business services involved in the process (the next level in the hierarchy), the same type of examination needs to be made.  Look at the places in the process where data is passed.  These data transfer points are the starting point for boundary definition.  

Keep it Chunky

The other criterion I use for defining service boundaries is based on the relatively nebulous concept of 'chunkiness'. The basic premise goes back to the first tenet of services: calls into a service may be expensive. This is not surprising, given that the movement of data across process or system boundaries is usually part of the call. As a result of the potential delay, the calling application's performance is improved by keeping the number of service calls to a minimum. This runs counter to the 'normal' coding style of setting properties and invoking methods on local objects.

Once the data flow has been identified (the object sequence diagram is actually quite useful in this regard), look at the interactions between two classes. If a series of call/response patterns is visible, that interaction is ripe for coalescing into a single service call.

The downside of this approach is potentially providing more information than would normally be needed. Say that the normal call/response pattern goes something like the following:

Order o = new Order(customerId);
OrderLine ol;
ol = o.OrderLines.Add(productId1, quantity1);  // this line gets a custom ship-by date
ol.ShipByDate = DateTime.Now.AddDays(2);
ol = o.OrderLines.Add(productId2, quantity2);  // this line uses the default

In order to support the creation of order lines both with and without a custom ship-by date, the parameter list for any single service call would have to change. But there is a solution. One of the strengths of XML is its flexibility in this regard. The acceptable schema can differ from call to call. These differences can then be identified programmatically and the processing changed as needed. For this reason, we usually pass XML documents as the parameters for service calls.
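
As an illustration - the element and attribute names here are invented - the document below carries an optional shipByDate on only one line, and the receiving service can detect the difference programmatically:

string orderXml =
    "<Order customerId=\"1234\">" +
      "<OrderLine productId=\"5678\" quantity=\"2\" shipByDate=\"2004-03-15\" />" +
      "<OrderLine productId=\"9012\" quantity=\"1\" />" +
    "</Order>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(orderXml);

foreach (XmlElement line in doc.SelectNodes("/Order/OrderLine"))
{
    // Lines without the optional attribute simply use the default ship-by date.
    bool hasCustomShipBy = line.HasAttribute("shipByDate");
    Console.WriteLine("{0}: custom ship-by date = {1}",
        line.GetAttribute("productId"), hasCustomShipBy);
}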

The result of this is a sense of where the boundaries of a service should be. First, look at the data passed between objects.  Identify any series of calls between two objects.  Then group the data passed through these calls into a single service using an XML document as the parameter. 

Will this logic work for every possible case? Maybe not. But more often than you might think, this kind of design breakdown will result in a decent set of boundary definitions for the required services. The one drawback frequently identified by people is that this approach does not directly consider where the data is stored. While this is true, it is not that important. Accessing a data source can be done either through a separate service (identified by this analysis process) or through local objects. In other words, the segregation of data along business or process service boundaries is not necessarily a given. Nor, as it turns out, is it even a requirement.

How can I do a "Join" between two DataTables?

It's tempting to think that, with objects like “DataView” and methods like “Select”, DataSets would support this operation, but the bottom line is that they don't. A DataView is a sorted and/or filtered list of rows from one table only. Also, the Select method is on the DataTable and, as such, can't look at other tables.

There are a few options to consider.

  1. Do the join in SQL. SQL is good at this kind of operation. It requires you to have a new DataTable with the composed list of columns from the two tables, and a DataAdapter with at least a SelectCommand specified to support the fill. You would perform your join in the SelectCommand.CommandText. If you are also retrieving data into two separate DataTables for other reasons, then of course you have some redundant data and also some synchronization issues between the two individual tables and the joined table. The joined table will not reflect changes in the two base tables until you update the database and re-execute the Fill on the joined table.
  2. Like the above solution, you could create a third table with the composed list of columns in your DataSet, but instead of filling it from the database, you could copy the data from the base tables yourself using DataTable.NewRow and Rows.Add. If the two base tables share the same primary key, you could try to “merge” the data from the two DataTables into the third DataTable.
  3. If the two tables to be joined are in a master-detail relationship, you can use expression columns to look up data in a parent DataTable or aggregate records from a child table. For example, if you have a Customer DataTable with a stateId column, which is a foreign key to a State DataTable with stateId and stateName columns, you can add a computed DataColumn to the Customer DataTable with an expression of Parent.stateName (see the sketch after this list). This new column will be kept in sync if the underlying name changes in the State DataTable or if the stateId is changed on the Customer DataTable to point to a different State. Similarly, you can look up values from a child DataTable on a parent DataTable, but since there can be one or more child rows, you will typically need aggregates like Sum, Avg, Min, and Max in your expression. The DataColumn.Expression online help is valuable for the types of expressions you can use.
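
Here is a sketch of option 3's expression column with the Customer/State example above (the table wiring is invented for illustration):

DataTable state = new DataTable("State");
state.Columns.Add("stateId", typeof(int));
state.Columns.Add("stateName", typeof(string));

DataTable customer = new DataTable("Customer");
customer.Columns.Add("customerId", typeof(int));
customer.Columns.Add("stateId", typeof(int));

DataSet ds = new DataSet();
ds.Tables.Add(state);
ds.Tables.Add(customer);
ds.Relations.Add("StateCustomer",
    state.Columns["stateId"], customer.Columns["stateId"]);

// Computed column that follows the relation up to the parent State row.
customer.Columns.Add("stateName", typeof(string), "Parent.stateName");

state.Rows.Add(1, "Ontario");
customer.Rows.Add(100, 1);
Console.WriteLine(customer.Rows[0]["stateName"]); // "Ontario" - stays in sync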

How do I create a Crystal Report from a DataSet?

http://www.tek-tips.com/gfaqs.cfm/pid/796/fid/3940

How can I improve the loading of my datasets?

DataSets maintain some internal indexes to improve performance for things like finds and selects.

When you are loading more than one row into a DataTable - with a DataAdapter.Fill or any other technique - you can turn this index maintenance off by doing a

MyTable.BeginLoadData()

Where MyTable is the reference to your DataTable, which could be

MyDataSet.Tables[0].BeginLoadData()

When you are done loading the data, don't forget to turn the index maintenance back on with:

MyDataSet.Tables[0].EndLoadData()

When loading multiple tables, you should turn the index maintenance off and back on for one table at a time.

Another thing that DataSets do while rows are being added to tables is validate them. Primary key uniqueness, foreign key referential integrity, and nulls in columns with AllowDBNull = false are some examples of what must be checked. Again, you can save some cycles by turning this off while loading a dataset and turning it back on afterward. This can be done with:

MyDataSet.EnforceConstraints = false

And of course when you are done loading, you can perform a:

MyDataSet.EnforceConstraints = true

Of course, you may get a “ConstraintException” on this last line if there are problems with your data. Without this technique, you'd get that exception as soon as the offending row was loaded. On a related note, you can check DataSet.HasErrors and each DataTable.HasErrors for any errors. For each table you can call DataTable.GetErrors() to get the list of rows with errors. Each row has a RowError property that contains any error text related to the entire row, and also a GetColumnError() method that you can use to test each column for a column-specific error message.
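
Putting the two techniques together, a sketch of a load with both index maintenance and constraint checking suspended (the query, table name, and connection string are placeholders):

public static void LoadOrders(string connectionString)
{
    SqlDataAdapter adapter = new SqlDataAdapter(
        "SELECT * FROM Orders", connectionString);

    DataSet ds = new DataSet();
    adapter.FillSchema(ds, SchemaType.Source, "Orders"); // brings over keys/constraints
    DataTable orders = ds.Tables["Orders"];

    ds.EnforceConstraints = false; // defer validation until the load is done
    orders.BeginLoadData();        // suspend index maintenance
    try
    {
        adapter.Fill(ds, "Orders");
    }
    finally
    {
        orders.EndLoadData();
    }

    try
    {
        ds.EnforceConstraints = true; // all deferred validation happens here
    }
    catch (ConstraintException)
    {
        // Track down the offending rows instead of just failing the load.
        foreach (DataRow row in orders.GetErrors())
        {
            Console.WriteLine(row.RowError);
        }
    }
}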

You know you're a geek when...

The other day, while sitting on the couch eating some M&M's with my 5-year-old daughter, she asks me what “MGM” means... and starts to try to pronounce “megum” - she hasn't quite got the idea of acronyms yet.

I explain how “&” means “and” and that it's “M and M's”. Claire comes back at me with “well, why didn't they just use plus (+)?” The next 10 minutes are me explaining concatenation to her and how it differs from addition. I'm pretty sure she's going to answer her kindergarten teacher's question of “What is 4 AND 5” with “45” next week at school... “and if you really were looking for 9, you should have asked properly: 'what is 4 PLUS 5', not AND”. I can hardly wait for parent-teacher interviews.

Hmmm, maybe I should teach her what “and” means in terms of boolean logic.

New Smart Client Reference Application - IssueVision

This is a new smart client reference application from Microsoft. Actually, it was created for Microsoft by Vertigo - where Susan Warren (former Queen of ASP.NET) now works. This is not a rewrite of TaskVision, which is a common question. It was built to show off some advanced topics for smart client apps in conjunction with the recent DevDays events that have been going on in the U.S. but unfortunately haven't made it up to Canada, due to some overloaded efforts going into VS Live.

You can download this from Microsoft although it's not the easiest thing to find.

Some of the interesting highlights:

  • focus on security... including some wrapped-up DPAPI classes
  • Application Deployment and Updating

This app wasn't built with the recently released Offline Application Block, since the timing wasn't right - but nevertheless, it's a good, fresh reference app worth looking at.

Building Maintainable Applications with Logging and Instrumentation

I'm doing this MSDN webcast in a few weeks

10/05/2004 1:00 PM - 10/05/2004 2:00 PM (Eastern Time)

In this session we'll cover the world of logging and instrumenting your application. We'll discuss the various .NET Framework components as well as the higher-level services provided by the Exception Management Application Block, the Enterprise Instrumentation Framework, and the Logging Block. We'll discuss the various issues with persisting information in file logs, the event log, and WMI performance counters. We will also compare alternative technologies such as log4net. Finally, we'll cover best practices for logging and instrumenting your application and provide some considerations, drawn from experiences in the field, for when and where it makes good sense to instrument your application.

Update: The slides, samples, and Live Meeting recording links can all be found here.