Reusable (almost automatic) Transactions

Can't afford the performance overhead of COM+ and the Distributed Transaction Coordinator, but still want somewhat automatic transactions? Same connection, same transaction, but different dacs?

DacBase[] dacs = new DacBase[3];

dacs[0] = new OrderDac();
dacs[1] = new CustomerDac();
dacs[2] = new EmployeeDac();

IDbTransaction trans = DbHelper.BeginTrans();

foreach (DacBase dac in dacs)
{
    // Each dac does its work on the same connection and transaction, e.g.
    //   OrderDac.Update(entity, trans)
    //   CustomerDac.Update(entity, trans)
    dac.Update(entity, trans);
}

trans.Commit();

Database Access Layers and Testing

I've been doing a lot of testing lately. A lot. I'm building a database-agnostic data access layer. It has to be as performant as using the typed providers, and it even supports the optional use of a typed provider. For example, we want to allow developers to build database-agnostic data access components (dacs). Sometimes, however, there is just too much difference between databases, and writing standard SQL requires too much of a sacrifice. In those cases, we want to allow a developer to write a high-performance dac per database, giving them ultimate control to tweak the data access for one database or another - and use a factory at runtime to instantiate the correct one. Of course they have to implement the same interface so that the business components can talk to an abstract dac. So normally developers talk to their database through the agnostic DbHelper, but when they want, they can drop down to SqlHelper or OracleHelper.
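
To make the shape concrete, here's a minimal sketch of the pattern. The names (IOrderDac, SqlOrderDac, OracleOrderDac, OrderEntity, the "dataProvider" setting) are all made up for illustration - the real block has its own DacBase/DbHelper types.

using System.Configuration;
using System.Data;

// Hypothetical entity passed to the dac.
public class OrderEntity { /* ... */ }

// The abstract dac the business components talk to.
public interface IOrderDac
{
    void Update(OrderEntity entity, IDbTransaction trans);
}

// Database-specific implementations, tweaked for each engine.
public class SqlOrderDac : IOrderDac
{
    public void Update(OrderEntity entity, IDbTransaction trans) { /* tuned T-SQL via SqlHelper */ }
}

public class OracleOrderDac : IOrderDac
{
    public void Update(OrderEntity entity, IDbTransaction trans) { /* tuned PL/SQL via OracleHelper */ }
}

// Factory that picks the implementation from configuration at runtime.
public class DacFactory
{
    public static IOrderDac CreateOrderDac()
    {
        string provider = ConfigurationSettings.AppSettings["dataProvider"];
        if (provider == "Oracle")
            return new OracleOrderDac();
        return new SqlOrderDac();
    }
}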

We also want to support a rich design-time environment. Creating DataAdapters and SqlCommands in C# isn't fun. Good developers botch these up too easily - not creating parameters right, or worse, not using parameters at all and opening themselves up to SQL injection. The typed commands and data adapters allow for rich design-time painting of SQL and generation of parameters when used on a subclass of System.ComponentModel.Component. Of course, developers aren't stuck with design time - they are free to drop into code whenever they want.

In building this data access layer, I've been doing deep research and testing on a lot of different data access blocks - including ones I've authored in the past. I've taken stuff from examples like PetShop and ShadowFax, and of course looked at the PAG group's Data Access Block, including the latest revision. I'm very happy with where the code stands today.

One of the features missing from all of these is a streaming data access component. In some cases we need to start writing to the response stream of a web service before the database access is complete. So I've tackled this problem using an event model, which works nicely. You can put "delegate" or "callback" labels on that technique too and they'll stick.
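
Roughly the idea, as a hedged sketch - the delegate, the StreamingOrderDac name and the query are mine for illustration, not the block's actual API:

using System;
using System.Data;
using System.Data.SqlClient;

// Fired once per row so a subscriber can start writing to the response
// stream before the query has finished.
public delegate void RowReadEventHandler(IDataRecord record);

public class StreamingOrderDac
{
    public event RowReadEventHandler RowRead;

    public void StreamOrders(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("SELECT OrderID, OrderDate FROM Orders", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Raise the event as each row arrives.
                    if (RowRead != null)
                        RowRead(reader);
                }
            }
        }
    }
}

The subscriber simply hooks RowRead and writes each record out as it comes in.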

One of the interesting "tricks" was to use the SQL provider as the agnostic provider during design time. We wanted to allow developers to use the design-time support, so I went down the path of creating my own agnostic implementations of IDataAdapter, IDbConnection, IDbCommand, etc. The idea was that at runtime we'd marshal these classes into the provider-specific objects based on configuration. I was about 4 hours into this exercise when I realized I was pretty much rewriting SqlCommand, SqlDataAdapter, SqlConnection, and so on. What would stop me from using the Sql provider objects as my agnostic objects? At runtime, if my provider is configured for Oracle, I use the OracleHelper's "CreateTyped" methods to marshal the various objects into Oracle objects, of course talking to them through the interfaces. As a shortcut, if my provider is configured for Sql at runtime, I just use the objects as they are.
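
Something along these lines - a simplified illustration of the marshaling, not the block's actual CreateTyped implementation (parameter-prefix translation and provider-specific type mapping are glossed over):

using System.Data;
using System.Data.OracleClient;
using System.Data.SqlClient;

public class ProviderMarshaler
{
    // Take the SqlCommand painted at design time and rebuild it as an OracleCommand.
    public static IDbCommand ToOracleCommand(SqlCommand designTimeCommand, OracleConnection conn)
    {
        OracleCommand cmd = conn.CreateCommand();
        cmd.CommandText = designTimeCommand.CommandText;   // may still need @param -> :param fixes
        cmd.CommandType = designTimeCommand.CommandType;

        foreach (SqlParameter p in designTimeCommand.Parameters)
        {
            OracleParameter op = cmd.CreateParameter();
            op.ParameterName = p.ParameterName;
            op.DbType = p.DbType;            // use the agnostic DbType, not SqlDbType
            op.Direction = p.Direction;
            op.SourceColumn = p.SourceColumn;
            op.Value = p.Value;
            cmd.Parameters.Add(op);
        }
        return cmd;
    }
}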

The neat fallout feature from this is that you can write an entire application against SQL Server, using SqlHelper if you like, and if you are thrown a curve ball and need to use Oracle, the only managed-code change is to switch your reference from SqlHelper to DbHelper and everything else just works. Mileage will of course vary depending on how many times you used SQL Server-specific things like "TOP 1". Just as importantly, developers using this data block learn only one style of data access, and the same technique applies to all the other databases.

One of the sad things is how this thing evolved. In the beginning there were bits of no less than 3 data access blocks in this class library. I spun my wheels quite a bit going back and forth and doing a lot of prototyping, under some heat from developers who needed to start writing dacs. Because I was starting from chunks of existing code, somehow NUnit tests didn't magically happen. So I've spent the past few days working toward a goal of 90% coverage of the code with NUnit tests. It's tough starting from scratch. Not only have I been finding and fixing lots of bugs, but you won't be surprised that the testing has inspired the odd design change. I'm really glad I got the time to put the effort into the NUnit tests, because sooner or later those design changes would have been demanded by developers using this block - and by that time the code would have been too brittle to change. Certainly some of my effort would have been greatly reduced had I made these design changes sooner in the process. I'm not new to the fact that catching bugs earlier makes them cheaper to fix - but there's nothing like getting it pounded home. I'll be pretty hesitant to take on another exercise of editing existing code without first slapping on a bunch of NUnit tests that codify my expectations of what the code does.

 

ASP.NET Whidbey at CTTDNUG Tonight.

I'm presenting an overview on ASP.NET 2.0 tonight at CTTDNUG.

There isn't a great abstract on the site - and in fact, I will physically be unable to do the ObjectSpaces stuff since the new VS.NET CTP doesn't even have it in it anymore. Don't read into that - ObjectSpaces will still be coming out, at some point. I should be able to give some nice ObjectSpaces PPTs if the crowd is interested, but I'm guessing that demos are going to be more enjoyable.

So I am going to do my best Scott Guthrie impersonation and give a good solid demo lap around ASP.NET 2.0: IDE improvements, Master Pages, the new data source stuff, Site Navigation, Security, Personalization, and SQL caching.

ADO.NET rant

Why is SqlDbType in the System.Data namespace when all the other provider-specific types are in their own provider namespaces? There is definitely some ugliness going on here. I'm not sure it's entirely a mistake.

As an aside, why is it SqlDbType and not SqlType? I can understand why OleDbType is named the way it is, but OracleType and OdbcType seem to be named appropriately. Maybe it has something to do with the fact that there is a System.Data.SqlTypes namespace and that would be just too close for comfort in the naming. OK, so why isn't there an OdbcTypes namespace, or any other Types namespace for that matter? And shouldn't System.Data.SqlTypes be under System.Data.SqlClient.SqlTypes?

So what's the deal with IDataParameter and its descendant IDbDataParameter? All of the typed provider parameter implementations implement both of these. Shouldn't they be collapsed into one? Even the IDbCommand.CreateParameter method has to return an IDbDataParameter. Furthermore, the online help for IDbDataParameter says it includes stuff for mapping to dataset columns. That's odd, because the only 3 members of IDbDataParameter are Precision, Scale and Size. I'm pretty sure none of those has anything to do with DataSets. In fact, it's IDataParameter that provides this mapping via the SourceColumn member. Sheesh.
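
A quick snippet to make the point - notice which interface each member actually comes from:

using System.Data;
using System.Data.SqlClient;

class ParameterInterfaceDemo
{
    static void Main()
    {
        IDbCommand cmd = new SqlCommand();

        // CreateParameter is typed as IDbDataParameter...
        IDbDataParameter p = cmd.CreateParameter();

        // ...but Precision, Scale and Size are the only members it adds.
        p.Precision = 18;
        p.Scale = 2;
        p.Size = 9;

        // The dataset column mapping the docs mention actually comes from
        // the base IDataParameter interface.
        p.ParameterName = "@Total";
        p.DbType = DbType.Decimal;
        p.SourceColumn = "Total";
        p.SourceVersion = DataRowVersion.Original;
    }
}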

Talk about your inconsistencies. It's still like this in WinFX from what I can tell. Please, someone tell me there is a reason for this madness.

 

Debunking Dataset Myth

Many people think that datasets are stored internally as XML. Datasets are serialized as XML (even when done binary), but that doesn't mean they are stored as XML internally. Although we have no easy way of knowing for sure, it is easy to compare the memory footprint of a dataset to that of an XmlDocument.

If datasets really were stored as XML, then in theory they should be larger than an XmlDocument, since BeginLoadData/EndLoadData implies there are internal indexes maintained along with the data.

It's not easy to get the size of an object in memory, but here is my attempt.

long bytecount = System.GC.GetTotalMemory(true);
DataSet1 ds = new DataSet1();
ds.EnforceConstraints = false;
ds.Order_Details.BeginLoadData();
ds.Orders.BeginLoadData();
ds.ReadXml("c:\\test.xml");
bytecount = System.GC.GetTotalMemory(true) - bytecount;
MessageBox.Show("Loaded - Waiting. Total K = " + (bytecount/1024).ToString());

long bytecount = System.GC.GetTotalMemory(true);
System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
xmlDoc.Load("c:\\test.xml");
bytecount = System.GC.GetTotalMemory(true) - bytecount;
MessageBox.Show("Loaded - Waiting. Total K = " + (bytecount/1024).ToString());

I tried these examples with two different XML files, both storing Orders and Order Details out of the Northwind database. The first file contained the entire result set of both tables. The dataset's memory size was approximately 607K; the XmlDocument was 1894K, over 3 times larger. In a second test, I used only 1 record in both the Orders and Order Details tables. The dataset in this case took 24K and the XmlDocument took 26K - a small difference. You will notice that in my dataset example I turned off index maintenance by calling BeginLoadData. Taking that code out resulted in a dataset of 669K, an increase of approximately 10%. An interesting note is that if you put in both BeginLoadData and EndLoadData, the net size of the dataset is only 661K. This would imply that leaving index maintenance on during loads is inefficient in memory usage.

The speed of loading from XML is a different story. Because the XmlDocument (I'm assuming) delays its parsing, the time to load the XmlDocument from the full XML file is about a third of the time it takes to load the DataSet from the same XML. I wouldn't be too concerned about this, though: loading a dataset from a relational source like a DataAdapter involves no XML parsing at all and is much faster.

If you load up Anakrino and take a look at how the DataSet stores its data, each DataTable has a collection of columns, and each column is in fact a strongly typed storage array. Each type of storage array has an appropriate private member array of the underlying value type (integer, string, etc.). The storage array also maintains a bit array that is used to keep track of which rows for that column are null. The bit array is always checked first before going to the typed storage array, returning either null or the default value. That's pretty tight.

Datasets vs. Custom Entities

So you want to build your own entity objects? Maybe you are even purchasing or authoring a code-gen tool to do it for you. I like to use Datasets when possible, and people ask why I like them so much. To be fair, I'll write a list of reasons to not use datasets and to create your own entities - but for now, this post is all about the pros of datasets. I've been on a two-week sales pitch for DataSets with a client, so let me summarize.

  • They are very bindable.
    This is less of an issue for Web Forms, which don't support two-way databinding. But for Win Forms, datasets are a no-brainer. Before you go and say that custom classes are just as bindable, go try implementing IListSource, IList, IBindingList and IEditableObject. Yes, you can make your own custom class just as bindable - if you want to work at it.
  • Easy persistence.
    This is a huge one. Firstly, the DataAdapter is almost as important as the DataSet itself. You have full control over the Select, Insert, Update and Delete SQL and can use procs if you like. There are flavours for each database. There is a mappings collection that can isolate you from changes to names in your database. But that's not all that is required for persistence. What about optimistic concurrency? The DataSet takes care of remembering the original values of columns so you can use that information in your where clause to look for the record in the same state as when you retrieved it. But wait, there's more: it also keeps track of the RowState so you know whether you have to issue deletes, inserts, or updates against that data. These are all things that you'd likely have to do in your own custom class.
  • They are sortable.
    The DataView makes sorting DataTables very easy.
  • They are filterable.
    DataView to the rescue here as well. In addition to filtering on column value conditions, you can also filter on row states. (A quick sketch of DataView sorting and filtering, along with computed columns and relations, follows this list.)
  • Strongly Typed Datasets defined by XSDs.
    Your own custom classes would probably be strongly typed too... but would they be code-generated out of an XSD file? I've seen some strongly typed collection generators that use an XML file, but that's not really the right type of document to define a schema with.
  • Excellent XML integration.
    DataSets provide built-in XML serialization with the ReadXml and WriteXml methods. Not surprisingly, the XML conforms to the schema defined by the XSD file (if we are talking about a strongly typed dataset). You can also stipulate whether columns should be attributes or elements and whether related tables should be nested or not. This all becomes really nice when you start integrating with 3rd party (or 1st party) tools such as BizTalk or InfoPath. And finally, you can of course return a DataSet from a Web Service and the data is serialized as XML automatically.
  • Computed Columns
    You can add your own columns to a DataTable that are computed based on other values. This can even be a lookup on another DataTable or an aggregate of a child table.
  • Relations
    Speaking of child tables: yes, you can have complex DataSets with multiple tables in a master-detail hierarchy. This is pretty helpful in a number of ways. Both programmatically and visually through binding, you can navigate the relationship from a single record in the master table to the collection of child rows related to that parent. You can also enforce the referential integrity between the two without having to run to the database. And you can insert rows into the child table in the context of the parent record so that the primary key is migrated down into the foreign key columns of the child automatically.
  • Data Validation
    DataSets help with this, although it's not typically thought of as an important feature. It is, though. Simple validations can be done by the DataSet itself. Some simple checks include: data type, not null, max length, referential integrity, and uniqueness. The DataSet also provides an event model for column changing and row changing (adding & deleting), so you can trap these events and prevent bad data from getting into the DataSet programmatically. Finally, with SetRowError and SetColumnError you can mark elements in the DataSet with an error condition that can be queried or shown through binding with the ErrorProvider. You can do this with your own custom entities by implementing the IDataErrorInfo interface.
  • AutoIncrementing values
    Useful for columns mapped to identity columns or otherwise sequential values.
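
To ground a few of these, here's a hedged sketch. It assumes a Northwind-style DataSet where the Orders and Order Details tables and a relation named OrdersOrderDetails already exist (the relation name is my assumption):

using System.Data;

class DataSetFeaturesDemo
{
    static void Demo(DataSet ds)
    {
        // Sorting and filtering with a DataView - including on row state.
        DataView view = new DataView(ds.Tables["Orders"]);
        view.Sort = "OrderDate DESC";
        view.RowFilter = "CustomerID = 'ALFKI'";
        view.RowStateFilter = DataViewRowState.ModifiedCurrent;

        // A computed column on the child table...
        ds.Tables["Order Details"].Columns.Add(
            "LineTotal", typeof(decimal), "Quantity * UnitPrice");

        // ...and an aggregate of it on the parent, through the relation.
        ds.Tables["Orders"].Columns.Add(
            "OrderTotal", typeof(decimal), "Sum(Child(OrdersOrderDetails).LineTotal)");

        // Navigating the relation from a parent row to its child rows.
        DataRow firstOrder = ds.Tables["Orders"].Rows[0];
        DataRow[] details = firstOrder.GetChildRows("OrdersOrderDetails");
    }
}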

This is not an exhaustive list but I'm already exhausted. In a future post, I'll make a case for custom entities and not DataSets, but I can tell you right now that it will be a smaller list.

Active Directory Application Mode

I haven't had much chance to use many of the cool things in Windows 2003 to date, but one of the new things (that incidentally also runs on XP Pro) is a new mode of Active Directory called Application Mode - ADAM for short. I'm finally getting to do some real playing around with it for a large application I've just started working on for a client.

It's basically a standalone Active Directory that is ideal for storing your own users and roles, etc., to be used by your application in an Active Directory style - even if your company isn't using Active Directory. If you do go to AD down the road, it's a simple migration for your app. ADAM also acts as an LDAP server, which makes it a bit more open. You can really put whatever you want into ADAM, as its schema is extensible (not unlike Active Directory). The idea, though, is that you can have multiple instances of ADAM installed on your server - each containing data specific to a unique application - while AD stores the more globally required data throughout the enterprise.
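
For example, talking to an ADAM instance is plain System.DirectoryServices code. The port, partition and user below are assumptions for illustration:

using System.DirectoryServices;

class AdamDemo
{
    static void Main()
    {
        // Assumed for illustration: an ADAM instance listening on
        // localhost:50000 with an application partition of O=MyApp.
        DirectoryEntry root = new DirectoryEntry("LDAP://localhost:50000/O=MyApp");

        // Standard LDAP-style search, exactly as you would against full AD.
        DirectorySearcher searcher = new DirectorySearcher(root);
        searcher.Filter = "(&(objectClass=user)(cn=jsmith))";

        SearchResult result = searcher.FindOne();
        if (result != null)
        {
            DirectoryEntry user = result.GetDirectoryEntry();
            System.Console.WriteLine(user.Properties["displayName"].Value);
        }
    }
}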

Historically, it's been pretty typical to store this type of application-specific data in a SQL database. While that's still possible, ADAM - and more specifically the AD technology underneath it - is better geared to this type of data. A relational DB remains the ideal choice for transactionally updated data, but ADAM is a great place to store any kind of administrative data that is, for the most part, written once and then read frequently by your application.

I'm going to be playing more with this, and specifically doing some performance testing and seeing what kind of improvements can be made by using it in the middle tier, caching some of the data in a wrapper object that is hosted in COM+ and pooled.

As an aside, I find it kind of strange that Whidbey - specifically the new built-in ASP.NET membership/roles stuff - doesn't use ADAM but instead opts for the classic database solution. Fortunately the membership/role model in ASP.NET Whidbey is an extensible provider model, so I may just take a crack at creating my own provider that uses ADAM.

I should probably google that now as someone has probably already been there and done that.

Where is the holistic vision?

Microsoft needs a holistic view of its platforms and development tools. I didn't see that this week, and I doubt any of us will.

What I did see was 7 data access techniques in various stages of ready now, coming soon, and coming much later:

  • Use a DataAdapter to issue SQL or stored procedures. Useful for working with a DataSet and doing optimistic updates.
  • Use a DataReader to issue a SELECT or stored procedure and walk through rows one at a time. Used for fast streaming of read-only data. (A quick sketch of these first two follows this list.)
  • SqlDataSources. From what I can tell, this is a terrible thing. I've seen it in two places, so I'm not sure they are the same. In the first demo I saw, it was just called a "Data Source". It was in a Windows Forms application and it looks like a typed dataset (and in fact is/uses one), but it also embeds the SQL queries (and updates, inserts and deletes) into the actual typed dataset. I use datasets through the various tiers of my application, from UI to data access, and I don't want this stuff marshaled throughout the application. I talked to Paul Yuknewicz from the VB team about this. I told him it would be okay to do this at design time, but you have to split the result out into two classes. He said I could, but he couldn't show me. He said the typed dataset and the data access classes could be in separate namespaces. I told him I needed them in separate assemblies, because my datasets go down to the Win Forms client while the data access sits in my middle tier and hits the database. He took this as a good feature request, but he didn't give me a good feeling about it. I mentioned this to another guy on the Data team and he said that these Data Sources would not be for me. I suppose if you're building a dog house, it doesn't have to be well architected like a skyscraper. Unfortunately I've had to do many renovations of dog houses in the past.
  • You can now push some of the logic you'd normally do post-retrieval to massage data down into the database itself. Yes, create a C# stored proc and return the data that way. You probably want to do this when the processing reduces the amount of data (or its visibility - plain text, no encryption) sent along the wire.
  • SQLXML - not new but improved and gaining momentum. Can you say new language to learn? XQuery, XPath. This is nice if the data you deal with outside of the database is in XML. Also good for doing master/detail updates in 1 round trip. I like this.
  • ObjectSpaces. I saw a preview of this at PDC 2 years ago. I'm going to another session to learn more details in an hour. It's changed since then, but the message is the same: it's an object-to-relational mapping scheme. You define how your tables relate to your objects, and then you merrily use your objects and ObjectSpaces figures out the SQL to send to the database. Why would you do this? Well, if your app embeds a lot of business logic in your objects - which it should, damn it. This technique lacks SQL fidelity, and you have little direct control over the SQL emitted. Hmm. If I don't know what SQL will be issued, how will I determine the best indices? How does the SQL Server team feel about this? I've already heard from Michael Pizzo that you can't expect the same performance as native SQL.
  • WinFS. Longhorn includes a new file system. My interpretation is that it's less of a file system and more of an intermediate object model that sits over top of NTFS and the next version of SQL Server. WinFS lets you view your SQL data as files. A record is a file. Not all files become records though. (I can hear DBAs everywhere sighing.) The idea here is to "give users excellent Windows experiences" by getting data out of application silos and bringing it to the shell to be integrated between applications the way users want it. I like the idea in theory, but the security aspects scare me a bit. The nice thing is that this is coming in Longhorn - a long time from now - so we have time to think about it more... and for MS to try to get it right. You can bind controls in your forms to properties on files (which are records, so those are just columns). Where is the business logic in that tier? Welcome back, client-server. Not sure how this all works on a LAN.
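
For contrast, here are the first two (the "ready now" ones) side by side - a minimal sketch against the usual Northwind Orders table:

using System.Data;
using System.Data.SqlClient;

class DataAccessDemo
{
    // DataAdapter: fill a DataSet and let a command builder generate optimistic updates.
    static DataSet LoadOrders(string connectionString)
    {
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT OrderID, CustomerID, OrderDate FROM Orders", connectionString);
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);  // builds INSERT/UPDATE/DELETE

        DataSet ds = new DataSet();
        adapter.Fill(ds, "Orders");
        return ds;   // adapter.Update(ds, "Orders") would push changes back
    }

    // DataReader: fast, forward-only, read-only streaming of rows.
    static void StreamOrders(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("SELECT OrderID, OrderDate FROM Orders", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    System.Console.WriteLine("{0}: {1}", reader["OrderID"], reader["OrderDate"]);
                }
            }
        }
    }
}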

I also saw at least 3 user interface models:

  • Traditional smart client Windows Forms. These don't change much in Whidbey.
  • ASP.NET Web Forms. Changing a bit. Still HTML coded into an aspx file.
  • Avalon XAML Forms. See my previous post about this. The key point is that it attempts to bring the benefits of Web Forms to Windows Forms and vice versa.

Let's not get into Pocket PC, Smart Phones, WML, Tablet Ink or Media Center. That would make this blog more ridiculous than it already is. Suffice it to say that each of these 3 user interfaces has its own databinding techniques, and they are all different. I saw some of the ASP.NET Whidbey two-way databinding today and it's nothing like WinForms binding. It's like that team never talks to the Windows team. Sigh.

What I didn't see was anything about COM+ or Enterprise Services. Where the hell was COM+? Does that all disappear with Indigo? Indigo is only supposed to be a communication platform, right?

I guess it's my job as an architectural consultant to evaluate each of these technologies, figure out how they ultimately fit together, and lay out a road map for doing something useful today that will still be useful tomorrow. Nobody at Microsoft has done this yet.

The scary thing is that it seems like all these teams are working in silos and not talking to each other enough. Seems to be a theme. Don't get me wrong, they are doing fabulous stuff, but everybody has a different way of doing things. It seems like they spend lots of time developing their technology but not enough time building real applications on top of it.

The good news is that it's early for a lot of this stuff, and there's lots of time to fix it and get it right. A lot of Microsoft guys were seeing some of this stuff for the first time themselves.

I suspect that the MSDN Prescriptive Architecture Group has some work cut out for them.

MS Professional Developers Conference...

...here we come.

Fellow ObjectSharpies DennisLee and DaveLloyd and I are heading out Saturday morning for the PDC in L.A. Stay tuned for our perspectives on Whidbey (essentially .NET 2.0), Yukon (SQL Server 2004?), and Longhorn (Windows 2005?). The key pieces of technology I'll be researching are Indigo (the new web services framework) and Avalon (the new Windows forms framework).

I'm also looking forward to seeing MSBuild and trying to understand how it will compare with, compete with, or complement NAnt. I still don't see anything about the next version of Visual SourceSafe - what's up with that?

If you are in town, look me up and I'll buy you a beer.