Accessing Delete Row Information

I recently had the need to access the information in a row marked as "Deleted" in a DataSet as part of the update process.  The first attempt netted a DeletedRowInaccessibleException.  It is not possible to access the current information for a data row once the row has been deleted.

However, the very notion that there is 'current' information is a clue to the solution.  If it were possible to specify the version of the information to be retrieved, then it would be possible to use the original information.  As it turns out, this can be done through the indexer on a DataRow.  While the normal access mechanism for a DataRow would be

row["ColumnName"]

to retrieve the original information from a deleted row, the desired version must be specified, as seen here:

row["ColumnName", DataRowVersion.Original]
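
For example, here is a minimal sketch of reading the original values out of the deleted rows during an update (the table and column names are hypothetical):

// Deleted rows remain in the Rows collection until AcceptChanges is called.
// Asking for the Original version avoids the DeletedRowInaccessibleException.
foreach (DataRow row in customersTable.Rows)
{
    if (row.RowState == DataRowState.Deleted)
    {
        // row["CustomerId"] on its own would throw here
        int customerId = (int)row["CustomerId", DataRowVersion.Original];
        Console.WriteLine("About to delete customer {0}", customerId);
    }
}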

Working with Default Rows

Occasionally, a situation arises where the contents of a particular data table within a DataSet *might* originate from more than one source.  For example, a client of ours has a reporting system where there is a set of default columns for each report type.  When a user creates a new report (stored in the Report table in the DataSet), the default columns (the ReportColumn table) are loaded from a DEFAULT_COLUMN table.  If any change is made to the information in ReportColumn (like changing a caption, for example), all of the column information is copied and saved in a REPORT_COLUMNS table.  However, if no change is made, no information needs to be saved.  This allows changes in the default columns to be automatically propagated to any report that hasn't been 'customized'.

All of this functionality is easy to add to the Load method of an entity based on this DataSet.  However, what happens when a Report that uses the default columns is deleted?  Normal processing would have the related ReportColumn rows deleted, followed by the corresponding Report row.  However, when the delete is performed on each of the ReportColumn rows, a DBConcurrencyException is thrown.  The message with this exception is as follows:

Concurrency violation: The DeleteCommand affected 0 records.

With a little bit of thought (actually, slightly more than a little the first time this error was encountered :), the reason for the error became apparent.  The records in the ReportColumn table didn't exist in the database.  Nor should they, since they were originally populated from DEFAULT_COLUMN, not REPORT_COLUMNS.  However, the DeleteCommand was trying to remove them from REPORT_COLUMNS.  And when the DataAdapter class attempts to delete a record and doesn't see any rows being affected, it assumes that a concurrency violation has taken place.

The solution was to take a little more control of the deletion process.

dataAdapter.ContinueUpdateOnError = true;
dataAdapter.Update(reportsData);

if (reportsData.ReportColumn.HasErrors)
{
    DataRow[] drs = reportsData.ReportColumn.GetErrors();

    // Remove any row whose problem is a concurrency violation
    foreach (DataRow dr in drs)
        if (dr.RowError.StartsWith("Concurrency violation"))
            reportsData.ReportColumn.RemoveReportColumnRow((ReportsData.ReportColumnRow)dr);
    reportsData.ReportColumn.AcceptChanges();

    // If the data table still has errors, an exception needs to be thrown
    if (reportsData.ReportColumn.HasErrors)
        throw new DataException("An exception was raised while updating the ReportColumn data table: " +
            reportsData.ReportColumn.GetErrors()[0].RowError);
}

The ContinueUpdateOnError property is used to keep the update process from aborting when an exception is encountered.  Once this is done, we check to see if the data table has any errors.  If so, we make sure that every row that has a concurrency violation is removed from the DataSet.  The AcceptChanges method call is required to update the HasErrors flag if all of the errors have been eliminated in this manner.

Now is this a perfect solution?  No.  It is possible that a legitimate concurrency violation is ignored.  From the point of view of this application, it is considered an acceptable risk given the type of data being used.  If you wanted to trap the 'real' concurrency problems, you could select against the REPORT_COLUMNS table using the primary key (instead of all of the original values) when a concurrency violation is detected.  If a record is found, then the concurrency problem is real.  Otherwise the exception can be ignored.  A rough sketch of that check appears below.
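
The connection, key column names, and helper method here are hypothetical, not taken from the actual application (requires System.Data and System.Data.SqlClient):

// Returns true if the row still exists in REPORT_COLUMNS, meaning the
// concurrency violation is real and should not be swallowed.
// Assumes SQL Server and a composite key of REPORT_ID / COLUMN_ID.
private static bool IsRealConcurrencyViolation(SqlConnection connection, int reportId, int columnId)
{
    using (SqlCommand command = new SqlCommand(
        "SELECT COUNT(*) FROM REPORT_COLUMNS " +
        "WHERE REPORT_ID = @ReportId AND COLUMN_ID = @ColumnId", connection))
    {
        command.Parameters.Add("@ReportId", SqlDbType.Int).Value = reportId;
        command.Parameters.Add("@ColumnId", SqlDbType.Int).Value = columnId;
        return (int)command.ExecuteScalar() > 0;
    }
}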

DataSet Serialization: Smaller & Faster

DataSets serialize naturally to XML quite well, and you have lots of control over that. Typed DataSets have an XSD with some properties that control it (and of course you can do it programmatically too). But one of the common problems with remoting DataSets is that the default binary serialization is actually just the XML serialization as ASCII. Crude. Some people have even used this fact to extrapolate that the DataSet is internally XML - which isn't true.
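
You can see this for yourself with a few lines (requires System.Data, System.IO, System.Text and System.Runtime.Serialization.Formatters.Binary); the names here are illustrative:

// Serialize a tiny DataSet with the BinaryFormatter and inspect the payload.
// Much of it turns out to be readable XML, which is why remoted DataSets are so bulky.
DataSet ds = new DataSet("Demo");
DataTable items = ds.Tables.Add("Item");
items.Columns.Add("Name", typeof(string));
items.Rows.Add(new object[] { "Widget" });

BinaryFormatter formatter = new BinaryFormatter();
using (MemoryStream stream = new MemoryStream())
{
    formatter.Serialize(stream, ds);
    Console.WriteLine("Serialized size: {0} bytes", stream.Length);
    Console.WriteLine(Encoding.ASCII.GetString(stream.ToArray()));  // mostly XML
}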

This is improved in Whidbey. But until then, what's a Serializer to do?

Lots of people have done some great work to do customized binary serialization of datasets. To mention a few:

  • Microsoft Knowledge Base Article 829740 by Ravinder Vuppula, where he demonstrates his DataSetSurrogate class, which wraps up a DataSet and converts the internal items into array lists that can then be binary-serialized by the framework by default. It also contains a ConvertToDataSet method so that you can reverse the process, turning a de-serialized surrogate back into a DataSet.
  • Dino Esposito demonstrates a GhostSerializer for a DataTable in his MSDN article, which does a similar ArrayList conversion.
  • Richard Lowe's Fast Binary Serialization
  • Dominic Cooney's Pickle
  • Angelo Scotto's CompactFormatter goes a step further with a serialization technique that doesn't rely on the BinaryFormatter, so it works on the Compact Framework and is even more compact than the BinaryFormatter
  • Peter Bromberg builds on top of the CompactFormatter to support compression using Mike Krueger's ICSharpCode SharpZipLib in-memory zip libraries


Getting access to the Deleted Rows

If you Delete a row in a DataTable, it's not really gone. It's just kept in a private buffer that hides the data from the traditional Rows collection, which works well for databinding and the like. The deleted rows are kept so that when you do a DataAdapter.Update() it knows which rows you wanted to delete in the persistent store.

So what if you are writing your own adapter, or for other reasons just need to get at the deleted rows? The trick is to create a DataView with a RowState filter of "Deleted", and then you can look at the DataRowViews to get individual row/column data.

DataView deleteView = new DataView(MyDataSet.MyTable, "", "", System.Data.DataViewRowState.Deleted);

foreach (DataRowView drv in deleteView)
{
    string someColumnValue = (string)drv["SomeColumn"];
}
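
If you'd rather skip the DataView, a similar sketch works with DataTable.Select (same hypothetical table and column names); note that reading a deleted DataRow directly requires asking for the Original version, as described in the first tip above:

// Select just the deleted rows and read their original column values.
DataRow[] deletedRows = MyDataSet.MyTable.Select("", "", System.Data.DataViewRowState.Deleted);

foreach (DataRow row in deletedRows)
{
    string someColumnValue = (string)row["SomeColumn", DataRowVersion.Original];
}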

Creating Computed Columns in a DataSet

DataSets are very powerful - and one of the features that makes them so powerful is computed columns. You can create a DataColumn that does not contain stored data but rather computes a value dynamically.

Computations can be done by SQL on retrieval; however, once the data is in a DataSet, those computations become outdated. As the data changes in your DataSet, it is desirable to have the latest results of a calculation.

One of the properties of a DataColumn is "Expression". This is a string that stores a computational expression. The data type of the column should match the data type of the computed value. These expressions can calculate simple values, or you can use aggregate functions across an entire set of data. You can even use computed columns to lookup a value in a parent table.

The source code below contains three snippets of code that add computed columns to some tables. Listing 1 is a simple expression that calculates the extended price on an Order Item by multiplying the quantity by a unit price. The column is first created and then added to the table. 

The first listing assumes there is not only a Quantity column on the OrderItem table, but also a Price column. In all likelihood, you would need to look the price up in a related Product table, assuming the OrderItem table has a foreign key to a Product table. If you have set your DataSet up with a DataRelation for this foreign key, you can perform this lookup within the DataSet. In Listing 2, we create a computed Price column on the OrderItem table that looks up the Price on the Product table. The syntax for referencing the rows in a parent table is:

 Parent(DataRelationName).parentcolumn 

...where DataRelationName is the name of the DataRelation between the parent and child tables. To perform parent table referencing, it is mandatory that a DataRelation is established between the parent and child tables.

The final example, in Listing 3, shows a computed column that is a total of all the order item amounts for a given OrderHeader. Assuming we have created a DataRelation for the foreign key between OrderHeader and OrderItem, we can perform this child table reference. The syntax is similar to the parent relation, but you use the keyword Child.

 Child(DataRelationName).childcolumn

You should note that when referencing a child table, you are normally referring to many rows. Therefore, any expression on a child table should be done within an aggregate expression (e.g. Sum, Min, Max, Avg).

The benefit of performing your calculations within the DataSet instead of the database is that changing values in the base columns of the DataSet will have an instant impact on your calculations. For example, users may want to know the total of an Order before they actually save it in the database. In the second example, if you change the product_id being ordered, the unit price and extended price will automatically change to reflect the price of the new product.

Finally, if you place this code to modify your DataSet within your Business Logic Layer, you are able to keep the critical business logic of calculations in your business objects along with the rest of your business logic, not hidden away in your Data Access Layer or database stored procedures.

You should note that in the current version of the Visual Studio .NET IDE, there is a bug with computed columns and typed DataSets. You can add the computed column to your DataSet in the schema editor, but if you specify the expression there, you will get compile-time errors. You should specify the expression at runtime.

[Listing 1]
Type stDecimal = System.Type.GetType("System.Decimal");
DataColumn dcExtPrice = new DataColumn("ExtPrice", stDecimal, "Price * Quantity", MappingType.Attribute);
dcExtPrice.AutoIncrement = false;
dcExtPrice.ReadOnly = true;
DataSet1.Tables["OrderItem"].Columns.Add(dcExtPrice);
[Listing 2]
Type stDecimal = System.Type.GetType("System.Decimal");
DataColumn dcPrice = new DataColumn("Price", stDecimal, "Parent(drProduct_OrderItem).Price", MappingType.Attribute);
dcPrice.AutoIncrement = false;
dcPrice.ReadOnly = true;
DataSet1.Tables["OrderItem"].Columns.Add(dcPrice);

[Listing 3]
Type stDecimal = System.Type.GetType("System.Decimal");
DataColumn dcTotal = new DataColumn("Total", stDecimal, "Sum(Child(drOrderHeader_OrderItem).Price)", MappingType.Attribute);
dcTotal.AutoIncrement = false;
dcTotal.ReadOnly = true;
DataSet1.Tables["OrderHeader"].Columns.Add(dcTotal);
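
Once the computed columns are in place, a change to any base value flows through immediately. A hypothetical usage, assuming the tables already contain rows:

// Changing the quantity recalculates ExtPrice at once - no database round trip.
DataRow item = DataSet1.Tables["OrderItem"].Rows[0];
item["Quantity"] = 5;
Console.WriteLine(item["ExtPrice"]);

// The parent's Total aggregate reflects the change as well.
DataRow header = DataSet1.Tables["OrderHeader"].Rows[0];
Console.WriteLine(header["Total"]);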

Windows Forms DataBinding

Here is a great article on binding in Windows Forms - specifically against a DataSet. Lots of gotchas covered.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwinforms/html/databinding_winforms10_11.asp

How can I improve the loading of my datasets?

DataSets maintain some internal indexes to improve performance for things like Finds and Selects.

When you are loading more than one row into a DataTable - with DataAdapter.Fill or another technique - you can turn this index maintenance off by calling

MyTable.BeginLoadData()

Where MyTable is the reference to your DataTable, which could be

MyDataSet.Tables[0].BeginLoadData()

When you are done loading the data, don't forget to turn the index maintenance back on with:

MyDataSet.Tables[0].EndLoadData()

When loading multiple tables, you should turn the index maintenance off and back on one table at a time.

Another thing that DataSets do while rows are being added to tables is validate them. Primary key uniqueness, foreign key referential integrity, and nulls in columns with AllowDBNull = false are some examples of things that must be checked. Again, you can save some cycles by turning this off while loading a DataSet and turning it back on afterward. This can be done with:

MyDataSet.EnforceConstraints = false

And of course when you are done loading, you can perform a:

MyDataSet.EnforceConstraints = true

Of course, you may get a "ConstraintException" on this last line if there are problems with your data; otherwise, you'd have gotten this exception as soon as the offending row was loaded. On a related note, you can check DataSet.HasErrors and each DataTable.HasErrors for any errors. For each table you can call DataTable.GetErrors() to get a list of rows with errors. Each row has a RowError property that contains any error text related to the entire row, and also a GetColumnError() method that you can use to test each column for a column-specific error message. The whole pattern is sketched below.
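
Here is a minimal sketch of the complete load pattern, assuming a DataSet named myDataSet, a data adapter named adapter, and a table named "Orders" (all hypothetical):

myDataSet.EnforceConstraints = false;          // skip constraint checks during the load
myDataSet.Tables["Orders"].BeginLoadData();    // suspend index maintenance

adapter.Fill(myDataSet, "Orders");

myDataSet.Tables["Orders"].EndLoadData();      // rebuild the indexes once

try
{
    myDataSet.EnforceConstraints = true;       // re-validate everything in one shot
}
catch (ConstraintException)
{
    // Inspect the offending rows rather than failing blindly.
    foreach (DataTable table in myDataSet.Tables)
    {
        if (!table.HasErrors)
            continue;
        foreach (DataRow row in table.GetErrors())
        {
            Console.WriteLine("{0}: {1}", table.TableName, row.RowError);
            foreach (DataColumn col in row.GetColumnsInError())
                Console.WriteLine("  {0}: {1}", col.ColumnName, row.GetColumnError(col));
        }
    }
}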