Accessing Delete Row Information

I recently had the need to access the information in a row marked as "Deleted" in a DataSet as part of the update process.  The first attempt netted a DeletedRowInaccessibleException.  It is not possible to access the current information for a data row once the row has been deleted.

However, the very notion that there is 'current' information is a clue to the solution.  If it were possible to specify the version of the information to be retrieved, then it would be possible to use the original information.  As it turns out, this can be done through the indexer on a DataRow.  While the normal access mechanism for a DataRow would be

row["ColumnName"]

to retrieve the original information in a deleted row, the desired version must be specified, as seen here:

row["ColumnName", DataRowVersion.Original]
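As a minimal sketch of this in context (the table and column names here are hypothetical), you can walk the Rows collection during update processing and pull the original values out of any row in the Deleted state:

```csharp
DataTable table = dataSet.Tables["Customers"];

foreach (DataRow row in table.Rows)
{
    if (row.RowState == DataRowState.Deleted)
    {
        // row["Name"] alone would throw DeletedRowInaccessibleException;
        // asking for the Original version retrieves the pre-delete value.
        string name = (string)row["Name", DataRowVersion.Original];
    }
}
```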

Working with Default Rows

Occasionally, a situation arises where the contents of a particular data table within a DataSet *might* originate from more than one source.  For example, a client of ours has a reporting system where there is a set of default columns for each report type.  When a user creates a new report (stored in the Report table in the DataSet), the default columns (the ReportColumns table) are loaded from a DEFAULT_COLUMNS table.  If any change is made to the information in ReportColumns (changing a caption, for example), all of the column information is copied and saved in a REPORT_COLUMNS table.  However, if no change is made, no information needs to be saved.  This allows changes in the default columns to be automatically propagated to any report that hasn't been 'customized'.

All of this functionality is easy to add to the Load method of an entity based on this DataSet.  However, what happens when a Report that uses the default columns is deleted?  Normal processing would have the related ReportColumn rows deleted, followed by the corresponding Report row.  However, when the delete is performed on each of the ReportColumn rows, a DBConcurrencyException is thrown.  The message with this exception is as follows:

Concurrency violation: The DeleteCommand affected 0 records.

With a little bit of thought (actually, slightly more than a little the first time this error was encountered :), the reason for the error became apparent.  The records in the ReportColumns table didn't exist in the database.  Nor should they, since they were originally populated from DEFAULT_COLUMNS, not REPORT_COLUMNS.  However, the DeleteCommand was trying to remove them from REPORT_COLUMNS.  And when the DataAdapter class attempts to delete a record and doesn't see any rows being affected, it assumes that a concurrency violation has taken place.

The solution was to take a little more control of the deletion process.

dataAdapter.ContinueUpdateOnError = true;
dataAdapter.Update(reportsData);

if (reportsData.ReportColumn.HasErrors)
{
    DataRow[] errorRows = reportsData.ReportColumn.GetErrors();
    foreach (DataRow dr in errorRows)
    {
        // The RowError string starts with "Concurrency violation: ..."
        if (dr.RowError.Substring(0, 21) == "Concurrency violation")
            reportsData.ReportColumn.RemoveReportColumnRow((ReportsData.ReportColumnRow)dr);
    }

    // AcceptChanges is required to reset the HasErrors flag
    reportsData.ReportColumn.AcceptChanges();

    // If the data table still has errors, an exception needs to be thrown
    if (reportsData.ReportColumn.HasErrors)
        throw new DataException("An exception was raised while updating the ReportColumn data table: " +
            reportsData.ReportColumn.GetErrors()[0].RowError);
}

The ContinueUpdateOnError property is used to stop the updates from being aborted on an exception.  Once this is done, we check whether the data table has any errors.  If so, we make sure that every row that has a concurrency violation is removed from the DataSet.  The AcceptChanges method call is required to update the HasErrors flag once all of the errors have been eliminated in this manner.

Now is this a perfect solution?  No.  It is possible that a legitimate concurrency violation is ignored.  From the point of view of this application, it is considered an acceptable risk given the type of data being used.  If you wanted to trap the 'real' concurrency problems, you could select against the REPORT_COLUMNS table using the primary key (instead of all of the original values) when a concurrency violation is detected. If a record is found, then the concurrency problem is real.  Otherwise the exception can be ignored.
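A sketch of that 'real' concurrency check might look like the following (the connection, table and key column names here are assumptions for illustration): query REPORT_COLUMNS by primary key alone, and treat the violation as genuine only if the row exists.

```csharp
// Returns true if the row exists in REPORT_COLUMNS, meaning the
// concurrency violation was real rather than a never-saved default row.
bool RowExistsInDatabase(SqlConnection connection, int reportId, int columnId)
{
    using (SqlCommand cmd = new SqlCommand(
        "SELECT COUNT(*) FROM REPORT_COLUMNS " +
        "WHERE REPORT_ID = @ReportId AND COLUMN_ID = @ColumnId", connection))
    {
        cmd.Parameters.Add("@ReportId", SqlDbType.Int).Value = reportId;
        cmd.Parameters.Add("@ColumnId", SqlDbType.Int).Value = columnId;
        return (int)cmd.ExecuteScalar() > 0;
    }
}
```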

DataSet Serialization: Smaller & Faster

DataSets serialize naturally to XML quite well, and you have lots of control over that. Typed DataSets have an XSD with some properties that control it (and of course you can do it programmatically too). But one of the common problems with remoting DataSets is that the default binary serialization is actually just the XML serialization as ASCII. Crude. Some people have even used this fact to conclude that the DataSet is internally XML - which isn't true.

This is improved in Whidbey. But until then, what's a Serializer to do?

Lots of people have done some great work to do customized binary serialization of datasets. To mention a few:

  • Microsoft Knowledge Base Article 829740 by Ravinder Vuppula, where he demonstrates his DataSetSurrogate class, which wraps up a DataSet and converts the internal items into ArrayLists which can then be binary-serialized by the framework by default. It also contains a ConvertToDataSet method so that you can convert a de-serialized surrogate back into a DataSet.
  • Dino Esposito demonstrates a GhostSerializer in his MSDN article for a DataTable, which does a similar ArrayList conversion.
  • Richard Lowe's Fast Binary Serialization
  • Dominic Cooney's Pickle
  • Angelo Scotto's CompactFormatter goes a step further with a serialization technique that doesn't rely on the BinaryFormatter, so it works on the Compact Framework - which is even more compact than the BinaryFormatter
  • Peter Bromberg builds on top of the CompactFormatter to support compression using Mike Krueger's ICSharpCode SharpZipLib in-memory zip libraries


Getting access to the Deleted Rows

If you Delete a row in a DataTable, it's not really gone. It's just marked as deleted, with its data hidden from normal access through the Rows collection, and that works well for databinding and the like. The deleted rows are kept so that when you do a DataAdapter.Update(), it knows which rows you wanted to delete in the persistent store.

So what if you are writing your own adapter, or for other reasons just need to get at the deleted rows? The trick is to create a DataView with a RowStateFilter of "Deleted"; then you can look at the DataRowViews to get individual row/column data.

DataView deleteView = new DataView(MyDataSet.MyTable, "", "", System.Data.DataViewRowState.Deleted);
foreach (DataRowView drv in deleteView)
{
    string someColumnValue = (string)drv["SomeColumn"];
}

DataSet Performance Tips with Indexes

Each of these tips relates to speeding up the searching of data in a DataSet. Although indexes are created in a DataSet, they are not exposed. Knowing how to make ADO.NET generate them gives you the power to enhance the performance of your database application.

Tip #1: ADO.NET generates indexes on the primary key(s).

If you are searching on primary key fields, use DataTable.Rows.Find() instead of DataTable.Select(). The Find method will use the index to find the row. Knowing this may also make you think differently when defining primary keys.
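A quick sketch of the difference (the Customers table and ALFKI key are hypothetical, borrowed from Northwind-style data):

```csharp
DataTable customers = dataSet.Tables["Customers"];
customers.PrimaryKey = new DataColumn[] { customers.Columns["CustomerID"] };

// Index-based lookup on the primary key
DataRow match = customers.Rows.Find("ALFKI");

// Equivalent result, but a linear scan rather than an index lookup
DataRow[] matches = customers.Select("CustomerID = 'ALFKI'");
```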

Tip #2: Use a DataView.

DataViews will generate an index for a sorted column. If you find you are searching for data using a particular column, it would be a good idea to create a DataView sorted by the column(s) you need to search, and then use the DataView.FindRows() method to get at your data. The DataView will have created an index for the sort column, and FindRows() will use that index.
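For example (the Suppliers table and column names here are assumptions), build the DataView once sorted on the search column, then reuse FindRows for repeated lookups:

```csharp
// Sorting on Country causes the DataView to build an index on that column
DataView byCountry = new DataView(dataSet.Tables["Suppliers"],
    "", "Country", DataViewRowState.CurrentRows);

// FindRows uses the index created for the sort column
DataRowView[] canadian = byCountry.FindRows("CA");
foreach (DataRowView rowView in canadian)
{
    string name = (string)rowView["CompanyName"];
}
```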

Tip #3: Be leery of Tip #2.

If you are trying to take advantage of DataView indexes, it's important to know when the indexes are created. ADO.NET generates an index each time you set the RowFilter, Sort, or RowStateFilter properties of the DataView. So if you create a DataView and set these properties individually, the index will be created several times. To avoid this, be sure to use the DataView constructor that takes the table, filter, sort, and row state as arguments (see the example below). This way the index will be created correctly the first time.

private void MakeDataView(DataSet ds)
{
   DataView dv = new DataView(ds.Tables["Suppliers"], "Country = 'CA'", "CompanyName", DataViewRowState.CurrentRows);
   dv.AllowEdit = true;
   dv.AllowNew = true;
   dv.AllowDelete = true;
}

Using a DataAdapter to create transaction history

In some mission-critical applications, it's important to create a log or audit trail of updates made against a specific table. Sometimes this can be done with triggers if you have all the information available, or perhaps your updates are encapsulated inside of stored procedures so you can add central code there. What you may not have known is that if you are using a DataAdapter to read and write a table that needs to be audited, you can accomplish the task with the DataAdapter itself.

The secret to this tip is that a SqlCommand object, like the InsertCommand, UpdateCommand and DeleteCommand members of the DataAdapter, can execute more than one SQL statement. In fact, DataAdapters use this technique by default to refresh a DataSet after an update or insert. This is important for retrieving identity column values, other column defaults, or perhaps values that have been modified by triggers.

To include additional SQL statements, separate them with a semi-colon. You have full access to the parameters collection as well. Below is an example update command text that has been modified to insert into a log table containing the columns Action, OldCategory, and NewCategory. This text can be assigned to the UpdateCommand's CommandText property on the DataAdapter.

UPDATE    Categories
   set    CategoryName = @CategoryName,
          Description = @Description,
          Picture = @Picture
WHERE     (CategoryID = @Original_CategoryID) 
      AND (CategoryName = @Original_CategoryName);
SELECT    CategoryID, CategoryName, Description, Picture
 FROM     Categories
WHERE     (CategoryID = @CategoryID);
INSERT INTO CategoryLog
          (Action, OldCategory, NewCategory)
          VALUES ('Updated",@Original_CategoryName, @CategoryNam)

Creating Computed Columns in a DataSet

DataSets are very powerful - and one of the features that makes them so powerful is computed columns. You can create a DataColumn that does not store data but rather computes a value dynamically.

Computations can be done by SQL on retrieval; however, once the data is in a DataSet, those computations become outdated. As the data changes in your DataSet, it is desirable to have the latest results of a calculation.

One of the properties of a DataColumn is "Expression". This is a string that stores a computational expression. The data type of the column should match the data type of the computed value. These expressions can calculate simple values, or you can use aggregate functions across an entire set of data. You can even use computed columns to lookup a value in a parent table.

The source code below contains three snippets of code that add computed columns to some tables. Listing 1 is a simple expression that calculates the extended price on an Order Item by multiplying the quantity by a unit price. The column is first created and then added to the table. 

The first listing assumed there was not only a Quantity column on the OrderItem table, but also a Price column. In all likelihood, you would need to look this up in a related Product table, assuming the OrderItem table had a foreign key to a Product table. If you have set your DataSet up with a DataRelation for this foreign key, you can perform this lookup within the DataSet. In Listing 2, we create a computed Price column on the OrderItem table that looks up the Price on the Product table. The syntax for referencing the rows in a parent table is:

 Parent(DataRelationName).parentcolumn 

...where DataRelationName is the name of the DataRelation between the parent and child tables. To perform parent table referencing, it is mandatory that a DataRelation is established between the parent and child tables.

The final example in Listing 3, shows a computed column that is a total of all the order item amounts for a given OrderHeader. Assuming we have created a DataRelation for the foreign key between OrderHeader and OrderItem, we can perform this child table reference. The syntax is similar to the parent relation but you use the keyword Child.

 Child(DataRelationName).childcolumn

You should note that when referencing a child table, you are normally referring to many rows (potentially many, anyway). Therefore any expression on a child table should be done within an aggregate function (e.g. Sum, Min, Max, Avg, etc.).

The benefit of performing your calculations within the DataSet instead of the database is that changing values in the base columns of the DataSet will have an instant impact on your calculations. For example, users may want to know the total of an Order before they actually save it in the database. In the second example, if you change the product_id being ordered, the unit price and extended price will automatically change to reflect the price of the new product.

Finally, if you place this code to modify your DataSet within your Business Logic Layer, you are able to keep the critical business logic of calculations in your business objects along with the rest of your business logic, not hidden away in your Data Access Layer or database stored procedures.

You should note that in the current version of the Visual Studio .NET IDE, there is a bug with computed columns and typed DataSets. You can add the computed column to your DataSet in the schema editor, but if you specify the expression, you will get compile-time errors. You should specify the expression at runtime instead.

[Listing 1]
Type stDecimal = System.Type.GetType("System.Decimal");
DataColumn dcExtPrice = new DataColumn("ExtPrice", stDecimal, "Price * Quantity", MappingType.Attribute);
dcExtPrice.AutoIncrement = false;
dcExtPrice.ReadOnly = true;
DataSet1.Tables["OrderItem"].Columns.Add(dcExtPrice);
[Listing 2]
Type stDecimal = System.Type.GetType("System.Decimal");
DataColumn dcPrice = new DataColumn("Price", stDecimal, "Parent(drProduct_OrderItem).Price", MappingType.Attribute);
dcPrice.AutoIncrement = false;
dcPrice.ReadOnly = true;
DataSet1.Tables["OrderItem"].Columns.Add(dcPrice);

[Listing 3]
Type stDecimal = System.Type.GetType("System.Decimal");
DataColumn dcTotal = new DataColumn("Total", stDecimal, "Sum(Child(drOrderHeader_OrderItem).Price)", MappingType.Attribute);
dcTotal.AutoIncrement = false;
dcTotal.ReadOnly = true;
DataSet1.Tables["OrderHeader"].Columns.Add(dcTotal);

Windows Forms DataBinding

Here is a great article on binding in Windows Forms - specifically against a DataSet. Lots of gotchas covered.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwinforms/html/databinding_winforms10_11.asp

How can I do a "Join" between two DataTables?

It's tempting to think that with objects like "DataView" and methods like "Select", DataSets would support this operation, but the bottom line is that they don't. A DataView is a sorted and/or filtered list of rows from one table only. Also, the Select method is on the DataTable and, as such, can't look at other tables.

There are a few options to consider.

  1. Do the join in SQL. SQL is good at this kind of operation. It would require you to have a new DataTable with the composed list of columns from the two tables - and a DataAdapter with at least a SelectCommand specified to support the fill. You would perform your join in the SelectCommand.CommandText. If you are also retrieving data into two separate DataTables for other reasons, then of course you have some redundant data and also some synchronization issues between the two individual tables and the joined table. The joined table will not reflect changes in the two base tables until you update the database and re-execute the Fill on the joined table.
  2. Like the above solution, you could create a third table with the composed list of columns in your DataSet, but instead of loading it from the database, you could copy the data from the base tables yourself using the DataTable.Rows.Add method. If the two base tables share the same primary key, you could try to "merge" the data from the two DataTables into the third DataTable.
  3. If the two tables to be joined are in a master-detail relationship, you can use expression columns to look up data in a parent DataTable or aggregate records from a child table. For example, if you have a Customer DataTable with a stateId column, which is a foreign key to a State DataTable with stateId and stateName columns, you can add a computed DataColumn to the Customer DataTable with an expression of Parent.stateName. This new column will be kept in sync if the underlying name changes in the State DataTable or if the stateId on the Customer DataTable is changed to point to a different State. Similarly, you can look up values on a child DataTable from a parent DataTable, but since there can be one or more rows, you will typically need aggregates like Sum, Avg, Min, Max, etc. in your expression. The DataColumn.Expression online help is valuable for the types of expressions you can use.
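Option 3 can be sketched as follows (the State/Customer table and relation names are hypothetical): establish the DataRelation, then add an expression column that pulls the state name through it.

```csharp
// Relate the parent State table to the child Customer table
DataRelation relation = dataSet.Relations.Add("State_Customer",
    dataSet.Tables["State"].Columns["stateId"],
    dataSet.Tables["Customer"].Columns["stateId"]);

// Computed column on Customer that looks up stateName via the relation;
// it stays in sync as the State data or the foreign key changes.
DataColumn stateName = new DataColumn("stateName", typeof(string));
stateName.Expression = "Parent(State_Customer).stateName";
dataSet.Tables["Customer"].Columns.Add(stateName);
```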

How do I create a Crystal Report from a DataSet

http://www.tek-tips.com/gfaqs.cfm/pid/796/fid/3940