Datasets vs. Custom Entities

So you want to build your own entity objects? Maybe you are even purchasing or authoring a code-gen tool to do it for you. I like to use Datasets when possible and people ask why I like them so much. To be fair, I'll write a list of reasons to not use datasets and create your own entities - but for now, this post is all about the pros of datasets. I've been on a two week sales pitch for DataSets with a client so let me summarize.

  • They are very bindable.
    This is less of an issue for Web forms, which don't support 2 way databinding. But for Win forms, datasets are a no brainer. Before you go and say that custom classes are, or could be, just as bindable, go try implementing IListSource, IList, IBindingList and IEditableObject. Yes, you can make your own custom class just as bindable if you want to work at it.
  • Easy persistence.
    This is a huge one. Firstly, the DataAdapter is almost as important as the DataSet itself. You have full control over the Select, Insert, Update and Delete SQL and can use procs if you like. There are flavours for each database. There is a mappings collection that can isolate you from changes in names in your database. But that's not all that is required for persistence. What about optimistic concurrency? The DataSet takes care of remembering the original values of columns so you can use that information in your where clause to look for the record in the same state as when you retrieved it. But wait, there's more. The DataSet also keeps track of the Row State so you know whether you have to issue deletes, inserts, or updates against that data. These are all things that you'd likely have to do yourself in your own custom class.
  • They are sortable.
    The DataView makes sorting DataTables very easy.
  • They are filterable.
    DataView to the rescue here as well. In addition to filtering on column value conditions - you can also filter on row states.
  • Strongly Typed Datasets defined by XSD's.
    Your own custom classes would probably be strongly typed too...but would they be code generated out of an XSD file? I've seen some strongly typed collection generators that use an XML file but that's not really the right type of document to define schema with.
  • Excellent XML integration.
    DataSets provide built in XML Serialization with the ReadXml and WriteXml methods. Not surprisingly, the XML conforms to the schema defined by the XSD file (if we are talking about a strongly typed dataset). You can also stipulate whether columns should be attributes or elements and whether related tables should be nested or not. This all becomes really nice when you start integrating with 3rd party (or 1st party) tools such as BizTalk or InfoPath. And finally, you can of course return a DataSet from a Web Service and the data is serialized with XML automatically.
  • Computed Columns
    You can add your own columns to a DataTable that are computed based on other values. This can even be a lookup on another DataTable or an aggregate of a child table.
  • Relations
    Speaking of child tables, yes, you can have complex DataSets with multiple tables in a master detail hierarchy. This is pretty helpful in a number of ways. Both programmatically and visually through binding, you can navigate the relationship from a single record in the master table to a collection of child rows related to that parent. You can also enforce the referential integrity between the two without having to run to the database. You can also insert rows into the child based on the context of the parent record so that the primary key is migrated down into the foreign key columns of the child automatically.
  • Data Validation
    DataSets help with this although it's not typically thought of as an important feature. It is though. Simple validations can be done by the DataSet itself. Some simple checks include: Data Type, Not Null, Max Length, Referential Integrity, Uniqueness. The DataSet also provides an event model for column changing and row changing (adding & deleting) so you can trap these events and prevent data from getting into the DataSet programmatically. Finally, with the RowError property and the SetColumnError method you can mark elements in the DataSet with an error condition that can be queried or shown through binding with the ErrorProvider. You can do this with your own custom entities by implementing the IDataErrorInfo interface.
  • AutoIncrementing values
    Useful for columns mapped to identity columns or otherwise sequential values.
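
To make the persistence point concrete, here is a rough sketch of the bookkeeping a custom entity would need before it could do what the DataSet and DataAdapter do together: remember original values, track a row state, and use both to build the right statement with an optimistic concurrency check in the where clause. It is written in Python purely for illustration. TrackedRow, RowState and build_command are made-up names, not the ADO.NET API, and the sketch ignores NULL comparisons and key columns.

```python
from enum import Enum


class RowState(Enum):
    UNCHANGED = "unchanged"
    ADDED = "added"
    MODIFIED = "modified"
    DELETED = "deleted"


class TrackedRow:
    """Hypothetical row that remembers its original values and its state."""

    def __init__(self, values, state=RowState.UNCHANGED):
        self.original = dict(values)  # values as they were when retrieved
        self.current = dict(values)   # values after any edits
        self.state = state

    def set(self, column, value):
        self.current[column] = value
        if self.state is RowState.UNCHANGED:
            self.state = RowState.MODIFIED

    def delete(self):
        self.state = RowState.DELETED


def build_command(table, row):
    """Return (sql, params) for one row, or None if there is nothing to send."""
    if row.state is RowState.UNCHANGED:
        return None
    if row.state is RowState.ADDED:
        cols = list(row.current)
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, [row.current[c] for c in cols]
    # Optimistic concurrency: only touch the record if it still looks
    # exactly the way it did when we read it.
    where = " AND ".join(f"{c} = ?" for c in row.original)
    where_params = list(row.original.values())
    if row.state is RowState.DELETED:
        return f"DELETE FROM {table} WHERE {where}", where_params
    changed = [c for c in row.current if row.current[c] != row.original[c]]
    if not changed:
        return None
    assignments = ", ".join(f"{c} = ?" for c in changed)
    sql = f"UPDATE {table} SET {assignments} WHERE {where}"
    return sql, [row.current[c] for c in changed] + where_params
```

Editing the name on a row loaded as {"id": 1, "name": "Ann"} produces an UPDATE whose where clause carries both the original id and the original name, so the statement affects zero rows if somebody else changed the record in the meantime.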

This is not an exhaustive list but I'm already exhausted. In a future post, I'll make a case for custom entities and not DataSets, but I can tell you right now that it will be a smaller list.

Comments

  • Barry Gervin February 10, 2004 3:31 PM

    Personally, I dislike datasets. Hold on, I'll qualify that - I dislike datasets as a replacement for a domain model.

    For all the reasons you mention, datasets can be a fine solution for a single page. I do NOT think that they belong in the center of a system's architecture. Furthermore, they greatly impede one's ability to perform serious business logic and rules checking.

    Although datasets may be ok for one-off situations, I try to dissuade developers from relying on them too much.

  • Barry Gervin February 12, 2004 4:57 PM

    I think it would be most valuable to itemize the issues that lead you to this opinion...which is not uncommon. As I mentioned, I'm working on a list myself to present both cases for when and when not.

  • Barry Gervin March 3, 2004 3:57 PM

    Other advantages are

    - Other MS products know how to work with DataSets. One example is the new Infopath SR-1, in particular with diffgrams (try to make your custom entities support diffgrams!). Future Office versions will also know how to deal with them.

    - Third party control vendors usually provide very good support for datasets. They don't test them with your custom classes.

    About Udi's comments, that's the usual reason not to use DataSets. They make it hard to write business logic.

    If you combine DataSets with a business rule engine, so you don't need to write business logic using the DataSets, then you have the best of both worlds. Take a look for example at Biztalk 2004's business rule engine.

    DataSets don't have very good press among OO heads. Now, if you change the word 'DataSet' to 'XML' then they start looking at you differently, even if it's the same idea. DataSets are no more necessarily bound to a database schema than XML files are. That's the usual way of using them, but it's not the only one.

    Also, using custom entities can give the idea that you can model your database with objects, and that's the most dangerous trap in enterprise development today, as you start dealing with lazy/eager loading, partial instantiation, object identity, etc, etc, etc.

  • Barry Gervin March 9, 2004 7:03 PM

    I'm keeping this list bookmarked, so I'm adding a couple of more advantages ;)

    - Using 'ExtendedProperties' you can easily add metadata at runtime to the dataset that can be easily consumed. You can also do this with attributes in custom classes but it cannot be done at runtime, and they are more expensive and difficult to consume.

    - There is a lot of community support around DataSets (articles, doc, books, etc), but there is no support for your own flavor of custom entities. Each programmer that needs to use your business logic layer will need to learn to use your components. That does not happen with Datasets. Any .NET programmer should know how to deal with them.

  • Barry Gervin March 13, 2004 3:02 PM

    Would you use a dataset, strongly typed or otherwise, to represent a single "row of data"/entity ?

  • Barry Gervin March 18, 2004 11:52 AM

    A single row? I haven't yet used a dataset for this case...and I'm not sure I would. In fact, this may seem strange, but I don't think I've even had an entity or a business entity that has fallen into that category in the past 2-3 years. Perhaps that's a sign that I'm building way too complex systems - or that I'm not creating small enough or atomic enough entities. I do tend to think of entities as self contained (somewhat transactional) documents - that contain all the tables required as part of the business transaction. For example, an Order. That's too easy - it has Order Header and Order Detail. What about Customer? Sure that sounds like a single record - but in those cases I've found that there always ends up being some "extra" info attached to the customer - like the list of their contacts/employees, or something else. Like I said - haven't had a case where the customer is on its own.

    If I did have this case, now that I think of it, I'd probably be lulled into using a dataset out of practice - and in particular if I'm writing a data access component and persisting it to the database - it's just too easy to use those data adapters to make this happen, complete with the table mappings collection - it's just too fast to do it otherwise.

  • Barry Gervin April 10, 2004 3:50 PM

    Andres,

    Biztalk 2004 Rules Engine works fine with custom entities.

    "using custom entities can provide the idea that can model your database with objects, and that's the most dangerous trap in enterprise development today" - the ways to model a DB are very well documented, and some base themselves on object models.

    Furthermore, entities are pretty much just structures which represent a collection of data in different contexts. A "customer" entity may contain an Id, first name, last name, and ssn. Entities don't contain logic - besides basic validation logic ( ssn must be a specific format ).

    The logic of the system is split into discrete services and each are used as needed. Notification is a fine example of a service, persistence is another. The BL as a single "mush" of all logic is distasteful, IMHO, I expound on this here: http://udidahan.weblogs.us/archives/017820.html

    "That does not happen with Datasets. Any .NET programmer should know how to deal with them" - Any .Net programmer can figure out how to pass a simple class ( like "customer" ) to a service, so I don't see any problems there.

    Barry,

    "haven't had a case where the customer is on its own" - it all depends on how you define customer, of course. In my architectures, each service has its own definition. Most of the time, the definitions between services are compatible - for instance when the development of all the services is part of a single version of a system. When the definitions aren't compatible, you move to a common message bus architecture ( this has been discussed ad nauseam and can be easily googled ).

    About the issue of complex systems - well, I've been part of > 100 man year DoD development projects that were VERY complex ( not to mention the complexity inherent in size ). During the architecture and design sessions, datasets were suggested, analyzed, and discussed, within the organization and with various experts ( MS as well ), and were found to be lacking in several critical respects - performance being one of them.

    I shall reiterate, datasets aren't my first choice on any project, but often prove useful for demo code. Just recently I implemented a small project in under a month, using all the techniques of SOA mentioned in my blog, without a single dataset, table, view, or adapter.

    "just to easy to use those data adapters to make this happen complete with the table mappings collection - it's just to fast to do it otherwise" - Development and basic testing was done in under 2 weeks. We went into alpha with 1 open bug. No bugs were uncovered during alpha ! The open bug was ( obviously ) closed. Beta testing went off without a single bug too ! The project was done for MS Israel and is now live for over 2 weeks without a single problem. I'm not saying that this couldn't be done with datasets et al, just that I haven't heard of it happening ( and I've seen and heard about quite a lot of projects ).

    These are just the thoughts of a single developer, and all anecdotal "evidence" should be taken with a grain of salt ( or two ), but those hooked on datasets should try a different technique, just to see what it's like. I used to use datasets ( et al ) for EVERYTHING, now, I use them for nearly nothing.

  • Barry Gervin April 10, 2004 6:14 PM

    There is no doubt you can use datasets or custom entities to make a successful project - or an unsuccessful one for that matter, so comparing project delivery dates doesn't help much with the education of custom entities vs. datasets.

    Can you elaborate more on your evaluation of datasets - specifically the performance problems? Certainly there is a known issue regarding remoting datasets - even the binary serialization is done with XML so this is way too verbose. There are some good code examples to be googled of customizing the dataset serialization to be faster - and this has been fixed in Whidbey. Any other performance issues I'm in the dark over so please do elaborate on your eval.

  • Barry Gervin April 11, 2004 10:24 AM

    I'll get into the performance issues later ...

    just found Plip's analysis of this issue - its worth a look : http://weblogs.asp.net/Plip/archive/2004/04/11/111128.aspx

  • Barry Gervin April 11, 2004 9:28 PM

    Udi,

    If you are discussing which data structure you should use when exposing your data using a SOA, then custom classes are probably a better choice, as no one will use the DataSet's advantages as the users of that layer probably don't understand DataSets.

    Anyway, I'm curious how you handle, for example, the update of an order in your scenario, without a diffgram.

    I was discussing which data structure was better for entities that I need to use in my own application, and I want to bind, persist, filter, sort, etc

    >Any .Net programmer can figure out how to
    >pass a simple class ( like "customer" ) to a
    >service, so I don't see any problems there

    If I need to implement my own flavor of a data structure as capable as the dataset, then the user of my data structure needs to learn to use it.

  • Barry Gervin April 12, 2004 11:38 AM

    There was an interesting similar discussion on Andres Aguiar's weblog, at http://weblogs.asp.net/aaguiar/archive/2003/06/16/8757.aspx

    I agree both have advantages and disadvantages, but following OOP principles, encapsulation can provide both. After all, model objects are only a bunch of methods on top of a data container. The DataSet can be seen as a common "data ground" for these objects, enabling you to use dynamic data and binding when you need it, and throughout your BLL, use the custom entities, sparing the burden of "translating" your data between layers and components.

    That doesn't solve the performance issues, though (it even makes them worse), and it gives applications straight access to the DAL (if you want to be able to use the binding advantages, you have to leave the dataset accessible, so you have to bind all the events and start heavy validation). Small to medium applications could stand those issues. Has anyone ever tried this approach?

  • Barry Gervin April 13, 2004 12:04 AM

    Plip should definitely look at typed datasets - it looks like he's writing them himself. I've seen that often enough but he seems to go to a greater extent than most people do.

    Compared to a dataset (typed let's say - but doesn't really matter) his storage of data is ok for 1 or two instances of your "car". For larger result sets though this will be inefficient "row" based storage of your objects and not as efficient as the columnar storage of data in datasets using value type arrays like int[] for your "Id". There is some good performance to be had with column-based storage such as a dataset and if you are not doing a dataset and writing your own custom entities that may hold large collections of data - I would recommend implementing that approach.
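
    A rough sketch of the two layouts, in Python rather than .NET just to show the shape of the idea. Car and CarTable are made-up names, the "doors" column is invented for the example, and this is not a claim about how the DataTable is actually written internally.

```python
from array import array


class Car:
    """Row-based storage: one object per record."""

    def __init__(self, car_id, doors):
        self.car_id = car_id
        self.doors = doors


class CarTable:
    """Column-based storage: one typed array per column.

    A "row" is nothing more than an index into every column array.
    """

    def __init__(self):
        self.car_id = array("i")  # plays the role of int[] for "Id"
        self.doors = array("i")

    def add(self, car_id, doors):
        self.car_id.append(car_id)
        self.doors.append(doors)
        return len(self.car_id) - 1  # index of the new row

    def row(self, index):
        # Materialize a row view only when somebody asks for one.
        return {"car_id": self.car_id[index], "doors": self.doors[index]}
```

    With the row layout, 900 cars means 900 separate objects. With the column layout it means two arrays of 900 integers, and a scan over one column never touches the other.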

  • Barry Gervin April 13, 2004 2:01 PM

    I think that the .Net petshop v3 went with a hybrid approach - custom classes to represent single entities, and datasets/datatables to represent many entities. I'm not sure that I particularly like that since you can't really go from one to the other. But it's worth a look anyway.

  • TrackBack April 13, 2004 2:12 PM

  • TrackBack April 13, 2004 5:01 PM

    The lazycoder (http://www.lazycoder.com/weblog/index.php) puts up his take (http://www.lazycoder.com/weblog/index.php?p=51) on the whole custom classes versus datasets ( et al ) debate (http://objectsharp.com/Blogs/barry/archive/2004/02/10/273.aspx). He draws the line on making strongly typed collections, for no other reason than performance - rather, performance pertaining to the amount of memory utilization.

    The true fact of the matter is that the memory overhead you pay for using strongly typed collections is quite small. 900 Customer objects in a CustomerCollection is the same order ( as in big-O notation ) as 900 datarows in a datatable. If anything, your ability to control the memory utilization is greater when using the CustomerCollection since you control the footprint of each Customer. You have much less control over the memory footprint of the datarow.

    So, if memory utilization is better with strongly typed collections ( or even the same assuming a common case ), memory utilization does not appear to be a valid concern for choosing datasets & co over strongly typed collections.

    Development time is a valid concern, and is brought up quite often. From my experience, it does take less time to use datasets to get to the 80% functionality mark. However, that last 20% becomes that much harder because the ability to control datasets' behaviour is quite diminished. Thus, over the entire development lifecycle, I find that strongly typed everything works in my best interests.

    One final note - I'm not against auto-generated code. Quite the opposite, really. I try to automate any parts of the development process that I can. However, I am against relying on code generated by something not under my control, and am dead set against not understanding how that code works. Code breaks, automated code breaks automatically.

  • TrackBack April 14, 2004 3:30 AM

  • Barry Gervin April 14, 2004 10:38 AM

    This is great information, Barry, and I appreciate you sharing it.

    I also enjoy using typed datasets and have yet to build an application that could have been built more quickly or better using typed objects. That being said, I mainly build web applications and Intranet applications for small businesses, so I can't specifically speak to the needs of large applications.

    I can say, however, that with all the page caching and object caching currently built into .NET as well as what is coming in 2.0, my applications are just not hitting the database as much as they did in classic ASP. I also don't load 900 rows of anything for a web page, which has a short life anyway, so memory consumption has never been an issue. And, like you said, a quick Google will help one to find binary serialization solutions for DataSets if you need the added performance.

  • Barry Gervin April 14, 2004 11:05 AM

    Plip (http://weblogs.asp.net/plip) does a good job of showing a strongly typed ds alternative - a strongly typed custom entity. He addresses the sorting and filtering benefits I mentioned of a dataset, although he doesn't implement any support for updating (i.e. storing original values or a version/aka timestamp).

    Another point that came up in his feedback is support for nulls, and somebody suggested that datasets have the same need to implement some null value handling - so I should have added to my list of ds benefits the built in support for null values.

    DataColumns do have an "AllowDBNull" property which in the case of typed datasets is derived to be true with minOccurs=0 in the xsd. But furthermore, the DataRow provides an IsNull(column) set of overloads to ask a column in a given row if it's null. Typed DataSets offer Is<columnname>Null() varieties on the typed DataRow as well. Internally, this is handled on each nullable column with a bit array to store the null/not null values for each row. This is superior to any imaginary or default value technique which only works in a narrow set of business requirements or interpretations. I'll also mention that column value getters in datasets always check the bit array first to return null before going after the value array that stores the actual column value....a good technique worthy of emulation if you want to create your own custom entities.
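
    If you want to emulate that in your own entities, the idea looks roughly like this. It's a sketch in Python for illustration only: NullableIntColumn is a made-up name, and a bytearray with one flag per row stands in for a real bit array.

```python
class NullableIntColumn:
    """Hypothetical column that keeps null flags apart from the values."""

    def __init__(self):
        self._values = []             # the actual column values
        self._is_null = bytearray()   # one flag per row, 1 means null

    def append(self, value):
        self._is_null.append(1 if value is None else 0)
        # The slot still needs something in it; the flag, not this
        # placeholder, is what says the value is missing.
        self._values.append(0 if value is None else value)

    def is_null(self, row):
        return bool(self._is_null[row])

    def get(self, row):
        # Check the null flag first, then go after the value array.
        if self._is_null[row]:
            return None
        return self._values[row]
```

    The point of the separate flag is that a real 0 and a missing value stay distinguishable, which a "magic default value" scheme can't promise.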

  • Barry Gervin April 14, 2004 11:17 AM

    Just to get another thing down on paper.....

    One of the things mentioned about a custom entity in all of these discussions is the ability to do custom validation. You can implement a column value setter that checks the set value and performs some validation. A more general notion is the ability to add your own code to a custom entity whereas a typed dataset has no room for that - unless you want it regen'd when you change your schema and lose your changes.

    Developing custom entities is going to get a lot easier in whidbey with generics and persistence with ObjectSpaces. Datasets get better in whidbey too. With partial classes, there will be support for a generated dataset as a partial class. You will be able to add your own code in another file which is the other part of the partial class and at compile time they'll get compiled together. This is a good pattern for anybody doing code-gen - and of course they'll also have their own individual source control histories which is nice.

    In the meantime, Datasets get a bad rap as a use for a rich entity. If you don't want to throw out the baby with the bathwater, I've seen two techniques for adding your own code. The first technique I've seen with some limited success is to inherit from the typed dataset. You still have the problem that your ancestor is still a dataset so you can't engage any extra code through inheritance....but you can extend, override, hide/shadow stuff from the typed dataset and dataset parts of the hierarchy.

    The other option is to contain/host the dataset in your own entity class which could be derived from a base class (to acquire functionality through inheritance). This option lends itself more to handling events than overriding behaviour. You can still handle the various dataset events in a direct descendant of typed dataset however (like column & row changing/changed events) to perform validation and stop changes to your values.

    Datasets also have the option of getting uglier too in whidbey. Being able to add data access right inside your typed dataset which spawns the notion of a DbDataTable (a DataTable with Fill/Update methods and hence the seeds of its own data-adapter built right in). I've been told that this is a rad option, possibly for the non-enterprise scale developer...and clearly not for me. I really hope I can craft an enterprise template to disable that option.
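
    A rough sketch of that second option, in Python only to show the shape. UntypedTable is a stand-in for a dataset that raises a "column changing" event, and EntityBase and OrderEntity are made-up names; none of this is the real ADO.NET event API.

```python
class UntypedTable:
    """Stand-in for a table that raises an event before a value changes."""

    def __init__(self):
        self.rows = []
        self.column_changing = []  # handlers called as f(row, column, proposed)

    def add_row(self, values):
        self.rows.append(dict(values))
        return len(self.rows) - 1

    def set_value(self, row, column, proposed):
        for handler in self.column_changing:
            handler(row, column, proposed)  # a handler vetoes by raising
        self.rows[row][column] = proposed


class EntityBase:
    """Hypothetical base class, so shared behaviour comes via inheritance."""

    def describe(self):
        return type(self).__name__


class OrderEntity(EntityBase):
    """Hosts the table instead of inheriting from it."""

    def __init__(self):
        self.table = UntypedTable()
        self.table.column_changing.append(self._validate)

    def _validate(self, row, column, proposed):
        if column == "quantity" and proposed <= 0:
            raise ValueError("quantity must be positive")
```

    The entity gets its ancestor of choice, and the bad value never reaches the table because the handler raises before the assignment happens.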

  • Barry Gervin April 14, 2004 11:22 AM

    BTW - this is a great exercise which has lots more to discuss, and I'm looking forward to exploring more details of both sides. I just wanted to take a moment to thank the participants on both sides of this discussion - keep up the excellent work. On that note...any of you guys going to TechEd? Correct me if I'm wrong, but this would make a great panel discussion or birds of a feather session. No?

  • Barry Gervin April 14, 2004 5:19 PM

    Adding another point of view, I think that the value of DataSets also depends on the kind of application you are building. If you are developing ASP.NET applications they have less value than if you are doing Windows Forms applications.

    One reason is that data binding in asp.net is much more primitive than in Windows Forms, and making it work as well as with datasets is difficult.

    But the main reason is that in Windows Forms you need a disconnected architecture. You need to retrieve data, keep track of the changes, and then persist it. Doing this with custom classes is a lot of work.

    If I want to build a business logic layer, I want to be able to use it from asp.net apps and windows forms apps. Doing that with DataSets is easy. Doing it with custom classes is not.

    BTW, I'll go to TechEd, but I'm not sure if this topic is big enough for a BOF, but we can try ;)

  • Barry Gervin April 16, 2004 7:48 AM

    Barry, Andres,

    Have either of you developed an entire system using strongly typed / custom classes ?

  • Barry Gervin April 17, 2004 6:35 AM

    Udi,

    I don't like the tone of the question ;) (I can ask you if you ever built an app using DeKlarit's DataSets ;), but anyway I'll answer it.

    We built a custom IDE with a database backend. The database was an old c-tree isam database, and the data access is done using a managed c++ layer on top of C code. We are using custom classes for that product. We basically don't need any DataSet feature. We don't bind, etc. It's not a typical database-based business application.





  • Barry Gervin April 17, 2004 9:06 PM

    Udi,

    Good debating tactic - question the credentials of your opponent :)

    I've built systems using typed and untyped classes and datasets. That's 4 different scenarios. When I say "built", that can vary between being a consultant on an advisory basis, a contributing developer, and right up to lead designer/architect. Obviously I've only used datasets with .NET. Prior to .NET I've had experience with both typed and untyped classes on a variety of platforms including Java, Delphi, and PowerBuilder (mostly untyped there).

    Most of my early work with .NET was untyped (including datasets) but I've increasingly seen more typed projects (both classes and typed datasets).

    Unfortunately, most of the time I see teams choose how they want to do things based on instincts, gut feel and anecdotal evidence...and the latter mostly in defence of the first two. My recent work revolves around defining best practices and roadmaps to the most appropriate .net technology. The problem with bp's is that when they get communicated verbally they usually lose some of the fidelity of "when" or "why" to use or not to use or "how" to use a given technology....and that's what this thread is all about....filling in all of the when's & why's.

    So to say something like "Datasets" have poor performance so never use them is a huge disservice if it's not qualified. Slow to load? Slow to remote? Slow to serialize? Slow to Develop? Not scalable? Those blanks need to be filled in.

    So far this is where I am on the performance issue: Slow to Load? Inconsequential as long as you load them with constraints & indexing turned off during load. Slow to remote? Yes if you don't override the built in binary serialization...or wait until Whidbey. Not scalable? I've got a case where datasets are actually more scalable than loading custom objects with a datareader. This is mostly accidental/slippery slope where a connection is being held open longer because it was more convenient for a developer to do some extra work during the datareader looping. The dataset solution doesn't give you the option to do extra work "during the load" so it's a bit safer in that situation.

    This last problem I don't consider a back breaker one way or the other though. The point is you can make mistakes using any technique. You can't always blame your tools and you need to have processes in place to performance & load test any solution you do....developers are human so you have to surround them with processes like testing to validate the assumptions against expectations.

  • Barry Gervin April 18, 2004 1:39 PM

    Maybe we should have another thread about this.

    I think perhaps the main reason that some developers find strongly typed entities (datasets or otherwise) distasteful is not that they are strongly typed, but that there is no untyped access to their contents?

    Certainly a lot of strongly typed entity implementation patterns, samples and code generators don't offer dual typed and untyped access to their contents. Of course there is always reflection but the thought of accessing data through reflection increases the distaste. Perhaps that is a reflection (pun intended) of the reflection api, not to mention its performance. That's another thread in itself.

    Strongly typed datasets are built on top of the untyped dataset. So without resorting to reflection it's easy to walk the tables, columns and rows to collect not only data but meta data.

    I think that is a good metaphor for any implementation of a Typed Entity - that within the implementation there is an untyped mechanism. I'm hesitant to say "storage" but I'm certainly thinking about that.

    One of the things I find with teams that use code generation for typed classes is that somewhere along the line they wish for some extra code to be inserted that they didn't think about when they first defined their templates. Improving the template can often mean regenerating existing classes. It's a clear requirement for effective code generators to allow for regeneration of modified classes without overwriting the modifications. Partial classes in whidbey solve this problem and in fact the new dataset generator and environment takes advantage of this and allows me to add my own code to user files that get compiled together into one class.

    A lot of functionality required for an entity doesn't necessarily have to be code gen'ed and again, typed datasets are a good demonstration of this. The generated classes are inherited from something else (in this case, System.Data.DataSet). By building on top of the untyped functionality in the DataSet, you get lots of things for free - and my point is that these freebies are only possible by the fact that at the core of this type of entity is untyped data. For example, other than wrapper functions, there is nothing in a Typed DataSet's code that handles null values - but that is handled in the ancestor. Original Values, Modification State flags and error states are other biggies that come for free with a DataSet.

    I don't bring these up necessarily to advocate a dataset but rather to demonstrate the power of untyped cores to your entities and the things you can add for free in your ancestor (or external helper classes if need be) if there is an abstract untyped way of getting to the core data.
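
    As a sketch of what I mean by an untyped core (Python here just for illustration; UntypedRow and CustomerRow are made-up names, not anything the dataset generator emits): the ancestor handles nulls, modification state and walking the columns generically, and the "generated" class adds nothing but thin typed accessors.

```python
class UntypedRow:
    """Hypothetical untyped core: values keyed by column name."""

    def __init__(self, values):
        self._original = dict(values)
        self._current = dict(values)

    def __getitem__(self, column):
        return self._current[column]

    def __setitem__(self, column, value):
        self._current[column] = value

    def is_null(self, column):
        return self._current[column] is None

    def is_modified(self):
        return self._current != self._original

    def columns(self):
        # Untyped access to the meta data, no reflection needed.
        return list(self._current)


class CustomerRow(UntypedRow):
    """What a generator might emit: wrappers only, no logic of its own."""

    @property
    def name(self):
        return self["name"]

    @name.setter
    def name(self, value):
        self["name"] = value

    def is_name_null(self):
        return self.is_null("name")
```

    Everything interesting lives in the ancestor, so improving it doesn't mean regenerating the typed classes.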

    I'm still thinking about how Plip's example is going to be elegantly refactored to support updates in the future. I can't see a good way out here....and in general, I'm trying to come up with some kind of universal requirements or best practices for custom entities.

  • Barry Gervin April 18, 2004 4:47 PM

    Andres,

    No tone. Just curious.

    Barry,

    Thanks, ( I actually wish I thought of it as a tactic ) but I'm apparently tactless :)

    I'm actually reflecting on why I so dislike datasets, strongly typed or otherwise, and haven't come up with an answer yet. It could be that I don't control how things get done ( that and the fact that I'm something of a control freak when it comes to my code ).

    Original Values, Modification State flags and error states aren't things that particularly matter to me, unless I'm working truly disconnected.

    I also find that my custom classes fit a lot better with my overall architecture point of view - SOA - than do datasets.

    Actually, I'm quite a misfit in the custom classes camp, advocating that these classes should contain only basic validation logic and having all work done in external services. Most advocates of the custom classes way of work are "Object Bigots" ( Fowler's words, not mine ) and, IMO, dislike datasets as well. This is about as far as we go in the same camp. Once I pipe up about logic being outside the classes, I hear a *gasp*, and some variation of "but, but, but - WHERE'S YOUR ENCAPSULATION !? ".

    I've taken something of a detour on my "Road To SOA" series on my blog, and I'll try to get back to it. Maybe then what I have to say about custom classes will make more sense.

  • Barry Gervin April 19, 2004 6:14 PM

    Udi,

    Just a quick question. How do you handle updates with custom classes? Do you have some kind of optimistic concurrency support?

  • Barry Gervin April 20, 2004 10:38 AM

    I'll tell you why I dislike DataSets: they are huge objects containing a bunch of stuff that I don't need 95% of the time. I like Andres' wrapper approach, but it still drags the DataSet along inside it. I'd rather have a small, strongly-typed class or struct and populate it using a DataReader.

    Ideally, I'd like MS to create an IDataSet interface with just enough defined members to allow us to fill our custom classes in the standard way.
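
    A rough sketch of the "small class filled from a DataReader" idea. The Customer class and its columns are hypothetical, and an in-memory DataTable stands in for a real query so the example runs without a database; ReadAll itself would work with any IDataReader.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;

    // Hypothetical entity for this sketch.
    public sealed class Customer
    {
        public int Id;
        public string Name;   // null when the column is NULL
    }

    public static class CustomerReader
    {
        // Copies each record of the reader into a small typed object.
        public static List<Customer> ReadAll(IDataReader reader)
        {
            int idCol = reader.GetOrdinal("Id");
            int nameCol = reader.GetOrdinal("Name");
            List<Customer> result = new List<Customer>();
            while (reader.Read())
            {
                Customer c = new Customer();
                c.Id = reader.GetInt32(idCol);
                c.Name = reader.IsDBNull(nameCol) ? null : reader.GetString(nameCol);
                result.Add(c);
            }
            return result;
        }

        public static void Main()
        {
            // Stand-in for a real query: an in-memory table exposed as a reader.
            DataTable table = new DataTable("Customer");
            table.Columns.Add("Id", typeof(int));
            table.Columns.Add("Name", typeof(string));
            table.Rows.Add(1, "Contoso");
            table.Rows.Add(2, DBNull.Value);

            using (IDataReader reader = table.CreateDataReader())
            {
                foreach (Customer c in ReadAll(reader))
                    Console.WriteLine(c.Id + ": " + (c.Name ?? "(null)"));
            }
        }
    }
    ```

    Note that the entity keeps no original values or row state; that is exactly the trade-off being argued about in this thread.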

  • Barry Gervin April 22, 2004 12:26 AM

    Scott,

    If you don't need the stuff 95% of the time then you should not use DataSets, of course.

    Anyway, it reminds me of a friend of mine, who is a C programmer, and uses that argument to not write code in .NET ;)

    Regards

  • Barry Gervin April 23, 2004 8:54 AM

    I'm doing some work on inventorying many of the MS Sample applications (IssueVision, TaskVision, FM Stocks, ShadowFax, Jaggle, PetShop, etc.) and what technologies (Win/WebForms, Web/Ent Services, etc), techniques (Datasets, custom classes, layers and partitions, etc), application blocks (DAAB, EMAB, etc.) they demonstrate.

    I stumbled onto one that was written by some folks at Infragistics. One of the design goals was to not use Datasets. I'll be doing some more review of this app over the coming weeks.

    http://windowsforms.net/articles/writingntierapps.aspx

  • Barry Gervin April 23, 2004 9:48 AM

    A couple of follow up comments on the Tracker above...it's not written by Infragistics, but it does use their winforms controls.

    More importantly, I had a quick peek at two common flaws in custom entity samples: updateability with optimistic concurrency, and handling of nulls.

    This app does use timestamps in the database and keeps those in the entity. Good practice if your db has them, but not a generic solution like a dataset's original values. The entities in this sample do not show any support for the concept of nullable values.

  • Barry Gervin May 4, 2004 11:51 PM

    Updates are done as follows:

    PersistenceService.Update(myCustomer);

    Concurrency I handle in various ways depending on the situation but it's usually one of the following:

    1. The sql of the update checks all values ( like what the data adapter generates ). I like this approach because I don't dirty my entities with unnecessary baggage.

    2. Add a datetime of last update value to the entity - simpler sql.

    3. Add a version number to the entity - simpler sql, like 2.

    I then simply check the number of rows updated, and if nothing has been updated, I raise an appropriate exception ( sometimes I call it a ConcurrencyException, most times I prefer an EntityHasNotBeenUpdatedException ).
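
    To make approach 3 and the rows-updated check concrete, here is a sketch under loud assumptions: the Customer entity, the Version column and the in-memory DataTable standing in for the database are all invented for illustration. Only the PersistenceService.Update call and the ConcurrencyException name come from the comment above.

    ```csharp
    using System;
    using System.Data;

    public class ConcurrencyException : Exception
    {
        public ConcurrencyException(string message) : base(message) { }
    }

    // Hypothetical entity; Version is the number from approach 3.
    public sealed class Customer
    {
        public int Id;
        public string Name;
        public int Version;
    }

    public static class PersistenceService
    {
        // Stand-in for the database table so the sketch runs without a server.
        public static readonly DataTable Store = CreateStore();

        static DataTable CreateStore()
        {
            DataTable t = new DataTable("Customer");
            t.Columns.Add("Id", typeof(int));
            t.Columns.Add("Name", typeof(string));
            t.Columns.Add("Version", typeof(int));
            t.Rows.Add(1, "Contoso", 1);
            return t;
        }

        // Mimics: UPDATE Customer SET Name = @name, Version = Version + 1
        //         WHERE Id = @id AND Version = @version
        public static void Update(Customer c)
        {
            DataRow[] matches = Store.Select("Id = " + c.Id + " AND Version = " + c.Version);
            if (matches.Length == 0)   // zero rows updated: someone else got there first
                throw new ConcurrencyException("Customer " + c.Id + " was changed or deleted by another user.");

            matches[0]["Name"] = c.Name;
            matches[0]["Version"] = c.Version + 1;
            c.Version++;               // keep the entity in step with the stored row
        }

        public static void Main()
        {
            Customer mine = new Customer { Id = 1, Name = "Contoso Ltd.", Version = 1 };
            Customer theirs = new Customer { Id = 1, Name = "Contoso Inc.", Version = 1 };

            Update(mine);                    // succeeds, stored version becomes 2
            try { Update(theirs); }          // stale version 1 no longer matches
            catch (ConcurrencyException ex) { Console.WriteLine(ex.Message); }
        }
    }
    ```

    Against a real database the same check would read the rows-affected count returned by the UPDATE command instead of the length of a Select result.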

  • Barry Gervin May 5, 2004 9:35 AM

    Udi,

    If you follow the first approach, then you need to have the old values available. Where do you store them?

    I mean, if you use ASMX to retrieve a Customer, change something, and send it back, how do you know the old values?

    Also, in a 'SOA', I could want to retrieve an 'Order' as a whole, add an OrderLine, remove another OrderLine, update the header, etc. How do you send those changes back to the middle tier?

    Regards,

    Andres

  • Barry Gervin May 5, 2004 4:07 PM

    Andres,

    Old values can be easily stored by holding a shallow copy of your data. It's usually in the cases where I get lazy ( which turns out to be quite often ) that I go with the other approaches.
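
    One possible shape for the shallow copy, sketched with object.MemberwiseClone. The Customer fields and the MarkAsLoaded, Original and IsDirty members are assumptions for illustration, not a description of any particular framework.

    ```csharp
    using System;

    // Hypothetical entity that remembers the values it was loaded with.
    public sealed class Customer
    {
        public int Id;
        public string Name;
        public string City;

        private Customer original;   // snapshot of the values as loaded

        // Call once right after the entity has been filled from the database.
        public void MarkAsLoaded()
        {
            original = (Customer)MemberwiseClone();   // shallow, field-by-field copy
            original.original = null;                 // the snapshot keeps no snapshot of its own
        }

        public Customer Original { get { return original; } }

        public bool IsDirty
        {
            get { return original != null && (Name != original.Name || City != original.City); }
        }
    }

    public static class ShallowCopyDemo
    {
        public static void Main()
        {
            Customer c = new Customer { Id = 1, Name = "Contoso", City = "Toronto" };
            c.MarkAsLoaded();
            c.City = "Ottawa";

            // The old values would feed the WHERE clause of the UPDATE,
            // the new values the SET clause.
            Console.WriteLine(c.IsDirty);          // True
            Console.WriteLine(c.Original.City);    // Toronto
            Console.WriteLine(c.City);             // Ottawa
        }
    }
    ```

    A shallow copy is enough here because the fields are value types and immutable strings; an entity holding child collections would need more than this.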

    What you're looking for in your example is the ability to work disconnected. SOA doesn't try to solve that problem. A service might be something like AddOrderLineToOrder which would get the minimum needed information and coordinate all the work for you.

    Just to expound a little:

    public void AddOrderLineToOrder(int productId, int amount, double discount, int orderId)
    {
    // all of the following needs to be done in a transaction ( obviously )

    // fill a product entity using the productId - could use caching to improve performance
    // throw exception if no product exists

    // fill order entity using orderId - could use caching to improve performance
    // throw exception if no order exists

    // calculate new total sum of order using price found in product and discount and amount

    // create new orderline entity with all data now available
    // persist it and connect to order

    // apply any volume discounts or special offers - automatic 2 for ones for example
    // create new orderline entities as a result
    // persist them and connect them to the order too

    // update the new order
    // persist it

    // email the user about the wonderful specials he just got
    }

    There's enough to do one step at a time! god only knows what rules to employ and how to coordinate batch changes based on them.

    When all you've got to do for each action is a simple CRUD, then it makes sense to batch them. But when you've got heavy business logic/rules - or worse, those that change often - batching changes together to improve performance while creating more complexity is not the solution. That's IMO, of course :)

  • Barry Gervin May 6, 2004 11:25 AM

    Mm... That's quite a 'chatty' API, and you'll have a hard time adding all the lines in the same transaction (unless you use WS-Transactions).

    Also, how do you deal with updating a Customer? You need to send the shallow copy back to the server, unless you want to keep state there.

    Also, in my opinion, SOA does try to solve the 'work disconnected' scenario.

    Regards,

    Andres

  • TrackBack May 12, 2004 4:07 PM

    After several back-and-forths on the subject, a nice thread grew between myself, Barry Gervin, and Andres Aguiar on the subject - read it here. For any one wondering which way to go, I can only say that my way rocks! But...

  • Barry Gervin May 12, 2004 9:41 PM

    I agree with Udi. Having an Order object which has got both implementation (e.g. how to persist itself to the data store) as well as the state (e.g. attributes for the Order) is not a good idea.

    The first criticism is: what happens if you populate the Order object, pass it back to the presentation layer and then call the persist method? You could argue that you still want to persist the value to a data store, but the way you persist your object on the server (i.e. SQL Server database) is different from the way you persist your object on your PocketPC (an XML file or SQL Server CE).

    So there is a clear difference between those classes that implement functionality (e.g. business and data layer) and those that represent business entities. You populate the business entities and then pass it between components. This helps you in looking at the interactions as "Message Passing", rather than RPC. It also helps make your business/data layer objects stateless.

    If you are concerned about transactions, you could either use COM+, or create a SqlTransaction object (thanks to ADO.NET) and then pass it to those methods that need to be part of the same transaction. Again, the workflow method has some internal state (which is in fact the state of the transaction), but as soon as the method call for the workflow finishes, all of the state is lost, so the object and connection can be returned to the pool.

    If you have an Order object and want to add an Order Line, you first read the existing values (the same way you do with the DataSet) which includes the order + orderlines, do the modification (remember you are passing the business entity including order+orderlines around) and then pass it to a final method which persists the data back to the data store. It is all up to that object (which is your data access layer) on how to perform the update (i.e. resolve conflicts, etc).

  • TrackBack May 12, 2004 11:52 PM

    A Thread on DataSet vs. Custom Classes

  • Barry Gervin May 13, 2004 11:27 AM

    Hi all, interesting thread!

    Andres and Udi, about updating orders:

    Granted, it's a chatty API that Udi suggests, but it's quite probably the appropriate one, seeing as how the adding of order lines is prone to extensive business logic, just as Udi demonstrates in his example. (btw, Udi, I'm another 'odd camper' who prefers to keep my business logic in a separate Service layer and let it work on the custom entities)

    Still, in cases where batch updates are viable, why not just send the updated graph serialized as an xml document to a Web Service on the server, that accepts the whole document in a parameter?

    You could also send along the original xml document that the server gave you in another parameter, enabling the server to do optimistic concurrency. Alternatively, you could cache the original document in server-side Session state or similar.

    The thing is, there's really no difference if you use custom entities or datasets on the server as far as the client-server communication is concerned. It is going to be Document Oriented and based on SOA anyway, as long as you don't go about Remoting the custom entities (I'm not a fan of that approach).

    So, the question would be:

    (1) Is it easier to implement the Retrieve Services producing the xml documents using custom entities or datasets when the client requests data, and

    (2) is it easier to implement the Create/Update/Delete Services (both the fine-grained ones such as AddOrderLineToOrder() as well as coarse-grained services accepting whole updated documents) using custom entities or datasets when the client wants to update data?

    The answer would be - it depends!

    Of course, each approach (custom entities and datasets) takes a bit of getting used to. But in my experience it's worth getting to know both!

    Best Regards
    /Mats Helander

  • Barry Gervin May 13, 2004 11:57 AM

    Mehran,

    I'm not suggesting to have the persistence methods in the Order itself. The DataSets have a 'DataAdapter' that knows how to persist them.

    Mats,

    Some time ago I learned that the right answer for any question is 'it depends' ;).

    Anyway, the point here is that people who use custom entities do not see that they need to keep/send the old values and the new ones, or they use timestamps for optimistic locking. Once they find the limitations of that approach, they need to build something quite complex manually (that's what some O/R mappers usually do), and that's pre-built in the DataSet.

    About exchanging complete Orders or OrderLines, I still prefer working with the whole Order. That's the real 'entity', the one that needs to be consistent, that could require more complex concurrency checks (i.e., no one added another line to the order), etc.

  • Barry Gervin May 13, 2004 1:11 PM

    Hi Andres,

    I agree completely with you that optimistic concurrency is important, and like you I prefer to use the column-level approach rather than version columns, for higher scalability. While I can't argue with the fact that datasets support optimistic concurrency, I'll also have to say that I wouldn't use custom entities that didn't, so in my eyes that point is kind of moot.

    I also agree with you that sending whole entities or even graphs around as xml documents is very comfortable, and something I try to do whenever possible, but sometimes the fine-grained service is the way to go...as you say, 'it depends' is really an extremely useful phrase ;-)

    Another useful phrase is 'Ask the domain expert!' :-)

    /Mats

  • Barry Gervin May 14, 2004 3:48 PM

    I've done many projects with Typed Datasets and a few less without.
    I find the typed datasets easier to work with because third party controls/apps, such as Crystal Reports, work with datasets. And the serialization is much easier.

    I only wish you could merge datasets. In one project we had a typed dataset for every data entry page, and customer and vendor were both company tables but with different fields updated, so maintaining the datasets became an issue.

  • Barry Gervin May 15, 2004 5:46 PM

    The service example I gave "AddOrderLineToOrder" would be called in-process on the server. Should the server expose a document exchange protocol ( as is quite common ), after receiving the document on the server, a number of calls would have to be made like "AddOrderLineToOrder".

    Therefore, the issue decomposes from (so-called) client-server to between services on the server.

    My argument for custom classes derives from the need to perform actual business logic on the server, and not just toss the changes into a database.

  • Barry Gervin May 16, 2004 8:50 AM

    Of course the document based services like "UpdateOrder" will be broken up into several in-process calls to methods like "AddOrderLineToOrder" on the server.

    That doesn't mean that it isn't sometimes necessary to expose fine-grained services like AddOrderLineToOrder to out of process clients as well.

    In any case, I agree that it is when you need to perform more serious business logic than just shuffling data in and out of the db that custom classes really start becoming useful - whether you opt to distribute your business logic over the custom classes or in a separate service layer.

    I was probably a bit unclear on this before. I realize that reading my earlier post with points (1) and (2) in it, it looks like I'm suggesting that all that is being done on the server is basic CRUD (when exchanging documents) and then I ask "can custom classes help you with this?"...

    What I really meant was that one should ask the question "Do I do anything /more/ than just the basic CRUD on the server?" and then, if the answer is "yes", perhaps custom classes can be an interesting option...

    I'll try to be more clear! ;-)

  • Barry Gervin May 16, 2004 9:36 PM

    Udi,

    OK, but when you send the Order from the client to the server, you need to know which rows were added. You need to have that information in your Order custom entity.

    I could write the same business logic using an 'OrderLineDataRow'. It's the same as using a custom class, _unless_ your custom classes are really a 'domain model', with inheritance, composition, etc. When I say a domain model I'm not talking about where you put your business logic but whether you are writing

    Order.Customer.Id

    or

    Order.CustomerId

    If you are doing the first, then you'll have a lot of problems moving to a service oriented architecture. If you are doing the second, you won't, but in that case writing business logic with the custom entity is not more difficult than using a DataRow.

  • Barry Gervin May 18, 2004 9:42 AM

    Andres,

    I do it the first way (Order.Customer.Id) and I don't have any problems with SOA. Quite the opposite, in fact, SOA and a Document Oriented approach are a very natural fit with the custom entities. A document usually corresponds to a subset of the domain model.



  • Barry Gervin May 18, 2004 12:25 PM

    Mats,

    It's a subset of the domain model, but the subset includes different views of 'Customer'. For one document the Customer could have some fields, and for other document, it could have other fields. When I'm retrieving an order I don't want all the Customer data. When I'm retrieving a Customer, I do.

    In that case, you need multiple custom entities for the Customer.

  • Barry Gervin May 18, 2004 4:36 PM

    Andres,

    That's quite right!

    I just posted an entry in my blog addressing this, perhaps you'll find it interesting:

    <a href="http://news.pragmatier.com/matshelander/Weblog/DisplayLogEntry.aspx?LogEntryID=6">Serializing Domain Objects</a>

    As you say, I use multiple (overlapping) sets of custom entities that I design using a Document Oriented perspective for just this purpose and that reside on top of the O/R Mapped custom entities.

  • Barry Gervin May 18, 2004 4:37 PM

    Oops, that link became very ugly. Sorry about that, my bad.

  • TrackBack May 19, 2004 12:15 PM

    Dataset/Custom Entity Discussion and My 2 Cents

  • Barry Gervin May 23, 2004 3:03 AM

    Andres,

    When sending documents from client to server, I prefer to view this problem as a connection between different systems issue.

    Firstly, I develop the "core" system "properly" and then consider how to connect it to other systems.

    In cases where there is a high speed connection between the systems, I prefer to expose functionality as I showed above and let the connecting system work it out (that is, unless I have to write the connecting system <grin />).

    The issue you raised above - Order.Customer.Id vs Order.CustomerId has too much background to cover in a comment, so I'll post on it later. Sometimes, these seemingly mundane issues make such a difference on how systems hang together that to let inexperienced programmers decide (as too often happens in bloated organizations, from my experience) can bring development to its knees as a result of bugs late in the lifecycle.

    No offense is intended, of course, against anyone here. I'm just venting a little :)

  • Barry Gervin June 8, 2004 11:51 AM

    Anyone have a good example using datasets, similar to Tracker or Pet Shop 3.0?

    So far, I like custom entities much better, but if someone can point me to a GOOD example of using a dataset across layers, I would be more than willing to give them a shot. As of right now, I can't see how I could use them correctly.

    Thanks

  • Barry Gervin June 9, 2004 3:18 AM

    IssueVision uses a typed dataset although the sample is somewhat simplistic. So does TaskVision. http://www.windowsforms.net/default.aspx?tabIndex=7&tabId=44

  • Barry Gervin June 13, 2004 8:39 AM

    Yeah, I have looked at both and neither is that great of an example. TaskVision isn't 3 layered either. I was hoping to find one like Tracker except built with datasets.

  • Barry Gervin June 17, 2004 12:03 PM

    Well, I did find a really good example; it's got everything from mobile phone and the compact framework to a thick client app.

    http://www.learn247.net/werock247

  • Barry Gervin June 28, 2004 2:18 PM

    Barry, I think you might want to consider creating a direct link to this post right off your home. This is the best discussion I've seen so far, not only datasets vs. custom entities but also on general architecture best practices on DAL/BLL, etc.

  • Barry Gervin July 1, 2004 6:06 PM

    So... that took a while to finish reading all of the comments...

    Udi,
    (if you are still following) you talk about implementing optimistic concurrency by holding a shallow copy of the entity object. How do you do this?

    Mats,

    Why is the column-level approach (for concurrency control) more scalable than the version number (or timestamp) approach?

    Barry,

    Can you explain this "columnar storage" vs. row-based storage idea? Or point me to the right resource where I might read more on the topic?

    And all,

    How would you work with an entity, for example, Account that contains Address entity and Contact entries? And each contact entry would also contain an Address entity as well?
    So in a disconnected situation, how would you handle adds/edits/deletes to the disconnected Account entity WRT persisting the changes to the database? I would gather this is rather easy with DataSet approach but how about for custom entities?

    Thanks

  • Barry Gervin July 1, 2004 8:08 PM

    Columnar storage concept is referenced in the last paragraph here: http://objectsharp.com/Blogs/barry/archive/2004/02/24/284.aspx

  • Barry Gervin July 2, 2004 9:25 AM

    Thanks for the link. Although the linked page - in Mike Pizzo's comment - explains the concept of columnar storage well enough, it really doesn't delve into the trade-offs made when possibly switching to a row-based storage. Any ideas?

    BTW, I am reading Designing Data Tier Components and Passing Data Through Tiers from the MS P&P team. I've only read the very beginning parts but I've got to say it's already cleared up some of the confusion I had regarding the topic. I'd suggest that it be required reading if you haven't read it already. Also, I should mention that Mike Pizzo is a contributor to the document.

  • Barry Gervin July 2, 2004 1:07 PM

    I've found the following in the .NET Data Access Architecture Guide (another by MS P&P) on p.45:
    (sorry about the length...)

    Including All Columns in the WHERE Clause
    This option prevents you from overwriting changes made by other users between the time your code fetches the row and the time your code submits the pending change in the row. This option is the default behavior of both the Data Adapter Configuration Wizard and the SQL code generated by the SqlCommandBuilder. This approach is not a recommended practice for the following reasons:
    * If an additional column is added to the table, the query will need to be modified.
    * In general, databases do not let you compare two BLOB values because their large sizes make these comparisons inefficient. (Tools such as the CommandBuilder and the Data Adapter Configuration Wizard should not include BLOB columns in the WHERE clause.)
    * Comparing all columns within a table to all the columns in an updated row can create excessive overhead.

    Including Unique Key Columns and the Timestamp Columns
    With this option, the database updates the timestamp column to a unique value after each update of a row. (You must provide a timestamp column in your table.) Currently, neither the CommandBuilder nor the Data Adapter Configuration Wizard supports this option.

    Including Unique Key Columns and the Modified Columns
    In general, this option is not recommended because errors may result if your application logic relies on out-of-date data fields or even fields that it does not update. For example, if user A changes an order quantity and user B changes the unit price, it may be possible for the order total (quantity multiplied by price) to be incorrectly calculated.

    -------------
    Sounds to me like the best option is still using a timestamp (version number) column. My question is: does DataSet use option #1 or #3?

  • Barry Gervin August 6, 2004 4:51 PM

    I hear a lot of people saying how wonderful datasets are because you can bind, etc. Seems people just don't want to write code anymore and they think Microsoft has handled all the real work for you. Hogwash!!! Personally, give me an entity any day over a heavy dataset! I can write the code I need to fill a list box.

  • Barry Gervin August 24, 2004 6:12 AM

    I haven't been here in a while, but it seems that so many people arrive at my blog from here that I just HAD to return.

    One short comment for now, though -

    I recently began consulting on a project that made extensive use of datasets. Anyway, they had this teeny-tiny little problem that they weren't handling. You see, when several users used their rich client at the same time, every once in a while, a DbConcurrencyException popped up. So, the thing was that the system needed to support quite a few concurrent users for the next release, and they were worried that the fact that they just ignored (read, caught and didn't handle) the above exception would wreck their data integrity.

    Well, the clincher was that because the call to DataAdapter.Update would fail if they didn't insert rows in the correct order, update in (a different) correct order, and delete in (yet another different) correct order, that they decided to not use any foreign keys in the database.
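
    For readers who hit the same ordering problem: one common workaround is to hand the pending rows to the adapter in separate batches, deletes child-first and inserts/updates parent-first. The sketch below only selects and counts the batches on in-memory tables with made-up names; with a real adapter each array would go to its Update(DataRow[]) overload. It is an assumption about how the project could have been fixed, not a description of what that team did.

    ```csharp
    using System;
    using System.Data;

    public static class UpdateOrderDemo
    {
        // Pending rows of one kind; this is what adapter.Update(DataRow[]) would receive.
        public static DataRow[] Pending(DataTable table, DataViewRowState states)
        {
            return table.Select(null, null, states);
        }

        // Hypothetical parent/child pair with one pending child delete,
        // one pending parent insert and one pending child insert.
        public static DataTable[] BuildTables()
        {
            DataTable orders = new DataTable("Order");
            orders.Columns.Add("Id", typeof(int));
            DataTable lines = new DataTable("OrderLine");
            lines.Columns.Add("Id", typeof(int));
            lines.Columns.Add("OrderId", typeof(int));

            orders.Rows.Add(1);
            lines.Rows.Add(10, 1);
            orders.AcceptChanges();
            lines.AcceptChanges();

            lines.Rows[0].Delete();   // pending child delete
            orders.Rows.Add(2);       // pending parent insert
            lines.Rows.Add(11, 2);    // pending child insert
            return new DataTable[] { orders, lines };
        }

        public static void Main()
        {
            DataTable[] tables = BuildTables();
            DataTable orders = tables[0], lines = tables[1];
            DataViewRowState upserts = DataViewRowState.Added | DataViewRowState.ModifiedCurrent;

            // Deletes go child-first, inserts and updates go parent-first.
            Console.WriteLine("1. OrderLine deletes: " + Pending(lines, DataViewRowState.Deleted).Length);
            Console.WriteLine("2. Order deletes: " + Pending(orders, DataViewRowState.Deleted).Length);
            Console.WriteLine("3. Order inserts/updates: " + Pending(orders, upserts).Length);
            Console.WriteLine("4. OrderLine inserts/updates: " + Pending(lines, upserts).Length);
        }
    }
    ```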

    I could go on and on, but I guess that my point here is that because the use of datasets hid so much of the mechanics of what went on down there, the developers had dug themselves into a hole they couldn't dig their way out of.

    Bottom line, I do NOT suggest beginner developers use datasets because they won't know how (and why) to tweak the default behaviors to handle all the weird and wonderful cases that occur in the real world.

    For the experienced developers out there, you could make any way work, so it's more a matter of taste than anything. Personally, I prefer the taste of a lean mean entity to the bloated dataset.

  • Barry Gervin August 24, 2004 8:42 AM

    I guess you don't use strings either - just char arrays right? Better yet, why not bit arrays for everything.

    There are good technical reasons for and against any technical architecture. The above is not one of them. Otherwise, if we took this advice, we'd all be programming in assembler language.

    See the law of leaky abstractions (http://www.joelonsoftware.com/articles/LeakyAbstractions.html).

  • Barry Gervin August 27, 2004 1:19 AM

    I like a combination of all methods mentioned so far. I build DataSets and DataTables for my Cached data, and use custom DataManager objects for "doing stuff" to my data.

    When a user requests a list(s) of his Stuff, I dump it into a DataSet/Table, and then manage it through a custom StuffManager class.

    Make sense?

  • Barry Gervin September 8, 2004 8:08 AM

    Barry,

    The law of leaky abstractions is exactly what I'm talking about. However, your comments don't speak to my point. Don't use abstractions you don't understand - when developers don't understand the issues surrounding concurrency, then using an abstraction like datasets may leave them in the dark when a DbConcurrencyException pops up.

  • Barry Gervin September 9, 2004 5:54 AM

    Lots of good discussions about custom entities and datasets. Can anyone comment on using XML to "pass data between tiers", esp. that SQLXML is evolving?

  • Barry Gervin September 13, 2004 1:25 AM

    When you say "tiers" are you referring to logical layers or to actual deployment tiers?

  • Barry Gervin September 13, 2004 7:51 AM

    I meant logical layers.

  • Barry Gervin September 14, 2004 1:57 AM

    Then why use plain xml (which is in essence the lowest common denominator) when you can use a much richer paradigm? If you're talking about passing data between deployment tiers, that's a different story. But, since the issue is moving data between object A that's in memory to object B that's also in memory, both running on the same platform and in the same process, possibly even in the same AppDomain, I see no advantage to xml besides possibly to ease the move to a distributed deployment model. But in that case, there are so many other critical issues to take care of that the fact that you're passing around xml will have a negligible impact.

  • Barry Gervin October 13, 2004 5:06 PM

    Fantastic debate..... I would like to pose a question for all of you but first some background on my working environment. I am part of a group of .NET architects at a large (very very large) company of about 80k users so perf and scale are immensely important. We are in the midst of building a framework that has the notion of a business entity. A business entity is intended to be the "over the wire" structure so footprint is very important. Today our framework doesn't dictate what the BE is. In other words it could be a datagram or a custom object collection (regardless of 1 or 1+) or a dataset (typed or not). Now all of these "work" for forms clients or web clients with varying degrees of work. Datasets are easy to use for all the reasons you guys have articulated but their footprint is large, which means the over the wire size is bigger and serialization takes more time than with a custom object collection. The downside to the custom object collection is binding. Binding can be done but it's not "free" as it is with a DS. Furthermore, a DS holds state so updates are easy. We use the following approach to determine how a custom object needs to be persisted. If identity (id field) is not present it is an insert; if id is present and the delete bit (inherited from BE base) is not set it is an update. If id is present and the delete bit is set it is a delete. Optimistic locking is done using datetime stamps carried on the object.
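
    The persistence rule just described is small enough to sketch. The class and member names below (BusinessEntity, IsDeleted, LastUpdated, GetPersistAction) are invented for illustration; the comment only says that the delete bit is inherited from a BE base and that a datetime stamp is carried on the object.

    ```csharp
    using System;

    public enum PersistAction { Insert, Update, Delete }

    // Hypothetical base class for business entities.
    public abstract class BusinessEntity
    {
        public int? Id;                  // null means no identity assigned yet
        public bool IsDeleted;           // the "delete bit"
        public DateTime LastUpdated;     // datetime stamp used for optimistic locking

        public PersistAction GetPersistAction()
        {
            if (Id == null) return PersistAction.Insert;
            return IsDeleted ? PersistAction.Delete : PersistAction.Update;
        }
    }

    public sealed class Customer : BusinessEntity
    {
        public string Name;
    }

    public static class PersistDemo
    {
        public static void Main()
        {
            Customer fresh = new Customer { Name = "New" };
            Customer existing = new Customer { Id = 7, Name = "Existing" };
            Customer doomed = new Customer { Id = 8, Name = "Gone", IsDeleted = true };

            Console.WriteLine(fresh.GetPersistAction());     // Insert
            Console.WriteLine(existing.GetPersistAction());  // Update
            Console.WriteLine(doomed.GetPersistAction());    // Delete
        }
    }
    ```

    Two things the rule as stated leaves open: an entity with an id and no delete bit is always treated as an update, whether or not anything changed, and an entity with no id but the delete bit set would still be inserted.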

    Now onto our other problem. What to do about SOA. Our basic approach is our middle tier is exposed via web services such that we can serve as many clients as possible in a loosely coupled way. If your goal is true SOA what type of Business entity do you use? Well datasets don't make sense unless you are in a pure MSFT environment, which we are not. Custom objects work great if you are ok with losing your behaviour which really means you are using datagrams not rich objects. We have also successfully prototyped co-locating the custom object assemblies on the middle tier and client at the expense of loose coupling in favor of truly rich objects (data and behaviour) by dinking with the proxy class. To date we still struggle with what to recommend to our developers. Bottom line is it depends. If you are really SOA, datasets are out because you cannot assume anything about your consumer except that they can consume a SOAP message. I tend to lean towards the custom object approach and the persistence scheme mentioned above.

    So my question to all of you is... If you are doing SOA, and I assume some of you are doing it in a heterogeneous vendor environment, what are you using as your over the wire structure? And why?

  • Barry Gervin October 16, 2004 4:19 PM

    It makes sense (almost by definition) that your BE's are over the wire.....but no matter what your BE implementation you can have multiple formats for "over the wire".

    What exactly do you mean by "Over the wire"? Remoting? DCOM (via COM+)? SOAP?

    With Soap we are likely looking at an XML format. Custom classes will serialize naturally and pretty straightforwardly. So will Datasets, but you need to think about what support you require from the entity. Do you need diffgram functionality? If so, you are likely going to want a dataset (or something pretty close). If not, then certainly a dataset is going to be a bloated format....unless you do something special.

    When you say the dataset has a big footprint - you probably mean the wire format - the diffgram XML serialization. The internal representation is actually pretty good for the functionality and you'd likely not implement concurrency support and null value support within your own custom entity format any smaller.

    It is possible to GetXml and strip the diffgram fluff and end up with an identical lightweight XML document that you'd get out of an XML serialized custom class. If your system was the host, you'd be doing this transformation in the Service Interface layer, and if you were in the client tier you'd reconstitute a dataset (if you wanted) in the service agent.
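
    To see the difference between the two wire formats, here is a small sketch that serializes the same edited DataSet both ways. The "Orders"/"Order" names and columns are made up; GetXml and WriteXml with XmlWriteMode.DiffGram are the standard DataSet members.

    ```csharp
    using System;
    using System.Data;
    using System.IO;

    public static class WireFormatDemo
    {
        // A one-row DataSet with a single pending modification.
        public static DataSet BuildEditedDataSet()
        {
            DataSet ds = new DataSet("Orders");
            DataTable t = ds.Tables.Add("Order");
            t.Columns.Add("Id", typeof(int));
            t.Columns.Add("Total", typeof(decimal));
            t.Rows.Add(1, 10m);
            ds.AcceptChanges();
            t.Rows[0]["Total"] = 12m;
            return ds;
        }

        public static string ToDiffGram(DataSet ds)
        {
            StringWriter writer = new StringWriter();
            ds.WriteXml(writer, XmlWriteMode.DiffGram);
            return writer.ToString();
        }

        public static void Main()
        {
            DataSet ds = BuildEditedDataSet();
            string plain = ds.GetXml();     // current values only
            string diff = ToDiffGram(ds);   // current values plus original values and row state

            Console.WriteLine(plain);
            Console.WriteLine("plain: " + plain.Length + " chars, diffgram: " + diff.Length + " chars");
        }
    }
    ```

    The plain document is what a Service Interface layer could hand to a non-.NET consumer; the diffgram is what a service agent would need if it wants to rebuild a DataSet with row state intact.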

    You should NEVER - and I mean NEVER expose business methods (even for just returning entities) out of your system/service without wrapping them in a service interface. Likewise, you should never talk to an external service without wrapping the calls in a service agent (similar notion as a data access class).

    So the point I'm getting at - is that I don't think the wire format footprint is a key factor in choosing how you structure your business entities....if your systems talk to each other through layers - which is what you should be doing to maximize your investment and help mitigate future interop risks.

    If you do thing this way - you can provide multiple service interfaces for different systems that want to talk to you - you might have a SOAP/.NET client that would love your dataset - you might have a Java client that really wants to have another format. You might have an external system that uses a different type of dataset and you might need to XSLT it into your format so with a careful plan for the future you can support anybody.

  • Barry Gervin November 12, 2004 10:51 AM

    Agree with the sentiment that this was a great thread.

    I was pretty hopeful that .NET 2.0 would make datasets more attractive via partial classes. Unfortunately, while both the DataSet and the DataTable classes support partials, the DataRow class does not. What bothers me about this is that it does not eliminate the oddities associated with dealing with ONE record. The best way I can think of to deal with this is still to adapt the DataRow in a faux-entity class in the manner Andres suggested in a post earlier this year. I've been doing something similar in my own work. Does anyone have specific knowledge of why MS didn't make the typed DataRow class extensible via partial classes?

    Personally I like the idea of using an ORM tool for the efficiencies gained (i.e. less data to pass around), but I also primarily work with WinForms apps, and DataSets are just extremely nice in that environment.

    Anyway, great thread.

  • Barry Gervin December 16, 2004 6:02 AM

    1. Why do so many people mention only the convenience of data binding and ignore the business logic?
    We also need business logic. Where does it go? Do we put it in the GUI layer when we use a DataSet?
    2. Concurrency management
    There is no easy way to implement concurrency management in a DataSet.
    Using custom entities, we get a clear software architecture in which we know the data access layer, so we can implement and unit test all the custom entities (let me call them domain objects :)) and the business logic together. With a DataSet, more layers get involved in unit testing.
    3. Refactoring is harder when we use a DataSet.

  • Barry Gervin December 20, 2004 4:48 PM

    It is important to have a place for business logic. Other than internal validation, an entity (custom or dataset) is not the best spot for complex business logic. Perhaps you are confusing entities with business objects. Entities, in the speak going around here, are merely data crucibles that can be serialized across tiers. So you may have the same entity living on several tiers of a solution. When I use datasets for entities I always wrap them up in another class so that I can extend them to do that kind of internal validation. Some people modify their strongly typed datasets, but then every time you modify the schema you have work to redo. With a composition approach you can also inherit extra functionality from your wrapping component's base class. I still expose the IListSource interface of the dataset through to the surface of the wrapping entity to keep the rich binding.

    Concurrency management in DataSets is natively supported by the combination of the dataset and the various DataAdapters. The dataset maintains original values and proposed changes and maps those to appropriate pairs of parameters in the DataAdapter. It takes about 2 seconds to review your SQL to see this happen. On a custom entity? Much more work to do.
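    As a rough sketch of what "original values and proposed changes" looks like in code: the table, column names and UPDATE statement below are hypothetical, and no database connection is opened. The command is only built to show how a parameter is pointed at the original version of a column.

    ```vbnet
    Imports System
    Imports System.Data
    Imports System.Data.SqlClient

    Module ConcurrencySketch
        Sub Main()
            Dim table As New DataTable("Customer")
            table.Columns.Add("Id", GetType(Integer))
            table.Columns.Add("Name", GetType(String))
            table.Rows.Add(1, "Contoso")
            table.AcceptChanges()   ' Simulates the state right after a Fill.

            Dim row As DataRow = table.Rows(0)
            row("Name") = "Contoso Ltd"

            ' The row now carries a row state and both versions of the value.
            Console.WriteLine(row.RowState)                          ' Modified
            Console.WriteLine(row("Name", DataRowVersion.Original))  ' Contoso
            Console.WriteLine(row("Name", DataRowVersion.Current))   ' Contoso Ltd

            ' Hypothetical UPDATE that only succeeds if the record is unchanged
            ' in the database since it was read.
            Dim update As New SqlCommand( _
                "UPDATE Customer SET Name = @Name " & _
                "WHERE Id = @Id AND Name = @OriginalName")
            update.Parameters.Add("@Name", SqlDbType.NVarChar, 50, "Name")
            update.Parameters.Add("@Id", SqlDbType.Int, 4, "Id")
            Dim original As SqlParameter = _
                update.Parameters.Add("@OriginalName", SqlDbType.NVarChar, 50, "Name")
            original.SourceVersion = DataRowVersion.Original
        End Sub
    End Module
    ```

    Assigned to a DataAdapter's UpdateCommand, a command like this lets the adapter fill `@Name` from the current value and `@OriginalName` from the original one for each modified row.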

  • TrackBack January 5, 2005 10:05 AM

  • Barry Gervin January 26, 2005 2:18 AM

    IMO partial classes will promote bad trends and create the potential for disastrous ad hoc business layer implementations.

    The business logic problem is really another issue altogether, as a well-defined architecture will simply keep logic independent of data, and distinguish between the logic, the data itself, and the means of accessing and propagating data. Often, in cases dealing with multiple third-party systems, data will undergo extensive validation and transformation, where data may be, e.g., loaded into a DataSet and then written to a binary file, used to generate a PDF, or sent back to the source db/document/service as an update. In such a case, a controller would manage the flow of inbound, business logic (as filters), and outbound operations.

    Then it really just makes sense to use a dataset where you require the free functionality it provides (hundreds of man hours worth), and a custom entity where you need to do your own thing. That behavior can be defined by business logic managed within controllers.

    If it is a matter of encapsulation, then it comes down to a question of establishing data structures. Class definition is meant to define programming structures, and is not appropriate as a data model. While Customer.Id is quite a convenient way to access the data in the BL, defining Id as a primary key is problematic, much like null allowance. As a result, data decay will be inversely proportional to the amount of time spent designing the custom entities; many dimensions of the data will not make it to the BL unless specifically implemented. Databases and XML documents are proper data architectures, and DataSet implements both very accurately with a few clicks and the design of a potentially simplistic super class for the model.

    This way, in the end you will have one layer of code automatically generated and regeneratable without partial implementations, and one layer of code which will be reusable.
    All that is left to define is the flow process logic between both, which could potentially be softwired instead of coded and compiled. Markets change quickly, and sending a business class back to the drawing table can become a costly decision.

    Some will raise the performance issues of this approach, but if you want to flatten your implementation on the assumption it will gain speed, then consider putting the BL in stored procedures, and execute on views. Forget about web services.

  • TrackBack February 15, 2005 5:23 AM

  • TrackBack March 18, 2005 2:11 PM

  • TrackBack March 22, 2005 12:00 PM

    You’ve probably heard a lot of talk about why you should use Business Entities rather...

  • TrackBack March 23, 2005 3:40 PM

    You’ve probably heard a lot of talk about why you should
    use Business Entities rather than...

  • TrackBack March 28, 2005 1:07 PM

    It's just a link, but it describes pros and cons of using DataSets instead of custom entities for transmission...

  • Barry Gervin April 13, 2005 4:14 PM

    I realize this is a late post, but the way I see it, you get the same or similar functionality between DataSets and custom classes (albeit you have more work to do for custom classes, but you also get more control).

    But for me, the biggest difference is performance/scalability in a distributed architecture. According to a paper published by Microsoft (at http://msdn.microsoft.com/library/en-us/dnbda/html/bdadotnetarch14.asp), there is a very noticeable performance/scale difference between using a DataReader to return a custom class, and filling/returning a DataSet.

    In short, the test results show that the custom class is significantly faster and scales better.


  • Barry Gervin August 4, 2005 10:51 PM

    Jeff,
    I would like to clarify something. The paper published by Microsoft compares Web Services versus .NET Remoting as cross-process communication choices.
    It does not mean a custom class can be filled faster (using a DataReader) than a DataSet. In fact, a DataSet is filled using a DataReader as well.
    What the paper compares is the performance of passing custom objects and datasets between machines.
    As is well documented, DataSets have a weakness when passed as serialized binary (that is, via .NET Remoting): their binary serialization is just as big as their XML serialization, which is why they perform poorly compared to custom classes (again, when passed as serialized binary with .NET Remoting). This problem has been fixed in the .NET Framework 2.0. In the meantime, there is a technique using DataSet surrogates to work around this weakness.
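    For reference, the 2.0 fix is exposed as the `RemotingFormat` property on DataSet. A minimal sketch for comparing the two settings follows, assuming the .NET Framework 2.0 or later (BinaryFormatter is deprecated in current .NET, so this is for the Remoting-era scenario discussed here). The table contents are made up, and no sizes are claimed; run it to see the numbers for your own data.

    ```vbnet
    Imports System
    Imports System.Data
    Imports System.IO
    Imports System.Runtime.Serialization.Formatters.Binary

    Module RemotingFormatSketch
        ' Serializes the dataset the way Remoting's binary channel would
        ' and returns the resulting size in bytes.
        Function SerializedSize(ByVal ds As DataSet) As Long
            Dim formatter As New BinaryFormatter()
            Using stream As New MemoryStream()
                formatter.Serialize(stream, ds)
                Return stream.Length
            End Using
        End Function

        Sub Main()
            ' Hypothetical sample data.
            Dim ds As New DataSet("Customers")
            Dim table As DataTable = ds.Tables.Add("Customer")
            table.Columns.Add("Id", GetType(Integer))
            table.Columns.Add("Name", GetType(String))
            For i As Integer = 1 To 1000
                table.Rows.Add(i, "Customer " & i)
            Next

            ' Default behaviour: XML payload inside the binary stream.
            ds.RemotingFormat = SerializationFormat.Xml
            Console.WriteLine("Xml format:    " & SerializedSize(ds) & " bytes")

            ' .NET 2.0 option: true binary serialization.
            ds.RemotingFormat = SerializationFormat.Binary
            Console.WriteLine("Binary format: " & SerializedSize(ds) & " bytes")
        End Sub
    End Module
    ```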

  • TrackBack November 13, 2005 5:27 PM

    Well - here it is - Objects vs DataSets - Especially in a situation where you
    are not accessing...

  • Barry Gervin February 13, 2006 6:46 PM

    This is an excellent thread. In one of our projects there is a similar tug of war between custom classes and datasets. This project has complex hierarchical information. For example, we have an entity A that can be used as a child of entity B but can also exist on its own. The problem comes with typed datasets when I try to use datatable A in B as well as in another datatable C. It throws an exception at compile time for the use of datatable A. In short, I will not be able to re-use datatable A in any dataset other than B. Is there a resolution for this in typed datasets? Custom classes will allow me this re-usability.
    Also, the entities are extracting information from multiple tables in the database in different hierarchies, so I am unable to use the Fill method as well.

  • Barry Gervin May 2, 2006 3:06 PM

    I have been an enterprise developer for several years, and have used C# and VB.NET for most of this time. I have designed and coded numerous systems in which custom business objects REALLY made sense. Within those classes, I often used DataSets, DataAdapters, and CommandBuilders to generate CRUD logic for my objects. Each object had Create, Update, and Delete methods and had (as a subclass) a List class that was a typed collection of objects of each type.

    The problem that I see today is that many 'new' developers are able to develop apps using DataSets (typed or untyped), with typed datasets as their 'objects', but many of them don't truly understand the underlying OO principles that are required to build custom business entities. It's almost as if Visual Studio has become TOO quick and easy! That's just my opinion. I am an old-school coder who demands complete control of my objects! I don't resist progress, nor do I see developers as becoming 'lazy'.

  • Barry Gervin May 3, 2006 11:52 AM

    I've really struggled in choosing between creating a custom domain entity and just using DataSets. This blog has really helped me understand the pros and cons, but even so, going forward I don't know the answer.

    DataSets just feel yucky, but the results are really quick to see. Custom entity classes (which I've used on my Cyprus Villas website) took a lot of coding (I did it by hand).

    Overall, for smaller developments (websites) I think it's hard to beat a typed DataSet.

  • Barry Gervin May 21, 2006 9:53 AM

    There are many example projects that use custom entities. Is there a good example out there that uses typed datasets?

  • Barry Gervin June 13, 2006 3:41 AM

    Our data architecture just got modified from autogenerated classes via CodeSmith to strongly typed datasets. Unfortunately, our SQL Server 2000 legacy database has tables with fields that default to Null and we can't change it because we don't know how the legacy systems will handle it.

    Null values are becoming an incredible pain to work with through datasets. Null Integers throw an exception if you try to access them, which is just unacceptable. It doesn't even matter if you test for DBNull first; it still raises an exception, and you can't change it to anything else via the data designer. The same problems also occur for dates. I've patched some of the problems using partial classes (especially getting the fill to work in the first place), but it is a real bother. Is anyone else having problems relating to Null values (especially Int and Date)? I would really appreciate some feedback. I am really cursing MS for not handling nulls properly for me in the code the data designer generates.

  • Barry Gervin June 13, 2006 4:05 AM

    Continuing from my post above.....
    Here is an example of the Dataset Partial Class work-around we are using that may help.

    Partial Class CustomerInfo
        Partial Public Class CustomerDataTable
            Inherits System.Data.DataTable
            Implements System.Collections.IEnumerable

            ' Relax the column constraints before initialization completes.
            Public Overrides Sub EndInit()
                AllowDBNulls()
                MyBase.EndInit()
            End Sub

            ' Allow nulls on every column and give each one a type-based default.
            Public Sub AllowDBNulls()
                Dim dataTypeString As String = ""

                ' Loop through the defined columns.
                For Each dataCol As System.Data.DataColumn In MyBase.Columns
                    ' Change each column field to allow nulls from the database.
                    dataCol.AllowDBNull = True
                    dataTypeString = dataCol.DataType.ToString()
                    dataCol.DefaultValue = SetDefaultForType(dataTypeString)
                Next
            End Sub

            ' Replace every DBNull in the filled table with a type-based default.
            Public Sub ChangeDBNulls()
                Dim dataTypeString As String = ""

                ' Loop through each record (does nothing if the table is empty).
                For Each dataRow As System.Data.DataRow In Me.Rows
                    ' Loop through each field in the record.
                    For i As Integer = 0 To Me.Columns.Count - 1
                        ' Does the field have a value of null?
                        If IsDBNull(dataRow.Item(i)) Then
                            dataTypeString = MyBase.Columns.Item(i).DataType.ToString()
                            dataRow.Item(i) = SetDefaultForType(dataTypeString)
                        End If
                    Next
                Next
            End Sub
        End Class
    End Class


    Module Common

        ' Returns the stand-in value used in place of DBNull for a given type name.
        Public Function SetDefaultForType(ByVal DataTypeString As String) As Object
            Debug.WriteLine(DataTypeString)

            If DataTypeString = "System.String" _
            OrElse DataTypeString = "System.Char" Then ' Is the field character based?
                Return ""

            ElseIf InStr(DataTypeString, "System.Int") > 0 _
            OrElse InStr(DataTypeString, "System.UInt") > 0 Then ' Is the field integer based (Int16/32/64, UInt16/32/64)?
                Return 0

            ElseIf DataTypeString = "System.Decimal" _
            OrElse DataTypeString = "System.Double" Then ' Is the field float based?
                Return 0D

            ElseIf DataTypeString = "System.DateTime" Then ' Is the field a date?
                Return New System.DateTime ' DateTime.MinValue

            ElseIf DataTypeString = "System.Boolean" Then ' Is the field a Boolean?
                Return False

            Else
                Return Nothing
            End If

        End Function

    End Module

  • Barry Gervin June 13, 2006 4:13 AM

    I left out.....
    That we are calling ChangeDBNulls on the dataset after the dataset is filled, and it goes through the fields and changes them to a default value. This is obviously a real kludge, but what we need to do to go forward. Maybe I'm missing something somewhere.

    customersTable = customersAdapter.GetData(CustomerNo)
    customersTable.ChangeDBNulls()

    The problem with dates is further complicated by the use of SQL smalldatetime on our tables. A new work-around using max or min dates to represent null is a real pain too. HELP!!!

    It might be obvious, but we are using stored procs to generate the dataset by the way, using aliases on the select to get some nice properties on our datatables.
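    An alternative to overwriting nulls with defaults is to check for null through the row itself before reading the value. The sketch below uses an untyped DataTable with a hypothetical `Age` column so that it is self-contained; typed rows generated by the designer expose equivalent per-column methods of the form `Is<Column>Null()`, which have to be called before touching the typed property. `Nullable(Of Integer)` assumes .NET 2.0 or later.

    ```vbnet
    Imports System
    Imports System.Data

    Module NullCheckSketch
        ' Returns an empty Nullable when the column holds DBNull.
        Function ReadNullableInt(ByVal row As DataRow, ByVal columnName As String) As Nullable(Of Integer)
            If row.IsNull(columnName) Then
                Return Nothing
            End If
            Return CInt(row(columnName))
        End Function

        Sub Main()
            ' Hypothetical table with one null and one non-null value.
            Dim table As New DataTable("Customer")
            table.Columns.Add("Id", GetType(Integer))
            table.Columns.Add("Age", GetType(Integer))
            table.Rows.Add(1, DBNull.Value)
            table.Rows.Add(2, 42)

            For Each row As DataRow In table.Rows
                Dim age As Nullable(Of Integer) = ReadNullableInt(row, "Age")
                If age.HasValue Then
                    Console.WriteLine("Id " & CInt(row("Id")) & ": age " & age.Value)
                Else
                    Console.WriteLine("Id " & CInt(row("Id")) & ": age unknown")
                End If
            Next
        End Sub
    End Module
    ```

    This keeps "no value" distinct from 0 or a minimum date, at the cost of carrying a nullable type through the calling code.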

  • Barry Gervin July 24, 2006 9:12 PM

    This is a great debate. I really like reading about this stuff. In the end, each has its own pros and cons. A good architect will know how to make a decision on it.

    I will explain how I make the decision to use Datasets vs. Custom Entities

    When to use datasets:
    Use when it is a database driven application. Period =]

    When to use custom entities:
    Use when it is NOT a database driven application. For example, user controls like Infragistics, network applications like xCeed FTP, any software that will be resold or reused by others external to your organization.

    p.s. Somebody who is a guru of SQL is more valuable than somebody who is a guru of C#.

    p.p.s. Nobody will ever convince me that custom entities are faster than well written, optimized SQL queries!

  • Greg Finzer January 1, 2007 2:55 PM

    I recently wrote an article on the advantages of business objects over datasets.  Please visit and let me know what you think.

    http://www.kellermansoftware.com/t-articlebusinessobjects.aspx

  • Roping March 27, 2007 10:20 PM

    I do not agree with you!

    DataSet is not objein_object all!

  • Roping March 27, 2007 10:21 PM

    DataSet is very ugly!

  • 2Guys' Blog August 7, 2007 8:42 AM

    TBH-Chapter3:Planning an Architecture
