The Entity Framework vs. The Data Access Layer (Part 1: The EF as a DAL)

In Part 0: Introduction of this series after asking the question "Does the Entity Framework replace the need for a Data Access Layer?", I waxed lengthy about the qualities of a good data access layer. Since that time I've received a quite a few emails with people interested in this topic. So without further adieu, let's get down to the question at hand.

So let's say you go ahead and create an Entity Definition model (*.edmx) in Visual Studio and have the designer generate for you a derived ObjectContext class and an entity class for each of your tables, derived from EntityObject. This one to one table mapping to entity class is quite similar to LINQ to SQL but the mapping capabilities move well beyond this to support advanced data models. This is at the heart of why the EF exists: Complex Types, Inheritance (Table per Type, Table per Inheritance Hierarchy), Multiple Entity Sets per Type, Single Entity Mapped to Two Tables, Entity Sets mapped to Stored Procedures or mapping to a hand-crafted query, expressed as either SQL or Entity SQL. EF has a good story for a conceptual model over top of our physical databases using Xml Magic in the form of the edmx file - and that's why it exists.

So to use the Entity Framework as your data access layer, define your model and then let the EdmGen.exe tool do it's thing to the edmx file at compile time and we get the csdl, ssdl, and msl files - plus the all important code generated entity classes. So using this pattern of usage for the Entity Framework, our data access layer is complete. It may not be the best option for you, so let's explore the qualities of this solution.

To be clear, the assumption here is that our data access layer in this situation is the full EF Stack: ADO.NET Entity Client, ADO.NET Object Services, LINQ to Entities, including our model (edmx, csdl, ssdl, msl) and the code generated entities and object context. Somewhere under the covers there is also the ADO.NET Provider (SqlClient, OracleClient, etc.)

image

To use the EF as our DAL, we would simply execute code similar to this in our business layer.

var db = new AdventureWorksEntities();
var activeCategories = from category in db.ProductCategory
                 where category.Inactive != true
                 orderby
category.Name
                 select category;

How Do "EF" Entities Fit In?

If you're following along, you're probably asking exactly where is this query code above being placed. For the purposes of our discussion, "business layer" could mean a business object or some sort of controller. The point to be made here is that we need to think of Entities as something entirely different from our Business Objects.

Entity != Business Object

In this model, it is up to the business object to ask the Data Access Layer to project entities, not business objects, but entities.

This is one design pattern for data access, but it is not the only one. A conventional business object that contains its own data, and does not separate that out into an entity can suffer from tight bi-directional coupling between the business and data access layer. Consider a Customer business object with a Load method. Customer.Load() would in turn instantiate a data access component, CustomerDac and call the CustomerDac's Load or Fill method. To encapsulate all the data access code to populate a customer business object, the CustomerDac.Load method would require knowledge of the structure the Customer business object and hence a circular dependency would ensue.

The workaround, if you can call it that, is to put the business layer and the data access layer in the same assembly - but there goes decoupling, unit testing and separation of concerns out the window.

Another approach is to invert the dependency. The business layer would contain data access interfaces only, and the data access layer would implement those interfaces, and hence have a reverse dependency on the business layer. Concrete data access objects are instantiated via a factory, often combined with configuration information used by an Inversion of Control container. Unfortunately, this is not all that easy to do with the EF generated ObjectContext & Entities.

Or, you do as the Entity Framework implies and separate entities from your business objects. If you've used typed DataSets in the past, this will seem familiar you to you. Substitute ObjectContext for SqlConnection and SqlDataAdapter, and the pattern is pretty much the same.

Your UI presentation layer is likely going to bind to your Entity classes as well. This is an important consideration. The generated Entity classes are partial classes and can be extended with your own code. The generated properties (columns) on an entity also have event handlers created for changing and changed events so you can also wire those up to perform some column level validation. Notwithstanding, you may want to limit your entity customizations to simple validation and keep the serious business logic in your business objects. One of these days, I'll do another blog series on handing data validation within the Entity Framework.

How does this solution stack up?

How are database connections managed?

thumbs up Using the Entity Framework natively itself, the ObjectContext takes care of opening & closing connections for you - as needed when queries are executed, and during a call to SaveChanges. You can get access to the native ADO.NET connection if need be to share a connection with other non-EF data access logic. The nice thing however is that, for the most part, connection strings and connection management are abstracted away from the developer.

thumbs down A word of caution however. Because the ObjectContext will create a native connection, you should not wait to let the garbage collector free that connection up, but rather ensure that you dispose of the ObjectContext either explicitly or with a using statement.

Are all SQL Queries centralized in the Data Access Layer?

thumbs down By default the Entity Framework dynamically generates store specific SQL on the fly and therefore, the queries are not statically located in any one central location. Even to understand the possible queries, you'd have to walk through all of your business code that hits the entity framework to understand all of the potential queries.

But why would you care? If you have to ask that question, then you don't care. But if you're a DBA, charged with the job of optimizing queries, making sure that your tables have the appropriate indices, then you want to go to one central place to see all these queries and tune them if necessary. If you care strongly enough about this, and you have the potential of other applications (perhaps written in other platforms), then you likely have already locked down the database so the only access is via Stored Procedures and hence the problem is already solved.

Let's remind ourselves that sprocs are not innately faster than dynamic SQL, however they are easier to tune and you also have the freedom of using T-SQL and temp tables to do some pre-processing of data prior to projecting results - which sometimes can be the fastest way to generate some complex results. More importantly, you can revoke all permissions to the underlying tables and only grant access to the data via Stored Procedures. Locking down a database with stored procedures is almost a necessity if your database is oriented as a service, acting as an integration layer between multiple client applications. If you have multiple applications hitting the same database, and you don't use stored procedures - you likely have bigger problems. 

In the end, this is not an insurmountable problem. If you are already using Stored Procedures, then by all means you can map those in your EDM. This seems like the best approach, but you could also embed SQL Server (or other provider) queries in your SSDL using a DefiningQuery.

Do changes in one part of the system affect others?

It's difficult to answer this question without talking about the possible changes.

thumbs up Schema Changes: The conceptual model and the mapping flexibility, even under complex scenarios is a strength of the entity framework. Compared to other technologies on the market, with the EF, your chances are as good as they're going to get that a change in the database schema will have minimal impact on your entity model, and vice versa.

thumbs up Database Provider Changes: The Entity Framework is database agnostic. It's provider model allows for easily changing from SQL Server, to Oracle, to My Sql, etc. via connection strings. This is very helpful for ISVs whose product must support running on multiple back-end databases.

thumbs down Persistence Ignorance: What if the change you want in one part of the system is to change your ORM technology? Maybe you don't want to persist to a database, but instead call a CRUD web service. In this pure model, you won't be happy. Both your Entities and your DataContext object inherit from base classes in the Entity Framework's System.Data.Objects namespace. By making references to these, littered throughout your business layer, decoupling yourself from the Entity Framework will not be an easy task.

thumbs down Unit Testing: This is only loosely related to the question, but you can't talk about PI without talking about Unit Testing. Because the generated entities do not support the use of Plain Old CLR Objects (POCO), this data access model is not easily mocked for unit testing.

Does the DAL simplify data access?

thumbs up Dramatically. Compared to classic ADO.NET, LINQ queries can be used for typed results & parameters, complete with intelli-sense against your conceptual model, with no worries about SQL injection attacks.

thumbs up As a bonus, what you do get is query composition across your domain model. Usually version 1.0 of a convention non-ORM data access layer provides components for each entity, each supporting crud behaviour. Consider a scenario where you need to show all of the Customers within a territory, and then you need to show the last 10 orders for each Customer. Now I'm not saying you'd do this, but what I've commonly seen is that while somebody might write a CustomerDac.GetCustomersByTerritory() method, and they might write an OrderDac.GetLastTenOrders(), they would almost never write a OrderDac.GetLastTenOrdersForCustomersInTerritory() method. Instead they would simply iterate over the collection of customers found by territory and call the GetLastTenOrders() over and over again. Obviously this is "good" resuse of the data access logic, however it does not perform very well.

Fortunately, through query composition and eager loading, we can cause the Entity Framework (or even LINQ to SQL) to use a nested subquery to bring back the last 10 orders for each customer in a given territory in a single round trip, single query. Wow! In a conventional data access layer you could, and should write a new method to do the same, but by writing yet another query on the order table, you'd be repeating the mapping between the table and your objects each time.

Layers, Schmayers: What about tiers?

thumbs down EDM generated entity classes are not very tier-friendly. The state of an entity, whether it is modified, new, or to be delete, and what columns have changed is managed by the ObjectContext. Once you take an entity and serialize it out of process to another tier, it is no longer tracked for updates. While you can re-attach an entity that was serialized back into the data access tier, because the entity itself does not serialize it's changed state (aka diff gram), you can not easily achieve full round trip updating in a distributed system. There are techniques for dealing with this, but it is going to add some plumbing code between the business logic and the EF...and make you wish you had a real data access layer, or something like Danny Simmons' EntityBag (or a DataSet).

Does the Data Access Layer support optimistic concurrency?

thumbs up Out of the box, yes, handily. Thanks to the ObjectContext tracking state, and the change tracking events injected into our code generated entity properties. However, keep in mind the caveat with distributed systems that you'll have more work to do if your UI is separated from your data access layer by one or more tiers.

How does the Data Access Layer support transactions?

thumbs up Because the Entity Framework builds on top of ADO.NET providers, transaction management doesn't change very much. A single call to ObjectContext.SaveChanges() will open a connection, perform all inserts, updates, and deletes across all entities that have changed, across all relationships and all in the correct order....and as you can imagine in a single transaction. To make transactions more granular than that, call SaveChanges more frequently or have multiple ObjectContext instances for each unit of work in progress. To broaden the scope of a transaction, you can manually enlist using a native ADO.NET provider transaction or by using System.Transactions.