The Entity Framework vs. The Data Access Layer (Part 0: Introduction)

So the million dollar question is: Does the Entity Framework replace the need for a Data Access Layer? If not, what should my Data Access Layer look like if I want to take advantage of the Entity Framework? In this multi-part series, I hope to explore my thoughts on this question. I don't think there is a single correct answer. Architecture is about trade offs and the choices you make will be based on your needs and context.

In this first post, I first provide some background on the notion of a Data Access Layer as a frame of reference, and specifically, identify the key goals and objectives of a Data Access Layer.

While Martin Fowler didn't invent the pattern of layering in enterprise applications, his Patterns of Enterprise Application Architecture is a must read on the topic. Our goals for a layered design (which may often need to be traded off against each other) should include:

  • Changes to one part or layer of the system should have minimal impact on other layers of the system. This reduces the maintenance involved in unit testing, debugging, and fixing bugs and in general makes the architecture more flexible.
  • Separation of concerns between user interface, business logic, and persistence (typically in a database) also increases flexibility, maintainability and reusability.
  • Individual components should be cohesive and unrelated components should be loosely coupled. This should allow layers to be developed and maintained independently of each other using well-defined interfaces.

Now to be clear, I'm talking about a layer, not a tier. A tier is a node in a distributed system, of which may include one or more layers. But when I refer to a layer, I'm referring only to the logical separation of code that serves a single concern such as data access. It may or may not be deployed into a separate tier from the other layers of a system. We could then begin to fly off on tangential discussions of distributed systems and service oriented architecture, but I will do my best to keep this discussion focused on the notion of a layer. There are several layered application architectures, but almost all of them in some way include the notion of a Data Access Layer (DAL). The design of the DAL will be influenced should the application architecture include the distribution of the DAL into a separate tier.

In addition to the goals of any layer mentioned above, there are some design elements specific to a Data Access Layer common to the many layered architectures:

  • A DAL in our software provides simplified access to data that is stored in some persisted fashion, typically a relational database. The DAL is utilized by other components of our software so those other areas of our software do not have to be overly concerned with the complexities of that data store.
  • In object or component oriented systems, the DAL typically will populate objects, converting rows and their columns/fields into objects and their properties/attributes. this allows the rest of the software to work with data in an abstraction that is most suitable to it.
  • A common purpose of the DAL is to provide a translation between the structure or schema of the store and the desired abstraction in our software. As is often the case, the schema of a relational database is optimized for performance and data integrity (i.e. 3rd normal form) but this structure does not always lend itself well to the conceptual view of the real world or the way a developer may want to work with the data in an application. A DAL should serve as a central place for mapping between these domains such as to increase the maintainability of the software and provide an isolation between changes  in the storage schema and/or the domain of the application software. This may include the marshalling or coercing of differing data types between the store and the application software.
  • Another frequent purpose of the DAL is to provide independence between the application logic and the storage system itself such that if required, the storage engine itself could be switched with an alternative with minimal impact to the application layer. This is a common scenario for commercial software products that must work with different vendors' database engines (i.e. MS SQL Server, IBM DB/2, Oracle, etc.). With this requirement, sometimes alternate DAL's are created for each store that can be swapped out easily.  This is commonly referred to as Persistence Ignorance.

Getting a little more concrete, there are a host of other issues that also need to be considered in the implementation of a DAL:

  • How will database connections be handled? How will there lifetime be managed? A DAL will have to consider the security model. Will individual users connect to the database using their own credentials? This maybe fine in a client-server architecture where the number of users is small. It may even be desirable in those situations where there is business logic and security enforced in the database itself through the use of stored procedures, triggers, etc. It may however run incongruent to the scalability requirements of a public facing web application with thousands of users. In these cases, a connection pool may be the desired approach.
  • How will database transactions be handled? Will there be explicit database transactions managed by the data access layer or will automatic or implied transaction management systems such as COM+ Automatic Transactions, the Distributed Transaction Coordinator be used?
  • How will concurrent access to data be managed? Most modern application architecture's will rely on an optimistic concurrency  to improve scalability. Will it be the DAL's job to manage the original state of a row in this case? Can we take advantage of SQL Server's row version timestamp column or do we need to track every single column?
  • Will we be using dynamic SQL or stored procedures to communicate with our database?

As you can see, there is much to consider just in generic terms, well before we start looking at specific business scenarios and the wacky database schemas that are in the wild. All of these things can and should influence the design of your data access layer and the technology you use to implement it. In terms of .NET, the Entity Framework is just one data access technology. MS has been so kind to bless us with many others such as Linq To SQL, DataReaders, DataAdapters & DataSets, and SQL XML. In addition, there are over 30 3rd party Object Relational Mapping tools available to choose from.

Ok, so if you're  not familiar with the design goals of the Entity Framework (EF) you can read all about it here or watch a video interview on channel 9, with Pablo Castro, Britt Johnson, and Michael Pizzo. A year after that interview, they did a follow up interview here.

In the next post, I'll explore the idea of the Entity Framework replacing my data access layer and evaluate how this choice rates against the various objectives above. I'll then continue to explore alternative implementations for a DAL using the Entity Framework.