Getting the Data to the Phone

A few posts back I started talking about what it would take to create a new application for the new Windows Phone 7.  I’m not a fan of learning from trivial applications that don’t touch on the same technologies that I would be using in the real world, so I thought I would build a real application that someone can use.

Since this application uses a well-known dataset, I get lucky: I already have my database schema, and it is reasonably well designed.  My first step is to get it to the Phone, so I will use WCF Data Services and an Entity Model.  I created the model and just imported the necessary tables.  I called this model RaceInfoModel.edmx.  The entities name is RaceInfoEntities.  This is ridiculously simple to do.

The following step is to expose the model to the outside world through an XML format in a Data Service.  I created a WCF Data Service and made a few config changes:

using System;
using System.Data.Services;
using System.Data.Services.Common;

namespace RaceInfoDataService
{
    // Exposes the RaceInfoEntities model as a read-only data service.
    public class RaceInfo : DataService<RaceInfoEntities>
    {
        public static void InitializeService(DataServiceConfiguration config)
        {
            if (config == null)
                throw new ArgumentNullException("config");

            config.UseVerboseErrors = true;
            config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
            //config.SetEntitySetPageSize("*", 25);
            config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
        }
    }
}

This too is reasonably simple.  Since it’s a web service, I can hit it from a web browser and I get a list of available datasets:

image

This isn’t a complete list of available items, just a subset.
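Because the service speaks OData, the same browser trick works for individual entity sets and for filtered queries.  Here is a minimal sketch of how such URLs can be composed.  The base address is the local one used later in this post; the entity set name Races and the property TrackName are hypothetical stand-ins, since the real names come from whatever tables you imported into the model.

```csharp
using System;

public static class RaceInfoUris
{
    // Local development address of the service; adjust for your own host.
    private static readonly Uri ServiceRoot = new Uri("http://localhost:60141/RaceInfo.svc/");

    // Builds a query URL using the standard OData $filter and $top query options.
    // Nothing here is specific to the real model: the caller supplies the names.
    public static Uri BuildQuery(string entitySet, string filter, int top)
    {
        string relative = entitySet
            + "?$filter=" + Uri.EscapeDataString(filter)
            + "&$top=" + top;
        return new Uri(ServiceRoot, relative);
    }

    public static void Main()
    {
        // Hypothetical entity set and property names, for illustration only.
        Console.WriteLine(BuildQuery("Races", "TrackName eq 'Woodbine'", 10).AbsoluteUri);
    }
}
```

Pasting the resulting URL into a browser is a quick way to sanity check the service before any phone code exists.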

At this point I can package everything up and stick it on a web server.  It could technically be ready for production if you were satisfied with not having any access controls on reading the data.  In this case, let’s say for argument’s sake that I was able to convince the powers that be that everyone should be able to access it.  There isn’t anything confidential in the data, and we provide the data in other services anyway, so all is well.  Actually, that’s kind of how I would prefer it anyway.  Give me Data or Give me Death!

Now we create the Phone project.  You need to install the latest build of the dev tools, and you can get that here http://developer.windowsphone.com/windows-phone-7/.  Install it.  Then create the project.  You should see:

image

The next step is to make the Phone application actually able to use the data.  Here it gets tricky.  Or really, here it gets stupid.  (It better be fixed by RTM or else *shakes fist*)

For some reason, the Visual Studio 2010 Phone 7 project type doesn’t allow you to automatically import services.  You have to generate the service class manually.  It’s not that big a deal since my service won’t be changing all that much, but nevertheless it’s still a pain to regenerate it manually every time a change comes down the pipeline.  To generate the necessary class run this at a command prompt:

cd C:\Windows\Microsoft.NET\Framework\v4.0.30319
DataSvcutil.exe
     /uri:http://localhost:60141/RaceInfo.svc/
     /DataServiceCollection
     /Version:2.0
     /out:"PATH.TO.PROJECT\RaceInfoService.cs"

(Formatted to fit my site layout)

Include that file in the project and compile.

UPDATE: My bad, I had already installed the reference, so this won’t compile for most people.  The Windows Phone 7 runtime doesn’t have the System.Data namespace available that we need.  Therefore we need to install them…  They are still in development, so here is the CTP build http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=b251b247-70ca-4887-bab6-dccdec192f8d.

You should now have a compile-able project with service references that looks something like:

image

We have just connected our phone application to our database!  All told, it took me 10 minutes to do this.  Next up we start playing with the data.

Data as a Service and the Applications that consume it

Over the past few months I have seen quite a few really cool technologies released or announced, and I believe they have a very real potential in many markets.  A lot of companies that exist outside the realm of software development rarely have the opportunity to use such technologies.

Take for instance the company I work for: Woodbine Entertainment Group.  We have a few different businesses, but as a whole our market is Horse Racing.  Our business is not software development.  We don’t always get the chance to play with or use some of the new technologies released to the market.  I thought this would be a perfect opportunity to see what it will take to develop a new product using only new technologies.

Our core customer pretty much wants Race information.  We have proof of this by the mere fact that on our two websites, HorsePlayer Interactive and our main site, we have dedicated applications for viewing Races.  So let’s build a third race browser.  Since we already have a way of viewing races from your computer, let’s build it on the new Windows Phone 7.

The Phone – The application

This seems fairly straightforward.  We will essentially be building a Silverlight application.  Let’s take a look at what we need to do (in no particular order):

  1. Design the interface – Microsoft has loads of guidance on following the Metro design.  In future posts I will talk about possible designs.
  2. Build the interface – XAML and C#.  Gotta love it.
  3. Build the Business Logic that drives the views – I would prefer to stay away from this, suffice to say I’m not entirely sure how proprietary this information is
  4. Build the Data Layer – Ah, the fun part.  How do you get the data from our internal servers onto the phone?  Easy, OData!

The Data

We have a massive database of all the Races on all the tracks that you can wager on through our systems.  The data updates every few seconds relative to changes from the tracks for things like cancellations or runner odds.  How do we push this data to the outside world for the phone to consume?  We create a WCF Data Service:

  1. Create an Entities Model of the Database
  2. Create Data Service
  3. Add Entity reference to Data Service (See code below)
 
    // DataService<T> takes the entity model's context class as T (the entity reference from step 3).
    public class RaceBrowserData : DataService
    {
        public static void InitializeService(DataServiceConfiguration config)
        {
            if (config == null) throw new ArgumentNullException("config");
            config.UseVerboseErrors = true;
            config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
            //config.SetEntitySetPageSize("*", 25);
            config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
        }
    }

That’s actually all there is to it for the data.

The Authentication

The what?  Chances are the business will want to limit application access to only those who have accounts with us.  Especially so if we did something like add in the ability to place a wager on that race.  There are lots of ways to lock this down, but the simplest approach in this instance is to use a Secure Token Service.  I say this because we already have a user store and STS, and duplication of effort is wasted effort.  We create an STS Relying Party (the application that connects to the STS):

  1. Go to STS and get Federation Metadata.  It’s an XML document that tells relying parties what you can do with it.  In this case, we want to authenticate and get available Roles.  This is referred to as a Claim.  The role returned is a claim as defined by the STS.  Somewhat inaccurately, we would do this:
    1. App: Hello! I want these Claims for this user: “User Roles”.  I am now going to redirect to you.
    2. STS: I see you want these claims, very well.  Give me your username and password.
    3. STS: Okay, the user passed.  Here are the claims requested.  I am going to POST them back to you.
    4. App: Okay, back to our own processes.
  2. Once we have the Metadata, we add the STS as a reference to the Application, and call a web service to pass the credentials.
  3. If the credentials are accepted, we get returned the claims we want, which in this case would be available roles.
  4. If the user has the role to view races, we go into the Race view.  (All users would have this role, but adding Roles is a good thing if we needed to distinguish between wagering and non-wagering accounts)

One thing I didn’t mention is how we lock down the Data Service.  That’s a bit more tricky, and more suited for another post on the actual Data Layer itself.

So far we have laid the ground work for the development of a Race Browser application for the Windows Phone 7 using the Entity Framework and WCF Data Services, as well as discussed the use of the Windows Identity Foundation for authentication against an STS.

With any luck (and permission), more to follow.

Six Simple Development Rules (for Writing Secure Code)

I wish I could say that I came up with this list, but alas I did not.  I came across it on the Assessment, Consulting & Engineering Team blog from Microsoft, this morning.  They are a core part of the Microsoft internal IT Security Group, and are around to provide resources for internal and external software developers.  These 6 rules are key to developing secure applications, and they should be followed at all times.

Personally, I try to follow the rules closely, and am working hard at creating an SDL for our department.  Aside from Rule 1, you could consider each step a sort of checklist for when you sign off, or preferably design, the application for production.

--

Rule #1: Implement a Secure Development Lifecycle in your organization.

This includes the following activities:

  • Train your developers, and testers in secure development and secure testing respectively
  • Establish a team of security experts to be the ‘go to’ group when people want advice on security
  • Implement Threat Modeling in your development process. If you do nothing else, do this!
  • Implement Automatic and Manual Code Reviews for your in-house written applications
  • Ensure you have ‘Right to Inspect’ clauses in your contracts with vendors and third parties that are producing software for you
  • Have your testers include basic security testing in their standard testing practices
  • Do deployment reviews and hardening exercises for your systems
  • Have an emergency response process in place and keep it updated

If you want some good information on doing this, email me and check out this link:
http://www.microsoft.com/sdl

Rule #2: Implement a centralized input validation system (CIVS) in your organization.

These CIVS systems are designed to perform common input validation on commonly accepted input values. Let’s face it, as much as we’d all like to believe that we are the only ones doing things like registering users or recording data from visitors, it’s actually all the same thing.

When you receive data it will very likely be an integer, decimal, phone number, date, URI, email address, post code, or string. The values and formats of the first 7 of those are very predictable. Strings are a bit harder to deal with, but they can all be validated against known good values. Always remember to check for the three F’s: Form, Fit and Function.

  • Form: Is the data the right type of data that you expect? If you are expecting a quantity, is the data an integer? Always cast data to a strong type as soon as possible to help determine this.
  • Fit: Is the data the right length/size? Will the data fit in the buffer you allocated (including any trailing nulls if applicable)? If you are expecting an Int32, or a Short, make sure you didn’t get an Int64 value. Did you get a positive integer for a quantity rather than a negative integer?
  • Function: Can the data you received be used for the purpose it was intended? If you receive a date, is the date value in the right range? If you received an integer to be used as an index, is it in the right range? If you received an int as a value for an Enum, does it match a legitimate Enum value?
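To make the three F’s concrete, here is a minimal C# sketch of two validators of the kind a centralized library might hold. The names InputChecks, TryParseQuantity, TryParseStatus and the OrderStatus enum are made up for illustration; this is not the Enterprise Library API, just plain framework calls.

```csharp
using System;
using System.Globalization;

// Hypothetical enum used only to demonstrate the Enum check.
public enum OrderStatus { Pending = 0, Shipped = 1, Cancelled = 2 }

public static class InputChecks
{
    // Validates a quantity supplied as text.
    public static bool TryParseQuantity(string raw, int max, out int quantity)
    {
        // Form and Fit: NumberStyles.None accepts digits only (no sign, no whitespace),
        // and TryParse fails if the value does not fit in an Int32.
        if (!int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out quantity))
            return false;

        // Function: a usable quantity is positive and within the caller's limit.
        return quantity > 0 && quantity <= max;
    }

    // Validates that an integer maps to a declared Enum value.
    public static bool TryParseStatus(int raw, out OrderStatus status)
    {
        status = (OrderStatus)raw;
        return Enum.IsDefined(typeof(OrderStatus), raw);
    }
}
```

Callers should only trust the out value when the method returns true; a failed range check still leaves the parsed number in quantity.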

In a vast majority of the cases, string data being sent to an application will be 0-9, a-z, A-Z. In some cases such as names or currencies you may want to allow –, $, % and ‘. You will almost never need , <> {} or [] unless you have a special use case such as http://www.regexlib.com in which case see Rule #3.

You want to build this as a centralized library so that all of the applications in your organization can use it. This means if you have to fix your phone number validator, everyone gets the fix. By the same token, you have to inspect and scrutinize the crap out of these CIVS to ensure that they are not prone to errors and vulnerabilities because everyone will be relying on it. But, applying heavy scrutiny to a centralized library is far better than having to apply that same scrutiny to every single input value of every single application.  You can be fairly confident that as long as they are using the CIVS, that they are doing the right thing.

Fortunately implementing a CIVS is easy if you start with the Enterprise Library Validation Application Block which is a free download from Microsoft that you can use in all of your applications.

Rule #3: Implement input/output encoding for all externally supplied values.

Due to the prevalence of cross site scripting vulnerabilities, you need to encode any values that came from an outside source that you may display back to the browser. (even embedded browsers in thick client applications). The encoding essentially takes potentially dangerous characters like < or > and converts them into their HTML, HTTP, or URL equivalents.

For example, if you were to HTML encode <script>alert(‘XSS Bug’)</script> it would look like: &lt;script&gt;alert('XSS Bug')&lt;/script&gt;.  A lot of this functionality is built into the .NET system. For example, the code to do the above looks like:

Server.HtmlEncode("<script>alert('XSS Bug')</script>");

However, it is important to know that Server.HtmlEncode only encodes about 4 of the nasty characters you might encounter. It’s better to use a more ‘industrial strength’ library like the Anti Cross Site Scripting library, another free download from Microsoft. This library does a lot more encoding and will do HTTP and URI encoding based on a white list. The above encoding would look like this in AntiXSS:

using Microsoft.Security.Application;
AntiXss.HtmlEncode("<script>alert('XSS Bug')</script>");

You can also run a neat test system that a friend of mine developed to test your application for XSS vulnerabilities in its outputs. It is aptly named XSS Attack Tool.

Rule #4: Abandon Dynamic SQL

There is no reason you should be using dynamic SQL in your applications anymore. If your database does not support parameterized stored procedures in one form or another, get a new database.

Dynamic SQL is when developers build a SQL query in code and then submit it to the DB to be executed as a string, rather than calling a stored procedure and feeding it the values. It usually looks something like this:

(for you VB fans)

dim sql
sql = "Select ArticleTitle, ArticleBody FROM Articles WHERE ArticleID = "
sql = sql & request.querystring("ArticleID")
set results = objConn.execute(sql)

In fact, this article from 2001 is chock full of what NOT to do. Including dynamic SQL in a stored procedure.

Here is an example of a stored procedure that is vulnerable to SQL Injection:

Create Procedure GenericTableSelect @TableName VarChar(100)
AS
Declare @SQL VarChar(1000)
SELECT @SQL = 'SELECT * FROM '
SELECT @SQL = @SQL + @TableName
Exec ( @SQL)
GO

See this article for a look at using Parameterized Stored Procedures.
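For contrast with the VB example above, here is a minimal C# sketch of the same article lookup with the value passed as a parameter instead of being concatenated into the SQL text. The table and column names come from that example; the ArticleQueries class and BuildArticleQuery method are hypothetical. It uses System.Data.SqlClient, and the same pattern applies to a stored procedure by setting CommandType.StoredProcedure.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class ArticleQueries
{
    // Builds a command whose only user-supplied value travels as a typed parameter.
    // The caller owns the connection and is responsible for opening and disposing it.
    public static SqlCommand BuildArticleQuery(SqlConnection connection, string rawArticleId)
    {
        // Validate first: the ID must be an integer, nothing else.
        int articleId;
        if (!int.TryParse(rawArticleId, out articleId))
            throw new ArgumentException("ArticleID must be an integer.", "rawArticleId");

        var command = new SqlCommand(
            "SELECT ArticleTitle, ArticleBody FROM Articles WHERE ArticleID = @ArticleID",
            connection);

        // The value is sent separately from the SQL text, so it is never parsed as SQL.
        command.Parameters.Add("@ArticleID", SqlDbType.Int).Value = articleId;
        return command;
    }
}
```

A query string such as "1; DROP TABLE Articles" is rejected by the integer check, and even a value that passed validation could not change the shape of the statement.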

Rule #5: Properly architect your applications for scalability and failover

Applications can be brought down by a simple crash. Or a not so simple one. Architecting your applications so that they can scale easily, vertically or horizontally, and so that they are fault tolerant will give you a lot of breathing room.

Keep in mind that fault tolerant is not just a way to say that they restart when they crash. It means that you have a proper exception handling hierarchy built into the application.  It also means that the application needs to be able to handle situations that result in server failover. This is usually where session management comes in.

The best fault tolerant session management solution is to store session state in SQL Server.  This also helps avoid the server affinity issues some applications have.

You will also want a good load balancer up front. This will help distribute load evenly so that, hopefully, you won’t run into the failover scenario often.

And by all means do NOT do what they did on the site in the beginning of this article. Set up your routers and switches to properly shunt bad traffic or DOS traffic. Then let your applications handle the input filtering.

Rule #6: Always check the configuration of your production servers

Configuration mistakes are all too common. When you consider that proper server hardening and standard out of the box deployments are probably a good secure default, there are a lot of people out there changing stuff that shouldn’t be. You may remember when Bing went down for about 45 minutes. That was due to configuration issues.

To help address this, we have released the Web Application Configuration Auditor (WACA). This is a free download that you can use on your servers to see if they are configured according to best practice. You can download it at this link.

You should establish a standard SOE for your web servers that is hardened and properly configured. Any variations to that SOE should be scrutinised and go through a very thorough change control process. Test them first before turning them loose on the production environment…please.

So with all that being said, you will be well on your way to stopping the majority of attacks you are likely to encounter on your web applications. Most of the attacks that occur are SQL Injection, XSS, and improper configuration issues. The above rules will knock out most of them. In fact, Input Validation is your best friend. Regardless of inspecting firewalls and things, the application is the only link in the chain that can make an intelligent and informed decision on whether the incoming data is actually legit or not. So put your effort where it will do you the most good.

Generic Implementation of INotifyPropertyChanged on ADO.NET Data Services (Astoria) Proxies with T4 Code Generation

Last week Mike Flasko from the ADO.NET Data Services (Astoria) Team blogged about what’s coming in V1.5 which will ship prior to VS 2010. I applaud these out of band releases.

One of the new features is support for two-way data binding in the client library generated proxy classes. These classes currently do not implement INotifyPropertyChanged events nor project into ObservableCollections out of the box.

Last week at the MVP Summit I had the chance to see a demo of this and other great things coming down the road from the broader Data Programmability Team. It seems like more and more teams are turning to T4 Templates for code generation which is great for our extensibility purposes. At first I was hopeful that the team had implemented these proxy generation changes via changing to T4 templates along with a corresponding “better” template.  Unfortunately, this is not the case and we won’t see any T4 templates in v1.5. It’s too bad. Would it really have been that much more work to invest the time in implementing T4 templates than to add new switches to datasvcutil and new code generation (along with testing that code)?

Anyway, after seeing some other great uses of T4 templates coming from product teams for VS 2010, I thought I would invest some of my own time to see if I couldn’t come up with a way of implementing INotifyPropertyChanged all on my own. The problem with the existing code gen is that while there are partial methods created and called for each property setter (i.e. FoobarChanged() ), there is no generic event fired that would allow us to in turn raise an INotifyPropertyChanged.PropertyChanged event. So you can manually add this for each and every property on every class, but it’s tedious.

I couldn’t have been the first person to think of doing this, and after a bit of googling, I confirmed that. Alexey Zakharov’s post on generating custom proxies with T4 has been completely ripped off, er, inspirational in this derivative work. What I didn’t like about Alexey’s solution was that it completely overwrote the proxy client. I would have preferred a solution that just implemented the partial methods in a partial class to fire the PropertyChanged event. This way, any changes, improvements, etc. to the core MS codegen can still be expected down the road. Of course, Alexey’s template is a better solution if there are indeed other things that you want to customize about the template in its entirety, should you find that what you need to accomplish can’t be done with a partial class.

What I did like about Alexey’s solution is that it uses the service itself to query the service meta data directly. I had planned on using reflection to accomplish the same thing but in hindsight, that would be difficult to generate a partial class of a class I’m currently reflecting on in the same project (of course). Duh.

So what do you need to do to get this solution working?

  1. Add the MetadataHelper.tt file to the project where you have your reference/proxies to the data service. You will want to make sure there is no custom tool associated with this file – it’s just included as a reference in the next one. This file wraps up all the calls to get the metadata. I’ve made a couple of small changes to Alexey’s: added support for Byte and Boolean (typo in AZ’s).
  2. Copy the DataServiceProxy.tt file to the same project. If you have more than one data service, you’ll want one of these files for each reference. So for starters you may want to rename it accordingly. You are going to need to edit this bad boy as well.
  3. There are two options you’ll need to specify inside of the proxy template. The MetadataUri should be the uri to your service suffixed with $metadata. I’ve found that if your service is secured with integrated authentication, then the metadata helper won’t pass those credentials along, so for the purposes of code generation you’d best leave anonymous access on. Second is the Namespace. You will want to use the same namespace used by your service reference. You might have to do a Show All Files and drill into the Reference.cs file to see exactly what that is.

    var options = new {
        MetadataUri = "http://localhost/ObjectSharpSample.Service/SampleDataService.svc/$metadata",
        Namespace = "ObjectSharp.SampleApplication.ServiceClient.DataServiceReference"
    };

That’s it. When you save your file, should everything work, you’ll have a .cs file generated that implements the INotifyPropertyChanged interface through a partial class. Something like:

public partial class Address : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private void OnPropertyChanged(string property)
    {
        var handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(property));
        }
    }

    partial void OnAddressIdChanged()
    {
        OnPropertyChanged("AddressId");
    }
    partial void OnAddressLine1Changed()
    {
        OnPropertyChanged("AddressLine1");
    }
}
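To see why implementing only the partial methods is enough, here is a self-contained sketch of how the two halves fit together. The first Address declaration is a hand-written stand-in for the tool-generated proxy (the real generated code has more in it), and the second is a stand-in for the template output; the Demo class is purely illustrative.

```csharp
using System;
using System.ComponentModel;

// Stand-in for the generated proxy: the setter calls a partial method hook.
public partial class Address
{
    private int _addressId;

    public int AddressId
    {
        get { return _addressId; }
        set
        {
            _addressId = value;
            // If no other part of the class implements this partial method,
            // the compiler removes the call entirely.
            OnAddressIdChanged();
        }
    }

    partial void OnAddressIdChanged();
}

// Stand-in for the template output: implements the hook and raises the event.
public partial class Address : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    partial void OnAddressIdChanged()
    {
        var handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs("AddressId"));
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var address = new Address();
        address.PropertyChanged += (sender, e) => Console.WriteLine(e.PropertyName + " changed");
        address.AddressId = 42; // prints "AddressId changed"
    }
}
```

Because the proxy file itself is never edited, regenerating the service reference leaves the notification code untouched.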

Visual Studio 2008 SP1 Beta & SQL Server 2008

A quick heads up to let you know that the VS 2008 Service Pack 1 Beta is now available (links below). It typically takes a couple of months from this point before we'll see a final release.

This Service Pack includes new cool feature:

One interesting point is that MS is going to simultaneously ship SQL Server 2008 which actually has a hard dependency on SP1.

I thought I’d take a moment to highlight some new features that Dev’s would care about in SQL Server 2008.

  • Change Data Capture: Async “triggers” capture the before/after snapshot of row level changes and write them to Change Tables that you can query in your app. They aren’t real triggers, as this asynchronously reads the transaction log.
  • Granular control of encryption, right through to the database level without any application changes required.
  • Resource Governor – very helpful when you allow users to write adhoc queries / reports against your OLTP database. Allows a DBA to assert resource limits & priorities.
  • Plan Freezing – allows you to lock down query plans to promote stable query plans across disparate hardware, server upgrades, etc.
  • New Date and Time data types, so you no longer have to manually parse a DateTime to get just the date or just the time.
  • DateTimeOffset – is a time zone aware datetime.
  • Table Value Parameters to procs – ever want to pass a result set as an arg to a proc?
  • Hierarchy ID is a new system type for storing nodes in a hierarchy….implemented as a CLR User Defined Type.
  • FileStream Data type allows blobish data to be surfaced in the database, but physically stored on the NTFS file system. ….but with complete transactional consistency with the relational data and backup integration.
  • New Geographic data support, store spatial data such as polygons, points and lines, and long/lat data types.
  • Merge SQL statement allows you to insert, or update if a row already exists.
  • New reporting services features such as access to reports from within Word & Excel, better SharePoint integration

Personally, I haven't spent any time with SQL Server 2008 but that's a great set of new features that I can hardly wait to start using in real-world applications.

Downloads

· VS 2008 SP1 : http://download.microsoft.com/download/7/3/8/7382EA08-4DD6-4134-9B92-8585A5B07973/VS90sp1-KB945140-ENU.exe

· .NET 3.5 SP1 : http://download.microsoft.com/download/8/f/c/8fc1fe13-55de-4bf5-b43e-375daf01452e/dotNetFx35setup.exe

· Express 2008 with SP1:

o http://download.microsoft.com/download/F/E/7/FE754BA4-140B-413C-933F-8D35FB150F12/vbsetup.exe

o http://download.microsoft.com/download/F/E/7/FE754BA4-140B-413C-933F-8D35FB150F12/vcsetup.exe

o http://download.microsoft.com/download/F/E/7/FE754BA4-140B-413C-933F-8D35FB150F12/vcssetup.exe

o http://download.microsoft.com/download/F/E/7/FE754BA4-140B-413C-933F-8D35FB150F12/vnssetup.exe

· TFS 2008 SP1: http://download.microsoft.com/download/a/e/2/ae2eb0ff-e687-4221-9c3e-9165a942bc1c/TFS90sp1-KB949786.exe

Feedback Forum: http://go.microsoft.com/fwlink/?LinkId=119125

 

The Entity Framework vs. The Data Access Layer (Part 1: The EF as a DAL)

In Part 0: Introduction of this series, after asking the question "Does the Entity Framework replace the need for a Data Access Layer?", I waxed lengthy about the qualities of a good data access layer. Since that time I've received quite a few emails from people interested in this topic. So without further ado, let's get down to the question at hand.

So let's say you go ahead and create an Entity Definition model (*.edmx) in Visual Studio and have the designer generate for you a derived ObjectContext class and an entity class for each of your tables, derived from EntityObject. This one to one table mapping to entity class is quite similar to LINQ to SQL but the mapping capabilities move well beyond this to support advanced data models. This is at the heart of why the EF exists: Complex Types, Inheritance (Table per Type, Table per Inheritance Hierarchy), Multiple Entity Sets per Type, Single Entity Mapped to Two Tables, Entity Sets mapped to Stored Procedures or mapping to a hand-crafted query, expressed as either SQL or Entity SQL. EF has a good story for a conceptual model over top of our physical databases using Xml Magic in the form of the edmx file - and that's why it exists.

So to use the Entity Framework as your data access layer, define your model and then let the EdmGen.exe tool do its thing to the edmx file at compile time and we get the csdl, ssdl, and msl files - plus the all important code generated entity classes. So using this pattern of usage for the Entity Framework, our data access layer is complete. It may not be the best option for you, so let's explore the qualities of this solution.

To be clear, the assumption here is that our data access layer in this situation is the full EF Stack: ADO.NET Entity Client, ADO.NET Object Services, LINQ to Entities, including our model (edmx, csdl, ssdl, msl) and the code generated entities and object context. Somewhere under the covers there is also the ADO.NET Provider (SqlClient, OracleClient, etc.)

image

To use the EF as our DAL, we would simply execute code similar to this in our business layer.

var db = new AdventureWorksEntities();
var activeCategories = from category in db.ProductCategory
                 where category.Inactive != true
                 orderby category.Name
                 select category;

How Do "EF" Entities Fit In?

If you're following along, you're probably asking exactly where is this query code above being placed. For the purposes of our discussion, "business layer" could mean a business object or some sort of controller. The point to be made here is that we need to think of Entities as something entirely different from our Business Objects.

Entity != Business Object

In this model, it is up to the business object to ask the Data Access Layer to project entities, not business objects, but entities.

This is one design pattern for data access, but it is not the only one. A conventional business object that contains its own data, and does not separate that out into an entity, can suffer from tight bi-directional coupling between the business and data access layer. Consider a Customer business object with a Load method. Customer.Load() would in turn instantiate a data access component, CustomerDac, and call the CustomerDac's Load or Fill method. To encapsulate all the data access code to populate a customer business object, the CustomerDac.Load method would require knowledge of the structure of the Customer business object and hence a circular dependency would ensue.

The workaround, if you can call it that, is to put the business layer and the data access layer in the same assembly - but there goes decoupling, unit testing and separation of concerns out the window.

Another approach is to invert the dependency. The business layer would contain data access interfaces only, and the data access layer would implement those interfaces, and hence have a reverse dependency on the business layer. Concrete data access objects are instantiated via a factory, often combined with configuration information used by an Inversion of Control container. Unfortunately, this is not all that easy to do with the EF generated ObjectContext & Entities.
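As a rough sketch of that inverted arrangement, the business layer below owns both the Customer object and the data access interface, while the data access side merely implements it. Everything sits in one file here only so the sample runs on its own; in practice the sections would be separate assemblies. ICustomerDataAccess, CustomerService and InMemoryCustomerDac are hypothetical names, and the in-memory dictionary stands in for a real database.

```csharp
using System;
using System.Collections.Generic;

// --- Business layer: owns the business object and the data access contract ---
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ICustomerDataAccess
{
    Customer Load(int id);
}

public class CustomerService
{
    private readonly ICustomerDataAccess _dataAccess;

    // The concrete data access object is handed in; the business layer never creates it.
    public CustomerService(ICustomerDataAccess dataAccess)
    {
        if (dataAccess == null) throw new ArgumentNullException("dataAccess");
        _dataAccess = dataAccess;
    }

    public string GetDisplayName(int id)
    {
        Customer customer = _dataAccess.Load(id);
        return customer == null ? "(unknown customer)" : customer.Name;
    }
}

// --- Data access layer: references the business layer and implements its interface ---
public class InMemoryCustomerDac : ICustomerDataAccess
{
    private readonly Dictionary<int, Customer> _rows = new Dictionary<int, Customer>
    {
        { 1, new Customer { Id = 1, Name = "Ada" } }
    };

    public Customer Load(int id)
    {
        Customer customer;
        return _rows.TryGetValue(id, out customer) ? customer : null;
    }
}

// --- Composition: a factory or IoC container would normally do this wiring ---
public static class Program
{
    public static void Main()
    {
        var service = new CustomerService(new InMemoryCustomerDac());
        Console.WriteLine(service.GetDisplayName(1));
    }
}
```

The dependency arrow now points from data access to business, so the business layer can be unit tested with a fake ICustomerDataAccess and no database at all.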

Or, you do as the Entity Framework implies and separate entities from your business objects. If you've used typed DataSets in the past, this will seem familiar to you. Substitute ObjectContext for SqlConnection and SqlDataAdapter, and the pattern is pretty much the same.

Your UI presentation layer is likely going to bind to your Entity classes as well. This is an important consideration. The generated Entity classes are partial classes and can be extended with your own code. The generated properties (columns) on an entity also have event handlers created for changing and changed events so you can also wire those up to perform some column level validation. Notwithstanding, you may want to limit your entity customizations to simple validation and keep the serious business logic in your business objects. One of these days, I'll do another blog series on handling data validation within the Entity Framework.

How does this solution stack up?

How are database connections managed?

thumbs up Using the Entity Framework natively itself, the ObjectContext takes care of opening & closing connections for you - as needed when queries are executed, and during a call to SaveChanges. You can get access to the native ADO.NET connection if need be to share a connection with other non-EF data access logic. The nice thing however is that, for the most part, connection strings and connection management are abstracted away from the developer.

thumbs down A word of caution however. Because the ObjectContext will create a native connection, you should not wait to let the garbage collector free that connection up, but rather ensure that you dispose of the ObjectContext either explicitly or with a using statement.
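The disposal pattern looks something like the sketch below. StandInContext and CustomerQueries are hypothetical names; the stand-in only mimics the one thing that matters here, which is that the real ObjectContext implements IDisposable.

```csharp
using System;

// Hypothetical stand-in for a generated ObjectContext subclass.
public sealed class StandInContext : IDisposable
{
    public bool IsDisposed { get; private set; }

    public int CountCustomers()
    {
        if (IsDisposed) throw new ObjectDisposedException("StandInContext");
        return 3; // placeholder for a real query against the model
    }

    public void Dispose()
    {
        IsDisposed = true; // a real context would release its connection here
    }
}

public static class CustomerQueries
{
    public static int CountCustomers()
    {
        // Dispose runs when the block exits, even if the query throws,
        // so the connection is not left waiting for the garbage collector.
        using (var context = new StandInContext())
        {
            return context.CountCustomers();
        }
    }
}
```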

Are all SQL Queries centralized in the Data Access Layer?

thumbs down By default the Entity Framework dynamically generates store specific SQL on the fly and therefore, the queries are not statically located in any one central location. Even to understand the possible queries, you'd have to walk through all of your business code that hits the entity framework to understand all of the potential queries.

But why would you care? If you have to ask that question, then you don't care. But if you're a DBA, charged with the job of optimizing queries, making sure that your tables have the appropriate indices, then you want to go to one central place to see all these queries and tune them if necessary. If you care strongly enough about this, and you have the potential of other applications (perhaps written in other platforms), then you likely have already locked down the database so the only access is via Stored Procedures and hence the problem is already solved.

Let's remind ourselves that sprocs are not innately faster than dynamic SQL, however they are easier to tune and you also have the freedom of using T-SQL and temp tables to do some pre-processing of data prior to projecting results - which sometimes can be the fastest way to generate some complex results. More importantly, you can revoke all permissions to the underlying tables and only grant access to the data via Stored Procedures. Locking down a database with stored procedures is almost a necessity if your database is oriented as a service, acting as an integration layer between multiple client applications. If you have multiple applications hitting the same database, and you don't use stored procedures - you likely have bigger problems. 

In the end, this is not an insurmountable problem. If you are already using Stored Procedures, then by all means you can map those in your EDM. This seems like the best approach, but you could also embed SQL Server (or other provider) queries in your SSDL using a DefiningQuery.

Do changes in one part of the system affect others?

It's difficult to answer this question without talking about the possible changes.

thumbs up Schema Changes: The conceptual model and the mapping flexibility, even under complex scenarios, are a strength of the Entity Framework. Compared to other technologies on the market, with the EF, your chances are as good as they're going to get that a change in the database schema will have minimal impact on your entity model, and vice versa.

thumbs up Database Provider Changes: The Entity Framework is database agnostic. Its provider model allows for easily changing from SQL Server, to Oracle, to MySQL, etc. via connection strings. This is very helpful for ISVs whose product must support running on multiple back-end databases.

thumbs down Persistence Ignorance: What if the change you want in one part of the system is to change your ORM technology? Maybe you don't want to persist to a database, but instead call a CRUD web service. In this pure model, you won't be happy. Both your Entities and your ObjectContext inherit from base classes in the Entity Framework's System.Data.Objects namespace. With references to these littered throughout your business layer, decoupling yourself from the Entity Framework will not be an easy task.

thumbs down Unit Testing: This is only loosely related to the question, but you can't talk about PI without talking about Unit Testing. Because the generated entities do not support the use of Plain Old CLR Objects (POCO), this data access model is not easily mocked for unit testing.

Does the DAL simplify data access?

thumbs up Dramatically. Compared to classic ADO.NET, LINQ queries can be used for typed results & parameters, complete with IntelliSense against your conceptual model, with no worries about SQL injection attacks.

thumbs up As a bonus, what you do get is query composition across your domain model. Usually version 1.0 of a conventional non-ORM data access layer provides components for each entity, each supporting CRUD behaviour. Consider a scenario where you need to show all of the Customers within a territory, and then you need to show the last 10 orders for each Customer. Now I'm not saying you'd do this, but what I've commonly seen is that while somebody might write a CustomerDac.GetCustomersByTerritory() method, and they might write an OrderDac.GetLastTenOrders(), they would almost never write an OrderDac.GetLastTenOrdersForCustomersInTerritory() method. Instead they would simply iterate over the collection of customers found by territory and call the GetLastTenOrders() over and over again. Obviously this is "good" reuse of the data access logic, however it does not perform very well.

Fortunately, through query composition and eager loading, we can cause the Entity Framework (or even LINQ to SQL) to use a nested subquery to bring back the last 10 orders for each customer in a given territory in a single round trip, single query. Wow! In a conventional data access layer you could, and should write a new method to do the same, but by writing yet another query on the order table, you'd be repeating the mapping between the table and your objects each time.
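Here is a sketch of what that composed query could look like. The Customer, Order and CustomerOrders classes and the OrderQueries helper are hypothetical, and I've written it against IQueryable so it runs over an in-memory list via AsQueryable(). Whether a particular provider turns this exact shape into a single SQL statement is an assumption you should verify by looking at the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public string Territory { get; set; }
    public List<Order> Orders { get; set; }
}

// Projection type holding a customer with only their most recent orders.
public class CustomerOrders
{
    public string Name { get; set; }
    public IEnumerable<Order> RecentOrders { get; set; }
}

public static class OrderQueries
{
    // One composed query: filter customers by territory, then take the
    // latest 'count' orders for each, instead of one call per customer.
    public static IQueryable<CustomerOrders> LastOrdersByTerritory(
        IQueryable<Customer> customers, string territory, int count)
    {
        return from c in customers
               where c.Territory == territory
               select new CustomerOrders
               {
                   Name = c.Name,
                   RecentOrders = c.Orders
                       .OrderByDescending(o => o.OrderDate)
                       .Take(count)
               };
    }
}
```

Because the method returns IQueryable, a caller can keep composing on top of it (another filter, a sort, paging) before anything is executed.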

Layers, Schmayers: What about tiers?

thumbs down EDM generated entity classes are not very tier-friendly. The state of an entity, whether it is modified, new, or to be deleted, and what columns have changed is managed by the ObjectContext. Once you take an entity and serialize it out of process to another tier, it is no longer tracked for updates. While you can re-attach an entity that was serialized back into the data access tier, because the entity itself does not serialize its changed state (aka diffgram), you cannot easily achieve full round trip updating in a distributed system. There are techniques for dealing with this, but it is going to add some plumbing code between the business logic and the EF...and make you wish you had a real data access layer, or something like Danny Simmons' EntityBag (or a DataSet).

Does the Data Access Layer support optimistic concurrency?

thumbs up Out of the box, yes, handily. Thanks to the ObjectContext tracking state, and the change tracking events injected into our code generated entity properties. However, keep in mind the caveat with distributed systems that you'll have more work to do if your UI is separated from your data access layer by one or more tiers.
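If optimistic concurrency is new to you, the idea underneath it can be sketched without any database at all. Everything below (CustomerRow, StaleRowException, InMemoryCustomerTable) is hypothetical and is not how the Entity Framework implements it; the Version field just plays the role of a row version column, and the check in Update is the in-memory equivalent of adding the original version to the WHERE clause of an UPDATE.

```csharp
using System;
using System.Collections.Generic;

public class CustomerRow
{
    public int Id;
    public string Name;
    public int Version; // stand-in for a row version / timestamp column
}

public class StaleRowException : Exception
{
    public StaleRowException(string message) : base(message) { }
}

public class InMemoryCustomerTable
{
    private readonly Dictionary<int, CustomerRow> _rows = new Dictionary<int, CustomerRow>();

    public void Insert(int id, string name)
    {
        _rows[id] = new CustomerRow { Id = id, Name = name, Version = 1 };
    }

    // Returns a copy, the way a client would hold a disconnected snapshot.
    public CustomerRow Read(int id)
    {
        CustomerRow row = _rows[id];
        return new CustomerRow { Id = row.Id, Name = row.Name, Version = row.Version };
    }

    public void Update(CustomerRow edited)
    {
        CustomerRow current = _rows[edited.Id];

        // In spirit: UPDATE ... WHERE Id = @id AND Version = @originalVersion
        if (current.Version != edited.Version)
        {
            throw new StaleRowException("Row " + edited.Id + " was changed by someone else.");
        }

        current.Name = edited.Name;
        current.Version++; // the next stale writer will now be rejected
    }
}
```

Two readers can take the same snapshot, but only the first one to call Update wins; the second gets the exception and has to re-read and decide what to do. With the Entity Framework, the equivalent failure surfaces from SaveChanges as an OptimisticConcurrencyException.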

How does the Data Access Layer support transactions?

thumbs up Because the Entity Framework builds on top of ADO.NET providers, transaction management doesn't change very much. A single call to ObjectContext.SaveChanges() will open a connection, perform all inserts, updates, and deletes across all entities that have changed, across all relationships and all in the correct order....and as you can imagine in a single transaction. To make transactions more granular than that, call SaveChanges more frequently or have multiple ObjectContext instances for each unit of work in progress. To broaden the scope of a transaction, you can manually enlist using a native ADO.NET provider transaction or by using System.Transactions.
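For the "broaden the scope" case, a minimal sketch using System.Transactions might look like this. UnitOfWork is a hypothetical helper of mine; TransactionScope and its Complete method are the real System.Transactions API.

```csharp
using System;
using System.Transactions;

public static class UnitOfWork
{
    // Runs the supplied work inside one ambient transaction.
    public static void Run(Action work)
    {
        using (var scope = new TransactionScope())
        {
            work();           // e.g. one or more SaveChanges calls
            scope.Complete(); // skipped if work() throws, so disposal rolls back
        }
    }
}
```

Assuming two hypothetical contexts, you would call it as UnitOfWork.Run(() => { ordersContext.SaveChanges(); billingContext.SaveChanges(); });. Keep in mind that spanning more than one connection this way may escalate to a distributed transaction, depending on your provider and database.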

Entity Framework Links for April, 2008

  • During the past month, Danny Simmons let us all officially know that SP1 of VS 2008/.NET Framework 3.5 will be the delivery mechanism for the Entity Framework and the Designer, and that we should see a beta of the entire SP1 very soon as well. No release dates yet.
  • Speaking of the next beta, there have been some improvements in the designer to support iterative development. Noam Ben-Ami talks about that here.
  • There is also a new ASP.NET EntityDataSource control coming in the next beta. Danny demo'd that at DevConnections, and Julie blogged about it here.
  • In April, Microsoft released the .NET 3.5 Enhancements Training Kit. This includes some preliminary labs on ASP.NET MVC, ASP.NET Dynamic Data, ASP.NET AJAX History, ASP.NET Silverlight controls, ADO.NET Data Services and last but certainly not least, the ADO.NET Entity Framework. Stay tuned for updates.
  • Julie Lerman has created a spiffy pseudo-debug visualizer for Entity State. It's implemented as an extension method and not a true debug visualizer, but useful just the same.
  • Check out Ruurd Boeke's excellent post on Disconnected N-Tier objects using the Entity Framework. His sample solution is checked in to the EFContrib Project and he demonstrates using POCO classes, in his words "as persistence ignorant as I can get", serializing entities with no EF references on the clients, yet not losing full change tracking on the client - and using the same domain classes on the client and the server (one could argue that this last point is not a desirable goal - but it does have its place).

The Entity Framework vs. The Data Access Layer (Part 0: Introduction)

So the million dollar question is: Does the Entity Framework replace the need for a Data Access Layer? If not, what should my Data Access Layer look like if I want to take advantage of the Entity Framework? In this multi-part series, I hope to explore my thoughts on this question. I don't think there is a single correct answer. Architecture is about trade offs and the choices you make will be based on your needs and context.

In this first post, I first provide some background on the notion of a Data Access Layer as a frame of reference, and specifically, identify the key goals and objectives of a Data Access Layer.

While Martin Fowler didn't invent the pattern of layering in enterprise applications, his Patterns of Enterprise Application Architecture is a must read on the topic. Our goals for a layered design (which may often need to be traded off against each other) should include:

  • Changes to one part or layer of the system should have minimal impact on other layers of the system. This reduces the maintenance involved in unit testing, debugging, and fixing bugs and in general makes the architecture more flexible.
  • Separation of concerns between user interface, business logic, and persistence (typically in a database) also increases flexibility, maintainability and reusability.
  • Individual components should be cohesive and unrelated components should be loosely coupled. This should allow layers to be developed and maintained independently of each other using well-defined interfaces.

Now to be clear, I'm talking about a layer, not a tier. A tier is a node in a distributed system, which may include one or more layers. But when I refer to a layer, I'm referring only to the logical separation of code that serves a single concern such as data access. It may or may not be deployed into a separate tier from the other layers of a system. We could then begin to fly off on tangential discussions of distributed systems and service oriented architecture, but I will do my best to keep this discussion focused on the notion of a layer. There are several layered application architectures, but almost all of them in some way include the notion of a Data Access Layer (DAL). The design of the DAL will be influenced should the application architecture include the distribution of the DAL into a separate tier.

In addition to the goals of any layer mentioned above, there are some design elements specific to a Data Access Layer common to the many layered architectures:

  • A DAL in our software provides simplified access to data that is stored in some persisted fashion, typically a relational database. The DAL is utilized by other components of our software so those other areas of our software do not have to be overly concerned with the complexities of that data store.
  • In object or component oriented systems, the DAL typically will populate objects, converting rows and their columns/fields into objects and their properties/attributes. This allows the rest of the software to work with data in an abstraction that is most suitable to it.
  • A common purpose of the DAL is to provide a translation between the structure or schema of the store and the desired abstraction in our software. As is often the case, the schema of a relational database is optimized for performance and data integrity (e.g. 3rd normal form) but this structure does not always lend itself well to the conceptual view of the real world or the way a developer may want to work with the data in an application. A DAL should serve as a central place for mapping between these domains such as to increase the maintainability of the software and provide an isolation between changes in the storage schema and/or the domain of the application software. This may include the marshalling or coercing of differing data types between the store and the application software.
  • Another frequent purpose of the DAL is to provide independence between the application logic and the storage system itself such that if required, the storage engine itself could be switched with an alternative with minimal impact to the application layer. This is a common scenario for commercial software products that must work with different vendors' database engines (e.g. MS SQL Server, IBM DB2, Oracle, etc.). With this requirement, sometimes alternate DALs are created for each store that can be swapped out easily.  This is commonly referred to as Persistence Ignorance.

Getting a little more concrete, there are a host of other issues that also need to be considered in the implementation of a DAL:

  • How will database connections be handled? How will their lifetime be managed? A DAL will have to consider the security model. Will individual users connect to the database using their own credentials? This may be fine in a client-server architecture where the number of users is small. It may even be desirable in those situations where there is business logic and security enforced in the database itself through the use of stored procedures, triggers, etc. It may however run incongruent to the scalability requirements of a public facing web application with thousands of users. In these cases, a connection pool may be the desired approach.
  • How will database transactions be handled? Will there be explicit database transactions managed by the data access layer or will automatic or implied transaction management systems such as COM+ Automatic Transactions or the Distributed Transaction Coordinator be used?
  • How will concurrent access to data be managed? Most modern application architectures will rely on optimistic concurrency to improve scalability. Will it be the DAL's job to manage the original state of a row in this case? Can we take advantage of SQL Server's row version timestamp column or do we need to track every single column?
  • Will we be using dynamic SQL or stored procedures to communicate with our database?

As you can see, there is much to consider just in generic terms, well before we start looking at specific business scenarios and the wacky database schemas that are in the wild. All of these things can and should influence the design of your data access layer and the technology you use to implement it. In terms of .NET, the Entity Framework is just one data access technology. MS has been so kind to bless us with many others such as LINQ to SQL, DataReaders, DataAdapters & DataSets, and SQL XML. In addition, there are over 30 3rd party Object Relational Mapping tools available to choose from.

Ok, so if you're not familiar with the design goals of the Entity Framework (EF) you can read all about it here or watch a video interview on Channel 9, with Pablo Castro, Britt Johnson, and Michael Pizzo. A year after that interview, they did a follow up interview here.

In the next post, I'll explore the idea of the Entity Framework replacing my data access layer and evaluate how this choice rates against the various objectives above. I'll then continue to explore alternative implementations for a DAL using the Entity Framework.

Tech-Ed 2008 U.S. Developers Session Catalog Online

If you're thinking of heading down to Orlando June 3-6, 2008 - you can check out the list of sessions which has been recently posted here.

You also have the option of rating sessions that you might be interested in. I assume this is to help plan room sizes.

I'll be doing a breakout session called "Building Next Generation Data Access Layers with the ADO.NET Entity Framework" in the Architecture Track. I'm really looking forward to it. Hope to see you there.

Visual Studio 2008 and Windows Server 2008 for aspiring Architects

Are you an architect or an aspiring architect interested in learning what's new this year for the MS platform?

On February 11th, 2008 come to the MS Canada Office in Mississauga and visit yours truly and some other dazzling speakers to learn more about Visual Studio 2008 and Windows Server 2008.

You can choose either the morning or afternoon session as your schedule permits. Only a 1/2 day out of your busy schedule and you'll know everything you need! Ok, well maybe not everything, but I hope that you'll be inspired to take the next steps in learning about technologies such as LINQ, Windows Communication Foundation and Windows Presentation Foundation and how you can make the most of these technologies in your applications.

Register Here.