Floating down the FILESTREAM...

The old CCR lyrics popped into my head "Rollin' down the river..." and then of course I realized a river isn't a stream, nor is it any way related to data streams, and I lost my concentration for 20 minutes-ish.  What does this have to do with FILESTREAMS?  Nothing really.
 
Let's talk a little about Filestreams in SQL Server 2008.  So, what is FILESTREAM?  According to Microsoft, FILESTREAM is a technology "which allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system".  Thats all fine and dandy, but what does it really mean?  FILESTREAM, very eloquently put, stores large BLOBs of data on the file system.
 
In previous versions of SQL Server, you didn't have all that much choice when it came to working with large BLOB data.  Your options were three-fold.
 
1. Store it in the database.
2. Store it on the file system.
3. Store it in a BLOB store.
 
Storing the data in the database was tricky.  If the data was small in size, it worked out fairly well.  But in instances where you had to store large pieces of data, such as videos, or audio files, keeping the data within the database was a major blow for performance. 
 
So what is the next best option?  Store the data on the file system, and leave a pointer in the database.  You didn't have performance issues, and data was streamed directly from the file system.  However, you did have problems with data consistency.  You didn't benefit from the transactional properties of SQL Server.  Backup's were a pain, and managing the streaming was tricky from a programmatic perspective.
 
Next up, and the final option was a BLOB store.  Basically, it works much like the filesystem approach, except it was more manageable, and had transactional characteristics due to a tie in with the database.  An example is the EMC Centera.  Drawback: Fricken EXPENSIVE.  Worked great though... 
 
 Now onto SQL Server 2008.  FILESTREAM works very much like option number 2, has transactional benefits like option number 1, and is cheaper than option number 3.  How does it do it?  It's actually pretty simple.  A column of VARBINARY(MAX) is created in a table, with an AS FILESTREAM attribute.  Once data is stored within the column, a piece of metadata tells SQL Server to write the file to disk, and stick a pointer in the column.
 
So far that takes care of replacing option number 2.  So how is transactional consistency dealt with?  Easy.  NTFS as a file system is transactional by nature, so FILESTREAM takes full advantage.  Backing up is dealt with the same way as all databases are, and is done transparently.  That takes care of option number 1.
 
Now comes the issue of price.  Well, it turns out FILESTREAM is available for free.  It is built in, starting with SQL Express.  That's all I have to say about that, which takes care of option number 3.
 
That in a nut shell is SQL Server FILESTREAM.  My next post will be on how to actually implement FILESTREAM on a 2008 database.