RSS Feeds and DataSets

So as I write this at 1:10 in the morning, I have just finished beating my head against a fairly hard wall.  The task was to load an RSS Feed into a DataSet.  Then once in the DataSet, the feed could be bound to a DataGrid, a DataRepeater, etc.  The plan was simple.  The strategy sound. Or so I thought.

"No plan survives contact with the enemy." -Field Marshal Helmuth von Moltke.

While I suspect that many of you are aware of the quote, its place in the process of strategic planning is frequently forgotten.  The quote is intended to remind strategists to set broad objectives and seize unexpected opportunities when they arise. Of course, in this instance, I'm using the more pessimistic "stuff happens" meaning that is more commonly ascribed to the quote.

So I naively start the process by creating a DataSet object and using the ReadXml method to build it.  Due to the foresight of the .NET Framework designers, I can simply pass the URL to the RSS Feed as a parameter.  Sweet.
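For reference, that first naive attempt can be sketched roughly as follows. This is only a minimal sketch, assuming C#; the class and method names are made up for illustration, and the feed location is whatever URL or file path you pass in.

```csharp
using System.Data;

public static class NaiveFeedLoader
{
    // Hypothetical helper: loads a feed straight into a DataSet.
    // feedLocation may be a URL or a local file path.
    public static DataSet Load(string feedLocation)
    {
        DataSet ds = new DataSet();
        // ReadXml(string) infers tables and columns from the XML it reads.
        ds.ReadXml(feedLocation);
        return ds;
    }
}
```

Calling NaiveFeedLoader.Load with the feed's URL is all there is to it, at least until the feed contains something that schema inference does not like.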

But then I run into a snag. Or, to put it another way, Moltke proved his prescience. Within ReadXml, a DuplicateNameException is thrown. The specific message was "A column named 'comments' already belongs to this DataTable." After an examination of the RSS feed, I discovered that the Slash RSS module uses a comments tag. In the RSS feed, it is properly namespaced, but ReadXml does not take the namespace into account when it names the columns.  So the comments tag in RSS clashes with the comments tag in Slash RSS.
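To make the clash concrete, here is a small sketch that feeds ReadXml a cut-down item containing both tags. The feed fragment, the class name and the method name are invented for illustration. Going by the behaviour described above, I would expect the load to fail with the DuplicateNameException, but I have only seen that on the framework version I was using, so treat it as an assumption.

```csharp
using System.Data;
using System.IO;

public static class DuplicateColumnDemo
{
    // Made-up feed fragment: <comments> and <slash:comments> share a local name.
    private const string Feed =
        "<rss version='2.0' xmlns:slash='http://purl.org/rss/1.0/modules/slash/'>" +
        "<channel><item>" +
        "<title>A post</title>" +
        "<comments>http://example.com/post/comments</comments>" +
        "<slash:comments>3</slash:comments>" +
        "</item></channel></rss>";

    // Returns the exception message if the load fails, or "loaded" otherwise.
    public static string TryLoad()
    {
        DataSet ds = new DataSet();
        try
        {
            ds.ReadXml(new StringReader(Feed));
            return "loaded";
        }
        catch (DuplicateNameException ex)
        {
            return ex.Message;
        }
    }
}
```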

The solution to this problem is not as clean as I would have liked.  In my ideal world, there would be a way to limit the namespaces that are loaded into the DataSet.  Perhaps using the XmlNamespaceManager, for instance.  But for all the Googling that I did, there doesn't appear to be any solution down this alley. So instead I turned to a kludgey (from my perspective) method that involves transforming the RSS feed using XSL.

My next problem immediately followed this solution.  When I processed the transformed RSS feed, another exception was thrown.  This time it was an ArgumentException that read "The same table (p) cannot be the child table in two nested relations." Back to the RSS feed we went.  What we saw was that if the post was XHTML compliant, then a separate <body> block was contained within the post.  In this <body> block was the post, complete with the various markup tags (like <p>) intact.  This differs from the normal format of the post, where the angle brackets surrounding each tag are converted to &lt; and &gt;. The result was that ReadXml was choking on the fact that <p> existed in two separate items.

Back to the XSL, where I excluded the <body> block from the RSS feed.  Now my XSL file looks like the following:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
     xmlns:xhtml="http://www.w3.org/1999/xhtml" >

    <xsl:template match="*|@*|comment()|text()">
        <xsl:copy>
            <xsl:apply-templates select="*|@*|comment()|text()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="slash:comments">
        <SlashComments>
            <xsl:apply-templates select="*|@*|comment()|text()"/>
        </SlashComments>
    </xsl:template>

    <xsl:template match="xhtml:body"></xsl:template>

</xsl:stylesheet>
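
For completeness, here is roughly how the stylesheet can be applied before the result is handed to ReadXml. It is a sketch under a few assumptions: it is written in C#, it uses XslCompiledTransform (which needs .NET 2.0 or later, so an older project would use XslTransform instead), and the class and method names are invented for illustration. The stylesheet path is wherever you saved the XSL above.

```csharp
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformedFeedLoader
{
    // Hypothetical helper: runs the feed through the stylesheet,
    // then loads the cleaned-up XML into a DataSet.
    public static DataSet Load(string feedLocation, string stylesheetPath)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath);

        using (XmlReader input = XmlReader.Create(feedLocation))
        using (StringWriter buffer = new StringWriter())
        {
            // Write the transformed feed into an in-memory buffer.
            xslt.Transform(input, null, buffer);

            DataSet ds = new DataSet();
            using (StringReader transformed = new StringReader(buffer.ToString()))
            {
                // Schema inference now sees SlashComments and no xhtml:body.
                ds.ReadXml(transformed);
            }
            return ds;
        }
    }
}
```

Buffering the whole feed in a string is fine for something the size of an RSS feed. For anything much larger, writing to a temporary file or a stream would be the safer choice.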

Mission accomplished.  I now have a DataSet that contains the necessary information from an RSS Feed.  Next up is to actually bind it to the desired control.  Haven't gotten around to it yet, but I'm hoping that it is much easier.  It certainly shouldn't be much more challenging.