thoughts on coding and everything in between
Tag Archives: xml
I received a request a few weeks ago for some data that we only have stored in an XML field in one of our tables. The reason for storing the data as XML is not as important as how we were going to go about digging through the data to provide a tabular representation of the relevant pieces to satisfy the request. It took quite a while for me to catch on and get the data I needed that was buried several levels deep in the XML which spurred me determinedly to have a better understanding of how to accomplish similar requests in the future, hence this post.
The first thing I needed to do was get a handle on the XML itself. As with any complex task, it is usually best to first break it down into the smallest piece consumable and understand that first so that you have a solid starting point. So the first thing I did was to grab just one row’s XML string stored in my table and play with that first. I set it equal to an XML variable so that I could easily query the nodes in it and then just wrote a simple query that selected one value from the XML text.
One very important thing to note is that in order to get at the Customer’s Id buried several nodes deep, my “table” being queried has to include the nodes leading up to the Customer node. In other words, I couldn’t just assume that the query language understands that “CommissionedFor” is my root node – I have to tell it that. What results do you get when you leave out the root node?
Unfortunately you don’t get a lot of help from the sql compiler to figure out what could be causing this utter lack of data so the most important thing I gleaned from playing around with XQuery is to know the XML you are querying! The more child nodes that are present the more complex the query could potentially become and the greater the need will be to know where the root is and where each sub-node begins & ends.
Another reason that its good to have solid knowledge of the XML you are querying is that you need to know if you have any XML namespaces defined because if you do, that can potentially change the entire way your query works and again you might be left with a big fat NULL if you don’t have your namespaces identified correctly. The XML I was attempting to query actually had two namespaces: a default namespace created when the object was serialized and our own namespace. Working with one namespace is tricky enough, let alone multiple namespaces (http://beyondrelational.com/modules/2/blogs/28/posts/10318/xquery-lab-37-working-with-multiple-namespaces.aspx)
After much searching and finally figuring out I had to mess with namespaces my simple query got a little more complex:
I finally got my “simple” query working and now it was time to plug in the actual table where this information was stored so that I could aggregate the XML data across a range:
There were a few “gotchas” here for me:
- Of course the fact that there were two namespaces as part of my root node from the get-go. Everything in the query is important, so be purposeful about which namespace you use as the DEFAULT (if any). I actually found having a default namespace to be easier for my purposes than aliasing each namespace.
- There is a semicolon at the end of the declared XML string! Without the semicolon in this particular case I get the error message “Incorrect syntax near the keyword ‘with’. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.“
- In order to create a unique row for each Id I had to learn the use of CROSS APPLY which at the most basic level is just joining the “columns” in your XML to columns in your table. The equivalent in TSQL is INNER JOIN. The equivalent to a TSQL LEFT JOIN is OUTER APPLY.
The title of this blog post indicated a “beginning” and I think I have dived deeper than just a beginning, but hopefully I have kept things relatively simple enough that some of the key needs for querying XML have been demonstrated without being too overwhelming.