Thursday, August 21, 2008

Breaking SharePoint: Lists.GetListItems via XML Corruption

In an inconveniently timed follow-up to a production release, all of our XML-driven processes, from the RSS builder to a search-appliance index file, started throwing errors. Thankfully, the index file was generated by a batch job, which logged an error to the effect of "hexadecimal value 0x0B, is an invalid character" coming from, astoundingly, a call to the Lists web service (specifically GetListItems, which returns an XmlNode).

I am utterly amazed to report that it wasn't lying.

What happened was that a user uploaded a document with a "vertical tab" (ASCII 11, or 0x0B) in its title, and every request we made of MOSS put that vertical tab straight into the XML, which System.Xml refuses to parse. I can't really blame System.Xml on that count: 0x0B is not a legal character in XML 1.0 at all, not even as an escaped character reference. But I have no qualms about blaming SharePoint. It accepts the metadata, ignores the known XML-invalid characters, saves them to a database that we really just need to stay the heck away from, and then kills every XML-based request (including its own) for that data. Usually it does so without a clear reason, because data is requested in batches, and a single corrupt property means this kind of obtuse error is the only thing the Black Box spits out of its foul heart.
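
To reproduce the failure outside SharePoint, here is a minimal sketch in Python (not the .NET stack we were actually running) that hands a parser an XML fragment with a raw vertical tab in an attribute, roughly the way a list item's title comes back from GetListItems. The `row` element and `ows_Title` attribute are simplified stand-ins, not the exact shape of the web service response, and Python's error text will differ from System.Xml's.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text: str) -> str:
    """Return 'ok' if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as err:
        return f"error: {err}"


clean_title = "Quarterly Report"
bad_title = "Quarterly\x0bReport"  # vertical tab: ASCII 11 / 0x0B

# Simplified stand-in for one row of list data (hypothetical shape).
print(try_parse(f"<row ows_Title='{clean_title}'/>"))  # ok
print(try_parse(f"<row ows_Title='{bad_title}'/>"))    # error: not well-formed ...
```

One bad character in one attribute is enough to make the whole document unparseable, which is why a batch request fails as a unit.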

So there are two lessons here. First, don't trust SharePoint to validate your users' data. And therefore, second, don't discount the possibility that SharePoint functions will spontaneously choke and die on user-input data.
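
Since SharePoint won't do the validation, it has to happen on our side before anything reaches the list. A minimal sketch, assuming you control the code path that writes the metadata: check each value against the XML 1.0 `Char` production (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF), then either reject the value or strip what falls outside it. The function names are hypothetical.

```python
import re

# Matches anything NOT in the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, hex code) for every character XML 1.0 cannot carry."""
    return [(m.start(), f"0x{ord(m.group()):02X}") for m in _XML_INVALID.finditer(text)]


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every XML 1.0-invalid character with `replacement`."""
    return _XML_INVALID.sub(replacement, text)


title = "Quarterly\x0bReport"
print(find_invalid_xml_chars(title))        # [(9, '0x0B')]
print(strip_invalid_xml_chars(title, " "))  # Quarterly Report
```

Rejecting is probably the better default at upload time, since silently rewriting a title can surprise the user; stripping (or substituting a space) is the fallback for cleaning data that is already sitting in the list.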

I honestly cannot believe how much money we've effectively burned using SharePoint simply because it is a vendor-supplied platform, instead of just having a couple of senior devigners build something from the ground up. (But I freely admit that this follows on the heels of searches for something like "IT+Fortune500" not returning the expected data. The search engine treated + as a word breaker and then treated IT as "it", a noise word to be discarded, so the search was really just looking for Fortune500 and returning things like "Fortune500 Rainbows" and "Bunnies and Fortune500 Flowers". More importantly, that is working as intended by the vendor.)
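
The search failure is easier to see as code. This is a toy sketch in Python, not the vendor's actual word breaker: it assumes every non-alphanumeric character (including +) breaks words, case-folds everything, and drops words found in a small, hypothetical noise-word list.

```python
import re

# Hypothetical noise-word list; real search engines ship their own, much longer ones.
NOISE_WORDS = {"a", "an", "and", "it", "of", "the"}


def tokenize_query(query: str) -> list[str]:
    # Every run of non-alphanumeric characters (including "+") is a word break.
    words = re.split(r"[^0-9A-Za-z]+", query)
    # Case-fold, then discard empty strings and noise words.
    return [w.lower() for w in words if w and w.lower() not in NOISE_WORDS]


print(tokenize_query("IT+Fortune500"))  # ['fortune500']
```

Once "IT" has been folded to "it" and discarded, nothing distinguishes the query from a plain search for Fortune500.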

1 comment:

Think's World said...

We stumbled upon the same problem in our testing and live environments. In our case, though, the special character was a soft return, and it was not in the document's title but in a part of the content that appeared in the summary of the search result.