Monday, March 17, 2008

Works for Me: Purging abandoned files in SharePoint 2007

Hah.

So, regarding the problems I was facing in the previous post, I think I have a solution, and all I had to do was slack off over the weekend playing Super Smash Bros. Brawl, which, I was amazed and delighted to discover, includes a fairly extensive platformer mode with co-op multiplayer and some astoundingly good (and really dang funny) cutscenes.

But you're probably here for the solution. Okay, so where we started was "open the site, get the web, get the document library from the web by name as a list; the files are the items in that list," and that wasn't working for items that had never been checked in and didn't belong to the person running the process. Well, it turns out there's an SPDocumentLibrary type with a CheckedOutFiles property that returns an IList<SPCheckedOutFile>, and the files you might be looking for show up in there. That could look something like this:

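// needs: using System.Collections.Generic; using Microsoft.SharePoint;
SPDocumentLibrary srcLibrary = (SPDocumentLibrary)srcList;
// CheckedOutFiles is declared as IList<SPCheckedOutFile>; "as List<SPCheckedOutFile>"
// would quietly give you null if the runtime type isn't a List<T>, so keep the interface.
IList<SPCheckedOutFile> suspectFiles = srcLibrary.CheckedOutFiles;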


Now, you might be a bit confounded by srcList being cast to an SPDocumentLibrary, and so might your runtime, so checking compatibility (ahead of time!) is probably a good idea. That check looks like this:

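SPList srcList = site.Lists[_listName]; // "site" is the SPWeb that owns the list; Lists lives on SPWeb
if (srcList.BaseType != SPBaseType.DocumentLibrary)
    throw new InvalidOperationException(_listName + " is not a document library.");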


So now we've got our document library and our list of checked-out files, which includes all of the files our careless users have abandoned and also all of the files our careful users are editing at the moment. To make sure we purge only the right files, we're going to compare those checked-out files against the files we can see in srcList.Items. It's not wildly performant, but we can toss the names of all the valid files into a list declared like so:

List<string> validItemNames = new List<string>();

They'll all be there when we need them. No, it's not great; it's what I did while sitting in a meeting this morning. That said, the performance isn't going to be awful. Each lookup against the list scans the whole thing, but we only make one lookup per checked-out file, and the checked-out files (the abandoned ones especially, which had better darned well be the overwhelming minority) should be few compared to the valid files (the overwhelming majority) that we load into the list once.

Now I have to admit that our implementation made my job easy: we can't have two files with the same name, so I'm able to compare FileLeafRef (from the Items) to LeafName (from the SPCheckedOutFile) to determine whether the file has ever been checked in. Remember, if it's not in the Items but it is checked out, then it has never been checked in and we can delete it. I think. That said, there are ID fields on both the Items and the SPCheckedOutFile, and I suspect they would also work for this comparison.
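Stripped of the SharePoint object model, that comparison is just a set-membership test. Here's a minimal sketch of it, not the exact code I'm running: it assumes the FileLeafRef values from srcList.Items and the LeafName values from CheckedOutFiles have already been pulled out as plain strings, and it swaps the List<string> for a HashSet<string> so each lookup is a hash probe instead of a scan (an assumption on my part, not something I benchmarked). It also assumes names should match case-insensitively. AbandonedFileFinder and FindNeverCheckedIn are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: models the name comparison with plain strings
// instead of SPListItem / SPCheckedOutFile objects.
public static class AbandonedFileFinder
{
    // validItemNames:  FileLeafRef values from srcList.Items (files somebody can see).
    // checkedOutNames: LeafName values from srcLibrary.CheckedOutFiles.
    // Returns the checked-out names that have no visible item, i.e. files
    // that (by the reasoning in this post) have never been checked in.
    public static List<string> FindNeverCheckedIn(
        IEnumerable<string> validItemNames,
        IEnumerable<string> checkedOutNames)
    {
        // Assumption: names are unique in the library and compare
        // case-insensitively, so a name alone identifies a file.
        var valid = new HashSet<string>(validItemNames, StringComparer.OrdinalIgnoreCase);

        // Keep only the checked-out files nobody else can see.
        return checkedOutNames.Where(name => !valid.Contains(name)).ToList();
    }
}
```

For example, with valid names "Budget.xlsx" and "Plan.docx" and checked-out names "plan.docx" and "Draft.docx", only "Draft.docx" comes back.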

There's one other bit of performance we can squeeze out of that list of strings, and that's to check our purge date threshold (against our SPCheckedOutFile's TimeLastModified) before we go and (or rather, &&) check whether the file has a check-in history visible to the world. So that'll help a bit too. And we could skip the whole thing if the library's CheckedOutFiles returned null or zero results. So there are lots of minor performance enhancers here that require no algorithmic effort, just an appropriate order of operations.
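Here's a rough sketch of that order of operations, again with plain types standing in for SharePoint. CheckedOutFileInfo (modeling only LeafName and TimeLastModified), PurgeFilter, SelectPurgeable, the loadValidItemNames callback and the cutoff parameter are all hypothetical. Passing the name lookup in as a callback is just one way to show the early exit: when nothing is checked out, the walk over srcList.Items never happens at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the two SPCheckedOutFile properties this post uses.
public sealed record CheckedOutFileInfo(string LeafName, DateTime TimeLastModified);

public static class PurgeFilter
{
    // checkedOutFiles:     what CheckedOutFiles returned (may be null or empty).
    // loadValidItemNames:  builds the set of visible names; only called if needed.
    // cutoff:              files last modified before this are old enough to purge.
    public static List<CheckedOutFileInfo> SelectPurgeable(
        IList<CheckedOutFileInfo> checkedOutFiles,
        Func<HashSet<string>> loadValidItemNames,
        DateTime cutoff)
    {
        // Cheapest check first: nothing checked out means nothing to do,
        // and the (expensive) walk over the list items is skipped entirely.
        if (checkedOutFiles == null || checkedOutFiles.Count == 0)
        {
            return new List<CheckedOutFileInfo>();
        }

        HashSet<string> validNames = loadValidItemNames();

        // The date test comes first; && short-circuits, so the name lookup
        // only runs for files that are already past the threshold.
        return checkedOutFiles
            .Where(f => f.TimeLastModified < cutoff && !validNames.Contains(f.LeafName))
            .ToList();
    }
}
```

A file that is old enough but still shows up in the visible names is left alone, as is a recently touched file regardless of visibility.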

The one bizarre behavior of this comparison is that the Items collection will return any content the current user has created, even content that has never been checked in. That's how we missed, in the first place, that it wouldn't pick up other people's never-checked-in files. The net result is that if you run this kind of tool as an actual user, that user won't be able to purge their own abandoned files, because those files will show up on the valid file list. Whee.

That's how I spent my Monday, how was yours?

Update: As for how I spent my Tuesday (apart from a very long meeting that kept looking ahead to the work we won't get done instead of addressing the work we should be doing), I discovered that the process described above tends to fail. It fails because the SPCheckedOutFile's Delete() method apparently always throws when it's called by anybody other than the file's owner. Even a site administrator who can take the file over in the blink of an eye. Yeah. They get a big fat exception, too. So if you're going to delete somebody else's checked-out file (one with no check-in history at all), you need to take it over first, like this:

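myFile.TakeOverCheckOut(); // the account running the purge becomes the checkout owner
myFile.Delete();           // now allowed, since we own the checkout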
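To make the ordering concrete without a SharePoint farm, here's a sketch that models only those two calls behind an interface. ICheckedOutFile, Purger, PurgeAll and the FakeCheckedOutFile test double are all hypothetical; the fake just mimics the behavior described above (Delete() throws unless you own the checkout), and the exception type it throws is an assumption. Against the real object model you'd call TakeOverCheckOut() and Delete() on each SPCheckedOutFile directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over SPCheckedOutFile; only the two calls used here.
public interface ICheckedOutFile
{
    string LeafName { get; }
    void TakeOverCheckOut();
    void Delete();
}

public static class Purger
{
    // Takes over each checkout before deleting, since Delete() fails for
    // anyone who isn't the checkout owner. Returns names that couldn't be purged.
    public static List<string> PurgeAll(IEnumerable<ICheckedOutFile> files)
    {
        var failures = new List<string>();
        foreach (var file in files)
        {
            try
            {
                file.TakeOverCheckOut(); // become the owner first...
                file.Delete();           // ...then the delete is permitted
            }
            catch (Exception)
            {
                // Design choice (assumption): record the failure and keep going
                // rather than abort the whole purge on one bad file.
                failures.Add(file.LeafName);
            }
        }
        return failures;
    }
}

// Toy in-memory double: Delete() throws unless the current user owns the checkout.
// InvalidOperationException is an assumed stand-in for whatever SharePoint throws.
public sealed class FakeCheckedOutFile : ICheckedOutFile
{
    private readonly string _currentUser;

    public FakeCheckedOutFile(string leafName, string owner, string currentUser)
    {
        LeafName = leafName;
        Owner = owner;
        _currentUser = currentUser;
    }

    public string LeafName { get; }
    public string Owner { get; private set; }
    public bool Deleted { get; private set; }

    public void TakeOverCheckOut() => Owner = _currentUser;

    public void Delete()
    {
        if (Owner != _currentUser)
            throw new InvalidOperationException("Only the checkout owner can delete this file.");
        Deleted = true;
    }
}
```

Calling Delete() on a fake owned by someone else throws, while PurgeAll on the same kind of file succeeds because it takes the checkout over first.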
