Situation
You’re looking for a way to semi-automate the back-loading of scrubbed data from your production environment to your development/test/staging/whatever environments. There are several reasons you’d want to do this that I don’t need to get into here. The problem is that your production environment is well-used and full of a bunch of data, both personal and proprietary. This data must therefore be scrubbed to shrink its size and to remove anything that shouldn’t be in a non-production environment.
Solution
You need a site collection scrubber that will rip out and dump the data you don’t need or don’t want to keep. Fortunately, this step is relatively simple; there’s just a ton of looping. You need to loop through each item in each list/library in each web in the site collection and match either the file or the list item against a set of heuristics that helps identify what should stay and what should go. These heuristics could even change data that you want to keep but that contains personally identifiable information.
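For example, a field-level heuristic for anonymizing items you keep might look something like the sketch below. The ScrubEmailColumn helper and the "Email" column name are hypothetical and only illustrate the shape of the idea:

// Sketch of a field-level heuristic: overwrite a hypothetical "Email" column
// so items can be kept without carrying real personal data.
private static void ScrubEmailColumn(SPList list)
{
    if (!list.Fields.ContainsField("Email"))
    {
        return; // This list doesn't carry the column we're anonymizing
    }

    foreach (SPListItem item in list.Items)
    {
        if (item["Email"] != null)
        {
            item["Email"] = "user@example.com"; // Dummy value replaces the real one
            item.Update();
        }
    }
}

The same pattern works for any column: test for the field, swap in a harmless value, and call Update.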
To the code:
using System;
using System.Text.RegularExpressions;
using Microsoft.SharePoint;

public static class Program
{
    private static Regex exp;           // Matches target elements
    private static string siteName;     // URL of the site collection
    private static bool remove = false; // Whether to remove elements or not

    public static void Main(string[] args)
    {
        // One way to supply the settings: site URL, match pattern, optional remove flag
        siteName = args[0];
        exp = new Regex(args[1]);
        remove = args.Length > 2 && bool.Parse(args[2]);

        using (SPSite site = new SPSite(siteName))
        {
            foreach (SPList list in site.RootWeb.Lists)
            {
                TryProcessList(list);
            }

            try
            {
                foreach (SPWeb web in site.AllWebs)
                {
                    foreach (SPWeb subWeb in web.Webs)
                    {
                        ProcessSubWeb(subWeb);
                    }

                    foreach (SPList list in web.Lists)
                    {
                        TryProcessList(list);
                    }

                    try
                    {
                        site.RecycleBin.DeleteAll();
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine("Error emptying recycle bin: {0}", e.Message);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("Error processing subsite: {0}", e.Message);
            }

            try
            {
                site.RecycleBin.DeleteAll();
            }
            catch (Exception e)
            {
                Console.WriteLine("Error emptying recycle bin: {0}", e.Message);
            }
        }
        Console.WriteLine("Finished processing site: {0}", siteName);
        Console.Read();
    }

    private static void TryProcessList(SPList list)
    {
        try
        {
            SPDocumentLibrary library = (SPDocumentLibrary)list;

            try
            {
                foreach (SPCheckedOutFile file in library.CheckedOutFiles)
                {
                    TryTakeOverFile(file);
                }
            }
            catch (SPException e)
            {
                Console.WriteLine("Could not take over file: {0}", e.Message);
            }

            foreach (SPListItem item in list.Items)
            {
                if (item.File.CheckOutType != SPCheckOutType.None)
                {
                    TryUndoCheckOut(item);
                }
                if (remove && exp.IsMatch(item.File.Name))
                {
                    TryDeleteFile(item);
                }
            }
        }
        catch (InvalidCastException)
        {
            // Not a document library; nothing to scrub here
        }
    }

    private static void TryDeleteFile(SPListItem item)
    {
        try
        {
            item.File.Delete();
        }
        catch (SPException e)
        {
            Console.WriteLine("Could not delete file {0}: {1}", item.File.Name, e.Message);
        }
    }

    private static void ProcessSubWeb(SPWeb subWeb)
    {
        foreach (SPList list in subWeb.Lists)
        {
            TryProcessList(list);
        }
    }

    private static void TryTakeOverFile(SPCheckedOutFile file)
    {
        try
        {
            file.TakeOverCheckOut();
        }
        catch (SPException)
        {
            Console.WriteLine("Could not take over file: {0}", file.LeafName);
        }
    }

    private static void TryUndoCheckOut(SPListItem item)
    {
        try
        {
            item.File.UndoCheckOut();
            item.File.Update();
        }
        catch (SPException)
        {
            item.File.CheckIn(string.Empty);
            Console.WriteLine("Could not undo checkout for file: {0}", item.Title);
        }
    }
}
As you can see, although the code is relatively long, it’s not very difficult. Your heuristics can be as simple as a regular expression match against a file’s extension (as above) or as complex as you’d like to make them. The only downside of looping through a large site collection is that the scrubbing can take a significant amount of time to complete, especially if the recycle bin is in place. I recommend running the scrubber without the recycle bin in place, since the scrubber empties the recycle bin anyway. It does this because the goal is to shrink the amount of data you’re moving into (or keeping in) what is likely an environment with limited storage.
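If an extension match isn’t rich enough, the heuristic can be any predicate over the file. Here’s a minimal sketch of a composite check that also considers file size and age; the thresholds are made-up examples, and you would call it in place of the exp.IsMatch test inside TryProcessList:

// Sketch of a richer heuristic: flag a file for removal if its name matches the
// pattern, if it is larger than a size threshold, or if it hasn't changed in a year.
// The 10 MB and one-year thresholds are arbitrary examples, not recommendations.
private static bool ShouldRemove(SPFile file)
{
    const long maxBytes = 10L * 1024 * 1024;
    return exp.IsMatch(file.Name)
        || file.Length > maxBytes
        || file.TimeLastModified < DateTime.UtcNow.AddYears(-1);
}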
Keep in mind that the scrubber is not perfect and cannot completely replace a human looking through the data. Using the scrubber as a blunt instrument to whack a good deal of data out of the site collection, and then cleaning up the last few bits by hand, is a great plan and can save you a considerable amount of time. The end result is a site collection that contains your site structure and branding but none of the files you don’t need. This creates a great testing environment, and the refresh can be repeated much more often, making you, your environments, and your change management board happy.