In my previous Kapow migration post, I gave an overview of the tool. In this post, I’ll give a short technical explanation of the migration process I developed. Keep in mind that my upload target was Sitecore, so some of my setup was Sitecore-specific.
The first step is to inventory every page on the current site. For our site, the pages were grouped by Sitecore template, and the URL of each page was loaded into a spreadsheet for that template. I ended up with 8 spreadsheets with names like “FAQ”, “Video”, and “LandingPage”, matching similarly named Sitecore templates. My spreadsheets had the following layout:
- First column: the page’s URL on the current site. Kapow reads this URL to load the page I wanted to extract from, since it extracts data by crawling HTML
- Second column: the new site’s URL so I knew where to load the finished data
- Third column: the name of the left menu for the new page, so that I could attach the correct menu to each page
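To make the layout concrete, here is a minimal Python sketch of the same three-column inventory, assuming the spreadsheet has been exported as CSV with no header row. The `PageRow` class, the `read_inventory` function, and the sample URLs and menu name are all hypothetical and for illustration only. Kapow reads the spreadsheet itself.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class PageRow:
    source_url: str  # first column: page URL on the current site
    target_url: str  # second column: URL on the new site
    left_menu: str   # third column: left menu for the new page


def read_inventory(csv_text: str) -> List[PageRow]:
    """Parse inventory rows exported as CSV (assumes no header row)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [PageRow(*(cell.strip() for cell in row)) for row in reader if row]


# Hypothetical sample rows for the "FAQ" spreadsheet.
sample = (
    "http://old.example.com/faq/billing,/newsite/faq/billing,SupportMenu\n"
    "http://old.example.com/faq/shipping,/newsite/faq/shipping,SupportMenu\n"
)

rows = read_inventory(sample)
for row in rows:
    print(row.source_url, "->", row.target_url, "[" + row.left_menu + "]")
```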
Second comes the data extraction. The first step here is setting up the data structure. In my opinion, this is very much like designing database tables for any project. Recurring elements are grouped into their own tables. The tables in Kapow are called types. This is one of my types, a recurring element that had an image, text and a URL:
A few things that I think are key to setting up usable types:
- Certain common fields are indispensable. All my types had a SourceURL (the old page), and a TargetURL (the new page). This helped me trace back to the old page when I was verifying data or troubleshooting.
- Another common field was a MigrationStatus. This helped me track where I was in the process – extraction, transformation, or upload.
- It helps to plan ahead. I added the ItemGuid to this type even though it had nothing to do with migration until the very last step. This field holds the unique Sitecore identifier for a piece of content, and it wasn’t populated until the clean data was loaded into Sitecore in the upload step. This proved invaluable in troubleshooting.
- For any field that needs to be cleaned up, I included two fields – the original field and the transformed field. (Note the ChicletText and ChicletTextTransformed fields, above.) This allowed me to check a transformed field against its source. It also allowed me to rerun just my transformation process, since I still had the original field from the extraction.
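For readers who think in code, the points above can be sketched as a Python dataclass. This is only an analogy for a Kapow type, not how Kapow defines one. SourceURL, TargetURL, MigrationStatus, ItemGuid, ChicletText, and ChicletTextTransformed come from the type described above. The class name `Chiclet`, the `ImageUrl` and `LinkUrl` fields, the status values, and the `retransform` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chiclet:  # hypothetical name for the image/text/URL type
    SourceURL: str                        # old page, used for tracing back
    TargetURL: str                        # new page
    MigrationStatus: str = "extraction"   # assumed values: extraction, transformation, upload
    ImageUrl: str = ""                    # hypothetical field name
    LinkUrl: str = ""                     # hypothetical field name
    ChicletText: str = ""                 # original text, exactly as extracted
    ChicletTextTransformed: str = ""      # cleaned-up text
    ItemGuid: Optional[str] = None        # stays empty until the upload step


def retransform(record: Chiclet, transform: Callable[[str], str]) -> Chiclet:
    """Rerun only the transformation, starting from the preserved original field."""
    record.ChicletTextTransformed = transform(record.ChicletText)
    record.MigrationStatus = "transformation"
    return record


record = Chiclet(
    SourceURL="http://old.example.com/landing",
    TargetURL="/newsite/landing",
    ChicletText="  Read our FAQ  ",
)
retransform(record, str.strip)
print(repr(record.ChicletText), "->", repr(record.ChicletTextTransformed))
```

Because the original field is never overwritten, the transformation can be rerun as many times as needed without repeating the extraction.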
One difficult issue will be familiar to anyone who has designed a database – which data to break out into separate tables and how to link them. Because Kapow doesn’t give direct control over SQL Server updates, I found this trickier than usual. Once again, an example:
Many different types of pages had image banners at the top. It made sense to have a PageBanner type. But I needed a way to link that PageBanner record back to its parent page. Kapow does have the concept of a foreign key, but because of the complexity of our data, I opted to use the SourceURL to link parent and children. This worked well and I would do it that way again. Kapow also provides an Iterator as a built-in variable type for a loop command, so if there were multiple children that had to be placed in order (think of a slideshow, for instance), using this Iterator to sequence the data worked well.
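A rough Python sketch of that linking approach follows, assuming each child record carries its parent's SourceURL plus a sequence number taken from the loop iterator. The `Slide` class, the `Sequence` and `ImageUrl` field names, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slide:
    SourceURL: str  # URL of the parent page, used in place of a foreign key
    Sequence: int   # loop iteration number, preserves display order
    ImageUrl: str   # hypothetical field name


def group_children(children: List[Slide]) -> Dict[str, List[Slide]]:
    """Group child records by parent SourceURL and order them by Sequence."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child.SourceURL].append(child)
    for siblings in grouped.values():
        siblings.sort(key=lambda child: child.Sequence)
    return dict(grouped)


# Hypothetical records, deliberately stored out of order.
slides = [
    Slide("http://old.example.com/home", 2, "/img/b.jpg"),
    Slide("http://old.example.com/about", 1, "/img/c.jpg"),
    Slide("http://old.example.com/home", 1, "/img/a.jpg"),
]

by_page = group_children(slides)
for slide in by_page["http://old.example.com/home"]:
    print(slide.Sequence, slide.ImageUrl)
```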
Once the data structure is defined, the robot-making process can begin. A robot is a series of actions chained together. At its simplest, it looks like this:
- Do a Load page action. Read the URL from the spreadsheet and load the associated page into Kapow’s built-in browser.
- Perform an extract action. Determine what you need to extract. If it’s a single field, like text on a page, right-click it and load it directly into the type you’ve defined. If you need to loop through data (FAQs, for example), you can use one of Kapow’s looping constructs. In all cases, you need to determine how Kapow will find your data reliably: by a named div, a unique CSS class, or position in a table.
- Do a Store in Database action. Kapow uses the type to create a table if necessary, then stores a record in the database.
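The three steps above map onto a simple load, extract, store loop. The Python sketch below is an analogy only; it is not Kapow's interface. It makes several assumptions: pages come from an in-memory dictionary rather than a real browser, the extracted field is the first `<h1>` on the page, and SQLite stands in for the project database. The `Page` table, its `Title` column, and the sample pages are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside the first <h1> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_h1:
            self.in_h1 = False
            self.done = True

    def handle_data(self, data):
        if self.in_h1:
            self.title += data


# Hypothetical pages standing in for the old site.
PAGES = {
    "http://old.example.com/faq/billing": "<html><body><h1>Billing FAQ</h1></body></html>",
    "http://old.example.com/faq/shipping": "<html><body><h1> Shipping FAQ </h1></body></html>",
}


def run_robot(rows, load_page, conn):
    """For each spreadsheet row: load the page, extract a field, store a record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Page "
        "(SourceURL TEXT, TargetURL TEXT, Title TEXT, MigrationStatus TEXT)"
    )
    for source_url, target_url in rows:
        html = load_page(source_url)              # load page
        parser = TitleExtractor()
        parser.feed(html)                         # extract
        conn.execute(                             # store in database
            "INSERT INTO Page VALUES (?, ?, ?, ?)",
            (source_url, target_url, parser.title.strip(), "extraction"),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
inventory = [
    ("http://old.example.com/faq/billing", "/newsite/faq/billing"),
    ("http://old.example.com/faq/shipping", "/newsite/faq/shipping"),
]
run_robot(inventory, PAGES.__getitem__, conn)
stored = conn.execute("SELECT Title, MigrationStatus FROM Page ORDER BY rowid").fetchall()
print(stored)
```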
Run the robot and Kapow loops through all the rows in the spreadsheet, extracting data and storing it in the database. The above example is simplistic, of course. Kapow has many actions, from assigning a variable to performing a test to storing in a database. Here is a partial list:
For anyone familiar with the basics of programming, configuring an action is really a matter of figuring out Kapow’s method of using a given construct. Following is an example of an if/else action, which won’t seem foreign at all to a developer:
The third step is Transformation – cleanup of the data. There are really two parts to this process: figuring out what transformations need to occur and then implementing those transformations.
Even the most careful analysis does not always uncover every transformation that must occur. For example, suppose we know about the following two path transformations:
/sites/oldsite/BannerImageRotatorImageLibrary to /newsite/Images/Banners
/sites/oldsite/FileLibrary to /newsite/Files
We build our transformations. Now our analyst comes along and apologetically explains that one more transformation has been discovered in some dark corner of the site. Sigh. We have to go back and change all of our Transformation robots to include the new cleanup item. But there is a solution to this problem: the snippet. A snippet is a set of steps that can be set up once and reused in other robots. If all of my path transformations live in a snippet, adding one more to the snippet updates every robot that uses it. The snippet is highlighted in the image below.
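In code terms, a snippet behaves roughly like a shared function that every transformation robot calls. The Python sketch below illustrates the idea with the two path transformations listed above. The names `PATH_TRANSFORMATIONS` and `transform_paths` and the sample file name `hero.jpg` are hypothetical, and Kapow snippets are configured in its designer rather than written this way.

```python
# Single shared list of (old path, new path) pairs.
# Adding a newly discovered transformation here updates every caller.
PATH_TRANSFORMATIONS = [
    ("/sites/oldsite/BannerImageRotatorImageLibrary", "/newsite/Images/Banners"),
    ("/sites/oldsite/FileLibrary", "/newsite/Files"),
]


def transform_paths(text: str) -> str:
    """Apply every known path transformation to a piece of extracted content."""
    for old_path, new_path in PATH_TRANSFORMATIONS:
        text = text.replace(old_path, new_path)
    return text


banner = '<img src="/sites/oldsite/BannerImageRotatorImageLibrary/hero.jpg">'
print(transform_paths(banner))
```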
Kapow makes the process of transformation easier through the use of regular expressions. My personal favorite, however, is the Data Converter. This allows chaining of commands, passing in the output of one command as the input of the next command. A simple example follows:
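Outside Kapow, the same chaining idea can be sketched in a few lines of Python. This is an analogy only and does not reproduce the Data Converter's own configuration. The three cleanup steps and the `chain` helper are hypothetical examples of commands whose output feeds the next command.

```python
import re
from functools import reduce


def strip_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text)


def trim(text: str) -> str:
    """Drop leading and trailing whitespace."""
    return text.strip()


def chain(*steps):
    """Build one converter that feeds each step's output into the next step."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


clean = chain(strip_tags, collapse_whitespace, trim)
result = clean("<p>  Hello,\n   <b>world</b> </p>")
print(result)
```

Keeping each step small makes it easy to reorder the steps or add a new one without touching the others.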
When we’re finished, we have clean extracted data in database tables. The only thing left to do is upload it to our new Sitecore site. Stay tuned.