In two previous posts, I gave a Kapow overview and an outline of the extraction and transformation process. This article will cover the upload of migrated content into Sitecore.
Once data is extracted and transformed, the clean data sits in database tables, ready to be uploaded into Sitecore. Sitecore has an Item Web API available for uploading data, but it is limited to basic retrieval, creation, and update operations. How was I going to tie related records together? How could I perform the basic if/else logic that the migration required? It was obvious almost immediately that the Item Web API would not be adequate.
Because I had so much system-specific processing to do, I decided to write my own upload process using the Sitecore API. If this were an ongoing process, it would have been necessary to build a more automated, flexible way to upload this data, but because it was a once-and-done operation and time was short, the solution outlined below was sufficient. I wrote a rather ugly web page that allowed me to click through the upload process quite quickly.
A word about the Sitecore architecture of the site: it is built with a library of “widgets” (banners, FAQ lists, etc.) in addition to the page templates. It also has a separate navigation section. Thus the building-block widgets must exist before the pages that contain them are created. Of course, parent pages must be created before child pages can be inserted. And links cannot be created before the link target is present. Obviously, the order of the entire process was important.
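To make these ordering constraints concrete, here is a small Python sketch that treats them as a dependency graph and derives a valid creation order with a topological sort. This is only an illustration, not what the migration actually ran: the item names are hypothetical, and the real upload used a fixed sequence of steps rather than a general-purpose sort.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical items for illustration. Each key maps to the set of items
# that must already exist in Sitecore before the key can be created.
dependencies = {
    "widget:faq-list": set(),
    "page:/parent": {"widget:faq-list"},          # widgets before the pages that hold them
    "page:/parent/child": {"page:/parent"},       # parents before children
    "link:menu-to-child": {"page:/parent/child"}, # link targets before links
}

# static_order() yields every item after all of its prerequisites.
creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)
```

With only a handful of item types, the same result can be reached by simply running the inserts in a fixed sequence.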
The following steps were completed in order:
- Images and files were inserted into the Media Library. The Kapow extraction process for files and images allows them to be written to the file system. Using the MediaCreator class, the images and files stored in the file system could be uploaded to Sitecore. One benefit of this class is that it recognizes duplicates and will upload a given item just once. One issue here was that many images had periods as part of the file name, which Sitecore doesn’t allow. My solution was to write a PowerShell script that iterated over all images and changed the periods to underscores. Of course, the same renaming had to be applied to the image references in the HTML during the transformation process.
- Next, widgets were inserted. The main issue here was how to determine the folder structure for the widgets. We decided to use the path of the containing page, up to two levels deep.
- Pages came next. Parent pages needed to be inserted first. This was accomplished by counting the “/” characters in each page’s path. All one-slash pages for all templates were inserted first, followed by two-slash pages, etc. Thus /parent was inserted, followed by /parent/child, and so on. At this point, the previously inserted widgets could also be associated with the page and added to its presentation details.
- Finally came the insertion of menu items and link cleanup. Links had been properly transformed by the Kapow process, but the general link field in Sitecore requires its target to be present before the link can be inserted, since the link itself references the target’s Item Guid (look at the raw values of a link field). So any item with a general link field had to be updated after all content was present.
- With each insertion attempt, the migration status in the appropriate database table was set to “Uploaded” or “Upload Error” so that errors could be corrected, and for successful insertions the Item Guid was recorded as well. One of the most common issues was that an image associated with an item was not present because it had been missed in the extraction process, so the item could not be created.
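Three of the rules in the steps above (renaming files with periods, choosing a widget folder, and ordering pages by depth) are simple enough to sketch in a few lines. The Python below is only an illustration, not the code used in the migration, which relied on a PowerShell script and the Sitecore API. The function names are hypothetical, and two details are assumptions: that the period before the file extension is kept, and that the widget folder is built from the leading segments of the page path.

```python
from pathlib import PurePosixPath


def sanitize_media_name(file_name: str) -> str:
    """Replace periods in the base name with underscores.

    Assumption: the final period (before the extension) is preserved.
    """
    path = PurePosixPath(file_name)
    return path.stem.replace(".", "_") + path.suffix


def widget_folder(page_path: str, max_depth: int = 2) -> str:
    """Return the folder for a page's widgets: its path, at most max_depth levels deep."""
    segments = [segment for segment in page_path.split("/") if segment]
    return "/" + "/".join(segments[:max_depth])


def order_pages_for_upload(page_paths: list[str]) -> list[str]:
    """Sort pages by the number of slashes so parents always precede children."""
    # sorted() is stable, so pages at the same depth keep their input order.
    return sorted(page_paths, key=lambda path: path.count("/"))


pages = ["/parent/child", "/other", "/parent", "/parent/child/grandchild"]
print(sanitize_media_name("logo.v2.final.png"))  # logo_v2_final.png
print(widget_folder("/products/widgets/blue/detail"))  # /products/widgets
print(order_pages_for_upload(pages))
```

Sorting by slash count does not group siblings under their parent, but it does guarantee that every parent is inserted before any of its children, which is all the upload needs.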
So we’re all finished and the site looks great! Not so fast.
- First, there were pages that we decided not to automate. They were either too complicated or there were too few of them to make automation worthwhile. So these were inserted manually.
- There were other cleanup issues. Certain HTML was so unpatterned that it was impossible to extract it properly. This was adjusted manually.
- Some duplicates were inserted. Kapow has duplicate detection capability, but the content structure would have made the duplicates difficult to detect. It was easier and quicker to remove them manually.
This migration was not a magic bullet. Verification of the migrated content was still necessary. But overall I found the results satisfactory and enjoyed working with Kapow.