Many projects need to import data, and I have found it to be a painful process every time. Each import is custom and requires custom code, even though parts of the process are the same each time: connecting to the source data, reading data, creating new items, avoiding duplicates, and updating existing items. I wanted to find a way to make this process less painful while reusing as much code as possible. I did plenty of research and came across Sitecore Data Exchange Framework (DEF). I was surprised to learn it has been around since version 8.2, yet I had never heard of it! After some initial reading, I decided it sounded exactly like what I was looking for. It’s supported by Sitecore and has a pipeline architecture, so I should be able to make it do exactly what I need. I could also package up the base and share it with others to speed up future imports. In the end, I decided it was not for me. But let me share my journey so you can decide for yourself.
Getting Started with DEF
I started by following Sitecore’s tutorial walkthrough. The tutorial is well written. I was quickly able to install the tool inside a Docker container by making a few changes to my docker-compose.yml and .env files. It took me a couple of days to get all the way through the tutorial. There are many concepts to grasp and naming conventions to understand, so I would say the learning curve is quite steep. At the end of the tutorial, I was able to import data from a CSV file and create items in Sitecore! Mapping the source data to Sitecore items was easy and flexible. I was very excited about the prospect, and I was ready to move on!
Taking my First Steps
The first change I wanted to make was to read from a JSON file instead of a CSV. This was easy to accomplish with either System.Text.Json or Newtonsoft.Json and a custom model class. Instead of reading positions from an array, I wanted to read properties of an object, so I changed my Value Accessor from an Array Value Accessor to a Property Value Accessor. I then changed the Examples.DataExchange.Providers.FileSystem.ReadTextFileStepProcessor class to process the JSON data instead of the lines of a CSV.
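    // Requires System.IO and Newtonsoft.Json.
    // Read the whole file and deserialize it into the custom model class.
    var text = File.ReadAllText(settings.Path);
    var json = JsonConvert.DeserializeObject<Response>(text);

    // Yield each person so the pipeline iterates objects instead of CSV lines.
    foreach (Person p in json.results)
    {
        yield return p;
    }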
The second change I made was to read data from an API instead of a text file. This was easy to do with WebRequest. I planned to come back later and change this to a dependency-injected HttpClient.
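    // Requires System.IO and System.Net.
    // Fetch the response body as text, replacing the File.ReadAllText call.
    string text;
    var webRequest = WebRequest.Create(uri);
    using (var webResponse = webRequest.GetResponse())
    using (var stream = new StreamReader(webResponse.GetResponseStream()))
    {
        text = stream.ReadToEnd();
    }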
Going Beyond the Basics
As I started to implement the specific needs of my importer, I started to struggle with DEF.
My first goal was to be able to delete items in Sitecore that were no longer part of the source data. I wanted to read or write some property so I would know which items I could delete.
My first thought was to read the start time of the pipeline batch item and set it as the created or last updated time of the newly created items. I did not find an easy way to access this item during the pipeline execution. My second thought was to write a GUID to each item, but I could not find a way to change the GUID on each run of the pipeline batch. I also considered using PowerShell to delete items that had not been modified within a certain window of the last pipeline batch run. But I wanted to keep the process within DEF, and with no guarantee of how long the importer would take, a time window could not reliably tell me which items were modified during the current run.
I settled on adding a boolean field to the item’s data template to indicate whether the item was imported on the last run. I added a pipeline to reset the field to false on any existing items. I added a field to the value accessor set that used a constant value reader to set the field to true while importing the data (either creating new items or updating existing items). I added a second pipeline to delete any items where the field was still set to false. I was surprised that deleting a Sitecore item was not part of DEF. I ended up writing a custom pipeline step processor (https://github.com/ericsanner/SitecoreContentImporter).
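The reset, mark, and delete sequence is easier to see outside of DEF. The following is only a sketch of the idea in plain PowerShell, not the DEF pipelines themselves: in-memory objects stand in for Sitecore items, and the function name Invoke-MarkAndSweepImport and the Key and Title properties are hypothetical names used for illustration.

```powershell
# Hypothetical helper: in-memory objects stand in for Sitecore items.
function Invoke-MarkAndSweepImport {
    param(
        [System.Collections.IDictionary]$Existing,  # key -> item already in the target
        [object[]]$Source                           # records from the current import
    )

    # Step 1: reset the flag on every existing item.
    foreach ($item in $Existing.Values) {
        $item.ImportedOnLastRun = $false
    }

    # Step 2: create or update items and mark them as imported.
    foreach ($record in $Source) {
        if (-not $Existing.Contains($record.Key)) {
            $Existing[$record.Key] = [pscustomobject]@{
                Key               = $record.Key
                Title             = $null
                ImportedOnLastRun = $false
            }
        }
        $target = $Existing[$record.Key]
        $target.Title = $record.Title
        $target.ImportedOnLastRun = $true
    }

    # Step 3: delete everything the import did not touch.
    $stale = @($Existing.Keys | Where-Object { -not $Existing[$_].ImportedOnLastRun })
    foreach ($key in $stale) {
        $null = $Existing.Remove($key)
    }

    # Return the removed keys as an array, even when it is empty.
    return ,$stale
}
```

Because the flag is reset at the start of every run, the approach does not depend on timestamps or on how long the import takes.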
Then Things Got Interesting
My next goal was to map a droplink field. The data would come from the source as text. I wanted to look up the related item and store its GUID on the target item. I could not find an out-of-the-box component that matched these criteria. The documentation mentioned when you would want to extend the default components but did not give any examples of how to do it. I scoured the web without much success. I decompiled the DEF DLLs and created a custom value accessor. This required a Value Accessor Converter and a Value Reader. The plan was to use the value reader on the source data, find a matching item in the content tree, and write the GUID value to the target. I fought with this off and on for many days. I had problems getting the pipeline to call my custom value accessor. When it did get called, it would not use my custom read method. I tried going the other way, using a custom value writer to convert the value before writing it to the target, with equally unsuccessful results. I decided to move on to the next problem and come back to this later.
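The lookup itself is simple; the hard part was wiring it into DEF. As a rough sketch of the intended behavior in plain PowerShell, assuming the lookup items have already been read into memory: the function name Resolve-DroplinkValue, the sample names, and the placeholder IDs are all made up for illustration.

```powershell
# Hypothetical helper: maps source text to the ID of the matching lookup item.
function Resolve-DroplinkValue {
    param(
        [string]$Text,
        [hashtable]$IdByName
    )

    if ([string]::IsNullOrWhiteSpace($Text)) { return '' }

    $key = $Text.Trim()
    if ($IdByName.ContainsKey($key)) { return $IdByName[$key] }

    # Unknown text: leave the droplink empty rather than guessing.
    return ''
}

# Placeholder data. In a real import this table would be built from the
# lookup items in the content tree.
$lookupItems = @(
    [pscustomobject]@{ Name = 'Full Time'; ID = '{11111111-1111-1111-1111-111111111111}' }
    [pscustomobject]@{ Name = 'Part Time'; ID = '{22222222-2222-2222-2222-222222222222}' }
)

# Hashtables created with @{} compare string keys case-insensitively.
$idByName = @{}
foreach ($lookupItem in $lookupItems) {
    $idByName[$lookupItem.Name.Trim()] = $lookupItem.ID
}

$droplinkValue = Resolve-DroplinkValue -Text ' full time ' -IdByName $idByName
```

Building the table once per run avoids searching the content tree for every source record.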
The next goal was to process child data. Each item in the source data had associated locations. I needed to interrupt the main pipeline to kick off a new pipeline to iterate the locations. I spent a day searching for a solution, all the while thinking I would have the same problem of mapping the related location to the current item via GUID (or GUIDs if an item had multiple locations). The problem was worse yet because I wanted to create Sitecore items for the state and country and store their GUIDs on the location item. I put this on the back burner as well.
At this point, I was pretty frustrated and almost gave up, but I moved on to my next goal. If I could get this to work, I would be able to go back to the other items. I wanted to skip items from the source data if certain fields were empty. This seemed like an easy win. The value mappings support mapping rules and value transformers, and have an option for “mapping set fails if this mapping fails”. I created a mapping rule to check for an empty value. I checked the box to fail the mapping set. I got a new Sitecore item with an empty value on the field. Wait. What? I decompiled some more DLLs and found that the mapping rules don’t stop the pipeline or cause it to abort processing the current item. The new item is already created by the time the mapping set is applied. The pipeline continues and writes any available data to the item. That is not what I would have expected based on the label on the checkbox.
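For comparison, the behavior I expected is to validate a record first and only create an item when it passes. A minimal sketch in plain PowerShell, not DEF code: the function name Test-RequiredFields and the sample records and field names are hypothetical.

```powershell
# Hypothetical helper: returns $true only when every required field has a non-blank value.
function Test-RequiredFields {
    param(
        [Parameter(Mandatory)] $Record,
        [Parameter(Mandatory)] [string[]] $RequiredFields
    )

    foreach ($field in $RequiredFields) {
        $property = $Record.PSObject.Properties[$field]
        if ($null -eq $property -or [string]::IsNullOrWhiteSpace([string]$property.Value)) {
            return $false
        }
    }
    return $true
}

# Sample source records (made up for illustration).
$records = @(
    [pscustomobject]@{ title = 'Developer'; description = 'Builds things' }
    [pscustomobject]@{ title = '';          description = 'No title' }
    [pscustomobject]@{ description = 'Title property missing' }
)

# Filter before any item is created, so a failed check never leaves
# a half-filled item behind.
$importable = @($records | Where-Object {
    Test-RequiredFields -Record $_ -RequiredFields 'title', 'description'
})
```

Only the records in $importable would then go on to the create-or-update step.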
And Then I Gave Up
I had three major tasks left, and I still had to publish the new and updated items and set the pipeline batch to run on a schedule. I was already past my deadline for completing this work, and I could not see how to make this easy for others to use in the future. I decided to cut my losses and go in a new direction. In a previous project, we wrote a custom importer in C#. That took forever too. A colleague recommended PowerShell and offered some examples to get me going quickly. Their code was hard to follow, and I still wanted something that was more reusable.
I did end up using PowerShell. I created a library of functions that will hopefully make future imports easier. The most interesting piece was the update function. I pass it an item and a hashtable of the key/value pairs to update. It uses each key as a dynamic property name on the item, so the fields to update are driven entirely by the keys in the hashtable.
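    Function Update-SitecoreItem {
        # $item: the item to edit; $updates: hashtable of field names and values.
        param($item, $updates)

        $item.Editing.BeginEdit()
        foreach ($key in $updates.GetEnumerator()) {
            # Only write fields that exist on the item and whose value has changed.
            if ($item.($key.Name) -ne $null -and $item.($key.Name) -ne $key.Value) {
                $item.($key.Name) = $key.Value
            }
        }
        # Capture the result of EndEdit() so it does not leak into the output.
        $itemModified = $item.Editing.EndEdit()
    }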
This makes it fast and easy to create the mapping from your source data to the Sitecore item. It is very easy for the next person to read and update as well.
    $item = Get-NewOrExistingSitecoreItem $itemRoot $itemTemplateId $itemName

    $updates = @{}
    $updates.Add("__Display name", $job.title)
    $updates.Add("Title", $job.title)
    $updates.Add("OpenGraphTitle", $job.title)
    $updates.Add("OpenGraphDescription", $briefDescription)
    $updates.Add("MetaDescription", $briefDescription)
    $updates.Add("NavigationTitle", $job.title)
    $updates.Add("Content", $job.description)
    $updates.Add("ImportedOnLastRun", "1")

    Update-SitecoreItem $item $updates
I was able to save the script to the Script Library (/sitecore/system/Modules/PowerShell/Script Library) and quickly create a scheduled task to execute my script on a daily basis.
Conclusion
I really wanted to like Data Exchange Framework. Sitecore’s tutorial walkthrough is well written. It gives you the basics but leaves a lot for you to discover on your own. I had days of research decompiling Sitecore DLLs and searching online, followed by a day of feel-good success, then days of frustration as I tackled the next step in the process. After three weeks, I threw in the towel and wrote the importer in PowerShell. You can find my full PowerShell script on my GitHub: https://github.com/ericsanner/SitecoreContentImporter. Let me know if you find it useful or if you have any suggestions on things I could add based on an import you’ve done in the past.