Exporting Media Items with Sitecore PowerShell Extensions

Intro 📖

At this point, I think every Sitecore developer uses or has at least heard of Sitecore PowerShell Extensions (SPE). This powerful module for Sitecore is an auto-include for most teams and projects. The module is included with XM Cloud instances by default; if you use XM Cloud, then you already have SPE. If you somehow haven’t heard of SPE, stop whatever you’re doing and go check it out. 🚪🏃‍♂️

Included with SPE are several out-of-the-box (OOTB) reports that users can run. On a recent project, one of these reports came in clutch: the Unused media items report, which is located in the Sitecore content tree at the following path:

/sitecore/system/Modules/PowerShell/Script Library/SPE/Reporting/Content Reports/Reports/Media Audit/Unused media items

This report generates a list of media items that are in the media library but not used in Sitecore, where “used” is defined as being referenced at least once in the Sitecore link database. In other words, the report lists media items that are (probably) just taking up space.

Extending the Report 🚀

In my case, rather than generating a report of unused media items, I needed to do roughly the opposite: generate a report of used media items and export those items as content packages (to be installed in higher environments as part of a content migration). Using the OOTB Unused media items report as a starting point, I wrote a more general-purpose script to package up media items based on several criteria. When the script is executed, it looks like this (I used the PowerShell ISE included with SPE to run the script):

[Screenshots: the Export Media Items parameter dialog, shown in two parts]

Parameters ⚙

Here’s a rundown of the different parameters:

  • Media to Include
    • Default: Used
    • This parameter determines if the script processes used media items, unused media items, or both used and unused media items. To determine if an item is used or not, the script queries the link database to get the referrer count for the item.
  • Media Library Folders
    • Default: (none)
    • Use this parameter to designate the folders from which you’d like the script to pull and process media items. The script includes a check to prevent duplicate media items from being processed if, for whatever reason, multiple overlapping media folders are selected.
    • Note that the root media library node can be selected, but doing so isn’t ideal for performance reasons. It’s better to limit the script to those media folders you know contain the media items you need to export. If no folders are selected, the script has nothing to process and simply reports that no media items matched.
  • Extensions to Include
    • Default: (none)
    • If you’d like the script to only process media items with certain extensions, check the relevant extensions here. Note that, by default, no extensions are checked, and no extension filtering is applied. Only interested in exporting PDFs? Check the pdf extension. Want every media item regardless of extension? Don’t check any extensions.
  • Cutoff Date
    • Default: (none)
    • If specified, this parameter causes the script to only process media items that were either created or updated after this date.
    • This parameter is useful if you need to run a “delta” export to pick up any new or updated items since a previous run. Just remember the date of your last run. Note that the file names of the generated packages include a date stamp, e.g., 20240323T0504075151Z - Export Media Items 1.zip.
  • Maximum Package Size
    • Default: 25 MB
    • The script uses the Size field on media items to estimate the overall size of the files in a given content package. Once the running total for the current package reaches this threshold, that package is closed and a new one is started; this repeats until all media items are packaged.
    • The script can generate packages that are larger than this size because the threshold check happens after an item has been added. Also note that the size on disk of a serialized item isn’t exactly the same as the number of bytes stored in the Size field. In other words, the “package chunking” logic is approximate.
    • Anecdotally, I’ve noticed that when content packages get large (over 200 MB), uploading and installing them can get dicey, depending on the environment. Pick a size that makes sense for your use case.
  • Exclude System Folders
    • Default: Checked (☑)
    • If checked, the script will ignore any media items whose path contains /System/ (the match is case-insensitive). This is useful for excluding, say, thumbnails that are generated by Sitecore.
  • Verbose Console Output
    • Default: Checked (☑)
    • If checked, additional output is written to the console which can be useful when performing a dry run together with Debug Mode (below).
  • Debug Mode
    • Default: Checked (☑)
    • If checked, the script won’t write any content packages to disk. This is useful when performing dry runs of the script to generate the report detailing which item is in which package before committing to writing a potentially large number of files or large individual files to disk.
    • Assuming this checkbox is unchecked, the resulting content packages are saved to disk under the $SitecorePackageFolder path, which is usually C:\inetpub\wwwroot\App_Data\packages.
    • If you aren’t seeing any content packages on disk, make sure this parameter is unchecked.
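
To make the “add first, check afterwards” behavior of Maximum Package Size a bit more concrete, here’s a minimal standalone sketch of that chunking loop. It doesn’t touch Sitecore at all: the item names and sizes are made up for illustration, and the variable names are my own rather than the ones the script uses.

```powershell
# Hypothetical media items; names and sizes (in bytes) are invented for this example.
$mediaItems = @(
    [pscustomobject]@{ Name = "a.pdf"; Size = 12000000 }
    [pscustomobject]@{ Name = "b.png"; Size = 9000000 }
    [pscustomobject]@{ Name = "c.jpg"; Size = 8000000 }
    [pscustomobject]@{ Name = "d.pdf"; Size = 30000000 }
    [pscustomobject]@{ Name = "e.gif"; Size = 1000000 }
)
$maxPackageSize = 25000000   # corresponds to the "25 MB" option

$packages = @()      # one summary entry per finished package
$current = @()       # items in the package currently being filled
$currentSize = 0
$processed = 0

foreach ($mediaItem in $mediaItems) {
    # The item is added before the threshold is checked,
    # so a package can end up larger than the configured maximum.
    $current += $mediaItem
    $currentSize += $mediaItem.Size
    $processed++

    # Close the package when the threshold is reached or when no items remain.
    if ($currentSize -ge $maxPackageSize -or $processed -eq $mediaItems.Count) {
        $packages += [pscustomobject]@{
            Number = $packages.Count + 1
            Items  = $current.Name -join ", "
            Size   = $currentSize
        }
        $current = @()
        $currentSize = 0
    }
}

$packages | Format-Table -AutoSize
```

With these sample sizes, the loop produces three packages of 29 MB, 30 MB, and 1 MB. The first two overshoot the 25 MB threshold, which is exactly the approximation described above.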

Output 📝

The script essentially provides three different forms of output: console output, the typical SPE report results dialog (with CSV and Excel export functionality), and the content packages themselves, which are written to disk. For example, assuming Verbose Console Output is checked and Maximum Package Size is set to 100 MB, the console output could look something like this:

[Screenshot: Export Media Items - Console Output]

The cyan line outputs the total number of media items to be processed based on the parameters. The green lines detail the first package to be created with the magenta lines listing each media item in that first package. The green and magenta lines repeat until all of the media items are processed and all the content packages are generated. The last bit of console output will look something like this:

[Screenshot: Export Media Items - Console]

The grey lines are the paths on disk for each of the generated content packages. Note that the generated packages are named using a pattern that includes a time stamp; this is the (server) time that the script was executed and will be the same date and time for all packages generated on a particular run.
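
Here’s a small standalone sketch of that naming pattern, in case you want to predict or parse the file names elsewhere. The package count of 3 is just an assumption for the example; the timestamp format is the built-in FileDateTimeUniversal format of Get-Date.

```powershell
# Capture the timestamp once so every package from the same run shares it.
$timestamp = Get-Date -Format FileDateTimeUniversal
$packageName = "Export Media Items"

# Hypothetical run that produced three packages.
$fileNames = foreach ($number in 1..3) {
    "$timestamp - $packageName $number.zip"
}

$fileNames
```

Because the timestamp is taken once, up front, sorting the package folder by name keeps all the packages from a given run together.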

The report results dialog is similar to other SPE reports and could look something like this:

[Screenshot: Export Media Items - Report Dialog]

From here, the user can see which media items are in which packages. After exporting this data to a CSV or Excel file, users can audit installed content packages, set up additional downstream automation, use the file as a manifest for archiving media items, etc.
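
As a rough illustration of the “manifest” idea, the sketch below summarizes an exported CSV by package using plain PowerShell. I’m assuming the export uses the report’s column labels (ID, Full Path, Package) as headers, so check your actual file; the IDs, paths, and file names here are invented sample data.

```powershell
# Sample CSV content standing in for an exported report (all values are made up).
$csv = @'
"ID","Full Path","Package"
"{11111111-1111-1111-1111-111111111111}","/sitecore/media library/Docs/guide","20240323T0504075151Z - Export Media Items 1.zip"
"{22222222-2222-2222-2222-222222222222}","/sitecore/media library/Images/hero","20240323T0504075151Z - Export Media Items 1.zip"
"{33333333-3333-3333-3333-333333333333}","/sitecore/media library/Images/logo","20240323T0504075151Z - Export Media Items 2.zip"
'@

# Split into lines and parse; with a real file you could use Import-Csv instead.
$rows = $csv -split '\r?\n' | ConvertFrom-Csv

# Count how many media items ended up in each package.
$summary = $rows | Group-Object -Property Package | Select-Object Name, Count
$summary | Format-Table -AutoSize

# Look up which package contains a given media item.
$logoRow = $rows | Where-Object { $_.'Full Path' -like '*/Images/logo' }
$logoRow.Package
```

The same approach works for quick audits, such as checking that every item in a package made it to the target environment.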

Ideas for Improvements 💡

There’s always room for improvement. These were some of my ideas:

  • Adding support for non-media items would be cool, though it would mean having to figure out how to calculate the size of the packages without the use of the Size field (which is unique to media items and is automatically set by Sitecore when uploading a file). I suppose you could determine the size of a serialized item in memory and use that…🤔
  • Including more extensions: other image extensions, video extensions, Office extensions, etc.
  • Allowing for a date range to process files added or modified within a given time span.
  • Exposing more options for package installation rather than assuming an overwrite.
  • Supporting a configurable naming convention for generated packages.
  • General performance tweaks.
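
For the date range idea, the filter itself could be fairly simple. Here’s a hedged, standalone sketch using plain objects in place of Sitecore items; the Created and Updated properties, the Test-InRange helper, and the dates are all hypothetical stand-ins, not part of the current script.

```powershell
# Hypothetical items; in the real script these would be media items with created/updated dates.
$mediaItems = @(
    [pscustomobject]@{ Name = "old.pdf";    Created = [datetime]"2023-01-10"; Updated = [datetime]"2023-01-10" }
    [pscustomobject]@{ Name = "edited.png"; Created = [datetime]"2023-01-10"; Updated = [datetime]"2024-02-15" }
    [pscustomobject]@{ Name = "new.jpg";    Created = [datetime]"2024-03-01"; Updated = [datetime]"2024-03-01" }
    [pscustomobject]@{ Name = "newest.svg"; Created = [datetime]"2024-04-20"; Updated = [datetime]"2024-04-20" }
)

# Assumed range parameters (inclusive on both ends).
$startDate = [datetime]"2024-02-01"
$endDate = [datetime]"2024-03-31"

# Hypothetical helper: true when the value falls inside the range.
function Test-InRange {
    param(
        [datetime]$Value
    )
    $Value -ge $startDate -and $Value -le $endDate
}

# Keep items that were created or updated within the range.
$inRange = $mediaItems | Where-Object { (Test-InRange $_.Created) -or (Test-InRange $_.Updated) }
$inRange.Name
```

With the sample data, only edited.png and new.jpg fall inside the range.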

Do you have other ideas for improvements? Do you see a bug or typo that I missed? Please drop me a comment below! 💬 👇

The Code 💻

The script is available as a public GitHub Gist here and is also duplicated below if you don’t get the Gist (dad joke?).

<#
    .SYNOPSIS
        Creates content packages for media items matching certain criteria.
        By default, packages are saved to disk at C:\inetpub\wwwroot\App_Data\packages.
        
    .NOTES
        Original "Unused media items" report (/sitecore/system/Modules/PowerShell/Script Library/SPE/Reporting/Content Reports/Reports/Media Audit/Unused media items) written by Michael West.
        Additional parameters, filtering, content package creation, etc. written by Nick Sturdivant.

        This script requires that Sitecore PowerShell Extensions be installed.
#>

$reportName = "Export Media Items"

$extensionOptions = [ordered]@{ "bmp" = "bmp"; "gif" = "gif"; "jpg" = "jpg"; "jpeg" = "jpeg"; "pdf" = "pdf"; "png" = "png"; "svg" = "svg"; }
$maxPackageSizeOptions = [ordered]@{ "5 MB" = 5000000; "10 MB" = 10000000; "25 MB" = 25000000; "50 MB" = 50000000; "100 MB" = 100000000; "250 MB" = 250000000 }
$usedMediaItemOptions = [ordered]@{ "Both" = "both"; "Used" = "used"; "Unused" = "unused" }

$props = @{
    Parameters  = @(
        @{
            Name    = "usedMediaMode"
            Title   = "Media to Include"
            Tooltip = "Determines if the script processes used, unused, or both used and unused media items (where ""used"" is defined as having at least one entry in the link database)."
            Value   = "used"
            Options = $usedMediaItemOptions 
        }
        @{
            Name    = "selectedMediaFolders"
            Title   = "Media Library Folders"
            Tooltip = "The media library folders from which to include items."
            Value   = @()
            Editor  = "treelist" 
        }
        @{
            Name    = "selectedExtensions"
            Title   = "Extensions to Include"
            Tooltip = "The file extension(s) for the media items to process and include in the package(s)."
            Value   = @()
            Options = $extensionOptions
            Editor  = "check" 
        }
        @{
            Name    = "cutoffDate"
            Title   = "Cutoff Date"
            Tooltip = "If set, causes the script to only process media items that were created or updated after this date."
            Value   = [datetime]::MinValue
            Editor  = "date" 
        }
        @{
            Name    = "selectedMaxPackageSize"
            Title   = "Maximum Package Size"
            Tooltip = "The maximum size package the script will generate. If the total size of the media items to be packaged exceeds this limit, then multiple packages are created until all items have been packaged."
            Value   = 25000000
            Options = $maxPackageSizeOptions 
        }
        @{
            Name    = "excludeSystemFolders"
            Title   = "Exclude System Folders"
            Tooltip = "If checked, any media items with ""/System/"" anywhere in their path are ignored."
            Value   = $true
            Editor  = "check" 
        }
        @{
            Name    = "verboseOutput"
            Title   = "Verbose Console Output"
            Tooltip = "If checked, additional output will be written to the console."
            Value   = $true
            Editor  = "check" 
        }
        @{
            Name    = "debugMode"
            Title   = "Debug Mode"
            Tooltip = "If checked, no packages will be saved to disk."
            Value   = $true
            Editor  = "check" 
        }
    )
    Title       = " $reportName"
    Icon        = "OfficeWhite/32x32/box_into.png"
    Description = "This script queries for used (referenced) and/or unused media items and generates content packages containing those items."
    Width       = 600
    ShowHints   = $true
}

$result = Read-Variable @props

$items = @()
$itemsReport = @()
$timestamp = (Get-Date -Format FileDateTimeUniversal)

if ($result -eq "cancel") {
    exit
}

# Returns $true if the link database holds at least one referrer for the item.
function HasReference {
    param(
        $Item
    )
    
    $linkDb = [Sitecore.Globals]::LinkDatabase
    $linkDb.GetReferrerCount($Item) -gt 0
}

function Get-MediaItemWithReference {
    param(
        [string]$Path,
        [string[]]$Extensions
    )
    
    $mediaItemContainer = Get-Item ("master:" + $Path)
    $excludedTemplates = @([Sitecore.TemplateIDs]::MediaFolder, [Sitecore.TemplateIDs]::Node)
    $items = $mediaItemContainer.Axes.GetDescendants() | 
    Where-Object { $excludedTemplates -notcontains $_.TemplateID } | Initialize-Item | 
    Where-Object { -not $excludeSystemFolders -or ( -not ($_.FullPath -like "*/System/*") ) } |
    Where-Object { $cutoffDate -eq [datetime]::MinValue -or ( $_.__Created -gt $cutoffDate -or $_.__Updated -gt $cutoffDate ) } |
    Where-Object { $Extensions.Count -eq 0 -or $Extensions -contains $_.Fields["Extension"].Value }
    
    # filter based on usage (links)
    foreach ($item in $items) {
        if ($usedMediaMode -eq "both") {
            $item
        }
        if ($usedMediaMode -eq "used" -and (HasReference -Item $item)) {
            $item
        }
        if ($usedMediaMode -eq "unused" -and (-not (HasReference -Item $item))) {
            $item
        }
    }
}

function Build-Package {
    param(
        [Sitecore.Data.Items.Item[]]$Items,
        [int]$Size,
        [int]$PackageNumber,
        [ref]$ItemsReport
    )    
    
    if ($verboseOutput) {
        Write-Host ""
        Write-Host "Building package $PackageNumber..." -ForegroundColor Green
        Write-Host "Total items: $($Items.Count)" -ForegroundColor Green
        Write-Host "Total size: $Size bytes" -ForegroundColor Green
        Write-Host ""
    }
    
    $package = New-Package -Name "Export Media Items"
    $package.Sources.Clear()
    $package.Metadata.Author = "SPE"
    $package.Metadata.Version = $timestamp
    $package.Metadata.Readme = "A package containing media items; generated by a Sitecore PowerShell Extensions script."
    
    $packageZipFileName = "$( $package.Metadata.Version ) - $( $package.Name ) $PackageNumber.zip"
    
    foreach ($itemToPackage in $Items) {
        if ($verboseOutput) {
            Write-Host "`t+ $($itemToPackage.FullPath) ($($itemToPackage.Fields["Size"].Value -as [int]) bytes)" -ForegroundColor Magenta
        }
        $source = Get-Item $itemToPackage.FullPath | New-ExplicitItemSource -Name "$($itemToPackage.ID)" -InstallMode Overwrite
        $package.Sources.Add($source)
        
        $ItemsReport.Value += @{
            ID       = $itemToPackage.ID
            FullPath = $itemToPackage.FullPath
            Package  = $packageZipFileName
        }
    }

    if (-not $debugMode) {
        Export-Package -Project $package -Path $packageZipFileName -Zip
    }
}

foreach ($selectedMediaFolder in $selectedMediaFolders) {
    # ensure selected media folder is the media library itself or a folder within the media library
    if ($selectedMediaFolder.FullPath -ne "/sitecore/media library" -and $selectedMediaFolder.TemplateID -ne [Sitecore.TemplateIDs]::MediaFolder) {
        Write-Host "Selected folder $($selectedMediaFolder.FullPath) is neither the media library root nor a media folder within the media library and will be ignored." -ForegroundColor Yellow
        continue
    }
    
    $itemsFromPath = Get-MediaItemWithReference -Path $selectedMediaFolder.FullPath -Extensions $selectedExtensions
    
    # prevent duplicate items if overlapping media folders are selected
    foreach ($itemFromPath in $itemsFromPath) {
        $existingItem = $items | Where-Object { $_.ID -eq $itemFromPath.ID }
        if ($null -eq $existingItem) {
            $items += $itemFromPath
        }
    }
}

if ($items.Count -eq 0) {
    Show-Alert "There are no media items matching the specified parameters."
}
else {
    Write-Host "Total media items to be processed and packaged: $($items.Count)" -ForegroundColor Cyan

    $packageSize = 0
    $itemsInPackage = @()
    $itemsProcessed = 0
    $packageCount = 0
    
    foreach ($itemToPackage in $items) {
        $itemsInPackage += $itemToPackage
        $packageSize += $itemToPackage.Fields["Size"].Value -as [int]
        $itemsProcessed++
        
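        # Close the package once the threshold is reached (checked after adding, so a package can overshoot) or when no items remain.
        if ($packageSize -ge $selectedMaxPackageSize -or $itemsProcessed -eq $items.Count) {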
            
            $packageCount++

            Build-Package -Items $itemsInPackage -Size $packageSize -PackageNumber $packageCount -ItemsReport ([ref]$itemsReport)
            
            $packageSize = 0
            $itemsInPackage = @()
        }
    }
    
    # report output
    $mediaToInclude = ""
    if ($usedMediaMode -eq "both") {
        $mediaToInclude = "Both (used and unused)"
    } else {
        $mediaToInclude = ($usedMediaMode.Substring(0, 1).ToUpper() + $usedMediaMode.Substring(1))
    }
    $mediaFolders = ""
    $selectedMediaFolders | ForEach-Object { $mediaFolders += "<br/>&nbsp;&nbsp;- $($_.FullPath)" }
    $extensions = $selectedExtensions -join ", "
    if ($extensions -eq "") {
        $extensions = "(all)"
    }
    
    $infoDescription = "List of the media items matching the specified criteria that are contained within the generated content packages.<br/><br/>" +
    "Media to Include: $mediaToInclude<br/>" + 
    "Media Library Folders: $mediaFolders<br/>" +
    "Extensions to Include: $extensions<br/>" + 
    "Cutoff Date: "
    if ($cutoffDate -eq [datetime]::MinValue) {
        $infoDescription += "(none)<br/>"
    }
    else {
        $infoDescription += "$($cutoffDate.ToShortDateString())<br/>"
    }
    $infoDescription += "Maximum Package Size: $($selectedMaxPackageSize / 1000000) MB<br/>" +
    "Exclude System Folders: $excludeSystemFolders<br/>" +
    "Verbose: $verboseOutput<br/>" + 
    "Debug: $debugMode"

    $reportProps = @{
        InfoTitle       = $reportName
        InfoDescription = $infoDescription
        PageSize        = 25
        Title           = $reportName
    }
    
    Write-Host ""
    Write-Host "Finished! 🎉" -ForegroundColor Cyan

    if ($verboseOutput) {
        Write-Host ""
        $itemsReport | 
        ForEach-Object { $_.Package } | 
        Select-Object -Unique | 
        ForEach-Object { Write-Host ($SitecorePackageFolder + "\" + $_) -ForegroundColor Gray }
    }

    # display report output
    $itemsReport |
    Show-ListView @reportProps -Property @{ Label = "ID"; Expression = { $_.ID } },
    @{Label = "Full Path"; Expression = { $_.FullPath } },
    @{Label = "Package"; Expression = { $_.Package } }
}

(📝 Note: the code snippet above may not be kept up-to-date with the Gist over time)

Nick Sturdivant, Senior Solutions Architect