Challenge:
A common Sitemap best practice is to limit a single sitemap to 50MB (uncompressed) or 50,000 URLs. If a sitemap is larger or has more URLs, it must be split into multiple sitemaps, each referenced from a Sitemap index. From Sitecore 10.3 onwards, the SXA Sitemap supports splitting into multiple sitemaps when the URL count exceeds a given limit. In this post, we enhance the SXA Sitemap so that it is also split when it exceeds 50MB or another configured size.
Solution:
The enhancement discussed in this post is developed with Sitecore 10.3 Update 1 but should work with Sitecore 10.3 as well.
SXA Sitemap settings can be found at /sitecore/content/<Tenant>/<site>/Settings/Sitemap.
Please read the Sitecore documentation article "Configure a sitemap" to ensure the sitemap is working properly and to understand the purpose of each field in the sitemap setting item. Validate that the sitemap is working properly at /sitemap.xml.
By default, the field “Maximum number of pages per sitemap (if undefined, all URLs will be rendered into single)” has no value. In the class “Sitecore.XA.Foundation.SiteMetadata.Settings.SitemapSettingsProvider”, the property backing this field, SitemapIndexThreshold, then falls back to int.MaxValue (2147483647), as shown below.
SitemapIndexThreshold = MainUtil.GetInt(configurationRoot[Sitecore.XA.Foundation.SiteMetadata.Templates.SitemapSettings.Fields.SitemapIndexThreshold], int.MaxValue),
Following the Sitemap best practices, let’s set this field to 50000 so that a single Sitemap holds at most 50,000 URLs. If the URL count exceeds this value, the Sitemap is split into multiple Sitemaps that are referenced from a Sitemap index.
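As a rough illustration of the arithmetic only, the sketch below estimates how many sitemap files a given URL count needs for a given threshold. The class and method names are hypothetical and not part of SXA. The real SXA logic also keeps all language versions of an item in the same sitemap and counts x-default links, so the actual number of files can differ.

```csharp
using System;

public static class SitemapCountSketch
{
    // Hypothetical helper, not part of SXA: number of sitemap files needed
    // to hold urlCount URLs when each file may contain at most threshold URLs.
    public static int SitemapsNeeded(int urlCount, int threshold)
    {
        if (threshold <= 0)
            throw new ArgumentOutOfRangeException(nameof(threshold));
        if (urlCount <= 0)
            return 0;

        // Ceiling division; the cast to long avoids overflow when the
        // threshold is int.MaxValue (the SXA default for an empty field).
        return (int)(((long)urlCount + threshold - 1) / threshold);
    }

    public static void Main()
    {
        Console.WriteLine(SitemapsNeeded(120000, 50000));       // 3
        Console.WriteLine(SitemapsNeeded(50000, 50000));        // 1
        Console.WriteLine(SitemapsNeeded(120000, int.MaxValue)); // 1
    }
}
```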
Let’s customize the class “Sitecore.XA.Foundation.SiteMetadata.Services.SitemapManager” to split the Sitemap into multiple Sitemaps if the Sitemap size exceeds 50MB or the given size. In other words, each generated Sitemap should stay within the given size.
The customized SitemapManager code follows. Please check the comments for more details. For the complete code, download the CustomSXA.Foundation.SiteMetaData.Services.SitemapManager code and add it to a suitable foundation layer project. Feel free to update the namespace as per your project.
// Existing using directives go here; the ones below are new.
using Sitecore;
using Sitecore.XA.Foundation.SiteMetadata.Services;
using System.IO;
using System.Text;
using System.Xml;

// Custom namespace and class.
namespace CustomSXA.Foundation.SiteMetaData.Services
{
    public class SitemapManager : ISitemapManager
    {
        // Existing properties here

        // New property added for the content size limit of a Sitemap
        private long SitemapMaxSizeLimit { get; } = StringUtil.ParseSizeString(
            Sitecore.Configuration.Settings.GetSetting("CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit", "50MB"));

        // Existing methods here

        // Customized code for GenerateSitemap
        public SitemapContent GenerateSitemap(SiteContext site)
        {
            Item homeItem = this.GetHomeItem(site);
            Sitecore.XA.Foundation.SiteMetadata.Models.Sitemap.SitemapSettings sitemapSettings = this.SitemapSettingsProvider.GetSitemapSettings(homeItem);
            if (sitemapSettings == null)
                return (SitemapContent)null;
            if (sitemapSettings.CacheType == SitemapStatus.Inactive)
                return (SitemapContent)null;
            IList<Item> items = this.SitemapSettingsProvider.GetItemCrawler(sitemapSettings).GetItems(homeItem);
            int count = items.Count;
            bool flag = sitemapSettings.IncludeXdefault && sitemapSettings.GenerateAlternateLinks;
            if (flag)
                count += items.GroupBy<Item, ID>((Func<Item, ID>)(i => i.ID)).Where<IGrouping<ID, Item>>((Func<IGrouping<ID, Item>, bool>)(i => i.Count<Item>() >= 2)).Count<IGrouping<ID, Item>>();
            SitemapContent sitemap;
            if (sitemapSettings.SitemapIndexThreshold < count)
            {
                List<string> stringList = new List<string>();
                IEnumerable<IGrouping<ID, Item>> groupings = items.GroupBy<Item, ID>((Func<Item, ID>)(i => i.ID));
                List<Item> objList = new List<Item>();
                int num1 = 0;
                foreach (IGrouping<ID, Item> grouping in groupings)
                {
                    int num2 = grouping.Count<Item>() < 2 ? 0 : 1;
                    if (flag)
                        num1 += num2;
                    if (objList.Count + grouping.Count<Item>() + num1 <= sitemapSettings.SitemapIndexThreshold || objList.Empty<Item>())
                    {
                        objList.AddRange((IEnumerable<Item>)grouping);
                    }
                    else
                    {
                        // New SplitSitemap function is called instead of the old code:
                        // stringList.Add(this.RenderSitemap((IList<Item>) objList, sitemapSettings));
                        SplitSitemap(sitemapSettings, stringList, objList);
                        objList.Clear();
                        if (flag)
                            num1 = num2;
                        objList.AddRange((IEnumerable<Item>)grouping);
                    }
                    if (objList.Count + num1 > sitemapSettings.SitemapIndexThreshold)
                    {
                        SplitSitemap(sitemapSettings, stringList, objList); // New SplitSitemap function is called
                        objList.Clear();
                        num1 = 0;
                    }
                }
                if (objList.Any<Item>())
                {
                    SplitSitemap(sitemapSettings, stringList, objList); // New SplitSitemap function is called
                }
                sitemap = new SitemapContent() { Values = stringList };
            }
            else
            {
                List<string> stringList = new List<string>();
                // New SplitSitemap function is called. The URL count limit is already satisfied here,
                // but we still check the content size limit and split if needed.
                SplitSitemap(sitemapSettings, stringList, items);
                sitemap = new SitemapContent() { Values = stringList };
            }
            this.SetRefreshDate(site);
            return sitemap;
        }

        // Existing code for GetSettings() and RenderSitemap() here.

        // Following are the new functions to split the Sitemap based on the content size limit.
        private void SplitSitemap(SitemapSettings sitemapSettings, List<string> stringList, IList<Item> objList)
        {
            string originalSiteMap = this.RenderSitemap((IList<Item>)objList, sitemapSettings);
            List<string> listOfSiteMap = SplitSitemap(originalSiteMap);
            stringList.AddRange(listOfSiteMap);
        }

        public List<string> SplitSitemap(string originalSitemap)
        {
            // Return the original sitemap unchanged if its size is within the given limit.
            List<string> sitemapSegments = new List<string>();
            if (Encoding.UTF8.GetBytes(originalSitemap).Length <= SitemapMaxSizeLimit)
            {
                sitemapSegments.Add(originalSitemap);
                return sitemapSegments;
            }

            // If not within the size limit, split it.
            StringBuilder currentSegment = new StringBuilder();
            using (StringReader stringReader = new StringReader(originalSitemap))
            using (XmlReader xmlReader = XmlReader.Create(stringReader))
            {
                // ReadOuterXml() already moves the reader past the <url> element, so Read() is
                // called only in the other branches. Calling Read() after every node would skip
                // a <url> that directly follows another one without whitespace in between.
                while (!xmlReader.EOF)
                {
                    if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "urlset")
                    {
                        if (currentSegment.Length > 0)
                        {
                            // Close the previous <urlset> tag
                            currentSegment.AppendLine("</urlset>");
                            sitemapSegments.Add(currentSegment.ToString());
                            currentSegment.Clear();
                        }
                        currentSegment.AppendLine("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>");
                        currentSegment.AppendLine("<urlset xmlns:xhtml=\"http://www.w3.org/1999/xhtml\" xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
                        xmlReader.Read();
                    }
                    else if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "url")
                    {
                        if (currentSegment.Length == 0)
                        {
                            throw new InvalidOperationException("Invalid sitemap structure");
                        }
                        string urlElement = xmlReader.ReadOuterXml();

                        // Add the new URL element plus the closing tag to a temporary sitemap
                        // and check whether its size exceeds the limit.
                        StringBuilder tempCurrentSegment = new StringBuilder();
                        tempCurrentSegment.Append(currentSegment);
                        tempCurrentSegment.AppendLine(urlElement);
                        tempCurrentSegment.AppendLine("</urlset>");
                        if (Encoding.UTF8.GetBytes(tempCurrentSegment.ToString()).Length > SitemapMaxSizeLimit)
                        {
                            // Close the previous <urlset> tag
                            currentSegment.AppendLine("</urlset>");
                            sitemapSegments.Add(currentSegment.ToString());
                            currentSegment.Clear();
                            tempCurrentSegment.Clear();

                            // Start a new <urlset> tag
                            currentSegment.AppendLine("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>");
                            currentSegment.AppendLine("<urlset xmlns:xhtml=\"http://www.w3.org/1999/xhtml\" xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
                        }
                        currentSegment.AppendLine(urlElement);
                    }
                    else
                    {
                        xmlReader.Read();
                    }
                }
            }

            if (currentSegment.Length > 0)
            {
                // Close the last <urlset> tag
                currentSegment.AppendLine("</urlset>");
                sitemapSegments.Add(currentSegment.ToString());
            }
            return sitemapSegments;
        }
    }
}
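One XmlReader detail matters for loops like the one in SplitSitemap: ReadOuterXml() leaves the reader positioned after the element's end tag, so a Read() call right after it moves one node further. The standalone sketch below shows a loop shape that visits every <url> element even when the XML has no whitespace between elements. The class name, method name and example.com URLs are made up for illustration and are not part of SXA.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class UrlElementReaderSketch
{
    // Hypothetical helper: returns the outer XML of every <url> element.
    public static List<string> ReadUrlElements(string sitemapXml)
    {
        var urls = new List<string>();
        using (var stringReader = new StringReader(sitemapXml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            while (!xmlReader.EOF)
            {
                if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.LocalName == "url")
                {
                    // ReadOuterXml() consumes the element and advances the reader,
                    // so no extra Read() is issued in this branch.
                    urls.Add(xmlReader.ReadOuterXml());
                }
                else
                {
                    xmlReader.Read();
                }
            }
        }
        return urls;
    }

    public static void Main()
    {
        // Deliberately unformatted: no whitespace between the <url> elements.
        string xml =
            "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
            "<url><loc>https://example.com/a</loc></url>" +
            "<url><loc>https://example.com/b</loc></url>" +
            "<url><loc>https://example.com/c</loc></url>" +
            "</urlset>";

        Console.WriteLine(ReadUrlElements(xml).Count); // prints 3
    }
}
```

With a plain `while (xmlReader.Read())` loop around the same ReadOuterXml() call, the second element in this unformatted input would be skipped. Indented output hides the problem because a whitespace node sits between the elements.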
Provide the following Sitecore patch config – CustomSXA.Foundation.SiteMetaData.config.
<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <services>
      <register patch:instead="register[@implementationType='Sitecore.XA.Foundation.SiteMetadata.Services.SitemapManager, Sitecore.XA.Foundation.SiteMetadata']"
                serviceType="Sitecore.XA.Foundation.SiteMetadata.Services.ISitemapManager, Sitecore.XA.Foundation.SiteMetadata"
                implementationType="CustomSXA.Foundation.SiteMetaData.Services.SitemapManager, CustomSXA.Foundation.SiteMetaData"
                lifetime="Singleton"/>
    </services>
    <settings>
      <setting name="CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit" value="50MB" />
    </settings>
  </sitecore>
</configuration>
Update CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit to fit your needs. Specify the value in KB, MB, or bytes. For bytes, provide only the numeric value. Sitecore.StringUtil.ParseSizeString(), which processes this setting, also accepts GB, so add validation in the code if you need to guard against oversized values. Also update the namespace to match your project.
Consider installing the NuGet packages from this list. Build the solution and deploy it.
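One possible form of that validation is sketched below. It assumes you want to cap the configured value at the 50MB (52,428,800 bytes) limit from the sitemap protocol and fall back to that limit for zero or negative values. The class and method names are hypothetical and not part of Sitecore or SXA.

```csharp
using System;

public static class SitemapSizeLimitSketch
{
    // 50MB as defined by the sitemap protocol: 52,428,800 bytes.
    public const long ProtocolMaxBytes = 50L * 1024 * 1024;

    // Hypothetical helper: keeps a configured byte limit within (0, 50MB].
    public static long Normalize(long configuredBytes)
    {
        if (configuredBytes <= 0)
            return ProtocolMaxBytes; // unusable value, fall back to the protocol limit

        return Math.Min(configuredBytes, ProtocolMaxBytes);
    }

    public static void Main()
    {
        Console.WriteLine(Normalize(1000));                      // 1000
        Console.WriteLine(Normalize(2L * 1024 * 1024 * 1024));   // 52428800
        Console.WriteLine(Normalize(0));                         // 52428800
    }
}
```

In the custom SitemapManager, such a helper could wrap the result of StringUtil.ParseSizeString() when the SitemapMaxSizeLimit property is initialized.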
Demo
Default SXA Sitemap.
With the custom SitemapManager code. For demo purposes, we have set the “Sitemap Index Threshold” (URL count limit) to 5 in the Sitemap setting item, and the “CustomSXA.Foundation.SiteMetaData.SitemapMaxSizeLimit” (content size limit) to 1000, i.e., 1000 bytes. Please update these settings as per your requirements.
It is also worth reading the Sitecore documentation article "Prioritize a page in the search engine sitemap" to manage sitemap settings at the page item level.
Hope this helps. Happy Sitecore Learning!