Media Indexing Approaches

Sitecore has powerful search capabilities that pay off in runtime performance. Lucene is embedded by default, with a standard set of indexes covering all databases that are refreshed automatically when content changes. LINQ-based queries make it easy to retrieve documents and filter them. So, to make a website faster, it’s wise to read data from the Lucene index instead of querying the Sitecore database.

Typical search functionality includes displaying some text and image data on a search results page with filtering and pagination. Media is an important part of the visualization of any data and this article will show how to store media in Lucene.
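
As a rough illustration of the filtering and pagination mentioned above, a LINQ query against the index might look like the sketch below. It assumes the Sitecore.ContentSearch assemblies are referenced and that the default sitecore_web_index is used; the ProductSearch class and the GetPage method are hypothetical names, not part of Sitecore.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Data;

public static class ProductSearch
{
    // Hypothetical helper: returns one page of index documents for a given template.
    public static List<SearchResultItem> GetPage(ID templateId, int pageIndex, int pageSize)
    {
        // "sitecore_web_index" is the default web index name; adjust it for a custom index.
        ISearchIndex index = ContentSearchManager.GetIndex("sitecore_web_index");

        using (IProviderSearchContext context = index.CreateSearchContext())
        {
            return context.GetQueryable<SearchResultItem>()
                .Where(result => result.TemplateId == templateId) // filtering
                .Skip(pageIndex * pageSize)                       // pagination
                .Take(pageSize)
                .ToList();                                        // executes the query against the index
        }
    }
}
```

The query is only translated and executed when ToList is called, so the filter and paging are applied by the index rather than in memory.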

Standard image field type indexing

Let’s say we have a custom index configuration and a template that is included in the crawling process. The template has an Image field, and the easiest way to index it is to add the field by field name*:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          …
          <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
            <fieldNames hint="raw:AddFieldByFieldName">
              <field fieldName="Image" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.GUID" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
              …
            </fieldNames>
          </fieldMap>
          …
        </configuration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

*The code and configuration were tested using Sitecore 8.1 Update-1.

After content is published and the index is updated, you can see the actual value in the index. If the field is missing from the indexed documents, try setting the Alt field on the media item. It may come as a surprise, but the ImageFieldReader class, which reads Image field values during crawling, puts the Alt text into the index rather than a URL or an ID:

namespace Sitecore.ContentSearch.FieldReaders
{
    public class ImageFieldReader : FieldReader
    {
        public override object GetFieldValue(IIndexableDataField indexableField)
        {
            ImageField imageField = FieldTypeManager.GetField((Field) (indexableField as SitecoreItemDataField))
                as ImageField;
            if (imageField == null)
                return (object) null;
            if (string.IsNullOrEmpty(imageField.Alt))
                return (object) null;
            else
                return (object) imageField.Alt;
        }
    }
}

Give me the URL!

When you need more control over what goes into the index, computed fields can serve the purpose. They also help when fields from different entities should be referenced under the same name, “Image.” For example, for a Product entity the field name could be Product Image, and for a Category entity it could be Category Image. Ultimately only the image URL matters, so it seems logical to store the URL in a computed field. It’s also important to ensure that the same code:

  • works across all environments
  • is independent of host name and protocol
  • returns null if the image field is not assigned, to avoid creating an empty field in the index document

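using Sitecore.ContentSearch;
using Sitecore.Data.Items;

public class Image : BaseComputedField
{
    public override object ComputeFieldValue(IIndexable indexable)
    {
        // Use "as" rather than a hard cast, so that an indexable of another type
        // yields null instead of throwing an InvalidCastException.
        var indexableItem = indexable as SitecoreIndexableItem;

        if (indexableItem == null || indexableItem.Item == null)
        {
            return null;
        }

        Item item = indexableItem.Item;

        // IsDerived and TemplateIds appear to be project-specific helpers;
        // their definitions are not shown in this article.
        if (item.IsDerived(TemplateIds.Product))
        {
            return GetMediaItemUrl(item, "Product Image");
        }

        if (item.TemplateID.Equals(TemplateIds.Category))
        {
            return GetMediaItemUrl(item, "Category Image");
        }

        // Returning null keeps the field out of the index document.
        return null;
    }
}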

where the BaseComputedField class is:

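using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Resources.Media;

// Abstract, so that derived classes can override ComputeFieldValue.
public abstract class BaseComputedField : IComputedIndexField
{
    // IComputedIndexField members; FieldName and ReturnType are set from the configuration.
    public string FieldName { get; set; }

    public string ReturnType { get; set; }

    public abstract object ComputeFieldValue(IIndexable indexable);

    protected string GetMediaItemUrl(Item item, string fieldName)
    {
        if (item != null)
        {
            var imageField = (ImageField)item.Fields[fieldName];

            // MediaItem is null when the field is empty or the linked item no longer exists.
            if (imageField != null && imageField.MediaItem != null)
            {
                return GetMediaUrl(imageField.MediaItem);
            }
        }

        return null;
    }

    protected string GetMediaUrl(MediaItem mediaItem)
    {
        // No server URL in the result, so the value does not depend on host name or protocol.
        var mediaUrlOptions = new MediaUrlOptions { AbsolutePath = false, AlwaysIncludeServerUrl = false };

        return Sitecore.StringUtil.EnsurePrefix('/', MediaManager.GetMediaUrl(mediaItem, mediaUrlOptions));
    }
}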
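
For the Image computed field to be picked up by the crawler, it also has to be registered in the index configuration. The patch below is a sketch of how that could look, assuming the Sitecore 8.x layout where computed fields are declared under documentOptions, and assuming the default Lucene index configuration is being patched. For a custom configuration like the one shown earlier, the same documentOptions section would go inside that configuration element. The field name image and the type MyProject.Search.Image, MyProject are placeholders.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>
          <documentOptions>
            <fields hint="raw:AddComputedIndexField">
              <!-- Placeholder field name, type and assembly; replace them with your own. -->
              <field fieldName="image" returnType="string">MyProject.Search.Image, MyProject</field>
            </fields>
          </documentOptions>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

After changing the configuration, the index has to be rebuilt before the new field appears in the documents.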

Non-trivial usages of media (removal case)

Once such code is out in the wild, you cannot control or predict how content editors will work with media items. Some editors may remove media items with the Leave links option. This is not a big issue because the code already handles this case: MediaItem is null for the broken link, so no field is created in the index.

But what if a content editor uses the Link to another item option? What if the link is changed to a non-media item, such as the media library root item? The code still works and generates the link /-/media/3D6658D8A0BF4E75B3E2D050FABCF4E1.ashx, but that link does not point to a media resource.

To figure out whether the linked item is really media, we can use:

1. item.Paths.IsMediaItem, which checks that the item’s path starts with /sitecore/media library/. Forget about this solution: it also returns true for items that are not media, such as folders under the media library.
2. MediaItem properties such as extension or size. This is more reliable. To apply it, let’s adjust the condition above:

if (imageField != null && imageField.MediaItem != null && ((MediaItem)imageField.MediaItem).Size > 0)

Non-trivial usages of media (attach/detach)

It’s always good practice to invest time in training a customer on content editing and to cover all the ways the Sitecore back end can be used. But a client’s behavior can still bring lots of surprises. One such example is editing the media items themselves by attaching or detaching the media file.

What happens in such a case? The value of the media URL remains the same, and there is no change in the UI either. This is due to media response caching. The default Sitecore setting that regulates it is:

<setting name="MediaResponse.MaxAge" value="7.00:00:00"/> <!-- adds max-age = 7 days to media response headers -->

Caching brings many benefits, such as faster interaction for the user and lower bandwidth usage. That’s why turning it off is not good practice. It’s time for our code to evolve, then.

Storing media item ID

To eliminate any issue with media response caching, we will generate the image link on every rendering execution. The MediaUrlOptions class has a useful property called DisableBrowserCache, which adds a ts query parameter with the revision/datetime to the image URL. That’s why there is no need to calculate and store the image URL in the index; it’s enough to store just the media item ID:

protected ID GetMediaItemId(Item item, string fieldName)
{
    if (item != null)
    {
        var imageField = (ImageField)item.Fields[fieldName];

        // Size > 0 filters out links to non-media items and to media items with no file attached.
        if (imageField != null && imageField.MediaItem != null && ((MediaItem)imageField.MediaItem).Size > 0)
        {
            return imageField.MediaID;
        }
    }

    return null;
}
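
At render time, the stored ID can then be turned into a URL. The helper below is a minimal sketch of that step, assuming the media item ID has already been read from the search result and the Sitecore.Kernel assembly is referenced. MediaUrlHelper and GetFreshMediaUrl are hypothetical names; MediaUrlOptions.DisableBrowserCache and MediaManager.GetMediaUrl are the Sitecore APIs discussed above.

```csharp
using Sitecore;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Resources.Media;

public static class MediaUrlHelper
{
    // Hypothetical helper: builds a media URL for the given media item ID
    // with the browser-cache-busting option switched on.
    public static string GetFreshMediaUrl(Database database, ID mediaItemId)
    {
        if (database == null || ID.IsNullOrEmpty(mediaItemId))
        {
            return null;
        }

        // The item may have been removed since the index was last updated.
        Item item = database.GetItem(mediaItemId);

        if (item == null)
        {
            return null;
        }

        MediaItem mediaItem = item;

        var options = new MediaUrlOptions
        {
            AbsolutePath = false,
            AlwaysIncludeServerUrl = false,
            DisableBrowserCache = true
        };

        return StringUtil.EnsurePrefix('/', MediaManager.GetMediaUrl(mediaItem, options));
    }
}
```

In a rendering this might be called as MediaUrlHelper.GetFreshMediaUrl(Sitecore.Context.Database, id), where id is the value stored in the index.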

Conclusion

It looks like we have a universal solution now which works perfectly for all known cases.

Yury Fedarovich

Yury is a Senior .NET Developer at BrainJocks with more than 8 years of experience. He is a certified Sitecore developer and ScrumMaster (CSM), as well as Microsoft certified in .NET Framework 4 and Web Applications. Being part of the BrainJocks team has given Yury the opportunity to serve as Sitecore Architect on some client projects, and to solve some complex problems. Yury never gets tired of that part of his job. Delivering innovative technical solutions that reflect best practices is one of his specialties. When he is not developing web applications, Yury might be found developing his appreciation for culture at one of Atlanta's many great museums.
