Coveo Articles / Blogs / Perficient
https://blogs.perficient.com/category/partners/coveo/

Lessons from the Front: Indexing Content Hub in Coveo
https://blogs.perficient.com/2024/05/29/lessons-from-the-front-indexing-content-hub-in-coveo/
Wed, 29 May 2024

Intro 📖

In the new composable world, it’s common for medium to large Sitecore solutions to include a search appliance like Coveo and a digital asset management tool like Sitecore Content Hub. A typical use-case is to build search sources in Coveo that index content residing in Content Hub. Those indexes, in turn, can then be used to build front-end search experiences. In this blog post, I’d like to cover a few tips for working with the Content Hub REST API to populate search sources in Coveo. These tips are based on my experiences on a recent project that used the Content Hub REST API to index PDF documents in Coveo.

#1 – Knowing Which API to Use 🤔

Having not previously used the Content Hub REST API, I wasn’t initially aware that there are several endpoints. Here’s a quick rundown of a few of them:

Query API (GET http://<hostname>/api/entities/query/)

The Querying feature allows you to query for specific entities using specific indexed metadata fields. This basic querying is contrasted against the more elaborate search functionality offered by the M.Content API.

Scroll API (GET http://<hostname>/api/entities/scroll/)

You can use Scroll API to retrieve a large number of results (or even all results) from a single query.

It does not support the skip parameter and only lets you request the next page through the resource. You can continue paging until it no longer returns results or you have reached the last page.

SearchAfter API (POST http://<HOSTNAME>/api/entities/searchafter/)

The SearchAfter API is used to fetch multiple pages of search results sequentially. To start, a request is made for the first page of results, which includes a last_hit_data value corresponding to the last item in the page. This value is then used to fetch subsequent pages until all results are retrieved.

On this particular project, the Query API was used to pull PDFs. By design, the Query API returns a maximum of 10k results. In this case, that was okay–there were something like ~9k assets in Content Hub at the time (without any additional filtering applied). However, in order to future-proof the query a little and to avoid unnecessary processing of non-PDF documents, it made sense to make the query more specific (see #2, below 👇).

Net out: If you know you’ll need to pull 10k+ items from Content Hub and efficiently paginate through all of them, use the SearchAfter API. If your number of assets is smaller than 10k, then the Query API is probably fine. Note that the Scroll API will soon be deprecated in favor of the SearchAfter API, so it’s best to avoid the Scroll API for any new work.
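To make the SearchAfter flow a little more concrete, here's a minimal Python sketch of the loop described above. The post_page callable is hypothetical (it stands in for whatever HTTP client you use to POST to /api/entities/searchafter/), and the request body field names (query, take, last_hit_data) and the items key in the response are assumptions on my part; check them against the Content Hub documentation before relying on them.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_search_after(
    post_page: Callable[[Dict[str, Any]], Page],
    query: str,
    take: int = 100,
) -> Iterator[Any]:
    """Yield every item by passing each page's last_hit_data to the next request."""
    last_hit_data: Optional[Any] = None
    while True:
        # Assumed request body shape; not confirmed by the original post.
        body: Dict[str, Any] = {"query": query, "take": take}
        if last_hit_data is not None:
            body["last_hit_data"] = last_hit_data

        page = post_page(body)
        items = page.get("items") or []
        if not items:
            return  # no more results
        yield from items

        last_hit_data = page.get("last_hit_data")
        if last_hit_data is None:
            return  # the API gave us nothing to continue from
```

Keeping the HTTP call behind a callable means the paging logic can be exercised with canned responses before it's pointed at a real instance.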

#2 – Filtering on Media Type and Approval Status 🔮

I like to think that I can figure most things out if I read the documentation. However, when it came to updating the query to filter down to approved PDFs for indexing, it wasn’t at all clear to me how to do that. As mentioned above, the Query API is limited to 10k results and we were pretty close to that in terms of total asset count. It was important to be more selective when pulling assets such that only approved PDFs were returned.

After unsuccessfully experimenting for a while, I broke down and opened a Sitecore support ticket to ask how that could be accomplished. I got an answer…and it worked, but it wasn’t as obvious as I would have liked it to be. Who likes magic numbers? 🧙‍♂️✨

To query for PDF assets: ... AND Parent('AssetMediaToAsset').id==1057.

To ensure that only approved assets are included: ... AND Parent('FinalLifeCycleStatusToAsset').id==544.

Putting it together, the full query URL (without any ordering applied; see #3 below 👇) was:

{baseURL}/api/entities/query?query=Definition.Name=='M.Asset' AND Parent('AssetMediaToAsset').id==1057 AND Parent('FinalLifeCycleStatusToAsset').id==544
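If you'd rather not hard-code the magic numbers inline, a small helper can assemble the same filter expression. This is only a sketch: the function and constant names are made up for illustration, and the IDs are the ones from this project, so confirm them in your own instance.

```python
# IDs from this project's Content Hub instance; verify them in yours.
PDF_MEDIA_TYPE_ID = 1057
APPROVED_STATUS_ID = 544


def build_asset_filter(media_type_id: int, lifecycle_status_id: int) -> str:
    """Build the Query API filter for assets of one media type and one status."""
    clauses = [
        "Definition.Name=='M.Asset'",
        f"Parent('AssetMediaToAsset').id=={media_type_id}",
        f"Parent('FinalLifeCycleStatusToAsset').id=={lifecycle_status_id}",
    ]
    return " AND ".join(clauses)


approved_pdf_filter = build_asset_filter(PDF_MEDIA_TYPE_ID, APPROVED_STATUS_ID)
```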

In other words:

Give me all assets whose file type is PDF and whose approval status is approved.

Now, I think these IDs are common across all Content Hub instances but, just in case, please make sure they match the appropriate values in your Content Hub instance prior to using the same IDs in your queries. You can find the asset media type IDs under Taxonomy Management in Content Hub:

Asset media types in Content Hub’s Taxonomy Management interface.

#3 – Sorting 🔼

When you’re building a REST API source in Coveo with the intention of iterating through hundreds or thousands of assets in Content Hub, it’s best to return them in a consistent order. At one point during the troubleshooting of some indexing issues, Coveo support suggested that the Content Hub API was returning results in an inconsistent order and that that was potentially a contributing factor. While that was never conclusively shown to be the case, it does make sense to apply a sort, even if only to ensure assets are processed in a specific, predictable order.

The query was updated to sort on createdOn ascending (oldest first); the updated query URL looked like this:

{baseURL}/api/entities/query?query=Definition.Name=='M.Asset' AND Parent('AssetMediaToAsset').id==1057 AND Parent('FinalLifeCycleStatusToAsset').id==544&sort=createdOn&order=Asc

Interestingly enough, I found that created_on worked, too, but, according to Sitecore support, createdOn should be used instead.
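For testing the sorted query outside of Coveo (from a script, say), the URL needs to be percent-encoded. Here's a minimal sketch using only the Python standard library; build_query_url is a hypothetical helper name, and the base URL in the usage line is a placeholder.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url: str, filter_expr: str,
                    sort: str = "createdOn", order: str = "Asc") -> str:
    """Return a Query API URL with the filter, sort field, and order encoded."""
    params = {"query": filter_expr, "sort": sort, "order": order}
    # quote (rather than the default quote_plus) encodes spaces as %20;
    # parentheses are left as-is to keep the URL readable.
    return f"{base_url}/api/entities/query?" + urlencode(
        params, safe="()", quote_via=quote
    )


url = build_query_url(
    "https://contenthub.example",  # placeholder host
    "Definition.Name=='M.Asset' AND Parent('AssetMediaToAsset').id==1057",
)
```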

#4 – Paging 📃

REST API sources in Coveo will almost always be configured to paginate through the results coming from the external API, otherwise only the first page’s worth of data will be processed and indexed. It’s important to ensure paging is configured correctly to allow for reasonable index rebuild and rescan times, too. In this case, using the Query API, and with a page size of 25 items per page, the paging configuration section in the Coveo REST API source looked like this:

...
"paging": {
  "pageSize": 25,
  "offsetType": "url",
  "nextPageKey": "next.href",
  "parameters": {
    "limit": "take"
  },
  "totalCountKey": "total_items"
},
...

The corresponding paging properties as returned in the Query API response (for the first page) looked like this:

{
  "items": [ ... ],
  "total_items": 12345,
  "returned_items": 25,
  "next": {
    "href": "https://{baseURL}/api/entities/query?skip=25&take=25&query=Definition.Name%3D%3D%27M.Asset%27%20AND%20Parent(%27AssetMediaToAsset%27).id%3D%3D1057%20AND%20Parent(%27FinalLifeCycleStatusToAsset%27).id%3D%3D%20544&sort=createdOn&order=Asc",
    ...
  },
  ...
}

Note that the paging configuration may need to change if you’re using a different Content Hub API endpoint. For more information about configuring paging in Coveo REST API sources, refer to the official documentation.
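To sanity-check the paging outside of Coveo, the same "follow next.href until it's gone" idea can be sketched in a few lines of Python. To be clear, this is not how Coveo implements paging internally; it's just an illustration of what the configuration above asks for. The fetch callable is hypothetical and stands in for an authenticated GET that returns the parsed JSON response.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Set


def iter_linked_pages(
    fetch: Callable[[str], Dict[str, Any]],
    first_url: str,
) -> Iterator[Any]:
    """Yield items from every page, following next.href until it is absent."""
    url: Optional[str] = first_url
    seen: Set[str] = set()  # guards against a next link that loops back
    while url and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get("items", [])
        url = (page.get("next") or {}).get("href")
```

Counting the yielded items and comparing against total_items from the first page is a quick way to confirm nothing is being skipped.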

#5 – File Size Can Affect Document Properties in Extensions 🏋️‍♂️

In Coveo, the maximum size for a single item is approximately 256 MB (reference). That number includes the item’s permissions, metadata, and content. For larger files, the content isn’t indexed, just the metadata. This limit came to light indirectly on this recent project.

While outside the scope of this post, Coveo supports extensions that can be attached to search sources. Extensions are bits of Python code that Coveo will run in the context of each document while processing the source. On this project, an extension was used to do things like conditionally reject (skip indexing) documents, set metadata fields based on other properties, etc. At one point, the extension attempted to resolve the extension (file type) for the document using the following code:

filetype = document.get_meta_data_value("detectedfiletype")[0]

For any documents at or below the maximum size, the filetype variable had the expected value: "pdf". For any documents above the maximum size, the variable had a generic value that, while non-empty, was not the expected file type. Because the document was too large, the document object available within the extension didn’t have the expected values, including detectedfiletype. The extension’s logic didn’t account for this case, so it broke for those large files.

Upon further investigation of the PDFs in Content Hub, it was noted that, of the 10 or so that consistently exhibited indexing issues, all of them were 300+ MB in size.

For more information on indexing pipeline extensions (IPE), please see Indexing pipeline extension overview.

Net out: If you’re using an extension on a source and you’re noticing that the document object has one or more properties that aren’t returning what you’d expect to see, double-check to ensure that the underlying document isn’t > 256 MB and that you aren’t trying to access properties within the extension that will never correctly resolve.
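One way to guard against this in an extension is to treat the detected file type as untrusted and fall back explicitly when it isn't a value you expect. The sketch below is an assumption-laden illustration, not the code from the project: resolve_filetype and the expected tuple are hypothetical, and the only Coveo call used is the same get_meta_data_value shown earlier.

```python
def resolve_filetype(document, expected=("pdf",)):
    """Return the detected file type if it is one we expect, otherwise None."""
    values = document.get_meta_data_value("detectedfiletype")
    if not values:
        return None  # metadata missing entirely
    filetype = str(values[0]).lower()
    # Oversized documents may carry a generic, non-empty value here,
    # so only accept values we explicitly know how to handle.
    return filetype if filetype in expected else None
```

In the extension itself, a None result can then be handled deliberately (for example, by skipping the file-type-specific logic) instead of letting later code run on a value it wasn't written for.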

 

Thanks for the read! 🙏

 

Coveo Recognizes Perficient Colleagues as MVPs in 2024
https://blogs.perficient.com/2024/03/14/coveo-recognizes-perficient-colleagues-as-mvps-in-2024/
Thu, 14 Mar 2024

Perficient Receives Seven “Most Valuable Professional” Recognitions

We are proud to share our inclusion in the Coveo MVP Program, with seven of our own selected for the recognition. The Coveo MVP program recognizes individuals for their invaluable contributions and expertise within the Coveo ecosystem.

To be considered for recognition as a Coveo MVP, one must possess a deep understanding of the platform and its business value and demonstrate strong thought leadership in the community. In addition, individuals must deploy quality implementations that enhance adoption and deliver long-term success for customers using Coveo.

Join us in congratulating our MVPs:

  1. Eric Immermann, Director
  2. Kristofer Quinn, Senior Technical Architect
  3. Ryan Weeber, Lead Technical Consultant
  4. Rohit Patidar, Lead Technical Consultant
  5. Fernando Rodriguez Rojas, Lead Technical Consultant
  6. William Kirkconnell, Technical Consultant
  7. Zachary Fischer, Senior Solutions Architect

First-time MVP recipient and Lead Technical Consultant Ryan Weeber shared his excitement about his inclusion:

“I am thrilled to be recognized for the first time as a Coveo MVP! It’s such an honor to be included among this incredibly talented group of professionals.”

Eric Immermann, Coveo practice director, celebrates his fourth consecutive inclusion in the Coveo MVP program and reflects on his team’s success:

“Being recognized once more as a Coveo MVP remains a fantastic acknowledgment of the expertise and hard work my team and I contribute towards ensuring our clients fully benefit from and use the Coveo platform effectively.”

 

Coveo at Perficient

Our Coveo practice helps brands design, architect, and implement modern intelligent search solutions that empower users to be more successful in delivering a winning customer experience. As a Platinum partner and two-time Accelerator Award recipient, we pride ourselves on providing unique, innovative solutions to our clients.

We would like to extend a special thanks to our team and their continued dedication to demonstrating key thought leadership and technical expertise to grow the Perficient + Coveo partnership. To learn more about Perficient’s Coveo solutions, visit our Coveo partner page, subscribe to Perficient’s blog and follow us on LinkedIn and Twitter.

Perficient Wins 2023-24 Coveo Relevance Accelerator Award
https://blogs.perficient.com/2024/01/25/perficient-wins-2023-24-coveo-relevance-accelerator-award/
Thu, 25 Jan 2024

Perficient is excited to announce we’ve won Coveo’s exclusive partner award! The Accelerator Award commemorates a Coveo partner that exhibits deep knowledge and technical expertise, understands its customers’ business challenges, and clearly delivers value-driven business outcomes.

Perficient is a trusted Coveo Platinum Partner with expertise in modern intelligent search solutions. Our award-winning work helps organizations provide their customers and employees with relevant search results and recommendations while increasing revenue, boosting conversion rates, and improving employee efficiency.

This award marks the second time Perficient has received a Coveo partner award. These awards are a testament to the incredible work our teams deliver using the Coveo platform to accelerate outcomes for our clients. Huge shoutout to Perficient’s own Eric Immermann, Director, Enterprise Search, and Kyla Faust, Alliance Manager, for their investment in the partnership, as well as the extended team for their continuous collaboration in making our joint customers successful.

“We’re honored to be recognized by Coveo for the delivery of value-driven business outcomes,” said Eric Immermann. “Coveo serves as a powerful platform as customers demand more personalized and conversational experiences. In partnering with them, Perficient can deploy the latest advancements in AI and search to our clients.”

Our cross-technology and platform expertise enables us to seamlessly integrate intelligent search with a variety of enterprise applications to unlock the value of information and transform your business. We leverage the best features across industry-leading platforms to provide innovative solutions and drive outcomes that meet the unique needs of each client. With Coveo, you can expect to see tangible results such as higher productivity, improved customer satisfaction, and increased revenue.
