This article reviews the different ways that analytics software packages implement first-party cookies, and how those choices affect the accuracy of the data you receive from them. The solutions chosen also have interesting implications for the overall capability of each package, and these are identified and discussed as well. When contacting an analytics vendor, you should make sure that the way they implement cookies will meet the needs of your site.
Analytics packages use cookies to store session information about each user that visits your site, or to store an ID that is used as an index to access session information about the user that is stored within the application. However, because of the privacy concerns people have, cookies are subject to deletion from time to time. A small number of users, about 2%, block cookies altogether.
Cookie Overview
When you open a web page, most (but not necessarily all) of the content your browser loads is from the domain you are visiting. When the site you are currently visiting sets a cookie on your system, it is considered a first-party cookie.
However, many sites run ad banners or pull in other content such as images, or content via an iframe, from other sites. When this happens, those other sites can put a cookie on your computer. These are known as third-party cookies.
Ad networks place such cookies on your computer to track what web sites you visit across the web. The information in the cookie is used to improve the targeting of the ads delivered to you. Normally, this information is not personally identifiable, but privacy groups object to this type of use of cookies.
For this reason, third-party cookies often get deleted, commonly by anti-spyware software. Some estimates suggest that about 15% of third-party cookies get deleted. For an analytics package that relies on third-party cookies, this is a big source of error, and one worth eliminating. It is the main reason why analytics software packages use first-party cookies.
How Analytics Packages use Cookies
So what do analytics packages do with cookies? It’s a fairly straightforward process. The analytics software needs a method to track user sessions on the site. The vendor’s JavaScript runs when a page loads, and at that point it needs to decide whether this is a new user session, a continuation of a current user session, or a first-time visitor. The process goes something like this:
- Upon each load of a page check and see if the browser on the user’s machine has sent cookie data
  - If yes:
    - Store in memory the number of visits and date/time of the last click
    - Check the date/time of the last click, and see if more than 30 minutes has passed.
      - If yes:
        - Increase the number of visits in the cookie by 1
        - Write the date/time of the current visit to the cookie
      - If no:
        - Do not change the number of visits in the cookie
        - Write the date/time of the current visit to the cookie
  - If no:
    - Set the number of visits in the cookie to 1
    - Write the date/time of the current visit to the cookie
Note that this outline is just one way that cookies can be used to store data (and is based on an outline in Eric Peterson’s fine book Web Analytics Demystified). As discussed before, some analytics packages simply use the cookie as an index into an internal data store where the visitor history details are stored. But nonetheless, the basic process remains the same.
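The outline above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not any vendor's actual code: the function name `update_visit_cookie` and the cookie keys `visits` and `last_click` are made up for the example, and the cookie is modeled as a plain dictionary rather than a real browser cookie.

```python
from datetime import datetime, timedelta

# Session timeout taken from the outline above.
SESSION_TIMEOUT = timedelta(minutes=30)


def update_visit_cookie(cookie, now):
    """Return the cookie contents to write back after a page load.

    `cookie` is None when the browser sent no cookie data. Otherwise it is a
    dict with the hypothetical keys "visits" and "last_click".
    """
    if cookie is None:
        # No cookie data: first-time visitor, so the visit count starts at 1.
        return {"visits": 1, "last_click": now}

    visits = cookie["visits"]
    if now - cookie["last_click"] > SESSION_TIMEOUT:
        # More than 30 minutes since the last click: count a new visit.
        visits += 1
    # In both branches the time of the current page load is written back.
    return {"visits": visits, "last_click": now}


start = datetime(2024, 1, 1, 9, 0)
cookie = update_visit_cookie(None, start)                            # first page load
cookie = update_visit_cookie(cookie, start + timedelta(minutes=10))  # same session
cookie = update_visit_cookie(cookie, start + timedelta(minutes=50))  # 40 minutes later
print(cookie["visits"])  # 2
```

The second page load falls inside the 30-minute window, so the visit count stays at 1. The third comes 40 minutes after the previous click, so it is counted as a second visit.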
Cookies play a critical role in identifying unique visitors and sessions. Note that unique visitors is not the same thing as “users”. There is no way for a cookie to tell when more than one user uses the same machine, or one user is accessing the site from more than one machine (e.g. home and work computers). But they still offer the best technique available.
Remaining Problems with First-Party Cookies
In theory, a first-party cookie should solve much of the problem with managing sessions. We don’t have to worry that large numbers of users will start banning all cookies, because so much web functionality depends on them. However, there are two second-level issues:
- One problem occurs with first-party cookies that then communicate their data to a third party site. People who have spyware blockers, such as Ad-Aware, actively resident in memory all the time will likely be blocking all this traffic. So even though the cookie has been read, the analytics software does not get the session data it needs.
- Another key problem is that of “Multi-Domain Sites”. A Multi-Domain Site is a group of domains that the website owner wants the analytics software to treat as if it were a single site. Why would you do this? Many media companies (for example) make heavy use of multiple domains. Other sites use shopping carts or registration systems that reside on different domains. If you simply set a first-party cookie the first time a user arrives at your Multi-Domain Site, then when they go to a different domain within your Multi-Domain Site, the page on that domain will be unable to read the cookie (only the domain that sets a cookie can read it). So your analytics vendor’s JavaScript will see the visit as a new session and set a new first-party cookie, and your whole session tracking plan has fallen apart because this one visit now looks like multiple sessions.
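The Multi-Domain Site problem can be illustrated with a toy model. The sketch below assumes a simplified browser that keeps one visitor ID per domain; the `Browser` class and the domain names `news.example` and `shop.example` are hypothetical and exist only for this example.

```python
import itertools


class Browser:
    """Toy browser that keeps a separate cookie store for each domain."""

    def __init__(self):
        self.cookies = {}  # domain -> visitor ID stored for that domain
        self._ids = itertools.count(1)

    def visit(self, domain):
        # A page only sees the cookie that was set for its own domain.
        if domain not in self.cookies:
            # No cookie for this domain yet, so the visitor looks brand new.
            self.cookies[domain] = f"visitor-{next(self._ids)}"
        return self.cookies[domain]


browser = Browser()
id_news = browser.visit("news.example")
id_shop = browser.visit("shop.example")
id_news_again = browser.visit("news.example")

print(id_news == id_news_again)  # True: same domain, same cookie
print(id_news == id_shop)        # False: the second domain starts over
```

One person moving between the two domains ends up with two unrelated visitor IDs, which is exactly how a single visit turns into multiple sessions in the reports.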
Types of First-Party Cookie Implementations
There are three major methods for implementing first-party cookies that we have seen so far:
- During a user’s first visit to your site, the analytics software creates a first-party cookie on the initial domain visited and also sets a third-party cookie in its own name. If you have a Multi-Domain Site and the user visits a different domain within it, the software sets a new first-party cookie there, reads the contents of the third-party cookie, and copies them into the new first-party cookie. What does this buy you? The ability to keep the same session information synchronized across the domains that make up your site. It begins to solve the Multi-Domain Site problem, but it does not quite solve all of it. It also does not solve the problem of users who have memory-resident anti-spyware software.
- One problem with the approach above, of course, arises if the user’s computer does not allow third-party cookies. That would leave you with no solution to the Multi-Domain Site session tracking problem. If you are dealing with a well-contained case, such as an off-site shopping cart, it’s fairly easy to deal with. Some analytics packages allow you to wrap a shopping cart link in some JavaScript that adds the session information to the link. The JavaScript on the shopping cart page sees the session information on the incoming link and uses it to set a first-party cookie for the shopping cart domain that is in sync with the same session. This is a better solution to the Multi-Domain Site problem because it does not depend on third-party cookies. However, it’s not practical for sites that truly spread their content across domains. For one thing, you would not want to wrap all those links in JavaScript and complicate the processing of them by search engine robots.
- There is one more robust solution available. It involves adding a CNAME record to your DNS records (you can do this by communicating with your hosting company). A CNAME record makes one hostname an alias for another, so that lookups for the first resolve to the second. So, for example, you could point a.yourdomain.com at analyticsvendor.com. This happens at a level that is transparent to ad-blocking and anti-spyware software products. It allows the analytics software to set a cookie on a.yourdomain.com, and to treat this the same way that the third-party cookie was treated before. It’s a bit more hassle to implement, and certainly heavy duty for the non-technically initiated. But it does provide a better solution for the Multi-Domain Site problem, and it also solves the spyware blocking software problem.
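To make the link-wrapping approach from the second item above more concrete, here is a rough sketch of the idea in Python. It assumes the session information travels as query parameters on the link. The helper names `decorate_link` and `read_session`, the parameter names `sid` and `visits`, and the `cart.example` domain are all invented for illustration, and real packages may encode the data differently.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def decorate_link(url, session):
    """Append session fields to a cross-domain link as query parameters."""
    parts = urlparse(url)
    extra = urlencode(session)
    # Keep any query string the link already had.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunparse(parts._replace(query=query))


def read_session(url, keys):
    """On the destination page, recover the session fields from the URL."""
    params = parse_qs(urlparse(url).query)
    return {key: params[key][0] for key in keys if key in params}


# Session data held in the first-party cookie on the originating domain.
session = {"sid": "abc123", "visits": "3"}

link = decorate_link("https://cart.example/checkout?item=42", session)
print(link)  # https://cart.example/checkout?item=42&sid=abc123&visits=3

# The cart page reads the values and would store them in its own
# first-party cookie, keeping both domains on the same session.
cart_cookie = read_session(link, ["sid", "visits"])
print(cart_cookie == session)  # True
```

Because the session data rides along on the link itself, nothing here depends on third-party cookies. The cost is that every cross-domain link has to be decorated, which is why this suits a single shopping cart link better than a site spread across many domains.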
So you can see that there are tradeoffs between the complexity of implementation and accuracy. The CNAME record approach provides the greatest accuracy. The combined first-party / third-party cookie solution should be workable for many people, and it’s simple. The solution using JavaScript-wrapped links helps address some additional problems. But for some businesses, the highest level of accuracy will be worth the trouble of going with the CNAME approach.
Based on your analytics needs and your site structure, make sure the analytics vendor you choose offers a first-party cookie implementation that fits them.
So that’s the story with cookies and analytics. I would like to take a moment to thank the WebSideStory / Visual Sciences, Clicktracks, IndexTools, and Google Analytics teams for helping me sort this all out. In addition, my thanks to Eric T. Peterson, the author of Web Analytics Demystified, the book that started my investigation of all this, who also did a quick review of this article to make sure I didn’t screw it up!