In my earlier post, User Admin Console (Classic UI) & Crashing Browsers, I discussed how business analysts and administrators can make use of queries to debug user and group-related issues in AEM. In relation to that topic, I noticed we have a business requirement to add users to AEM, but we do not have a requirement to remove users from the repository.
Why do we need to remove users from the Java Content Repository (JCR)?
The answer is simple: we keep adding new users to the JCR, so the repository grows over time and the number of child nodes under the “/home/users” path keeps increasing. Queries that run against the user path, and traversals of its child nodes, get slower as a result. To improve their performance, we can remove unwanted nodes. These are ghost user nodes.
What creates ghost user nodes?
- After registering on the site, a user does not return for an extended period of time. The user node stays in the repository as a ghost.
- A corporate portal hosted in AEM stores employee information as user nodes. When an employee leaves the company, the user node created in AEM remains as a ghost user node.
- A user who registered on the website may forget their credentials and create a new account. When this happens, the old account remains in the repository as a ghost user node.
Of course, depending on the application, there may be different scenarios unique to that application that cause the creation of ghost user nodes.
How to solve the repository growth caused by ghost user nodes
“Exorcism” on the server? Not exactly. There is a simple solution for removing the unused user nodes.
Factors to consider:
- Avoid deleting system users and out-of-the-box (OOTB) users.
- Look closely at users who have not logged in for more than six months.
- AEM is often not the single source of user data.
- If AEM is the single source of user data, consider your corporate policy (in some cases, user data cannot be deleted).
- Remove authors who have left the company or who no longer work in AEM.
- Consider any individual users who are important to the system and company; do not remove them.
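The factors above can be sketched as a small pre-filter that produces a list of candidates for review before anything is deleted. This is only an illustration under stated assumptions: the record fields (`id`, `path`, `last_login`), the 180-day threshold standing in for "six months", and the `keep_ids` parameter are hypothetical, and the user data would have to come from your own export or external system.

```python
from datetime import datetime, timedelta, timezone

# Accounts that should never be proposed for deletion (assumed list; extend as needed).
PROTECTED_IDS = {"admin", "anonymous", "replication-agent"}
SYSTEM_PATH_PREFIX = "/home/users/system/"


def select_ghost_candidates(users, now, max_idle_days=180, keep_ids=frozenset()):
    """Return users that look like ghost accounts under the factors above.

    Each user is a dict with the assumed keys 'id', 'path' and 'last_login'
    (a timezone-aware datetime, or None when no login is recorded).
    """
    cutoff = now - timedelta(days=max_idle_days)
    candidates = []
    for user in users:
        if user["path"].startswith(SYSTEM_PATH_PREFIX):
            continue  # never touch system users
        if user["id"] in PROTECTED_IDS or user["id"] in keep_ids:
            continue  # OOTB accounts and users the business wants to keep
        last_login = user["last_login"]
        if last_login is None:
            continue  # no login data: leave for manual review
        if last_login < cutoff:
            candidates.append(user)
    return candidates


if __name__ == "__main__":
    now = datetime(2017, 8, 20, tzinfo=timezone.utc)
    sample = [
        {"id": "jdoe", "path": "/home/users/j/jdoe",
         "last_login": datetime(2016, 11, 1, tzinfo=timezone.utc)},
        {"id": "admin", "path": "/home/users/a/admin",
         "last_login": datetime(2015, 1, 1, tzinfo=timezone.utc)},
    ]
    for user in select_ghost_candidates(sample, now):
        print(user["id"])
```

The function only selects candidates; whether a candidate may actually be deleted still depends on your corporate policy and on which system owns the user data.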
What are the options to delete?
- If the user source is an external system, obtain the list of users who have not logged into AEM in the last six months. Next, run a script to remove those users from AEM publish servers.
- If the user source is LDAP, use the purgeOrphanedUsers() method of the “org.apache.jackrabbit.oak: External Identity Synchronization Management (UserManagement)” JMX service to remove ghost users from the repository.
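- Create a JCR query to fetch all users who have not logged into AEM for a long period of time, using the factors above as query conditions. The example below selects externally synced users (those with rep:externalId set) that were created before a given date, and excludes system users and a few built-in accounts: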
SELECT * FROM [rep:User] AS s WHERE ISDESCENDANTNODE([/home/users]) AND NOT ISDESCENDANTNODE([/home/users/system]) AND NOT s.[rep:authorizableId] IN ('admin', 'anonymous', 'replication-agent') AND (s.[jcr:created] < CAST('2017-02-20T18:26:49.328-06:00' AS DATE)) AND s.[rep:externalId] IS NOT NULL
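Rather than hard-coding the cutoff timestamp, the query string can be generated with a date computed relative to today. The following is a minimal sketch, not part of AEM: the function name `build_ghost_user_query`, its parameters, and the 180-day default are assumptions for illustration, and the result still has to be run through whatever query tool you normally use.

```python
from datetime import datetime, timedelta, timezone


def build_ghost_user_query(now, max_age_days=180,
                           excluded_ids=("admin", "anonymous", "replication-agent")):
    """Build the JCR-SQL2 query above with a cutoff relative to 'now'.

    'now' must be a timezone-aware datetime so that the ISO 8601 string
    carries a UTC offset, as in the hand-written query.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="milliseconds")
    # Escape single quotes by doubling them, as in SQL string literals.
    id_list = ", ".join("'{}'".format(uid.replace("'", "''")) for uid in excluded_ids)
    return (
        "SELECT * FROM [rep:User] AS s "
        "WHERE ISDESCENDANTNODE([/home/users]) "
        "AND NOT ISDESCENDANTNODE([/home/users/system]) "
        "AND NOT s.[rep:authorizableId] IN ({ids}) "
        "AND (s.[jcr:created] < CAST('{cutoff}' AS DATE)) "
        "AND s.[rep:externalId] IS NOT NULL"
    ).format(ids=id_list, cutoff=cutoff)


if __name__ == "__main__":
    print(build_ghost_user_query(datetime.now(timezone.utc)))
```

Generating the string keeps the threshold in one place, so the same script can be rerun periodically without editing the date by hand.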
Note: Before running this solution in a production environment, have a comprehensive plan in place, including a detailed backup strategy (should you need it).
What are some scenarios you’ve run into that have caused the creation of ghost user nodes? Did you have issues when removing them? Feel free to leave a comment below.