Large-scale data breaches and critical security vulnerabilities have companies thinking about security more than ever. Many developers are familiar with the OWASP Top 10 (https://owasp.org/www-project-top-ten/), and there are already many resources on generic mitigations for these vulnerabilities. In this series, I instead cover security issues and mitigations specific to AEM. Today’s topic is Denial of Service (DoS) vulnerabilities.
Denial of Service Vulnerabilities
Ever visit a popular new website only to find that it has crashed or is throwing errors? The site's visitors may have unintentionally caused a denial of service by overloading the server's processor or memory. Malicious actors can cause the same problem intentionally by sending a large number of requests to a server.
User-triggered creation of JCR nodes
DoS attack prevention starts with the business requirements and is fortified through technical design. Be very careful when designing any service, even an authenticated one, that creates new JCR nodes as part of its behavior. Malicious users may be able to exploit this behavior by flooding the JCR with node writes. Worse still, they may be able to achieve remote code execution by uploading JSP servlet files, OSGi configurations, or OSGi bundles, or to plant persistent XSS payloads. Therefore, apps should never programmatically create new JCR nodes under the /apps path. This is also good preparation for a later move to AEM as a Cloud Service, where /apps is completely immutable after deployment.
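One way to enforce the "never write under /apps" rule is to check every user-influenced target path against an allow-list before touching the repository. The following is a minimal sketch in plain Java, not an AEM or Sling API: the class name `JcrWritePathGuard`, the method `isWriteAllowed`, and the single allowed root are all assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical guard that only permits JCR writes under allow-listed roots. */
final class JcrWritePathGuard {

    // Assumed allow-list; a real project would load this from configuration.
    private static final List<String> ALLOWED_ROOTS = List.of("/content/usergenerated/");

    private JcrWritePathGuard() {
    }

    /** Returns true only for absolute paths without "." or ".." segments that sit under an allowed root. */
    static boolean isWriteAllowed(String path) {
        if (path == null || !path.startsWith("/")) {
            return false;
        }
        // Reject relative segments so "/content/usergenerated/../../apps" cannot escape the root.
        for (String segment : path.split("/")) {
            if (segment.equals("..") || segment.equals(".")) {
                return false;
            }
        }
        // Append a trailing slash so "/content/usergenerated-evil" does not match the root prefix.
        String normalized = path.endsWith("/") ? path : path + "/";
        for (String root : ALLOWED_ROOTS) {
            if (normalized.startsWith(root)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the check is an allow-list rather than a deny-list, /apps and any other path not explicitly listed are rejected by default. A check like this complements ACLs; it does not replace them.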
If all this isn’t considered from the design stage, you may face massive refactoring and data access rewrites later on. Consider this example. You have a public API within your application that writes to the JCR every time it receives a request. A malicious user could quickly hit that endpoint thousands of times, creating a massive write queue in your system. So, how do you fix it? Do you remove all public access while you try to sort out the vulnerability? That may create a frustrating experience for legitimate users. Instead, let’s examine some practical prevention measures.
- Double-check your ACLs for anonymous write access on /content/usergenerated or other paths. Consider removing unauthenticated access to generate content.
- Utilize rate limiting by IP, user, and/or API key. IP rate limits can be partly circumvented by malicious users with a botnet or VPN, while user and API key rate limits do nothing for public APIs or page loads that require no credentials. So, it’s ideal to implement a combination of them for full coverage.
- Consider implementing audit logging and rate alerts for any authenticated actions. This will allow you to quickly identify and revoke access from any users who have turned malicious.
- Avoid JCR writes and large in-memory processing where possible. Set restrictions on the amount and type of content a user can upload.
- Utilize vulnerability scanning tools like SonarQube and Checkmarx. They will typically catch vulnerabilities where the user is able to pass arbitrarily sized or unvalidated parameters to your application.
- Perform load testing for new services – a high number of calls should not significantly degrade the rest of the application.
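The rate limiting measure above can be sketched as a fixed-window counter per key. This is a minimal, self-contained illustration in plain Java, not an AEM feature: the class `FixedWindowRateLimiter`, its `tryAcquire` method, and the "ip:" / "key:" key prefixes mentioned below are hypothetical, and in practice limits are often enforced earlier, at the CDN, dispatcher, or web server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical fixed-window rate limiter keyed by an arbitrary string (IP, user ID, API key). */
final class FixedWindowRateLimiter {

    /** Immutable request counter for one key within one time window. */
    private static final class Window {
        final long startMillis;
        final long count;

        Window(long startMillis, long count) {
            this.startMillis = startMillis;
            this.count = count;
        }
    }

    private final int maxRequests;
    private final long windowMillis;
    private final LongSupplier clock; // injectable time source, e.g. System::currentTimeMillis
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int maxRequests, long windowMillis, LongSupplier clock) {
        if (maxRequests < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("maxRequests and windowMillis must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Counts one request for the key and returns true while the key is within its limit. */
    boolean tryAcquire(String key) {
        long now = clock.getAsLong();
        // compute() runs atomically per key, so concurrent requests are counted correctly.
        Window updated = windows.compute(key, (k, current) -> {
            if (current == null || now - current.startMillis >= windowMillis) {
                return new Window(now, 1); // first request, or the old window has expired
            }
            return new Window(current.startMillis, current.count + 1);
        });
        return updated.count <= maxRequests;
    }
}
```

To combine limits, a request handler could require both checks to pass, for example `limiter.tryAcquire("ip:" + remoteAddress) && limiter.tryAcquire("key:" + apiKey)`. Note that this sketch never evicts old keys; a production version would need to expire stale entries so the limiter's own map cannot be grown without bound by an attacker.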
Heavy processing services / jobs
Many use cases exist for jobs to run on AEM environments. These are typically started via an admin content page, API endpoint, Sling scheduler, Sling scheduled jobs, or a combination of these. Ideally, they kick off asynchronously processed tasks via Sling Jobs, an AEM workflow, or Adobe I/O rather than consuming processing power immediately. If you use Sling Jobs and AEM Workflows, you can limit the instance to a maximum number of processor cores, reducing the total potential load on the server. The default configuration is half the number of available cores, so it may be useful to lower this number in applications with constant asynchronous processing. Also, it’s important to note that Adobe recommends never increasing the configuration above half the number of total cores.
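The idea of capping asynchronous work at half the available cores can be illustrated with a plain `java.util.concurrent` thread pool. This is an analogy only, not how Sling Jobs or AEM Workflows are configured (those limits are set through their own OSGi configurations); the class `BoundedJobRunner`, its method names, and the queue capacity parameter are assumptions for this sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that caps background workers at half the available cores. */
final class BoundedJobRunner {

    private BoundedJobRunner() {
    }

    /** Half the cores, rounded down, but always at least one worker. */
    static int maxWorkers(int availableCores) {
        return Math.max(1, availableCores / 2);
    }

    /**
     * Creates a fixed-size pool with a bounded queue. Once every worker is busy and the
     * queue is full, further submissions are rejected instead of piling up in memory.
     */
    static ThreadPoolExecutor create(int availableCores, int queueCapacity) {
        int workers = maxWorkers(availableCores);
        return new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```

A caller would typically pass `Runtime.getRuntime().availableProcessors()` as `availableCores`. The bounded queue is the important design choice here: an unbounded queue would let a flood of job requests accumulate in memory, which is itself a denial of service risk.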
In any case, I highly recommend that heavy processing be protected by authentication and ACLs and be accessible only from author environments. The resulting data can then flow out to a common data store or, in non-cloud environments, replicate to publishers as needed. These days the replication strategy is less common, because large I/O writes cause processor churn and leave a backlog for later JCR compaction.
Restricting JCR writes or who has access to a service is typically a conversation that should start in early solutioning and business requirements. Don’t wait until technical implementation to have this discussion! Stakeholders may need to budget additional funding for a shared data store, or entirely rethink how to provide public services, leading to project delays or failures. Build key stakeholder relationships and design against attacks early. Remember, perfect implementation on its own cannot fix a flawed design.
For more information on how Perficient can help you achieve your AEM Security goals and implement your dream digital experiences, we’d love to hear from you.
Contact Perficient to start your journey.