Very often we take frameworks for granted. Hystrix with the Javanica annotations is one such framework. It can look as simple as putting @HystrixCommand on a method, yet it can easily spiral into a complex performance or resource-depletion issue.
Let’s go over the various behaviors of Hystrix, such as:
- Circuit is open
- Circuit is closed
- Fallback semaphore count reached
- Hystrix timing out before the socket read timeout
- Semaphore and thread pool behavior when Hystrix times out before the socket read timeout

Here is how a call to a method annotated with @HystrixCommand flows:
- The annotated method is called. The command can use either semaphore or thread pool isolation; note that the two behave differently, as discussed later.
- Spring AOP (Aspect-Oriented Programming) intercepts the annotated method and runs a number of Hystrix-related checks and metrics, including checking whether the circuit is open.
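
Javanica does this interception with an aspect. As a rough, dependency-free illustration of the same idea, the sketch below swaps in a JDK dynamic proxy instead. `GuardedCall` (standing in for `@HystrixCommand`), `PriceService`, `GuardInterceptor` and `GuardDemo` are hypothetical names made up for this example, and a plain `AtomicBoolean` stands in for the circuit breaker's state; real Hystrix tracks far more (metrics, health counts, timeouts).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for @HystrixCommand: only the fallback method name is modeled.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GuardedCall {
    String fallbackMethod();
}

// Hypothetical service used only for this illustration.
interface PriceService {
    @GuardedCall(fallbackMethod = "cachedPrice")
    String livePrice();

    String cachedPrice();
}

// Intercepts every call on the proxy, the way an aspect intercepts annotated methods.
final class GuardInterceptor implements InvocationHandler {
    private final Object target;
    private final AtomicBoolean circuitOpen; // simplified circuit state, not Hystrix's breaker

    GuardInterceptor(Object target, AtomicBoolean circuitOpen) {
        this.target = target;
        this.circuitOpen = circuitOpen;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        GuardedCall guard = method.getAnnotation(GuardedCall.class);
        if (guard != null && circuitOpen.get()) {
            // Circuit open: skip the real call and go straight to the named fallback.
            Method fallback = method.getDeclaringClass()
                    .getMethod(guard.fallbackMethod(), method.getParameterTypes());
            return fallback.invoke(target, args);
        }
        return method.invoke(target, args); // circuit closed, or method not annotated
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, AtomicBoolean circuitOpen) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new GuardInterceptor(target, circuitOpen));
    }
}

final class GuardDemo {
    static String callLivePrice(boolean circuitOpen) {
        PriceService real = new PriceService() {
            public String livePrice() { return "live"; }
            public String cachedPrice() { return "cached"; }
        };
        PriceService guarded = GuardInterceptor.wrap(PriceService.class, real, new AtomicBoolean(circuitOpen));
        return guarded.livePrice();
    }

    public static void main(String[] args) {
        System.out.println(callLivePrice(false)); // live
        System.out.println(callLivePrice(true));  // cached
    }
}
```

The caller only ever invokes `livePrice()`; whether the real method or the fallback runs is decided entirely inside the interceptor, which is why the annotation looks deceptively simple from the outside.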

If the circuit is open:
- Check whether the fallback's concurrency limit has been reached (11). Hystrix bounds concurrent fallbacks with a semaphore, whichever isolation strategy the command itself uses.
- If it has been reached, an exception is thrown stating that the fallback method could not be executed.
- Otherwise, the fallback method is executed.
- The fallback.isolation.semaphore.maxConcurrentRequests property sets the maximum number of concurrent fallback executions (it defaults to 10).
- Set this property explicitly, preferably equal to the command's thread pool size or semaphore count.
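
A simplified model of this check in plain Java, with no Hystrix dependency: the fallback limit behaves like a counting semaphore that is tried but never waited on. `FallbackLimiter` and `FallbackLimiterDemo` are hypothetical names, and the rejection exception type and message belong to this sketch, not to Hystrix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical model of the fallback limit: a semaphore that is tried, never waited on.
final class FallbackLimiter {
    private final Semaphore permits;

    // maxConcurrentFallbacks plays the role of fallback.isolation.semaphore.maxConcurrentRequests.
    FallbackLimiter(int maxConcurrentFallbacks) {
        this.permits = new Semaphore(maxConcurrentFallbacks);
    }

    <T> T runFallback(Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            // Limit reached: reject immediately instead of queueing the caller.
            throw new RejectedExecutionException("fallback execution rejected (model)");
        }
        try {
            return fallback.get();
        } finally {
            permits.release();
        }
    }
}

final class FallbackLimiterDemo {
    // Fills every permit with a slow fallback, then tries one more; returns true if it was rejected.
    static boolean extraFallbackRejected(int limit) throws InterruptedException {
        FallbackLimiter limiter = new FallbackLimiter(limit);
        CountDownLatch inside = new CountDownLatch(limit);
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService callers = Executors.newFixedThreadPool(limit);
        for (int i = 0; i < limit; i++) {
            callers.execute(() -> limiter.runFallback(() -> {
                inside.countDown();
                try {
                    release.await(); // hold the permit until the check below is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return "slow fallback";
            }));
        }
        inside.await(); // every permit is now held
        boolean rejected;
        try {
            limiter.runFallback(() -> "one too many");
            rejected = false;
        } catch (RejectedExecutionException e) {
            rejected = true;
        }
        release.countDown();
        callers.shutdown();
        callers.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("extra fallback rejected: " + extraFallbackRejected(2));
    }
}
```

Because the semaphore is only tried, a burst of fallbacks above the limit fails fast rather than piling up callers, which is why the limit is worth sizing deliberately.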

If the circuit is closed:
- If a thread is available to run the Hystrix command, it is taken from the pool and runs the command.
- If the thread pool is exhausted, Hystrix rejects the call with an exception whose message reports the number of threads in use and the size of the thread pool.
- In that case, the call follows the same fallback steps as when the circuit is open.
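
A plain-Java sketch of the rejection path, assuming a fixed-size pool with no queue (a `SynchronousQueue`, which is what Hystrix uses by default when `maxQueueSize` is -1). `CommandPoolDemo` and its methods are hypothetical; the rejection here comes straight from the JDK's `ThreadPoolExecutor`, whose default rejection message includes the pool size and the number of active threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Hystrix code.
final class CommandPoolDemo {
    // Fixed-size pool with no queue: a new task either gets an idle thread or is rejected.
    static ThreadPoolExecutor newCommandPool(int coreSize) {
        return new ThreadPoolExecutor(coreSize, coreSize, 1, TimeUnit.MINUTES, new SynchronousQueue<>());
    }

    // Runs the command on the pool; a full pool takes the same fallback path as an open circuit.
    static String execute(ThreadPoolExecutor pool, Callable<String> command, String fallback)
            throws InterruptedException {
        try {
            return pool.submit(command).get();
        } catch (RejectedExecutionException e) {
            // The JDK message includes "pool size = ..." and "active threads = ...".
            System.out.println("rejected: " + e.getMessage());
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        }
    }

    // Occupies the only thread, then sends a second command.
    static String secondCallWithBusyPool() throws InterruptedException {
        ThreadPoolExecutor pool = newCommandPool(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        pool.execute(() -> { // first command: holds the single thread
            started.countDown();
            try {
                finish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        started.await();
        String result = execute(pool, () -> "response", "fallback");
        finish.countDown();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondCallWithBusyPool());
    }
}
```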

If the circuit is closed and a thread is available to run the command:
- Check whether the HTTP connection pool has a free connection (what this post calls an HTTP thread) for our REST exchange call.
- If one is available, make a connection within CONNECTION_TIMEOUT. If the connection cannot be established in time, an exception is thrown and execution moves to the fallback.
- Once connected, execute the REST call and wait until either the complete response arrives or SOCKET_TIMEOUT elapses (10).
- Note that the socket timeout is the maximum time between consecutive packets on the wire, not the total time of the HTTP call.
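
To see the difference between the two timeouts with nothing but the JDK: `Socket.connect(address, timeout)` bounds only the connection phase, while `Socket.setSoTimeout` bounds each blocking `read()`. The sketch below assumes CONNECTION_TIMEOUT and SOCKET_TIMEOUT map onto these two settings, as they commonly do in Java HTTP clients; `SocketTimeoutDemo` is a hypothetical name, and the local server only simulates a slow response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo of connect timeout vs. per-read (socket) timeout.
final class SocketTimeoutDemo {
    // Reads from a local server that sends `bytes` single bytes, pausing `gapMs` before each.
    // Returns the total read time in ms, or throws SocketTimeoutException if one gap exceeds readTimeoutMs.
    static long readSlowStream(int bytes, long gapMs, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread writer = new Thread(() -> {
                try (Socket peer = server.accept(); OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < bytes; i++) {
                        Thread.sleep(gapMs);
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException e) {
                    // The client gave up (for example after its read timeout); nothing to clean up here.
                }
            });
            writer.setDaemon(true);
            writer.start();

            try (Socket client = new Socket()) {
                // Connection timeout: bounds only the connect phase.
                client.connect(new InetSocketAddress(loopback, server.getLocalPort()), connectTimeoutMs);
                // Read (socket) timeout: bounds each blocking read(), not the whole exchange.
                client.setSoTimeout(readTimeoutMs);
                InputStream in = client.getInputStream();
                long start = System.nanoTime();
                while (in.read() != -1) {
                    // each read() call gets a fresh readTimeoutMs
                }
                return (System.nanoTime() - start) / 1_000_000;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 5 gaps of 200 ms: the total is well over 600 ms, yet no single read waits 600 ms.
        System.out.println("read took " + readSlowStream(5, 200, 1000, 600) + " ms");
    }
}
```

With five bytes spaced 200 ms apart and a 600 ms read timeout, the read takes about a second in total without ever timing out, while a single 800 ms pause against a 300 ms read timeout throws `SocketTimeoutException`.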

While the HTTP thread is blocked on the socket read, the Hystrix timeout can expire first. In this case:
- Hystrix runs the fallback on one of its timer threads. This is not the thread that made the call, and it does not carry that thread's ThreadContext data, so if you use the Log4j ThreadContext, expect inconsistencies in your logging (8).
- With thread pool isolation, execution moves to the fallback and returns to the client immediately.
- With semaphore isolation, however, the call does not return to the client immediately! The command runs on the client's own thread, which is still blocked on the socket read, so the fallback result reaches the client only after that read completes.
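
A simplified model of these three points in plain Java. It is not Hystrix's implementation: `TimeoutIsolationModel` is a hypothetical class, a `CompletableFuture` stands in for the command's result, a single scheduled "timer" thread plays Hystrix's timer, a `Thread.sleep` plays the blocked socket read, and a `ThreadLocal` stands in for the Log4j ThreadContext.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a Hystrix timeout under thread pool vs. semaphore isolation.
final class TimeoutIsolationModel {
    // Stand-in for Log4j's ThreadContext, which is ThreadLocal-based by default.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    private static final ExecutorService COMMAND_POOL = Executors.newCachedThreadPool(daemon("command"));
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(daemon("timer"));

    private static ThreadFactory daemon(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }

    // What the client got back and how long the client thread was blocked.
    static final class Outcome {
        final String value;
        final long elapsedMs;

        Outcome(String value, long elapsedMs) {
            this.value = value;
            this.elapsedMs = elapsedMs;
        }
    }

    static Outcome execute(boolean threadPoolIsolation, long httpCallMs, long timeoutMs) {
        REQUEST_ID.set("req-42"); // set on the client's request thread
        long start = System.nanoTime();
        CompletableFuture<String> result = new CompletableFuture<>();
        // On timeout the fallback runs on the timer thread, where REQUEST_ID is not set.
        ScheduledFuture<?> timeout = TIMER.schedule(() -> {
            result.complete("fallback, requestId=" + REQUEST_ID.get());
        }, timeoutMs, TimeUnit.MILLISECONDS);
        if (threadPoolIsolation) {
            // The HTTP call runs on a pool thread, so the client can return as soon as the timeout fires.
            COMMAND_POOL.execute(() -> result.complete(slowHttpCall(httpCallMs)));
        } else {
            // Semaphore isolation: the HTTP call runs on the client thread and blocks it until the read ends.
            result.complete(slowHttpCall(httpCallMs));
        }
        String value = result.join(); // first completion wins; later complete() calls are no-ops
        timeout.cancel(false);
        REQUEST_ID.remove();
        return new Outcome(value, (System.nanoTime() - start) / 1_000_000);
    }

    // Simulated blocking socket read.
    private static String slowHttpCall(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response";
    }

    public static void main(String[] args) {
        for (boolean pool : new boolean[] {true, false}) {
            Outcome o = execute(pool, 800, 100);
            System.out.println((pool ? "thread pool: " : "semaphore:   ") + o.value + " after " + o.elapsedMs + " ms");
        }
    }
}
```

In both modes the client ends up with the fallback, and the fallback sees no request ID because it ran on the timer thread. The difference is the elapsed time: roughly the Hystrix timeout with the thread pool, roughly the full HTTP call with the semaphore.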

Thread pool vs. semaphore under this behavior:
Here is a simplified diagram. The behavior explained above raises a question: with a thread pool size of 4 versus a semaphore count of 4, which gives better isolation in terms of how much of the HTTP thread pool gets consumed?
With a thread pool, at most 4 HTTP threads can be in use at once, because any request that arrives while all 4 Hystrix threads are busy results in fallback execution.
This is because the Hystrix thread pool size stays at 4 and the Hystrix threads are NOT RELEASED for reuse until the HTTP threads have finished their job. So with a Hystrix thread pool size of 1 and an HTTP pool size of 10, after your first request all requests go to fallback execution until the first HTTP thread finishes its job, irrespective of the Hystrix timeout.
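
A small simulation of the pool-size-1 example, again in plain Java rather than Hystrix: `ThreadPoolBoundDemo` is hypothetical, a one-thread `ThreadPoolExecutor` with no queue plays the Hystrix thread pool, `Future.get` with a timeout plays the Hystrix timeout, and `Thread.sleep` plays an HTTP call that holds its connection.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of thread pool isolation with a timeout.
final class ThreadPoolBoundDemo {
    static final AtomicInteger inFlightHttp = new AtomicInteger();
    static final AtomicInteger peakHttp = new AtomicInteger();

    static String request(ThreadPoolExecutor pool, long httpCallMs, long timeoutMs) throws InterruptedException {
        Future<String> future;
        try {
            future = pool.submit(() -> {
                peakHttp.accumulateAndGet(inFlightHttp.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(httpCallMs); // simulated HTTP call holding an HTTP connection
                } finally {
                    inFlightHttp.decrementAndGet();
                }
                return "response";
            });
        } catch (RejectedExecutionException e) {
            return "fallback (pool full)";
        }
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // No cancel(true): a thread blocked on a socket read would not react to it anyway,
            // so the pool thread stays busy until the HTTP call ends.
            return "fallback (timeout)";
        } catch (ExecutionException e) {
            return "fallback (error)";
        }
    }

    // Hystrix pool size of 1, as in the example in the text.
    static String[] scenario() throws InterruptedException {
        peakHttp.set(0);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 1, TimeUnit.MINUTES, new SynchronousQueue<>());
        String first = request(pool, 500, 100);  // times out after ~100 ms; its HTTP call keeps running
        String second = request(pool, 500, 100); // the only thread is still busy, so it is rejected
        Thread.sleep(700);                       // let the first HTTP call finish
        String third = request(pool, 50, 100);   // the thread is free again
        pool.shutdown();
        return new String[] {first, second, third};
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(String.join(" | ", scenario()) + " | peak HTTP calls: " + peakHttp.get());
    }
}
```

The second request is rejected even though the first one has already timed out, and the peak number of concurrent simulated HTTP calls stays at the pool size.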
With a semaphore, while the thread is waiting on the socket read, the Hystrix timeout moves execution to the fallback on a timer thread. At the same time, it also releases the semaphore permit. This is a potential problem: the next request can acquire that permit and execute the Hystrix command on a different thread even though a thread is still waiting on the socket read, so the semaphore count no longer caps the number of HTTP threads in use.
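
The same kind of simulation for semaphore isolation. This is a model of the behavior described above, not Hystrix code: `SemaphoreTimeoutDemo` is hypothetical, a `Semaphore` with one permit stands in for the execution semaphore (`execution.isolation.semaphore.maxConcurrentRequests` set to 1), and the timer task hands the permit back when the timeout fires while the caller is still blocked in its simulated read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of semaphore isolation whose timeout releases the permit early.
final class SemaphoreTimeoutDemo {
    static final Semaphore commandPermits = new Semaphore(1); // execution semaphore count of 1
    static final AtomicInteger inFlightHttp = new AtomicInteger();
    static final AtomicInteger peakHttp = new AtomicInteger();
    static final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "timer");
        t.setDaemon(true);
        return t;
    });

    // Semaphore isolation: the HTTP call runs on the caller's own thread.
    static String request(long httpCallMs, long timeoutMs) throws InterruptedException {
        if (!commandPermits.tryAcquire()) {
            return "fallback (rejected)";
        }
        AtomicBoolean permitReturned = new AtomicBoolean();
        AtomicBoolean timedOut = new AtomicBoolean();
        // Models the behavior described in the text: the timeout gives the permit back right away.
        ScheduledFuture<?> timeout = timer.schedule(() -> {
            timedOut.set(true);
            if (permitReturned.compareAndSet(false, true)) {
                commandPermits.release();
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        peakHttp.accumulateAndGet(inFlightHttp.incrementAndGet(), Math::max);
        try {
            Thread.sleep(httpCallMs); // blocking socket read on the caller thread
        } finally {
            inFlightHttp.decrementAndGet();
            timeout.cancel(false);
            if (permitReturned.compareAndSet(false, true)) {
                commandPermits.release(); // normal path: release when the call finishes
            }
        }
        return timedOut.get() ? "fallback (timeout)" : "response";
    }

    // Three callers, each arriving after the previous one has timed out.
    static int peakConcurrentHttpCalls() throws Exception {
        peakHttp.set(0);
        ExecutorService callers = Executors.newFixedThreadPool(3);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Callable<String> call = () -> request(1000, 50);
            results.add(callers.submit(call));
            Thread.sleep(200);
        }
        for (Future<String> f : results) {
            f.get();
        }
        callers.shutdown();
        return peakHttp.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("semaphore count 1, peak concurrent HTTP calls: " + peakConcurrentHttpCalls());
    }
}
```

Because each new caller arrives after the previous one has timed out and returned its permit, the peak number of concurrent simulated HTTP calls exceeds the semaphore count of 1, unlike the thread pool case, where it stayed at the pool size.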