Selenium 4, the latest major version of Selenium WebDriver, introduces several enhancements that streamline web automation. One notable improvement is the revamped Actions API, a low-level interface that sends virtualized device input to the browser and gives granular control over keyboard, mouse, pen, touch, and scroll wheel interactions. In this post, we look at the key components of the Actions API and work through practical examples.
Understanding the Actions API
- Key Components: Action Builder
The Actions API introduces an intricate but powerful set of low-level building blocks. These blocks include commands for key inputs, pointer inputs, and wheel inputs. The Action Builder allows the construction of individual action commands assigned to specific inputs, which can be chained together. The associated perform method is then called to execute them collectively.
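To make the "build first, execute on perform" idea concrete, here is a minimal toy model in plain Java. It is not Selenium code and not how Selenium is implemented; the class ToyActionBuilder and all of its methods are hypothetical names invented for illustration. It only shows that chained calls record commands against an input source and that nothing runs until perform is called.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Selenium's implementation. All names are hypothetical.
class ToyActionBuilder {
    private final List<String> queued = new ArrayList<>();   // commands waiting for perform()
    private final List<String> executed = new ArrayList<>(); // commands already "sent"

    // Each builder method only records a command for a named input source.
    ToyActionBuilder keyDown(String key) { queued.add("key:keyDown:" + key); return this; }
    ToyActionBuilder keyUp(String key) { queued.add("key:keyUp:" + key); return this; }
    ToyActionBuilder pointerMove(int x, int y) { queued.add("pointer:move:" + x + "," + y); return this; }
    ToyActionBuilder pointerDown() { queued.add("pointer:down"); return this; }
    ToyActionBuilder pointerUp() { queued.add("pointer:up"); return this; }

    // perform() flushes the whole chain in order, then clears the queue.
    ToyActionBuilder perform() {
        executed.addAll(queued);
        queued.clear();
        return this;
    }

    // Returns a copy so callers cannot modify the internal log.
    List<String> executed() { return new ArrayList<>(executed); }

    public static void main(String[] args) {
        ToyActionBuilder builder = new ToyActionBuilder()
                .pointerMove(10, 20)
                .pointerDown()
                .keyDown("SHIFT");
        System.out.println(builder.executed()); // prints [] because perform() has not run

        builder.pointerUp().keyUp("SHIFT").perform();
        System.out.println(builder.executed()); // now lists all five commands in order
    }
}
```

The real Actions class follows the same usage shape: a chain of builder calls that ends in perform().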
- Pause Command
Pointer movements and wheel scrolling let you set a duration for the action itself, but the pause command is what inserts a delay between actions. This matters when the page needs time to react to one input before the next arrives, for example when a hover has to reveal a menu before the click.
WebElement clickable = driver.findElement(By.id("clickable"));
new Actions(driver)
        .moveToElement(clickable)
        .pause(Duration.ofSeconds(1))
        .clickAndHold()
        .pause(Duration.ofSeconds(1))
        .sendKeys("abc")
        .perform();
- Release All Actions
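The driver retains the state of all input devices throughout a session, so a key or pointer button that is pressed and never released stays pressed for later actions, even those built with a new Actions instance. The WebDriver "release actions" command resets this state by releasing every currently depressed key and pointer button. In the Java bindings it is exposed as the resetInputState method on RemoteWebDriver: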
((RemoteWebDriver) driver).resetInputState();
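The effect of retained state is easy to overlook, so here is a small plain-Java sketch of it. This is a toy model, not Selenium code: ToyInputState and its methods are hypothetical, and the Shift handling is simplified to upper-casing a letter. It shows a key that stays held across later typing until everything is released.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: NOT Selenium code. It mimics a session that remembers held inputs.
class ToyInputState {
    private final Set<String> held = new LinkedHashSet<>();

    void press(String input) { held.add(input); }
    void release(String input) { held.remove(input); }

    // Counterpart of "release all": every held key and button is let go.
    void releaseAll() { held.clear(); }

    Set<String> held() { return new LinkedHashSet<>(held); }

    // What a typed letter would come out as, given the current state.
    String type(char c) {
        char out = held.contains("SHIFT") ? Character.toUpperCase(c) : c;
        return String.valueOf(out);
    }

    public static void main(String[] args) {
        ToyInputState session = new ToyInputState();

        session.press("SHIFT");                 // first action chain forgets to release Shift
        System.out.println(session.type('a'));  // prints A

        // A later, separate action chain still sees Shift held.
        System.out.println(session.type('b'));  // prints B

        session.releaseAll();                   // reset the state
        System.out.println(session.type('c'));  // prints c
    }
}
```

In a real test suite, resetting input state between tests that share a session is one way to avoid this kind of carry-over.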
Keyboard Actions in Selenium 4
- Keys Representation
Selenium 4 provides a representation of any key input device for interacting with a web page. Beyond ordinary text characters, every keyboard key has a representation that can be pressed and released in a chosen sequence. Keys that produce no printable character, such as Shift or Enter, are exposed through the Keys enum, which maps each one to a Unicode code point in the Private Use Area.
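As a quick illustration of what "Unicode values for keys" means, the plain-Java sketch below checks which Unicode block a code point belongs to. The two constants are assumptions taken from the WebDriver key table (Enter as U+E007, Shift as U+E008) and are hard-coded here so that the example runs without Selenium on the classpath; KeyCodePoints is a hypothetical class name.

```java
// Sketch: shows that the code points used for special keys fall in Unicode's
// Private Use Area, unlike ordinary letters. The two values below are assumed
// from the WebDriver key table (Enter = U+E007, Shift = U+E008).
class KeyCodePoints {
    static final char ENTER = '\uE007';
    static final char SHIFT = '\uE008';

    // True when the character belongs to the Private Use Area block (U+E000 to U+F8FF).
    static boolean isPrivateUse(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.PRIVATE_USE_AREA;
    }

    public static void main(String[] args) {
        System.out.println(isPrivateUse(ENTER)); // prints true
        System.out.println(isPrivateUse(SHIFT)); // prints true
        System.out.println(isPrivateUse('a'));   // prints false
    }
}
```

Because these code points have no standard meaning as text, a string of keys can mix them with regular characters without ambiguity.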
- Keyboard Actions Example
Here’s an example demonstrating keyboard actions using the Actions API:
new Actions(driver)
        .keyDown(Keys.SHIFT)
        .sendKeys("a")
        .keyUp(Keys.SHIFT)
        .sendKeys("b")
        .perform();
This sequence presses the Shift key, types ‘a’ (which arrives as ‘A’ because Shift is held), releases Shift, and then types ‘b’, so the focused element receives “Ab”.
- SendKeys Method
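The sendKeys method is a convenience method in the Actions API that combines the keyDown and keyUp commands for each character into one action. When called without a target element, it types into whichever element currently has focus. It is particularly useful when you need to type several characters in the middle of other actions.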
new Actions(driver)
        .sendKeys("abc")
        .perform();
Mouse Actions in Selenium 4
- Mouse Actions Overview
Similar to keyboard actions, Selenium 4 provides a representation of any pointer device for interacting with a web page. The Actions API supports various mouse actions, including clicking and holding, clicking and releasing, right-clicking, double-clicking, and moving the mouse.
- Mouse Actions Example
Here’s an example demonstrating mouse actions using the Actions API:
WebElement clickable = driver.findElement(By.id("clickable"));
new Actions(driver)
        .clickAndHold(clickable)
        .perform();
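This action moves the pointer to the middle of the clickable element and presses the left mouse button without releasing it. The button stays down until a later release call or a reset of the input state.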
- Real-World Usage Scenarios
Mouse actions find applications in scenarios like canvas drawing applications (for handling various pointer events) and testing right-click functionalities (using the contextClick method).
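For the canvas case, a stroke is usually drawn as press, a series of small moves, then release. The helper below is a plain-Java sketch of the middle part only: it splits one straight stroke into relative steps. StrokeOffsets and split are hypothetical names, and the idea of feeding each step to Actions.moveByOffset between clickAndHold and release is an assumption about how you might wire it up, not something taken from the Selenium documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: splits a straight stroke into small relative pointer moves.
// Hypothetical helper; it does not call Selenium itself.
class StrokeOffsets {

    // Returns `steps` offsets {dx, dy} whose sum is exactly (dx, dy).
    static List<int[]> split(int dx, int dy, int steps) {
        if (steps <= 0) {
            throw new IllegalArgumentException("steps must be positive");
        }
        List<int[]> offsets = new ArrayList<>();
        int prevX = 0;
        int prevY = 0;
        for (int i = 1; i <= steps; i++) {
            // Absolute position along the stroke after step i (integer arithmetic).
            int x = dx * i / steps;
            int y = dy * i / steps;
            offsets.add(new int[] {x - prevX, y - prevY});
            prevX = x;
            prevY = y;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Assumed usage: actions.clickAndHold(canvas), then for each offset
        // actions.moveByOffset(step[0], step[1]), then actions.release().perform().
        for (int[] step : split(100, 50, 4)) {
            System.out.println(step[0] + "," + step[1]);
        }
    }
}
```

Sending several small moves rather than one large jump gives the page a sequence of pointer-move events, which drawing code typically relies on.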
Conclusion
Selenium 4’s enhanced Actions API gives testers and developers precise control over keyboard and mouse interactions. The Action Builder, together with the pause command and the ability to reset input state (resetInputState in Java), adds flexibility and depth to test scenarios. In the next part of this series, we will explore advanced mouse and keyboard actions and their applications. Stay tuned for a comprehensive guide to mastering Selenium 4’s Actions API.