This document is essentially a cheat sheet for the official WebDriver spec (which has in-progress drafts available on GitHub). The official spec is designed for implementers to have very detailed information about processing algorithms and so on. Much of the information in the spec is not targeted towards those who are simply writing client libraries, or even users who want a closer look at the API. It also uses language which is so exact it can sometimes obfuscate an intuitive understanding of a section.
The approach used here is to simply look at the supported endpoints along with their inputs and outputs, without worrying too much about how the implementation is supposed to work. This should be beneficial to client library implementers as well as remote end implementers looking for some quick highlights. In most cases there are examples to illustrate.
DISCLAIMER: This is not the official spec. It is my interpretation of it and an attempt to present the most salient bits of it in a more digestible fashion. You should always consult the official spec before beginning work on a client or server implementation!
What is WebDriver? From the spec:
WebDriver is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.
Essentially, it’s a client-server protocol that allows you to automate web browsers. Clients send well-formed requests, the server interprets them according to the protocol, and then performs the automation behaviors as defined by the implementation steps in the spec. The most common use of this technology is for automated testing.
The WebDriver spec grew out of the Selenium project, and that is still the community of users pushing forward the associated browser automation technology and using it every day to write and run automated tests. Browser vendors now also support the WebDriver spec natively.
WebDriver has gone beyond the web, with implementations for mobile and desktop app automation. The Appium project is a set of WebDriver-compliant servers that allow automation of these non-web-browser platforms.
Automation is organized around WebDriver sessions, whose state is maintained across requests via a ‘session id’ token shared by the server and client. Creating a new session involves sending parameters in the form of capabilities, which tell the server what you want to automate and under what conditions. The server prepares the appropriate browser with any modifications as specified in the capabilities, and the session is then underway. Automation commands and responses are sent back and forth (keyed on the session id), until the client sends a request to delete the session, at which point the browser and other resources are quit or cleaned up, and the session id is discarded.
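To make the session lifecycle concrete, here is a minimal Python sketch of the requests a local end might issue over the life of one session. It only builds the requests and does not send them. The helper names (`build_request`, `session_lifecycle`) and the example URL are my own inventions for illustration, and the session id is passed in by hand, whereas in practice it would come from the New Session response.

```python
import json
from typing import Optional


def build_request(method: str, path: str, params: Optional[dict] = None):
    """Hypothetical helper: return a (method, path, body) triple, nothing is sent."""
    body = json.dumps(params) if params is not None else None
    return (method, path, body)


def session_lifecycle(capabilities: dict, session_id: str, url: str):
    """Return the requests for: create a session, navigate, delete the session.

    session_id would really be read from the New Session response; it is a
    parameter here so the sketch can run without a server.
    """
    return [
        build_request("POST", "/session", {"capabilities": capabilities}),
        build_request("POST", f"/session/{session_id}/url", {"url": url}),
        build_request("DELETE", f"/session/{session_id}"),
    ]


requests = session_lifecycle({}, "1234567890", "https://example.com")
for method, path, body in requests:
    print(method, path, body)
```

Every request after the first is keyed on the session id in its path, which is the only state the client needs to carry between commands.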
When the client (called a local end) sends a request to the server (called a remote end), this is known in the spec as a ‘command’. Since this is an HTTP protocol, commands have several components: an HTTP verb, a path, and potentially a request body.
The remote end at this point looks up the command based on the HTTP verb and path. The spec defines a list of endpoints that map verb + path to a command name. The path portion is actually a list of “URI Templates” that show how path components should be extracted as parameters for the command. For example, in:
/session/{session id}/element
The {session id} bit is saying that this component of the path is a “url variable” called session id whose value will be sent to the command (in this case Find Element). Once a command is matched to the request, other data is potentially parsed from the request body (these are the “parameters”), the command is executed (having been passed any url variables and request parameters), and a response is returned.
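The lookup described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `ENDPOINTS` dictionary holds a handful of rows from the endpoint table, the function names are hypothetical, and url variable names have their spaces replaced with underscores so they can be used as regex group names. A real remote end also has to tell an unknown path apart from a known path with the wrong verb, which this sketch does not do.

```python
import re

# A few endpoints, keyed by (verb, URI template).
ENDPOINTS = {
    ("POST", "/session"): "New Session",
    ("DELETE", "/session/{session id}"): "Delete Session",
    ("POST", "/session/{session id}/element"): "Find Element",
    ("GET", "/session/{session id}/element/{element id}/text"): "Get Element Text",
}


def template_to_regex(template: str) -> re.Pattern:
    """Turn '/session/{session id}/element' into a regex with named groups."""
    parts = []
    for piece in re.split(r"(\{[^}]+\})", template):
        if piece.startswith("{") and piece.endswith("}"):
            name = piece[1:-1].replace(" ", "_")  # 'session id' -> 'session_id'
            parts.append(f"(?P<{name}>[^/]+)")
        else:
            parts.append(re.escape(piece))
    return re.compile("^" + "".join(parts) + "$")


def match_command(verb: str, path: str):
    """Return (command name, url variables), or None if nothing matches."""
    for (template_verb, template), command in ENDPOINTS.items():
        if template_verb != verb:
            continue
        found = template_to_regex(template).match(path)
        if found:
            return command, found.groupdict()
    return None


print(match_command("POST", "/session/abc123/element"))
```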
A request from the local end to the remote end is a valid HTTP request, with a verb, path, and potentially a body. As mentioned above, the remote end validates the request and attempts to map it to a command. If the request can’t be mapped to a command, an unknown method error is returned (see below for what it means to return an error).
There is one command (New Session) which does not require a session id url variable. Every other command requires this variable, since every other command is executed in the context of an existing session. If we are not requesting a new session, the remote end immediately validates the session id against the list of active sessions. If it’s not found, an invalid session id error is returned.
In the case of a POST request, the local end might have sent data in the request body. This data must always be JSON data. The remote end first parses it as JSON (if this fails, an invalid argument error is returned). If the result of the parse is not a JSON object (i.e., if it’s a string or array or number or what have you), an invalid argument is likewise returned. Otherwise, the result of parsing is the set of “parameters” which is passed to the command.
(In the case of a POST request without a request body, the “parameters” value is null.)
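As a rough sketch of the body-handling rules above, the following Python function turns a raw POST body into the command "parameters". The exception class and function name are hypothetical stand-ins, and representing "no body" as `None` or an empty string is my own assumption.

```python
import json
from typing import Optional


class InvalidArgument(Exception):
    """Stands in for the spec's 'invalid argument' error (HTTP 400)."""


def parse_parameters(raw_body: Optional[str]):
    """Return the 'parameters' for a POST request body, or None if there is no body."""
    if raw_body is None or raw_body == "":
        return None  # POST without a body: parameters are null
    try:
        parsed = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise InvalidArgument("request body is not valid JSON") from exc
    if not isinstance(parsed, dict):
        # Strings, arrays, numbers and so on are valid JSON but not JSON objects.
        raise InvalidArgument("request body must be a JSON object")
    return parsed


print(parse_parameters('{"url": "https://example.com"}'))
```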
When a remote end sends an HTTP response, it first of all uses an appropriate HTTP status code and message (for example, 404 and no such element), based on the command that was attempted and the result. The spec defines status codes and messages for various responses, including success and error conditions (see below).
It then sets the following headers:
Content-Type: "application/json; charset=utf-8"
Cache-Control: "no-cache"
If any data needs to be returned with the response, it is serialized into a JSON object with the key value, e.g.:
{"value": null}
And this becomes the body of the HTTP response.
When an error has not occurred, the HTTP status is 200, and the response body is the appropriate JSON object with the response data in the value property of the JSON object.
When an error occurs, the remote end first of all determines the appropriate error code and corresponding HTTP status code (see below for the full list). For example, if an element could not be found, the error code is no such element and the corresponding HTTP status code is 404. The remote end then constructs a data JSON object with the properties error, message, and stacktrace. Here error is just the JSON code for the error (see table below; usually the same as the error code itself). message is whatever implementation-specific error message is appropriate. And likewise stacktrace is an implementation-specific stacktrace useful to implementation maintainers in diagnosing any issues.
An example error JSON object could look like:
{
"error": "no such element",
"message": "My fake implementation couldn't find your element",
"stacktrace": "Fake:21> Not a real stacktrace"
}
Since this JSON object becomes the data for the response, the full response from the remote end would be an HTTP status code of 404, the headers listed above, and finally the following JSON string as the response body:
{
"value": {
"error": "no such element",
"message": "My fake implementation couldn't find your element",
"stacktrace": "Fake:21> Not a real stacktrace"
}
}
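Putting the success and error cases together, a remote end's response construction might look roughly like this Python sketch. The function names are hypothetical, only three error codes are included, and falling back to 500 for an error code missing from the lookup is my own simplification rather than anything the spec says.

```python
import json

# A small subset of error codes and their HTTP status codes.
ERROR_STATUS = {
    "no such element": 404,
    "invalid argument": 400,
    "unknown error": 500,
}

# Headers set on every response.
HEADERS = {
    "Content-Type": "application/json; charset=utf-8",
    "Cache-Control": "no-cache",
}


def success_response(value):
    """Return (status, headers, body) for a command that succeeded."""
    return 200, HEADERS, json.dumps({"value": value})


def error_response(error: str, message: str, stacktrace: str = ""):
    """Return (status, headers, body) for a command that failed."""
    status = ERROR_STATUS.get(error, 500)  # assumption: default to 500
    data = {"error": error, "message": message, "stacktrace": stacktrace}
    return status, HEADERS, json.dumps({"value": data})


status, headers, body = error_response(
    "no such element",
    "My fake implementation couldn't find your element",
    "Fake:21> Not a real stacktrace",
)
print(status, body)
```

A client can therefore always look inside the value property: on a 200 it holds the result, and otherwise it holds the error object.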
The following is a list of all the possible errors, their HTTP status codes, and their JSON error codes:
| Error code | HTTP Status | JSON code | Description |
|---|---|---|---|
| element click intercepted | 400 | element click intercepted | The Element Click command could not be completed because the element receiving the events is obscuring the element that was requested to be clicked. |
| element not selectable | 400 | element not selectable | An attempt was made to select an element that cannot be selected. |
| element not interactable | 400 | element not interactable | A command could not be completed because the element is not pointer- or keyboard interactable. |
| insecure certificate | 400 | insecure certificate | Navigation caused the user agent to hit a certificate warning, which is usually the result of an expired or invalid TLS certificate. |
| invalid argument | 400 | invalid argument | The arguments passed to a command are either invalid or malformed. |
| invalid cookie domain | 400 | invalid cookie domain | An illegal attempt was made to set a cookie under a different domain than the current page. |
| invalid coordinates | 400 | invalid coordinates | The coordinates provided to an interactions operation are invalid. |
| invalid element state | 400 | invalid element state | A command could not be completed because the element is in an invalid state, e.g. attempting to click an element that is no longer attached to the document. |
| invalid selector | 400 | invalid selector | Argument was an invalid selector. |
| invalid session id | 404 | invalid session id | Occurs if the given session id is not in the list of active sessions, meaning the session either does not exist or that it’s not active. |
| javascript error | 500 | javascript error | An error occurred while executing JavaScript supplied by the user. |
| move target out of bounds | 500 | move target out of bounds | The target for mouse interaction is not in the browser’s viewport and cannot be brought into that viewport. |
| no such alert | 400 | no such alert | An attempt was made to operate on a modal dialog when one was not open. |
| no such cookie | 404 | no such cookie | No cookie matching the given path name was found amongst the associated cookies of the current browsing context’s active document. |
| no such element | 404 | no such element | An element could not be located on the page using the given search parameters. |
| no such frame | 400 | no such frame | A command to switch to a frame could not be satisfied because the frame could not be found. |
| no such window | 400 | no such window | A command to switch to a window could not be satisfied because the window could not be found. |
| script timeout | 408 | script timeout | A script did not complete before its timeout expired. |
| session not created | 500 | session not created | A new session could not be created. |
| stale element reference | 400 | stale element reference | A command failed because the referenced element is no longer attached to the DOM. |
| timeout | 408 | timeout | An operation did not complete before its timeout expired. |
| unable to set cookie | 500 | unable to set cookie | A command to set a cookie’s value could not be satisfied. |
| unable to capture screen | 500 | unable to capture screen | A screen capture was made impossible. |
| unexpected alert open | 500 | unexpected alert open | A modal dialog was open, blocking this operation. |
| unknown command | 404 | unknown command | A command could not be executed because the remote end is not aware of it. |
| unknown error | 500 | unknown error | An unknown error occurred in the remote end while processing the command. |
| unknown method | 405 | unknown method | The requested command matched a known URL but did not match a method for that URL. |
| unsupported operation | 500 | unsupported operation | Indicates that a command that should have executed properly cannot be supported for some reason. |
In this section, we go through each endpoint and examine its inputs and outputs and potential errors. The conventions I use are:
“URL variables”: variable strings slotted into URI templates
“Request parameters”: properties of the JSON object in the request body. Could be “None”, which means no body
“Response value”: the value of the value property of the response body, when that is a single, non-object value.
“Response properties”: properties of a JSON object which is the value of the value property of the response body. For example, in this JSON response body:
{"value": {"foo": "bar"}}
I’m calling foo a “response property” with a value of "bar".
“Possible errors”: errors and codes it’s possible for the command to return in case of an error specific to that command. Note that regardless of what’s in this list, it’s always possible for some errors to occur (e.g., invalid session id or unknown error). As another example, most endpoints attempt to handle user prompts in the course of operation, which might result in unexpected alert open; see Handling User Prompts for more information. A value of “None” here means “no particularly relevant errors”, not that it’s not possible for an error to occur!
| Method | URI Template | Command |
|---|---|---|
| POST | /session | New Session |
| DELETE | /session/{session id} | Delete Session |
| GET | /status | Status |
| GET | /session/{session id}/timeouts | Get Timeouts |
| POST | /session/{session id}/timeouts | Set Timeouts |
| POST | /session/{session id}/url | Go |
| GET | /session/{session id}/url | Get Current URL |
| POST | /session/{session id}/back | Back |
| POST | /session/{session id}/forward | Forward |
| POST | /session/{session id}/refresh | Refresh |
| GET | /session/{session id}/title | Get Title |
| GET | /session/{session id}/window | Get Window Handle |
| DELETE | /session/{session id}/window | Close Window |
| POST | /session/{session id}/window | Switch To Window |
| GET | /session/{session id}/window/handles | Get Window Handles |
| POST | /session/{session id}/frame | Switch To Frame |
| POST | /session/{session id}/frame/parent | Switch To Parent Frame |
| GET | /session/{session id}/window/rect | Get Window Rect |
| POST | /session/{session id}/window/rect | Set Window Rect |
| POST | /session/{session id}/window/maximize | Maximize Window |
| POST | /session/{session id}/window/minimize | Minimize Window |
| POST | /session/{session id}/window/fullscreen | Fullscreen Window |
| POST | /session/{session id}/element | Find Element |
| POST | /session/{session id}/elements | Find Elements |
| POST | /session/{session id}/element/{element id}/element | Find Element From Element |
| POST | /session/{session id}/element/{element id}/elements | Find Elements From Element |
| GET | /session/{session id}/element/active | Get Active Element |
| GET | /session/{session id}/element/{element id}/selected | Is Element Selected |
| GET | /session/{session id}/element/{element id}/attribute/{name} | Get Element Attribute |
| GET | /session/{session id}/element/{element id}/property/{name} | Get Element Property |
| GET | /session/{session id}/element/{element id}/css/{property name} | Get Element CSS Value |
| GET | /session/{session id}/element/{element id}/text | Get Element Text |
| GET | /session/{session id}/element/{element id}/name | Get Element Tag Name |
| GET | /session/{session id}/element/{element id}/rect | Get Element Rect |
| GET | /session/{session id}/element/{element id}/enabled | Is Element Enabled |
| POST | /session/{session id}/element/{element id}/click | Element Click |
| POST | /session/{session id}/element/{element id}/clear | Element Clear |
| POST | /session/{session id}/element/{element id}/value | Element Send Keys |
| GET | /session/{session id}/source | Get Page Source |
| POST | /session/{session id}/execute/sync | Execute Script |
| POST | /session/{session id}/execute/async | Execute Async Script |
| GET | /session/{session id}/cookie | Get All Cookies |
| GET | /session/{session id}/cookie/{name} | Get Named Cookie |
| POST | /session/{session id}/cookie | Add Cookie |
| DELETE | /session/{session id}/cookie/{name} | Delete Cookie |
| DELETE | /session/{session id}/cookie | Delete All Cookies |
| POST | /session/{session id}/actions | Perform Actions |
| DELETE | /session/{session id}/actions | Release Actions |
| POST | /session/{session id}/alert/dismiss | Dismiss Alert |
| POST | /session/{session id}/alert/accept | Accept Alert |
| GET | /session/{session id}/alert/text | Get Alert Text |
| POST | /session/{session id}/alert/text | Send Alert Text |
| GET | /session/{session id}/screenshot | Take Screenshot |
| GET | /session/{session id}/element/{element id}/screenshot | Take Element Screenshot |
| HTTP Method | Path Template |
|---|---|
| POST | /session |
The New Session command creates a new WebDriver session with the endpoint node. If the creation fails, a session not created error is returned.
URL variables:
Request parameters:
capabilities: a JSON object with a special structure that’s so complex it deserves its own section. See Capabilities under Other Topics below.
Example:
{"capabilities": {...}}
Response properties:
sessionId: a string, the UUID reference of the session, to be used in subsequent requests
capabilities: a JSON object, the set of capabilities that was ultimately merged and matched in the capability processing algorithm.
Example:
{
"value": {
"sessionId": "1234567890",
"capabilities": {...}
}
}
Possible errors:
session not created (500): if the session could not be started for a variety of reasons:
invalid argument (400): if the capabilities object was malformed in some way (see section on capabilities for examples)
| HTTP Method | Path Template |
|---|---|
| DELETE | /session/{session id} |
The Delete Session command closes any top-level browsing contexts associated with the current session, terminates the connection, and finally closes the current session.
URL variables:
session id: the id of a currently active session
Response value:
null
| HTTP Method | Path Template |
|---|---|
| GET | /status |
The Status command returns information about whether a remote end is in a state in which it can create new sessions and can additionally include arbitrary meta information that is specific to the implementation.
URL variables:
Request parameters:
Response properties:
ready: boolean value; whether the server has the capability to start more sessions
message: implementation-specific string describing readiness state
arbitrary other properties denoting metadata returned by the remote end
Example:
{
"value": {
"ready": true,
"message": "server ready",
"uptime": 123457890
}
}
Possible errors:
| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/timeouts |
The Get Timeouts command gets timeout durations associated with the current session.
URL variables:
session id
Request parameters:
Response properties:
script: value (in ms) of the session script timeout
A session has an associated session script timeout that specifies a time to wait for scripts to run. If equal to null then session script timeout will be indefinite. Unless stated otherwise it is 30,000 milliseconds.
pageLoad: value (in ms) of the session page load timeout
A session has an associated session page load timeout that specifies a time to wait for the page loading to complete. Unless stated otherwise it is 300,000 milliseconds.
implicit: value (in ms) of the session implicit wait timeout
A session has an associated session implicit wait timeout that specifies a time to wait in milliseconds for the element location strategy when retrieving elements and when waiting for an element to become interactable when performing element interaction. Unless stated otherwise it is zero milliseconds.
Example:
{
"value": {
"script": 30000,
"pageLoad": 300000,
"implicit": 0
}
}
Possible errors:
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/timeouts |
The Set Timeouts command sets timeout durations associated with the current session. The timeouts that can be controlled are listed in the table of session timeouts below.
URL variables:
session id
Request parameters: Send one or more of the following parameters. For definition of what each of these timeouts means, see Get Timeouts above.
script: integer in ms for session script timeout
pageLoad: integer in ms for session page load timeout
implicit: integer in ms for session implicit wait timeout
Example:
{
"script": 1000,
"pageLoad": 7000,
"implicit": 5000
}
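A client library might want to check these parameters before sending them, so that an invalid argument error can be caught locally. The sketch below is one way to do that in Python. The function name is hypothetical, the upper bound is taken to be 2^64 - 1, and rejecting unknown keys is my own assumption about what counts as "not a valid timeout".

```python
MAX_TIMEOUT = 2**64 - 1  # upper bound assumed for timeout values
TIMEOUT_KEYS = {"script", "pageLoad", "implicit"}


def validate_timeouts(params: dict) -> dict:
    """Return params unchanged if every entry is a valid timeout, else raise ValueError."""
    for key, value in params.items():
        if key not in TIMEOUT_KEYS:
            raise ValueError(f"invalid argument: unknown timeout {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"invalid argument: {key} must be an integer")
        if not 0 <= value <= MAX_TIMEOUT:
            raise ValueError(f"invalid argument: {key} is out of range")
    return params


print(validate_timeouts({"script": 1000, "pageLoad": 7000, "implicit": 5000}))
```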
Response value:
null
Possible errors:
invalid argument (400) if a parameter property was not a valid timeout, or was not an integer in the range [0, 2^64 - 1]
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/url |
The Go command is used to cause the user agent to navigate the current top-level browsing context to a new location.
URL variables:
session id
Request parameters:
url: string representing an absolute URL (beginning with http(s)), possibly including a fragment (#...). Could also be a local scheme (about: etc).
Example:
{
"url": "https://jlipps.com"
}
Response value:
null
Possible errors:
invalid argument (400) if:
url parameter is missing
url parameter doesn’t conform to above spec
timeout (408) if url is different from the current URL, and the new page does not load within the page load timeout.
| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/url |
The Get Current URL command returns the URL of the current top-level browsing context.
URL variables:
session id
Request parameters:
Response value:
current document URL of the top-level browsing context
Example:
{
"value": "https://google.com"
}
Possible errors:
no such window (400) if the current top-level browsing context is no longer open
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/back |
The Back command causes the browser to traverse one step backward in the joint session history of the current top-level browsing context. This is equivalent to pressing the back button in the browser chrome or calling window.history.back.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the current top-level browsing context is no longer open
timeout (408) if it took longer than the page load timeout for the pageShow event to fire after navigating back
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/forward |
The Forward command causes the browser to traverse one step forwards in the joint session history of the current top-level browsing context.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the current top-level browsing context is no longer open
timeout (408) if it took longer than the page load timeout for the pageShow event to fire after navigating forward
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/refresh |
The Refresh command causes the browser to reload the page in the current top-level browsing context.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the current top-level browsing context is no longer open
timeout (408) if it took longer than the page load timeout for the pageShow event to fire after refreshing the page
| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/title |
The Get Title command returns the document title of the current top-level browsing context, equivalent to calling document.title.
URL variables:
session id
Request parameters:
Response value:
a string which is the same as document.title of the current top-level browsing context.
Example:
{
"value": "My web page title"
}
Possible errors:
no such window (400) if the current top-level browsing context is no longer open
| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/window |
The Get Window Handle command returns the window handle for the current top-level browsing context. It can be used as an argument to Switch To Window.
URL variables:
session id
Request parameters:
Response value:
a string which is the window handle for the current top-level browsing context
Example:
{
"value": "window-1234-5678-abcd-efgh"
}
Possible errors:
no such window (400) if the current top-level browsing context is no longer open
| HTTP Method | Path Template |
|---|---|
| DELETE | /session/{session id}/window |
The Close Window command closes the current top-level browsing context. Once done, if there are no more top-level browsing contexts open, the WebDriver session itself is closed.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the current top-level browsing context is not open when this command is first called
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/window |
The Switch To Window command is used to select the current top-level browsing context for the current session, i.e. the one that will be used for processing commands.
URL variables:
session id
Request parameters:
handle: a string representing a window handle. Should be one of the strings that was returned in a call to Get Window Handles.
Example:
{"handle": "asdf-1234-jklo-5678"}
Response value:
null
Possible errors:
no such window (400) if the window handle string is not recognized
unsupported operation (500) if a prompt prevents changing focus to the new window
| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/window/handles |
The Get Window Handles command returns a list of window handles for every open top-level browsing context. The order in which the window handles are returned is arbitrary.
URL variables:
session id
Response value:
An array which is a list of window handles.
Example:
{
"value": ["handle1", "handle2", "handle3"]
}
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/frame |
The Switch To Frame command is used to select the current top-level browsing context or a child browsing context of the current browsing context to use as the current browsing context for subsequent commands.
URL variables:
session id
Request parameters:
id: one of three possible types:
null: this represents the top-level browsing context (i.e., not an iframe)
a number: the index of the window object corresponding to a frame
an element object (as returned by Find Element) representing a frame element
Example:
{"id": 2}
Response value:
null
Possible errors:
no such frame (400) if a frame could not be found based on the id parameter, or if the element represented by the id parameter is not a frame
stale element reference (400) if the element found via the id parameter is stale
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/frame/parent |
The Switch To Parent Frame command sets the current browsing context for future commands to the parent of the current browsing context.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the current browsing context is no longer open
| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/window/rect |
The Get Window Rect command returns the size and position on the screen of the operating system window corresponding to the current top-level browsing context.
URL variables:
session id
Response value:
A JSON representation of a “window rect” object. This has 4 properties:
x: the screenX attribute of the window object
y: the screenY attribute of the window object
width: the width of the outer dimensions of the top-level browsing context, including browser chrome etc…
height: the height of the outer dimensions of the top-level browsing context, including browser chrome etc…
Example:
{
"value": {
"x": 0,
"y": 23,
"width": 1280,
"height": 960
}
}
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/window/rect |
The Set Window Rect command alters the size and the position of the operating system window corresponding to the current top-level browsing context.
Basically, the command takes a set of JSON parameters corresponding to the window rect object described in the Get Window Rect command. These parameters are optional. If x and y are both present, the window is moved to that location. If width and height are both present, the window (including all external chrome) is resized as close as possible to those dimensions (though not larger than the screen, smaller than the smallest possible window size, etc…).
URL variables:
session id
Request parameters:
x: optional integer (-2^63 < i < 2^63 - 1) (defaults to null)
y: optional integer (-2^63 < i < 2^63 - 1) (defaults to null)
width: optional integer (0 < i < 2^64 - 1) (defaults to null)
height: optional integer (0 < i < 2^64 - 1) (defaults to null)
Example:
{"x": 100, "y": 100, "width": 200, "height": 400}
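The "move only if both x and y are present, resize only if both width and height are present" rule is easy to get wrong, so here is a small Python sketch of how a remote end might interpret the parameters. The function name and return shape are hypothetical. I take the bounds to be powers of two (2^63 and 2^64) and treat them as inclusive, which is an assumption on my part; check the official spec for the exact limits.

```python
XY_MIN, XY_MAX = -(2**63), 2**63 - 1
SIZE_MIN, SIZE_MAX = 0, 2**64 - 1


def plan_window_rect(params: dict) -> dict:
    """Decide what a Set Window Rect request asks for.

    Returns {"move": (x, y) or None, "resize": (width, height) or None}.
    Raises ValueError (standing in for 'invalid argument') on bad input.
    """

    def check(name, low, high):
        value = params.get(name)  # a missing property defaults to null
        if value is None:
            return None
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"invalid argument: {name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"invalid argument: {name} is out of range")
        return value

    x = check("x", XY_MIN, XY_MAX)
    y = check("y", XY_MIN, XY_MAX)
    width = check("width", SIZE_MIN, SIZE_MAX)
    height = check("height", SIZE_MIN, SIZE_MAX)

    move = (x, y) if x is not None and y is not None else None
    resize = (width, height) if width is not None and height is not None else None
    return {"move": move, "resize": resize}


print(plan_window_rect({"x": 100, "y": 100, "width": 200, "height": 400}))
```

Note that a request with only x (or only width) is valid but results in no change, since its partner value is missing.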
Response value:
A JSON representation of a “window rect” object based on the new window state:
x: the screenX attribute of the window object
y: the screenY attribute of the window object
width: the width of the outer dimensions of the top-level browsing context, including browser chrome etc…
height: the height of the outer dimensions of the top-level browsing context, including browser chrome etc…
Example:
{
"value": {
"x": 10,
"y": 80,
"width": 900,
"height": 500
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
unsupported operation (500) if the remote end does not support changing window position / dimension
invalid argument (400) if the parameters don’t conform to the restrictions
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/window/maximize |
The Maximize Window command invokes the window manager-specific “maximize” operation, if any, on the window containing the current top-level browsing context. This typically increases the window to the maximum available size without going full-screen.
URL variables:
session id
Response value:
A JSON representation of a “window rect” object based on the new window state:
x: the screenX attribute of the window object
y: the screenY attribute of the window object
width: the width of the outer dimensions of the top-level browsing context, including browser chrome etc…
height: the height of the outer dimensions of the top-level browsing context, including browser chrome etc…
Example:
{
"value": {
"x": 10,
"y": 80,
"width": 900,
"height": 500
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
unsupported operation (500) if the remote end does not support maximizing windows
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/window/minimize |
The Minimize Window command invokes the window manager-specific “minimize” operation, if any, on the window containing the current top-level browsing context. This typically hides the window in the system tray.
URL variables:
session id
Response value:
A JSON representation of a “window rect” object of the (new) current top-level browsing context:
x: the screenX attribute of the window object
y: the screenY attribute of the window object
width: the width of the outer dimensions of the top-level browsing context, including browser chrome etc…
height: the height of the outer dimensions of the top-level browsing context, including browser chrome etc…
Example:
{
"value": {
"x": 10,
"y": 80,
"width": 900,
"height": 500
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
unsupported operation (500) if the remote end does not support minimizing windows
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/window/fullscreen |
The Fullscreen Window command invokes the window manager-specific “full screen” operation, if any, on the window containing the current top-level browsing context. This typically increases the window to the size of the physical display and can hide browser chrome elements such as toolbars.
URL variables:
session id
Response value:
A JSON representation of a “window rect” object of the browsing context:
x: the screenX attribute of the window object
y: the screenY attribute of the window object
width: the width of the outer dimensions of the top-level browsing context, including browser chrome etc…
height: the height of the outer dimensions of the top-level browsing context, including browser chrome etc…
Example:
{
"value": {
"x": 10,
"y": 80,
"width": 900,
"height": 500
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
unsupported operation (500) if the remote end does not support fullscreen windows
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/element |
The Find Element command is used to find an element in the current browsing context that can be used for future commands.
URL variables:
session id
Request parameters:
using: a valid element location strategy
value: the actual selector that will be used to find an element
Example:
{"using": "css selector", "value": "#foo"}
Response value:
A JSON representation of an element object:
element-6066-11e4-a52e-4f735466cecf: a string UUID representing the found element
Note that the property above is not an example, it is literally the sole property of every returned element object
Example:
{
"value": {
"element-6066-11e4-a52e-4f735466cecf": "1234-5789-0abc-defg"
}
}
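Because the property name is a fixed magic string, client code usually pulls the element id out with a small helper. Here is a Python sketch of that. The function name is hypothetical, and raising `ValueError` for an unexpected shape is just one possible design.

```python
import json

# The fixed property name that identifies a JSON-serialized element.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def element_id_from_response(body: str) -> str:
    """Extract the element id from a Find Element response body."""
    value = json.loads(body)["value"]
    if not isinstance(value, dict) or ELEMENT_KEY not in value:
        raise ValueError("response value is not an element object")
    return value[ELEMENT_KEY]


body = '{"value": {"element-6066-11e4-a52e-4f735466cecf": "1234-5789-0abc-defg"}}'
print(element_id_from_response(body))
```

The returned string is what gets slotted into the {element id} url variable of later element commands.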
Possible errors:
invalid argument (400) if the location strategy is invalid or if the selector is undefined
no such window (400) if the top level browsing context is not open
no such element (404) if the element could not be found after the session implicit wait timeout has elapsed
| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/elements |
The Find Elements command is used to find elements in the current browsing context that can be used for future commands.
URL variables:
session id
Request parameters:
using: a valid element location strategy
value: the actual selector that will be used to find an element
Example:
{"using": "css selector", "value": "#foo"}
Response value:
A (possibly empty) JSON list of representations of an element object. Each representation is itself a JSON object with the following property:
element-6066-11e4-a52e-4f735466cecf: a string UUID representing the found element
Example:
{
"value": [
{"element-6066-11e4-a52e-4f735466cecf": "1234-5789-0abc-defg"},
{"element-6066-11e4-a52e-4f735466cecf": "5678-1234-defg-0abc"}
]
}
Possible errors:
invalid argument (400) if the location strategy is invalid or if the selector is undefined
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/element/{element id}/element |
The Find Element From Element command is used to find an element from a web element in the current browsing context that can be used for future commands.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s). This is the element which will be taken as the root element for the context of this Find command
Request parameters:
using: a valid element location strategy
value: the actual selector that will be used to find an element
Example:
{"using": "css selector", "value": "#foo"}
Response value:
A JSON representation of an element object:
element-6066-11e4-a52e-4f735466cecf: a string UUID representing the found element
Note that the property above is not an example; it is literally the sole property of every returned element object.
Example:
{
"value": {
"element-6066-11e4-a52e-4f735466cecf": "1234-5789-0abc-defg"
}
}
Possible errors:
invalid argument (400) if the location strategy is invalid or if the selector is undefined
no such window (400) if the top level browsing context is not open
no such element (404) if the element could not be found after the session implicit wait timeout has elapsed
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/element/{element id}/elements |
The Find Elements From Element command is used to find elements from a web element in the current browsing context that can be used for future commands.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s). This is the element which will be taken as the root element for the context of this Find command
Request parameters:
using: a valid element location strategy
value: the actual selector that will be used to find an element
Example:
{"using": "css selector", "value": "#foo"}
Response value:
A (possibly empty) JSON list of representations of an element object. Each representation is itself a JSON object with the following property:
element-6066-11e4-a52e-4f735466cecf: a string UUID representing the found element
Example:
{
"value": [
{"element-6066-11e4-a52e-4f735466cecf": "1234-5789-0abc-defg"},
{"element-6066-11e4-a52e-4f735466cecf": "5678-1234-defg-0abc"}
]
}
Possible errors:
invalid argument (400) if the location strategy is invalid or if the selector is undefined
no such window (400) if the top level browsing context is not open
stale element reference (404) if the root element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/active |
Get Active Element returns the active element of the current browsing context’s document element.
URL variables:
session id
Response value:
A JSON representation of the active element:
element-6066-11e4-a52e-4f735466cecf: a string UUID representing the found element
Note that the property above is not an example; it is literally the sole property of every returned element object.
Example:
{
"value": {
"element-6066-11e4-a52e-4f735466cecf": "1234-5789-0abc-defg"
}
}
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/selected |
Is Element Selected determines if the referenced element is selected or not. This operation only makes sense on input elements in the Checkbox or Radio Button state, or on option elements.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
true or false based on the selected state
Example:
{
"value": true
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/attribute/{name} |
The Get Element Attribute command will return the attribute of a web element.
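URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
name: name of the attribute whose value should be retrieved
Response value:
The named attribute of the element. There are three possibilities:
if name is a boolean attribute and the element has that attribute, true is returned
if the element does not have the attribute, null is returned
otherwise, the value of the attribute is returned
Example: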
{
"value": "checkbox"
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/property/{name} |
The Get Element Property command will return the result of getting a property of an element.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
name: name of the property to retrieve
Response value:
The named property of the element, accessed by calling GetOwnProperty on the element object. If the property is undefined, null is returned.
Example:
{
"value": "foo"
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/css/{property name} |
The Get Element CSS Value command retrieves the computed value of the given CSS property of the given web element.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
property name: name of the CSS property to retrieve
Response value:
The computed value of the parameter corresponding to property name from the element’s style declarations (unless the document type is xml, in which case the return value is simply the empty string)
Example:
{
"value": "15px"
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/text |
The Get Element Text command intends to return an element’s text “as rendered”. An element’s rendered text is also used for locating a elements by their link text and partial link text.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
The visible text of the element (including child elements), following the algorithm defined in the Selenium Atoms for bot.dom.getVisibleText
Example:
{
"value": "Hello world"
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/name |
The Get Element Tag Name command returns the qualified element name of the given web element.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
The tagName attribute of the element
Example:
{
"value": "INPUT"
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/rect |
The Get Element Rect command returns the dimensions and coordinates of the given web element.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
A JSON object representing the position and bounding rect of the element (all in CSS reference pixels):
x: the absolute x-coordinate of the element, relative to the document (not the screen)
y: the absolute y-coordinate of the element, relative to the document (not the screen)
width: the width of the bounding rectangle for the element
height: the height of the bounding rectangle for the element
Example:
{
"value": {
"x": 200,
"y": 300,
"width": 20,
"height": 50
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/enabled |
Is Element Enabled determines if the referenced element is enabled or not. This operation only makes sense on form controls.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
If the element is in an xml document, or is a disabled form control: boolean false
Otherwise, boolean true
Example:
{
"value": true
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/element/{element id}/click |
The Element Click command scrolls into view the element if it is not already pointer-interactable, and clicks its in-view center point. If the element’s center point is obscured by another element, an element click intercepted error is returned. If the element is outside the viewport, an element not interactable error is returned.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
null
Example:
{
"value": null
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale
invalid argument (400) if the element is an input element in the file upload state
element not interactable (400) if the element is not in view even after an attempt was made to scroll it into view
element click intercepted (400) if the element was obscured by another element
timeout (408) if post-click navigation did not happen within the page load timeout
unknown error (500) if post-click navigation failed due to a network error
insecure certificate (400) if post-click navigation was blocked by a content security policy

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/element/{element id}/clear |
The Element Clear command scrolls into view an editable or resettable element and then attempts to clear its selected files or text content.
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Response value:
null
Example:
{
"value": null
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale
invalid element state (400) if the element is (a) neither content editable nor both editable and resettable, or (b) disabled, read-only, or has pointer events disabled
element not interactable (400) if the element does not become interactable after the implicit wait timeout

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/element/{element id}/value |
The Element Send Keys command scrolls into view the form control element and then sends the provided keys to the element. In case the element is not keyboard-interactable, an element not interactable error is returned. The key input state used for input may be cleared mid-way through “typing” by sending the null key, which is U+E000 (NULL).
URL variables:
session id
element id: the id of an element returned in a previous call to Find Element(s)
Request parameters:
text: string to send as keystrokes to the element. There are three basic possibilities for what happens with this:
for an ordinary keyboard-interactable element, text is typed as an appendix to existing text.
for a file input, text is split on newlines (\n) and considered a list of files to select. The files must actually exist.
for an element with a non-standard UI, text is set as the value property of the control.
Example:
{"text": "hello world"}
Response value:
null
Example:
{
"value": null
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale
invalid argument (400) if the text request parameter is not a string, or if the element is a file input and actual files are not sent, or if the element has a non-standard UI and is suffering from bad input after being set
invalid element state (400) if the element is (a) neither content editable nor both editable and resettable, or (b) disabled, read-only, or has pointer events disabled
element not interactable (400) if the element does not become keyboard-interactable after the implicit wait timeout, or if the element has a non-standard UI and no value property to set

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/source |
The Get Page Source command returns a string serialization of the DOM of the current browsing context active document.
URL variables:
session id
Response value:
The serialized document source
Example:
{
"value": "<html>...</html>"
}
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/execute/sync |
The Execute Script command executes a JavaScript function in the context of the current browsing context and returns the return value of the function.
URL variables:
session id
Request parameters:
script: A string, the JavaScript function body you want executed. Note that this is the function body, not a function definition. The script is parsed as a JS function body and, if parsing succeeds, an internal function is created with this body. The function is then called with window as the context, and with any arguments you provided applied as well (see next parameter). This whole process is wrapped in Promise.resolve, so if your script returns a Promise, its fulfillment (or rejection) will propagate to the return value of this command. If you want a return value of your script to be passed to the local end, you’ll need to explicitly return in your script.
args: An array of JSON values which will be deserialized and passed as arguments to your function (accessible via the JS arguments array). Note that WebDriver element references may be passed in their object form, and they will be deserialized to actual web elements in your function.
Example 1:
{
"script": "let [num1, num2] = arguments; return num1 + num2;",
"args": [5, 6]
}
Example 2:
{
"script": "let name = arguments[0]; return new Promise((resolve, reject) => { window.setTimeout(() => { resolve('hello ' + name); }, 1000); });",
"args": ["world"]
}
Example 3:
{
"script": "throw new Error('boo');",
"args": []
}
Response value:
Either the return value of your script, the fulfillment of the Promise returned by your script, or the error which was the reason for your script’s returned Promise’s rejection.
Example 1:
{
"value": 11
}
Example 2:
{
"value": "hello world"
}
Example 3:
{
"value": {
"error": "boo",
"message": "boo",
"stacktrace": "at <anonymous:1:11>"
}
}
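As a rough sketch of how a client might assemble the request body described above, including an element reference passed as an argument, consider the following. The helper names are hypothetical and the element id is the placeholder used in earlier examples; nothing is sent over the wire.

```python
import json

# The fixed property name used for element references throughout this document.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def element_ref(element_id):
    """Wrap an element id in the object form expected inside args (hypothetical helper)."""
    return {ELEMENT_KEY: element_id}


def execute_sync_body(script, *args):
    """Build the JSON body for Execute Script from a function body and its arguments."""
    return json.dumps({"script": script, "args": list(args)})


# The script is a function body, so it must use an explicit return.
body = execute_sync_body(
    "return arguments[0].tagName + arguments[1];",
    element_ref("1234-5789-0abc-defg"),
    "!",
)
```

Because element references travel as plain JSON objects, they can be mixed freely with ordinary values in args.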
Possible errors:
no such window (400) if the top level browsing context is not open
javascript error (500) if script could not be parsed as a function body, or if the function completes abruptly, or if the script results in a Promise rejected with any reason other than an error
timeout (408) if the script does not generate a return value or completed Promise by the time the session script timeout elapses

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/execute/async |
The Execute Async Script command causes JavaScript to execute as an anonymous function. Unlike the Execute Script command, the result of the function is ignored. Instead an additional argument is provided as the final argument to the function. This is a function that, when called, returns its first argument as the response.
URL variables:
session id
Request parameters:
script: A string, the JavaScript function body you want executed. Note that this is the function body, not a function definition. The script is parsed as a JS function body and, if parsing succeeds, an internal function is created with this body. The function is then called with window as the context, and with any arguments you provided applied as well (see next parameter). In addition, a callback function reference is appended to the list of arguments, available as the last item in the arguments list. When this function is called in your script, the first parameter passed to it is returned as the result of this command.
args: An array of JSON values which will be deserialized and passed as arguments to your function (accessible via the JS arguments array). Note that WebDriver element references may be passed in their object form, and they will be deserialized to actual web elements in your function.
Example 1:
{
"script": "let [num1, num2, cb] = arguments; cb(num1 + num2);",
"args": [5, 6]
}
Example 2:
{
"script": "let [name, cb] = arguments; window.setTimeout(() => { cb('hello ' + name); }, 1000);",
"args": ["world"]
}
Response value:
The JSON serialization of the value of the first parameter sent to the callback function, or any error triggered during execution
Example 1:
{
"value": 11
}
Example 2:
{
"value": "hello world"
}
Example 3:
{
"value": {
"error": "boo",
"message": "boo",
"stacktrace": "at <anonymous:1:11>"
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
javascript error (500) if script could not be parsed as a function body, or if the function completes abruptly, or if the script results in a Promise rejected with any reason other than an error
timeout (408) if the script does not generate a return value or completed Promise by the time the session script timeout elapses

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/cookie |
The Get All Cookies command returns all cookies associated with the address of the current browsing context’s active document.
URL variables:
session id
Response value:
A list of serialized cookies. Each serialized cookie has a number of optional fields which may or may not be returned in addition to name and value.
Example:
{
"value": [
{"name": "cookie1", "value": "hello"},
{"name": "cookie2", "value": "goodbye"}
]
}
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/cookie/{name} |
The Get Named Cookie command returns the cookie with the requested name from the associated cookies in the cookie store of the current browsing context’s active document. If no cookie is found, a no such cookie error is returned.
URL variables:
session id
name: Name of the cookie to retrieve
Response value:
A serialized cookie, with name and value fields. There are a number of optional fields like path, domain, and expiry-time which may also be present.
Example:
{
"value": {
"name": "cookie1",
"value": "hello"
}
}
Possible errors:
no such window (400) if the top level browsing context is not open
no such cookie (404) if no cookie exists for the given name

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/cookie |
The Add Cookie command adds a single cookie to the cookie store associated with the active document’s address.
URL variables:
session id
Request parameters:
cookie: A JSON object representing a cookie. It must have at least the name and value fields and could have more, including expiry-time and so on (see full list).
Example:
{
"cookie": {
"name": "mycookie",
"value": "hi",
"path": "/",
"domain": "foo.com"
}
}
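A minimal sketch of building this request body in Python follows. The function name is hypothetical, and the string check on name and value is an assumption made for illustration; the remote end performs its own validation regardless.

```python
import json


def add_cookie_body(name, value, **optional_fields):
    """Build the JSON body for Add Cookie (hypothetical helper).

    name and value are the two required fields; anything else (path,
    domain, and so on) is passed through untouched as an optional field.
    """
    # Assumption: both required fields are expected to be strings.
    if not isinstance(name, str) or not isinstance(value, str):
        raise ValueError("invalid argument: name and value must both be strings")
    cookie = {"name": name, "value": value, **optional_fields}
    return json.dumps({"cookie": cookie})


# Reproduces the example request shown above.
body = add_cookie_body("mycookie", "hi", path="/", domain="foo.com")
```

Checking the required fields locally simply surfaces the same mistake earlier than the invalid argument error would.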
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open
invalid argument (400) if the cookie is missing a required field (name and value), or if the cookie’s domain does not match the browsing context’s domain, or invalid data is sent for one of the optional fields
unable to set cookie (500) if some error occurred and the cookie could not be set

| HTTP Method | Path Template |
|---|---|
| DELETE | /session/{session id}/cookie/{name} |
The Delete Cookie command allows you to delete either a single cookie by parameter name, or all the cookies associated with the active document’s address if name is undefined.
URL variables:
session id
name: the name of the cookie, or undefined
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| DELETE | /session/{session id}/cookie |
The Delete All Cookies command allows deletion of all cookies associated with the active document’s address.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/actions |
Actions are a very complex portion of the spec. Some preliminary understanding of concepts is useful. Every action is performed by an input source, and each input source is identified by an id. There are three kinds of input source, each with its own set of possible actions:
null input source
  pause: “Used with an integer argument to specify the duration of a tick, or as a placeholder to indicate that an input source does nothing during a particular tick.”
key input source
  pause: same as for null
  keyDown: “Used to indicate that a particular key should be held down.”
  keyUp: “Used to indicate that a depressed key should be released.”
pointer input source. This kind also has a pointer type specifying which kind of pointer it is (which can be mouse, pen, or touch):
  pause: same as for null
  pointerDown: “Used to indicate that a pointer should be depressed in some way e.g. by holding a button down (for a mouse) or by coming into contact with the active surface (for a touch or pen device).”
  pointerUp: “Used to indicate that a pointer should be released in some way e.g. by releasing a mouse button or moving a pen or touch device away from the active surface.”
  pointerMove: “Used to indicate a location on the screen that a pointer should move to, either in its active (pressed) or inactive state.”
  pointerCancel: “Used to cancel a pointer action.”
URL variables:
session id
Request parameters:
actions: a list of input source actions. In other words, a list of objects, each of which represents an input source and its associated actions. Each input source must have the following properties:
  type: String, one of pointer, key, or none
  id: String, a unique id chosen to represent this input source for this and future actions
  (A pointer input source may also have a parameters property, which is an object with a pointerType key specifying either mouse, pen, or touch. If parameters is omitted, the pointerType is considered to be mouse.)
  actions: a list of action objects for this particular input source. An action object has different fields based on the kind of input device it belongs to:
    For a null input source:
      type: can only be the string pause
      duration: integer >= 0, representing time in milliseconds
    For a pointer input source:
      type: string, one of the actions listed above (pointerDown, etc.)
      pause: integer property duration as above
      pointerUp or pointerDown: integer property button (>= 0, representing which button is pressed/released)
      pointerMove:
        duration: integer in ms
        origin: either (a) string, one of viewport or pointer, or (b) an object representing a web element. Defaults to viewport if origin is omitted.
        x: integer, x-value to move to, relative to either viewport, pointer, or element based on origin
        y: integer, y-value to move to, relative to either viewport, pointer, or element based on origin
      pointerCancel: this action is not yet defined by the spec
    For a key input source:
      type: string, one of the actions listed above (keyUp or keyDown)
      pause: integer property duration as above
      keyUp or keyDown:
        value: a string containing a single Unicode code point (any value in the Unicode code space). Basically, this is either a “normal” character like “A”, or a Unicode code point like “\uE007” (Enter), which can include control characters.
Example 1 (expressing a 1-second pinch-and-zoom with a 500ms wait after the fingers first touch):
{
"actions": [
{
"type": "pointer",
"id": "finger1",
"parameters": {"pointerType": "touch"},
"actions": [
{"type": "pointerMove", "duration": 0, "x": 100, "y": 100},
{"type": "pointerDown", "button": 0},
{"type": "pause", "duration": 500},
{"type": "pointerMove", "duration": 1000, "origin": "pointer", "x": -50, "y": 0},
{"type": "pointerUp", "button": 0}
]
}, {
"type": "pointer",
"id": "finger2",
"parameters": {"pointerType": "touch"},
"actions": [
{"type": "pointerMove", "duration": 0, "x": 100, "y": 100},
{"type": "pointerDown", "button": 0},
{"type": "pause", "duration": 500},
{"type": "pointerMove", "duration": 1000, "origin": "pointer", "x": 50, "y": 0},
{"type": "pointerUp", "button": 0}
]
}
]
}
Example 2 (equivalent to typing CTRL+S and releasing the keys, though releasing would be better performed by a call to Release Actions):
{
"actions": [
{
"type": "key",
"id": "keyboard",
"actions": [
{"type": "keyDown", "value": "\uE009"},
{"type": "keyDown", "value": "s"},
{"type": "keyUp", "value": "\uE009"},
{"type": "keyUp", "value": "s"}
]
}
]
}
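Writing these payloads by hand gets tedious, so a client will usually generate them. The sketch below builds a key chord similar to Example 2; the function name `key_chord` is hypothetical, and releasing the keys in reverse order of pressing is a choice made here for illustration (Example 2 releases CTRL first).

```python
import json

# "\uE009" is the code point Example 2 uses for the CTRL key.
CONTROL = "\uE009"


def key_chord(source_id, *keys):
    """Build a key input source that presses the keys in order, then releases them in reverse."""
    downs = [{"type": "keyDown", "value": key} for key in keys]
    ups = [{"type": "keyUp", "value": key} for key in reversed(keys)]
    return {"type": "key", "id": source_id, "actions": downs + ups}


# The request body for a CTRL+S chord on an input source named "keyboard".
payload = {"actions": [key_chord("keyboard", CONTROL, "s")]}
body = json.dumps(payload)
```

Each keyDown/keyUp entry occupies its own tick, so the four actions here run strictly one after another.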
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open
invalid argument (400) if actions is not an array, or if an action sequence is set which has a mismatched pointerType, or if the inner actions is not an array, or in general if any of the requirements for parameter types and values described above are not met

| HTTP Method | Path Template |
|---|---|
| DELETE | /session/{session id}/actions |
The Release Actions command is used to release all the keys and pointer buttons that are currently depressed. This causes events to be fired as if the state was released by an explicit series of actions. It also clears all the internal state of the virtual devices.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/alert/dismiss |
The Dismiss Alert command dismisses a simple dialog if present. A request to dismiss an alert user prompt, which may not necessarily have a dismiss button, has the same effect as accepting it.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open
no such alert (404) if there is no current user prompt

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/alert/accept |
The Accept Alert command accepts a simple dialog if present.
URL variables:
session id
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open
no such alert (404) if there is no current user prompt

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/alert/text |
The Get Alert Text command returns the message of the current user prompt. If there is no current user prompt, it returns an error.
URL variables:
session id
Request parameters:
(none)
Response value:
The message of the user prompt
Example:
{
"value": "XSS Hax!"
}
Possible errors:
no such window (400) if the top level browsing context is not open
no such alert (404) if there is no current user prompt

| HTTP Method | Path Template |
|---|---|
| POST | /session/{session id}/alert/text |
The Send Alert Text command sets the text field of a window.prompt user prompt to the given value.
URL variables:
session id
Request parameters:
text: string to set the prompt to
Example:
{"text": "My prompt response"}
Response value:
null
Possible errors:
no such window (400) if the top level browsing context is not open
no such alert (404) if there is no current user prompt
invalid argument (400) if text is not a string
element not interactable (400) if the prompt is an alert or confirmation dialog (these do not support setting text)
unsupported operation (500) if the prompt is otherwise not a prompt

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/screenshot |
The Take Screenshot command takes a screenshot of the top-level browsing context’s viewport.
URL variables:
session id
Request parameters:
(none)
Response value:
The base64-encoded PNG image data comprising the screenshot of the initial viewport
Example:
{
"value": "iVBORw0KGgoAAAANSUhEUgAAARMAAAFBC..."
}
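Turning the response value back into image bytes is a one-step base64 decode. The sketch below also checks for the standard 8-byte PNG signature; the function name is hypothetical, and the payload is a stand-in built in the example rather than a real screenshot.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response):
    """Decode the base64 value of a screenshot response into raw PNG bytes."""
    data = base64.b64decode(response["value"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not PNG data")
    return data


# Stand-in payload: the PNG signature followed by placeholder bytes, not a real image.
fake_png = PNG_SIGNATURE + b"placeholder-image-bytes"
response = {"value": base64.b64encode(fake_png).decode("ascii")}
png_bytes = decode_screenshot(response)
```

The decoded bytes can be written directly to a .png file opened in binary mode.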
Possible errors:
no such window (400) if the top level browsing context is not open

| HTTP Method | Path Template |
|---|---|
| GET | /session/{session id}/element/{element id}/screenshot |
The Take Element Screenshot command takes a screenshot of the visible region encompassed by the bounding rectangle of an element. If given a parameter argument scroll that evaluates to false, the element will not be scrolled into view.
URL variables:
session id
element id: the id of a web element
Request parameters:
scroll: boolean, whether or not to scroll the element into view before taking the screenshot. Defaults to true.
Example:
{"scroll": false}
Response value:
The base64-encoded PNG image data comprising the screenshot of the visible region of the element’s bounding rectangle
Example:
{
"value": "iVBORw0KGgoAAAANSUhEUgAAARMAAAFBC..."
}
Possible errors:
no such window (400) if the top level browsing context is not open
stale element reference (404) if the element is stale
no such element (404) if the element id is unknown

From the spec:
WebDriver capabilities are used to communicate the features supported by a given implementation.
Capabilities are used by the client (local end) to tell the remote end what it expects, and by the remote end to tell the local end what it can do. In terms of structure, capabilities are simply a JSON object with keys and values (values which may themselves be objects). There is a set of “standard capabilities” that all remote ends must support:
| Capability | Key | Value Type | Description |
|---|---|---|---|
| Browser name | browserName | string | Identifies the user agent. |
| Browser version | browserVersion | string | Identifies the version of the user agent. |
| Platform name | platformName | string | Identifies the operating system of the endpoint node. |
| Accept insecure TLS certificates | acceptInsecureCerts | boolean | Indicates whether untrusted and self-signed TLS certificates are implicitly trusted on navigation for the duration of the session. |
| Page load strategy | pageLoadStrategy | string | Defines the current session’s page load strategy. Can be none (doesn’t wait for readiness), eager (waits for document interactive state), or normal (waits for document complete state). |
| Proxy configuration | proxy | JSON Object | Defines the current session’s proxy configuration. This is a potentially complex object: see the spec for more info. |
| Window dimensioning/positioning | setWindowRect | boolean | Indicates whether the remote end supports all of the commands in Resizing and Positioning Windows. |
| Session timeouts configuration | timeouts | JSON Object | Describes the timeouts imposed on certain session operations, as described in the Set Timeouts command. |
| Unhandled prompt behavior | unhandledPromptBehavior | string | Describes the current session’s user prompt handler. |
Remote ends can support capabilities beyond these, but they must be prefixed with a string followed by a colon, for example moz:foobar. This is therefore a possible set of capabilities (ignoring external structure detailed in the next section):
{
"browserName": "firefox",
"browserVersion": "1234",
"moz:foobar": true
}
During the execution of the New Session command, the remote end looks at the capabilities object passed by the client, and attempts to process it in order to set up the correct automation environment. There is a complex algorithm that defines this process. The capabilities object always has two properties: alwaysMatch (a set of capabilities) and firstMatch (a list of sets of capabilities):
{
"capabilities": {
"alwaysMatch": {...},
"firstMatch": [{...}, ...]
}
}
Basically, the remote end validates the alwaysMatch set and each set within the firstMatch list. Then it merges the alwaysMatch set with each of the firstMatch sets. Call each result of this process a “merged capabilities” object. (Note that the merge will error out with invalid argument if any capability in a firstMatch is already present in the alwaysMatch set.) The remote end then tries to match each merged capabilities object one-by-one. “Matching” is the process of ensuring that each capability can be unified with the remote end capability. For example, if a merged capability is platformName with a value of mac, but the remote end’s platformName is windows, the set of merged capabilities it belongs to would not match. On the other hand, if both were mac, we would have a match. The process stops with the first match, which is then returned in the New Session response.
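The merge-then-match flow described above can be sketched roughly as follows. This is a simplification under stated assumptions, not the spec’s algorithm: the function names are hypothetical, validation is skipped, a missing firstMatch is treated as a single empty set, and “matching” is reduced to plain equality against the remote end’s values.

```python
def merge_capabilities(always_match, first_match):
    """Merge alwaysMatch with each firstMatch entry, rejecting duplicated keys."""
    merged_sets = []
    # Assumption: an absent or empty firstMatch behaves like one empty capability set.
    for candidate in first_match or [{}]:
        overlap = set(always_match) & set(candidate)
        if overlap:
            raise ValueError(f"invalid argument: {sorted(overlap)} present in both sets")
        merged_sets.append({**always_match, **candidate})
    return merged_sets


def first_matching(merged_sets, remote_capabilities):
    """Return the first merged set whose every capability equals the remote end's value."""
    for merged in merged_sets:
        if all(remote_capabilities.get(key) == value for key, value in merged.items()):
            return merged
    return None


# Hypothetical remote end that is Firefox on mac.
remote = {"browserName": "firefox", "browserVersion": "1234", "platformName": "mac"}
merged = merge_capabilities(
    {"browserName": "firefox"},
    [{"platformName": "windows"}, {"platformName": "mac"}],
)
match = first_matching(merged, remote)
```

Here the windows candidate fails to match, so the second merged set (firefox on mac) is the one selected.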
Window Handles are strings representing a browsing context, whether top-level or not. The precise string is up to the remote end to generate, but it must not be the string current. See the spec for more details.
User prompts are alerts, confirmation dialogs, and so on, that block the event loop and require interaction before control is returned to a browsing context. There are two ways these can be handled:
explicitly, using the Alert commands (Dismiss Alert, Accept Alert, etc.)
automatically, using the unhandledPromptBehavior capability. Appropriate values can be one of the two strings accept and dismiss.
If the unhandledPromptBehavior capability is not set, then if a prompt is active, any command except for the Alert commands will result in an unexpected alert open error. This error may include a text property in the data field of the response, set to the text of the active prompt (in order to help with debugging).
If the unhandledPromptBehavior capability is set, then at various points in the session, if a prompt blocks the execution of one of the WebDriver algorithms, it will be automatically handled according to the capability:
accept means all user prompts should be accepted
dismiss means all user prompts should be dismissed
See the section in the spec for the more detailed algorithm.
Location strategies are used in conjunction with the Find Element series of commands. They instruct the remote end which method to use to find an element using the provided locator. The valid locator strategies are:
| Strategy | Keyword |
|---|---|
| CSS selector | css selector |
| Link text selector | link text |
| Partial link text selector | partial link text |
| Tag name | tag name |
| XPath selector | xpath |
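A client can guard against typos in the strategy keyword before sending a Find request. The sketch below is one way to do that; the function name `locator` is hypothetical, and the local check merely anticipates the invalid argument error the remote end would otherwise return.

```python
# The five strategy keywords from the table above.
LOCATOR_STRATEGIES = {
    "css selector",
    "link text",
    "partial link text",
    "tag name",
    "xpath",
}


def locator(using, value):
    """Build the request parameters for a Find command, checking the strategy keyword first."""
    if using not in LOCATOR_STRATEGIES:
        raise ValueError(f"invalid argument: unknown location strategy {using!r}")
    if not isinstance(value, str):
        raise ValueError("invalid argument: selector must be a string")
    return {"using": using, "value": value}


params = locator("css selector", "#foo")
```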