Auto Web Login - Part IV
The grand finale is here, where lazy manπΆβ Gigi defeats anti-Javascript counter-measures in password-protected forms using the Rust programming language π¦.
The complete source code for the entire project is available here:
https://github.com/the-gigi/auto-web-login
“Laziness is the mother of invention.” ~ Anonymous
π Recap π
In Auto Web Login - Part I we learned how to use TamperMonkey π to automatically click buttons on a web page. But, the script was pretty gnarly π€’.
In Auto Web Login - Part II we let Python π generate the TamperMonkey script for us. This was a great improvement, but there was one annoying loose end we needed to take care of - browser tabs lingering around.
In Auto Web Login - Part III we used AppleScript π to close the browser tabs for us. But, it turns out that some buttons are not accessible from JavaScript or Applescript.
Let’s use Rust π¦ to automate this part as well.
π Clicking Stubborn Buttons π
Some buttons are more stubborn than others. It turns out that forms with password fields can’t be clicked from Javascript (or Applescript). Even if the password is populated by auto-complete! The browser’s security apparatus requires user interaction. This comes up often when the SSO provider itself (e.g. OneLogin or Okta) requires re-authorization.
Alright, browser automation fails us here. Let’s take it to the next level (or the previous level) and use OS automation to emulate an interactive user.
The idea is to use Javascript to locate the button on the page and then use OS-level automation to move the mouse to the button and click it.
Python has a great library called pyautogui that can move
the mouse, send keyboard events, and take screenshots. Unfortunately, I discovered that
when pyautogui
runs on the Mac it creates a blinking rocket π icon in the dock . This is a
deal-breaker for me. I can’t have a blinking rocket icon in the dock every time it runs. I tried
futilely to find a way to get rid of it.
Eventually, I took the only logical path π§ , threw away my Python code ( well, it’s still on GitHub) and decided to switch Rust π¦. I mean, why not add another programming language a project that already has Javascript, Python and Applescript? π€·ββοΈ
π¦ The auto_click
Rust program π¦
This ended up as its own little project. If you’re not familiar with Rust you’ll get acquainted today :-)
Here is the directory structure:
β― tree
.
βββ Cargo.lock
βββ Cargo.toml
βββ config.yaml
βββ src
βββ main.rs
The Cargo.toml file is the project file, which contains some metadata and the dependencies
[package]
name = "auto-click"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
dirs = "5.0.1"
enigo = "0.2.0"
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9.34"
The config.yaml
file contains the URL and the button finding logic (in Javascript). This is very
similar to the
Python config file we used to generate the button finding logic in the TamperMonkey script.
clicks:
- url: "https://acke.onelogin.com/login2/"
query:
"Array.from(document.querySelectorAll('span')).some(span => span.textContent.includes('Change Authentication Factor')) ? null : document.querySelector('button[type=\"submit\"]')"
Note that the query is a Javascript expression that returns the button to click. We will see very soon how it’s used.
Finally, the src/main.rs
file contains the Rust code that makes the magic happen. Let’s lay out
the plan first and then see how the code does it.
π The Plan π
- Read the config file
- Loop forever
- For each URL in the config file:
- Check if the URL is open in the browser
- Wait for the page to load
- Run the Javascript query to locate the button
- Move the mouse to the button
- Click the button
π¦π¦π The Code Turducken π¦π¦π
The auto_click
program is a code turducken of multiple programming languages. The main Rust code
executes an Applescript that in turn runs Javascript queries in the browser.
Let’s break it down piece by piece.
π¦ The Imports π¦
It all starts with a lot of imports. Some of them are from
Rust’s standard library, some are from the serde
library that allows us to read the config file
and some are from the enigo
library, which does the heavy lifting of moving the mouse around and
clicking buttons.
use enigo::{Enigo, Button, Mouse, Settings};
use std::process::Command;
use std::thread;
use std::time::Duration;
use serde::{Deserialize, Serialize};
use std::fs;
use std::error::Error;
use enigo::Coordinate::Abs;
use enigo::Direction::Click;
π οΈ The Configuration π οΈ
The Config
struct is a Rust representation of the config.yaml
file. It contains a list
of ClickConfig
structs.
#[derive(Serialize, Deserialize, Debug)]
struct ClickConfig {
url: String,
query: String,
}
#[derive(Serialize, Deserialize, Debug)]
struct Config {
clicks: Vec<ClickConfig>,
}
The load_click_config
function reads the config.yaml
file and parses it into a Config
struct.
fn load_click_config() -> Result<Vec<ClickConfig>, Box<dyn Error>> {
// Get the config filename
let home_dir = dirs::home_dir().ok_or("Unable to find the home directory")?;
let config_file = home_dir.join(".auto-click/config.yaml");
// Read the entire file as a string
let data = fs::read_to_string(config_file)?;
// Parse the string as YAML
let config: Config = serde_yaml::from_str(&data)?;
Ok(config.clicks)
}
π Running AppleScript from Rust π
Rust can run AppleScript using the Command
struct. The osascript
function takes an AppleScript
as a string and runs it . Pretty simple.
fn osascript(script: &str) -> Result<String, Box<dyn std::error::Error>> {
let output = Command::new("osascript")
.arg("-e")
.arg(script)
.output()?;
if output.status.success() {
Ok(String::from_utf8(output.stdout)?)
} else {
Err(format!("Error executing AppleScript: {}", String::from_utf8(output.stderr)?).into())
}
}
π― Locating Buttons on the Screen π―
Now, we’re getting to the meat of the program. We need to locate the button on the screen in screen coordinates. This is done by running a Javascript query in the browser and then performing some coordinate system translations.
Let’s create a simple page with a button on it and figure out how to locate it.
<!DOCTYPE html>
<html>
<head>
<title>Ze Button!</title>
</head>
<body>
<button style="font-size: 32px;
background-color: green;
color: white;
margin: 10px;">
Ze Button!
</button>
</body>
</html>
Here is our button (note the 10px margin):
Let’s see what can we find with the Browser DevTools:
b = document.querySelector('button')
<button style="font-size: 32px; background-color: green; color: white; margin: 10px;"> Ze Button! </button>
b.offsetLeft
18
b.offsetTop
18
The offsetLeft
and offsetTop
properties give us the position of the button relative to the
top-left corner of the viewport (the reason it’s not 10 like our margin is that there are additional
elements like border width and default offsets). But it doesn’t account for the offset of the
viewport from the top of the browser window (this part includes the tab bar, the URL bar and
possibly the bookmarks bar). We can calculate it using the following formula:
> window.screenY + window.outerHeight - window.innerHeight
286
We also need to find the origin of the browser window itself in screen coordinates. Here is how to do it with Applescript:
β― echo '
tell application "System Events" to tell process "Google Chrome"
return position of first window
end tell' | osascript
171, 184
We can also find it from within the browser using Javascript:
> window.screenX
171
> window.screenY
184
Now we can calculate the absolute position of the button on the screen:
button left = browser window left + button left offset
button top = browser window top + viewport top + button top offset
if we want to find the center of the button we need to add half the width and height of the button.
Here is the get_chrome_origin()
to get the browser absolute origin (left, top) coordinates
position. There is some extra parsing and conversion of the final result into a Rust vector.
fn get_chrome_origin() -> Result<(i32, i32), Box<dyn std::error::Error>> {
let script = r#"
tell application "System Events" to tell process "Google Chrome"
set thePos to position of first window
return thePos
end tell
"#;
let res = osascript(script)?;
let coords: Vec<i32> = res.split(',')
.map(|s| s.trim().parse().unwrap())
.collect();
Ok((coords[0], coords[1]))
}
Next, we need to calculate the absolute position of the button on the screen. Note, that we first send the keystrokes ctrl+0 to the browser. This resets the zoom-level, which is important to avoid the zoom distorting the pixel calculation.
fn get_absolute_viewport_top() -> Result<i32, Box<dyn std::error::Error>> {
let script = r#"
tell application "Google Chrome"
activate
tell application "System Events"
keystroke "0" using {command down}
end tell
tell the active tab of window 1
set js to "window.screenY + window.outerHeight - window.innerHeight"
set viewportTop to execute javascript js
return viewportTop
end tell
end tell
"#;
let res = osascript(script)?;
res.trim().parse::<i32>().map_err(Into::into)
}
Finally, we can calculate the button center coordinates (in viewport cooridnates). This function operates on a pair of URL and query to locate the button element on the page (if it exists).
fn find_element_center(partial_url: &str, query: &str) -> Result<(f64, f64), Box<dyn std::error::Error>> {
let escaped_query = query.replace("\"", "\\\"");
let full_query = format!(
"var el = {}; if (el) {{ var rect = el.getBoundingClientRect(); var centerX = rect.left + rect.width / 2; var centerY = rect.top + rect.height / 2; centerX + ',' + centerY; }} else {{ 'not found'; }}",
escaped_query
);
let escaped_url = partial_url.replace("\"", "\\\"");
let script = format!(r#"
tell application "Google Chrome"
try
set currentTab to the active tab of the front window
set currentURL to the URL of currentTab
if currentURL contains "{}" then
-- Construct and execute the JavaScript query to find the center of the element
set fullQuery to "{}"
set queryResult to execute currentTab javascript fullQuery
return queryResult
else
return "URL does not contain the specified partial URL."
end if
on error errMsg
return "An error occurred: " & errMsg
end try
end tell
"#, escaped_url, full_query);
let res = osascript(&script)?;
if res.contains(",") {
let parts: Vec<&str> = res.split(',').collect();
let x = parts[0].trim().parse::<f64>()?;
let y = parts[1].trim().parse::<f64>()?;
Ok((x, y))
} else {
Err("Element not found or error occurred".into())
}
}
π The main loop π
Let’s see how everything fits together in the main loop. First, it loads the config. Then it enters a loop that goes through each URL in the config. For each URL it checks if the URL it tries to find the button center coordinates. If it finds the button it calculates the absolute position of the button on the screen and moves the mouse to that position and clicks the button using enigo. Finally, it sleeps for 3 seconds before repeating the process.
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut enigo = Enigo::new(&Settings::default()).unwrap();
let user_click_dict = load_click_config()?;
loop {
for click in &user_click_dict {
if let Ok((dx, dy)) = find_element_center(&click.url, &click.query) {
if let Ok((x, _y)) = get_chrome_origin() {
let viewport_top = get_absolute_viewport_top()?;
let abs_x = x as f64 + dx;
let abs_y = viewport_top as f64 + dy;
enigo.move_mouse(abs_x as i32, abs_y as i32, Abs).unwrap();
// Click twice - once to focus, once to click the button
enigo.button(Button::Left, Click).unwrap();
enigo.button(Button::Left, Click).unwrap();
}
}
}
thread::sleep(Duration::from_secs(3));
}
}
π Building and Running the program π
We will use the same launchd
mechanism we used to watch for tabs to close, to run the program on
startup. First, we need to build the program:
$ cargo build --release
Next, we need to copy the binary to /usr/local/bin
:
cp target/release/auto-click /usr/local/bin
Then, we create a property list file that will run the program on startup. The file is
called auto_web_login_simulate_user.plist
and it looks like so:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>auto-web-login - simulate user</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/auto-click</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/auto-web-login.out</string>
<key>StandardErrorPath</key>
<string>/tmp/auto-web-login.err</string>
</dict>
</plist>
Finally, we need to make this run automatically on startup. These commands will copy the Launchd file auto_web_login_simulate_user.plist to the right place.
$ cp auto_web_login_simulate_user.plist ~/Library/LaunchAgents
$ launchctl load ~/Library/LaunchAgents/auto_web_login_simulate_user.plist
π Conclusion π
We made it! We have a Rust program π¦ that uses AppleScript π to run JavaScript π₯ in the browser to locate π buttons on the screen and then and click them β .
This is the final piece of the puzzle 𧩠that allows us to automatically log in π to websites that require frequent re-authorization π. I use it on a daily basis, and it was a lot of fun π to build.
Learning π about the intricacies of web logins and mixing multiple π οΈ technologies to accomplish the task was a great experience.