Oldes/Rebol-WebDriver

Rebol/WebDriver

WebDriver client for the Rebol programming language.

Currently, only the Chrome scheme is implemented. It is designed to work with Chrome, Brave, Edge, and other Chromium-based browsers.

The browser must be started with remote debugging enabled.

For example, on macOS, start the Brave browser from Terminal with this command:

/Applications/Brave\ Browser.app/Contents/MacOS/Brave\ Browser --remote-debugging-port=9222

On Windows, for example:

"c:\Program Files\BraveSoftware\Brave-Browser\Application\Brave.exe" --remote-debugging-port=9222

or

"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
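
Before opening a session, it can help to confirm that the browser is really listening on the debugging port. The sketch below is not part of the module: it assumes the browser was started with `--remote-debugging-port=9222` on the local machine and reads the `/json/version` HTTP endpoint that Chromium-based browsers expose on that port.

```rebol
;; Assumption: the browser runs locally with --remote-debugging-port=9222.
;; /json/version is the browser's own HTTP endpoint, not a WebDriver module function.
info: try [read http://localhost:9222/json/version]

either error? info [
    print "No browser seems to be listening on localhost:9222."
][
    print to string! info   ;; JSON text with the browser version and WebSocket URL
]
```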

The available methods are those of the Chrome DevTools Protocol, documented here: https://chromedevtools.github.io/devtools-protocol/
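
As a rough illustration of how a documented protocol method maps to the dialect used in this README, the sketch below sends `Runtime.evaluate` (a DevTools Protocol method with an `expression` parameter) as a method word followed by a block of parameters. The result path `res/result/result/value` is an assumption based on the protocol's reply format, so the access is wrapped in `try` and the raw reply is printed as well.

```rebol
import %websocket.reb
import %webdriver.reb

browser: open chrome://            ;; Defaults to localhost:9222

;; A method name followed by a block of its parameters.
res: write browser [
    Runtime.evaluate [expression: "document.title"]
]

probe res                          ;; Inspect the raw reply first
;; Assumption: the evaluated value is stored under result/result/value.
try [print res/result/result/value]

close browser
```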

Simple usage example

import %websocket.reb           ;; The scheme depends on the WebSocket module (which may not be available by default yet)
import %webdriver.reb           ;; Import the module directly from the source file

system/options/quiet:    off    ;; Modifies script's output visibility
system/options/log/ws:   0      ;; No WebSocket traces
system/options/log/http: 0      ;; No HTTP traces

browser: open chrome://         ;; Initialize Chrome's WebDriver scheme (defaults to localhost:9222)

write browser [                 ;; Sends multiple commands to be evaluated by the WebDriver scheme
    Network.enable              ;; Enable network tracking to capture network events.
    http://www.rebol.com        ;; Opens the specified webpage and waits for it to finish loading.
    0:0:1                       ;; (Optional) Waits for 1 second to process potential incoming events.
                                ;; This may help with dynamically updated pages.
    DOM.getDocument [depth: -1] ;; Retrieves the root DOM node along with the entire subtree (depth -1).
]

print pick browser 'DOM.getDocument ;; Prints the resolved DOM structure

;- Printing the current webpage to PDF
tmp: write browser 'Page.printToPDF
write %page.pdf debase tmp/result/data 64 ;; Decode the base64-encoded PDF data and save it to a file

;- Navigating to another webpage within the session
write browser https://www.theguardian.com/news/series/ten-best-photographs-of-the-day
;; Content of this page is dynamically updated, so wait for it.
write browser 0:0:1

;; Simulate multiple mouse wheel events to scroll the webpage
loop 10 [
    write browser [
        Input.dispatchMouseEvent [type: "mouseWheel" x: 100 y: 100 deltaX: 0 deltaY: 800]
        0:0:1
    ]
]

;; Received events are stored in the session and may be processed.
;; For example, to resolve all loaded AVIF images on the page...
foreach [n m] take/all browser/extra/events [
    if all [
        n == "Network.responseReceived"       ;; Look for network responses
        m/type == "Image"                     ;; Specifically, images
        m/response/status == 200              ;; Ensure the request succeeded
        m/response/mimeType == "image/avif"   ;; Filter for AVIF images
    ][
        probe m/response/url
        url: decode-url m/response/url        ;; Decode the image URL

        local-file: rejoin [
            %img_                             ;; Prefix for the file name
            checksum to binary! url/path 'md5 ;; Generate a checksum for the image URL
            #"_" url/target                   ;; Append the target filename
        ]

        ;; Check if the image is not already downloaded.
        if exists? local-file [
            print ["File already downloaded:" as-yellow local-file]
            continue
        ]

        ;; Request the image body.
        tmp: write browser compose/deep [Network.getResponseBody [requestId: (m/requestId)]]

        ;; Decode the image data (base64 if necessary).
        bin: tmp/result/body
        if tmp/result/base64Encoded [bin: debase bin 64]

        ;; Save the image to disk.
        probe write local-file bin
    ]
]

;- Closing the session gracefully
close browser ;; Close the session (similar to closing a page in the browser).
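
The example above reads received events from `browser/extra/events` as name/value pairs. When it is unclear which events a page produces, a quick tally per event name can help decide what to filter on. The helper below is a hypothetical sketch (`count-events` is not part of the module) and assumes the events block keeps the flat name/value layout used above.

```rebol
;; Hypothetical helper: count how many times each event name occurs.
;; Assumption: `events` is a flat block of name/value pairs.
count-events: function [events [block!]][
    counts: make map! []
    foreach [name data] events [
        put counts name 1 + any [select counts name 0]
    ]
    counts
]

;; Stand-alone sample data, for illustration only:
probe count-events [
    "Network.requestWillBeSent" none
    "Network.responseReceived"  none
    "Network.responseReceived"  none
]
```

With an open session, `probe count-events browser/extra/events` would report on the real events without removing them, unlike `take/all`, which empties the block.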

Note: The above code demonstrates how the WebDriver module can be used to interact with webpages, including downloading images dynamically. However, for simpler use cases (e.g., static webpages), you can use a more straightforward and faster approach without the WebDriver module:

import html-entities

html: read https://www.theguardian.com/news/series/ten-best-photographs-of-the-day

;- Parse the HTML to extract image URLs and download them
parse html [any[
    thru {<picture data-size="jumbo"}                ;; Locate the relevant section for large images
    thru {<source srcSet="} copy url to dbl-quote    ;; Extract the image URL
    (
        image-url: as url! decode 'html-entities url ;; Decode HTML entities in the URL
        url: decode-url image-url                    ;; Decode the image URL for further processing

        local-file: rejoin [
            %img_                                    ;; Prefix for the file name
            checksum to binary! url/path 'md5        ;; Generate a checksum for the image URL
            #"_" url/target                          ;; Append the target filename
        ]

        ;; Check if the image is not already downloaded.
        either exists? local-file [
            print ["File already downloaded:" as-yellow local-file]
        ][
            ;; Download and save the image
            try/with [
                write local-file read image-url
                print ["New image downloaded:" as-green local-file]
            ] :print
        ]
    )
] to end]

Other useful examples

Get all links from a given web page

;; Initialize the browser scheme...
browser: open chrome://
res: write browser [
    https://www.rebol.com  ;; Open some web page.
    DOM.getDocument        ;; Get document's root node (not the full one!).
]
;; Check session results
try/with [
    session:  res/1/result ;; Not used.
    document: res/2/result ;; To get nodeId of the document root.
][
    ;; Quit early in case of insufficient info.
    print "Failed to initialize a session."
    quit
]
;; Query all nodes of type A (links)
res: write browser compose/deep [
    DOM.querySelectorAll [
        nodeId: (document/root/nodeId)
        selector: "a"
    ]
]
;; If any nodes are found, query the outer HTML of each.
if all [
    map? res
    block? nodes: res/result/nodeIDs
][ 
    links: copy []
    foreach node nodes [
        res: write browser compose/deep [DOM.getOuterHTML [nodeId: (node)]]
        try [append links res/result/outerHTML]
    ]
    ;; Print results.
    print ["Found" length? nodes "link (A) nodes"]
    foreach link links [ print link ]
]
;; Close page in the browser.
write browser [Page.close]
;; Close the session.
close browser
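
The `links` block collected above holds the outer HTML of each anchor, not the link targets themselves. One possible way to pull out the `href` value is a small `parse` rule, in the same style as the static-page example earlier. `get-href` is a hypothetical helper, and it assumes the attribute is written with double quotes; it returns `none` otherwise.

```rebol
;; Hypothetical helper: extract the href attribute from an anchor's outer HTML.
;; Assumption: the attribute value is enclosed in double quotes.
get-href: function [html [string!]][
    href: none
    parse html [thru {href="} copy href to dbl-quote to end]
    href
]

;; Stand-alone sample, for illustration only:
print get-href {<a class="nav" href="/docs.html">Docs</a>}
```

Applied to the example above, `foreach link links [print get-href link]` would print one target per link.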
