A command line interface to test, construct and maintain automaton scraper definitions

open-automaton/automaton-cli

auto

The Automaton CLI

Actions

auto fetch

You want to scrape the state of the DOM once the page has loaded, but a tool like curl only gives you the HTML as it was transferred from the server, before any scripts have run, which is probably not useful. auto fetch pulls the state of the DOM out of a running browser and prints that HTML.

# fetch the full body of the page you are scraping and save it in page.html
auto fetch https://domain.com/path/ > page.html
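
To see the difference described above for a given page, one option is to save both versions side by side and compare their sizes. This is only a sketch: compare_fetch and raw.html are hypothetical names introduced here, and it assumes curl and auto are both on your PATH.

```shell
# compare_fetch URL (hypothetical helper, not part of auto):
# saves the raw server response and the rendered DOM, then prints their sizes.
compare_fetch() {
  url="$1"
  curl -sL "$url" > raw.html       # markup as transferred, no scripts run
  auto fetch "$url" > page.html    # DOM pulled out of a running browser
  wc -c raw.html page.html         # a large gap suggests script-built content
}

# usage:
#   compare_fetch https://domain.com/path/
```

If the two files are close in size, the page is probably mostly static and either capture would do for building selectors.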

auto xpath

The first thing you might do against the HTML you've captured is pull all the forms out of the page, like this:

# check the forms on the page
auto xpath "//form" page.html

Assuming you've identified the form name you are targeting as my-form-name, you then want to get all the inputs out of it with something like:

# select all the inputs you need to populate from this form:
auto xpath "//form[@name='my-form-name']//input|//form[@name='my-form-name']//select|//form[@name='my-form-name']//textarea" page.html
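
The selector above repeats the form predicate once per element type, which is easy to mistype. As a small sketch, the same expression can be assembled in the shell from the form name; the variable names form and xpath are only illustrative.

```shell
# Build the union selector for one form from a list of field element types.
form="my-form-name"
xpath=""
for tag in input select textarea; do
  # join the per-tag selectors with the XPath union operator "|"
  xpath="${xpath:+$xpath|}//form[@name='$form']//$tag"
done
echo "$xpath"

# then pass it to auto exactly as before:
#   auto xpath "$xpath" page.html
```

Changing form (or adding another element type such as button to the loop) regenerates the whole expression without hand-editing each branch.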

auto run

From this you should be able to construct a primitive scrape definition (see the examples below for more concrete instruction). Once you have this definition, you can do sample scrapes with:

TBD
