Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide additional 'select!` macro for improved readability #231

Open
Approximately-Equal opened this issue Jan 20, 2025 · 1 comment
Open

Comments

@Approximately-Equal
Copy link

A common use case is to use scraper to scrape many selections of some Html page and store them. For example:

pub struct Stats {
   pub published: String,
   pub status: Option<(String, String)>,
   pub words: u32,
   pub nchapters: u32,
   pub expected_chapters: Option<u32>,
   pub comments: Option<u32>,
   pub kudos: Option<u32>,
   pub bookmarks: Option<u32>,
   pub hits: u32,
}

impl Stats {
   pub fn new(html: &Html) -> Self {
       Self {
           published: html.select(Selector::parse("dd.published").unwrap()).nth(0).inner_html(),
           status: {
               let status_name = html.select(Selector::parse("dt.status").unwrap()).nth(0).inner_html()
               let status_date = html.select(Selector::parse("dd.status").unwrap()).nth(0).inner_html()
               if status_name.is_some() && status_date.is_some() {
                   Some((status_name.unwrap(), status_date.unwrap()))
               } else {
                   None
               }
           },
           words: html.select(Selector::parse("dd.words").unwrap()).nth(0).inner_html().parse(),
           ...
       }
   }
   ...
}

Notice that there is a massive html.select(Selector::parse("...").unwrap()) block for pretty much everything. Since this is so common, it should have a more compact version. Some examples of implementations are:

// macro
select!(html, selector) // -> Result<Select<'a, 'b>, SelectorErrorKind<'_>>
// a top-level function
fn select_from(html: &Html, selector: &str) -> Result<Select<'a, 'b>, SelectorErrorKind<'_>> ;
// method of html
html.try_select(selector: &str) -> Result<Select<'a, 'b>, SelectorErrorKind<'_>>

This would make the Stats look like

impl Stats {
    pub fn new(html: &Html) -> Self {
        Self {
            published: select!(html, "dd.published").unwrap().nth(0).inner_html(),
            status: {
                let status_name = select!(html, "dt.status").unwrap().nth(0).inner_html();
                let status_date = select!(html, "dd.status").unwrap().nth(0).inner_html();
                if status_name.is_some() && status_date.is_some() {
                    Some((status_name.unwrap(), status_date.unwrap()))
                } else {
                    None
                }
            },
            words: select!(html, "dd.words").unwrap().nth(0).inner_html().parse(),
            ...
        }
    }
    ...
}

This decreases line width by 15 characters and makes it easier to read the exact selection at a glance.

html.select(Selector::parse("dd.published").unwrap()) // 53 characters
select!(html, "dd.published").unwrap() // 38 characters
@Approximately-Equal Approximately-Equal changed the title Provide Additional 'select!` Macro for Improved Readability Provide additional 'select!` macro for improved readability Jan 20, 2025
@adamreichold
Copy link
Member

I think most of the line noise is Selector::parse("..").unwrap() and we should not encourage using selectors parsed on the fly, i.e. it is often much more efficient to parse selectors once and store them for reuse. So I would argue that a

fn selector(selectors: &str) -> Selector<'_>;

is preferable as it encourages that avoids the need for a macro.

Also note that free functions and methods to directly select items will not work as proposed AFAIU as the Select iterators borrow from the Selector values, so even the macro would have to inject a binding into the enclosing scope to keep the selector alive while the iterator is used. Alternatively, the function/methods would need to use a closure to access the selection iterator, e.g.

fn try_select<F, R>(&'a self, selectors: &str, f: F) -> R where F: FnOnce(Select<'a, '_>) -> R;

where the second lifetime of Select<'a, '_> will borrow form the temporary selector and not from selectors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants