Navigation shortcuts
zverok edited this page Aug 7, 2015
·
1 revision
So, you have already fetched a page from Wikipedia, inspected its data structure and navigated it. That alone is enough to extract information of any kind, but Infoboxer can make it smoother.
(JFYI: the API docs for the Navigation::Shortcuts module list all of these in a more orderly manner.)
Shortcuts for receiving node lists of some type:
page = Infoboxer.wp.get('Argentina')
# Get all paragraphs on a page
page.paragraphs
# => list of all paragraph-level nodes in page
# And other basic node kinds:
page.wikilinks
page.external_links
page.images
page.tables
page.templates
page.lists
page.headings
# Refine your query:
page.headings(level: 3)
# Special shortcut for template names:
page.templates('see')
# or even
page.templates(/^Infobox/)
# Wikilinks namespace
page.wikilinks
# => only default namespace wikilinks
page.wikilinks('Category')
# => wikilinks in 'Category' namespace
page.wikilinks(nil)
# => all wikilinks in all the namespaces
# All of the methods above work not only for the entire page, but for any
# node on it, like:
page.tables.first.images
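To make the behaviour above concrete without hitting Wikipedia, here is a minimal sketch of how such shortcuts could be layered on one generic recursive lookup. It is a toy model and not Infoboxer's actual implementation: `ToyNode`, its `attrs` hash, the `lookup` helper and the sample tree are all hypothetical names invented for this illustration, and the exact, case-sensitive name matching is a simplification.

```ruby
# Toy model, NOT Infoboxer's real classes: a node has a type,
# a hash of attributes, and child nodes.
class ToyNode
  attr_reader :type, :attrs, :children

  def initialize(type, attrs = {}, children = [])
    @type = type
    @attrs = attrs
    @children = children
  end

  # Generic recursive lookup: every descendant for which the block is true.
  def lookup(&block)
    children.flat_map do |child|
      (block.call(child) ? [child] : []) + child.lookup(&block)
    end
  end

  # Type shortcuts are thin wrappers over the generic lookup.
  def tables
    lookup { |n| n.type == :table }
  end

  def images
    lookup { |n| n.type == :image }
  end

  # Optional refinement, mirroring headings(level: 3).
  def headings(level: nil)
    lookup { |n| n.type == :heading && (level.nil? || n.attrs[:level] == level) }
  end

  # String#=== compares for equality, Regexp#=== tests for a match,
  # so one method covers both templates('see') and templates(/^Infobox/).
  def templates(name = nil)
    lookup { |n| n.type == :template && (name.nil? || name === n.attrs[:name]) }
  end

  # Default argument '' means "default namespace only"; nil means "any".
  def wikilinks(namespace = '')
    lookup { |n| n.type == :wikilink && (namespace.nil? || n.attrs[:namespace] == namespace) }
  end
end

toy_page = ToyNode.new(:page, {}, [
  ToyNode.new(:heading, { level: 2 }),
  ToyNode.new(:template, { name: 'Infobox country' }),
  ToyNode.new(:image, { path: 'flag.png' }),
  ToyNode.new(:wikilink, { namespace: '', link: 'Buenos Aires' }),
  ToyNode.new(:table, {}, [
    ToyNode.new(:image, { path: 'map.png' })
  ]),
  ToyNode.new(:heading, { level: 3 }),
  ToyNode.new(:template, { name: 'see' }),
  ToyNode.new(:wikilink, { namespace: 'Category', link: 'Category:Countries' })
])

toy_page.images               # both images, anywhere in the tree
toy_page.tables.first.images  # only the image nested inside the table
toy_page.wikilinks(nil)       # wikilinks from every namespace
```

Because every shortcut is defined on the node itself, the same call works on the whole page and on any subtree, which is the property the last example above relies on.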
Shortcuts for examining node style:
node = page.wikilinks.first
node.bold? # is this link INSIDE bold tag?
node.italic?
node.heading?
node.heading?(3) # is it inside heading level 3?
# (Slightly) more useful example:
Infoboxer.wp.get('Einstein (disambiguation)').
  wikilinks.select(&:bold?)
# => only bold disambiguation links,
# which typically mark the most common use(s) of this word
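The comment "is this link INSIDE bold tag?" is the key point: these predicates describe the node's surroundings, not the node itself. One way to picture that is a walk up the chain of parents. The sketch below is a toy model under that assumption, not Infoboxer's actual code; `StyledNode`, `ancestors` and the sample links are hypothetical names used only for illustration.

```ruby
# Toy model, NOT Infoboxer's real classes: each node remembers its parent,
# so a style predicate can inspect the enclosing nodes.
class StyledNode
  attr_reader :type, :attrs, :children
  attr_accessor :parent

  def initialize(type, attrs = {}, children = [])
    @type = type
    @attrs = attrs
    @children = children
    children.each { |child| child.parent = self }
  end

  # All enclosing nodes, nearest first.
  def ancestors
    parent ? [parent] + parent.ancestors : []
  end

  def bold?
    ancestors.any? { |n| n.type == :bold }
  end

  def italic?
    ancestors.any? { |n| n.type == :italic }
  end

  # Without an argument any heading counts; with one, the level must match.
  def heading?(level = nil)
    ancestors.any? { |n| n.type == :heading && (level.nil? || n.attrs[:level] == level) }
  end
end

main  = StyledNode.new(:wikilink, { link: 'Albert Einstein' })
plain = StyledNode.new(:wikilink, { link: 'Einstein Observatory' })

# Building the list wires up the parent pointers: `main` sits inside
# a bold node, `plain` does not.
StyledNode.new(:list, {}, [
  StyledNode.new(:bold, {}, [main]),
  plain
])

[main, plain].select(&:bold?).map { |n| n.attrs[:link] }
# => ["Albert Einstein"]
```

Since the predicates take no required arguments, they compose with `select(&:bold?)` exactly as in the disambiguation example above.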
See the API docs for the full list of available methods.
Any more ideas? Drop me a line! (Or send a pull request ;)
Next: Navigating by sections