Commit

feat: add remarklint for md docs (#213)
* feat: add remarklint for md docs

* fix: remarkrc file and run linter on commit hook
RalphJbeily authored Jan 24, 2019
1 parent 0aa67ee commit 2e1e4d9
Showing 4 changed files with 685 additions and 36 deletions.
6 changes: 6 additions & 0 deletions .remarkrc
@@ -0,0 +1,6 @@
{
"plugins": [
"remark-preset-lint-recommended",
["remark-lint-list-item-indent", false]
]
}
6 changes: 5 additions & 1 deletion package.json
@@ -6,7 +6,7 @@
"main": "./dist/mercury.js",
"scripts": {
"lint": "if test \"$CI\" != \"true\" ; then eslint . --fix; fi",
"lint:ci": "eslint .",
"lint:ci": "remark . && eslint .",
"lint-fix-quiet": "eslint --fix --quiet",
"build": "yarn lint && rollup -c && yarn test:build",
"build:web": "yarn lint && rollup -c rollup.config.web.js && yarn test:build:web",
@@ -79,6 +79,9 @@
"nock": "^10.0.6",
"ora": "^3.0.0",
"prettier": "^1.15.3",
"remark-cli": "^6.0.1",
"remark-lint": "^6.0.4",
"remark-preset-lint-recommended": "^3.0.2",
"requirejs": "^2.3.6",
"rollup": "^1.1.0",
"rollup-plugin-babel": "^4.0.1",
@@ -125,6 +128,7 @@
"git add"
],
"*.{json,css,md}": [
"remark .",
"prettier --write",
"git add"
]
61 changes: 32 additions & 29 deletions src/extractors/custom/README.md
@@ -8,18 +8,19 @@ Custom parsers allow you to write CSS selectors that will find the content you'r

You can query for every field returned by the Mercury Parser:

- title
- author
- content
- date_published
- lead_image_url
- dek
- next_page_url
- excerpt
- title
- author
- content
- date_published
- lead_image_url
- dek
- next_page_url
- excerpt

### Using selectors

#### Basic selectors

To demonstrate, let's start with something simple: Your selector for the page's title might look something like this:

```javascript
@@ -41,12 +42,13 @@ As you might guess, the selectors key provides an array of selectors that Mercur
The selector you choose should return one element. If more than one element is returned by your selector, it will fail (and Mercury will fall back to its generic extractor).
#### Selecting an attribute
Sometimes the information you want to return lives in an element's attribute rather than its text — e.g., sometimes a more exact ISO-formatted date/time will be stored in an attribute of an element.
Sometimes the information you want to return lives in an element's attribute rather than its text — e.g., sometimes a more exact ISO-formatted date/time will be stored in an attribute of an element.
So your element looks like this:
```html
<time class="article-timestamp" datetime="2016-09-02T07:30:01-04:00">
<time class="article-timestamp" datetime="2016-09-02T07:30:01-04:00"></time>
```
The text you want isn't the text inside a matching element, but rather, inside the datetime attribute. To write a selector that returns an attribute, you provide your custom parser with a two-element array. The first element is your selector; the second element is the attribute you'd like to return.
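As a rough sketch of the two-element form described above, a selector that returns the `datetime` attribute of the `time` element might look like this. The object name `ExampleExtractor` and the domain are placeholders chosen for illustration; only the selector pair is derived from the HTML snippet.

```javascript
// Hypothetical extractor fragment: `ExampleExtractor` and the domain are
// placeholder names used only for illustration.
const ExampleExtractor = {
  domain: 'www.example.com',
  date_published: {
    selectors: [
      // [selector, attribute]: return the value of `datetime`
      // rather than the element's text.
      ['time.article-timestamp', 'datetime'],
    ],
  },
};
```

In a real extractor file the object would be exported rather than declared as a local constant.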
@@ -71,7 +73,7 @@ This is all you'll need to know to handle most of the fields Mercury parses (tit
An article's content can be more complex than the other fields, meaning you sometimes need to do more than just provide the selector(s) in order to return clean content.
For example, sometimes an article's content will contain related content that doesn't translate or render well when you just want to see the article's content. The clean key allows you to provide an array of selectors identifying elements that should be removed from the content.
For example, sometimes an article's content will contain related content that doesn't translate or render well when you just want to see the article's content. The clean key allows you to provide an array of selectors identifying elements that should be removed from the content.
Here's an example:
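A minimal sketch of the `clean` key, assuming hypothetical class names (`div.article-body`, `.related-links`, `.newsletter-signup`) and a placeholder domain, none of which are taken from the commit:

```javascript
// Hypothetical extractor fragment; every selector below is a placeholder.
const CleanExampleExtractor = {
  domain: 'www.example.com',
  content: {
    // Selector that locates the article body.
    selectors: ['div.article-body'],
    // Elements matching these selectors are removed from the
    // extracted content.
    clean: ['.related-links', '.newsletter-signup'],
  },
};
```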
@@ -195,21 +197,21 @@ If you look at your parser's test file, you'll see a few instructions to guide y
By default, the first test, which ensures your custom extractor is being selected properly, should be passing. The first failing test checks to see whether your extractor returns the correct title:
```javascript
it('returns the title', (async) () => {
// To pass this test, fill out the title selector
// in ./src/extractors/custom/www.newyorker.com/index.js.
const html =
fs.readFileSync('./fixtures/www.newyorker.com/1475245895852.html');
const articleUrl =
'http://www.newyorker.com/tech/elements/hacking-cryptography-and-the-countdown-to-quantum-computing';

const { title } =
await Mercury.parse(articleUrl, html, { fallback: false });

// Update these values with the expected values from
// the article.
assert.equal(title, 'Schrödinger’s Hack');
});
it('returns the title', async () => {
// To pass this test, fill out the title selector
// in ./src/extractors/custom/www.newyorker.com/index.js.
const html = fs.readFileSync(
'./fixtures/www.newyorker.com/1475245895852.html'
);
const articleUrl =
'http://www.newyorker.com/tech/elements/hacking-cryptography-and-the-countdown-to-quantum-computing';

const { title } = await Mercury.parse(articleUrl, html, { fallback: false });

// Update these values with the expected values from
// the article.
assert.equal(title, 'Schrödinger’s Hack');
});
```
As you can see, to pass this test, we need to fill out our title selector. In order to do this, you need to know what your selector is. To do this, open the html fixture the generator downloaded for you in the [`fixtures`](/fixtures) directory. In our example, that file is `fixtures/www.newyorker.com/1475248565793.html`. Now open that file in your web browser.
@@ -223,7 +225,7 @@ So, back to the title: We want to make sure our test finds the same title we see
The selector for this title appears to be `h1.title`. To verify that we're right, click on the Console tab in Chrome's Developer Tools and run the following check:
```javascript
$$('h1.title')
$$('h1.title');
```
If that returns only one match (i.e., an array with just one element), and the text of that element looks like the title we want, you're good to go!
@@ -247,7 +249,8 @@ export const NewYorkerExtractor = {
Save the file, and... uh oh, our example still fails.
```javascript
AssertionError: 'Hacking, Cryptography, and the Countdown to Quantum Computing' == 'Schrödinger’s Hack'
AssertionError: 'Hacking, Cryptography, and the Countdown to Quantum Computing' ==
'Schrödinger’s Hack';
```
When Mercury generated our test, it took a guess at the page's title, and in this case, it got it wrong. So update the test with the title we expect, save it, and your test should pass!
@@ -259,7 +262,7 @@ We've been moving at a slow pace, but as you can see, once you understand the ba
For a slightly more complex example, you'll find after a bit of looking that the best place to get the most accurate datetime on the page is in the head of the document, in the value attribute of a meta tag:
```html
<meta value="2016-09-26T14:04:22-04:00" name="article:published_time">
<meta value="2016-09-26T14:04:22-04:00" name="article:published_time" />
```
As [explained above](#selecting-an-attribute), to return an attribute rather than the text inside an element, your selector should be an array where the first element is the element selector and the second element is the attribute you want to return. So, in this example, the date_published selector should look like this:
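For the meta tag shown above, a selector in that two-element form could be sketched as follows. The object name and domain are placeholders; only the selector pair is derived from the example tag.

```javascript
// Hypothetical extractor fragment: the name and domain are placeholders.
const PublishedTimeExtractor = {
  domain: 'www.example.com',
  date_published: {
    selectors: [
      // Match the meta tag by its name attribute and return its
      // `value` attribute instead of its (empty) text.
      ['meta[name="article:published_time"]', 'value'],
    ],
  },
};
```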