Commit

.
robbiemu committed Oct 20, 2024
1 parent dee70eb commit f7ea84c
Showing 2 changed files with 5 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -145,7 +145,7 @@ crawl-docs https://example.com / -o output.md \
 --allowed-paths "/docs/" "/api/"
 ```
 
-#### Skipping Pre-Fetch and Post-Fetch URLs with Ignore Paths
+#### Skipping URLs with Ignore Paths
 
 ```bash
6 changes: 4 additions & 2 deletions src/libcrawler/libcrawler.py
@@ -369,7 +369,8 @@ def crawl_and_convert(
     delay_range=0.5,
     extra_remove_selectors=None,
     similarity_threshold=0.8,
-    allowed_paths=None
+    allowed_paths=None,
+    ignore_paths=None
 ):
     # Build the tree and get page_markdowns and url_to_anchor
     page_markdowns, url_to_anchor = build_tree(
@@ -381,7 +382,8 @@ def crawl_and_convert(
         delay=delay,
         delay_range=delay_range,
         extra_remove_selectors=extra_remove_selectors,
-        allowed_paths=allowed_paths
+        allowed_paths=allowed_paths,
+        ignore_paths=ignore_paths
     )
 
     # Deduplicate content
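This commit only threads the new `ignore_paths` argument from `crawl_and_convert` through to `build_tree`. The filtering logic itself is not part of the diff. Below is a minimal sketch of how an ignore list might combine with `allowed_paths`, assuming both are lists of URL-path prefixes such as `"/docs/"`. The helper name `should_crawl` and the prefix-matching rule are hypothetical and are not libcrawler's actual API.

```python
from urllib.parse import urlparse


def should_crawl(url, allowed_paths=None, ignore_paths=None):
    """Hypothetical filter deciding whether a URL should be crawled.

    Assumption: both lists hold URL-path prefixes (e.g. "/docs/").
    The real libcrawler matching rules are not shown in this commit.
    """
    path = urlparse(url).path or "/"

    # Ignore rules take precedence: any matching prefix skips the URL.
    if ignore_paths and any(path.startswith(p) for p in ignore_paths):
        return False

    # With no allow list, every URL that is not ignored is crawled.
    if not allowed_paths:
        return True

    # Otherwise the path must fall under one of the allowed prefixes.
    return any(path.startswith(p) for p in allowed_paths)


if __name__ == "__main__":
    allowed = ["/docs/", "/api/"]
    ignored = ["/docs/legacy/"]
    for url in [
        "https://example.com/docs/intro",
        "https://example.com/docs/legacy/v1",
        "https://example.com/blog/post",
    ]:
        print(url, should_crawl(url, allowed, ignored))
```

Under this sketch, letting the ignore list win means a narrower subtree (here `/docs/legacy/`) can be carved out of a broader allowed prefix without restructuring the allow list.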