-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
75 lines (49 loc) · 2.02 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
---
output: rmarkdown::github_document
---
[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/pubcrawl.svg?branch=master)](https://travis-ci.org/hrbrmstr/pubcrawl)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/hrbrmstr/pubcrawl?branch=master&svg=true)](https://ci.appveyor.com/project/hrbrmstr/pubcrawl)
[![Coverage Status](https://img.shields.io/codecov/c/github/hrbrmstr/pubcrawl/master.svg)](https://codecov.io/github/hrbrmstr/pubcrawl?branch=master)
# pubcrawl
Convert 'epub' Files to Text
## Description
Convert 'epub' Files to Text
The 'epub' file format is really just a structured 'ZIP' archive with
metadata, graphics and (usually) 'HTML' text. Tools are provided to turn an 'epub'
file into a tidy data frame.
## What's Inside The Tin
The following functions are implemented:
- `epub_to_text`: Convert an epub file into a data frame of plaintext chapters
## NOTE
There are edge cases I've totally not covered yet. Feel free to jump in and make this a real, useful package!
## TODO
- [X] Refactor so there aren't so many heavy dependencies
- <strike> [ ] Try to get `hgr` on CRAN so it's not a GH dep</strike> Moved the cleaner code into here
- [ ] Better docs
- [X] Embed some epubs for examples and tests
- [X] Setup Travis, Appveyor, code coverage
## Installation
```{r eval=FALSE}
devtools::install_github("hrbrmstr/pubcrawl")
```
```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```
## Usage
```{r message=FALSE, warning=FALSE, error=FALSE}
library(pubcrawl)
library(tidyverse)
# current verison
packageVersion("pubcrawl")
```
### An O'Reilly epub
```{r}
epub_to_text("~/Data/R Packages.epub")
```
### A Project Gutenberg epub that comes with the package
```{r}
epub_to_text(system.file("extdat", "augustine.epub", package="pubcrawl")) %>%
mutate(path = abbreviate(path))
```
## Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.