
Commit

Merge pull request #26 from Ayushk4/doc_fix_patch
Fixing documentation
Ayushk4 authored Aug 6, 2019
2 parents 114fa32 + 96a8ce0 commit e8d5d61
Showing 13 changed files with 86 additions and 36 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -4,5 +4,5 @@
*.swp
**.ipynb_checkpoints

/docs/build
/docs/site
docs/build
docs/site
13 changes: 10 additions & 3 deletions .travis.yml
@@ -20,10 +20,17 @@ email: false
# - julia -e 'Pkg.clone(pwd()); Pkg.build("CorpusLoaders"); Pkg.test("CorpusLoaders"; coverage=true)'

after_success:
# Push Documentation
- julia -e 'Pkg.add("Documenter")'
- julia -e 'cd(Pkg.dir("CorpusLoaders")); include(joinpath("docs", "make.jl"))'
# push coverage results to Coveralls
- julia -e 'cd(Pkg.dir("CorpusLoaders")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())'
# push coverage results to Codecov
- julia -e 'cd(Pkg.dir("CorpusLoaders")); Pkg.add("Coverage"); using Coverage; Codecov.submit(Codecov.process_folder())'

jobs:
include:
- stage: "Documentation"
julia: 1.0
os: linux
script:
- julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
- julia --project=docs/ docs/make.jl
after_success: skip
3 changes: 3 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,3 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
CorpusLoaders = "214a0ac2-f95b-54f7-a80b-442ed9c2c9e8"
20 changes: 16 additions & 4 deletions docs/make.jl
@@ -1,9 +1,21 @@
using Documenter
using CorpusLoaders

makedocs(modules=[CorpusLoaders])
makedocs(modules = [CorpusLoaders],
sitename = "CorpusLoaders",
pages = [
"Home" => "index.md",
"CoNLL" => "CoNLL.md",
"IMDB" => "IMDB.md",
"SemCor" => "SemCor.md",
"Senseval3" => "Senseval3.md",
"StanfordSentimentTreebank" => "StanfordSentimentTreebank.md",
"Twitter" => "Twitter.md",
"WikiCorpus" => "WikiCorpus.md",
"API References" => "APIReference.md"
])


deploydocs(deps = Deps.pip("mkdocs", "python-markdown-math"),
repo = "github.com/oxinabox/CorpusLoaders.jl.git",
osname = "linux")
deploydocs(deps = Deps.pip("mkdocs", "python-markdown-math"),
repo = "github.com/oxinabox/CorpusLoaders.jl.git"
)
6 changes: 6 additions & 0 deletions docs/src/APIReference.md
@@ -0,0 +1,6 @@
# API References

```@autodocs
Modules = [CorpusLoaders]
Order = [:function, :type]
```
2 changes: 1 addition & 1 deletion docs/src/CoNLL.md
@@ -1,4 +1,4 @@
## CoNLL 2003
# CoNLL 2003
The CoNLL-2003 shared task data files
are made from the Reuters Corpus,
a collection of newswire articles.
6 changes: 3 additions & 3 deletions docs/src/IMDB.md
@@ -1,4 +1,4 @@
### IMDB
# IMDB

The IMDB movie reviews dataset is a standard collection for the binary sentiment analysis task and is used for benchmarking sentiment analysis algorithms. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing, plus additional unlabeled data. Both raw text and an already-processed bag-of-words format are provided.

@@ -19,7 +19,7 @@ Example:

#Using the "test_neg" keyword for negative test set examples

```
```julia
julia> dataset_test_neg = load(IMDB("test_neg"))
Channel{Array{Array{String,1},1}}(sz_max:4,sz_curr:4)

@@ -32,7 +32,7 @@ julia> docs = collect(take(dataset_test_neg, 2))

#Using "train_pos" keyword for positive train set examples

```
```julia
julia> dataset_train_pos = load(IMDB()) #no need to specify category because "train_pos" is default
Channel{Array{Array{String,1},1}}(sz_max:4,sz_curr:4)

3 changes: 1 addition & 2 deletions docs/src/SemCor.md
@@ -1,5 +1,4 @@

## SemCor
# SemCor

The classical Sense Annotated corpus.
See also [WordNet.jl](https://github.com/JuliaText/WordNet.jl)
3 changes: 2 additions & 1 deletion docs/src/Senseval3.md
@@ -1,4 +1,5 @@
## Senseval-3
# Senseval-3

Senseval-3 is a sense-annotated corpus.
It has a structure of documents, sentences, words.
The words are either tagged with part of speech, or tagged with full lemma, part of speech and sensekey.
21 changes: 11 additions & 10 deletions docs/src/StanfordSentimentTreebank.md
@@ -1,4 +1,5 @@
### StanfordSentimentTreebank
# StanfordSentimentTreebank

This contains the sentiment part of the well-known Stanford Sentiment Treebank V1.0 dataset from the [Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf) paper by Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng and Christopher Potts.
The dataset gives the phrases with their sentiment labels between 0 and 1. It can be used for binary or fine-grained sentiment classification problems.

@@ -7,11 +8,11 @@ documents/tweets, sentences, words, characters

To get desired levels, `flatten_levels` function from [MultiResolutionIterators.jl](https://github.com/oxinabox/MultiResolutionIterators.jl) can be used.

## Usage:
## Usage

The output dataset is a 2-dimensional `Array` whose first column holds `Vector`s of tokenized sentences and whose second column holds their respective sentiment scores.

```
```julia
julia> dataset = load(StanfordSentimentTreebank())
239232×2 Array{Any,2}:
Array{String,1}[["!"]]
@@ -50,9 +51,9 @@ julia> dataset = load(StanfordSentimentTreebank())

```

# To get phrases from `data`:
### To get phrases from `dataset`:

```
```julia
julia> phrases = dataset[1:5, 1] #Here `dataset` is a 2-D Array
5-element Array{Any,1}:
Array{String,1}[["!"]]
@@ -62,9 +63,9 @@ Array{String,1}[["!"], ["Brilliant"]]
Array{String,1}[["!"], ["Brilliant"]]
```

# To get sentiments values:
### To get sentiment values:

```
```julia
julia> values = dataset[1:5, 2] #Here `dataset` is a 2-D Array
5-element Array{Any,1}:
0.5
@@ -74,11 +75,11 @@ julia> values = dataset[1:5, 2]
0.86111
```

# Using `flatten_levels`
### Using `flatten_levels`

To get an `Array` of all sentences from all the `phrases` (since each phrase can contain more than one sentence):

```
```julia
julia> sentences = flatten_levels(phrases, (lvls)(StanfordSentimentTreebank, :documents))|>full_consolidate
9-element Array{Array{String,1},1}:
["!"]
@@ -94,7 +95,7 @@ julia> sentences = flatten_levels(phrases, (lvls)(StanfordSentimentTreebank, :do

To get an `Array` of all the words from `phrases`:

```
```julia
julia> words = flatten_levels(phrases, (!lvls)(StanfordSentimentTreebank, :words))|>full_consolidate
10-element Array{String,1}:
"!"
6 changes: 3 additions & 3 deletions docs/src/Twitter.md
@@ -1,4 +1,4 @@
## Twitter
# Twitter

Twitter sentiment dataset by Nick Sanders. Downloaded from [Sentiment140 site](http://help.sentiment140.com/for-students).
It is a large dataset for the sentiment analysis task. Every tweet falls into one of three categories: positive (4), negative (0), or neutral (2). It contains 1,600,000 training examples and 498 testing examples.
@@ -18,7 +18,7 @@ Example:

#Using "test_pos" keyword for getting positive polarity sentiment examples

```
```julia
julia> dataset_test_pos = load(Twitter("test_pos"))
Channel{Array{Array{String,1},1}}(sz_max:4,sz_curr:4)

@@ -67,7 +67,7 @@ julia> tweets = collect(take(dataset_test_pos, 2))

#Using "train_pos" category to get positive polarity sentiment examples

```
```julia
julia> dataset_train_pos = load(Twitter()) #no need to specify category because "train_pos" is default
Channel{Array{Array{String,1},1}}(sz_max:4,sz_curr:4)

5 changes: 2 additions & 3 deletions docs/src/WikiCorpus.md
@@ -1,5 +1,4 @@

### WikiCorpus
# WikiCorpus

A very commonly used general-purpose corpus.
The loader (and default datadep) is for [Samuel Reese's 2006 based corpus](http://www.lsi.upc.edu/~nlp/wikicorpus/).
@@ -17,7 +16,7 @@ so should use `flatten_levels` (from MultiResolutionIterators.jl) to get rid of

Example:

```
```julia
julia> using CorpusLoaders;
julia> using MultiResolutionIterators;
julia> using Base.Iterators;
30 changes: 26 additions & 4 deletions docs/src/index.md
@@ -1,4 +1,26 @@
```@autodocs
Modules = [CorpusLoaders]
Order = [:function, :type]
```
# CorpusLoaders.jl

A collection of loaders for the various corpora used in NLP.

## Installation

pkg> add https://github.com/JuliaText/CorpusLoaders.jl

## Common Structure

For a corpus which we will say has type `Corpus`,
there will be a constructor `Corpus(path)`,
where the `path` argument is a path to the files describing it.
That path will default to a predefined data dependency, if not provided.
The data dependency will be downloaded the first time you call `Corpus()`.
When the datadep resolves it will give full bibliographic details on the corpus etc.
For more on that, like configuration details, see [DataDeps.jl](https://github.com/oxinabox/DataDeps.jl).
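One such configuration detail worth a sketch (this relies on DataDeps.jl's `DATADEPS_ALWAYS_ACCEPT` environment variable; check the DataDeps.jl docs for your version):

```julia
# Sketch: DataDeps.jl reads this environment variable to auto-accept the
# interactive download prompt (useful on CI). Set it before the first
# `Corpus()` call resolves the data dependency.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
```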

Each corpus has a function `load(::Corpus)`.
This will return some iterator of data.
It is often lazy, e.g. using a `Channel`,
as many corpora are too large to fit in memory comfortably.
It will often be an iterator of iterators of iterators ...,
designed to be manipulated using [MultiResolutionIterators.jl](https://github.com/oxinabox/MultiResolutionIterators.jl).
The corpus type is an indexer for using named levels with MultiResolutionIterators.jl,
so `lvls(Corpus, :para)` works.
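The lazy-`Channel` shape described above can be sketched with a toy stand-in (illustrative only, not the actual CorpusLoaders implementation; real loaders read tokenized documents from disk):

```julia
# Toy sketch of the lazy loading pattern: a Channel yields one "document"
# (a Vector of tokenized sentences) at a time, so the whole corpus never
# has to sit in memory at once.
function toy_load()
    Channel{Vector{Vector{String}}}(4) do ch
        for doc in ([["hello", "world"], ["again"]],
                    [["foo", "bar", "baz"]])
            put!(ch, doc)   # push one document: sentences -> words
        end
    end
end

docs = collect(Iterators.take(toy_load(), 2))
# docs[1][1] is the first sentence of the first document
```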

2 comments on commit e8d5d61

@oxinabox
Member


@JuliaRegistrator register()

@JuliaRegistrator


Registration pull request created: JuliaRegistries/General/2638

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if Julia TagBot is installed, or can be done manually through the GitHub interface, or via:

git tag -a v0.3.0 -m "<description of version>" e8d5d61137dfa69662e8f19274c21a820ed9c593
git push origin v0.3.0

Also, note the warning: "This looks like a new registration that registers version 0.3.0. Ideally, you should register an initial release with 0.0.1, 0.1.0 or 1.0.0 version numbers."
This can be safely ignored. However, if you want to fix it, call register() again after making the fix; this will update the pull request.
