Vocabulary extraction #280

Open
wants to merge 5 commits into
base: master
All tests are passed locally.
atantos committed Jan 10, 2024
commit 0e53fc89b7e59a03e7c920ccf875bf97927796c8
2 changes: 1 addition & 1 deletion src/document.jl
@@ -331,7 +331,7 @@ OrderedDict{String, Int64} with 7 entries:
"sentence" => 5
⋮ => ⋮
"""
-function vocab(input::Union{StringDocument,Vector{String}})
+function ordered_vocab(input::Union{StringDocument,Vector{String}})
string_vector = to_string_vector(input) |> unique

# preallocating the ordered dictionary with the size of the string_vector
4 changes: 2 additions & 2 deletions test/document.jl
@@ -35,8 +35,8 @@ using DataStructures: OrderedDict
@test "a" in keys(ngrams(sd, 1))
@test "string" in keys(ngrams(sd, 1))

-@test vocab(sd) == OrderedDict("This" => 1, "is" => 2, "a" => 3, "string" => 4)
-@test vocab(["This", "is", "a", "string"]) == OrderedDict("This" => 1, "is" => 2, "a" => 3, "string" => 4)
+@test ordered_vocab(sd) == OrderedDict("This" => 1, "is" => 2, "a" => 3, "string" => 4)
+@test ordered_vocab(["This", "is", "a", "string"]) == OrderedDict("This" => 1, "is" => 2, "a" => 3, "string" => 4)

@test length(sd) == 16
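
For readers without the full file, the behaviour these tests pin down (each distinct token mapped to its first-appearance index, in order) can be sketched roughly as follows. This is not the PR's implementation: the function body is truncated in the hunk above, so `ordered_vocab_sketch` is a hypothetical name, it accepts only a `Vector{String}`, and it does not use the package's `to_string_vector` helper. The use of `sizehint!` is one assumed way to do the preallocation that the source comment mentions.

```julia
using DataStructures: OrderedDict

# Hypothetical sketch, not the code from this PR.
# Maps each distinct token to the position of its first appearance.
function ordered_vocab_sketch(tokens::Vector{String})
    uniq = unique(tokens)               # keeps first-occurrence order
    vocab = OrderedDict{String,Int}()
    sizehint!(vocab, length(uniq))      # reserve capacity up front (assumed approach)
    for (i, tok) in enumerate(uniq)
        vocab[tok] = i
    end
    return vocab
end

ordered_vocab_sketch(["This", "is", "a", "string"])
```

Because `unique` preserves first-occurrence order, repeated tokens do not shift the indices of the tokens that follow them.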