diff --git a/README.md b/README.md
index 8c24159..ae8d7bf 100644
--- a/README.md
+++ b/README.md
@@ -67,7 +67,6 @@ module.help()
```text
Split texts into sentences.
-
Args:
text (Union[str, List[str], Tuple[str]]): single text or list/tuple of texts
backend (str): morpheme analyzer backend. 'mecab', 'pecab', 'punct' are supported
@@ -120,12 +119,10 @@ Because there are so many modules, I apologize for not being able to explain eac
1. augment
-
This augments text with synonym replacement method and,
optionally it postprocesses the text by correcting josa.
For this, Kss uses the Korean wordnet from KAIST.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- replacement_ratio (`float`): ratio of words to be replaced
@@ -154,11 +151,9 @@ References:
2. collocate
-
This returns collocation (연어) of given words.
The collocation is a set of words that frequently appear together.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single word or list of words
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -184,11 +179,9 @@ References:
3. g2p
-
This function provides a way to convert Korean graphemes to phonemes.
The 'grapheme' means a letter or a character, and the 'phoneme' means a sound.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- descriptive (`bool`): return descriptive pronunciation, the 'descriptive' means a real-life pronunciation
@@ -220,10 +213,8 @@ References:
4. hangulize
-
This converts the given text to Hangul pronunciation.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- lang (`str`): source language code
@@ -249,10 +240,8 @@ References:
5. split_hanja
-
This splits the given text into hanja string and non-hanja string.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -277,10 +266,8 @@ This was copied from [hanja](https://github.com/suminb/hanja) and modified by Ks
6. is_hanja
-
This checks if the given character is a hanja character.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single character or list of characters
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -309,10 +296,8 @@ This was copied from [hanja](https://github.com/suminb/hanja) and modified by Ks
7. hanja2hangul
-
This converts hanja to hangul.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- combination (`bool`): whether to return hanja and hangul together or not
@@ -340,10 +325,8 @@ References:
8. h2j
-
This converts a string of Hangul to jamo.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -369,10 +352,8 @@ References:
9. h2hcj
-
This converts a string of Hangul to Hangul Compatibility Jamo.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -397,10 +378,8 @@ References:
10. j2h
-
This converts a string of jamo to Hangul.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- add_placeholder_for_leading_vowels (`bool`): add 'ㅇ' for leading vowels (e.g. 'ㅐ플' -> '애플')
@@ -427,10 +406,8 @@ References:
11. j2hcj
-
This converts a string of jamo to Hangul Compatibility Jamo.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -456,10 +433,8 @@ References:
12. hcj2h
-
This converts a string of Hangul Compatibility Jamo to Hangul.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -485,10 +460,8 @@ References:
13. hcj2j
-
This converts a string of Hangul Compatibility Jamo to jamo.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- position (`str`): the position of the HCJ character to convert to jamo character, one of 'lead', 'vowel', 'tail'
@@ -515,10 +488,8 @@ References:
14. is_jamo
-
This checks if a character is a jamo character.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -544,10 +515,8 @@ References:
15. is_jamo_modern
-
This checks if a character is a modern jamo character.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -573,10 +542,8 @@ References:
16. is_hcj
-
This checks if a character is a Hangul Compatibility Jamo character.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -602,10 +569,8 @@ References:
17. is_hcj_modern
-
This checks if a character is a modern Hangul Compatibility Jamo character.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -631,10 +596,8 @@ References:
18. is_hangul_char
-
This checks if a character is a Hangul character.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -659,10 +622,8 @@ References:
19. select_josa
-
This selects the correct josa for the given prefix.
-
Args:
- prefix (`Union[str, List[str]`): single prefix or list of prefixes
- josa (`Union[str, List[str]`): single josa or list of josas
@@ -689,10 +650,8 @@ References:
20. combine_josa
-
This combines the given prefix and josa.
-
Args:
- prefix (`Union[str, List[str]`): single prefix or list of prefixes
- josa (`Union[str, List[str]`): single josa or list of josas
@@ -719,11 +678,9 @@ References:
21. extract_keywords
-
This extracts keywords from the given text.
This uses TextRank algorithm to extract keywords.
-
Args:
- text (`Union[str, List[str]`): single text or list of texts
- num_keywords (`int`): the number of keywords to extract
@@ -763,10 +720,8 @@ References:
22. split_morphemes
-
This splits texts into morphemes.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
- backend (`str`): morpheme analyzer backend. 'mecab', 'pecab' are supported.
@@ -790,10 +745,8 @@ Examples:
23. paradigm
-
This searches paradigms of the given text.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -819,10 +772,8 @@ References:
24. anonymize
-
This anonymizes sensitive information in the given text.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- phone_number_anonymization (`bool`): whether to anonymize phone numbers or not
@@ -866,10 +817,8 @@ Examples:
25. clean_news
-
This cleans news articles by removing useless headers and footers.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): Input text or list of texts.
- min_sentences (`int`): Minimum number of sentences to keep. Defaults to 3.
@@ -895,10 +844,8 @@ Examples:
26. is_completed_form
-
This checks if the given text is in completed form.
-
Args:
- text (`Union[str, List[str], Tuple[str]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -924,7 +871,6 @@ False
27. get_all_completed_form_hangul_chars
-
This returns all completed form Hangul characters.
Returns:
@@ -943,7 +889,6 @@ Examples:
28. get_all_incompleted_form_hangul_chars
-
This returns all incompleted form Hangul characters.
Returns:
@@ -962,10 +907,8 @@ Examples:
29. filter_out
-
This filters out bad text based on various conditions.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- min_length (`int`): minimum length of text
@@ -1027,10 +970,8 @@ Examples:
30. half2full
-
This converts half-width characters to full-width characters.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -1051,10 +992,8 @@ Examples:
31. normalize
-
This normalizes text with various options.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- normalization_type (`Optional[str]`): normalization type
@@ -1084,11 +1023,9 @@ Examples:
32. preprocess
-
This preprocesses text with various options.
This does 1) normalization, 2) filtering out, and 3) anonymization in order.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- normalization_type (`Optional[str]`): normalization type
@@ -1171,10 +1108,8 @@ Returns:
33. reduce_char_repeats
-
This reduces character repeats in text.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- num_repeats (`int`): the number of character that can be repeated
@@ -1199,10 +1134,8 @@ References:
34. reduce_emoticon_repeats
-
This reduces emoticon repeats in text.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- num_repeats (`int`): the number of emoticon that can be repeated
@@ -1227,10 +1160,8 @@ References:
35. remove_invisible_chars
-
This removes invisible characters from text.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -1251,10 +1182,8 @@ Examples:
36. qwerty
-
This converts text from one language to another using QWERTY keyboard layout.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- src (`str`): source language
@@ -1282,10 +1211,8 @@ References:
37. romanize
-
This romanizes Korean text.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- use_morpheme_info (`bool`): whether to use morpheme information or not
@@ -1315,10 +1242,8 @@ References:
38. is_unsafe
-
This checks if the text is unsafe or not.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
- return_matches (`bool`): whether to return matches or not
@@ -1352,10 +1277,8 @@ True
39. split_sentences
-
This splits texts into sentences.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
- backend (`str`): morpheme analyzer backend. 'mecab', 'pecab', 'punct' are supported
@@ -1380,10 +1303,8 @@ Examples:
40. correct_spacing
-
This corrects the spacing of the text.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
- backend (`str`): morpheme analyzer backend. 'mecab', 'pecab', 'punct' are supported
@@ -1409,10 +1330,8 @@ References:
41. summarize_sentences
-
This summarizes the given text, using TextRank algorithm.
-
Args:
- text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
- backend (`str`): morpheme analyzer backend. 'mecab', 'pecab' are supported.