diff --git a/README.md b/README.md
index 8c24159..ae8d7bf 100644
--- a/README.md
+++ b/README.md
@@ -67,7 +67,6 @@ module.help()
 ```text
 Split texts into sentences.
-
 Args:
 text (Union[str, List[str], Tuple[str]]): single text or list/tuple of texts
 backend (str): morpheme analyzer backend. 'mecab', 'pecab', 'punct' are supported
@@ -120,12 +119,10 @@ Because there are so many modules, I apologize for not being able to explain eac
 1. augment
-
 This augments text with synonym replacement method and, optionally it postprocesses the text by correcting josa. For this, Kss uses the Korean wordnet from KAIST.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - replacement_ratio (`float`): ratio of words to be replaced
@@ -154,11 +151,9 @@ References:
 2. collocate
-
 This returns collocation (연어) of given words. The collocation is a set of words that frequently appear together.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single word or list of words
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -184,11 +179,9 @@ References:
 3. g2p
-
 This function provides a way to convert Korean graphemes to phonemes. The 'grapheme' means a letter or a character, and the 'phoneme' means a sound.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - descriptive (`bool`): return descriptive pronunciation, the 'descriptive' means a real-life pronunciation
@@ -220,10 +213,8 @@ References:
 4. hangulize
-
 This converts the given text to Hangul pronunciation.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - lang (`str`): source language code
@@ -249,10 +240,8 @@ References:
 5. split_hanja
-
 This splits the given text into hanja string and non-hanja string.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -277,10 +266,8 @@ This was copied from [hanja](https://github.com/suminb/hanja) and modified by Ks
 6. is_hanja
-
 This checks if the given character is a hanja character.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single character or list of characters
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -309,10 +296,8 @@ This was copied from [hanja](https://github.com/suminb/hanja) and modified by Ks
 7. hanja2hangul
-
 This converts hanja to hangul.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - combination (`bool`): whether to return hanja and hangul together or not
@@ -340,10 +325,8 @@ References:
 8. h2j
-
 This converts a string of Hangul to jamo.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -369,10 +352,8 @@ References:
 9. h2hcj
-
 This converts a string of Hangul to Hangul Compatibility Jamo.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -397,10 +378,8 @@ References:
 10. j2h
-
 This converts a string of jamo to Hangul.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - add_placeholder_for_leading_vowels (`bool`): add 'ㅇ' for leading vowels (e.g. 'ㅐ플' -> '애플')
@@ -427,10 +406,8 @@ References:
 11. j2hcj
-
 This converts a string of jamo to Hangul Compatibility Jamo.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -456,10 +433,8 @@ References:
 12. hcj2h
-
 This converts a string of Hangul Compatibility Jamo to Hangul.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -485,10 +460,8 @@ References:
 13. hcj2j
-
 This converts a string of Hangul Compatibility Jamo to jamo.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - position (`str`): the position of the HCJ character to convert to jamo character, one of 'lead', 'vowel', 'tail'
@@ -515,10 +488,8 @@ References:
 14. is_jamo
-
 This checks if a character is a jamo character.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -544,10 +515,8 @@ References:
 15. is_jamo_modern
-
 This checks if a character is a modern jamo character.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -573,10 +542,8 @@ References:
 16. is_hcj
-
 This checks if a character is a Hangul Compatibility Jamo character.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -602,10 +569,8 @@ References:
 17. is_hcj_modern
-
 This checks if a character is a modern Hangul Compatibility Jamo character.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -631,10 +596,8 @@ References:
 18. is_hangul_char
-
 This checks if a character is a Hangul character.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -659,10 +622,8 @@ References:
 19. select_josa
-
 This selects the correct josa for the given prefix.
-
 Args:
 - prefix (`Union[str, List[str]`): single prefix or list of prefixes
 - josa (`Union[str, List[str]`): single josa or list of josas
@@ -689,10 +650,8 @@ References:
 20. combine_josa
-
 This combines the given prefix and josa.
-
 Args:
 - prefix (`Union[str, List[str]`): single prefix or list of prefixes
 - josa (`Union[str, List[str]`): single josa or list of josas
@@ -719,11 +678,9 @@ References:
 21. extract_keywords
-
 This extracts keywords from the given text. This uses TextRank algorithm to extract keywords.
-
 Args:
 - text (`Union[str, List[str]`): single text or list of texts
 - num_keywords (`int`): the number of keywords to extract
@@ -763,10 +720,8 @@ References:
 22. split_morphemes
-
 This splits texts into morphemes.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
 - backend (`str`): morpheme analyzer backend. 'mecab', 'pecab' are supported.
@@ -790,10 +745,8 @@ Examples:
 23. paradigm
-
 This searches paradigms of the given text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -819,10 +772,8 @@ References:
 24. anonymize
-
 This anonymizes sensitive information in the given text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - phone_number_anonymization (`bool`): whether to anonymize phone numbers or not
@@ -866,10 +817,8 @@ Examples:
 25. clean_news
-
 This cleans news articles by removing useless headers and footers.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): Input text or list of texts.
 - min_sentences (`int`): Minimum number of sentences to keep. Defaults to 3.
@@ -895,10 +844,8 @@ Examples:
 26. is_completed_form
-
 This checks if the given text is in completed form.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -924,7 +871,6 @@ False
 27. get_all_completed_form_hangul_chars
-
 This returns all completed form Hangul characters.
 Returns:
@@ -943,7 +889,6 @@ Examples:
 28. get_all_incompleted_form_hangul_chars
-
 This returns all incompleted form Hangul characters.
 Returns:
@@ -962,10 +907,8 @@ Examples:
 29. filter_out
-
 This filters out bad text based on various conditions.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - min_length (`int`): minimum length of text
@@ -1027,10 +970,8 @@ Examples:
 30. half2full
-
 This converts half-width characters to full-width characters.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -1051,10 +992,8 @@ Examples:
 31. normalize
-
 This normalizes text with various options.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - normalization_type (`Optional[str]`): normalization type
@@ -1084,11 +1023,9 @@ Examples:
 32. preprocess
-
 This preprocesses text with various options. This does 1) normalization, 2) filtering out, and 3) anonymization in order.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - normalization_type (`Optional[str]`): normalization type
@@ -1171,10 +1108,8 @@ Returns:
 33. reduce_char_repeats
-
 This reduces character repeats in text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - num_repeats (`int`): the number of character that can be repeated
@@ -1199,10 +1134,8 @@ References:
 34. reduce_emoticon_repeats
-
 This reduces emoticon repeats in text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - num_repeats (`int`): the number of emoticon that can be repeated
@@ -1227,10 +1160,8 @@ References:
 35. remove_invisible_chars
-
 This removes invisible characters from text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - num_workers (`Union[int, str]`): the number of multiprocessing workers
@@ -1251,10 +1182,8 @@ Examples:
 36. qwerty
-
 This converts text from one language to another using QWERTY keyboard layout.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - src (`str`): source language
@@ -1282,10 +1211,8 @@ References:
 37. romanize
-
 This romanizes Korean text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - use_morpheme_info (`bool`): whether to use morpheme information or not
@@ -1315,10 +1242,8 @@ References:
 38. is_unsafe
-
 This checks if the text is unsafe or not.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list of texts
 - return_matches (`bool`): whether to return matches or not
@@ -1352,10 +1277,8 @@ True
 39. split_sentences
-
 This splits texts into sentences.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
 - backend (`str`): morpheme analyzer backend. 'mecab', 'pecab', 'punct' are supported
@@ -1380,10 +1303,8 @@ Examples:
 40. correct_spacing
-
 This corrects the spacing of the text.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
 - backend (`str`): morpheme analyzer backend. 'mecab', 'pecab', 'punct' are supported
@@ -1409,10 +1330,8 @@ References:
 41. summarize_sentences
-
 This summarizes the given text, using TextRank algorithm.
-
 Args:
 - text (`Union[str, List[str], Tuple[str]]`): single text or list/tuple of texts
 - backend (`str`): morpheme analyzer backend. 'mecab', 'pecab' are supported.