Text preprocessing comparison #22

presto105 · 2021-10-03T06:13:06Z

As in the pull request I opened, I split the preprocessing into four steps and compared their performance.

0: remove_special_char, removes German, Arabic, and Latin characters
1: substitution_special_char, removes special characters
2: substitution_date, changes the period marker '-' => '~' (ex: 1223년 – => 1223년 ~ )
3: add_space_char, adds a space where words are joined by ',' with no spacing
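For anyone who wants a feel for what each step does, here is a minimal sketch of the four steps. Only the function names come from the list above. The regular expressions, the `STEPS` table, and the `preprocess` helper are my own assumptions for illustration, so the actual code in the pull request may well differ.

```python
import re

# Hypothetical implementations: only the function names come from the issue.

def remove_special_char(text: str) -> str:
    """Step 0: drop Arabic letters and accented Latin letters (e.g. German umlauts).

    Assumption: plain ASCII letters are kept; only U+0600-U+06FF (Arabic)
    and U+00C0-U+024F (accented / extended Latin) are removed.
    """
    return re.sub(r"[\u0600-\u06FF\u00C0-\u024F]", "", text)


def substitution_special_char(text: str) -> str:
    """Step 1: remove every character outside an assumed whitelist
    (Hangul syllables, ASCII letters and digits, whitespace, . , ~ ( ) - and en dash)."""
    return re.sub(r"[^\uAC00-\uD7A3a-zA-Z0-9\s.,~()\-–]", "", text)


def substitution_date(text: str) -> str:
    """Step 2: rewrite a hyphen or en dash that follows a year ("1223년 –") as "~"."""
    return re.sub(r"(\d+년)\s*[-–]", r"\1 ~", text)


def add_space_char(text: str) -> str:
    """Step 3: add a space after a comma that is directly followed by a
    non-space, non-digit character (so "1,000" is left alone)."""
    return re.sub(r",(?=[^\s\d])", ", ", text)


# Index -> function, mirroring the numbering used in this issue.
STEPS = {
    0: remove_special_char,
    1: substitution_special_char,
    2: substitution_date,
    3: add_space_char,
}


def preprocess(text: str, cmb) -> str:
    """Apply the selected steps in ascending index order."""
    for idx in sorted(cmb):
        text = STEPS[idx](text)
    return text
```

For example, `preprocess("1223년 – 1290년", [2])` rewrites the dash after the year as a tilde, and `preprocess("서울,부산", [3])` adds a space after the comma.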

To apply all four steps, I ran the experiment with the following command.
python train.py --PLM klue/roberta-large --preprocessing_cmb 0 1 2 3 --entity_flag --mecab_flag
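As a rough illustration of how a flag like `--preprocessing_cmb` can take several step indices at once, here is a small `argparse` sketch. The flag names are taken from the command above. The types, defaults, `choices`, and the `build_parser` helper are assumptions, and this is not necessarily how `train.py` parses its arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the flags seen in the command above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--PLM", type=str, default="klue/roberta-large")
    # nargs="+" collects one or more integers into a list, e.g. [0, 1, 3].
    parser.add_argument(
        "--preprocessing_cmb",
        type=int,
        nargs="+",
        choices=[0, 1, 2, 3],
        default=[],
    )
    # Assumed to be simple on/off switches.
    parser.add_argument("--entity_flag", action="store_true")
    parser.add_argument("--mecab_flag", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
demo_args = build_parser().parse_args(
    "--PLM klue/roberta-large --preprocessing_cmb 0 1 2 3 --entity_flag --mecab_flag".split()
)
```

With this setup `demo_args.preprocessing_cmb` is a list of integers, and leaving the flag out gives an empty list, i.e. no preprocessing.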
Performance comparison (wandb link)
https://wandb.ai/klue-level2-nlp-02/Relation-Extraction_1001/groups/klue%2Froberta-large_pp_test/workspace?workspace=user-presto105

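Single preprocessing step (0 vs 1 vs 2 vs 3)

Performance is almost the same across the board, but applying step 0 or step 3 seems to give a slight improvement.

Combinations of two steps (01 vs 02 vs 03 vs ....), where the chaos begins...

Again almost the same, but the 01 or 03 combination gives a slight boost.

Combinations of three steps (012 vs 013 vs 023 .... )

The 013 combination seems to do better than combining all of the steps. To run that experiment, use the command below.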
python train.py --PLM klue/roberta-large --preprocessing_cmb 0 1 3 --entity_flag --mecab_flag
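If you want to rerun the whole comparison, the commands for every combination can be generated rather than typed by hand. The sketch below assumes the same command layout as the ones in this issue. `BASE_CMD` and `build_commands` are hypothetical names of mine, not part of the repository.

```python
from itertools import combinations

STEP_IDS = [0, 1, 2, 3]
# Command prefix copied from the commands in this issue.
BASE_CMD = "python train.py --PLM klue/roberta-large"


def build_commands(step_ids=STEP_IDS):
    """Return one train.py command per non-empty combination of steps."""
    commands = []
    for size in range(1, len(step_ids) + 1):
        for cmb in combinations(step_ids, size):
            flags = " ".join(str(i) for i in cmb)
            commands.append(
                f"{BASE_CMD} --preprocessing_cmb {flags} --entity_flag --mecab_flag"
            )
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell script.
    for cmd in build_commands():
        print(cmd)
```

With four steps this gives 15 runs in total (4 single, 6 pairs, 4 triples, and 1 with everything).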

j961224 · 2021-10-03T06:58:30Z

Wow, nice work, that must have taken a lot of effort bb
