boostcampaitech6/level2-nlp-datacentric-nlp-09

Team Members and Roles

전현욱 곽수연 김가영 김신우 안윤주
  • 전현욱
    • Team leader, Label Error Detection, G2P Noise
  • 곽수연
    • Special character and Hanja handling, Back Translation
  • 김가영
    • Semantic Similarity Analysis
  • 김신우
    • Data Augmentation
  • 안윤주
    • Text Keyword Extraction

Project Period

2024.01.24 10:00 ~ 2024.02.01 19:00

Project Overview

  • Carrying out a task that requires reading and analyzing natural language depends on understanding the topic of the text. The KLUE-Topic Classification benchmark is a task that classifies the topic of a news article from its headline. Each sample is labeled with one of several topics: Life & Culture, Sports, World, Politics, Economy, IT & Science, and Society.
  • In keeping with the Data-Centric objective, this project must improve performance solely by modifying the given dataset, without changing the baseline model.

🥥 Project Structure

  • Train Data: 7,000 samples
  • Test Data: 47,785 samples

Dataset Structure

| Column | Description |
| --- | --- |
| ID | Unique identifier of the data sample |
| text | Headline of the Yonhap News article to be classified. Korean text that includes some English words, Hanja, and similar |
| target | Integer-encoded label |
| url | News URL of the data sample (source) |
| date | Date and time the news article was written |
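
As a rough illustration of the schema above, the sketch below checks that a file has the five documented columns and that `target` falls in the 0 to 6 range. It assumes the data is stored as CSV; `load_and_validate`, `EXPECTED_COLUMNS`, and the sample row are hypothetical and are not taken from the repository or the real dataset.

```python
import csv
import io

# Column names come from the dataset table; everything else is illustrative.
EXPECTED_COLUMNS = ("ID", "text", "target", "url", "date")
NUM_CLASSES = 7  # target ids 0..6


def load_and_validate(csv_file):
    """Read rows from an open CSV file and check them against the documented schema."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        target = int(row["target"])
        if not 0 <= target < NUM_CLASSES:
            raise ValueError(f"line {line_no}: target {target} out of range")
        if not row["text"].strip():
            raise ValueError(f"line {line_no}: empty headline")
        rows.append({**row, "target": target})
    return rows


# Hypothetical placeholder row, not real data.
sample = io.StringIO(
    "ID,text,target,url,date\n"
    "sample-0001,Placeholder headline,5,https://example.com/news/1,2024-01-01 00:00:00\n"
)
rows = load_and_validate(sample)
```

The `date` column is left as a string here because the exact timestamp format is not stated in this README.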

Label Classes

| id | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Topic | IT & Science | Economy | Society | Life & Culture | World | Sports | Politics |
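
The id-to-topic mapping in the table can be written down as a small lookup. This is a minimal sketch: the English topic names are translations of the Korean labels, and `ID_TO_TOPIC`, `TOPIC_TO_ID`, and `decode_target` are hypothetical names rather than identifiers from the repository.

```python
# Integer label ids and their topics, in the order given in the label table.
ID_TO_TOPIC = {
    0: "IT & Science",
    1: "Economy",
    2: "Society",
    3: "Life & Culture",
    4: "World",
    5: "Sports",
    6: "Politics",
}

# Reverse lookup, useful when relabeling samples by topic name.
TOPIC_TO_ID = {name: idx for idx, name in ID_TO_TOPIC.items()}


def decode_target(target: int) -> str:
    """Return the topic name for an integer label, rejecting unknown ids."""
    try:
        return ID_TO_TOPIC[target]
    except KeyError:
        raise ValueError(f"unknown target id: {target}") from None
```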

Evaluation Metrics

  • macro F1 score: the mean of the per-class F1 scores
  • accuracy
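
For reference, both metrics can be computed in a few lines of plain Python. This is a sketch, not the competition's scoring code: it averages over all `num_classes` classes and assigns an F1 of 0 to any class with no true or predicted samples, which may differ from how a given library treats absent classes.

```python
from collections import Counter
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 7) -> float:
    """Unweighted mean of per-class F1 scores over classes 0..num_classes-1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    f1_scores = []
    for cls in range(num_classes):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        f1_scores.append(2 * tp[cls] / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)
```

Because macro F1 weights every class equally, a rarely predicted class pulls the score down more than it pulls down accuracy.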

🤿 Model Used

  • klue/bert-base (fixed)

👒 Folder Structure

.
|-- README.md
|-- Special_character_check.ipynb
|-- back_translation.ipynb
|-- category_per_cnt.ipynb
|-- category_word_add.ipynb
|-- data
|   |-- culture.txt
|   |-- economy.txt
|   |-- it_science.txt
|   |-- politics.txt
|   |-- society.txt
|   |-- sport.txt
|   |-- train_special_characters.csv
|   `-- world.txt
|-- error_detection.ipynb
|-- functions.py
|-- g2pk.ipynb
|-- hanja.ipynb
|-- kmeans.ipynb
|-- sentence_similarty.py
|-- special_character.ipynb
`-- wrap-up_report.pdf

Leaderboard

| | F1 | Accuracy |
| --- | --- | --- |
| Public | 0.8454 | 0.8484 |
| Private | 0.8414 | 0.8443 |

About

level2-nlp-datacentric-nlp-09 created by GitHub Classroom
