  • Seoul, Korea

doeun-235/README.md

KOR · ENG


👋 Hi there, I'm Doeun Oh.

Tech Stack

  • NumPy, Pandas, Keras, Scikit-learn, Matplotlib, MySql

Key Experience

Overview

  • 24.07.24 - 24.08.21, 24.09.27 - (present)
  • Libraries : huggingface, langchain, peft, faiss, trl, pymupdf, gmft
  • Trained a gemma2-based LLM, using RAG and LoRA, to answer questions grounded in the given fiscal-information PDF documents.
  • Competition results
    • metric : character-level F1 score over sentences
    • Public 0.666, Private 0.673, final rank 38/359 (top 10.58%)
  • Since the competition closed, designing and running experiments to improve performance (in progress)
    • Current scores | Public 0.715 : currently 25/359, Private 0.693 : on par with 23rd place at the close of the competition
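
The competition metric above is described only as a character-level F1 score. As a rough illustration, the following is a minimal sketch of one common way to compute such a score, treating each string as a bag of characters. The function name `char_f1` and the choice to ignore spaces are assumptions made for this example; the competition's exact definition may differ.

```python
from collections import Counter


def char_f1(prediction: str, reference: str) -> float:
    """Bag-of-characters F1 between two strings, ignoring spaces."""
    pred_chars = Counter(prediction.replace(" ", ""))
    ref_chars = Counter(reference.replace(" ", ""))
    # Multiset intersection: characters shared by both strings.
    overlap = sum((pred_chars & ref_chars).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_chars.values())
    recall = overlap / sum(ref_chars.values())
    return 2 * precision * recall / (precision + recall)


print(char_f1("2024 budget is 10", "the 2024 budget is 10"))
```

Under this definition a prediction that contains every reference character plus extra text is penalized through precision, which is one reason cleaner table extraction could plausibly move the score.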

Contributions

  • Contributed table preprocessing with pymupdf and gmft, code refactoring, and more
    • Table preprocessing raised the Public score from 0.657 to 0.666, and later to 0.690

Overview

  • 24.07.10 - 24.07.22, 24.10.19 - 24.10.23

  • Libraries : NumPy, Pandas, Matplotlib, Beautifulsoup, re, Scikit-learn, xgboost, Mecab

  • Crawled Aladin's weekly bestseller lists from the 1st week of January 2000 through the 2nd week of July 2024 and built a DB of 1.415 million rows

    • Covers about 158,000 titles, with each title's rank in the given week and related book information
  • Based on the weekly bestseller DB, built a 780,000-row DB of used books listed at Aladin used-book stores

    • Used-book listing data for about 103,000 all-time bestseller titles
  • Developed a used-price prediction model with XGBoost Regressor

    • Used cross validation and grid search to narrow 486 combinations down to the 14 best hyperparameter sets
      • Wrote a function that runs a GridSearchCV-style search with the Python API and cupy, improving computation speed
    • Models trained with the best hyperparameters were evaluated in two ways
      • test 1 : evaluated on the data initially split off as the test set
      • test 2 : evaluated only on test-set books whose titles never appear in the train set
  • Best model

    • Independent variables : used-book condition, store branch, book title, supplementary phrases in the title (hardcover, limited edition, etc.), author, other authors, publisher, publication date, list price, top-level category
    • hyperparameter
      • num_boost_round : 2500
      • learning_rate : 0.3
      • max_depth : 6
      • min_child_weight : 4
      • colsample_bytree : 1
      • subsample : 1

    h5_rslt

    Figure. Distribution of the best model's predictions and errors, and its performance

    h5_fi

    Figure. Feature importance of the best model

|  | RMSE | R2 score | N |
| --- | --- | --- | --- |
| test 1 | 610.7 | 0.973 | 784,213 |
| test 2 | 1,440 | 0.914 | 5,968 |
| harmonic mean | 857.8 | 0.943 |  |

Table. Dataset size for each test and the best scores achieved with XGBoost Regressor
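
The two evaluation sets and the harmonic-mean row of the table can be illustrated with a small sketch. The records below are hypothetical toy data, and `seen_titles`, `test_1`, and `test_2` are names chosen for this example only. The sketch assumes that "unseen" means the title string never occurs in the train set.

```python
from statistics import harmonic_mean

# Hypothetical toy records: (title, used_price).
train = [("Book A", 5000), ("Book B", 7000), ("Book A", 4500)]
test = [("Book A", 4800), ("Book C", 9000), ("Book D", 3000)]

seen_titles = {title for title, _ in train}

test_1 = test  # the full held-out split
test_2 = [row for row in test if row[0] not in seen_titles]  # unseen titles only

print(len(test_1), len(test_2))

# Harmonic mean of the two R2 scores reported in the table.
print(round(harmonic_mean([0.973, 0.914]), 3))
```

The harmonic mean is pulled toward the weaker of the two scores, so a model cannot rank highly by doing well on test 1 alone.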

Contributions

  • As team leader, planned and ran the project
  • Contributed to crawler development, DB and model prototyping, and experiment design, execution, and evaluation

Lessons learned

  • Saw firsthand how much appropriate modularization affects development efficiency and code readability.
  • Developing quickly on a small sample to assess whether the current approach is feasible or appropriate is strategically effective.
    • It helps set the project's direction and can serve as a good baseline.
    • Heuristic judgments based on domain knowledge can be a real help in building a prototype quickly.
    • However, establishing systematic criteria for those heuristic decisions can take more effort than expected.
  • Running an additional test on kinds of data the model has never encountered allows a more rigorous assessment of how well the model has learned.
    • Ran an additional evaluation restricted to titles never included in the train set.
    • Confirmed that scores on that test stayed good, without a large gap.
    • This confirmed that the model was not memorizing per-title prices but was reflecting the results of the natural language processing.
  • Confirmed that, even without using opaque columns in the dataset, raising model complexity in a productive direction can yield a better-performing and more robust model.
    • Using the sales index developed by Aladin (SalesPoint) for used-book prediction had the advantage that even a simple model performed well, but it also had drawbacks.
    • In several cases, including the hyperparameters used for the best model, training with the same hyperparameters but without SalesPoint gave better and more robust performance.
    • Concluded that SalesPoint is not only opaque, since its calculation method is not public, but can also get in the way when refining used-price prediction further.
  • Reverse engineering with a simple model makes it possible to gauge how complex or simple a system is.
    • With only simple, basic preprocessing, XGBoost alone performed well enough even on kinds of data the model was seeing for the first time.
    • This suggests that Aladin prices used books based on author, publisher, condition, and similar factors, and that its pricing system is probably not very complex.
  • In terms of computation, grid search is a very inefficient way to explore hyperparameters.
    • Grid search makes it possible to observe directly how results change with each hyperparameter, which helps in analyzing results and choosing next steps, but it is excessively inefficient computationally.
    • Ordering the hyperparameter search to suit the model, or using Bayesian search or similar, would likely have used compute resources and time more efficiently.
  • When applying XGBoost to data on the order of hundreds of thousands of rows, the Python API can be faster than the scikit-learn API, and using the GPU via cupy can speed up computation dramatically.

Overview

  • 24.06.14 - 24.06.24
  • Libraries : NumPy, Pandas, Matplotlib, Scikit-learn, PyTorch, Jit
  • The US big-city health dataset (BCHI Dataset) aggregates a variety of statistical indicators over 2010-2022 for 35 large cities, broken down by population groups stratified into 16 race · sex categories.
    • There are 118 indicators in total, such as All Cancer Death, Lung Cancer Death, Diabetes Death, and Drug Overdose Death.
      • e.g. "In Minneapolis in 2015, All Cancer Death for women of all races was 157 per 100,000"
    • Each city is classified by five characteristics: 'region' / 'economic poverty' / 'population' / 'population density' / 'degree of residential racial segregation'.
      • The 35 cities fall into 19 city types in total.
      • e.g. "City type of Minneapolis : Midwest, less poor, smaller population, low population density, low residential racial segregation"
  • Ran a project that uses the BCHI Dataset's indicators together with the race, sex, and city-type stratification information to predict, by regression, the value of a specific indicator for a given group.
    • Ran regression prediction for 14 indicators in total, including All Cancer Deaths and Lung Cancer Deaths.
    • e.g. For a population group stratified by city characteristics, race, and sex, predict the All Cancer Deaths value from the stratification information and indicator values such as Adult Physical Inactivity, Diabetes, Teen Obesity, Adult Obesity, Population : Seniors, and Income : Poverty in All Ages
    • Prediction methods used: XGBoost Regressor, Random Forest Regressor, Multilayer Perceptron, and k-NN Regressor.
      • k-NN predicts using a custom metric adapted from the $L_p$ norm over the stratification fields, and uses no other reference indicators.
      • For the other models, compared performance when training with missing values dropped against training after imputing missing values with k-NN predictions.
      • Evaluation metrics included RMSE, MAPE, and R2 score.
      • Results vary by indicator, but three models performed best: k-NN, XGBoost with k-NN imputation of missing values, and XGBoost without k-NN imputation.
| Prediction target | Reference indicators |
| --- | --- |
| All Cancer Deaths | Adult Physical Inactivity, Diabetes, Teen Obesity, Adult Obesity, Population : Seniors, Income : Poverty in All Ages, etc. |
| Colorectal Cancer Deaths | Teen Obesity, Adult Obesity, Health Insurance : Uninsured in All Ages, Births : Low Birthweight, Dietary Quality : Teen Soda, etc. |

Table. Examples of the candidate reference indicators set for each prediction target

Result comparison

Figure. Performance comparison between k-NN, XGBoost with k-NN preprocessing, and XGBoost without it

Contributions

  • As team leader, set the project direction.
  • Contributed EDA to decide the project direction, proposed and implemented the custom metric used in k-NN, proposed k-NN-based imputation of missing values, and refactored code.

Lessons learned

  • Learned that regression predictions should be evaluated with mean-error scores and the R2 score together.
    • The data showed directly that a better R2 score means differences in x are better reflected in the predicted y, while a better mean-error score (RMSE, MAPE, etc.) means smaller errors against the actual values.
    • In general a better mean-error score came with a better R2 score, but not always.
  • Depending on the characteristics of the dataset, imputing missing values with k-NN can be effective.
    • However, it is not always overwhelmingly better than other imputation methods or than dropping the data.
      • Mean-error scores usually improved, but there were cases where the R2 score got worse.
    • Designing a custom metric based on domain knowledge can be effective.
    • Unless it is optimized for numpy and cython, a custom metric used with scikit-learn's k-NN is very slow.
      • Using about 4,000 - 5,000 samples to predict about 3,000 - 2,000 samples took on the order of minutes.
  • Refactoring the code so that it can run as C-level code and then applying Jit speeds it up dramatically.
    • Applying Jit to the custom metric cut the time from minutes to seconds.
    • The input must not pass through strings as intermediate values while it is processed in the function.
    • The dict type must not be used; arrays must be used instead.
  • Using machine learning models such as XGBoost to set a baseline can be very useful, for example in terms of development speed.
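
The Jit constraints listed above (no strings as intermediate values, arrays instead of dicts) can be illustrated with a minimal sketch. It assumes that "Jit" refers to Numba's `njit` and falls back to plain Python when Numba is not installed. The metric `strata_distance`, its `weights`, and the integer coding of the stratification fields are hypothetical choices for this example, not the project's actual metric.

```python
import numpy as np

try:
    from numba import njit  # optional; assumed to be the "Jit" in the text
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def strata_distance(a, b, weights, p):
    """Weighted L_p-style distance over integer-coded stratification fields."""
    total = 0.0
    for k in range(a.shape[0]):
        # A mismatch in field k contributes weights[k] ** p.
        if a[k] != b[k]:
            total += weights[k] ** p
    return total ** (1.0 / p)


@njit
def knn_predict(train_x, train_y, query, weights, p, k):
    """Average the targets of the k nearest training rows."""
    n = train_x.shape[0]
    dists = np.empty(n)
    for i in range(n):
        dists[i] = strata_distance(train_x[i], query, weights, p)
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean()


# Columns: hypothetical integer codes for (race, sex, city type).
train_x = np.array([[0, 0, 1], [0, 1, 1], [2, 1, 3], [2, 0, 3]])
train_y = np.array([150.0, 160.0, 210.0, 200.0])
weights = np.array([1.0, 0.5, 2.0])

print(knn_predict(train_x, train_y, np.array([0, 0, 1]), weights, 2.0, 2))
```

Categories are encoded as integers up front so that the compiled functions only ever touch numeric arrays, which is the restriction the notes above describe.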
  • The Cucker-Smale model is a nonlinear ODE system in which moving agents exchange relative-velocity information and can thereby synchronize and converge to a common velocity.
  • Ran simulations that compute numerical solutions of the Cucker-Smale model and its extensions.
    • Implemented in NumPy an algorithm for numerical solutions of ODEs (Runge-Kutta 4th order) and one for numerical solutions of SDEs (Improved Euler-Maruyama Method).
    • Used Matplotlib to visualize the agreement between theory and simulation, and produced demo videos to confirm that the motion synchronizes as designed.
  • Master's thesis : "Flocking Behavior in Stochastic Cucker-Smale Model with Formation Control on Symmetric Digraphs" (listed under my name before a legal name change)
    • Proposed a model that serves as an example of an interaction able to synchronize moving agents into a flock of an intended shape.
    • A system in which agents exchange forces, expressed as functions of relative position and relative velocity, in a form mixed with noise.
    • The model extends Cucker-Smale to a stochastic differential equation; by introducing an energy-related quantity, showed existence and convergence of solutions under certain conditions.
  • Follow-up paper : "Controlled pattern formation of stochastic Cucker-Smale systems with network structures"
    • Carried out theoretical and numerical estimates of the convergence rate in the model above.
    • Published in "Communications in Nonlinear Science and Numerical Simulation", an SCIE-indexed and SCOPUS-listed journal.
    • Contributions : proposed the model, proved existence and convergence of solutions, implemented and ran the numerical simulations, and checked their agreement with the theory
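
As a rough illustration of the deterministic part of the simulations described above, the following is a minimal NumPy sketch of the classical Cucker-Smale system integrated with a 4th-order Runge-Kutta step. It uses the standard communication weight $\psi(r) = (1 + r^2)^{-\beta}$; the parameter values, the function names, and the random initial data are assumptions for this example and are not taken from the repository.

```python
import numpy as np


def cucker_smale_rhs(state, coupling=1.0, beta=0.25):
    """Right-hand side of the Cucker-Smale system for state = (x, v)."""
    x, v = state
    n = x.shape[0]
    diff_x = x[None, :, :] - x[:, None, :]  # entry [i, j] is x_j - x_i
    diff_v = v[None, :, :] - v[:, None, :]  # entry [i, j] is v_j - v_i
    dist_sq = np.sum(diff_x ** 2, axis=-1)
    psi = 1.0 / (1.0 + dist_sq) ** beta  # communication weight
    dv = coupling / n * np.sum(psi[:, :, None] * diff_v, axis=1)
    return np.stack([v, dv])


def rk4_step(state, dt, rhs):
    """One classical 4th-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


rng = np.random.default_rng(0)
# 20 agents in the plane; state[0] holds positions, state[1] velocities.
state = np.stack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2))])
initial_mean_v = state[1].mean(axis=0)
initial_spread = state[1].std(axis=0).sum()
for _ in range(2000):
    state = rk4_step(state, 0.01, cucker_smale_rhs)
final_spread = state[1].std(axis=0).sum()
print(initial_spread, final_spread)
```

Because the weight is symmetric, the mean velocity is conserved while the velocity spread shrinks, which is the flocking behavior the model is meant to capture. The stochastic extensions would replace the RK4 step with an SDE scheme, which this sketch does not cover.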

Theory visualization

Figure. Example showing that the inequality between the expected values of the variables holds as the theory predicts

Simulation

Figure. Example showing that the motion synchronizes as designed, in line with the theory

Work Experience

DeepMetrics Inc. (주식회사 딥메트릭스)

  • Researcher / 22.06 - 23.05
  • Took part in the autonomous mechanical-ventilator AI projects at Seoul National University Hospital and at Seoul National University Bundang Hospital
  • Helped build, maintain, and improve the data preprocessing process

Teaching Experience

  • Teaching assistant, Engineering Mathematics (Yonsei University)
    • 2018-2020 (4 semesters)
    • Explained mathematical theory and worked through problems
      • Calculus, linear algebra, ordinary and partial differential equations, complex analysis, etc.

Education

  • M.S. in Mathematics, 2021 (Yonsei University, Seoul)
  • B.S. in Mathematics & Philosophy, 2018 (Yonsei University, Seoul)

Pinned

  1. Cucker-Smale-Model Public

    Work on the Cucker-Smale model and its extensions. Keywords: ODE, Runge-Kutta methods, SDE, Euler-Maruyama method, NumPy, Matplotlib

    Python 6 2

  2. WASSUP-AIModel-3rd-Project1/Project-1 Public

    Regression model for Big City Health Inventory data: statistics about health issues stratified by race, sex, and properties of a city.

    Jupyter Notebook

  3. kdt-3-second-Project/aladin_usedbook Public

    Jupyter Notebook 1

  4. theNocturni/WASSUP-DACON-FinAI Public

    Jupyter Notebook 1

  5. aladin_book_price Public

    Forked from kdt-3-second-Project/aladin_usedbook

    Jupyter Notebook