Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weโ€™ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2023-08-15] Intro to Database Systems 10๊ฐ• - Sorting & Aggregation Algorithms #10

Open
jonyejin opened this issue Aug 7, 2023 · 4 comments
Assignees
Labels
Database ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๐Ÿ“ ์Šคํ„ฐ๋”” ์ผ์ง€ ์Šคํ„ฐ๋”” ์ผ์ง€

Comments

@jonyejin
Copy link

jonyejin commented Aug 7, 2023

๐Ÿ“Œ ๊ฐœ์š”

โœ๏ธ ์ค€๋น„

์•„๋ž˜ ์–‘์‹์„ ๋ณต์‚ฌํ•ด ์Šคํ„ฐ๋”” ์ค€๋น„๋ฅผ ์™„๋ฃŒํ•ด์ฃผ์„ธ์š”!

## ๐Ÿ’ฌ ์ƒ๊ฐ

<!--๊ฐ•์˜๋ฅผ ๋“ฃ๊ณ  ์ƒ๊ฐํ•œ ๋‚ด์šฉ์„ 3์ค„ ๋‚ด๋กœ ๊ฐ„๋‹จํžˆ ์ •๋ฆฌํ•ด์ฃผ์„ธ์š”!-->

- ...

## โ“ ์งˆ๋ฌธ

<!--๊ณต๋ถ€ํ•˜๋ฉด์„œ ๊ถ๊ธˆํ•œ ์ ์„ ์ž‘์„ฑํ•ด์ฃผ์„ธ์š”!-->

- ...

## ๐Ÿ“š ์ฐธ๊ณ  ์ž๋ฃŒ

<!--๊ณต๋ถ€ํ•˜๋ฉด์„œ ๊ฐ™์ด ๊ณต๋ถ€ํ•œ ์ž๋ฃŒ๋ฅผ ์ฒจ๋ถ€ ๋ฐ ๊ณต์œ ํ•ด์ฃผ์„ธ์š”!-->

- [...](...)
@CoodingPenguin CoodingPenguin changed the title [2023-08-15] Intro to Database Systems 9๊ฐ• -Sorting & Aggregations [2023-08-15] Intro to Database Systems 10๊ฐ• - Sorting & Aggregations Aug 7, 2023
@CoodingPenguin CoodingPenguin added ๐Ÿ“ ์Šคํ„ฐ๋”” ์ผ์ง€ ์Šคํ„ฐ๋”” ์ผ์ง€ Database ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค labels Aug 7, 2023
@CoodingPenguin CoodingPenguin changed the title [2023-08-15] Intro to Database Systems 10๊ฐ• - Sorting & Aggregations [2023-08-15] Intro to Database Systems 10๊ฐ• - Sorting & Aggregation Algorithms Aug 8, 2023
@CoodingPenguin
Copy link
Member

CoodingPenguin commented Aug 8, 2023

๐Ÿ’ฌ ์ƒ๊ฐ

  • ์—ฌ๊ธฐ๊นŒ์ง€ ๋“ฃ๊ณ ๋‚˜์„œ ๋“œ๋Š” ์ƒ๊ฐ์€ ํ•ด๊ฒฐํ•ด์•ผ ํ•˜๋Š” ์–ด๋–ค ๋ฌธ์ œ๊ฐ€ ์žˆ๊ณ , ์ด๋ฅผ ์ ์ ˆํ•œ ์ž๋ฃŒ๊ตฌ์กฐ์™€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์„ ํƒํ•ด์„œ ํ•ด๊ฒฐํ•˜๋Š” ๋Š๋‚Œ์ด๋‹ค.
    • ๊ทผ๋ณธ ์ „๊ณต์€ ์ปดํ“จํ„ฐ๊ตฌ์กฐ, ์•Œ๊ณ ๋ฆฌ์ฆ˜, ์ž๋ฃŒ๊ตฌ์กฐ์ธ ๋“ฏ..
  • ์ •๋ ฌํ•  ๋•Œ ๊ฐ’(Value)๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ๋Š” Early Materialization(ํŠœํ”Œ ์ž์ฒด๋ฅผ ์ €์žฅ)๊ณผ Late Materialization(Record ID๋งŒ ์ €์žฅ)์ด ์žˆ๋‹ค.
  • ์ง‘๊ณ„ ๋ฐฉ์‹์œผ๋กœ๋Š” Sorting๊ณผ Hashing์ด ์žˆ๋‹ค.

โ“ ์งˆ๋ฌธ

  • 2-Way External Merge Sort์—์„œ Number of Passes์—์„œ $1 + [\log_2 N]$์—์„œ $1$์ด ์˜๋ฏธํ•˜๋Š” ๋ฐ”๋ฅผ ์ดํ•ด ๋ชปํ–ˆ๋‹ค.
    • N-Way External Merge Sort ๋ถ€๋ถ„์˜ Number of Passes๋„ ์„ค๋ช… ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค..
  • External Hashing Aggregate ๋ฐฉ์‹์˜ Phase 2๋Š” Aggregation ๊ฒฐ๊ณผ๋ฅผ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•œ ๋‹จ๊ณ„์ธ์ง€?
    • ์›๋ž˜ ์˜ˆ์‹œ์ธ distinct ๊ฐ’๋งŒ ๊ตฌํ•˜๋Š” ๊ฒƒ Phase 1์ธ Partition๋งŒ ์žˆ์–ด๋„ ๋  ๊ฒƒ ๊ฐ™์•„์„œ ๋ฌผ์–ด๋ด…๋‹ˆ๋‹ค.
    • Phase 1์€ ๊ณ ์œ ํ•œ ๊ฐ’์˜ ๊ฐœ์ˆ˜๋งŒํผ ํŒŒํ‹ฐ์…˜์ด ์ƒ๊ธฐ๋Š” ๊ฒƒ์ด๋ผ ๋ณด๋ฉด ๋ ์ง€?

๐Ÿ“š ์ฐธ๊ณ  ์ž๋ฃŒ

@jonyejin
Copy link
Author

jonyejin commented Aug 15, 2023

๐Ÿ“ ๋‚ด์šฉ

  • ์˜ค๋Š˜์€ ์ข€ ์žฌ๋ฏธ์žˆ์–ด๋ณด์ด๋Š” Query Executor์„ ๋‹ค๋ฃฌ๋‹ค!
  • ์™œ ์ •๋ ฌํ•ด์•ผํ•˜๋Š”๊ฐ€?
  • ์ธ๋ฉ”๋ชจ๋ฆฌ ์ •๋ ฌ
  • top-N heap sort(ORDER BY + LIMIT)
  • External Merge Sort
    • ์ •๋ ฌ -> ํ•ฉ์น˜๊ธฐ
    • 2-way external merge sort
    • double buffering optimization
    • comparison optimization
    • B+Tree์จ์„œ ์ •๋ ฌํ•˜๊ธฐ
  • Aggregations

@Nagunt
Copy link
Contributor

Nagunt commented Aug 15, 2023

๐Ÿ’ฌ ์ƒ๊ฐ

  • Query๋ฅผ ์–ด๋–ป๊ฒŒ ์‹คํ–‰ํ•˜๋Š”๊ฐ€? ๋ฅผ ๋ฐฐ์› ๋‹ค.
  • Query๋Š” ์ •๋ ฌ๋˜์–ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋‹ค. (ORDER BY)
    ๊ทธ๋Ÿฐ ๊ฒฝ์šฐ๊ฐ€ ์•„๋‹ˆ๋”๋ผ๋„, ์†๋„ ํ–ฅ์ƒ ์ด๋‚˜ ์—ฌ๋Ÿฌ ๊ธฐ๋Šฅ๋“ค์— ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ •๋ ฌ๋œ ์ƒํƒœ๋กœ ๊ฒฐ๊ณผ๊ฐ’์„ ๊ฐ€์ ธ์˜ค๋Š” ๊ฒƒ์ด ํ•„์š”ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • Top-N Heap Sort
  • External Merge Sort
  • Aggregations

๐Ÿ“š ์ฐธ๊ณ  ์ž๋ฃŒ

@IamGroooooot
Copy link
Member

๐Ÿ’ฌ ์ƒ๊ฐ

์—ฌ๋Ÿฌ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์— ๊ด€ํ•ด ๋ฐฐ์› ๋‹ค.

  • Memory์— Fit์ด ๋˜๋ƒ ์•ˆ ๋˜๋ƒ์—์„œ ๋”ฐ๋ผ์„œ ๋‹ค๋ฅธ ์ „๋žต์„ ์„ ํƒํ•ด์•ผ ํ•œ๋‹ค.
    • ์‹ค์ œ๋กœ ๋ฉ”๋ชจ๋ฆฌ์— ์ฝ”๋“œ๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ์— ์˜ฌ๋ผ๊ฐ€๋ฉฐ ์‹คํ–‰์ด๋˜๋Š”์ง€ ์ข€ ๋” ๋ฉด๋ฐ€ํ•˜๊ฒŒ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋…ธ๋ ฅํ•ด์•ผ ๊ฒ ๋‹ค๋Š” ์ƒ๊ฐ์ด ๋“ฆ.
  • Sorted Run์„ ํ•  ๋•Œ ์‹ค์ œ value๋กœ ๊ฐ€์ ธ๊ฐ€๋Š” ๊ฒƒ์ด early ํ˜น์€ lazyํ•˜๊ฒŒ ํ•  ์ˆ˜ ์žˆ์„ ๋ฐฐ์› ๋Š”๋ฐ key ์—ฌ๋Ÿฌ ์ž‘์—…์„ ๋‹คํ•œ ๋‹ค์Œ์— ๋‚˜์ค‘์— ํ•„์š”ํ•  ๋•Œ Tuple์„ fetchํ•˜๋Š” Late materialization์ด Columnar์—์„œ ์œ ๋ฆฌํ•˜๋‹ค๋Š” ๊ฒƒ์ด ๋‚ฉ๋“์ด ๋˜๋ฉฐ ํฅ๋ฏธ๋กœ์› ์Œ. ์ด๋Ÿฐ์‹์œผ๋กœ ๋ถˆํ•„์š”ํ•œ data fetching ๋ง‰์„ ์ˆ˜ ์žˆ๊ตฌ๋‚˜ ์‹ถ์—ˆ์Œ.
  • 2-way merge์—์„œ ์›Œ์ปค๊ฐ€ ๋ธ”๋ก IO๋ฅผ ๋ธ”๋กํ•˜์ง€ ์•Š๊ฒŒ ๋”๋ธ” ๋ฒ„ํผ๋ง์„ ํ†ตํ•ด prefetchํ•˜๊ณ  ๋น„๋™๊ธฐ์ ์œผ๋กœ writeํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ฐ„๋‹จํ•˜๊ฒŒ ์ตœ์ ํ™”ํ•˜๋Š” ๊ธฐ๋ฒ•์„ ๋ฐฐ์› ๋‹ค.
  • Comparison Optimizaton์—์„œ๋Š” suffix truncation์—์„œ ๊ฐ™์€ ๊ฒƒ๋งŒ full string comparison์„ ํ•˜๋Š” ๋“ฑ ์—ฌ๋Ÿฌ ํ‘๋งˆ์ˆ (์ตœ์ ํ™” ๊ธฐ๋ฒ•)์„ ๊ณต๋ถ€ํ•˜๋Š” ๊ธฐ๋ถ„์ด์—ˆ๋‹ค.
  • Aggregation์˜ ๊ตฌํ˜„ ์„ ํƒ์ง€๋ฅผ ๋‘๊ฐœ๋กœ ๋‚˜๋ˆ ์„œ ์„ค๋ช…ํ•˜๋Š” ๊ฒƒ์ด ํฅ๋ฏธ๋กœ์› ๋‹ค. ์—ฌ๊ธฐ์„œ๋„ ๋žœ๋ค IO์„ ์ตœ๋Œ€ํ•œ ํšŒํ”ผํ•˜๋ ค๋Š” ๊ด€์ ์—์„œ ์„ค๋ช…ํ•˜๋Š”๋ฐ ๋ชจ๋“  ์ƒํ™ฉ์—์„œ ๋งž๋‹ค๊ณ  ์ผ๋ฐ˜ํ™”ํ•˜์ง€ ์•Š๋„๋ก ์กฐ์‹ฌํ•ด์•ผ ํ•  ๋“ฏ.
    1. hashing์€ sorting์—์„œ ์œ ๋ฆฌ (ํŠนํžˆ, disk๊ฐ€ ๋Š๋ฆด ๋•Œ)
      = cheaper, ephemeral, ordering ์•ˆ ํ•  ๋•Œ ์œ ๋ฆฌ
    2. sorting

โ“ ์งˆ๋ฌธ

  • suffix truncation์„ ์“ฐ๋Š” ๋ถ€๋ถ„์„ ์•Œ๋ฉด ๋Œ€์ถฉ ๋””๋น„ ์•ˆ์— ์–ด๋–ค ๋ฐ์ดํ„ฐ๋“ค์ด ์žˆ๋Š”์ง€ ์œ ์ถ”ํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ์ž„.
    • ๊ทธ๋Ÿฌ๋ฉด ๋ฏผ๊ฐ ์ •๋ณด ๊ฐ™์€ ๊ฒƒ์€ ์˜คํžˆ๋ ค DB์—์„œ ์ตœ์ ํ™”ํ•ด์„œ searchํ•˜๋Š” ๊ฒƒ์„ ๋ฐฐ์ œํ•˜๊ณ  ์˜๋„์ ์œผ๋กœ ๊ณ ์˜์ ์œผ๋กœ ํ•ด์•ผํ•˜๋‚˜ ์‹ถ์Œ.
  • ์ตœ์ ํ™”์˜ ๋ฐฉ์‹์„ ๋ถ„์„ํ•˜๋‹ค ๋ณด๋ฉด ์–ผ๋งˆ๋‚˜ ๋นจ๋ผ์ง€๋Š๋ƒ์— ๋”ฐ๋ผ DB ๋ฐ์ดํ„ฐ๋ฅผ ๋Œ€์ถฉ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์ง€ ์•Š์„๊นŒ? ์ด๋Ÿฐ ์ทจ์•ฝ์ ์ด๋‚˜ ์ด๋Ÿฐ ๋ณด์•ˆ์€ ๋””๋น„ ๊ตฌํ˜„์˜ ๊ด€์‹ฌ์‚ฌ๊ฐ€ ์•„๋‹Œ๊ฐ€ ๊ถ๊ธˆํ•จ.
    • ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋ ˆ๋ฒจ์—์„œ ์ด๋Ÿฐ ๊ฒƒ์„ ์ฑ™๊ธฐ๋‹ค๋ณด๋ฉด ๋ณด์•ˆ์ด๋ž‘ ์ตœ์ ํ™”๋„ trade-off๊ฐ€ ์žˆ์„ ๊ฒƒ ๊ฐ™์Œ..
  • ์—ฌ๊ธฐ์„œ ์“ฐ๋Š” hashing์˜ cost๋Š” ๋งค์šฐ ๊ฐ€๋ณ๋‚˜?? ๋ญ”๊ฐ€ ์—ฌ๊ธฐ์„œ ์ž์ฃผ ์ผ์–ด๋‚˜๋Š” ์—ฐ์‚ฐ์ด๋‚˜ ๋ณ€ํ™˜์— ๊ด€ํ•ด์„œ ํšŒ๋กœ ๋ ˆ๋ฒจ์—์„œ ์ตœ์ ํ™”๋œ ์ „์šฉ ํ•˜๋“œ์›จ์–ด ๊ฐ€์†๊ธฐ๋ฅผ ๋‹ฌ๋ฉด ๋” ๋นจ๋ผ์งˆ ์ˆ˜ ์žˆ์ง€ ์•Š์„๊นŒ..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Database ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๐Ÿ“ ์Šคํ„ฐ๋”” ์ผ์ง€ ์Šคํ„ฐ๋”” ์ผ์ง€
Projects
None yet
Development

No branches or pull requests

4 participants