feat: pipeline to aggregate data from all years #100

Draft: wants to merge 22 commits into base: master
Commits
d1bddff  feat: start pipeline to aggregate data (arnaldog12, Sep 5, 2022)
09724e9  ⚒️ chore: add profile command to makefile (arnaldog12, Sep 7, 2022)
5472f40  🎲feat: base aggregate pipeline (arnaldog12, Sep 7, 2022)
09c0f52  📆 feat: add year columns to preprocessed datasets (arnaldog12, Sep 9, 2022)
3048587  🎲 feat: more clube ids (arnaldog12, Sep 19, 2022)
398b803  chore: add Paraná to parameters.yml (arnaldog12, Oct 10, 2022)
b12dbb6  Merge branch 'master' into feat/aggregate-pipeline (arnaldog12, Oct 21, 2022)
4757009  chore: dependencies and remove unused tests (arnaldog12, Jun 15, 2024)
2d3ff16  chore: remove black, isort and flake8 (arnaldog12, Jun 15, 2024)
6139ea2  chore: replace pandas_profiling by ydata_profiling (arnaldog12, Jun 15, 2024)
db85c9c  fix: minor in parameters and data (arnaldog12, Jun 15, 2024)
9dcaa56  feat: fix accumulated scouts in preprocessing (arnaldog12, Jun 15, 2024)
c4e1961  feat: start schema validation with pandera (arnaldog12, Jun 15, 2024)
e5301e7  test: start data tests in aggregated data (arnaldog12, Jun 15, 2024)
098f384  feat: rename scouts columns (arnaldog12, Jun 16, 2024)
fb1def8  feat: drop columns and convert types (arnaldog12, Jun 16, 2024)
7efd2f4  feat: drop columns and improve schemas (arnaldog12, Jun 16, 2024)
271ae38  feat: normalize player position (arnaldog12, Jun 16, 2024)
94eebbb  test: convert_types node of aggregate pipeline (arnaldog12, Jun 16, 2024)
30b4a38  test: improve test case (arnaldog12, Jun 16, 2024)
d202e60  chore: improve coverage and schemas (arnaldog12, Jun 16, 2024)
25bf634  tests: preprocessing nodes (arnaldog12, Jun 16, 2024)
4 changes: 4 additions & 0 deletions .gitignore
@@ -193,3 +193,7 @@ conf/**/*credentials*
# custom
.telemetry
logs/

# ydata
report.html
stats.json
28 changes: 4 additions & 24 deletions .pre-commit-config.yaml
@@ -14,36 +14,16 @@ repos:
hooks:
- id: markdownlint-fix

- repo: https://github.com/dannysepler/rm_unneeded_f_str
rev: v0.1.0
hooks:
- id: rm-unneeded-f-str

- repo: https://github.com/MarcoGorelli/absolufy-imports
rev: v0.3.1
hooks:
- id: absolufy-imports

- repo: https://github.com/pycqa/isort
rev: 5.10.1
hooks:
- id: isort

- repo: https://github.com/python/black
rev: 22.3.0
hooks:
- id: black

- repo: https://github.com/fsouza/autoflake8
rev: v0.2.2
hooks:
- id: autoflake8
args: ['--recursive', '--in-place']

- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.2
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.4.9
hooks:
- id: flake8
- id: ruff
- id: ruff-format

- repo: https://github.com/jendrikseipp/vulture
rev: v2.3
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.10.11
12 changes: 3 additions & 9 deletions Makefile
@@ -15,21 +15,15 @@ clean:
@rm -f MANIFEST
@rm -f .coverage.*

black:
@black $(FOLDER_PROJECT) --config pyproject.toml $(args)

isort:
@isort $(FOLDER_PROJECT) $(args)

flake8:
@flake8 $(FOLDER_PROJECT)

mypy:
@mypy --ignore-missing-imports --exclude download_data.py$$ --exclude __main__.py$$ --strict src/cartola

pre-commit:
@pre-commit run --all-files

profile:
@ydata_profiling -m -e $(args) report.html

docker-build:
@kedro docker build --image cartola

12 changes: 12 additions & 0 deletions conf/base/catalog.yml
@@ -2,3 +2,15 @@
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: https://kedro.readthedocs.io/en/stable/data/data_catalog.html
aggregated.primary:
type: PartitionedDataSet
path: data/03_primary
dataset:
type: pandas.CSVDataSet
filename_suffix: .csv

aggregated.aggregated:
type: pandas.CSVDataSet
filepath: data/04_feature/aggregated.csv
save_args:
index: False
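With this catalog entry, Kedro's `PartitionedDataSet` loads `aggregated.primary` as a dictionary that maps each partition ID (the CSV file name under `data/03_primary`, without the `.csv` suffix) to a function that loads that file on demand. A minimal sketch of a node that could turn those partitions into the single `aggregated.aggregated` table, assuming every partition holds one season of preprocessed rows with the same columns. `aggregate_partitions` is a hypothetical name, and the partitions are simulated with plain lambdas so the sketch runs without Kedro.

```python
from typing import Callable, Dict

import pandas as pd


def aggregate_partitions(partitions: Dict[str, Callable[[], pd.DataFrame]]) -> pd.DataFrame:
    """Concatenate every partition into one DataFrame (hypothetical node)."""
    frames = []
    # Sort partition IDs so the output row order is stable between runs.
    for partition_id in sorted(partitions):
        load_partition = partitions[partition_id]
        # Each value is a loader; calling it reads that partition's file.
        frames.append(load_partition())
    # ignore_index gives the result a fresh 0..n-1 index instead of each file's own.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Stand-ins for what PartitionedDataSet would pass to the node.
    fake_partitions = {
        "2015": lambda: pd.DataFrame({"ano": [2015], "rodada": [1]}),
        "2014": lambda: pd.DataFrame({"ano": [2014, 2014], "rodada": [1, 2]}),
    }
    print(aggregate_partitions(fake_partitions))
```

Keeping the loading lazy means only one partition needs to be materialized per loop iteration before concatenation, which is the main reason Kedro hands the node callables instead of DataFrames.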
105 changes: 104 additions & 1 deletion conf/base/parameters.yml
@@ -22,27 +22,130 @@ preprocessing.map_col_names:
clube_id: id_clube
ClubeID: id_clube
ClubeNome: nome_clube
DD: DE # Defesa Difícil -> Defesa
jogos_num: num_jogos
Jogos: num_jogos
media_num: media
Nome: nome_clube
Participou: participou
PE: PI # Passe Errado -> Passe Incompleto
pontos_num: pontuacao
Pontos: pontos_num
Pontos: pontuacao
PontosMedia: media
posicao_id: posicao
Posicao: posicao
PosicaoID: posicao
preco_num: preco
Preco: preco
PrecoVariacao: variacao
RB: DS # Roubada de Bola -> Desarmes
rodada_id: rodada
Rodada: rodada
status_id: status
variacao_num: variacao
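`preprocessing.map_col_names` collects the column spellings used across seasons, so a single mapping can be applied to every year: `DataFrame.rename(columns=...)` silently skips keys that a given frame does not have. A minimal sketch, assuming the parameter reaches the node as a plain dict; `rename_columns` is hypothetical and `MAP_COL_NAMES` is only a subset of the mapping in this diff.

```python
import pandas as pd

# Subset of preprocessing.map_col_names from conf/base/parameters.yml.
MAP_COL_NAMES = {
    "ClubeID": "id_clube",
    "Rodada": "rodada",
    "rodada_id": "rodada",
    "RB": "DS",  # Roubada de Bola -> Desarmes
    "PE": "PI",  # Passe Errado -> Passe Incompleto
}


def rename_columns(df: pd.DataFrame, map_col_names: dict) -> pd.DataFrame:
    """Rename raw columns to the unified names (hypothetical node)."""
    # Keys absent from this year's columns are ignored (rename's default errors="ignore").
    return df.rename(columns=map_col_names)
```

One caveat of this approach: if a single year contained two source columns that map to the same target (for example both `Rodada` and `rodada_id`), `rename` would produce duplicate column names, so each year is expected to use only one spelling.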


preprocessing.map_clube_to_id_clube:
AME: 327
AMÉRICA-MG: 327
ATHLÉTICO-PR: 293
ATLÉTICO-GO: 373
ATLÉTICO-MG: 282
ATLÉTICO-PR: 293
AVA: 314
AVAÍ: 314
BAH: 265
BAHIA: 265
BOT: 263
BOTAFOGO: 263
BRAGANTINO: 280
CAM: 282
CAP: 293
CEA: 354
CEARÁ: 354
CFC: 294
CHA: 315
CHAPECOENSE: 315
COR: 264
CORINTHIANS: 264
CORITIBA: 294
CRI: 288
CRU: 283
CRUZEIRO: 283
CSA: 341
CUIABÁ: 1371
FIG: 316
FIGUEIRENSE: 316
FLA: 262
FLAMENGO: 262
FLU: 266
FLUMINENSE: 266
FORTALEZA: 356
GOI: 290
GOIÁS: 290
GRE: 284
GRÊMIO: 284
INT: 285
INTERNACIONAL: 285
JEC: 317
JOINVILLE: 317
JUVENTUDE: 286
PAL: 275
PALMEIRAS: 275
PAR:
PARANÁ:
PON: 303
PONTE PRETA: 303
SAN: 277
SANTA CRUZ: 344
SANTOS: 277
SÃO PAULO: 276
SAO: 276
SCZ: 344
SPO: 292
SPORT: 292
SPT: 292
VAS: 267
VASCO: 267
VIT: 287
VITÓRIA: 287
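Both abbreviations (`FLA`) and full names (`FLAMENGO`) point to the same `id_clube`, so a lookup on the upper-cased name covers either form. A minimal sketch using `Series.map`, assuming the raw club name lives in a column called `clube` (hypothetical, as is `add_id_clube`); the mapping is a subset of the one above.

```python
import pandas as pd

# Subset of preprocessing.map_clube_to_id_clube; PAR has an empty (null) ID in this diff.
MAP_CLUBE_TO_ID = {
    "FLA": 262,
    "FLAMENGO": 262,
    "SAO": 276,
    "SÃO PAULO": 276,
    "PAR": None,
}


def add_id_clube(df: pd.DataFrame, map_clube: dict, col: str = "clube") -> pd.DataFrame:
    """Add an id_clube column looked up from the club name (hypothetical node)."""
    out = df.copy()
    # Assumption: raw names may differ from the keys only in case or surrounding spaces.
    keys = out[col].str.strip().str.upper()
    # Names missing from the mapping and null IDs both come back as NaN.
    out["id_clube"] = keys.map(map_clube)
    return out
```

Because unknown names and the empty `PAR`/`PARANÁ` entries (YAML reads an empty value as null) both end up as NaN, checking `out["id_clube"].isna()` after the mapping is a cheap way to list clubs that still need an ID.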


preprocessing.map_status_id_to_str:
2: Dúvida
3: Suspenso
5: Contundido
6: Nulo
7: Provável
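The keys of `preprocessing.map_status_id_to_str` are unquoted, so YAML loads them as integers and the lookup has to be done on integer IDs. A minimal sketch, assuming some seasons already store the status as text, which is why unmatched values are passed through unchanged instead of becoming NaN; `status_to_str` is a hypothetical name.

```python
import pandas as pd

# preprocessing.map_status_id_to_str as YAML loads it: integer keys.
MAP_STATUS_ID_TO_STR = {2: "Dúvida", 3: "Suspenso", 5: "Contundido", 6: "Nulo", 7: "Provável"}


def status_to_str(df: pd.DataFrame, map_status: dict, col: str = "status") -> pd.DataFrame:
    """Replace numeric status IDs with their labels (hypothetical node)."""
    out = df.copy()
    # Known IDs become labels; assumed text statuses and unknown IDs are kept as-is.
    out[col] = out[col].map(lambda value: map_status.get(value, value))
    return out
```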


preprocessing.map_posicao_to_str:
"1": gol
"2": lat
"3": zag
"4": mei
"5": ata
"6": tec
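Unlike the status map, the keys of `preprocessing.map_posicao_to_str` are quoted, so they load as strings and an integer position ID would not match them directly. A minimal sketch of normalizing the position, assuming the raw column (already renamed to `posicao`) may contain integer IDs, numeric strings, floats such as `6.0` read from a CSV with gaps, or upper-case abbreviations; `normalize_posicao` is a hypothetical name.

```python
import pandas as pd

# preprocessing.map_posicao_to_str: the keys are quoted in YAML, so they load as strings.
MAP_POSICAO_TO_STR = {"1": "gol", "2": "lat", "3": "zag", "4": "mei", "5": "ata", "6": "tec"}


def normalize_posicao(df: pd.DataFrame, map_posicao: dict) -> pd.DataFrame:
    """Turn position IDs into lower-case abbreviations (hypothetical node)."""
    out = df.copy()
    # Cast to text so 1, "1" and 6.0 (shown as "6.0") can all match the string keys.
    as_text = out["posicao"].astype(str).str.replace(r"\.0$", "", regex=True).str.lower()
    # IDs become abbreviations; values that already are abbreviations pass through.
    out["posicao"] = as_text.map(lambda value: map_posicao.get(value, value))
    return out
```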


aggregated.map_types:
ano: int
rodada: int
A: int
CA: int
CV: int
DE: int
DP: int
DS: int
FC: int
FD: int
FF: int
FS: int
FT: int
G: int
GC: int
GS: int
I: int
PI: int
PP: int
SG: int
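`aggregated.map_types` lists target dtypes as strings, which `DataFrame.astype` accepts as a `{column: dtype}` dict. Two details matter in practice: `astype` raises a `KeyError` when the dict names a column the frame does not have, and casting a column that still contains NaN to `int` raises as well. A minimal sketch of a type-conversion node, assuming missing scout counts mean the event did not happen (so they become 0) while `ano` and `rodada` must always be present; the name `convert_types` mirrors the node mentioned in the commit list, but this body is only an illustration.

```python
import pandas as pd

# Subset of aggregated.map_types; the YAML values load as the string "int".
MAP_TYPES = {"ano": "int", "rodada": "int", "G": "int", "A": "int", "DS": "int"}


def convert_types(df: pd.DataFrame, map_types: dict) -> pd.DataFrame:
    """Cast the configured columns to their target dtypes (illustrative sketch)."""
    # astype raises KeyError for unknown columns, so keep only the ones present.
    present = {col: dtype for col, dtype in map_types.items() if col in df.columns}
    # Assumption: a missing scout count means zero occurrences.
    scouts = [col for col in present if col not in ("ano", "rodada")]
    out = df.copy()
    if scouts:
        out[scouts] = out[scouts].fillna(0)
    # NaN left in ano/rodada makes this cast raise, surfacing incomplete rows.
    return out.astype(present)
```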
18 changes: 14 additions & 4 deletions conf/base/parameters_2014.yml
@@ -4,6 +4,16 @@
# Documentation for this file format can be found in "Parameters"
# Link: https://kedro.readthedocs.io/en/0.18.2/kedro_project_setup/configuration.html#parameters

2014.preprocessing.year: 2014

2014.preprocessing.drop_columns:
- Mando
- Nota
- Partida
- Substituido
- TempoJogado
- Titular
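The per-year `drop_columns` and `year` parameters suggest a small preprocessing step that removes season-specific columns and tags every row with its season before the years are combined. A minimal sketch, assuming the year is stored in an `ano` column (the name used in `aggregated.map_types`) and that columns missing from a file should be tolerated, which this diff does not show; `drop_columns_and_add_year` is hypothetical.

```python
import pandas as pd


def drop_columns_and_add_year(df: pd.DataFrame, drop_columns: list, year: int) -> pd.DataFrame:
    """Drop season-specific columns and add the season (hypothetical node)."""
    # errors="ignore" tolerates listed columns that a file lacks; [] drops nothing.
    out = df.drop(columns=drop_columns, errors="ignore")
    # Tag every row with its season so years stay distinguishable after aggregation.
    return out.assign(ano=year)
```

Under this sketch, the empty lists used for 2015, 2016 and 2018 leave each frame unchanged apart from the new `ano` column.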

2014.preprocessing.scouts:
# ataque
G: 8.0 # Gol marcado
@@ -14,7 +24,7 @@
FS: 0.5 # Falta Sofrida
PP: -3.5 # Pênalti Perdido
I: -0.5 # Impedimento
PE: -0.3 # Passe Errado
PI: -0.3 # Passe Incompleto = Passe Errado (PE)
# defesa
SG: 5.0 # jogo sem Sofrer Gols (somente defensores)
CV: -5.0 # Cartão Vermelho
@@ -23,8 +33,8 @@
FC: -0.5 # Falta Cometida
GC: -6.0 # Gol Contra
DP: 7.0 # Defesa de Pênalti
RB: 1.7 # Roubada de Bola
DD: 3.0 # Defesa Difícil
DS: 1.7 # DeSarme = Roubada de Bola (RB)
DE: 3.0 # DEfesa = Defesa Difícil (DD) *

# * exclusivo de goleiro
# ** não válido para goleiros
# ** não válido para goleiros
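Each entry of `preprocessing.scouts` is the number of points that a single occurrence of that scout is worth, so a round's score can be estimated as the sum over scouts of count times points. A minimal sketch of that weighted sum, assuming the scout columns already carry the renamed codes (`PI`, `DS`, `DE`) and that missing counts mean zero; `score_from_scouts` is hypothetical, and the sketch ignores the position restrictions marked with `*` and `**`, so it is an approximation rather than the full Cartola rule.

```python
import pandas as pd

# Subset of 2014.preprocessing.scouts: points per occurrence.
SCOUTS_2014 = {"G": 8.0, "FS": 0.5, "PI": -0.3, "SG": 5.0, "DS": 1.7, "DE": 3.0}


def score_from_scouts(df: pd.DataFrame, scouts: dict) -> pd.Series:
    """Estimate each row's score as sum(count * points) (hypothetical helper)."""
    # Only weight scout columns that exist in this frame.
    cols = [col for col in scouts if col in df.columns]
    # Assumption: a missing count means the scout did not occur.
    counts = df[cols].fillna(0)
    weights = pd.Series(scouts)[cols]
    # Multiply each column by its weight (aligned on column names), then sum per row.
    return counts.mul(weights, axis=1).sum(axis=1)
```

Comparing this estimate with the stored `pontuacao` is one way to spot rounds whose scouts look inconsistent, though position-specific scouts such as `DE` and `SG` would need the position filter to match exactly.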
12 changes: 8 additions & 4 deletions conf/base/parameters_2015.yml
@@ -4,6 +4,10 @@
# Documentation for this file format can be found in "Parameters"
# Link: https://kedro.readthedocs.io/en/0.18.2/kedro_project_setup/configuration.html#parameters

2015.preprocessing.year: 2015

2015.preprocessing.drop_columns: []

2015.preprocessing.scouts:
# ataque
G: 8.0 # Gol marcado
@@ -14,7 +18,7 @@
FS: 0.5 # Falta Sofrida
PP: -3.5 # Pênalti Perdido
I: -0.5 # Impedimento
PE: -0.3 # Passe Errado
PI: -0.3 # Passe Incompleto = Passe Errado (PE)
# defesa
SG: 5.0 # jogo sem Sofrer Gols (somente defensores)
CV: -5.0 # Cartão Vermelho
@@ -23,8 +27,8 @@
FC: -0.5 # Falta Cometida
GC: -6.0 # Gol Contra
DP: 7.0 # Defesa de Pênalti
RB: 1.7 # Roubada de Bola
DD: 3.0 # Defesa Difícil
DS: 1.7 # DeSarme = Roubada de Bola (RB)
DE: 3.0 # DEfesa = Defesa Difícil (DD) *

# * exclusivo de goleiro
# ** não válido para goleiros
# ** não válido para goleiros
12 changes: 8 additions & 4 deletions conf/base/parameters_2016.yml
@@ -4,6 +4,10 @@
# Documentation for this file format can be found in "Parameters"
# Link: https://kedro.readthedocs.io/en/0.18.2/kedro_project_setup/configuration.html#parameters

2016.preprocessing.year: 2016

2016.preprocessing.drop_columns: []

2016.preprocessing.scouts:
# ataque
G: 8.0 # Gol marcado
@@ -14,7 +18,7 @@
FS: 0.5 # Falta Sofrida
PP: -3.5 # Pênalti Perdido
I: -0.5 # Impedimento
PE: -0.3 # Passe Errado
PI: -0.3 # Passe Incompleto = Passe Errado (PE)
# defesa
SG: 5.0 # jogo sem Sofrer Gols (somente defensores)
CV: -5.0 # Cartão Vermelho
@@ -23,8 +27,8 @@
FC: -0.5 # Falta Cometida
GC: -6.0 # Gol Contra
DP: 7.0 # Defesa de Pênalti
RB: 1.7 # Roubada de Bola
DD: 3.0 # Defesa Difícil
DS: 1.7 # DeSarme = Roubada de Bola (RB)
DE: 3.0 # DEfesa = Defesa Difícil (DD) *

# * exclusivo de goleiro
# ** não válido para goleiros
# ** não válido para goleiros
13 changes: 9 additions & 4 deletions conf/base/parameters_2017.yml
@@ -4,6 +4,11 @@
# Documentation for this file format can be found in "Parameters"
# Link: https://kedro.readthedocs.io/en/0.18.2/kedro_project_setup/configuration.html#parameters

2017.preprocessing.year: 2017

2017.preprocessing.drop_columns:
- athletes.atletas.scout

2017.preprocessing.scouts:
# ataque
G: 8.0 # Gol marcado
@@ -14,7 +19,7 @@
FS: 0.5 # Falta Sofrida
PP: -3.5 # Pênalti Perdido
I: -0.5 # Impedimento
PE: -0.3 # Passe Errado
PI: -0.3 # Passe Incompleto = Passe Errado (PE)
# defesa
SG: 5.0 # jogo sem Sofrer Gols (somente defensores)
CV: -5.0 # Cartão Vermelho
@@ -23,8 +28,8 @@
FC: -0.5 # Falta Cometida
GC: -6.0 # Gol Contra
DP: 7.0 # Defesa de Pênalti
RB: 1.7 # Roubada de Bola
DD: 3.0 # Defesa Difícil
DS: 1.7 # DeSarme = Roubada de Bola (RB)
DE: 3.0 # DEfesa = Defesa Difícil (DD) *

# * exclusivo de goleiro
# ** não válido para goleiros
# ** não válido para goleiros
10 changes: 7 additions & 3 deletions conf/base/parameters_2018.yml
@@ -4,6 +4,10 @@
# Documentation for this file format can be found in "Parameters"
# Link: https://kedro.readthedocs.io/en/0.18.2/kedro_project_setup/configuration.html#parameters

2018.preprocessing.year: 2018

2018.preprocessing.drop_columns: []

2018.preprocessing.scouts:
# ataque
G: 8.0 # Gol marcado
@@ -14,7 +18,7 @@
FS: 0.5 # Falta Sofrida
PP: -3.5 # Pênalti Perdido
I: -0.5 # Impedimento
PE: -0.3 # Passe Errado
PI: -0.3 # Passe Incompleto = Passe Errado (PE)
# defesa
SG: 5.0 # jogo sem Sofrer Gols (somente defensores)
CV: -5.0 # Cartão Vermelho
@@ -23,8 +27,8 @@
FC: -0.5 # Falta Cometida
GC: -6.0 # Gol Contra
DP: 7.0 # Defesa de Pênalti
RB: 1.7 # Roubada de Bola
DD: 3.0 # Defesa Difícil
DS: 1.7 # DeSarme = Roubada de Bola (RB)
DE: 3.0 # DEfesa = Defesa Difícil (DD) *

# * exclusivo de goleiro
# ** não válido para goleiros