[PR] Preparing 2.0 Release #39

Merged: 139 commits, Mar 8, 2025
b95aba4
docs: drafting doc changes, docker as main distribution channel
Kaszanas Nov 11, 2024
c6a62c1
docs: adjusted readibility
Kaszanas Nov 11, 2024
776de34
docs: adjusted the description in processed mapping copier
Kaszanas Nov 11, 2024
e7486f4
docs: added full package names in README
Kaszanas Nov 11, 2024
ef5a36d
docs: simplified docs, sc2egset using docker
Kaszanas Nov 11, 2024
d5e28cb
refactor: no random uuid, using file hash in flattener
Kaszanas Nov 11, 2024
3d03363
docs: fixing typo in PR template
Kaszanas Nov 11, 2024
b5b9c9b
refactor: multiprocessing off in sc2_replaypack_processor
Kaszanas Nov 12, 2024
b7c31d2
refactor: renamed sc2_replaypack_processor -> sc2egset_replaypack
Kaszanas Nov 12, 2024
a7c84c2
docs: added link to citation at the top
Kaszanas Nov 12, 2024
3fd124e
perf: downloading maps as a pre-process step
Kaszanas Nov 13, 2024
b172e40
docs: added more README documentation, added TOC
Kaszanas Nov 17, 2024
82282ca
docs: formatting CONTRIBUTING
Kaszanas Nov 17, 2024
4aa2092
refactor: capitalized "AS" in docker
Kaszanas Nov 17, 2024
ec5cbe8
docs: drafted script README files with Docker
Kaszanas Nov 17, 2024
cd01672
Merge pull request #41 from Kaszanas/40-script-docker-docs
Kaszanas Nov 17, 2024
7a968ce
docs: updated all CLI Usage for scripts
Kaszanas Nov 17, 2024
fd8ad2a
fix: fixed log level, fixing path initialization
Kaszanas Nov 17, 2024
8b10ac8
fix: fixing glob issues, testing directory flattener
Kaszanas Nov 18, 2024
be6d51e
docs: solving #42 and #43, refined documentation
Kaszanas Nov 18, 2024
7088375
docs: removed redundant information from README
Kaszanas Nov 18, 2024
02870fc
docs: added generic information in README, editing
Kaszanas Nov 18, 2024
2b633c3
perf: directory_flattener, hash from filepath, added tqdm
Kaszanas Nov 18, 2024
a9e2bd4
fix: converting paths with click, changed target name
Kaszanas Nov 18, 2024
f601edf
docs: fixed READMEs after review
Kaszanas Nov 18, 2024
302a8ef
build: bumped dependency versions
Kaszanas Nov 20, 2024
6e76dfe
Merge pull request #45 from Kaszanas/44-bump-dependency-versions
Kaszanas Nov 20, 2024
67bbba0
refactor: renamed dir_packager to directory_packager
Kaszanas Nov 20, 2024
02a669e
Merge pull request #47 from Kaszanas/46-dir-packager-full-name
Kaszanas Nov 20, 2024
fe4a914
fix: fixing paths in Dockerfile
Kaszanas Nov 20, 2024
f222ec1
Merge branch 'dev' of https://github.com/Kaszanas/SC2DatasetPreparato…
Kaszanas Nov 20, 2024
a93e82d
fix: mounting curdir as a dot
Kaszanas Nov 20, 2024
227f080
Merge pull request #49 from Kaszanas/48-current-directory-docker
Kaszanas Nov 20, 2024
c00b38e
test: added dotenv to set TEST_WORKSPACE
Kaszanas Nov 20, 2024
046ff31
refactor: refreshed ci installing poetry
Kaszanas Nov 20, 2024
af14698
build: bumped poetry version in Dockerfile
Kaszanas Nov 20, 2024
e1b1349
test: commented out test, file_renamer_test not ready
Kaszanas Nov 20, 2024
b500fd6
feat: added default flag values for golang
Kaszanas Nov 20, 2024
7e8b72c
Merge pull request #52 from Kaszanas/51-set-default-flags-go
Kaszanas Nov 20, 2024
28ee746
fix: fixing imports in sc2reset
Kaszanas Nov 20, 2024
c6e5c49
test: added extractor arguments in test
Kaszanas Nov 20, 2024
a3a31c7
fix: fixing opening and writing to file
Kaszanas Nov 20, 2024
9203288
feat: sc2infoextractorgo executable path in settings
Kaszanas Nov 20, 2024
6d7447d
fix: fixing return value, removed range loop
Kaszanas Nov 20, 2024
06d5513
build: adjusted dockerfiles, copying files separately
Kaszanas Nov 20, 2024
748f840
feat: test workspace in .env
Kaszanas Nov 20, 2024
74236e0
test: adjusted test target in make
Kaszanas Nov 20, 2024
fba1bf2
Merge pull request #54 from Kaszanas/53-run-tests-fix-commands
Kaszanas Nov 20, 2024
9ce6193
fix: fixing pre-commit in dev docker
Kaszanas Nov 20, 2024
e3ba2b2
ci: removing volume from docker-test-compose
Kaszanas Nov 20, 2024
e39f5c1
build: copying CONTRIBUTING to dev docker image
Kaszanas Nov 20, 2024
7f67860
ci: adjusted TEST_COMMAND, not writing logs
Kaszanas Nov 20, 2024
391fe50
build: copying scripts to top in docker images
Kaszanas Nov 20, 2024
1ae85c5
Merge pull request #56 from Kaszanas/55-docker-copy-scripts-top-dir
Kaszanas Nov 20, 2024
41caf7a
docs: added info on pre-commit and commitizen, #34
Kaszanas Nov 20, 2024
985c6c2
docs: added information on code standards, #34
Kaszanas Nov 20, 2024
e0d82da
docs: updated all README files for scripts
Kaszanas Nov 20, 2024
cff767f
refactor: changed the processing dir structure
Kaszanas Nov 20, 2024
76a5416
refactor: adjusted make targets for sc2egset, removed unused param
Kaszanas Nov 20, 2024
6bce3cb
ci: added docker releases
Kaszanas Nov 20, 2024
e8faa7d
Merge pull request #58 from Kaszanas/57-docker-release-on-branch-pushes
Kaszanas Nov 20, 2024
f6eb987
build: added maps needed for SC2InfoExtractorGo
Kaszanas Nov 20, 2024
79a29cd
Merge pull request #60 from Kaszanas/59-copy-maps-sc2infoextractorgo
Kaszanas Nov 20, 2024
7ed1f1a
refactor: using dev dockerfile in sc2reset_sc2egset process
Kaszanas Nov 20, 2024
4f117c6
docs: changed docs for a more concise read
Kaszanas Jan 3, 2025
623fb61
build: bumped ruff and commitizen versions
Kaszanas Jan 3, 2025
ce58589
build: ran poetry lock
Kaszanas Jan 3, 2025
e82d355
docs: refined documentation, added TODO
Kaszanas Jan 5, 2025
d257a0d
build: added variables in makefile, adjusted targets, added echo
Kaszanas Jan 5, 2025
d5a5393
docs: changed docs, new CLI text, renamed container
Kaszanas Jan 5, 2025
18a6abc
build: removed dockerfiles per script, using main dockerfile
Kaszanas Jan 5, 2025
4d7b41f
refactor: drafting refactor of sc2egset_replaypack_processor
Kaszanas Jan 5, 2025
828a356
feat: added processed_mapping_copier target to makefile
Kaszanas Jan 5, 2025
d4e3cb7
feat: draft functionality of sc2egset_replaypack... full pipeline
Kaszanas Jan 5, 2025
67ec3e0
feat: drafted utils/user_prompt
Kaszanas Jan 6, 2025
76be1bc
refactor: renamed user prompting function
Kaszanas Jan 6, 2025
78a8d00
refactor: applied user prompting in sc2egset_replaypack_processor
Kaszanas Jan 6, 2025
e6118de
feat(directory_flattener.py): added user_prompt feature
Kaszanas Jan 6, 2025
2969e07
refactor(user_prompt.py): added logging
Kaszanas Jan 6, 2025
308b772
feat(directory_packager.py): added user prompting
Kaszanas Jan 6, 2025
e3bf197
refactor: using glob instead of os.walk
Kaszanas Jan 6, 2025
cc6a65a
docs: changed CLI description
Kaszanas Jan 6, 2025
75c744e
refactor: renamed force to force_overwrite
Kaszanas Jan 6, 2025
8a40d05
feat: added force_overwrite flag to CLI
Kaszanas Jan 6, 2025
aa10695
feat(json_merger.py): added user prompting, and CLI flag
Kaszanas Jan 6, 2025
cf28564
refactor(processed_mapping_copier.py): using pathlib, refactored func…
Kaszanas Jan 6, 2025
2efea73
refactor: applied user prompting for every script
Kaszanas Jan 6, 2025
e99c242
Merge pull request #64 from Kaszanas/63-prompt-user-possible-overwrite
Kaszanas Jan 6, 2025
7b0ae21
ci: attempt at fixing GH Actions, new make target name
Kaszanas Jan 6, 2025
7b31022
ci: fixing next step in CI pipeline, new target name
Kaszanas Jan 6, 2025
bcb41db
test: fixing tests with new features, fixing assertions
Kaszanas Jan 7, 2025
e1a1a00
feat: drafted full SC2ReSet/SC2EGSet pipeline
Kaszanas Jan 8, 2025
bc9f7ca
refactor: added logging statements
Kaszanas Jan 8, 2025
0c9288f
refactor: removed old directory structure from processing
Kaszanas Jan 8, 2025
64458f7
fix: manually tested directory_packager, working version
Kaszanas Jan 8, 2025
157bd50
feat: (directory_packager.py) added tqdm progres bar
Kaszanas Jan 8, 2025
8719e88
refactor: command saved to a variable
Kaszanas Jan 8, 2025
579f345
build(makefile): added targets for seeding maps locally
Kaszanas Jan 9, 2025
14c8cf9
build(docker): changed location of the maps directory in docker
Kaszanas Jan 9, 2025
7356c1d
feat: ignoring maps directory
Kaszanas Jan 9, 2025
9569bc8
fix(directory_flattener.py): manually tested flattening directories
Kaszanas Jan 9, 2025
377d838
feat: separate sc2egset_pipeline and replaypack_processor
Kaszanas Jan 9, 2025
af66764
test: fixing tests after func args change
Kaszanas Jan 9, 2025
73d8948
fix: continue instead of break after download
Kaszanas Mar 1, 2025
b2a4138
feat: added sc2egset_pipeline to dockerfiles
Kaszanas Mar 1, 2025
403a149
build: fixed directory_flattener make target
Kaszanas Mar 1, 2025
c263a80
docs: simplified documentation, removed unused
Kaszanas Mar 1, 2025
c55f304
build: development container python version bump
Kaszanas Mar 1, 2025
09eeeab
docs: added force_overwrite flag to CLI usage
Kaszanas Mar 1, 2025
fab067a
refactor: force_overwrite continues directory flattning
Kaszanas Mar 1, 2025
8c4f4d9
refactor: removed unused code
Kaszanas Mar 1, 2025
0c353ac
ci: bumped actions, pinning to commits, preparing repo name change
Kaszanas Mar 1, 2025
b3d7581
refactor: maps directory is created, skipping check
Kaszanas Mar 1, 2025
833fd0a
refactor: removing unused code, replaypack processor
Kaszanas Mar 2, 2025
87b7db3
fix(sc2_replaypack_processor): fixing maps directory as arg
Kaszanas Mar 2, 2025
382a0d3
fix: fixing non-existent argument
Kaszanas Mar 2, 2025
a79870e
fix: using Path as the inferred path type
Kaszanas Mar 3, 2025
2c216de
fix: fixing replaypack_processor, no exceptions
Kaszanas Mar 3, 2025
826681f
fix: fixing file_renamer using directory name
Kaszanas Mar 3, 2025
81e527d
docs: added docs to sc2egset_replaypack_processor
Kaszanas Mar 4, 2025
5cb26a1
build(poetry): bumped mkdocstrings, ruff
Kaszanas Mar 4, 2025
e86a5be
feat: added new replaypacks to available_replaypacks
Kaszanas Mar 6, 2025
1a83a76
feat(directory_flattener): multithreading file copying
Kaszanas Mar 6, 2025
55197c6
refactor(directory_flattener): renamed n_processes -> n_threads
Kaszanas Mar 6, 2025
2a2f11c
Merge pull request #67 from Kaszanas/65-parallel-directory-flattener
Kaszanas Mar 6, 2025
f3116d3
Merge branch 'dev' of https://github.com/Kaszanas/SC2DatasetPreparato…
Kaszanas Mar 6, 2025
2063b4d
feat(directory_packager): multithreading in directory packager
Kaszanas Mar 6, 2025
9a4abd4
Merge pull request #68 from Kaszanas/66-parallel-directory-packager
Kaszanas Mar 6, 2025
d10dc62
test(test): attempting to fix tests
Kaszanas Mar 6, 2025
e7ff81c
Merge pull request #69 from Kaszanas/66-parallel-directory-packager
Kaszanas Mar 6, 2025
91bf3a0
Merge branch 'dev' of https://github.com/Kaszanas/SC2DatasetPreparato…
Kaszanas Mar 6, 2025
5bf04cb
fix(directory_flattener): returning list of processed directories
Kaszanas Mar 6, 2025
0ca9dec
feat(sc2egset_pipeline): tested full processing pipeline manually
Kaszanas Mar 7, 2025
a3bee7f
Merge pull request #70 from Kaszanas/37-sc2egset-processing-pipeline
Kaszanas Mar 7, 2025
be1d8aa
fix(docker): fixing volume mount for CI
Kaszanas Mar 7, 2025
7bee5ac
docs(README): refined documentation
Kaszanas Mar 7, 2025
d4a3571
Merge pull request #71 from Kaszanas/33-getting-started-readme
Kaszanas Mar 7, 2025
101cb4f
feat(sc2_map_downloader): using SC2InfoExtractor go, adjusted documen…
Kaszanas Mar 8, 2025
ac60d3f
docs(release): 2.0.0 ready
Kaszanas Mar 8, 2025
2 changes: 2 additions & 0 deletions .env.template
@@ -0,0 +1,2 @@
# To have imports resolve correctly this should be the path to the root of the project:
TEST_WORKSPACE=
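
As a minimal sketch of how a test helper might consume this variable: the function name `get_test_workspace` is hypothetical and not taken from the repository, and the sketch assumes `TEST_WORKSPACE` is already present in the process environment (for example exported by the shell, set by the CI `env:` block, or loaded from `.env` by a dotenv loader).

```python
import os
from pathlib import Path


def get_test_workspace() -> Path:
    """Return the project root declared in TEST_WORKSPACE (hypothetical helper)."""
    workspace = os.environ.get("TEST_WORKSPACE", "")
    if not workspace:
        # An empty value is what .env.template ships with, so fail loudly.
        raise RuntimeError(
            "TEST_WORKSPACE is not set; copy .env.template to .env and fill it in."
        )
    return Path(workspace).resolve()
```

Failing early on an empty value keeps import and path errors from surfacing later as less obvious test failures.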
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.MD
@@ -2,7 +2,7 @@
## Description
<!--- Describe your changes in detail -->

## Related IssueS
## Related Issues
<!--- This project only accepts pull requests related to open issues -->
<!--- If suggesting a new feature or change, please discuss it in an issue first -->
<!--- If fixing a bug, there should be an issue describing it with steps to reproduce -->
19 changes: 11 additions & 8 deletions .github/workflows/ci.yml
@@ -1,26 +1,29 @@
name: continuous integration (ci)

on: [pull_request, workflow_dispatch]
on:
pull_request:
push:
branches:
- main
- dev
workflow_dispatch:

# To successfully find the files that are required for testing:
env:
TEST_WORKSPACE: ${{ github.workspace }}

jobs:

pre_commit:
# Set up operating system
runs-on: ubuntu-latest

# Define job steps
steps:

- name: Check-out repository
uses: actions/checkout@v4
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683

- name: Build Dev Docker Image
run: |
make docker_build_dev
make docker_build_devcontainer

- name: Docker Run pre-commit on all files.
run: |
@@ -37,11 +40,11 @@ jobs:
# Define job steps
steps:
- name: Check-out repository
uses: actions/checkout@v4
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683

- name: Build Dev Docker Image
run: |
make docker_build_dev PYTHON_VERSION=${{ matrix.python-version }}
make docker_build_devcontainer PYTHON_VERSION=${{ matrix.python-version }}

- name: Build Docker Image With Python ${{ matrix.python-version }}
run: |
4 changes: 2 additions & 2 deletions .github/workflows/cla.yml
@@ -19,14 +19,14 @@ jobs:
- name: "CLA Assistant"
if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the CLA Document and I hereby sign the CLA') || github.event_name == 'pull_request_target'
# Beta Release
uses: contributor-assistant/github-action@v2.3.0
uses: contributor-assistant/github-action@ca4a40a7d1004f18d9960b404b97e5f30a505a08
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# the below token should have repo scope and must be manually added by you in the repository's secret
PERSONAL_ACCESS_TOKEN : ${{ secrets.PERSONAL_ACCESS_TOKEN }}
with:
path-to-signatures: 'signatures/version1/cla.json'
path-to-document: 'https://github.com/Kaszanas/SC2DatasetPreparator/blob/main/CLA.md' # e.g. a CLA or a DCO document
path-to-document: 'https://github.com/Kaszanas/DatasetPreparator/blob/main/CLA.md' # e.g. a CLA or a DCO document
# branch should not be protected
branch: 'main'
allowlist: Kaszanas,bot*
47 changes: 47 additions & 0 deletions .github/workflows/docker_images.yml
@@ -0,0 +1,47 @@
name: Publish Docker Images

# This should run only after the tests from the CI pipeline have passed.
# On a rare occasion contributors can trigger this manually, and it should also
# run after a release has been published.
on:
workflow_run:
workflows: ["continuous integration (ci)"]
types:
- completed
push:
branches:
- main
- dev
workflow_dispatch:
release:
types: [published]

jobs:
push_to_registries:
name: Push Docker Image to Docker Hub
runs-on: ubuntu-latest
permissions:
packages: write
contents: read
steps:
- name: Check out Code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- name: Log in to Docker Hub
uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Extract Metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804
with:
images: |
kaszanas/datasetpreparator
- name: Build and Push Docker images
uses: docker/build-push-action@471d1dc4e07e5cdedd4c2171150001c434f0b7a4
with:
context: .
file: ./docker/Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
6 changes: 5 additions & 1 deletion .gitignore
@@ -1,7 +1,9 @@
/.vscode
/venv*

/processing
processing/
maps/
profiler/

*.SC2Replay
*.SC2Map
@@ -34,3 +36,5 @@ ruff_cache/

# PyCharm
/.idea

.env
108 changes: 108 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,108 @@
## 2.0.0 (2025-03-08)

### Feat

- **sc2_map_downloader**: using SC2InfoExtractor go, adjusted documentation
- **sc2egset_pipeline**: tested full processing pipeline manually
- **directory_packager**: multithreading in directory packager
- **directory_flattener**: multithreading file copying
- added new replaypacks to available_replaypacks
- added sc2egset_pipeline to dockerfiles
- separate sc2egset_pipeline and replaypack_processor
- ignoring maps directory
- (directory_packager.py) added tqdm progres bar
- drafted full SC2ReSet/SC2EGSet pipeline
- **json_merger.py**: added user prompting, and CLI flag
- added force_overwrite flag to CLI
- **directory_packager.py**: added user prompting
- **directory_flattener.py**: added user_prompt feature
- drafted utils/user_prompt
- draft functionality of sc2egset_replaypack... full pipeline
- added processed_mapping_copier target to makefile
- test workspace in .env
- sc2infoextractorgo executable path in settings
- added default flag values for golang
- added directory checks before file_renamer
- retry functionality for download_replaypack
- split download_file from download_replaypack
- md5 checksum verification for downloaded replaypacks
- added md5 checksums to available replaypacks
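
The MD5 verification entries above can be illustrated with a short sketch. This is not the repository's implementation: the names `md5_of_file` and `verify_md5` are hypothetical, and reading the file in chunks is an assumption made so that large replaypack archives need not fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected_md5: str) -> bool:
    """Return True when the file's digest matches the expected checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A downloader could call `verify_md5` after each download and retry when it returns `False`, which is one plausible way to combine checksum verification with the retry functionality listed above.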

### Fix

- **docker**: fixing volume mount for CI
- **directory_flattener**: returning list of processed directories
- fixing file_renamer using directory name
- fixing replaypack_processor, no exceptions
- using Path as the inferred path type
- fixing non-existent argument
- **sc2_replaypack_processor**: fixing maps directory as arg
- continue instead of break after download
- **directory_flattener.py**: manually tested flattening directories
- manually tested directory_packager, working version
- fixing pre-commit in dev docker
- fixing return value, removed range loop
- fixing opening and writing to file
- fixing imports in sc2reset
- mounting curdir as a dot
- fixing paths in Dockerfile
- converting paths with click, changed target name
- fixing glob issues, testing directory flattener
- fixed log level, fixing path initialization
- pointing CLA to main
- fixed import issues in replaypack downloader
- getting list of files instead of generator
- fixed missing argument in json_merger
- lowercase makefile name
- added logs directory
- different retry logic, using path to merge suffix
- changed commitizen pre-commit config
- removed commitizen-branch hook
- **deps**: pre-commit autoupdate
- **deps**: added commitizen to pre-commit

### Refactor

- **directory_flattener**: renamed n_processes -> n_threads
- removing unused code, replaypack processor
- maps directory is created, skipping check
- removed unused code
- force_overwrite continues directory flattning
- command saved to a variable
- removed old directory structure from processing
- added logging statements
- applied user prompting for every script
- **processed_mapping_copier.py**: using pathlib, refactored functionality with iterdir
- renamed force to force_overwrite
- using glob instead of os.walk
- **user_prompt.py**: added logging
- applied user prompting in sc2egset_replaypack_processor
- renamed user prompting function
- drafting refactor of sc2egset_replaypack_processor
- using dev dockerfile in sc2reset_sc2egset process
- adjusted make targets for sc2egset, removed unused param
- changed the processing dir structure
- refreshed ci installing poetry
- renamed dir_packager to directory_packager
- capitalized "AS" in docker
- renamed sc2_replaypack_processor -> sc2egset_replaypack
- multiprocessing off in sc2_replaypack_processor
- no random uuid, using file hash in flattener
- using new argument classes, refactor
- added utils directory, drafted README
- fixed end of file in cla.json
- sync commit
- changed dir_flattener logic, using a list of files
- deleted legacy setup.py
- ran pre-commit on all files
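
The user prompting and `force_overwrite` entries above describe a guard against silently overwriting existing output. A minimal sketch of that idea follows; the function name `should_overwrite`, its parameters other than `force_overwrite`, and the prompt text are assumptions for illustration, not the contents of `utils/user_prompt`.

```python
from typing import Callable


def should_overwrite(
    target_exists: bool,
    force_overwrite: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Decide whether processing may write to the target (hypothetical helper)."""
    # Nothing to protect, or the user opted in up front via the flag.
    if not target_exists or force_overwrite:
        return True
    answer = ask("Target exists. Overwrite? [y/N]: ").strip().lower()
    return answer in {"y", "yes"}
```

Passing the prompt function as a parameter keeps the sketch testable without a terminal; a CLI would simply use the default `input`.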

### Perf

- directory_flattener, hash from filepath, added tqdm
- downloading maps as a pre-process step
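
Several entries above concern the directory flattener: naming files by a hash derived from the file path rather than a random UUID, and copying files with multiple threads. The sketch below shows one way these ideas fit together. It is an illustration under stated assumptions, not the project's code: `flattened_name` and `flatten_directory` are hypothetical names, MD5 over the relative path is an assumed hashing scheme, and only `n_threads` echoes a name from the changelog.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def flattened_name(file_path: Path, root: Path) -> str:
    """Derive a deterministic file name from the path relative to root."""
    relative = file_path.relative_to(root).as_posix()
    digest = hashlib.md5(relative.encode("utf-8")).hexdigest()
    return f"{digest}{file_path.suffix}"


def flatten_directory(root: Path, output: Path, n_threads: int = 4) -> dict[str, str]:
    """Copy every file under root into output; return new name -> old relative path."""
    output.mkdir(parents=True, exist_ok=True)
    # Materialize the file list before copying so new files are not rescanned.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    mapping = {flattened_name(p, root): p.relative_to(root).as_posix() for p in files}

    def copy_one(source: Path) -> None:
        shutil.copy2(source, output / flattened_name(source, root))

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces evaluation so exceptions raised in workers surface here.
        list(pool.map(copy_one, files))
    return mapping
```

Because the name depends only on the relative path, repeated runs produce the same output names, and two files that share a base name in different subdirectories do not collide. The sketch assumes `output` is not located inside `root`.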

## 1.2.0 (2022-06-08)

## 1.1.0 (2022-03-17)

## 1.0.0 (2021-08-27)
32 changes: 22 additions & 10 deletions CONTRIBUTING.md
@@ -56,24 +56,36 @@ docker run -it -v .:/app datasetpreparator:devcontainer

### Local Development

Ready to contribute? Here's how to set up `datasetpreparator` for local development.
Ready to contribute? Here's how to set up `datasetpreparator` for local development. The code style standards that we use are defined in the `.pre-commit-config.yaml` file.

1. Download a copy of `datasetpreparator` locally.
2. Install `datasetpreparator` using `poetry`:

```console
poetry install
```
```console
poetry install
```

3. Install the pre-commit hooks:

```console
poetry run pre-commit install
```

3. Use `git` (or similar) to create a branch for local development and make your changes:
4. Use `git` (or similar) to create a branch for local development and make your changes:

```console
git checkout -b name-of-your-bugfix-or-feature
```
```console
git checkout -b name-of-your-bugfix-or-feature
```

5. When you're done making changes, check that your changes conform to any code formatting requirements and pass any tests.

4. When you're done making changes, check that your changes conform to any code formatting requirements and pass any tests.
6. Format your commit with `commitizen`:

```console
poetry run cz commit
```

5. Commit your changes and open a pull request.
7. Commit your changes (we are using commitizen to check commit messages) and open a pull request.

## Pull Request Guidelines
