-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code update for https://github.com/mitdbg/palimpzest/issues/84 #101
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Create chat.rst * Update pyproject.toml Hotfix for chat * Update conf.py Hotfix for chat.rst
This implementation basically resolves #84. One implementation is different from the #84: .add_columns( cols=[ {"name": "sender", "type": "string", "udf": compute_sender}, ... ] ) If add_columns() uses cols, udf, types as params, it will make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.
This was
linked to
issues
Feb 1, 2025
This was
unlinked from
issues
Feb 1, 2025
chjuncn
added a commit
that referenced
this pull request
Feb 3, 2025
This change is based on #101 and #102, please review them first then this change. 1. This is to refactor all demos to use .sem_add_columns or .add_columns, and remove .convert(). 2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve() that need schema now. We can refactor these cases later.
… in tests; updated docs and README
mdr223
approved these changes
Feb 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some minor changes, mostly just after I realized we should probably re-use Python's built-in types (rather than mapping strings as I had suggested). Otherwise, this LGTM!
mdr223
added a commit
that referenced
this pull request
Feb 6, 2025
…vert(), remove Schema from demos when possible. (#104) * Create chat.rst (#96) * Create chat.rst * Update pyproject.toml Hotfix for chat * Update conf.py Hotfix for chat.rst * code update for #84 This implementation basically resolves #84. One implementation is different from the #84: .add_columns( cols=[ {"name": "sender", "type": "string", "udf": compute_sender}, ... ] ) If add_columns() uses cols, udf, types as params, it will make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns. * use field_values instead of field_types as field_values have the actual values, use field_values instead of field_types as field_values have the actual values, since field_values have the actual key-value pairs, while field_types are just contain fields and their types. records[0].schema is the schema of the output, which doesn't mean we already populate the schema into record. * Remove .convert() and use .sem_add_columns or .add_columns instead This change is based on #101 and #102, please review them first then this change. 1. This is to refactor all demos to use .sem_add_columns or .add_columns, and remove .convert(). 2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve() that need schema now. We can refactor these cases later. * ruff check --fix * fix unittest * demos fixed and unit tests running * fix add_columns --> sem_add_columns in demo * udpate quickstart to reflect code changes; shorten text as much as possible * passing unit tests * remove convert() everywhere * fixes to correct errors in demos; update quickstart and docs --------- Co-authored-by: Gerardo Vitagliano <[email protected]> Co-authored-by: Matthew Russo <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This implementation basically resolves #84.
One implementation is different from the #84:
.add_columns(
cols=[
{"name": "sender", "type": "string", "udf": compute_sender},
...
]
)
If add_columns() uses cols, udf, types as params, and cols supports "udf" and possibly "desc", it will make this function confusing again, as it has totally different ways to use this function.
Instead, we just support udf and types for add_columns. If users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.