Code update for https://github.com/mitdbg/palimpzest/issues/84 #101

chjuncn · 2025-02-01T12:19:35Z

This implementation basically resolves #84.

One detail differs from what #84 proposed, which was:

    .add_columns(
        cols=[
            {"name": "sender", "type": "string", "udf": compute_sender},
            ...
        ]
    )

If add_columns() took cols, udf, and types as params, and cols entries also supported "udf" and possibly "desc", the function would become confusing again, because there would be two completely different ways to call it.

Instead, add_columns() supports only udf and types. Users who need different udfs for different columns should call add_columns() multiple times, once for each udf.
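A minimal sketch of the calling pattern described above, using a toy stand-in class rather than Palimpzest itself. `ToyDataset`, `compute_subject`, the sample record, and the exact shape of the `udf` and `types` arguments are assumptions made for illustration; the real signature is whatever the diff in this PR defines.

```python
from typing import Any, Callable

Record = dict[str, Any]


class ToyDataset:
    """Hypothetical stand-in for a dataset; NOT the Palimpzest class."""

    def __init__(self, records: list[Record]) -> None:
        self.records = records

    def add_columns(
        self, udf: Callable[[Record], Record], types: dict[str, type]
    ) -> "ToyDataset":
        """Run one UDF per record and merge the columns named in `types`."""
        out: list[Record] = []
        for rec in self.records:
            new_fields = udf(rec)
            for name, expected in types.items():
                if not isinstance(new_fields[name], expected):
                    raise TypeError(f"column {name!r} is not {expected.__name__}")
            out.append({**rec, **{name: new_fields[name] for name in types}})
        return ToyDataset(out)


def compute_sender(rec: Record) -> Record:
    # Assumed UDF: first '|'-separated field of the raw text is the sender.
    return {"sender": rec["raw"].split("|")[0]}


def compute_subject(rec: Record) -> Record:
    # Assumed UDF: second '|'-separated field of the raw text is the subject.
    return {"subject": rec["raw"].split("|")[1]}


emails = ToyDataset([{"raw": "alice@example.com|Hello"}])

# One call per UDF, chained, instead of a per-column "udf" key inside cols.
result = (
    emails
    .add_columns(udf=compute_sender, types={"sender": str})
    .add_columns(udf=compute_subject, types={"subject": str})
)
```

Each call has a single, uniform meaning (one UDF, the columns it produces), which is the trade-off argued for here: slightly more verbose chaining in exchange for only one way to use the function.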

This change is based on #101 and #102; please review those first, then this change.

1. Refactor all demos to use .sem_add_columns or .add_columns, and remove .convert().
2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve(), which still need a schema for now. We can refactor these cases later.

mdr223

Made some minor changes, mostly just after I realized we should probably re-use Python's built-in types (rather than mapping strings as I had suggested). Otherwise, this LGTM!
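To illustrate the point about re-using Python's built-in types, here is a rough sketch contrasting the two styles. The alias table, `resolve_from_string`, and `check_value` are hypothetical helpers invented for this example; they are not taken from the Palimpzest codebase.

```python
# Hypothetical alias table that a string-based API would have to maintain.
STRING_ALIASES: dict[str, type] = {
    "string": str,
    "int": int,
    "float": float,
    "bool": bool,
}


def resolve_from_string(alias: str) -> type:
    """Translate a string alias into a Python type, failing on unknown names."""
    if alias not in STRING_ALIASES:
        raise ValueError(f"unknown type alias: {alias!r}")
    return STRING_ALIASES[alias]


def check_value(value: object, expected: type) -> bool:
    """With built-in types there is nothing to translate; isinstance suffices."""
    return isinstance(value, expected)


# String-based style: an extra lookup step, and typos only surface at runtime.
sender_type = resolve_from_string("string")

# Built-in style: the caller passes `str` directly.
ok = check_value("alice@example.com", str)
```

Passing types directly removes the alias table and lets editors and type checkers see the same objects the library sees.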

…vert(), remove Schema from demos when possible. (#104)

* Create chat.rst (#96)

* Create chat.rst

* Update pyproject.toml

  Hotfix for chat

* Update conf.py

  Hotfix for chat.rst

* code update for #84

  This implementation basically resolves #84. One detail differs from what #84 proposed:

      .add_columns(
          cols=[
              {"name": "sender", "type": "string", "udf": compute_sender},
              ...
          ]
      )

  If add_columns() took cols, udf, and types as params, the function would become confusing again. Instead, users who need different udfs for different columns should call add_columns() multiple times, once for each udf.

* use field_values instead of field_types as field_values have the actual values

  field_values hold the actual key-value pairs, while field_types just contain the fields and their types. records[0].schema is the schema of the output, which doesn't mean the schema has already been populated into the record.

* Remove .convert() and use .sem_add_columns or .add_columns instead

  This change is based on #101 and #102; please review those first, then this change.

  1. Refactor all demos to use .sem_add_columns or .add_columns, and remove .convert().
  2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve(), which still need a schema for now. We can refactor these cases later.

* ruff check --fix

* fix unittest

* demos fixed and unit tests running

* fix add_columns --> sem_add_columns in demo

* update quickstart to reflect code changes; shorten text as much as possible

* passing unit tests

* remove convert() everywhere

* fixes to correct errors in demos; update quickstart and docs

---------

Co-authored-by: Gerardo Vitagliano <[email protected]>
Co-authored-by: Matthew Russo <[email protected]>

vitaglianog and others added 2 commits January 30, 2025 18:41

Create chat.rst (#96)

ef28691

* Create chat.rst
* Update pyproject.toml: Hotfix for chat
* Update conf.py: Hotfix for chat.rst

chjuncn requested a review from mdr223 February 1, 2025 12:19

This was linked to issues Feb 1, 2025

Update Syntax to Reflect New Design Goals #84

Open

Eliminate Need for User-Facing Schema #94

Open

This was unlinked from issues Feb 1, 2025

Update Syntax to Reflect New Design Goals #84

Open

Eliminate Need for User-Facing Schema #94

Open

chjuncn mentioned this pull request Feb 3, 2025

Refactor demos to use .sem_add_columns or .add_columns instead of convert(), remove Schema from demos when possible. #104

Merged

mdr223 linked an issue Feb 3, 2025 that may be closed by this pull request

Update Syntax to Reflect New Design Goals #84

Open

mdr223 removed a link to an issue Feb 3, 2025

Update Syntax to Reflect New Design Goals #84

Open

mdr223 added 2 commits February 3, 2025 10:12

changed types to make use of Python type system; updated use of types in tests; updated docs and README

3fd18ac

update test to match no longer allowing None default

75ad0de

mdr223 approved these changes Feb 3, 2025

mdr223 merged commit 53b6055 into dev Feb 3, 2025

mdr223 deleted the chjun-0201 branch February 3, 2025 15:16
