Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code update for https://github.com/mitdbg/palimpzest/issues/84 #101

Merged
merged 4 commits into from
Feb 3, 2025

Conversation

chjuncn
Copy link
Collaborator

@chjuncn chjuncn commented Feb 1, 2025

This implementation basically resolves #84.

One implementation is different from the #84:
.add_columns(
cols=[
{"name": "sender", "type": "string", "udf": compute_sender},
...
]
)

If add_columns() uses cols, udf, types as params, and cols supports "udf" and possibly "desc", it will make this function confusing again, as it has totally different ways to use this function.

Instead, we just support udf and types for add_columns. If users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.

vitaglianog and others added 2 commits January 30, 2025 18:41
* Create chat.rst

* Update pyproject.toml

Hotfix for chat

* Update conf.py

Hotfix for chat.rst
This implementation basically resolves #84.

One implementation is different from the #84:
.add_columns(
  cols=[
    {"name": "sender", "type": "string", "udf": compute_sender},
    ...
  ]
)

If add_columns() uses cols, udf, types as params, it will make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.
@chjuncn chjuncn requested a review from mdr223 February 1, 2025 12:19
chjuncn added a commit that referenced this pull request Feb 3, 2025
This change is based on #101 and #102, please review them first then this change.

1. This is to refactor all demos to use .sem_add_columns or .add_columns, and remove .convert().

2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve() that need schema now. We can refactor these cases later.
@mdr223 mdr223 linked an issue Feb 3, 2025 that may be closed by this pull request
Copy link
Collaborator

@mdr223 mdr223 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some minor changes, mostly just after I realized we should probably re-use Python's built-in types (rather than mapping strings as I had suggested). Otherwise, this LGTM!

@mdr223 mdr223 merged commit 53b6055 into dev Feb 3, 2025
@mdr223 mdr223 deleted the chjun-0201 branch February 3, 2025 15:16
mdr223 added a commit that referenced this pull request Feb 6, 2025
…vert(), remove Schema from demos when possible. (#104)

* Create chat.rst (#96)

* Create chat.rst

* Update pyproject.toml

Hotfix for chat

* Update conf.py

Hotfix for chat.rst

* code update for #84

This implementation basically resolves #84.

One implementation is different from the #84:
.add_columns(
  cols=[
    {"name": "sender", "type": "string", "udf": compute_sender},
    ...
  ]
)

If add_columns() uses cols, udf, types as params, it will make this function confusing again. Instead, if users need to specify different udfs for different columns, they should just call add_columns() multiple times for different columns.

* use field_values instead of field_types as field_values have the actual values,

use field_values instead of field_types as field_values have the actual values, since field_values have the actual key-value pairs, while field_types are just contain fields and their types.

records[0].schema is the schema of the output, which doesn't mean we already populate the schema into record.

* Remove .convert() and use .sem_add_columns or .add_columns instead

This change is based on #101 and #102, please review them first then this change.

1. This is to refactor all demos to use .sem_add_columns or .add_columns, and remove .convert().

2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve() that need schema now. We can refactor these cases later.

* ruff check --fix

* fix unittest

* demos fixed and unit tests running

* fix add_columns --> sem_add_columns in demo

* udpate quickstart to reflect code changes; shorten text as much as possible

* passing unit tests

* remove convert() everywhere

* fixes to correct errors in demos; update quickstart and docs

---------

Co-authored-by: Gerardo Vitagliano <[email protected]>
Co-authored-by: Matthew Russo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants