SparkOfflineStore doesn't work with delta format #5084

Open

nboyarkin opened this issue Feb 24, 2025 · 0 comments

I was trying to use SparkOfflineStore with delta-formatted files, but got the following error while running "feast apply":
TypeError("Comparisons should only involve FileSource class objects.")

In the traceback I can see an attempt to compare self.batch_source == other.batch_source, where batch_source is expected to be a FileSource whose valid formats are 'csv' or 'parquet', so it doesn't work with delta.
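
To illustrate what I think is going on, here is a minimal sketch (not the exact code path "feast apply" takes; the SparkSource import path is assumed and may differ between Feast versions): if one side of the comparison ends up as a FileSource while the repo defines a SparkSource, FileSource.__eq__ refuses the comparison:

    from feast import FileSource
    # Assumed import path for the contrib Spark offline store; may vary by Feast version.
    from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource

    spark_src = SparkSource(
        name="calendar_enconded_source_delta_test",
        file_format="delta",
        path="s3://bucket-name/features/calendar_delta_test/",
        timestamp_field="event_timestamp",
    )
    file_src = FileSource(
        name="calendar_enconded_source_delta_test",
        path="s3://bucket-name/features/calendar_delta_test/",
        timestamp_field="event_timestamp",
    )

    # FileSource.__eq__ raises for any operand that is not a FileSource, so this fails with:
    # TypeError: Comparisons should only involve FileSource class objects.
    file_src == spark_src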

Initial settings for the feature store:

  • Yaml file:

    project: feast
    registry:
      path: s3://bucket-name/registry.pb
    offline_store:
      type: spark
      spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.parser.quotedRegexColumnNames: "true"
        spark.sql.session.timeZone: "UTC"
        spark.sql.execution.arrow.fallback.enabled: "true"
        spark.sql.execution.arrow.pyspark.enabled: "true"
        spark.sql.catalog.spark_catalog: "org.apache.spark.sql.delta.catalog.DeltaCatalog"
        spark.sql.extensions: "io.delta.sql.DeltaSparkSessionExtension"
        spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
        spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
        spark.databricks.delta.retentionDurationCheck.enabled: "false"
        spark.hadoop.fs.s3a.endpoint: "https://storage.yandexcloud.net"
        spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
        spark.jars.packages: "io.delta:delta-core_2.12:2.3.0,org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.348"
    online_store: null
    provider: local

  • feature definition in repo:

    calendar_source_delta_test = SparkSource(
        file_format='delta',
        name="calendar_enconded_source_delta_test",
        path="s3://bucket-name/features/calendar_delta_test/",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    )

    calendar_fv_delta_test = FeatureView(
        name="calendar_stats_delta_test",
        entities=[store, product],
        ttl=timedelta(days=7),
        schema=[
            Field(name="year", dtype=Float64),
            Field(name="month", dtype=Float64),
            Field(name="week", dtype=Float64),
            Field(name="quarter", dtype=Float64),
        ],
        online=False,
        source=calendar_source_delta_test,
    )

  • Error traceback:

    Traceback (most recent call last):
      File "/home/nboyarkin/miniconda3/envs/feast_env/bin/feast", line 8, in <module>
        sys.exit(cli())
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/click/core.py", line 1161, in __call__
        return self.main(*args, **kwargs)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/click/core.py", line 1082, in main
        rv = self.invoke(ctx)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/click/core.py", line 1697, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/click/core.py", line 1443, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/click/core.py", line 788, in invoke
        return __callback(*args, **kwargs)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/cli.py", line 778, in apply_total_command
        apply_total(repo_config, repo, skip_source_validation)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/repo_operations.py", line 405, in apply_total
        apply_total_with_repo_instance(
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/repo_operations.py", line 352, in apply_total_with_repo_instance
        store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/feature_store.py", line 951, in apply
        self._registry.apply_feature_view(view, project=self.project, commit=False)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/infra/registry/registry.py", line 462, in apply_feature_view
        feature_view.__class__.from_proto(existing_feature_view_proto)
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/feature_view.py", line 244, in __eq__
        if not super().__eq__(other):
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/base_feature_view.py", line 160, in __eq__
        or self.projection != other.projection
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/attr/_make.py", line 1583, in __ne__
        result = self.__eq__(other)
      File "", line 13, in __eq__
        self.batch_source == other.batch_source
      File "/home/nboyarkin/miniconda3/envs/feast_env/lib/python3.9/site-packages/feast/infra/offline_stores/file_source.py", line 93, in __eq__
        raise TypeError("Comparisons should only involve FileSource class objects.")
    TypeError: Comparisons should only involve FileSource class objects.
  • Version: 0.46
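
One way to check which class the already-registered source deserializes to (a sketch using the public FeatureStore API; the feature view name is the one from my repo above):

    from feast import FeatureStore

    store = FeatureStore(repo_path=".")
    fv = store.get_feature_view("calendar_stats_delta_test")
    # If this prints FileSource rather than SparkSource, the registry round-trip
    # has lost the Spark-specific (delta) source information.
    print(type(fv.batch_source))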