Update dependency io.openlineage:openlineage-java to v1.28.0 #3002
This PR contains the following updates: io.openlineage:openlineage-java 1.23.0 -> 1.28.0
Release Notes
OpenLineage/OpenLineage (io.openlineage:openlineage-java)
v1.28.0
Added
- #3444 @pawel-big-lebowski: Enable providing configuration for the SSL context within the HTTP transport.
- #3442 @pawel-big-lebowski: Spark integration filters OpenLineage events for specific plan node classes. This can now be extended with extra config entries: allowedSparkNodes and deniedSparkNodes. See the Spark Configuration documentation for more details (a hedged configuration sketch follows this list).
- #3437 @aritrabandyo: Adds a circuit breaker that executes tasks on a queue-backed thread pool, gives up tasks if the queue is full, and keeps track of rejected tasks (illustrated in the second sketch after this list).
- #3429 @whitleykeith: Allows the Trino integration to emit proper events containing Trino datasets.
- #3430 @ssanthanam185: Adds coverage for AlterTableRecoverPartitionsCommandVisitor, RefreshTableCommandVisitor, RepairTableCommandVisitor.
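Below is a minimal sketch of wiring the new plan-node filter lists on a Spark session. The listener class and the spark.openlineage.transport.* keys are standard OpenLineage Spark settings; the spark.openlineage.filter.allowedSparkNodes / deniedSparkNodes key paths and the example node class names are assumptions for illustration only, so confirm them against the Spark Configuration documentation.

```java
import org.apache.spark.sql.SparkSession;

public class OpenLineageNodeFilterExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("openlineage-node-filter-example")
                // Standard OpenLineage Spark listener wiring.
                .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
                .config("spark.openlineage.transport.type", "http")
                .config("spark.openlineage.transport.url", "http://localhost:5000")
                // Assumed key paths for the new allow/deny lists of plan node classes.
                .config("spark.openlineage.filter.allowedSparkNodes",
                        "org.apache.spark.sql.execution.command.CreateTableCommand")
                .config("spark.openlineage.filter.deniedSparkNodes",
                        "org.apache.spark.sql.execution.command.ShowTablesCommand")
                .getOrCreate();

        spark.stop();
    }
}
```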
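The #3437 entry describes a task runner backed by a bounded queue and a thread pool that gives up tasks when the queue is full and counts the rejections. The sketch below only illustrates that general pattern in plain Java; it is not the OpenLineage circuit breaker class itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative pattern only: run tasks on a thread pool over a bounded queue,
// drop (reject) tasks when the queue is full, and keep a count of rejections.
public class QueueBackedTaskRunner {
    private final AtomicLong rejectedTasks = new AtomicLong();
    private final ThreadPoolExecutor executor;

    public QueueBackedTaskRunner(int threads, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Rejection handler: give up on the task and record the rejection.
                (task, pool) -> rejectedTasks.incrementAndGet());
    }

    public void submit(Runnable task) {
        executor.execute(task);
    }

    public long rejectedCount() {
        return rejectedTasks.get();
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```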
Changed
- #3425 @d-m-h: No functional change to the listener, but it allows for improved initialisation of the listener in the future.
- #3435 @pawel-big-lebowski: In case of unsupported classes, warn logs without a stacktrace should be produced.
- #3443 @d-m-h: An initial refactor toward a larger change that will remove direct access to the QueryExecution object. It has no functional effect on the way the integration behaves.
Fixed
- #3434 @pawel-big-lebowski: Send input datasets in COMPLETE events while making sure the version facet is attached on START only.
- #3432 @MassyB: Fixes incorrect structure of ParentRunFacet.
v1.27.0
Added
- #3099 @pawel-big-lebowski: New Flink listener to extract lineage through native Flink interfaces. Supports Flink SQL. Requires Flink 2.0.
- #3362 @MassyB: New option for the dbt integration: it can now handle test and build commands too.
- #3379 @ssanthanam185: Events emitted from RDDExecutionContext now include custom facets that get loaded as part of InternalHandlerFactory.
- #3390 @ssanthanam185: Events emitted from RDDExecutionContext now include custom facets that get loaded as part of InternalHandlerFactory.
Changed
- #3391 @JDarDagran: Adds a with_additonal_properties method that allows creating a modified instance of a facet with additional properties.
- #3379 @ssanthanam185: Events emitted from RDDExecutionContext now include custom facets that get loaded as part of InternalHandlerFactory.
- #3403 @ssanthanam185: Those events shouldn't be filtered outside the Databricks/Delta ecosystem.
Fixed
- #3368 @ddebowczyk92: Fixes a ClassNotFoundException issue, caused by class loader conflicts, when using the openlineage-spark integration alongside a Spark connector that implements the spark-extension-interfaces.
- #3368 @cisenbe: SQL parser won't error on Snowflake's LATERAL keyword.
- #3311 @dsaxton-1password: dbt integration won't fail when looking at tests on seeds.
- #3379 @ssanthanam185: Spark integration now correctly handles complex jobs that have cycles and nested RDD trees.
- #3404 @kacpermuda: When append=False, the json file extension wasn't properly added before.
v1.26.0
Added
- #3314 @MassyB: If the --consume-structured-logs flag is set, the dbt integration will consume dbt structured logs and report execution progress in real time.
- #3301 @pawel-big-lebowski: New transform transport type allows modifying the event based on the specified transformer class.
- #3305 @pawel-big-lebowski: Emit events in parallel for the composite transport. Running in parallel is the default behaviour, with continueOnFailure set to true. The default value of continueOnFailure changed from false to true (see the sketch after this list).
- #3256 @pawel-big-lebowski: Adds ScanReport and CommitReport to OpenLineage events when dealing with Iceberg tables. Collects additional Iceberg metrics for datasets read or written through the library. Visit the Dataset Metrics docs for more details.
- #3280 @mobuchowski: Adds support for the duckdb adapter in the dbt integration.
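The composite transport entry (#3305) above describes emitting through several transports in parallel and, with continueOnFailure set to true (now the default), not letting one failing transport abort the others. The sketch below only illustrates those semantics in plain Java; the Emitter interface and ParallelCompositeEmitter class are hypothetical stand-ins, not the openlineage-java API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical stand-in for a transport; not an openlineage-java interface.
interface Emitter {
    void emit(String event) throws Exception;
}

// Emits to all delegates in parallel. With continueOnFailure=true, a failing
// delegate is logged and skipped; with false, the failure is propagated.
class ParallelCompositeEmitter implements Emitter {
    private final List<Emitter> delegates;
    private final boolean continueOnFailure;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    ParallelCompositeEmitter(List<Emitter> delegates, boolean continueOnFailure) {
        this.delegates = delegates;
        this.continueOnFailure = continueOnFailure;
    }

    @Override
    public void emit(String event) {
        List<CompletableFuture<Void>> futures = delegates.stream()
                .map(delegate -> CompletableFuture.runAsync(() -> {
                    try {
                        delegate.emit(event);
                    } catch (Exception e) {
                        if (!continueOnFailure) {
                            throw new RuntimeException(e);
                        }
                        System.err.println("emit failed, continuing: " + e.getMessage());
                    }
                }, pool))
                .collect(Collectors.toList());
        // Wait for all parallel emissions to finish.
        futures.forEach(CompletableFuture::join);
    }
}
```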
Changed
- #3207 @pawel-big-lebowski: Adds DatasetFactory to support Dataset creation. This class is used to create Dataset instances for DatasetFactory.
Fixed
- #3285 @pawel-big-lebowski: GCS path now has the leading slash correctly stripped.
v1.25.0
Added
- #3264 @mayurmadnani: dbt integration now uses the SQL parser to add information about collected column-level lineage.
- #3240 #3263 @pawel-big-lebowski: Fix issues related to the existing output statistics collection mechanism and fetch input statistics. Output statistics now contain the number of files written, byte size, and records written. Input statistics contain byte size and number of files read, while record count is collected only for DataSourceV2 sources.
- #3238 @pawel-big-lebowski: Extend the spec with a new facet, InputStatisticsInputDatasetFacet, modelled after the similar OutputStatisticsOutputDatasetFacet, to contain statistics about the input dataset read by a job.
Changed
- #3244 @tnazarew: Excludes META-INF/*TransportBuilder to avoid version conflicts.
- #3207 @pawel-big-lebowski: Adds extra capabilities into the DatasetFactory class and marks some public developers' API methods as deprecated.
Fixed
- #3228 @NJA010: dbt integration now takes into account the modified test_metadata field.
- #3253 @Jorricks: Take into account the modified initialSnapshot name.
v1.24.2
Added
- #3167 @codelixir: Updates the GCP Dataproc run facet to include the jobType property.
- #3186 @JDarDagran: Use EnvironmentVariablesRunFacet in the Python client.
- #3221 @JDarDagran
- #3142 @arturowczarek: Spark integration has integration tests for EMR.
Changed
- #3205 @mobuchowski: Moves the Kinesis integration to a separate module and updates the HTTP transport to use HttpClient 5.x.
- #3219 @arturowczarek
- #3148 @codelixir: Adds a flag to limit the logs in RddPathUtils::extract() to avoid OutOfMemoryError for large jobs.
Fixed
- #3215 @mobuchowski
- #3217 @arturowczarek
- #3208 @MassyB: Fix: Consider dbt sources when looking for test results.
- #3141 @pawel-leszczynski
v1.24.1
v1.24.0
Configuration
📅 Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.