Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-linkedin-ads-v2: fix bugs causing missing data #2107

Merged
merged 3 commits into from
Nov 1, 2024

Conversation

Alex-Bair
Copy link
Contributor

@Alex-Bair Alex-Bair commented Oct 31, 2024

Description:

We received reports of missing data for the AdCampaignAnalytics, AdCreativeAnalytics, and Creatives streams.

Changes in this PR include:

LinkedInAdsAnalyticsStream

State management issues were causing the most recent date from the previous resource to be used as the start date for the next resource. Almost every state management tool exposed by the Airbyte CDK was being used, causing the stream's state to update when it shouldn't. I converted the stream to use the newer state property to have finer control of when state is updated.

These LinkedInAdsAnalyticsStream changes address the missing data issues for AdCampaignAnalytics and AdCreativeAnalytics.

Creatives

The logic for adding a creative_name to each record was causing the connector to not emit a record if an exception occurred trying to fetch the creative_name. That logic has been refactored and improved. Improvements include:

  • Emitting records even when an exception is raised trying to fetch a creative_name.
  • Logging a more descriptive warning when an exception is raised.
  • Using the versioned endpoint /rest/posts/{encoded_share_urn} to fetch creative_names since LinkedIn is sunsetting the legacy /v2 endpoints.
  • Not raising an exception when a non-OAuth2 authentication method is used.

Snapshot tests

  • Snapshot file names were updated to align with what pytest automatically generates.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a local stack. Confirmed:

  • The number of records received for Creatives and AdCampaignAnalytics matches up with the expected number of records.
  • The number of records received for AdCreativeAnalytics is significantly more than the connector was previously receiving.
    • AdCreativeAnalytics can take 24+ hours to backfill, so I only ran a partial backfill for it. Since both Creatives and AdCampaignAnalytics are now getting the expected number of records & AdCreativeAnalytics is getting significantly more records than before, I feel fairly confident that the missing data issues are resolved for AdCreativeAnalytics.
  • The creative_name retrieved from the newer /rest/posts/{encoded_share_urn} is the same as the creative_name retrieved from the legacy /v2 endpoint.
  • Snapshot tests passed.

All Ad${resource]Analytics bindings for existing tasks will need backfilled since how they manage state has changed.


This change is Reviewable

@Alex-Bair Alex-Bair added the change:planned This is a planned change label Oct 31, 2024
…` records

Background: LinkedIn does not always include the name of the `creative`
when we request multiple creatives, but multiple users would like a
`creative_name` field on each record if the creative has a name. We make
additional requests to a different endpoint to fetch `creative_name`s.

I've fixed some bugs in the existing logic to fetch the `creative_name`.
These improvements include:
- Supporting tasks that authenticate with non-OAuth2 methods.
- Emitting records even when an exception is raised trying to fetch a
  `creative_name`.
- Logging a more descriptive warning when an exception is raised.
- Using the versioned endpoint `/rest/posts/{encoded_share_urn}` to
  fetch creative names since LinkedIn is sunsetting legacy `/v2`
  endpoints.
- Added a new parameter to `read_records` that controls whether
  additional properties are fetched for each record. This is useful when
  we only need the ID of the parent record for the ad analytics streams.
…streams

Multiple different methods were being used to update the state for
`LinkedInAdAnalyticsStream` streams:
- `get_updated_state` was inspecting each emitted record and updating
  the stream state to the most recent date.
- `state_checkpoint_interval` was checkpointing state after every 1000
  emitted records.
- `stream_slices` was checkpointing state after each "slice" was completed.

Couple this tangled web of state management with an incorrect iteration
order, and the connector ended up missing massive quantities of records.
Essentially, the connector was using the most recent date from the
previous resource as the start date for the next resource instead of
starting each resource from the config's start date / most recent cursor.

I fixed this by stripping down the state management to only use the
newer `state` property to have finer control of when state is updated
(`stream_slices` is still used to manage date & field slicing, but we
only rely on the `state` property when determining what slices to create).
Snapshots are identical. This only changes the snapshot names to
align with what's generated by `pytest` automatically, making
development easier.
@Alex-Bair Alex-Bair force-pushed the bair/linkedin-ads-v2-missing-data branch from 6643942 to b001315 Compare November 1, 2024 15:52
@Alex-Bair Alex-Bair marked this pull request as ready for review November 1, 2024 17:10
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the testing & description of the tests; that helps a lot since I don't know anything about how Linkedin Ads works.

@Alex-Bair Alex-Bair linked an issue Nov 1, 2024 that may be closed by this pull request
2 tasks
@Alex-Bair Alex-Bair merged commit a6ee0b8 into main Nov 1, 2024
73 of 77 checks passed
@Alex-Bair Alex-Bair deleted the bair/linkedin-ads-v2-missing-data branch November 1, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:planned This is a planned change
Projects
None yet
2 participants