source-linkedin-ads-v2: fix bugs causing missing data #2107

Merged 3 commits on Nov 1, 2024

Commits on Nov 1, 2024

  1. source-linkedin-ads-v2: fix how creative_name is added to `creative` records
    
    Background: LinkedIn does not always include the name of the `creative`
    when we request multiple creatives, but multiple users would like a
    `creative_name` field on each record if the creative has a name. We make
    additional requests to a different endpoint to fetch `creative_name`s.
    
    I've fixed some bugs in the existing logic to fetch the `creative_name`.
    These improvements include:
    - Supporting tasks that authenticate with non-OAuth2 methods.
    - Emitting records even when an exception is raised trying to fetch a
      `creative_name`.
    - Logging a more descriptive warning when an exception is raised.
    - Using the versioned endpoint `/rest/posts/{encoded_share_urn}` to
      fetch creative names since LinkedIn is sunsetting legacy `/v2`
      endpoints.
    - Adding a new parameter to `read_records` that controls whether
      additional properties are fetched for each record. This is useful when
      we only need the ID of the parent record for the ad analytics streams.
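
The behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `post_path`, `with_creative_names`, the `fetch_name` callable, the `share_urn` record key, and the `fetch_additional_properties` parameter name are all hypothetical. Only the `/rest/posts/{encoded_share_urn}` path and the "still emit the record, log a descriptive warning" behaviour come from the commit message.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Iterator
from urllib.parse import quote

logger = logging.getLogger(__name__)


def post_path(share_urn: str) -> str:
    """Build the versioned posts path for a share URN.

    The URN is fully percent-encoded (including ':') so it is safe to use
    as a single path segment.
    """
    return f"/rest/posts/{quote(share_urn, safe='')}"


def with_creative_names(
    creatives: Iterable[Dict[str, Any]],
    fetch_name: Callable[[str], str],  # hypothetical: performs the authenticated request
    fetch_additional_properties: bool = True,  # hypothetical parameter name
) -> Iterator[Dict[str, Any]]:
    """Yield every creative, adding `creative_name` when it can be fetched."""
    for creative in creatives:
        record = dict(creative)
        share_urn = record.get("share_urn")  # hypothetical key, for illustration only
        if fetch_additional_properties and share_urn:
            try:
                record["creative_name"] = fetch_name(post_path(share_urn))
            except Exception as exc:
                # A failed lookup must not drop the record: log and move on.
                logger.warning(
                    "Could not fetch creative_name for creative %s (share URN %s): %s",
                    record.get("id"),
                    share_urn,
                    exc,
                )
        yield record
```

Passing `fetch_additional_properties=False` skips the extra requests entirely, which matches the case where a caller only needs the parent record's ID.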
    Alex-Bair committed Nov 1, 2024
    987c7b2
  2. source-linkedin-ads-v2: fix state checkpointing bug for ad analytics streams
    
    Several different mechanisms were being used to update the state for
    `LinkedInAdAnalyticsStream` streams:
    - `get_updated_state` was inspecting each emitted record and updating
      the stream state to the most recent date.
    - `state_checkpoint_interval` was checkpointing state after every 1000
      emitted records.
    - `stream_slices` was checkpointing state after each "slice" was completed.
    
    Couple this tangled web of state management with an incorrect iteration
    order, and the connector ended up missing massive quantities of records.
    Essentially, the connector was using the most recent date from the
    previous resource as the start date for the next resource instead of
    starting each resource from the config's start date / most recent cursor.
    
    I fixed this by stripping the state management down to the newer
    `state` property alone, which gives finer control over when state is
    updated (`stream_slices` is still used to manage date & field slicing,
    but we only rely on the `state` property when determining what slices
    to create).
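
To illustrate the idea, here is a small sketch in which every resource gets the same start date, derived once from the saved cursor or the configured start date, instead of inheriting whatever date the previous resource last emitted. The class `AnalyticsSketch`, the `end_date` cursor field, the 30-day window, and the choice to advance state only once after all resources are read are assumptions made for this example, not details confirmed by the commit.

```python
from datetime import date, timedelta
from typing import Dict, Iterator, List, Optional, Tuple

DateWindow = Tuple[date, date]


def date_slices(start: date, end: date, window_days: int = 30) -> Iterator[DateWindow]:
    """Yield inclusive (start, end) windows that together cover [start, end]."""
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=window_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


class AnalyticsSketch:
    """Hypothetical stand-in for an ad analytics stream."""

    cursor_field = "end_date"  # assumed cursor field name

    def __init__(self, config_start: date) -> None:
        self._config_start = config_start
        self._cursor: Optional[date] = None

    @property
    def state(self) -> Dict[str, str]:
        return {self.cursor_field: self._cursor.isoformat()} if self._cursor else {}

    @state.setter
    def state(self, value: Dict[str, str]) -> None:
        raw = value.get(self.cursor_field)
        self._cursor = date.fromisoformat(raw) if raw else None

    def plan(self, resource_ids: List[str], today: date) -> Dict[str, List[DateWindow]]:
        # The start date is computed once, so no resource can inherit a
        # later date left behind by the resource processed before it.
        start = self._cursor or self._config_start
        return {rid: list(date_slices(start, today)) for rid in resource_ids}

    def finish_sync(self, today: date) -> None:
        # Assumption: state advances only after every resource has been read.
        self.state = {self.cursor_field: today.isoformat()}
```

With a configured start of 2024-01-01 and two resources, `plan` returns identical window lists for both; after `finish_sync`, the next plan starts from the saved cursor for every resource.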
    Alex-Bair committed Nov 1, 2024
    63fb8de
  3. source-linkedin-ads-v2: update snapshot names

    Snapshots are identical. This only changes the snapshot names to
    align with what's generated by `pytest` automatically, making
    development easier.
    Alex-Bair committed Nov 1, 2024
    b001315