Importing DQT data #65

Open
11 tasks
peteryates opened this issue Jan 23, 2025 · 0 comments

peteryates commented Jan 23, 2025

This was copied from @joe-harrison-dfe's original data quality issues doc.

1. Bad Status:

  • Implement bad status logic

This issue occurs when a record has an open induction period alongside an overall induction status of pass, fail or exempt.

  • There are 18 records that have an open induction period with a status that indicates they shouldn’t. We will work with the ops team to check these manually.
  • There are a further 40 which are down as in progress but with no induction period which we will also manually check.
  • 100 records have no induction start and an AB that is exempt in Wales so we recommend updating these to exempt. @bbelward agrees
  • There are just under 35,000 records that have no induction dates but an overall status they shouldn’t have, so we suggest updating those to ‘Required to Complete’ (RTC).
    • Rationale – no induction period means they shouldn’t be in our database
    • Worst case – someone corrects and says they have actually done induction, which we would have to correct anyway
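
The two manually checked cases above can be expressed as a small classifier. This is a minimal sketch only: the `InductionRecord` fields, the status strings and the returned labels are assumptions made for illustration, not the real DQT schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical status labels; the real DQT values may be spelled differently.
COMPLETED_STATUSES = {"pass", "fail", "exempt"}


@dataclass
class InductionRecord:
    trn: str
    overall_status: str
    has_open_period: bool
    period_count: int


def bad_status_reason(record: InductionRecord) -> Optional[str]:
    """Return a label for the bad-status case a record falls into, or None."""
    status = record.overall_status.lower()
    if record.has_open_period and status in COMPLETED_STATUSES:
        # Open induction period, but the status says induction is finished.
        return "open_period_with_completed_status"
    if status == "in progress" and record.period_count == 0:
        # Marked as in progress, but no induction period exists.
        return "in_progress_without_period"
    return None
```

Records that return a label would go onto the manual check list; everything else passes this rule.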

2. Induction Period start date after end date:

  • Implement period start date after end date logic

This is an issue where an induction period has a start date that isn’t blank and is after the period end date.

  • There are 156 records that contain this issue looking at our data cut from Rob.
  • @easeynathan has conducted some spot checks on these and initial recommendation is to not migrate any of these records over.
  • These records still live in TRS, so we can check them there if we get a query associated with this. Odds are low. @bbelward agrees
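
As an illustration of the check described above, a minimal sketch in Python follows. The function and parameter names are hypothetical; it treats a blank date as `None` and only flags periods where both dates are present.

```python
from datetime import date
from typing import Optional


def starts_after_end(started_on: Optional[date], finished_on: Optional[date]) -> bool:
    """True when a period has both dates and the start falls after the end."""
    if started_on is None or finished_on is None:
        # A blank date cannot be compared, so this rule does not apply.
        return False
    return started_on > finished_on
```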

3. Induction period before overall induction start:

  • Implement induction period before overall induction start logic

This is an issue where the overall induction start date is not blank and is after the period start and/or end date where populated.

  • There are 123 records that contain this issue from our data cut from Rob.
  • This is normally caused by a bug in the DQT that can reset an overall induction start date to a more recent induction period start.
  • Recommendations are to revert records that have had their start date automatically moved and use the earliest start date associated with the record.
  • This is a bug; the overall start shouldn’t change. @bbelward agrees
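
One way to read the "use the earliest start date" recommendation is sketched below. The function name and signature are assumptions for illustration; it simply picks the earliest of the overall start and all populated period starts.

```python
from datetime import date
from typing import Iterable, Optional


def corrected_overall_start(
    overall_start: Optional[date],
    period_starts: Iterable[Optional[date]],
) -> Optional[date]:
    """Return the earliest known start date associated with a record."""
    candidates = [d for d in period_starts if d is not None]
    if overall_start is not None:
        candidates.append(overall_start)
    # With no dates at all there is nothing to correct.
    return min(candidates) if candidates else None
```

For a record whose overall start was reset to 2023-09-01 but whose first period began on 2022-09-01, this returns 2022-09-01.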

4. Future start or end date:

  • Implement future start or end date logic

This is an issue where an induction period has a start and/or end date in the future.

  • There are 1,671 records with a future start date.
  • The vast majority of these records are only a few weeks in advance. Most of the rest are ahead of September starts, which lines up with ABs making the change before the summer holidays.
  • Recommendation is either to import everything as it is and then maybe prompt ABs to revisit, or to impose a limit on future dates (e.g. 90 days) and not migrate any records that exceed it.
  • Recommendation is not to import any record with a start date later than 18/02/2025 as it will break our new validation rules in the service. For any records with a future date, prompt the AB to resubmit when it goes live.
  • Decision – import anything before 18 Feb, anything after won’t be imported
    • Will need to prompt ABs as a support task to re-register any ECTs starting induction after 18/02
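
The cutoff decision could be applied with a check like the one below. This is a sketch under assumptions: the names are hypothetical, and it follows the "later than 18/02/2025" wording, so a start date on 18 February itself is still treated as importable.

```python
from datetime import date
from typing import Optional

# Cutoff taken from the decision above; 18 Feb itself counts as importable here.
IMPORT_CUTOFF = date(2025, 2, 18)


def exceeds_import_cutoff(started_on: Optional[date], cutoff: date = IMPORT_CUTOFF) -> bool:
    """True when a period starts after the cutoff and should not be imported."""
    return started_on is not None and started_on > cutoff
```

Periods that return `True` would feed the support task of prompting ABs to re-register the ECT.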

Proposal to confirm with Julie / Colin – 2nd line analyse data issues, 1st line do support chasing to get updates

5. Induction period < 10 weeks:

  • Implement induction period < 10 weeks logic

This is an issue where an IP end is after the IP start, but the period is less than 10 weeks in duration.

  • There are 5,802 records where an IP is less than 10 weeks in length. Of these records, 5,094 have a term of 0 or null and 298 have a term of 1. Recommendation is that these records are fine to migrate. @bbelward agrees
  • There are a further 148 records which are shorter than 10 weeks in length but have between 2 and 7 terms associated with them. These need to be spot checked to understand the cause. Recommendation is to migrate as is and put a note on these records to flag the terms not aligning with length of IP. We can also flag them on the admin console to keep track of them.
    • Ben agrees with moving them over – doesn't break validation but isn’t right, we can clean them when over
    • Who does the checking?
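
The split between the two groups above can be sketched as follows. The function name and the returned labels are hypothetical. Term counts that fall outside the two groups described (for example 1.5) are not covered by the recommendations, so the sketch labels them `"unclassified"` rather than guessing.

```python
from datetime import date, timedelta
from typing import Optional

TEN_WEEKS = timedelta(weeks=10)


def short_period_action(started_on: date, finished_on: date, terms: Optional[float]) -> Optional[str]:
    """Classify a period shorter than 10 weeks; None means the rule does not apply."""
    length = finished_on - started_on
    if not (timedelta(0) < length < TEN_WEEKS):
        return None
    if terms is None or terms in (0, 1):
        return "migrate"
    if 2 <= terms <= 7:
        # Terms do not line up with the length of the period: migrate and add a note.
        return "migrate_with_note"
    return "unclassified"
```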

6. Duplicate start date, different ABs:

  • Implement duplicate start date different AB logic

This is an issue where an induction period has a start date that is duplicated with another AB for that TRN. (i.e. started induction with 2x ABs on the same day)

  • There are 487 records with this issue from the data cut from Rob
  • @peteryates has been working on some rules for trimming and dealing with overlapping records.
  • Might want to look at a few examples to see if one cancels early so we can remove that record.
  • Recommendation is not to migrate any of these for now. We can prioritise fixing these records for ECTs with a current IP first. Might need to communicate to clashing ABs to resolve this.
    • @bbelward agreed not MVP
    • Need to agree who is responsible for fixing bad data that stays in TRA
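
To find the affected TRNs, periods can be grouped by TRN and start date and checked for more than one AB. This is an illustrative sketch with hypothetical field names, not the rules @peteryates has been working on.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, NamedTuple, Set


class Period(NamedTuple):
    trn: str
    appropriate_body: str
    started_on: date


def trns_with_clashing_starts(periods: Iterable[Period]) -> Set[str]:
    """Return TRNs that started induction with two or more ABs on the same day."""
    bodies_by_start = defaultdict(set)
    for period in periods:
        bodies_by_start[(period.trn, period.started_on)].add(period.appropriate_body)
    return {trn for (trn, _), bodies in bodies_by_start.items() if len(bodies) > 1}
```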

7. Duplicate start date, same ABs:

  • Implement duplicate start date same AB logic

This is an issue where the IP is duplicated with the same AB for that TRN, and the period completely duplicates or has some other distinct data.

  • There are 442 records with this issue from the data cut from Rob.
  • The recommendation here is to keep the longest version of the record so we aren’t throwing any data away. Pete has written a script for dealing with overlapping records that will extend any overlapping records to ensure we don’t lose any data. We will also carry over the maximum number of terms associated with the record.
    • Rationale – better to keep more data than less as we don’t want to cut out someone’s induction progress
    • Ben agrees
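
The "keep the longest version and the maximum terms" idea can be illustrated with a simple interval merge. This is not Pete's script; it is a sketch that assumes every period passed in belongs to the same TRN and AB and has both dates populated, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Period:
    started_on: date
    finished_on: date
    terms: Optional[float]


def merge_overlapping(periods: List[Period]) -> List[Period]:
    """Merge overlapping periods, keeping the widest date range and the largest term count."""
    merged: List[Period] = []
    for period in sorted(periods, key=lambda p: p.started_on):
        last = merged[-1] if merged else None
        if last is not None and period.started_on <= last.finished_on:
            known_terms = [t for t in (last.terms, period.terms) if t is not None]
            merged[-1] = Period(
                last.started_on,
                max(last.finished_on, period.finished_on),
                max(known_terms) if known_terms else None,
            )
        else:
            merged.append(period)
    return merged
```

Extending rather than discarding means no recorded induction progress is lost, which matches the rationale above.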

8. Close IP starts:

  • Implement close IP starts logic

This is an issue where the induction period start is distinct for the same AB for that TRN but is less than a fortnight apart.

  • There are 140 records with this issue from the data cut from Rob
  • Recommendation is to just go with the most recently created record as this is most likely caused by manual error and the more recent record is likely going to be more accurate. @bbelward agrees
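
A possible shape for this rule is sketched below, assuming all periods passed in share the same TRN and AB. The names are hypothetical, and the choice to chain periods together whenever each start is less than 14 days after the previous one is an assumption of the sketch.

```python
from datetime import date, datetime, timedelta
from typing import List, NamedTuple


class Period(NamedTuple):
    started_on: date
    created_at: datetime


def resolve_close_starts(periods: List[Period], window: timedelta = timedelta(days=14)) -> List[Period]:
    """Cluster periods whose starts are under a fortnight apart; keep the newest record of each cluster."""
    clusters: List[List[Period]] = []
    for period in sorted(periods, key=lambda p: p.started_on):
        if clusters and period.started_on - clusters[-1][-1].started_on < window:
            clusters[-1].append(period)
        else:
            clusters.append([period])
    return [max(cluster, key=lambda p: p.created_at) for cluster in clusters]
```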

9. Invalid Period:

  • Implement invalid period logic

This is a category of issue that could occur for several reasons. These are when the period end date is null, the AB is no longer an active AB and/or the end/start date is populated and earlier than 01/09/2021. Assume person isn’t doing induction, or has another induction and this one wasn’t closed.

  • There are 4,327 records with this issue from the data cut from Rob.
  • Recommendation is to migrate the records with an invalid AB on them and to populate these records with a dummy end date and note to say that the end date is unknown. Use the de-designation date for the end dates here. @bbelward agrees
  • We would also show these periods as counting for no terms of induction served.
    • Potential support task to check these after migration to see if there are matching TRNs with open periods for example
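
The recommendation for periods with an inactive AB could look something like this. It is a sketch only: the `Period` fields and the note wording are assumptions, and an AB that is still active is represented by `ab_dedesignated_on` being `None`.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass
class Period:
    started_on: Optional[date]
    finished_on: Optional[date]
    ab_dedesignated_on: Optional[date]  # None while the AB is still active
    terms: Optional[float] = None
    note: str = ""


def close_with_dedesignation_date(period: Period) -> Period:
    """Close an open period at the AB's de-designation date, counting no terms."""
    if period.finished_on is None and period.ab_dedesignated_on is not None:
        return replace(
            period,
            finished_on=period.ab_dedesignated_on,
            terms=0,
            note="End date unknown; AB de-designation date used",
        )
    return period
```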

10. Missing Data:

  • Implement missing data logic

This is a category of issue that could occur for several reasons. This is when a record has at least one of the following:

  1. start date is null
  2. no AB
  3. AB data is a holder/not detailed value
  4. end date is populated but number of terms is not
  5. TRN is an ECF started and doesn’t have an induction programme type for a closed period.

  • There are 3,727 records with this issue from the data cut from Rob.
  • Recommendation for records with a programme type that are pre-2021 is to strip the programme type from them. @bbelward agrees
  • Recommendation for records with no programme type after 2021 is to assign them an ‘unknown’ value for this rather than null.
    • Ben agrees shouldn’t block migration, unknown should have a note that says to check
  • Recommendation for records with no start date is to not migrate them. If a record only has one IP use the IP start date to populate the overall start date and bring over.
    • Our validation rules need an overall start date
    • Migrate after launch
    • And set overall start date if they have 1 induction period
    • If it’s more than one flag to confirm start date with AB
  • Recommendation for records that are missing an AB or have a holder value is that we can’t bring them over.
    • Assume bad data, leave them but if there’s a future support issue we can refer back to them
    • Ben agrees
  • Recommendation for records with no term amounts recorded is to migrate the records, set the terms to 0 but add a note onto these to flag it was 0 in DQT.
    • Ben agrees but need a clear process for cleaning data in TRS after launch
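
Three of the recommendations above (missing AB, missing start date, missing terms) can be sketched as a single decision function. The field names, the placeholder AB values and the returned labels are all hypothetical, and the programme type rules are left out of this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical placeholder values; the real holder values in DQT are not listed here.
PLACEHOLDER_AB_VALUES = {"holder", "not detailed"}


@dataclass
class Period:
    started_on: Optional[date]
    finished_on: Optional[date]
    appropriate_body: Optional[str]
    terms: Optional[float]


def missing_data_decision(period: Period) -> str:
    """Map a period with missing data onto one of the recommended outcomes."""
    ab = period.appropriate_body
    if ab is None or ab.strip().lower() in PLACEHOLDER_AB_VALUES:
        return "do_not_migrate"
    if period.started_on is None:
        return "hold_until_after_launch"
    if period.finished_on is not None and period.terms is None:
        # Migrate with terms set to 0 and a note flagging that nothing was recorded in DQT.
        return "migrate_with_zero_terms_and_note"
    return "migrate"
```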

11. Open period with an LA:

  • Implement open period with an LA logic

This is an issue where an induction period has a start date, no end date and is associated with an LA.

  • There are 584 TRNs affected by this issue across 55 ABs.
  • There are 2 options here; either we set an end date on the records to close them, or we migrate them as they are and deal with them once they are in our database.
  • Recommendation is to migrate the records and set a standard end date on them as well as 0 terms towards induction. We would also add a note to these records to make it clear to ABs that if you try and view these records you should get in contact with us.
    • Main reason for bringing over is to make them viewable so an AB can see how much induction an ECT has done
  • If we want to, we can check if there are term values on any of the open LA periods and bring over that associated term value.
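
The recommended handling could be sketched as below. The standard end date has not been decided in this issue, so the sketch takes it as a required argument rather than assuming a value. Field names and the note wording are also hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Period:
    started_on: date
    finished_on: Optional[date]
    ab_is_local_authority: bool
    terms: Optional[float] = None
    note: str = ""


def close_open_la_period(period: Period, standard_end_date: date) -> Period:
    """Close an open LA period with the agreed end date, 0 terms and a note for ABs."""
    if period.finished_on is None and period.ab_is_local_authority:
        return replace(
            period,
            finished_on=standard_end_date,
            terms=0,
            note="Closed during migration; please contact us about this period",
        )
    return period
```

If term values on open LA periods are brought over instead, the `terms=0` line is the only part that would need to change.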