Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_kpi_calculation_45 (median + mean) calcs sporadically fail #338

Closed
anchit-chandran opened this issue Oct 29, 2024 · 7 comments
Closed
Assignees
Labels
help wanted Extra attention is needed

Comments

@anchit-chandran
Copy link
Contributor

anchit-chandran commented Oct 29, 2024

test_kpi_calculation_45 sporadically fails. My guess is that it's because of some floating point arithmetic but I'm not too sure. Check to see if there's a difference in calculation between how the test calculates the median, vs how the SQL query calculates median.

The relevant code exists in:

test file: test_kpis_44_49.py

  • test_kpi_calculation_45 is the test name

  • the test uses calculate_median fn, defined on line 665

  • kpi calculation: kpis.py

  • calculation method calculate_kpi_45_median_hba1c is defined on line 3098

  • there's no native Median fn for django's orm, so defined my own at the bottom of file on line 3570: class Median(Func):

I've noticed in a failed test output, the actual value (coming from the kpi calculation method) is a Decimal('46.75'), vs we're expecting a value of 47.0. It might just be that Decimal is more accurate and we should update the test to expecting Decimal, but confirm this is the case as I'm not sure. Conversely, it might be the case we need to cast the Decimal to a native python value as it's going to be passed in the view to the frontend:

expected = KPIResult(total_eligible=2, total_ineligible=2, total_passed=47.0, total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)
actual = KPIResult(total_eligible=2, total_ineligible=2, total_passed=Decimal('46.75'), total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)

Cmd to test repeatedly until fail:

while pytest project/npda/tests/kpi_calculations/test_kpis_44_49.py::test_kpi_calculation_45 -s; do :; done

Example Failed test output


expected = KPIResult(total_eligible=2, total_ineligible=2, total_passed=47.0, total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)
actual = KPIResult(total_eligible=2, total_ineligible=2, total_passed=Decimal('46.75'), total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)

    def assert_kpi_result_equal(
        expected: KPIResult,
        actual: KPIResult,
    ) -> None:
        """
        Asserts that two KPIResult objects are equal by comparing their fields and provides
        a detailed error message if they are not.
    
        :param expected: The expected KPIResult object.
        :param actual: The actual KPIResult object.
        :raises AssertionError: If the fields in the KPIResult objects differ.
        """
        if isinstance(expected, KPIResult) is False:
            raise TypeError(
                f"expected must be of type KPIResult (current: {type(expected)}"
            )
        if isinstance(actual, KPIResult) is False:
            raise TypeError(f"actual must be of type KPIResult (current: {type(actual)}")
    
        mismatches = []
    
        if expected.total_eligible != actual.total_eligible:
            mismatches.append(
                f"total_eligible: expected {expected.total_eligible}, got {actual.total_eligible}"
            )
    
        if expected.total_passed != actual.total_passed:
            mismatches.append(
                f"total_passed: expected {expected.total_passed}, got {actual.total_passed}"
            )
    
        if expected.total_ineligible != actual.total_ineligible:
            mismatches.append(
                f"total_ineligible: expected {expected.total_ineligible}, got {actual.total_ineligible}"
            )
    
        if expected.total_failed != actual.total_failed:
            mismatches.append(
                f"total_failed: expected {expected.total_failed}, got {actual.total_failed}"
            )
    
        # Queryset checks
        if expected.patient_querysets is not None:
            # If actual.patient_querysets is None, we can't compare the querysets
            if actual.patient_querysets is None:
                mismatches.append(
                    f"patient_querysets: expected {expected.patient_querysets}, got None"
                )
            else:
                # For each pt queryset in expected, check if the actual queryset is
                # the same
                for key, expected_queryset in expected.patient_querysets.items():
    
                    actual_queryset = actual.patient_querysets.get(key)
    
                    # Convert to list and order by id to compare
                    expected_queryset = list(expected_queryset.order_by("id"))
                    actual_queryset = list(actual_queryset.order_by("id"))
    
                    if expected_queryset != actual_queryset:
                        mismatches.append(
                            f"patient_querysets[{key}]:"
                            f"\nexpected_queryset\n\t{expected_queryset}"
                            f"\nactual_queryset\n\t{actual_queryset}\n"
                        )
    
        if mismatches:
            mismatch_details = "\n".join(mismatches)
>           raise AssertionError(f"KPIResult mismatch:\n{mismatch_details}")
E           AssertionError: KPIResult mismatch:
E           total_passed: expected 47.0, got 46.75

project/npda/tests/kpi_calculations/test_calculate_kpis.py:88: AssertionError
-------------------------------------------------------------------------------------------------------------------------- Captured stderr setup --------------------------------------------------------------------------------------------------------------------------
Using existing test database for alias 'default'...
========================================================================================================================= short test summary info =========================================================================================================================
FAILED project/npda/tests/kpi_calculations/test_kpis_44_49.py::test_kpi_calculation_45 - AssertionError: KPIResult mismatch:
@anchit-chandran anchit-chandran self-assigned this Oct 29, 2024
@anchit-chandran
Copy link
Contributor Author

@dc2007git @reecehill are either of you able to pick this up?

@anchit-chandran anchit-chandran added the help wanted Extra attention is needed label Nov 4, 2024
@anchit-chandran
Copy link
Contributor Author

@cillian-rcpch ignore the rest of the above, narrowed it down to how the median is calculated. In fixing this, was wondering how you guys calculate median for an even number of elements:

Eg n=4

  1. choose one of the middle elements (elements 2 or 3)
  2. mean of middle 2 ([element 2 + element 3] / 2)

@anchit-chandran
Copy link
Contributor Author

(for my own debugging record)

("v={'visit_date': None, 'hba1c_date': None, 'patient__diagnosis_date': "
 "datetime.date(2002, 8, 19), 'hba1c': None, 'patient__postcode': "
 "'ineligible_patient_too_old'}\n")
("v={'visit_date': datetime.date(2024, 10, 1), 'hba1c_date': "
 "datetime.date(2024, 7, 2), 'patient__diagnosis_date': datetime.date(2023, 7, "
 "23), 'hba1c': Decimal('49.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 10, 1), 'hba1c_date': "
 "datetime.date(2024, 1, 8), 'patient__diagnosis_date': datetime.date(2023, "
 "11, 28), 'hba1c': Decimal('47.00'), 'patient__postcode': "
 "'passing_pt_1_median_46'}\n")
("v={'visit_date': datetime.date(2024, 7, 1), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, 7, "
 "23), 'hba1c': Decimal('48.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 7, 1), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, "
 "11, 28), 'hba1c': Decimal('46.00'), 'patient__postcode': "
 "'passing_pt_1_median_46'}\n")
("v={'visit_date': datetime.date(2024, 4, 3), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, "
 "11, 28), 'hba1c': Decimal('45.00'), 'patient__postcode': "
 "'passing_pt_1_median_46'}\n")
("v={'visit_date': datetime.date(2024, 4, 3), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, 7, "
 "23), 'hba1c': Decimal('47.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 3, 31), 'hba1c_date': "
 "datetime.date(2024, 3, 31), 'patient__diagnosis_date': datetime.date(2023, "
 "7, 23), 'hba1c': Decimal('1.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 3, 22), 'hba1c_date': None, "
 "'patient__diagnosis_date': datetime.date(2016, 6, 27), 'hba1c': None, "
 "'patient__postcode': 'ineligible_patient_visit_date'}\n")
passing_pt_2_median_48: hba1c_values=[Decimal('49.00'), Decimal('48.00'), Decimal('47.00')]
passing_pt_1_median_46: hba1c_values=[Decimal('46.00'), Decimal('45.00')]

median_hba1cs=[Decimal('48.00'), Decimal('45.50')]

final result: median_of_median_hba1cs=Decimal('46.75')
{'hba1c': None,
 'hba1c_date': None,
 'patient__diagnosis_date': datetime.date(2008, 8, 1),
 'patient__postcode': 'ineligible_patient_too_old',
 'visit_date': None}

{'hba1c': Decimal('49.00'),
 'hba1c_date': datetime.date(2024, 7, 2),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 10, 1)}

{'hba1c': Decimal('47.00'),
 'hba1c_date': datetime.date(2024, 1, 8),
 'patient__diagnosis_date': datetime.date(2019, 7, 1),
 'patient__postcode': 'passing_pt_1_median_46',
 'visit_date': datetime.date(2024, 10, 1)}

{'hba1c': Decimal('48.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 7, 1)}

{'hba1c': Decimal('46.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2019, 7, 1),
 'patient__postcode': 'passing_pt_1_median_46',
 'visit_date': datetime.date(2024, 7, 1)}

{'hba1c': Decimal('45.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2019, 7, 1),
 'patient__postcode': 'passing_pt_1_median_46',
 'visit_date': datetime.date(2024, 4, 3)}

{'hba1c': Decimal('47.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 4, 3)}

{'hba1c': Decimal('1.00'),
 'hba1c_date': datetime.date(2024, 3, 31),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 3, 31)}

{'hba1c': None,
 'hba1c_date': None,
 'patient__diagnosis_date': datetime.date(2012, 12, 4),
 'patient__postcode': 'ineligible_patient_visit_date',
 'visit_date': datetime.date(2024, 3, 22)}

passing_pt_1_median_46: hba1c_values=[Decimal('47.00'), Decimal('46.00'), Decimal('45.00')]
passing_pt_2_median_48: hba1c_values=[Decimal('49.00')]

median_hba1cs=[Decimal('46.00'), Decimal('49.00')]

final result: median_of_median_hba1cs=Decimal('47.50')

@cillian-rcpch
Copy link

Hi @anchit-chandran, the median of a list with an even number of elements is dealt with by number 2 (mean of middle).

@anchit-chandran
Copy link
Contributor Author

Thanks @cillian-rcpch. also to confirm, we have kpi 45 defined in docs as

## 45. Median HbA1c

### Calculation

**Numerator**: Median of HbA1c measurements (item 17) within the audit period, excluding measurements taken within 90 days of diagnosis

**Denominator**: Total number of eligible patients (measure 1)

**Notes**: The median for each patient is calculated. We then calculate the median of the medians. Only `eligible` and `ineligible` patient querysets are valid; others should be discarded.

**Data Items**: 17

My query filters for valid hba1c_dates as follows (both are within the given audit period):

hba1c_date is 90 days AFTER diagnosis_date = INVALID so excluded
hba1c_date is 91 days AFTER diagnosis_date = VALID so included

The test also asserts the same so want to confirm

@cillian-rcpch
Copy link

Yep that all looks good to me @anchit-chandran !

@mbarton
Copy link
Member

mbarton commented Nov 8, 2024

Fixed in #352

@mbarton mbarton closed this as completed Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants