`test_kpi_calculation_45` (median + mean) calcs sporadically fail #338

anchit-chandran · 2024-10-29T12:58:13Z

test_kpi_calculation_45 sporadically fails. My guess is that it's because of some floating point arithmetic but I'm not too sure. Check to see if there's a difference in calculation between how the test calculates the median, vs how the SQL query calculates median.

The relevant code exists in:

test file: test_kpis_44_49.py

test_kpi_calculation_45 is the test name
the test uses calculate_median fn, defined on line 665
kpi calculation: kpis.py
calculation method calculate_kpi_45_median_hba1c is defined on line 3098
there's no native Median fn for django's orm, so defined my own at the bottom of file on line 3570: class Median(Func):

I've noticed in a failed test output, the actual value (coming from the kpi calculation method) is a Decimal('46.75'), vs we're expecting a value of 47.0. It might just be that Decimal is more accurate and we should update the test to expecting Decimal, but confirm this is the case as I'm not sure. Conversely, it might be the case we need to cast the Decimal to a native python value as it's going to be passed in the view to the frontend:

expected = KPIResult(total_eligible=2, total_ineligible=2, total_passed=47.0, total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)
actual = KPIResult(total_eligible=2, total_ineligible=2, total_passed=Decimal('46.75'), total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)

Cmd to test repeatedly until fail:

while pytest project/npda/tests/kpi_calculations/test_kpis_44_49.py::test_kpi_calculation_45 -s; do :; done

Example Failed test output


expected = KPIResult(total_eligible=2, total_ineligible=2, total_passed=47.0, total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)
actual = KPIResult(total_eligible=2, total_ineligible=2, total_passed=Decimal('46.75'), total_failed=-1, kpi_label='KPI Name not found', patient_querysets=None)

    def assert_kpi_result_equal(
        expected: KPIResult,
        actual: KPIResult,
    ) -> None:
        """
        Asserts that two KPIResult objects are equal by comparing their fields and provides
        a detailed error message if they are not.
    
        :param expected: The expected KPIResult object.
        :param actual: The actual KPIResult object.
        :raises AssertionError: If the fields in the KPIResult objects differ.
        """
        if isinstance(expected, KPIResult) is False:
            raise TypeError(
                f"expected must be of type KPIResult (current: {type(expected)}"
            )
        if isinstance(actual, KPIResult) is False:
            raise TypeError(f"actual must be of type KPIResult (current: {type(actual)}")
    
        mismatches = []
    
        if expected.total_eligible != actual.total_eligible:
            mismatches.append(
                f"total_eligible: expected {expected.total_eligible}, got {actual.total_eligible}"
            )
    
        if expected.total_passed != actual.total_passed:
            mismatches.append(
                f"total_passed: expected {expected.total_passed}, got {actual.total_passed}"
            )
    
        if expected.total_ineligible != actual.total_ineligible:
            mismatches.append(
                f"total_ineligible: expected {expected.total_ineligible}, got {actual.total_ineligible}"
            )
    
        if expected.total_failed != actual.total_failed:
            mismatches.append(
                f"total_failed: expected {expected.total_failed}, got {actual.total_failed}"
            )
    
        # Queryset checks
        if expected.patient_querysets is not None:
            # If actual.patient_querysets is None, we can't compare the querysets
            if actual.patient_querysets is None:
                mismatches.append(
                    f"patient_querysets: expected {expected.patient_querysets}, got None"
                )
            else:
                # For each pt queryset in expected, check if the actual queryset is
                # the same
                for key, expected_queryset in expected.patient_querysets.items():
    
                    actual_queryset = actual.patient_querysets.get(key)
    
                    # Convert to list and order by id to compare
                    expected_queryset = list(expected_queryset.order_by("id"))
                    actual_queryset = list(actual_queryset.order_by("id"))
    
                    if expected_queryset != actual_queryset:
                        mismatches.append(
                            f"patient_querysets[{key}]:"
                            f"\nexpected_queryset\n\t{expected_queryset}"
                            f"\nactual_queryset\n\t{actual_queryset}\n"
                        )
    
        if mismatches:
            mismatch_details = "\n".join(mismatches)
>           raise AssertionError(f"KPIResult mismatch:\n{mismatch_details}")
E           AssertionError: KPIResult mismatch:
E           total_passed: expected 47.0, got 46.75

project/npda/tests/kpi_calculations/test_calculate_kpis.py:88: AssertionError
-------------------------------------------------------------------------------------------------------------------------- Captured stderr setup --------------------------------------------------------------------------------------------------------------------------
Using existing test database for alias 'default'...
========================================================================================================================= short test summary info =========================================================================================================================
FAILED project/npda/tests/kpi_calculations/test_kpis_44_49.py::test_kpi_calculation_45 - AssertionError: KPIResult mismatch:

The text was updated successfully, but these errors were encountered:

anchit-chandran · 2024-11-04T09:52:38Z

@dc2007git @reecehill are either of you able to pick this up?

anchit-chandran · 2024-11-06T07:43:57Z

@cillian-rcpch ignore the rest of the above, narrowed it down to how the median is calculated. In fixing this, was wondering how you guys calculate median for an even number of elements:

Eg n=4

choose one of the middle elements (elements 2 or 3)
mean of middle 2 ([element 2 + element 3] / 2)

anchit-chandran · 2024-11-06T10:01:31Z

(for my own debugging record)

("v={'visit_date': None, 'hba1c_date': None, 'patient__diagnosis_date': "
 "datetime.date(2002, 8, 19), 'hba1c': None, 'patient__postcode': "
 "'ineligible_patient_too_old'}\n")
("v={'visit_date': datetime.date(2024, 10, 1), 'hba1c_date': "
 "datetime.date(2024, 7, 2), 'patient__diagnosis_date': datetime.date(2023, 7, "
 "23), 'hba1c': Decimal('49.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 10, 1), 'hba1c_date': "
 "datetime.date(2024, 1, 8), 'patient__diagnosis_date': datetime.date(2023, "
 "11, 28), 'hba1c': Decimal('47.00'), 'patient__postcode': "
 "'passing_pt_1_median_46'}\n")
("v={'visit_date': datetime.date(2024, 7, 1), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, 7, "
 "23), 'hba1c': Decimal('48.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 7, 1), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, "
 "11, 28), 'hba1c': Decimal('46.00'), 'patient__postcode': "
 "'passing_pt_1_median_46'}\n")
("v={'visit_date': datetime.date(2024, 4, 3), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, "
 "11, 28), 'hba1c': Decimal('45.00'), 'patient__postcode': "
 "'passing_pt_1_median_46'}\n")
("v={'visit_date': datetime.date(2024, 4, 3), 'hba1c_date': "
 "datetime.date(2024, 4, 2), 'patient__diagnosis_date': datetime.date(2023, 7, "
 "23), 'hba1c': Decimal('47.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 3, 31), 'hba1c_date': "
 "datetime.date(2024, 3, 31), 'patient__diagnosis_date': datetime.date(2023, "
 "7, 23), 'hba1c': Decimal('1.00'), 'patient__postcode': "
 "'passing_pt_2_median_48'}\n")
("v={'visit_date': datetime.date(2024, 3, 22), 'hba1c_date': None, "
 "'patient__diagnosis_date': datetime.date(2016, 6, 27), 'hba1c': None, "
 "'patient__postcode': 'ineligible_patient_visit_date'}\n")
passing_pt_2_median_48: hba1c_values=[Decimal('49.00'), Decimal('48.00'), Decimal('47.00')]
passing_pt_1_median_46: hba1c_values=[Decimal('46.00'), Decimal('45.00')]

median_hba1cs=[Decimal('48.00'), Decimal('45.50')]

final result: median_of_median_hba1cs=Decimal('46.75')

{'hba1c': None,
 'hba1c_date': None,
 'patient__diagnosis_date': datetime.date(2008, 8, 1),
 'patient__postcode': 'ineligible_patient_too_old',
 'visit_date': None}

{'hba1c': Decimal('49.00'),
 'hba1c_date': datetime.date(2024, 7, 2),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 10, 1)}

{'hba1c': Decimal('47.00'),
 'hba1c_date': datetime.date(2024, 1, 8),
 'patient__diagnosis_date': datetime.date(2019, 7, 1),
 'patient__postcode': 'passing_pt_1_median_46',
 'visit_date': datetime.date(2024, 10, 1)}

{'hba1c': Decimal('48.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 7, 1)}

{'hba1c': Decimal('46.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2019, 7, 1),
 'patient__postcode': 'passing_pt_1_median_46',
 'visit_date': datetime.date(2024, 7, 1)}

{'hba1c': Decimal('45.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2019, 7, 1),
 'patient__postcode': 'passing_pt_1_median_46',
 'visit_date': datetime.date(2024, 4, 3)}

{'hba1c': Decimal('47.00'),
 'hba1c_date': datetime.date(2024, 4, 2),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 4, 3)}

{'hba1c': Decimal('1.00'),
 'hba1c_date': datetime.date(2024, 3, 31),
 'patient__diagnosis_date': datetime.date(2024, 3, 22),
 'patient__postcode': 'passing_pt_2_median_48',
 'visit_date': datetime.date(2024, 3, 31)}

{'hba1c': None,
 'hba1c_date': None,
 'patient__diagnosis_date': datetime.date(2012, 12, 4),
 'patient__postcode': 'ineligible_patient_visit_date',
 'visit_date': datetime.date(2024, 3, 22)}

passing_pt_1_median_46: hba1c_values=[Decimal('47.00'), Decimal('46.00'), Decimal('45.00')]
passing_pt_2_median_48: hba1c_values=[Decimal('49.00')]

median_hba1cs=[Decimal('46.00'), Decimal('49.00')]

final result: median_of_median_hba1cs=Decimal('47.50')

cillian-rcpch · 2024-11-06T12:06:32Z

Hi @anchit-chandran, the median of a list with an even number of elements is dealt with by number 2 (mean of middle).

anchit-chandran · 2024-11-07T09:23:32Z

Thanks @cillian-rcpch. also to confirm, we have kpi 45 defined in docs as

## 45. Median HbA1c

### Calculation

**Numerator**: Median of HbA1c measurements (item 17) within the audit period, excluding measurements taken within 90 days of diagnosis

**Denominator**: Total number of eligible patients (measure 1)

**Notes**: The median for each patient is calculated. We then calculate the median of the medians. Only `eligible` and `ineligible` patient querysets are valid; others should be discarded.

**Data Items**: 17

My query filters for valid hba1c_dates as follows (both are within the given audit period):

hba1c_date is 90 days AFTER diagnosis_date = INVALID so excluded
hba1c_date is 91 days AFTER diagnosis_date = VALID so included

The test also asserts the same so want to confirm

cillian-rcpch · 2024-11-08T08:15:45Z

Yep that all looks good to me @anchit-chandran !

mbarton · 2024-11-08T11:41:26Z

Fixed in #352

anchit-chandran self-assigned this Oct 29, 2024

anchit-chandran added the help wanted Extra attention is needed label Nov 4, 2024

anchit-chandran mentioned this issue Nov 7, 2024

[KPIS] Fix test for KPI45 sporadically failing #352

Merged

mbarton closed this as completed Nov 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`test_kpi_calculation_45` (median + mean) calcs sporadically fail #338

`test_kpi_calculation_45` (median + mean) calcs sporadically fail #338

anchit-chandran commented Oct 29, 2024 •

edited

Loading

anchit-chandran commented Nov 4, 2024

anchit-chandran commented Nov 6, 2024

anchit-chandran commented Nov 6, 2024

cillian-rcpch commented Nov 6, 2024

anchit-chandran commented Nov 7, 2024

cillian-rcpch commented Nov 8, 2024

mbarton commented Nov 8, 2024

test_kpi_calculation_45 (median + mean) calcs sporadically fail #338

test_kpi_calculation_45 (median + mean) calcs sporadically fail #338

Comments

anchit-chandran commented Oct 29, 2024 • edited Loading

anchit-chandran commented Nov 4, 2024

anchit-chandran commented Nov 6, 2024

anchit-chandran commented Nov 6, 2024

cillian-rcpch commented Nov 6, 2024

anchit-chandran commented Nov 7, 2024

cillian-rcpch commented Nov 8, 2024

mbarton commented Nov 8, 2024

`test_kpi_calculation_45` (median + mean) calcs sporadically fail #338

`test_kpi_calculation_45` (median + mean) calcs sporadically fail #338

anchit-chandran commented Oct 29, 2024 •

edited

Loading