Single-node Python unit tests fail #52

Closed
edmondop opened this issue Dec 12, 2024 · 1 comment

Comments

@edmondop
Contributor

After enabling the Python unit tests on my fork here, this simple test fails:

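# Import paths are assumed; the original snippet omitted them.
from datafusion import SessionContext
from datafusion_ray import DatafusionRayContext


def test_basic_query_succeed():
    df_ctx = SessionContext()
    ctx = DatafusionRayContext(df_ctx)
    # Register the example CSV as table "tips", then query it through Ray.
    df_ctx.register_csv("tips", "examples/tips.csv", has_header=True)
    record_batch = ctx.sql("SELECT * FROM tips")
    assert record_batch.num_rows == 244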

As this run shows:
https://github.com/edmondop/datafusion-ray/actions/runs/12268595231/job/34230694956

the failure is (execute_query_stage pid=8895) index out of bounds: the len is 0 but the index is 0

The problem comes from this method in the Query Stage code:

    pub fn get_input_partition_count(&self) -> usize {
        self.plan.children()[0]
            .properties()
            .output_partitioning()
            .partition_count()
    }

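This panics because the CsvExec is a leaf operator with no children, so children() returns an empty list and indexing it at [0] is out of bounds. The plan for the failing stage consists of the CsvExec alone: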

Query stage #0:
CsvExec: file_groups={1 group: [[home/runner/work/datafusion-ray/datafusion-ray/examples/tips.csv]]}, projection=[total_bill, tip, sex, smoker, day, time, size], has_header=true
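
One way to avoid the panic described above is to stop indexing the first child unconditionally. The following is a minimal sketch, not the change made in the project: PlanNode is a hypothetical stand-in for a physical plan node (it is not the DataFusion API), and falling back to the node's own partition count for a leaf is an assumption about the desired behavior.

```rust
// Hypothetical stand-in for a physical plan node; NOT the DataFusion API.
struct PlanNode {
    output_partitions: usize,
    children: Vec<PlanNode>,
}

impl PlanNode {
    fn children(&self) -> &[PlanNode] {
        &self.children
    }
}

/// Returns the partition count of the first child, or `None` when the
/// plan is a leaf (for example a CSV scan) and therefore has no children.
fn input_partition_count(plan: &PlanNode) -> Option<usize> {
    // `first()` returns `None` on an empty slice instead of panicking
    // the way `children()[0]` does.
    plan.children().first().map(|child| child.output_partitions)
}

/// Assumed fallback: a leaf reports its own output partition count.
fn input_partition_count_or_own(plan: &PlanNode) -> usize {
    input_partition_count(plan).unwrap_or(plan.output_partitions)
}
```

With this shape, a stage whose plan is a single leaf node returns a value instead of aborting the worker, and the caller decides what an absent input means.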

It might be related to the fact that the unit tests run against a local Ray instance.

@andygrove
Member

Fixed in #53
