
[FEA] Improve GpuJsonToStructs performance #11560

Open
ttnghia opened this issue Oct 4, 2024 · 2 comments
Assignees
Labels
cudf_dependency: An issue or PR with this label depends on a new feature in cudf
epic: Issue that encompasses a significant feature or body of work
feature request: New feature or request
improve
performance: A performance related task/issue

Comments

@ttnghia
Collaborator

ttnghia commented Oct 4, 2024

The performance of our current GpuJsonToStructs is poor. Profiling it looks like this:

[Image: profiling timeline of GpuJsonToStructs]

In the particular test case profiled above, the only useful work is what runs up to the end of the read_json range (just over 300ms), which is less than 50% of the entire GpuJsonToStructs projection (>800ms). The rest is pure overhead, consisting mostly of hundreds of small kernel calls and stream synchronizations spent on copying data from the intermediate result to the final output.

We can do much better by eliminating the unnecessary overhead, or by restructuring the remaining work so it runs in much less time. If we divide the runtime of GpuJsonToStructs into sections:
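To illustrate why the overhead is so large, here is a minimal plain-Java sketch (hypothetical names, not the actual cudf/JNI API) of the difference between synchronizing the stream after every small copy and queueing all copies before a single synchronize. The per-copy pattern is what produces the hundreds of small kernel calls and stream syncs seen in the profile:

```java
// Illustrative sketch only: asyncCopy() and streamSync() are stand-ins
// for a small device-to-device copy kernel and a stream synchronize.
public class SyncOverheadSketch {
    static int launches = 0;
    static int syncs = 0;

    static void asyncCopy() { launches++; }  // stands in for one small copy kernel
    static void streamSync() { syncs++; }    // stands in for one stream synchronize

    // Naive pattern: synchronize after every per-column copy.
    static void naive(int numColumns) {
        for (int i = 0; i < numColumns; i++) {
            asyncCopy();
            streamSync();
        }
    }

    // Batched pattern: queue all copies on the stream, synchronize once.
    static void batched(int numColumns) {
        for (int i = 0; i < numColumns; i++) {
            asyncCopy();
        }
        streamSync();
    }

    public static void main(String[] args) {
        naive(100);
        int naiveSyncs = syncs;
        launches = 0;
        syncs = 0;
        batched(100);
        System.out.println(naiveSyncs + " syncs vs " + syncs); // 100 syncs vs 1
    }
}
```

The batched pattern issues the same number of copies but pays the synchronization cost once instead of per column, which is the general shape of the improvements discussed below.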

[Image: GpuJsonToStructs runtime divided into sections]

The improvement can be done by the following tasks:

@ttnghia ttnghia added the cudf_dependency, epic, feature request, improve, and performance labels Oct 4, 2024
@ttnghia ttnghia self-assigned this Oct 4, 2024
@karthikeyann

karthikeyann commented Oct 7, 2024

After discussion with @ttnghia, here are the improvements planned for the different sections:

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Oct 29, 2024
In Spark, the `DecimalType` has a specific number of digits (precision) to represent its numbers. However, when creating a data Schema, only the type and name of each column are stored, so that precision information is lost. As a result, it would be difficult to reconstruct the original decimal types from cudf's `Schema` instance.

This PR adds a `precision` member variable to the `Schema` class in cudf Java, allowing it to store the precision of the original decimal column.

Partially contributes to NVIDIA/spark-rapids#11560.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #17176
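The idea in the change above can be sketched in plain Java. This is an illustration only, not the actual cudf `Schema` API: a schema column records a decimal precision alongside its type and name, so the original Spark `DecimalType` can be reconstructed later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a schema that keeps per-column decimal precision.
public class SchemaSketch {
    static final class Column {
        final String name;
        final String type;
        final int precision; // -1 when the column is not a decimal

        Column(String name, String type, int precision) {
            this.name = name;
            this.type = type;
            this.precision = precision;
        }
    }

    final List<Column> columns = new ArrayList<>();

    SchemaSketch addColumn(String name, String type) {
        columns.add(new Column(name, type, -1));
        return this;
    }

    // Decimal columns keep their precision so the original
    // Spark DecimalType can be reconstructed from the schema.
    SchemaSketch addDecimalColumn(String name, int precision) {
        columns.add(new Column(name, "DECIMAL", precision));
        return this;
    }

    public static void main(String[] args) {
        SchemaSketch s = new SchemaSketch()
            .addColumn("id", "INT64")
            .addDecimalColumn("price", 10);
        System.out.println(s.columns.get(1).precision); // 10
    }
}
```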
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Nov 8, 2024
…read_json` directly (#17180)

With this PR, `Table.readJSON` will return the output from libcudf `read_json` directly, without needing to reorder the columns to match the input schema or to generate all-null columns for schema columns that do not exist in the JSON data. libcudf `read_json` already does both, so we no longer have to.

Depends on:
 * #17029

Partially contributes to NVIDIA/spark-rapids#11560.
Closes #17002

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #17180
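For context, here is a plain-Java sketch (hypothetical names, not the real cudf code) of the post-processing step the PR above removes: reordering the columns returned by the reader to match the requested schema, and inserting an all-null placeholder for any schema column missing from the JSON data. Since libcudf `read_json` now handles this itself, the Java layer can skip it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: columns are represented as strings for simplicity.
public class ReorderSketch {
    // Reorder the reader's output to the schema order; columns absent
    // from the JSON data become all-null placeholders.
    static List<String> reorderToSchema(Map<String, String> readColumns,
                                        List<String> schemaNames) {
        List<String> out = new ArrayList<>();
        for (String name : schemaNames) {
            out.add(readColumns.getOrDefault(name, "ALL_NULLS"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Reader returned columns b and a, in that order; schema wants a, b, c.
        Map<String, String> read = new LinkedHashMap<>();
        read.put("b", "col_b");
        read.put("a", "col_a");
        List<String> result = reorderToSchema(read, Arrays.asList("a", "b", "c"));
        System.out.println(result); // [col_a, col_b, ALL_NULLS]
    }
}
```

Dropping this step removes an extra pass over every column on the Java side, which is where many of the small copies and syncs came from.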
@ttnghia
Collaborator Author

ttnghia commented Nov 14, 2024

With Sections 1/2/3 all addressed, the final profiling shows the expected output:

[Image: final profiling timeline]
