Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added TokenList::Stream class to wrap std::istream usage and implemented alternative C I/O version #244

Merged
merged 21 commits into from
Mar 2, 2023

Conversation

firewave
Copy link
Collaborator

@firewave firewave commented Mar 7, 2022

Reading the file via std::istream includes a considerable overhead caused by std::istream::sentry and others. Using C I/O instead reduces the "total Ir" usage by about 10%.

Testing with -q -Ilib/ -D__GNUC__ lib/valueflow.cpp:

Clang 13 271,013,439 -> 269,398,283 -> 244,353,123
GCC 11 -> -> ``

The intermediate value shows the improvement if we only change the actual input file. The last value if we also do it for all the includes which does not require you to use the new interface function. So we even if you do not change the application which uses this it would still result in a sizable improvement.

@firewave
Copy link
Collaborator Author

firewave commented Mar 7, 2022

Still needs the unit tests to be adjusted to test both implementations as well as making it properly selectable in the standalone binary.

This can possibly be introduced further by adding buffering to FileStream.

We could also add a MemStream version which works on a memory buffer allowing for the use case of just a passing a std::istringstream (i.e. string) and deprecating/removing the old API with std::istream.

@firewave
Copy link
Collaborator Author

firewave commented Apr 19, 2022

Somehow using fgetc() seems to already interpret the characters causing 0x00 to be omitted when read even though the file is open in "binary" mode. Using fread() (which is supposed to be faster) fixes this but it is much slower.

Turns out it's just a bug in my code with unget() and multi-byte characters.

@firewave
Copy link
Collaborator Author

firewave commented Oct 27, 2022

I backed out the test changes and will merge them into #261 so they don't get lost.

@firewave
Copy link
Collaborator Author

firewave commented Oct 27, 2022

Testing with -q -Ilib/ -D__GNUC__ lib/valueflow.cpp:

current -> std::istream input -> FILE* input
Clang 14 263,400,261 -> 255,772,481 -> 241,915,419
GCC 12 266,098,949 -> 261,699,461 -> 247,682,262

Testing with -q -Ilib/ -D__GNUC__ lib/tokenize.cpp:

current -> std::istream input -> FILE* input
Clang 14 300,502,647 -> 294,632,802 -> 275,871,698
GCC 12 304,202,739 -> 301,878,425 -> 282,870,756

@danmar
Copy link
Owner

danmar commented Dec 18, 2022

reducing the Ir count by using C I/O sounds good to me.

@firewave
Copy link
Collaborator Author

I put this into Cppcheck with the current and added TokenList construction and all tests passed on Windows and Linux. So this is finally ready for review.

@firewave firewave marked this pull request as ready for review February 25, 2023 16:23
@danmar danmar merged commit 00d1f67 into danmar:master Mar 2, 2023
@firewave firewave deleted the stream branch March 2, 2023 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants