wchar_t / unicode support broken? - initscr breaking putwc #53
Comments
Small update: In the meantime i have rebuilt my application against ncursesw which resolved the issue with no code changes whatsoever besides including the appropriate header files. This is somewhat unfortunate as i would rather use netbsd-curses but given that even after extensive debugging i have failed to track down the cause of putwc failing it seems that is the best i can do for now. |
hmm, interesting, i didnt get a github notification when you opened this issue 7 days ago. |
Sadly building against 0.3.1 does not seem to make a difference. Non 7bit characters still show up as inverse question marks. What might be somewhat interesting though is that adding borders generally works. I guess the characters used to render the border aren't 7bit either but then rendering those is done using "alternative charset mode" (i am not sure how that works). I've tried patching one of the codepoints that seemed to be assigned as the "alternate" representation of a 7bit character but the change was ignored, so being able to render those might not mean much as it seems the association is hardcoded one way or another. Edit: I have just tried all versions down to 0.0.4, which differ somewhat in rendering but all exhibit the same behavior regarding non 7bit wchars. |
thanks for testing. could you come up with a minimal example program exposing the behaviour ? |
I've tried but it proved difficult. My application does not do a whole lot before initializing curses but even after stripping basically everything it still seems to do "something" that results in a different scenario than just calling putwc, initscr and then putwc again. The most similar behavior i managed to trigger was putwc failing both before and after initscr.
Trying to replicate the original scenario i noticed a lot of weird behavior from putwc for no obvious reason though, so i decided to dig into glibc to see what putwc was actually doing and that's when things started making a lot more sense: glibc handles wide characters pretty much separately from normal output, down to using different buffers based on which kind of character is being written. Intermixing both kinds of writes seems like a pretty bad idea with this logic and from what i saw in my testing i am almost certain doing so causes errors in regards to what is actually written to the screen (missing output - likely due to reading from the wrong internal buffer - and similar weird behavior).
There is also another thing which is probably the main cause of what i was seeing earlier. Glibc keeps kind of a "wide character state" on the FILE* handle (mostly stdout in this case). It's not obvious (especially in regards to stdout/stderr) where this is actually decided (it's possible to force it on open - i highly doubt this is done for the default descriptors though) but it seems possible for glibc to set this on the fly when handling functions that write to the FILE* handle. Now this state contains a reference to a conversion function that deals with turning the wchar_t data into something the terminal understands. This state is set ONCE and never touched again, so if there is any output before a usable LC_CTYPE (which is queried it seems) is set up, or it changes later on, the conversion function reference on the file handle goes out of sync... 
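The "state set once by the first write" described above corresponds to what the C standard calls stream orientation, which can be queried with fwide(stream, 0). A minimal sketch, assuming a scratch stream from tmpfile() instead of stdout; the helper name orientation_after_first_write is made up for illustration and nothing here is taken from glibc or netbsd-curses internals:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: open a fresh stream, perform exactly one write,
 * and report the orientation the stream ended up with:
 * -1 = byte-oriented, 1 = wide-oriented, 0 = still undecided. */
int orientation_after_first_write(int write_wide_first)
{
    FILE *f = tmpfile();              /* fresh stream, no orientation yet */
    assert(f != NULL);

    if (write_wide_first)
        fputwc(L'a', f);              /* wide function: stream becomes wide-oriented */
    else
        fputc('a', f);                /* byte function: stream becomes byte-oriented */

    int o = fwide(f, 0);              /* mode 0 only queries, it never changes anything */
    fclose(f);
    return (o > 0) - (o < 0);         /* normalize to -1 / 0 / 1 */
}
```

Calling orientation_after_first_write(0) and orientation_after_first_write(1) from a small main shows the two outcomes. Once a stream is byte-oriented, wide functions such as putwc are not supposed to be used on it, which would fit the theory that byte writes during initscr decide the fate of stdout.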
Now my theory is that my original code somehow triggers glibc to set the wide character conversion function to something that goes out of sync during initscr. Either that or the tputs/ti_puts calls using non wide writes simply confuse glibc's internal state. I am somewhat short on time right now but i'll do a bit more testing to see if a call to freopen (which should reset the wide character state) has any effect. It also seems that fprintf( stdout, "%lc", wchar ) has a slightly more intelligent logic when it comes to intermixing character types. So if the freopen approach fails i'll probably try to replace putwc with fprintf, which would admittedly be kinda ugly, but so, it seems, is glibc's approach to handling wide character output. |
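The freopen idea can be checked in isolation: per the C standard a successful freopen removes any orientation from the stream. A small sketch under that assumption; the helper name orientation_after_freopen and the scratch file path passed to it are hypothetical, and this says nothing about how stdout behaves once curses is driving the terminal:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: make a stream byte-oriented, reopen it, and
 * return the orientation reported afterwards (0 means "undecided").
 * The orientation seen before the reopen is stored in *before.
 * `path` is a scratch file that is created and removed again. */
int orientation_after_freopen(const char *path, int *before)
{
    FILE *f = fopen(path, "w");
    assert(f != NULL);

    fputc('x', f);                    /* first write is a byte write */
    *before = fwide(f, 0);            /* negative: byte-oriented */

    f = freopen(path, "w", f);        /* a successful freopen clears the orientation */
    assert(f != NULL);
    int after = fwide(f, 0);

    fclose(f);
    remove(path);
    return after;
}
```

With a writable scratch path, *before comes back negative and the return value is 0, so a wide write after the reopen would be free to make the stream wide-oriented again.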
I've had a little more time than i initially thought so i investigated a bit. Here is some code demonstrating glibc's confusion when mixing putwc with putc calls:
Interestingly from reading glibc source it seems neither putc nor putwc itself sets the stream orientation (which is usually done using an implicit call to fwide) but obviously something down the line does. Calling freopen after initscr to reset the internal state actually did make a difference but output is still broken (ignoring line breaks). To be honest i am pretty baffled by glibc's behavior. I've tried patching putchar.c:92 to use status = fprintf(outfd, "%lc", wch); instead of status = putwc(wch, outfd); which does fix the weirdness, but not really because it's more intelligent or even "doing the right thing". What fprintf does for wide characters is call wcrtomb to convert to multibyte no matter what, which does work in my case because of the UTF-8 locale but would likely fail for anything else. |
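To illustrate the wcrtomb route mentioned above: a wide character can be converted to its multibyte form first and then written with byte-oriented stdio only, so the stream never sees a wide function. This is a sketch of that idea, not the libcurses code and not the patch linked later in this thread; put_wide_as_bytes is a made-up name:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: write one wide character using byte stdio only.
 * Returns the number of bytes written, or -1 if the character cannot be
 * represented in the current LC_CTYPE locale or the write fails. */
int put_wide_as_bytes(wchar_t wc, FILE *out)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;

    assert(out != NULL);
    memset(&st, 0, sizeof st);        /* start from the initial conversion state */

    size_t n = wcrtomb(buf, wc, &st); /* wide char -> multibyte sequence */
    if (n == (size_t)-1)
        return -1;                    /* not encodable in this locale */

    if (fwrite(buf, 1, n, out) != n)  /* byte write: stream stays byte-oriented */
        return -1;
    return (int)n;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, put_wide_as_bytes(0xF6, stdout) would be expected to emit the two-byte encoding of "ö", assuming wchar_t holds Unicode code points as it does on glibc. The conversion still depends on LC_CTYPE, which matches the caveat above about non UTF-8 locales.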
ouch, sorry that i didn't realize it before, but you cannot safely mix stdio output with curses output, because both systems use internal buffering and bookkeeping of positions and such, and are unaware of each other. a curses application shouldnt output anything before calling initscr(), and after that only use curses routines for output. |
No need to be sorry. It's not really related to stdio anyways. You can take for example test10 and remove the fprintf to stderr (which was actually chosen to avoid messing with stdouts internal state) and the result is still the same:
It's not even about mixing waddstr and waddwstr. The following version just calling waddwstr does the same:
I am pretty sure it's related to glibc's internal stream state in regard to wide character handling associated with stdout being chosen by the first write occurring to it which is likely some putc call invoked by tputs/ti_puts during initscr (like i've said initially, hacking libcurses to turn tputs/ti_puts into noops keeps wide character output functional). At least i don't have a better explanation for what i am seeing. It's not like i've read every single line of libcurses / glibc and fully comprehend every little detail (especially in regards to libcurses - the code is pretty crazy at times!). |
test10 prints |
Yes, that would be exactly what i'd expect to show up on the screen. What i am getting is an |
By the way there is no need to rush this issue. I've added a switch to my build scripts for now that selects between netbsd-curses and ncurses (if i end up with a bit of free time i might also take a look to see if i can figure out what ncurses does differently in regards to wide character handling) so i can easily switch back if a solution is found and it doesn't bother me much right now. I am generally quite grateful you are taking the time to look into this. |
It looks like netbsd curses is internally mixing byte and wide stdio calls, which produces undefined behavior. See at least: netbsd-curses/libcurses/putchar.c Lines 62 to 92 in ae69600
I think you want to use narrow stdio functions entirely regardless of the type of character being printed. See if replacing the |
Well put. I pretty much agree with you. Regarding |
@michaelforney has produced this patch: http://ix.io/49Ui - you might want to try it out. |
Nice, works for me! I've built the test10 code against libcurses patched with your link and it printed |
thanks for confirmation. i'll close the issue once the patch is merged - which can take some time as michael wants to upstream it to netbsd proper so i have to backport the changes since last year from there. |
Alright. Seems sensible. |
I see. The proposed solution is probably fine then. Sorry, i didn't realize until now that you are the author of musl and likely know a thing or two about what standard functions do ;) |
Hi,
i've spent quite a bit of time trying to debug this issue and at this point i feel confident to say it might be a bug. If putwc is called with a valid codepoint (i've used 0xF6 for testing which should render as "ö") before initscr everything works as expected but anywhere after initscr non 7bit codepoints render as unknown characters (inverse color question marks) and as putwc is used by all functions printing wchar_t strings (for example waddwstr) this basically breaks unicode support. Locale settings (setlocale) don't seem to affect this at all (en_US.UTF-8, C or an invalid setting all result in the same behavior).
Trying to track down the cause i've come to the conclusion that it seems to be related to tputs/ti_puts calls (adding macros to curses_private.h turning those into noops keeps putwc functional but obviously otherwise breaks rendering badly). Sadly i wasn't able to pin this to a certain call as it seems multiple of them (or even all of them?) result in putwc breaking. Any kind of hint or pointer on how to get this working would be greatly appreciated.
Testing was done on Devuan Ascii (a linux distribution pretty much identical to Debian 9.0 sans systemd). Curses version used is the 0.3.2 release. I've also tried a handful of different terminal definitions which made no difference regarding putwc behavior. The terminal used for most testing is sakura, which is libvte based and identifies as xterm-256color; Xterm (the application, not just the definition) gave the same results though.