Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] std::string support for json library #72

Open
petke opened this issue Feb 13, 2016 · 20 comments
Open

[feature request] std::string support for json library #72

petke opened this issue Feb 13, 2016 · 20 comments

Comments

@petke
Copy link

petke commented Feb 13, 2016

In my experience most 3rd party libraries use std::string and not std::wstring in their interface. Therefore applications become tied to std::string. Because the ppl/json library uses std::wstring (on windows) this makes integrating the ppl/json library into an applications cumbersome and inefficient. One has to do lots of unnecessary wstring-to-string and string-to-wstring conversions. Could you add support for std::string, at least for the json library?

Minimal example:

    #include <cpprest/json.h>

    std::string SerializeToJson(std::string name, int id) //utf8 coming in and goin out
    {
        utility::stringstream_t stream;
        web::json::value js_obj;
        js_obj[L"id"] = web::json::value::number(id);
        js_obj[L"name"] = web::json::value::string(utility::conversions::utf8_to_utf16(name)); //unneccessary conversion 1

        js_obj.serialize(stream);
        std::wstring wret = stream.str();
        std::string ret = utility::conversions::utf16_to_utf8(wret); //unneccessary conversion 2

        return ret;
    }

    int main() {

        auto r = SerializeToJson("test", 123);
        std::cout << r << std::endl;
    }
@jameskennedy2265
Copy link

I would love to see this as a feature as well. Also huge thanks to petke for showing how to get a string variable into a json container.

While you may not be able to update this as a feature, can anyone update the documentation with this in json, it would be a huge help to a lot of people including myself.

EDIT:

For help to anyone else trying to convert from wstring to string without using the conversion utilities.

std::string newstring( oldwstring.begin(), oldwstring.end() ); //using this to convert from wstring to string, then std::copy(newstring.begin(), newstring.end(), newchar); // this is for the conversion of string to char newchar[newstring.size()] = '\0'; // gets the size exact for newchar

@ras0219-msft
Copy link
Contributor

I definitely agree that the wstring/string/string_t issues cause a ton of headache when using the library.

Let me lay out the current issues I see that need to be solve here.

  • On Windows, the underlying system APIs are utf16. We want to give applications that use utf16 everywhere the option to avoid conversions (the current implementation handles this).
  • Virtually all web payloads are utf8. Applications should be able to consume these payloads (including JSON) without incurring a utf8 -> utf16 hit (current implementation does not allow this on windows unless you grab the data as a vector<unsigned char> and parse it yourself using RapidJSON/other).
  • Unfortunately, the default standpoint on windows is that const char*/std::string is encoded according to the local codepage. For ASCII English this distinction of course doesn't matter, but we all need more ☃ in our lives. If possible, protection against encoding errors would be nice.
  • Don't break API-compat in minor versions. If the solution requires breaking source-API compatibility, we wouldn't want to incorporate it until 3.0. I'd much prefer an incremental solution that we can release sooner!
  • std::wstring is not ubiquitous. There is cognitive and technical cost to handling two string types (even with the string_t approach). Also, most text is ASCII English (especially HTML), so there's definitely a memory/performance cost for utf16 vs utf8.

There are two obvious options that I'll throw out to start with:

  1. Add overloads that silently perform the conversion for you. This doesn't fix any performance issues and could cause codepage-vs-utf8 pain, but it does simply and directly address the API UX problem.
  2. Fork the entire JSON object model for utf16 vs utf8 (something like json::u16value and json::u8value). This is more work to implement, but gives a more thorough solution to the above problems. For compatibility, json::value would alias to the "platform preferred" type as it currently does.

What do you think?

@petke
Copy link
Author

petke commented Feb 25, 2016

in my opinion the cleanest solution would be for core json types (web::json::value) etc to be templated on the string type so it can use either std::string and std::wstring. That way the user gets to choose which to use, but the implementation logic for both stays (much) the same and does not have to be forked.

Kindof how in the c++ standard library itself, the std::basic_string is templated on the character type so that it can use either char or wchar. Something similar should be possible here.

@kreuzerkrieg
Copy link
Contributor

+1
Sick and tired of converting to/from wstring/string

@megaposer
Copy link
Contributor

+1
I am going to jump on this bandwagon as well!

@grahamreeds
Copy link

You are all free to submit patches to correct this!

Sent from my Nexus 5.
On 9 Mar 2016 10:13 a.m., "megaposer" [email protected] wrote:

+1
I am going to jump on this bandwagon as well!


Reply to this email directly or view it on GitHub
#72 (comment)
.

@petke
Copy link
Author

petke commented Mar 9, 2016

Hi Graham, are you working at Microsoft and cpprest?

I suspect this is such a fundamental design change of the library, that one would like some feedback from the official maintainers first. Why did they decide to support only one or the other? Maybe there is some good reason for the current design. Do they have any interest in such a feature, or are they against it on principle?

@grahamreeds
Copy link

No, I don't work for MS. ras0219-msft did give feedback: Either lightweight wrapper or full-on conversion for version 3.0.0.

I did write a wrapper to do exactly what you mentioned in the the original post and what @ras0219-msft mentions in his (1) in a previous job because manually converting to and fro was a nuisance and a code bloat. Because of the wrapper we were able to pretty much ignore the problem.

Question is how many other teams have written similar objects to hide the pain?

A light weight wrapper would get my vote (I don't use cpprestsdk in my new job) but I would like to see a full conversion at some point in the future (you never know what I might have need of it again).

@petke
Copy link
Author

petke commented Mar 10, 2016

Thanks graham. I somehow missed that ras...msft works for ms.
If you managed to make a useful wrapper i could try and hack together one of my own. (On the other hand im in the middle of migrating to use rapid json. Its very fast but has an ugly and cumbersome interface. With allocators passed around manually everywhere, etc).

Maybe ras and friends will fix it for all at some point. Thanks

@capsocrates
Copy link

Is anyone working on this right now? In our code base we use utf8 everywhere (yes, even on Windows) and it is a big pain to have to do string conversions everywhere.

@jason-ha
Copy link
Contributor

jason-ha commented Jul 9, 2019

I found that this could be solved by adding a build option (you would need to build cpprestsdk yourself) and updating some of the #ifdef _WIN32 to #if defined(_UTF16_STRINGS). I will be creating a PR to add this support.

@damienhocking
Copy link

+1 Jason. Nice!

@jason-ha
Copy link
Contributor

I have not fully validated the changes (been building just the json support portions), but I pushed changes at https://github.com/jason-ha/cpprestsdk.git fork in branch win32_narrow_strings if anyone want to give it a whirl.

@nickyang4github
Copy link

I just wanna use std::string in Windows, but the source code does not supply a macro to control this.

@jason-ha
Copy link
Contributor

@NickYang1988 No, Windows made moved to reduce overall string complexity by supporting UNICODE with 16 characters by default.

I have verified more of my attempt to support this for json parsing. As described well by @ras0219-msft there are complications - particularly cpprestsdk having a common utility::string_t across all code.
My use has no need of the http client/server portions of cpprestsdk and I don't know that portion of code well enough to make best decision for where to convert in-out of utf16 when working with Windows system APIs. Maybe others have similar uses cases (no need of http support).

Thus I am working on an additional change to better componentize the suite. With this change cpprest_json and cpprest_pplx static libraries can be produced using new CPPREST_ISOLATE_COMPONENTS build option.

@nickyang4github
Copy link

@NickYang1988 No, Windows made moved to reduce overall string complexity by supporting UNICODE with 16 characters by default.

I have verified more of my attempt to support this for json parsing. As described well by @ras0219-msft there are complications - particularly cpprestsdk having a common utility::string_t across all code.
My use has no need of the http client/server portions of cpprestsdk and I don't know that portion of code well enough to make best decision for where to convert in-out of utf16 when working with Windows system APIs. Maybe others have similar uses cases (no need of http support).

Thus I am working on an additional change to better componentize the suite. With this change cpprest_json and cpprest_pplx static libraries can be produced using new CPPREST_ISOLATE_COMPONENTS build option.

Thanks for your reply, I know Windows use unicode/UTF16 in it's system, but after I convert std::string to string_t by using function to_string_t, but I get a exception "throw std::range_error("UTF-8 string character can never start with 10xxxxxx");"

And I read the repo with narrow_string feature you forked, but I don't know if it's stable.

@jason-ha
Copy link
Contributor

to_string_t

So you are getting that exception during the call to to_string_t. The exception is thrown from count_utf8_to_utf16. I think you need to validate the source of your UTF8 string. The exception is being thrown because an invalid encoding has been detected. UTF-8 use high bit (1xxxxxxx) to indicated extended character. But it also requires that bit 6 (8 bits 0-7; I call this out because cpprest uses 1-8 bit position notation) be 1, ie: 11xxxxxx, but you must have 10xxxxxx.
This could mean corruption or not a UTF-8 source.

@nickyang4github
Copy link

@jason-ha Here is another issue that same question, #1142
Please check it.

@jason-ha
Copy link
Contributor

branch win32_narrow_strings has been rebased to 2.10.14 and cleanups have been made to enable a clean (but not full) cpprest package.
As noted above the HTTP related code is not addressed; so that is all missing from Win32 build with narrow strings forced.
I also have refactoring and isolated libraries working. I will publish that under a separate branch.

@damienhocking
Copy link

At this point we can't build the listener and client on Windows with UTF8, correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants