Creating or connecting to databases with non-UTF8 character sets #13

Closed
rowland opened this issue Feb 6, 2015 · 14 comments

Comments

@rowland
Contributor

rowland commented Feb 6, 2015

It's not always desired to create databases or to connect to databases with character set UTF8, which is what is currently hard-coded. I would be happy to submit a patch, but the DSN format would need to be extended. The DSN used by the postgresql library lib/pq appears to use space-separated key=value pairs. My go-fb library uses semicolon-separated key=value pairs. I don't mind what schema is used as long as it's extensible. Your library, your call...
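To make the semicolon-separated option concrete, here is a minimal sketch of parsing such a DSN into a map. The function name `parseKeyValueDSN` and the key names (`database`, `username`, `charset`) are made up for illustration; they are not taken from go-fb or from this driver.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValueDSN splits a DSN of the form "key=value;key=value" into a map.
// Hypothetical helper: the key names used in main are illustrative only.
func parseKeyValueDSN(dsn string) (map[string]string, error) {
	params := make(map[string]string)
	for _, pair := range strings.Split(dsn, ";") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate a trailing semicolon
		}
		key, value, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("malformed DSN pair %q", pair)
		}
		// Keys are case-insensitive in this sketch; values are kept as written.
		params[strings.ToLower(strings.TrimSpace(key))] = strings.TrimSpace(value)
	}
	return params, nil
}

func main() {
	dsn := "database=localhost:/var/db/test.fdb;username=sysdba;charset=WIN1252;"
	params, err := parseKeyValueDSN(dsn)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(params["charset"])
}
```

Because unknown keys simply land in the map, new options such as a character set can be added later without changing the format.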

@mliese

mliese commented Apr 11, 2017

I would like to bump this issue.
Is there an easy fix for databases with non-UTF8 character sets?

@rowland
Contributor Author

rowland commented Apr 11, 2017

Since I filed this issue, firebirdsql settled on URL format, using url.Parse to parse the DSN. So, it should now be possible to construct a patch to change the create/connect parameters. I don't remember now what the ramifications would be. I've been able to store and retrieve binary data and I wouldn't expect the driver to transliterate between character sets.
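As a rough sketch of what such a patch could read from a URL-style DSN: the `charset` query parameter, the `firebird://` scheme prefix, and the helper name `charsetFromDSN` are assumptions made for illustration, not the driver's confirmed interface. Only `url.Parse` and `Query().Get` from the standard library are relied on.

```go
package main

import (
	"fmt"
	"net/url"
)

// charsetFromDSN returns the value of a hypothetical "charset" query
// parameter, or def when the parameter is absent.
func charsetFromDSN(dsn, def string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	if cs := u.Query().Get("charset"); cs != "" {
		return cs, nil
	}
	return def, nil
}

func main() {
	// Assumed DSN shape; the scheme prefix is part of the assumption.
	dsn := "firebird://sysdba:masterkey@localhost:3050/var/db/test.fdb?charset=WIN1252"
	cs, err := charsetFromDSN(dsn, "UTF8")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cs)
}
```

Falling back to UTF8 when the parameter is missing would keep existing connection strings working unchanged.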

@roelandm

The problem is that the text is always converted from UTF-8.
If the charset is NONE a charmap should be used to convert to the correct encoding.
Could this be a parameter somewhere? Not sure how I can do this in the connection string.

Here I'm using win1252 if the charset is NONE, but that could be wrong.

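func (x *xSQLVAR) parseString(raw_value []byte) interface{} {
	if x.sqlsubtype == 1 { // OCTETS: binary data, return the bytes untouched
		return raw_value
	}
	// The charset is NONE. What should I use? Parameter?
	if x.sqlsubtype == 0 {
		// Assume Windows-1252 and transcode to UTF-8
		// (charmap is golang.org/x/text/encoding/charmap).
		dec := charmap.Windows1252.NewDecoder()
		if v, err := dec.Bytes(raw_value); err == nil {
			return string(v)
		}
		// On a decoding error, fall through and return the bytes as they are.
	}

	return string(raw_value)
}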

func (x *xSQLVAR) value(raw_value []byte) (v interface{}, err error) {
	switch x.sqltype {
	case SQL_TYPE_TEXT:
		v = x.parseString(raw_value)
	case SQL_TYPE_VARYING:
		v = x.parseString(raw_value)
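One way to picture the "parameter" idea is a decode step keyed by a charset name instead of a hard-coded code page. The sketch below is standard-library only, so it handles ISO8859_1 (where each byte equals its Unicode code point) rather than win1252, which would need a mapping table such as the `charmap.Windows1252` decoder used above. The function name `decodeText` and the idea of passing the charset as a string are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeText converts raw column bytes to a Go string according to a
// charset name. Hypothetical helper: only a few names are handled here.
func decodeText(raw []byte, charset string) (string, error) {
	switch strings.ToUpper(charset) {
	case "", "UTF8", "NONE":
		// No conversion: the bytes are returned exactly as stored.
		return string(raw), nil
	case "ISO8859_1":
		// In ISO 8859-1 every byte value is the Unicode code point itself.
		runes := make([]rune, len(raw))
		for i, b := range raw {
			runes[i] = rune(b)
		}
		return string(runes), nil
	default:
		return "", fmt.Errorf("unsupported charset %q", charset)
	}
}

func main() {
	// "café" in ISO 8859-1: the final byte 0xE9 is é.
	s, err := decodeText([]byte{0x63, 0x61, 0x66, 0xE9}, "ISO8859_1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```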

@the-Arioch

Mr. Nakagami also contributed to JayBird library (Java access to FB).

Just for the sake of uniformity maybe Go driver could also settle on the same URI-standard method of setting parameters. Especially since it already uses generic URI parsing - as in #107

W.r.t. connection character set it can go as "localhost:3050/C:/db/employee.fdb?encoding=UTF8"

Example taken from ch.2.1.1. Specifying extended properties - https://firebirdsql.github.io/jaybird-manual/jaybird_manual.html#connection-drivermanager

@nakagami
Owner

I think a UTF-8 encoded database is best, because a golang string is UTF-8 centric.
Does anyone need to connect to databases with a non-UTF8 charset?

@roelandm

UTF-8 is the best, but I have an existing database that uses win1252 encoding.

@nakagami
Owner

nakagami commented May 15, 2020

I understand that @r03 needs it for an old database.
Does anyone know how to pass the database encoding as a parameter to op_connect and automatically get UTF-8 strings back?

I'm sorry, I made a mistake.

If the driver sets the charset in isc_dpb_lc_ctype and decodes to UTF-8 when fetching, that seems like it would work.
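For readers unfamiliar with the database parameter buffer (DPB), here is a minimal sketch of how a connection charset entry could be encoded: a version byte followed by tag, length, value clusters. The numeric values (1 for the version, 48 for the charset tag) are given as I recall them from Firebird's ibase.h and should be checked against the header; the helper `appendDPBString` is hypothetical and is not the driver's code.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	iscDPBVersion1 = 1  // assumed value of isc_dpb_version1
	iscDPBLcCtype  = 48 // assumed value of isc_dpb_lc_ctype (connection charset)
)

// appendDPBString appends one tag/length/value cluster to a DPB buffer.
// The length is a single byte, so values are limited to 255 bytes.
func appendDPBString(dpb []byte, tag byte, value string) ([]byte, error) {
	if len(value) > 255 {
		return nil, errors.New("DPB value longer than 255 bytes")
	}
	dpb = append(dpb, tag, byte(len(value)))
	return append(dpb, value...), nil
}

func main() {
	dpb := []byte{iscDPBVersion1}
	dpb, err := appendDPBString(dpb, iscDPBLcCtype, "WIN1252")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", dpb)
}
```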

I'll think about it, but I'd be happy if someone would give me a pull request.

@nakagami
Owner

nakagami commented May 17, 2020

Support for not all, but enough, charsets:
#109
Is it correct?

Do you think it works?
If it does, I'm going to merge it.

@roelandm

I need some time to test this, but it looks very good! Thanks!

@bat22

bat22 commented May 17, 2020

I think converting data when fetching is wrong. If we need data in utf8 we connect to database with utf8 charset, even if the database has non utf8 encoding. The server itself does the conversion and does it well.

I've been working with win1251 database for some years. Delphi application connects with win1251 charset, golang services connect with utf8 charset, everything is good.

In my opinion, the driver should have charset option to be able to:

  • create database with non utf8 encoding
  • fetch strings as is with source encoding

P.S.
Also, the proposed solution will affect performance.

@nakagami
Owner

I agree that it would be better to handle the database in UTF8.

In golang, I think character code conversion is necessary if we are dealing with a non-UTF8 database, because a golang string is UTF8 centric.
At least this patch will let us fulfill these requests.

If we don't specify the charset, it works as before, so it won't be too slow.

If there is a better way to do it than this patch, please indicate it in code.

@bat22

bat22 commented May 17, 2020

"In golang, I think character code conversion is necessary if we are dealing with a non-UTF8 database, because a golang string is UTF8 centric."

I totally disagree.

Firstly, when we are dealing with non-UTF8 databases and want to handle strings in UTF-8, we should connect with the UTF8 charset (which is how the driver works now) and the server will convert the data automatically.

Secondly, strings in golang can contain any data (https://blog.golang.org/strings):
"It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."

@nakagami
Owner

Please vote on #109

@nakagami nakagami closed this as completed Jul 8, 2020
@roelandm

I finally migrated to your latest version, and the charset changes work perfectly for me. Thanks!
