DataFrame -> ESRI Shapefile: UTF-8/16 mangled to ?????????. DataFrame -> CSV: UTF-8/16 mangled to Latin-1 characters #99
I tried an example with Cyrillic characters in a CSV through GeoDataFrames, and it seems fine. So it looks like the only issue is with your Shapefile. Since I don't have access to those files I can't tell you exactly what went wrong. But you can try Shapefile.jl and see if that loads those files better, since it probably makes fewer assumptions and may be easier to fix if broken. GeoDataFrames is ArchGDAL under the hood, so we are at the whims of GDAL here.

Can you try this minimal example, and see if it causes the same error on your machine? It seems to work for me.

```julia
julia> using GeoDataFrames

julia> descriptions = [String(rand('А':'Ҁ', 5)) for _ in 1:10]
10-element Vector{String}:
 "дэМѲЪ"
 "ѴЖїѾв"
 "ѨѶоѧз"
 "УѕѦИэ"
 "ѸѬѳѴо"
 "ЪўѝѯѮ"
 "ъѲТкў"
 "ѭйФйѓ"
 "ђЙнШѲ"
 "мщѼыг"

julia> geometries = tuple.(rand(10), rand(10))
10-element Vector{Tuple{Float64, Float64}}:
 (0.3616396660054806, 0.1902277850964662)
 (0.9946340856206181, 0.7562092008804872)
 (0.8648571829290774, 0.00931884536274874)
 (0.41750353601434986, 0.4618622731533355)
 (0.04766980429969825, 0.5472432276083967)
 (0.8020186665742213, 0.24774530424596475)
 (0.22464094645451838, 0.37652599046554514)
 (0.15877861428124762, 0.7791053151409258)
 (0.27718245266096586, 0.7923647914178605)
 (0.27286993041519136, 0.7142004310660254)

julia> df = GeoDataFrames.DataFrame(geometry = geometries, description = descriptions)
10×2 DataFrame
 Row │ geometry                description
     │ Tuple…                  String
─────┼─────────────────────────────────────
   1 │ (0.36164, 0.190228)     дэМѲЪ
   2 │ (0.994634, 0.756209)    ѴЖїѾв
   3 │ (0.864857, 0.00931885)  ѨѶоѧз
   4 │ (0.417504, 0.461862)    УѕѦИэ
   5 │ (0.0476698, 0.547243)   ѸѬѳѴо
   6 │ (0.802019, 0.247745)    ЪўѝѯѮ
   7 │ (0.224641, 0.376526)    ъѲТкў
   8 │ (0.158779, 0.779105)    ѭйФйѓ
   9 │ (0.277182, 0.792365)    ђЙнШѲ
  10 │ (0.27287, 0.7142)       мщѼыг

julia> GeoDataFrames.write("try1.csv", df)
"try1.csv"

julia> using CSV

julia> rdf = CSV.read("try1.csv", GeoDataFrames.DataFrame)
10×1 DataFrame
 Row │ description
     │ String15
─────┼─────────────
   1 │ дэМѲЪ
   2 │ ѴЖїѾв
   3 │ ѨѶоѧз
   4 │ УѕѦИэ
   5 │ ѸѬѳѴо
   6 │ ЪўѝѯѮ
   7 │ ъѲТкў
   8 │ ѭйФйѓ
   9 │ ђЙнШѲ
  10 │ мщѼыг

julia> all(rdf.description .== descriptions)
true
```
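When a round trip like the one above succeeds, the remaining question for the failing case is what bytes actually landed on disk. A quick way to check, sketched here in Python for illustration (the file name and sample text are hypothetical stand-ins, not the Julia session's actual output file), is to read the raw bytes and attempt a UTF-8 decode:

```python
import os
import tempfile

# Hypothetical stand-in for a CSV written by GeoDataFrames.write;
# the point is the byte-level check, not the writer.
path = os.path.join(tempfile.mkdtemp(), "try1.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("description\nдэМѲЪ\n")

with open(path, "rb") as f:
    raw = f.read()

try:
    # A clean decode means the file really is UTF-8 and any mojibake
    # happens later, in whatever application reads it.
    text = raw.decode("utf-8")
    print("valid UTF-8:", text.splitlines()[1])
except UnicodeDecodeError as err:
    print("not UTF-8:", err)
```

If the bytes decode cleanly, the writer did its job and the reading application (Excel, ArcGIS, etc.) is choosing the wrong encoding.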
Here's what I got running the minimal example on my machine: […]

Opening the […] — I would think this could be solved by […] given the […]

That doesn't work either. However, calling […]

So whatever […]

Can you try with […]?

Yup, that seemed to work! Seems like for shapefiles it's […]: https://gdal.org/en/stable/drivers/vector/shapefile.html#encoding — looking here, I guess you can pass […]

Or uh...maybe just […]

EDIT 1: Okay, neither […]

EDIT 2: […]

EDIT 3: Okay...leaving it blank doesn't work either ([…])
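The ????????? symptom is consistent with how DBF attribute storage works: a shapefile's .dbf stores strings in a single code page, and when the target code page cannot represent a character, writers commonly substitute ?. A small Python sketch of that mechanism (illustrative only; the sample string is made up, and GDAL's actual behavior depends on the ENCODING handling discussed above):

```python
# .dbf attributes live in one code page. Characters outside that code
# page cannot be stored, and a common fallback is "?" substitution.
text = "Привет 你好"

# cp1252 (Western European) has neither Cyrillic nor Han characters:
western = text.encode("cp1252", errors="replace").decode("cp1252")
print(western)   # -> "?????? ??"

# cp1251 (Cyrillic) keeps the Cyrillic but still loses the Han text:
cyrillic = text.encode("cp1251", errors="replace").decode("cp1251")
print(cyrillic)  # -> "Привет ??"
```

This is why a single code-page choice can never round-trip a column that mixes Mandarin and Cyrillic; only a Unicode encoding such as UTF-8 can hold both.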
Hello!

Firstly, thank you for this package!

At work we do a lot of stuff with Esri and deal with shapefiles a lot. Initially I read a CSV file and a Shapefile into two different DataFrames and combined them with `vcat`. The CSV file comes from calculating the centroids of polygons on a layer we have on ArcGIS; the shapefile contains points from another source:

[…]

This does create a valid shapefile, but any columns containing rows with Mandarin or Cyrillic script are shown as ????????? or "?????????" whenever I load the new combined shapefile with GeoDataFrames or into ArcGIS.

Writing to a CSV file behaves similarly, even with `options=Dict("bom"=>"true")`: Mandarin and Cyrillic characters are mangled into what look like Latin-1 characters:

[…]

Is there an option I can pass to the driver for shapefiles, is there something I'm missing for both drivers, or is there something else I can do?
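"Latin-1 characters" in the CSV case usually mean the bytes on disk are valid UTF-8 but the reading application decodes them as Latin-1: every Cyrillic or Han character is two or more bytes in UTF-8, and a Latin-1 reader turns each byte into its own character. A Python sketch of that mechanism (sample string is illustrative):

```python
original = "дэМЪ"
utf8_bytes = original.encode("utf-8")

# A reader that assumes Latin-1 maps each byte to one character,
# so every Cyrillic letter becomes a two-character garble.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # starts with "Ð´" (the two UTF-8 bytes of "д")

# As long as nothing re-encoded the text, the damage is reversible:
assert mojibake.encode("latin-1").decode("utf-8") == original
```

This is also why the BOM matters here: a UTF-8 byte-order mark is mainly a hint that lets readers like Excel pick UTF-8 instead of falling back to a legacy Latin-1-style code page, so if the BOM option never reaches the driver, the file looks mangled even though its bytes are fine.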