Silent Corruption of UTF-8-Encoded Files misidentified as UTF8-BOM #84973

Closed
MatthiasWinkelmann opened this issue Nov 16, 2019 · 1 comment
Labels: bug (Issue identified by VS Code Team member as probable bug), verified (Verification succeeded)

MatthiasWinkelmann commented Nov 16, 2019

Issue Type: Bug

Opening a UTF-8 encoded file with the following content (3 lines, dashed lines not part of the file):

-----------
({ä üu, öo})
({ä üu, öo})

----------
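To reproduce this without depending on an editor's save behavior, the sample can be written byte-for-byte from a script. This is only a sketch: `write_sample` is a hypothetical helper, and the content is taken from the `dump` output of the "before" file shown in this report (two lines, each ending in a newline).

```ruby
require "tmpdir"

# Hypothetical helper (not part of the original report): writes the sample
# content as UTF-8 bytes, without a BOM.
def write_sample(path)
  content = "({\u00E4 \u00FCu, \u00F6o})\n({\u00E4 \u00FCu, \u00F6o})\n"
  File.binwrite(path, content)
  path
end

Dir.mktmpdir do |dir|
  path = write_sample(File.join(dir, "before"))
  # Show the first three bytes in hex; a UTF-8 BOM would be EF BB BF.
  first = File.binread(path, 3).unpack("C*").map { |b| format("%02X", b) }.join(" ")
  puts first # prints 28 7B C3
end
```

The first three bytes are `(`, `{`, and the first byte of `ä`, so nothing in the file resembles a BOM.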

VS Code guesses the file encoding as "UTF8 (BOM)", even though the file does not start with the BOM byte sequence (EF BB BF). On saving, it changes the file by prepending the BOM (\uFEFF):

pry> puts Pathname("before").read.dump
"({\u00E4 \u00FCu, \u00F6o})\n({\u00E4 \u00FCu, \u00F6o})\n"

pry> puts Pathname("after").read.dump
"\uFEFF({\u00E4 \u00FCu, \u00F6o})\n({\u00E4 \u00FCu, \u00F6o})\n"
 ^^^^^^^
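The same comparison can be done at the byte level. The following is a minimal sketch, assuming only that the UTF-8 BOM is the three bytes EF BB BF; `utf8_bom?` is a hypothetical helper name, and the two temporary files stand in for the "before" and "after" files above.

```ruby
require "tmpdir"

# Hypothetical helper: true if the file begins with the UTF-8 BOM bytes
# (U+FEFF encoded as EF BB BF). Files shorter than three bytes return false.
def utf8_bom?(path)
  File.binread(path, 3) == "\xEF\xBB\xBF".b
end

Dir.mktmpdir do |dir|
  before = File.join(dir, "before")
  after  = File.join(dir, "after")
  body   = "({\u00E4 \u00FCu, \u00F6o})\n"
  File.binwrite(before, body)
  File.binwrite(after, "\uFEFF" + body) # simulate the saved file
  puts utf8_bom?(before) # false
  puts utf8_bom?(after)  # true
end
```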

The resulting file appears identical, but (at least in my case) leads to errors downstream. The Crystal compiler, for example, will complain about an "unknown function 'require' in line 1".
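For a file that has already been saved this way, one possible recovery is to drop the three leading BOM bytes again. This is a sketch under that assumption, not something VS Code or Crystal provides; `strip_utf8_bom!` is a hypothetical helper.

```ruby
require "tmpdir"

# Hypothetical helper: rewrites the file without a leading UTF-8 BOM.
# Returns true if a BOM was removed, false if the file was left untouched.
def strip_utf8_bom!(path)
  bytes = File.binread(path)
  return false unless bytes.start_with?("\xEF\xBB\xBF".b)
  File.binwrite(path, bytes.byteslice(3, bytes.bytesize - 3))
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "after")
  File.binwrite(path, "\uFEFF({\u00E4 \u00FCu, \u00F6o})\n")
  puts strip_utf8_bom!(path) # true: BOM removed
  puts strip_utf8_bom!(path) # false: nothing left to remove
end
```

When only reading is needed, Ruby can also skip the BOM on open with the `"r:BOM|UTF-8"` mode, which leaves the file on disk unchanged.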

Possibly related to #84495 #84504, tagging @bpasero

VS Code version: Code - Insiders 1.41.0-insider (bf7d03b, 2019-11-15T05:37:27.862Z)
OS version: Darwin x64 18.2.0

System Info

Item            | Value
CPUs            | Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (8 x 4009)
GPU Status      | 2d_canvas: enabled
                | flash_3d: enabled
                | flash_stage3d: enabled
                | flash_stage3d_baseline: enabled
                | gpu_compositing: enabled
                | metal: disabled_off
                | multiple_raster_threads: enabled_on
                | oop_rasterization: unavailable_off
                | protected_video_decode: unavailable_off
                | rasterization: unavailable_off
                | skia_renderer: disabled_off
                | surface_control: disabled_off
                | surface_synchronization: enabled_on
                | video_decode: enabled
                | viz_display_compositor: enabled_on
                | viz_hit_test_surface_layer: disabled_off
                | webgl: enabled
                | webgl2: enabled
Load (avg)      | 2, 2, 2
Memory (System) | 32.00GB (0.14GB free)
Process Argv    | --disable-extensions o.cr
Screen Reader   | no
VM              | 0%

Extensions disabled
bpasero added the bug and file-guess-encoding labels Nov 19, 2019
bpasero added this to the November 2019 milestone Nov 19, 2019

bpasero (Member) commented Nov 19, 2019

Great catch 👍

jrieken added the verified label Dec 5, 2019
vscodebot locked and limited conversation to collaborators Jan 6, 2020