Add an option to ignore characters that cannot be converted to SJIS, EUC-JP, or JIS #41

Merged 4 commits on Jun 2, 2024
25 changes: 25 additions & 0 deletions README.md
@@ -28,6 +28,7 @@ Convert and detect character encoding in JavaScript.
+ [Specify conversion options to the argument `to` as an object](#specify-conversion-options-to-the-argument-to-as-an-object)
+ [Specify the return type by the `type` option](#specify-the-return-type-by-the-type-option)
+ [Replacing characters with HTML entities when they cannot be represented](#replacing-characters-with-html-entities-when-they-cannot-be-represented)
+ [Ignoring characters when they cannot be represented](#ignoring-characters-when-they-cannot-be-represented)
+ [Specify BOM in UTF-16](#specify-bom-in-utf-16)
* [urlEncode : Encodes to percent-encoded string](#encodingurlencode-data)
* [urlDecode : Decodes from percent-encoded string](#encodingurldecode-string)
@@ -405,6 +406,30 @@ const sjisArray = Encoding.convert(unicodeArray, {
console.log(sjisArray); // Converted to a code array of 'ホッケの漢字は𩸽'
```

#### Ignoring characters when they cannot be represented

Setting the `fallback` option to `ignore` drops characters that cannot be represented in the target encoding.

Example of specifying `{ fallback: 'ignore' }` option:

```javascript
const unicodeArray = Encoding.stringToCode("寿司🍣ビール🍺");
// No fallback specified
let sjisArray = Encoding.convert(unicodeArray, {
  to: "SJIS",
  from: "UNICODE",
});
console.log(sjisArray); // Converted to a code array of '寿司?ビール?'

// Specify `fallback: ignore`
sjisArray = Encoding.convert(unicodeArray, {
  to: "SJIS",
  from: "UNICODE",
  fallback: "ignore",
});
console.log(sjisArray); // Converted to a code array of '寿司ビール'
```

#### Specify BOM in UTF-16

You can add a BOM (byte order mark) by specifying the `bom` option when converting to `UTF16`.
25 changes: 25 additions & 0 deletions README_ja.md
@@ -28,6 +28,7 @@ Convert and detect character encoding in JavaScript.
+ [Specify conversion options as an object in the argument `to`](#引数-to-にオブジェクトで変換オプションを指定する)
+ [Specify the return type with the `type` option](#type-オプションで戻り値の型を指定する)
+ [Replace characters that cannot be converted with HTML entities (HTML numeric character references)](#変換できない文字を-html-エンティティ-html-数値文字参照-に置き換える)
+ [Ignore characters that cannot be converted](#変換できない文字を無視する)
+ [Add a BOM to UTF-16](#utf-16-に-bom-をつける)
* [urlEncode : URL-encodes an array of character codes](#encodingurlencode-data)
* [urlDecode : URL-decodes to an array of character codes](#encodingurldecode-string)
@@ -395,6 +396,30 @@ const sjisArray = Encoding.convert(unicodeArray, {
console.log(sjisArray); // Converted to a code array of 'ホッケの漢字は𩸽'
```

#### Ignore characters that cannot be converted

To ignore characters that cannot be represented in the target encoding, set the `fallback` option to `ignore`.

Example of specifying the `{ fallback: 'ignore' }` option:

```javascript
const unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺');
// No fallback specified
let sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE'
});
console.log(sjisArray); // Converted to a code array of '寿司?ビール?'

// Specify `fallback: ignore`
sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE',
  fallback: 'ignore'
});
console.log(sjisArray); // Converted to a code array of '寿司ビール'
```

#### Add a BOM to UTF-16

When converting to `UTF16`, you can add a BOM (byte order mark) by specifying the `bom` option.
3 changes: 3 additions & 0 deletions encoding.js
@@ -1824,6 +1824,9 @@ function handleFallback(results, bytes, fallbackOption) {
        }
        results[results.length] = 0x3B; // ;
      }
      break;
    case 'ignore':
      break;
  }
}

3 changes: 3 additions & 0 deletions src/encoding-convert.js
@@ -1672,5 +1672,8 @@ function handleFallback(results, bytes, fallbackOption) {
        }
        results[results.length] = 0x3B; // ;
      }
      break;
    case 'ignore':
      break;
  }
}
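
The new `ignore` case is an empty branch: `handleFallback` appends nothing to `results`, so the unrepresentable character drops out of the output. The sketch below illustrates that idea in isolation. It is not the library's implementation: `isRepresentable`, `applyFallback`, and `toyConvert` are hypothetical names, the ASCII-only rule is an assumption standing in for a real encoding table, and the `'?'` default mirrors the no-fallback output shown in the README.

```javascript
// Simplified illustration only; not the library's implementation.
// Assumption: only ASCII code points count as "representable".
function isRepresentable(codePoint) {
  return codePoint < 0x80;
}

// Append output for an unrepresentable code point according to the fallback.
function applyFallback(results, codePoint, fallbackOption) {
  switch (fallbackOption) {
    case 'html-entity': {
      // Emit a decimal numeric character reference such as &#127843;
      const entity = '&#' + codePoint + ';';
      for (const ch of entity) {
        results.push(ch.charCodeAt(0));
      }
      break;
    }
    case 'ignore':
      // Emit nothing, so the character is dropped from the output.
      break;
    default:
      results.push(0x3F); // '?'
  }
}

function toyConvert(text, fallbackOption) {
  const results = [];
  // for...of iterates by code point, so each emoji is handled as one unit.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (isRepresentable(codePoint)) {
      results.push(codePoint);
    } else {
      applyFallback(results, codePoint, fallbackOption);
    }
  }
  return String.fromCharCode(...results);
}

console.log(toyConvert('sushi🍣beer🍺'));                // sushi?beer?
console.log(toyConvert('sushi🍣beer🍺', 'ignore'));      // sushibeer
console.log(toyConvert('sushi🍣beer🍺', 'html-entity')); // sushi&#127843;beer&#127866;
```

Keeping `ignore` as an explicit no-op case, rather than relying on an unmatched option falling through the `switch`, documents that dropping the character is intentional.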
44 changes: 44 additions & 0 deletions tests/test.js
@@ -636,6 +636,50 @@ describe('encoding', function() {
        assert.deepEqual(decoded, '🍣寿司ビール🍺');
      });
    });

    describe('Ignore untranslatable unknown characters', function() {
      it('SJIS', function() {
        // Characters that cannot be converted to Shift_JIS ('🍣', '🍺') will be ignored.
        var sjis = encoding.convert(utf8, {
          to: 'sjis',
          from: 'utf-8',
          fallback: 'ignore'
        });
        var decoded = encoding.convert(sjis, {
          to: 'unicode',
          from: 'sjis'
        });
        assert.deepEqual(decoded, '寿司ビール');
      });

      it('EUC-JP', function() {
        // Characters that cannot be converted to EUC-JP ('🍣', '🍺') will be ignored.
        var eucjp = encoding.convert(utf8, {
          to: 'euc-jp',
          from: 'utf-8',
          fallback: 'ignore'
        });
        var decoded = encoding.convert(eucjp, {
          to: 'unicode',
          from: 'euc-jp'
        });
        assert.deepEqual(decoded, '寿司ビール');
      });

      it('JIS', function() {
        // Characters that cannot be converted to JIS ('🍣', '🍺') will be ignored.
        var jis = encoding.convert(utf8, {
          to: 'jis',
          from: 'utf-8',
          fallback: 'ignore'
        });
        var decoded = encoding.convert(jis, {
          to: 'unicode',
          from: 'jis'
        });
        assert.deepEqual(decoded, '寿司ビール');
      });
    });
  });
});
