HexFormat proposal #361
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,320 @@ | ||
# HexFormat | ||
|
||
* **Type**: Standard Library API proposal | ||
* **Author**: Abduqodiri Qurbonzoda | ||
* **Status**: Implemented in Kotlin 1.9.0 | ||
* **Prototype**: Implemented | ||
* **Target issue**: [KT-57762](https://youtrack.jetbrains.com/issue/KT-57762/) | ||
* **Discussion**: TBD | ||
|
||
## Summary | ||
|
||
Convenient API for formatting binary data into hexadecimal string form and parsing back. | ||
|
||
## Motivation | ||
|
||
Our research has shown that hexadecimal representation is more widely used than other numeric bases, | ||
second only to decimal representation. There are some fundamental reasons for the hex popularity: | ||
* Hexadecimal representation is more human-readable and understandable when it comes to bits. | ||
Each digit in the hex system represents exactly four bits of data, | ||
making the mapping of a hex digit to its corresponding nibble straightforward. | ||
* Hex representation is more compact than the decimal format and consumes a predictable number of characters. | ||
* The implementation of a hex encoder/decoder is relatively simple and fast. | ||
|
||
By providing a convenient API for common use cases described below, we aim to make coding in Kotlin easier and more enjoyable. | ||
|
||
## Use cases | ||
|
||
### Logging and debugging | ||
|
||
The readability of the format makes it very appealing for logging and debugging. | ||
The value that is logged in hex is usually less informative as a number than as a sequence of bits, | ||
e.g., when the value has some particular bit pattern. Another popular use case is printing bytes in some | ||
[hex dump](https://en.wikipedia.org/wiki/Hex_dump) format, split into lines and groups. | ||
|
||
### Storing or transmitting binary data in text-only formats | ||
|
||
Sometimes binary data needs to be embedded into text-only formats such as URL, XML, or JSON. | ||
Our research indicates that in this use case, hex encoding is among the most frequently used encodings, | ||
especially when encoding primitive values such as `Int` and `Long`. | ||
|
||
### Protocol requirements | ||
|
||
The following popular protocols require hex format: | ||
* When generating or parsing HTML code, one might need to work with the hex representation of RGB color codes. | ||
e.g., `<div style="background-color:#ff6347;">...</div>` | ||
* To express Unicode code points in HTML or XML. | ||
e.g., `<message>It's &#x1F327; outside, be sure to grab &#x2602;</message>` | ||
* The framework used in your project might require specifying IP or MAC addresses in a certain hex format. | ||
e.g., `"00:1b:63:84:45:e6"` or `"001B.6384.45E6"` | ||
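The first two protocol cases above can be sketched with the API proposed later in this document. This is a minimal illustration, not part of the proposal: `rgbToCssColor` and `codePointToCharReference` are hypothetical helper names, and the code assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only; the helper names are hypothetical and not part of the proposal.

// Formats the low 24 bits of an Int as a CSS color such as "#ff6347".
@OptIn(ExperimentalStdlibApi::class)
fun rgbToCssColor(rgb: Int): String =
    // Int.toHexString() yields all 8 digits by default; keep the 6 least significant.
    "#" + rgb.toHexString().takeLast(6)

// Renders a code point as an XML/HTML hexadecimal character reference.
@OptIn(ExperimentalStdlibApi::class)
fun codePointToCharReference(codePoint: Int): String {
    val format = HexFormat {
        upperCase = true
        number {
            prefix = "&#x"
            suffix = ";"
            removeLeadingZeros = true
        }
    }
    return codePoint.toHexString(format)
}
```

Note that `removeLeadingZeros` is not used for the color, because a color such as `0x00ff00` must keep its leading zero digits.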
|
||
## Similar API review | ||
|
||
* Java [`HexFormat`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) class. | ||
* Python [binascii](https://docs.python.org/3/library/binascii.html) module. Also, | ||
`hex` and `fromhex` functions on [bytes objects](https://docs.python.org/3/library/stdtypes.html#bytes-objects). | ||
|
||
## Proposal | ||
|
||
Considering the use cases mentioned above, the following format options are proposed. | ||
|
||
For formatting a numeric value: | ||
* Whether upper case or lower case hexadecimal digits should be used | ||
* The prefix of the hex representation | ||
* The suffix of the hex representation | ||
* Whether to remove leading zeros in the hex representation | ||
|
||
For formatting `ByteArray`: | ||
* Whether upper case or lower case hexadecimal digits should be used | ||
* The number of bytes per line | ||
* The number of bytes per group | ||
* The string used to separate groups in a line | ||
* The string used to separate bytes in a group | ||
* The prefix of a byte hex representation | ||
* The suffix of a byte hex representation | ||
|
||
### Creating a format | ||
|
||
It is proposed to introduce an immutable `HexFormat` class that holds the options. | ||
`Builder` is used to configure a format. Each option in the builder has a default value that can be customized. | ||
All related types are nested inside `HexFormat` to reduce the top-level surface area of the API: | ||
``` | ||
public class HexFormat internal constructor( | ||
    val upperCase: Boolean, | ||
    val bytes: BytesHexFormat, | ||
    val number: NumberHexFormat | ||
) { | ||
|
||
    public class Builder internal constructor() { | ||
        var upperCase: Boolean = false | ||
        val bytes: BytesHexFormat.Builder = BytesHexFormat.Builder() | ||
        val number: NumberHexFormat.Builder = NumberHexFormat.Builder() | ||
|
||
        inline fun bytes(builderAction: BytesHexFormat.Builder.() -> Unit) | ||
        inline fun number(builderAction: NumberHexFormat.Builder.() -> Unit) | ||
    } | ||
|
||
    public class BytesHexFormat internal constructor( | ||
        val bytesPerLine: Int, | ||
        val bytesPerGroup: Int, | ||
        val groupSeparator: String, | ||
        val byteSeparator: String, | ||
        val bytePrefix: String, | ||
        val byteSuffix: String | ||
    ) { | ||
|
||
        public class Builder internal constructor() { | ||
            var bytesPerLine: Int = Int.MAX_VALUE | ||
            var bytesPerGroup: Int = Int.MAX_VALUE | ||
            var groupSeparator: String = "  " | ||
            var byteSeparator: String = "" | ||
            var bytePrefix: String = "" | ||
            var byteSuffix: String = "" | ||
        } | ||
    } | ||
|
||
    public class NumberHexFormat internal constructor( | ||
        val prefix: String, | ||
        val suffix: String, | ||
        val removeLeadingZeros: Boolean | ||
    ) { | ||
|
||
        public class Builder internal constructor() { | ||
            var prefix: String = "" | ||
            var suffix: String = "" | ||
            var removeLeadingZeros: Boolean = false | ||
        } | ||
    } | ||
} | ||
``` | ||
|
||
The `BytesHexFormat` and `NumberHexFormat` classes hold format options for `ByteArray` and numeric values, respectively. | ||
The `upperCase` option, which is common to both `ByteArray` and numeric values, is stored in `HexFormat`. | ||
|
||
It's not possible to instantiate a `HexFormat` or its builder directly. The following function is provided instead: | ||
``` | ||
public inline fun HexFormat(builderAction: HexFormat.Builder.() -> Unit): HexFormat | ||
``` | ||
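A minimal sketch of configuring a format through the builder. The function name `describeDumpFormat` and the chosen option values are illustrative assumptions; the code assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only: builds a format and reads the resulting immutable options back.
@OptIn(ExperimentalStdlibApi::class)
fun describeDumpFormat(): List<String> {
    val format = HexFormat {
        upperCase = true              // shared by bytes and numbers
        bytes {                       // block form for several byte options
            bytesPerLine = 16
            bytesPerGroup = 8
            byteSeparator = " "
        }
        number.prefix = "0x"          // property form for a single option
    }
    return listOf(
        "upperCase=${format.upperCase}",
        "bytesPerLine=${format.bytes.bytesPerLine}",
        "bytesPerGroup=${format.bytes.bytesPerGroup}",
        "bytePrefix='${format.bytes.bytePrefix}'",   // untouched, keeps its default
        "numberPrefix=${format.number.prefix}",
        "removeLeadingZeros=${format.number.removeLeadingZeros}",
    )
}
```

Options that are not assigned in the builder keep their defaults, so a format only needs to mention what differs from `HexFormat.Default`.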
|
||
### Formatting | ||
|
||
For formatting, the following extension functions are proposed: | ||
``` | ||
// Formats the byte array using HexFormat.upperCase and HexFormat.bytes | ||
public fun ByteArray.toHexString(format: HexFormat = HexFormat.Default): String | ||
|
||
public fun ByteArray.toHexString( | ||
    startIndex: Int = 0, | ||
    endIndex: Int = size, | ||
    format: HexFormat = HexFormat.Default | ||
): String | ||
|
||
// Formats the numeric value using HexFormat.upperCase and HexFormat.number | ||
// N is Byte, Short, Int, Long, and their unsigned counterparts | ||
public fun N.toHexString(format: HexFormat = HexFormat.Default): String | ||
``` | ||
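The formatting functions above might be used as in the following sketch. The function name `formattingSamples` and the sample values are assumptions made for illustration; the code assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only: each entry pairs a call with the string it is expected to produce.
@OptIn(ExperimentalStdlibApi::class)
fun formattingSamples(): List<String> {
    val data = byteArrayOf(0x00, 0x1b, 0x63, 0x84.toByte(), 0x45, 0xe6.toByte())
    val colonSeparated = HexFormat { bytes.byteSeparator = ":" }
    val shortNumber = HexFormat { number { prefix = "0x"; removeLeadingZeros = true } }
    return listOf(
        (-1).toHexString(),                    // bits, not value: "ffffffff"
        255.toUByte().toHexString(),           // "ff"
        1L.toHexString(shortNumber),           // "0x1"
        data.toHexString(colonSeparated),      // "00:1b:63:84:45:e6"
        // Formats only the bytes at indices 1 and 2.
        data.toHexString(startIndex = 1, endIndex = 3, format = colonSeparated), // "1b:63"
    )
}
```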
|
||
### Parsing | ||
|
||
It is critical to be able to parse the results of the formatting functions above. | ||
For parsing, the following extension functions are proposed: | ||
``` | ||
// Parses a byte array | ||
public fun String.hexToByteArray(format: HexFormat = HexFormat.Default): ByteArray | ||
|
||
// Parses a numeric value | ||
// N is Byte, Short, Int, Long, and their unsigned counterparts | ||
public fun String.hexToN(format: HexFormat = HexFormat.Default): N | ||
``` | ||
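A short sketch of the parsing side, including a round trip through a custom format. The helper names `parseByteBits`, `parsePrefixedInt`, and `roundTrip` are hypothetical; the code assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only; helper names are hypothetical.

// Parses up to two hex digits into the Byte with the same bits, so "ff" becomes -1.
@OptIn(ExperimentalStdlibApi::class)
fun parseByteBits(text: String): Byte = text.hexToByte()

// Parses an Int that is written with a "0x" prefix.
@OptIn(ExperimentalStdlibApi::class)
fun parsePrefixedInt(text: String): Int =
    text.hexToInt(HexFormat { number.prefix = "0x" })

// Formats and parses with the same format; the result should equal the input.
@OptIn(ExperimentalStdlibApi::class)
fun roundTrip(data: ByteArray): ByteArray {
    val format = HexFormat { upperCase = true; bytes.byteSeparator = ":" }
    return data.toHexString(format).hexToByteArray(format)
}
```

The same `HexFormat` instance is passed to both directions, which is what makes a format defined once reusable for formatting and parsing.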
|
||
## Contracts | ||
|
||
* When formatting a `ByteArray`, the LF character is used to separate lines. | ||
* When parsing a `ByteArray`, any of the char sequences CRLF (`"\r\n"`), LF (`"\n"`) and CR (`"\r"`) are considered a valid line separator. | ||
* Parsing is performed in a case-insensitive manner. | ||
* `NumberHexFormat.removeLeadingZeros` is ignored when parsing. | ||
* Assigning a non-positive value to `BytesHexFormat.Builder.bytesPerLine/bytesPerGroup` is prohibited. | ||
In this case `IllegalArgumentException` is thrown. | ||
* Assigning a string containing LF or CR character to `BytesHexFormat.Builder.byteSeparator/bytePrefix/byteSuffix` | ||
and `NumberHexFormat.Builder.prefix/suffix` is prohibited. In this case `IllegalArgumentException` is thrown. | ||
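The contracts listed above can be exercised with a sketch like the following. All function names are hypothetical, and the behavior shown is what the contracts describe rather than an independent specification; the code assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only; function names are hypothetical.

// Formatting: lines are separated by LF.
@OptIn(ExperimentalStdlibApi::class)
fun formatInLinesOfTwo(data: ByteArray): String =
    data.toHexString(HexFormat { bytes.bytesPerLine = 2 })

// Parsing: CRLF, LF, and CR are accepted as line separators; digit case is ignored.
@OptIn(ExperimentalStdlibApi::class)
fun parseLinesOfTwo(text: String): ByteArray =
    text.hexToByteArray(HexFormat { bytes.bytesPerLine = 2 })

// Parsing: removeLeadingZeros has no effect, so leading zeros are still accepted.
@OptIn(ExperimentalStdlibApi::class)
fun parseWithRemoveLeadingZeros(text: String): Int =
    text.hexToInt(HexFormat { number.removeLeadingZeros = true })

// Builder: a non-positive bytesPerLine is rejected.
@OptIn(ExperimentalStdlibApi::class)
fun rejectsZeroBytesPerLine(): Boolean =
    try {
        HexFormat { bytes.bytesPerLine = 0 }
        false
    } catch (e: IllegalArgumentException) {
        true
    }

// Builder: a separator containing LF or CR is rejected.
@OptIn(ExperimentalStdlibApi::class)
fun rejectsLineBreakInByteSeparator(): Boolean =
    try {
        HexFormat { bytes.byteSeparator = "\n" }
        false
    } catch (e: IllegalArgumentException) {
        true
    }
```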
|
||
### Examples | ||
|
||
``` | ||
// Parsing an Int | ||
"3A".hexToInt() // 58 | ||
// Formatting an Int | ||
93.toHexString() // "0000005d" | ||
|
||
// Parsing a ByteArray | ||
val macAddress = "001b638445e6".hexToByteArray() | ||
|
||
// Formatting a ByteArray | ||
macAddress.toHexString(HexFormat { bytes.byteSeparator = ":" }) // "00:1b:63:84:45:e6" | ||
|
||
// Defining a format and assigning it to a variable | ||
val threeGroupFormat = HexFormat { upperCase = true; bytes.bytesPerGroup = 2; bytes.groupSeparator = "." } | ||
// Formatting a ByteArray using a previously defined format | ||
macAddress.toHexString(threeGroupFormat) // "001B.6384.45E6" | ||
``` | ||
|
||
## Alternatives | ||
Comment: Another option is a third-party implementation. Logging, for example, is only available this way.
Comment: Could you also please provide a link to the library you are referring to?
||
|
||
### For numeric values | ||
|
||
The Kotlin standard library provides `Primitive.toString(radix = 16)` for converting primitive values | ||
to their hex representation. However, this function focuses on converting the values, not bits. As a result: | ||
* Negative values are formatted with a minus sign. | ||
One needs to convert values of signed types to corresponding unsigned types before converting to hex representation. | ||
* Leading zero nibbles are ignored. To get the full length one must additionally `padStart` the result with `'0'`. | ||
* Related complaint: [KT-60782](https://youtrack.jetbrains.com/issue/KT-60782) | ||
|
||
There is also `String.toPrimitive(radix = 16)` for parsing back a primitive value. | ||
However, this function throws if the parsed value is out of range for the target type, even if the bits fit. | ||
For example, `"FF".toByte(radix = 16)` fails. To prevent this, the string must first be parsed as the corresponding unsigned type. | ||
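The workarounds described in this section look roughly as follows. This sketch uses only stable standard library functions; the helper names are hypothetical.

```kotlin
// Sketch only; helper names are hypothetical.

// Value-oriented conversion: negative values get a minus sign, leading zeros are dropped.
fun viaToString(value: Int): String = value.toString(radix = 16)

// Bit-oriented workaround: go through the unsigned type, then pad to the full width.
fun viaUnsignedAndPadding(value: Int): String =
    value.toUInt().toString(radix = 16).padStart(8, '0')

// Returns null because 0xFF (255) is out of range for a signed Byte.
fun parseSignedByteOrNull(text: String): Byte? = text.toByteOrNull(radix = 16)

// Bit-oriented workaround for parsing: parse as UByte, then reinterpret as Byte.
fun parseByteViaUnsigned(text: String): Byte = text.toUByte(radix = 16).toByte()
```

For comparison, the proposed `toHexString` and `hexToByte` functions are intended to give the bit-oriented results directly, without the unsigned conversion or padding.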
|
||
### For `ByteArray` | ||
|
||
`ByteArray.joinToString(separator) { byte -> byte.toString(radix = 16) }` can be used to format a ByteArray. | ||
Downsides are: | ||
* Not possible to separate bytes into groups and lines | ||
* Challenges with formatting `Byte` to hex described above | ||
|
||
There is no API for parsing `ByteArray` currently. | ||
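For reference, a hand-written equivalent without the proposed API might look like the sketch below. The extension names `joinToHex` and `parseHexBytes` are hypothetical, and the parser is deliberately simplistic (no separators, groups, or lines).

```kotlin
// Sketch only; extension names are hypothetical.

// Formats each byte as exactly two lower-case hex digits.
fun ByteArray.joinToHex(separator: String = ""): String =
    joinToString(separator) { byte ->
        // Mask to 0..255 so negative bytes get no minus sign, then pad to two digits.
        (byte.toInt() and 0xFF).toString(radix = 16).padStart(2, '0')
    }

// Parses a string of hex digit pairs without separators.
// Caveat: toInt(radix) also accepts a sign character, so input such as "+f" is not rejected.
fun String.parseHexBytes(): ByteArray {
    require(length % 2 == 0) { "Expected an even number of hex digits" }
    return chunked(2).map { it.toInt(radix = 16).toByte() }.toByteArray()
}
```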
|
||
## Naming | ||
|
||
### Existing functions for converting to String | ||
|
||
For ByteArray: | ||
* `contentToString` | ||
* `encodeToByteArray`/`decodeToString` | ||
* `joinToString` | ||
|
||
For primitive types: | ||
* `toString(radix)` | ||
* `Char.digitToInt()` | ||
* `Int.digitToChar()` | ||
|
||
### Naming options | ||
|
||
As listed above, existing functions with a similar purpose use the `toString` suffix when converting to `String`, | ||
and `toType` when converting from `String` to another type. Thus, options with similar naming schemes were considered: | ||
* **Proposed:** `toHexString` and `hexToType` for formatting and parsing, respectively | ||
    * "hex" used as an adjective | ||
* `hexToString` or `hexifyToString` for formatting | ||
    * "hex" used as a verb | ||
    * A similar verb is needed to describe the parsing of a hex-formatted string | ||
* Use `format` and `parse` verbs, e.g., `formatToHexString` and `parseHexToByteArray` | ||
    * `To` already indicates that the function converts the receiver | ||
|
||
## API design approaches | ||
|
||
* **Proposed:** Provide formatting and parsing functions as extensions on the type to be converted | ||
    * Pro: Discoverable | ||
        * Users already know and use the `toString` family of extension functions. | ||
          When typing "toString", code completion displays the hex conversion functions as well. | ||
          This can also prompt users to wonder how `toString(radix = 16)` differs from `toHexString()`, | ||
          and help them choose the proper one. | ||
        * Typing ".hex" is enough for code completion to display the hex conversion function for the receiver. | ||
          No need to remember the exact function name. | ||
    * Pro: Allows chaining with other calls | ||
    * Con: May pollute code completion for `String` receiver | ||
* Provide all formatting and parsing functions on `HexFormat`, similar to Java `HexFormat` and Kotlin `Base64` | ||
    * Pro: Gathers all related functions under a single type | ||
    * Con: Less discoverable than the proposed approach. Users need to remember that there is a `HexFormat` class. | ||
    * Con: Requires a `let` or `run` [scope function](https://kotlinlang.org/docs/scope-functions.html) for chaining with other calls | ||
* Have `BytesHexFormat` and `NumberHexFormat` as top-level classes, each with its own `upperCase` property. | ||
  No need for the `HexFormat` class. Functions for formatting/parsing `ByteArray` take `BytesHexFormat`, | ||
  while functions for numeric types take `NumberHexFormat`, e.g., | ||
  ``` | ||
  byteArray.toHexString( | ||
      BytesHexFormat { byteSeparator = " "; bytesPerLine = 16 } | ||
  ) | ||
  ``` | ||
    * Pro: Eliminates possible confusion about which options affect formatting | ||
    * Con: Two variables are needed to store preferred format options | ||
* `Builder` overrides a provided format, | ||
  e.g., `HexFormat(MY_HEX_FORMAT) { bytes.byteSeparator = ":" }` | ||
    * Not so many use cases for altering an existing format | ||
    * Can be added as an overload of `fun HexFormat()` | ||
* Pass options to formatting and parsing functions directly, without introducing `HexFormat` | ||
    * Not convenient in cases when a format is defined once and used on multiple occasions | ||
    * Adding new options in the future is problematic | ||
    * There is no way in Kotlin to require calling a function with named arguments. | ||
      Passing multiple arguments without specifying names damages code readability, | ||
      e.g., `bitMask.toHexString(true, "0x", false)` | ||
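The "`Builder` overrides a provided format" option is not part of the proposed API, but it can be approximated in user code. The sketch below is an assumption about how such a helper could look: `hexFormatFrom` and `colonSeparatedUpperCase` are hypothetical names, and the helper copies only the options listed in this document, so options added to the library later would not be carried over. It assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only; hexFormatFrom is a hypothetical user-side helper, not a library function.
@OptIn(ExperimentalStdlibApi::class)
fun hexFormatFrom(base: HexFormat, builderAction: HexFormat.Builder.() -> Unit): HexFormat =
    HexFormat {
        // Start from the options of the base format...
        upperCase = base.upperCase
        bytes {
            bytesPerLine = base.bytes.bytesPerLine
            bytesPerGroup = base.bytes.bytesPerGroup
            groupSeparator = base.bytes.groupSeparator
            byteSeparator = base.bytes.byteSeparator
            bytePrefix = base.bytes.bytePrefix
            byteSuffix = base.bytes.byteSuffix
        }
        number {
            prefix = base.number.prefix
            suffix = base.number.suffix
            removeLeadingZeros = base.number.removeLeadingZeros
        }
        // ...then let the caller override whatever it needs.
        builderAction()
    }

// Usage: derive an upper-case variant of a colon-separated format.
@OptIn(ExperimentalStdlibApi::class)
fun colonSeparatedUpperCase(data: ByteArray): String {
    val base = HexFormat { bytes.byteSeparator = ":" }
    val derived = hexFormatFrom(base) { upperCase = true }
    return data.toHexString(derived)
}
```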
|
||
## Dependencies | ||
|
||
Only a subset of Kotlin Standard Library available on all supported platforms is required. | ||
|
||
## Placement | ||
|
||
* Standard Library | ||
* `kotlin.text` package | ||
|
||
## Reference implementation | ||
|
||
* HexFormat class: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexFormat.kt | ||
* Extensions for formatting and parsing: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexExtensions.kt | ||
* Test cases for formatting and parsing `ByteArray`: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/BytesHexFormatTest.kt | ||
* Test cases for formatting and parsing numeric values: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/NumberHexFormatTest.kt | ||
|
||
## Future advancements | ||
|
||
* Adding the ability to limit the number of hex digits when formatting numeric values | ||
    * `NumberHexFormat.maxLength` could be introduced | ||
    * When formatting an `Int`, the combination of `maxLength = 6` and `removeLeadingZeros = false` results in exactly the 6 least significant hex digits | ||
    * The combination of `maxLength = 6` and `removeLeadingZeros = true` returns at most 6 least significant hex digits without leading zeros | ||
    * Related request: [KT-60787](https://youtrack.jetbrains.com/issue/KT-60787) | ||
* Overloads for parsing a substring: [KT-58277](https://youtrack.jetbrains.com/issue/KT-58277) | ||
* Overloads for appending format result to an `Appendable` | ||
    * `toHexString` might need to be renamed to `hexToString/Appendable` or `hexifyToString/Appendable`, because | ||
      it isn't obvious from `Int.toHexString(stringBuilder)` that the result is appended to the provided `StringBuilder` | ||
* Formatting and parsing I/O streams in Kotlin/JVM | ||
    * Similar to [`InputStream.decodingWith(Base64)`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io.encoding/java.io.-input-stream/decoding-with.html) | ||
      and [`OutputStream.encodingWith(Base64)`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io.encoding/java.io.-output-stream/encoding-with.html) | ||
* Formatting and parsing a `Char` | ||
    * Although `Char` is not a numeric type, it has a `Char.code` associated with it. | ||
      With the proposed API, formatting a `Char` won't be an easy task: `Char.code.toShort().toHexString()` |
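Until such additions exist, two of the items above can be emulated in user code. The sketch below is illustrative only: `toHexCode` and `toHexStringLimited` are hypothetical names, `maxLength` mirrors the option suggested above, and returning `"0"` for an all-zero result is an assumption of this sketch. It assumes Kotlin 1.9 or later, where the opt-in is needed while the API is experimental.

```kotlin
// Sketch only; both extension names are hypothetical.

// Formats the 16-bit code of a Char as four hex digits.
@OptIn(ExperimentalStdlibApi::class)
fun Char.toHexCode(): String = code.toShort().toHexString()

// Emulates a maxLength option by keeping only the least significant digits.
@OptIn(ExperimentalStdlibApi::class)
fun Int.toHexStringLimited(maxLength: Int, removeLeadingZeros: Boolean = false): String {
    require(maxLength in 1..8) { "An Int has at most 8 hex digits" }
    val digits = toHexString().takeLast(maxLength)
    // Assumption: a value whose kept digits are all zero is rendered as a single "0".
    return if (removeLeadingZeros) digits.trimStart('0').ifEmpty { "0" } else digits
}
```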
Comment: Citation needed

Comment: We don't have any public document with the results. But search in grep.app finds much more usages of `toString(16)` than any other base.

Comment: Probably `.toString( /* 10 */ )` is more common, should we also consider introducing `DecFormat`?
I'm not against this proposal; it seems strange to tackle hexadecimal number formatting before decimal ones.

Comment: The decimal representation of numeric values is indeed more common, and this document explicitly agrees with the proposition. However, this proposal is about introducing an API to facilitate formatting (and parsing) the hex representation of binary data. It provides use cases and proposes an API to make it easy to handle those use cases.
Of course, there are other potentially more impactful features. They deserve separate proposals.
While `HexFormat` is not the most important or useful feature out there, it does address the pain points outlined in this document.

Comment: Hi,
Discussion of the proposal has been moved here: #362
Please don't hesitate to express your concerns there.