schema: Specify the encoding for character offsets
varungandhi-src committed Dec 12, 2023
1 parent beb6593 commit 2cd58f9
Showing 7 changed files with 3,323 additions and 2,775 deletions.
906 changes: 504 additions & 402 deletions bindings/go/scip/scip.pb.go

2,292 changes: 1,250 additions & 1,042 deletions bindings/haskell/src/Proto/Scip.hs

6 changes: 6 additions & 0 deletions bindings/haskell/src/Proto/Scip_Fields.hs

2,777 changes: 1,453 additions & 1,324 deletions bindings/rust/src/generated/scip.rs

29 changes: 29 additions & 0 deletions bindings/typescript/scip.ts

53 changes: 47 additions & 6 deletions docs/scip.md

35 changes: 34 additions & 1 deletion scip.proto
@@ -45,7 +45,8 @@ message Metadata {
// directory.
string project_root = 3;
// Text encoding of the source files on disk that are referenced from
- // `Document.relative_path`.
+ // `Document.relative_path`. This value is unrelated to the `Document.text`
+ // field, which is a Protobuf string and hence must be UTF-8 encoded.
TextEncoding text_document_encoding = 4;
}

@@ -102,8 +103,37 @@ message Document {
// can be used for other purposes as well, for example testing or when working
// with virtual/in-memory documents.
string text = 5;

// Specifies the encoding used for source ranges in this Document.
PositionEncoding position_encoding = 6;
}

// Encoding used to interpret the 'character' value in source ranges.
enum PositionEncoding {
// Default value. This value should not be used by new SCIP indexers
// so that a consumer can process the SCIP index without ambiguity.
UnspecifiedPositionEncoding = 0;
// The 'character' value is interpreted as a byte offset,
// assuming that the text for the line is encoded as UTF-8.
//
// Example: For the string "🚀 Woo" in UTF-8, the bytes are
// [240, 159, 154, 128, 32, 87, 111, 111], so the offset for 'W'
// would be 5.
UTF8ByteOffsetFromLineStart = 1;
// The 'character' value is interpreted as an offset in terms
// of UTF-8 code units.
//
// Example: For the string "🚀 Woo", the UTF-8 code units are
// ['🚀', ' ', 'W', 'o', 'o'], so the offset for 'W' would be 2.
UTF8CodeUnitOffsetFromLineStart = 2;
// The 'character' value is interpreted as an offset in terms
// of UTF-16 code units.
//
// Example: For the string "🚀 Woo", the UTF-16 code units are
// ['\ud83d', '\ude80', ' ', 'W', 'o', 'o'], so the offset for 'W'
// would be 3.
UTF16CodeUnitOffsetFromLineStart = 3;
}
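
To make the worked examples in the comments above concrete, here is a minimal Python sketch (not part of this commit) that computes the 'character' value for 'W' in "🚀 Woo" under each encoding. The `character_offset` helper and the Python `PositionEncoding` class are hypothetical illustrations; only the enum names and numbers are copied from the diff. One assumption to note: for `UTF8CodeUnitOffsetFromLineStart` the sketch follows the comment's example, which counts one unit per character rather than one per byte.

```python
from enum import IntEnum


class PositionEncoding(IntEnum):
    # Names and numbers mirror the enum in this diff.
    UnspecifiedPositionEncoding = 0
    UTF8ByteOffsetFromLineStart = 1
    UTF8CodeUnitOffsetFromLineStart = 2
    UTF16CodeUnitOffsetFromLineStart = 3


def character_offset(line: str, index: int, encoding: PositionEncoding) -> int:
    """Return the 'character' value for the code point at line[index].

    Hypothetical helper: `index` is a Python string index, i.e. it counts
    Unicode code points from the start of the line.
    """
    prefix = line[:index]
    if encoding == PositionEncoding.UTF8ByteOffsetFromLineStart:
        # Number of bytes the prefix occupies when encoded as UTF-8.
        return len(prefix.encode("utf-8"))
    if encoding == PositionEncoding.UTF8CodeUnitOffsetFromLineStart:
        # Assumption: as in the comment's example, one unit per code point.
        return len(prefix)
    if encoding == PositionEncoding.UTF16CodeUnitOffsetFromLineStart:
        # Each UTF-16 code unit is two bytes; astral characters take two units.
        return len(prefix.encode("utf-16-le")) // 2
    raise ValueError("position encoding is unspecified")


line = "\U0001F680 Woo"  # "🚀 Woo"
w = line.index("W")
offsets = {e.name: character_offset(line, w, e)
           for e in PositionEncoding if e != PositionEncoding.UnspecifiedPositionEncoding}
print(offsets)
```

The three results differ only because the rocket emoji occupies four UTF-8 bytes, two UTF-16 code units, and one code point; for ASCII-only lines all three encodings agree.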

// Symbol is similar to a URI, it identifies a class, method, or a local
// variable. `SymbolInformation` contains rich metadata about symbols such as
@@ -594,6 +624,9 @@ message Occurrence {
// line/character values before displaying them in an editor-like UI because
// editors conventionally use 1-based numbers.
//
// The 'character' value is interpreted based on the PositionEncoding for
// the Document.
//
// Historical note: the original draft of this schema had a `Range` message
// type with `start` and `end` fields of type `Position`, mirroring LSP.
// Benchmarks revealed that this encoding was inefficient and that we could
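
As a rough illustration of how a consumer might apply the rule that 'character' depends on the Document's `PositionEncoding`, the Python sketch below maps a 0-based 'character' value back to an index into the line text. It is not taken from any SCIP binding: `code_point_index` and the four constants are hypothetical names, with the constants assumed to match the enum numbers in this diff.

```python
# Hypothetical constants, assumed to match the PositionEncoding enum numbers.
UNSPECIFIED, UTF8_BYTES, UTF8_CODE_UNITS, UTF16_CODE_UNITS = 0, 1, 2, 3


def code_point_index(line: str, character: int, position_encoding: int) -> int:
    """Map a 0-based 'character' value to a Python string index into `line`."""
    if position_encoding == UTF8_CODE_UNITS:
        # Assumption: per the schema comment's example, one unit per code point.
        if not 0 <= character <= len(line):
            raise ValueError("offset is outside the line")
        return character
    if position_encoding == UTF8_BYTES:
        def width(ch: str) -> int:
            return len(ch.encode("utf-8"))
    elif position_encoding == UTF16_CODE_UNITS:
        def width(ch: str) -> int:
            return len(ch.encode("utf-16-le")) // 2
    else:
        # The schema asks new indexers not to emit the unspecified value.
        raise ValueError("position encoding is unspecified")

    consumed = 0  # units consumed so far, measured in the chosen encoding
    for i, ch in enumerate(line):
        if consumed == character:
            return i
        if consumed > character:
            break  # the offset points into the middle of a character
        consumed += width(ch)
    if consumed == character:
        return len(line)  # offset just past the last character
    raise ValueError("offset does not fall on a character boundary")


line = "\U0001F680 Woo"  # "🚀 Woo"
print(line[code_point_index(line, 5, UTF8_BYTES)])        # byte offset 5
print(line[code_point_index(line, 3, UTF16_CODE_UNITS)])  # UTF-16 offset 3
```

Because line and character values are 0-based, a consumer would still add 1 to each before showing them in an editor-like UI, as the comment above notes.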