CSV Files
Comma-Separated Values (CSV) is one of the formats SemTK uses for both data ingestion and query responses.
The format is defined by RFC 4180 and is widely supported by tools like Microsoft Excel and by libraries such as Python's csv module and Apache's commons-csv Java library.
While RFC 4180 describes the format in detail, there are some highlights to be aware of when reading or writing raw CSV files:
- Fields containing line breaks (CRLF), double quotes, or commas should be enclosed in double quotes.
- If double quotes are used to enclose a field, then a double quote appearing inside that field must be escaped by preceding it with another double quote.
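The two rules above can be seen in a minimal sketch using Python's csv module. The sample rows and identifiers are made up for illustration and do not come from any RACK data model; the point is only that the standard writer applies the quoting and escaping for you, so there is rarely a need to build CSV text by hand.

```python
import csv
import io

# Hypothetical sample data: one field has quotes and commas, one has a line break.
rows = [
    ["identifier", "description"],
    ["REQ-1", 'The system shall log "critical" events, warnings, and errors.'],
    ["REQ-2", "First line\r\nSecond line"],
]

# newline="" stops Python from translating the CRLF line endings the writer emits.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # default dialect: comma delimiter, minimal quoting
writer.writerows(rows)
text = buffer.getvalue()

# Reading the text back recovers the original field values.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

In the generated text, the REQ-1 description is wrapped in double quotes and its inner quotes are doubled (`""critical""`), while the REQ-2 description is wrapped in double quotes so that its embedded line break is not mistaken for the end of the record.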
The RACK tools expect UTF-8 encoded content. The CSV and JSON files they use are all UTF-8 encoded.
In RACK v7, the CLI tool expected and produced UTF-8 without any leading signature (byte order mark). New in RACK v8, the CLI tool tolerates (but does not require) a leading signature, as generated by some Microsoft software.
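If you process these files with your own scripts, a similar tolerance is easy to get in Python. The following is a sketch only, not the RACK CLI's implementation: `read_csv_bytes` is a hypothetical helper name, and it relies on Python's standard `utf-8-sig` codec, which drops a leading signature when one is present and otherwise decodes as plain UTF-8.

```python
import codecs
import csv
import io


def read_csv_bytes(data: bytes) -> list[list[str]]:
    """Parse UTF-8 CSV bytes, with or without a leading signature (hypothetical helper)."""
    # "utf-8-sig" strips a leading byte order mark if present; otherwise it acts like "utf-8".
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up sample content, encoded once without and once with the signature.
plain = "name,value\r\ncafé,1\r\n".encode("utf-8")
with_signature = codecs.BOM_UTF8 + plain

rows_plain = read_csv_bytes(plain)
rows_signed = read_csv_bytes(with_signature)
```

Both calls yield the same rows. Decoding the second input with plain `utf-8` instead would leave the signature character attached to the first header name, which is a common cause of "missing column" errors.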
Copyright (c) 2021-2024, General Electric Company, Galois, Inc.
All Rights Reserved
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-20-C-0203.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA).
Distribution Statement "A" (Approved for Public Release, Distribution Unlimited)