Merge pull request #8 from sandstrom/auto-detect
Auto-formatting of numbers, dates and times in text
felixbuenemann authored Sep 14, 2017
2 parents f586b38 + b3633f4 commit faa75c6
Showing 8 changed files with 169 additions and 53 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,14 @@
# Changelog

## 0.3.0 (2017-07-12)

- Add support for auto-formatting

## 0.2.0 (2017-02-20)

- Ruby 2.4 compatibility
- Misc bug fixes

## 0.1.0 (2015-10-17)

- Initial release
71 changes: 44 additions & 27 deletions README.md
@@ -1,8 +1,13 @@
# Xlsxtream

Xlsxtream is a streaming writer for XLSX spreadsheets. It supports multiple worksheets and optional string deduplication via a shared string table (SST). Its purpose is to replace CSV for large exports, because using CSV in Excel is very buggy and error prone. It's very efficient and can quickly write millions of rows with low memory usage.
Xlsxtream is a streaming writer for XLSX spreadsheets. It supports multiple worksheets and optional string
deduplication via a shared string table (SST). Its purpose is to replace CSV for large exports, because using
CSV in Excel is very buggy and error prone. It's very efficient and can quickly write millions of rows with
low memory usage.

Xlsxtream does not support formatting, charts, comments and a myriad of other [OOXML](https://en.wikipedia.org/wiki/Office_Open_XML) features. If you are looking for a fully featured solution take a look at [axlsx](https://github.com/randym/axlsx).
Xlsxtream does not support formatting, charts, comments and a myriad of
other [OOXML](https://en.wikipedia.org/wiki/Office_Open_XML) features. If you are looking for a
fully featured solution take a look at [axlsx](https://github.com/randym/axlsx).

Xlsxtream supports writing to files or IO-like objects; data is flushed as the ZIP compressor sees fit.

@@ -25,38 +30,51 @@ Or install it yourself as:
## Usage

```ruby
# Creates a new workbook and closes it at the end of the block.
Xlsxtream::Workbook.open("foo.xlsx") do |xlsx|
xlsx.write_worksheet "Sheet1" do |sheet|
# Creates a new workbook and closes it at the end of the block
Xlsxtream::Workbook.open('my_data.xlsx') do |xlsx|
xlsx.write_worksheet 'Sheet1' do |sheet|
# Date, Time, DateTime and Numeric are properly mapped
sheet << [Date.today, "hello", "world", 42, 3.14159265359, 42**13]
sheet << [Date.today, 'hello', 'world', 42, 3.14159265359, 42**13]
end
end

io = StringIO.new('')
io = StringIO.new
xlsx = Xlsxtream::Workbook.new(io)
xlsx.write_worksheet "Sheet1" do |sheet|
# Number of columns doesn't have to match
sheet << %w[first row]
sheet << %w[second row with more columns]

# Number of columns doesn't have to match
xlsx.write_worksheet 'Sheet1' do |sheet|
sheet << ['first', 'row']
sheet << ['second', 'row', 'with', 'more columns']
end
# Write multiple worksheets with custom names:
xlsx.write_worksheet "Foo & Bar" do |sheet|
sheet.add_row ["Timestamp", "Comment"]
sheet.add_row [Time.now, "Foo"]
sheet.add_row [Time.now, "Bar"]

# Write multiple worksheets with custom names
xlsx.write_worksheet 'AppendixSheet' do |sheet|
sheet.add_row ['Timestamp', 'Comment']
sheet.add_row [Time.now, 'Good times']
sheet.add_row [Time.now, 'Time-machine']
end
# If you have highly repetitive data, you can enable Shared
# String Tables (SST) for the workbook or a single worksheet.
# The SST has to be kept in memory, so don't use it if you
# have a huge amount of rows or a little duplication of content
# accros cells. A single SST is used across the whole workbook.
xlsx.write_worksheet("SST", use_shared_strings: true) do |sheet|
sheet << %w[the same old story]
sheet << %w[the old same story]
sheet << %w[old, the same story]

# If you have highly repetitive data, you can enable Shared String Tables (SST)
# for the workbook or a single worksheet. The SST has to be kept in memory,
# so do not use it if you have a huge amount of rows or a little duplication
# of content across cells. A single SST is used for the whole workbook.
xlsx.write_worksheet('SheetWithSST', :use_shared_strings => true) do |sheet|
sheet << ['the', 'same', 'old', 'story']
sheet << ['the', 'old', 'same', 'story']
sheet << ['old', 'the', 'same', 'story']
end
# Writes metadata and ZIP archive central directory.

# Strings in numeric or date/time format can be auto-detected and formatted
# appropriately. This is a convenient way to avoid an Excel-warning about
# "Number stored as text". Dates and times must be in the ISO-8601 format and
# numeric values must contain only numbers and an optional decimal separator.
xlsx.write_worksheet('SheetWithAutoFormat', :auto_format => true) do |sheet|
# these two rows will be identical in the xlsx-output
sheet << [11.85, DateTime.parse('2050-01-01T12:00'), Date.parse('1984-01-01')]
sheet << ['11.85', '2050-01-01T12:00', '1984-01-01']
end

# Writes metadata and ZIP archive central directory
xlsx.close
```
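The auto-detection described in the usage comments can be tried outside the gem. The sketch below copies the three patterns from `lib/xlsxtream/row.rb` as they appear in this commit; the `detect` helper name is hypothetical and is not part of the gem's public API.

```ruby
require 'date'

# Patterns copied from lib/xlsxtream/row.rb in this commit.
NUMBER_PATTERN = /\A-?[0-9]+(\.[0-9]+)?\z/
DATE_PATTERN   = /\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/
TIME_PATTERN   = /\A[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]{1,9})?)?(?:Z|[+-][0-9]{2}:[0-9]{2})?\z/

# Hypothetical helper: casts a string to a number, Date or DateTime
# when it matches one of the patterns, otherwise returns it unchanged.
def detect(value)
  case value
  when NUMBER_PATTERN then value.include?('.') ? value.to_f : value.to_i
  when DATE_PATTERN   then Date.parse(value)
  when TIME_PATTERN   then DateTime.parse(value)
  else value
  end
end

samples = ['11.85', '42', '1984-01-01', '2050-01-01T12:00', '1,5', '1900-01-01T12']
samples.each do |sample|
  # Print the class each sample is cast to.
  puts "#{sample.inspect} => #{detect(sample).class}"
end
```

The last two samples stay strings: `'1,5'` uses a comma as the decimal separator, and `'1900-01-01T12'` lacks the minutes that `TIME_PATTERN` requires. Note that the patterns only check the shape of the text, so a string such as `'2017-13-45'` matches `DATE_PATTERN` and `Date.parse` then raises an error for the invalid date.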

@@ -74,4 +92,3 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/felixb
## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).

74 changes: 55 additions & 19 deletions lib/xlsxtream/row.rb
@@ -4,46 +4,82 @@

module Xlsxtream
class Row
def initialize(row, rownum, sst = nil)

ENCODING = Encoding.find('UTF-8')

NUMBER_PATTERN = /\A-?[0-9]+(\.[0-9]+)?\z/.freeze
# ISO 8601 yyyy-mm-dd
DATE_PATTERN = /\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/.freeze
# ISO 8601 yyyy-mm-ddThh:mm:ss(.s)(Z|+hh:mm|-hh:mm)
TIME_PATTERN = /\A[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]{1,9})?)?(?:Z|[+-][0-9]{2}:[0-9]{2})?\z/.freeze

DATE_STYLE = 1
TIME_STYLE = 2

def initialize(row, rownum, options = {})
@row = row
@rownum = rownum
@sst = sst
@encoding = Encoding.find("UTF-8")
@sst = options[:sst]
@auto_format = options[:auto_format]
end

def to_xml
column = 'A'
@row.reduce(%'<row r="#@rownum">') do |xml, value|
cid = "#{column}#@rownum"
xml = %Q{<row r="#{@rownum}">}

@row.each do |value|
cid = "#{column}#{@rownum}"
column.next!
xml << case value

if @auto_format && value.is_a?(String)
value = auto_format(value)
end

case value
when Numeric
%'<c r="#{cid}" t="n"><v>#{value}</v></c>'
when DateTime, Time
%'<c r="#{cid}" s="2"><v>#{time_to_oa_date value}</v></c>'
xml << %Q{<c r="#{cid}" t="n"><v>#{value}</v></c>}
when Time, DateTime
xml << %Q{<c r="#{cid}" s="#{TIME_STYLE}"><v>#{time_to_oa_date(value)}</v></c>}
when Date
%'<c r="#{cid}" s="1"><v>#{time_to_oa_date value}</v></c>'
xml << %Q{<c r="#{cid}" s="#{DATE_STYLE}"><v>#{time_to_oa_date(value)}</v></c>}
else
value = value.to_s unless value.is_a? String
if value.empty?
''
else
value = value.encode(@encoding) if value.encoding != @encoding
value = value.to_s

unless value.empty? # no xml output for empty strings
value = value.encode(ENCODING) if value.encoding != ENCODING

if @sst
%'<c r="#{cid}" t="s"><v>#{@sst[value]}</v></c>'
xml << %Q{<c r="#{cid}" t="s"><v>#{@sst[value]}</v></c>}
else
%'<c r="#{cid}" t="inlineStr"><is><t>#{XML.escape_value value}</t></is></c>'
xml << %Q{<c r="#{cid}" t="inlineStr"><is><t>#{XML.escape_value(value)}</t></is></c>}
end
end
end
end << '</row>'
end

xml << '</row>'
end

private

# Detects and casts numbers, date, time in text
def auto_format(value)
case value
when NUMBER_PATTERN
value.include?('.') ? value.to_f : value.to_i
when DATE_PATTERN
Date.parse(value)
when TIME_PATTERN
DateTime.parse(value)
else
value
end
end

# Converts Time objects to OLE Automation Date
def time_to_oa_date(time)
time = time.respond_to?(:to_time) ? time.to_time : time
time = time.to_time if time.respond_to?(:to_time)

# Local dates are stored as UTC by truncating the offset:
# 1970-01-01 00:00:00 +0200 => 1970-01-01 00:00:00 UTC
# This is done because SpreadsheetML is not timezone aware.
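The body of `time_to_oa_date` is cut off in this hunk. As a rough illustration of the conversion the comments describe, the sketch below computes an OLE Automation date (fractional days since 1899-12-30) while keeping the local wall-clock reading. The `oa_date` name and both constants are assumptions made for this example and may not match the gem's actual implementation.

```ruby
require 'date'

# Days between the OLE Automation epoch (1899-12-30) and the Unix epoch.
OA_UNIX_EPOCH_OFFSET = 25_569
SECONDS_PER_DAY = 86_400.0

# Hypothetical stand-in for the truncated method; not the gem's code.
def oa_date(time)
  time = time.to_time if time.respond_to?(:to_time)
  # Adding the UTC offset keeps the wall-clock reading when switching to UTC,
  # e.g. 1970-01-01 00:00:00 +0200 becomes 1970-01-01 00:00:00 UTC.
  wall_clock_as_utc = (time + time.utc_offset).utc
  wall_clock_as_utc.to_f / SECONDS_PER_DAY + OA_UNIX_EPOCH_OFFSET
end

puts oa_date(Time.utc(1900, 1, 1, 12))                 # => 2.5
puts oa_date(Time.new(1970, 1, 1, 0, 0, 0, '+02:00'))  # => 25569.0
```

The first value agrees with the `2.5` expected by the date-time tests in this commit; the second shows the offset being dropped rather than converted.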
2 changes: 1 addition & 1 deletion lib/xlsxtream/version.rb
@@ -1,3 +1,3 @@
module Xlsxtream
VERSION = "0.2.0"
VERSION = "0.3.0"
end
7 changes: 6 additions & 1 deletion lib/xlsxtream/workbook.rb
@@ -37,12 +37,17 @@ def initialize(data = nil, options = {})

def write_worksheet(name = nil, options = {})
use_sst = options.fetch(:use_shared_strings, @options[:use_shared_strings])
auto_format = options.fetch(:auto_format, @options[:auto_format])
sst = use_sst ? @sst : nil

name ||= "Sheet#{@worksheets.size + 1}"
sheet_id = @worksheets[name]
@io.add_file "xl/worksheets/sheet#{sheet_id}.xml"
worksheet = Worksheet.new(@io, use_sst ? @sst : nil)

worksheet = Worksheet.new(@io, :sst => sst, :auto_format => auto_format)
yield worksheet if block_given?
worksheet.close

nil
end
alias_method :add_worksheet, :write_worksheet
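The two `options.fetch` calls in `write_worksheet` let a per-worksheet option override the workbook-wide default. A minimal sketch of that lookup follows; `resolve_option` is a hypothetical helper written for this example, not a method of the gem.

```ruby
# Hypothetical helper mirroring the lookup in write_worksheet:
# use the worksheet's value if the key is present, else the workbook's.
def resolve_option(key, worksheet_options, workbook_options)
  worksheet_options.fetch(key, workbook_options[key])
end

workbook_options = { :auto_format => true }

p resolve_option(:auto_format, {}, workbook_options)                         # => true
p resolve_option(:auto_format, { :auto_format => false }, workbook_options)  # => false
p resolve_option(:use_shared_strings, {}, workbook_options)                  # => nil
```

Using `fetch` rather than `||` matters for the second call: an explicit `false` on the worksheet wins over a `true` workbook default.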
7 changes: 4 additions & 3 deletions lib/xlsxtream/worksheet.rb
@@ -4,15 +4,16 @@

module Xlsxtream
class Worksheet
def initialize(io, sst = nil)
def initialize(io, options = {})
@io = io
@rownum = 1
@sst = sst
@options = options

write_header
end

def <<(row)
@io << Row.new(row, @rownum, @sst).to_xml
@io << Row.new(row, @rownum, @options).to_xml
@rownum += 1
end
alias_method :add_row, :<<
46 changes: 44 additions & 2 deletions test/xlsxtream/row_test.rb
@@ -31,27 +31,69 @@ def test_integer_column
assert_equal expected, actual
end

def test_text_integer_column
row = Row.new(['1'], 1, :auto_format => true)
actual = row.to_xml
expected = '<row r="1"><c r="A1" t="n"><v>1</v></c></row>'
assert_equal expected, actual
end

def test_float_column
row = Row.new([1.5], 1)
actual = row.to_xml
expected = '<row r="1"><c r="A1" t="n"><v>1.5</v></c></row>'
assert_equal expected, actual
end

def test_date_column_oa_conversion
def test_text_float_column
row = Row.new(['1.5'], 1, :auto_format => true)
actual = row.to_xml
expected = '<row r="1"><c r="A1" t="n"><v>1.5</v></c></row>'
assert_equal expected, actual
end

def test_date_column
row = Row.new([Date.new(1900, 1, 1)], 1)
actual = row.to_xml
expected = '<row r="1"><c r="A1" s="1"><v>2.0</v></c></row>'
assert_equal expected, actual
end

def test_text_date_column
row = Row.new(['1900-01-01'], 1, :auto_format => true)
actual = row.to_xml
expected = '<row r="1"><c r="A1" s="1"><v>2.0</v></c></row>'
assert_equal expected, actual
end

def test_date_time_column
row = Row.new([DateTime.new(1900, 1, 1, 12, 0, 0, '+00:00')], 1)
actual = row.to_xml
expected = '<row r="1"><c r="A1" s="2"><v>2.5</v></c></row>'
assert_equal expected, actual
end

def test_text_date_time_column
candidates = [
'1900-01-01T12:00',
'1900-01-01T12:00Z',
'1900-01-01T12:00+00:00',
'1900-01-01T12:00:00+00:00',
'1900-01-01T12:00:00.000+00:00',
'1900-01-01T12:00:00.000000000Z'
]
candidates.each do |timestamp|
row = Row.new([timestamp], 1, :auto_format => true)
actual = row.to_xml
expected = '<row r="1"><c r="A1" s="2"><v>2.5</v></c></row>'
assert_equal expected, actual
end
row = Row.new(['1900-01-01T12'], 1, :auto_format => true)
actual = row.to_xml
expected = '<row r="1"><c r="A1" s="2"><v>2.5</v></c></row>'
refute_equal expected, actual
end

def test_time_column
row = Row.new([Time.new(1900, 1, 1, 12, 0, 0, '+00:00')], 1)
actual = row.to_xml
@@ -61,7 +103,7 @@ def test_time_column

def test_string_column_with_shared_string_table
mock_sst = { 'hello' => 0 }
row = Row.new(['hello'], 1, mock_sst)
row = Row.new(['hello'], 1, :sst => mock_sst)
expected = '<row r="1"><c r="A1" t="s"><v>0</v></c></row>'
actual = row.to_xml
assert_equal expected, actual
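The `mock_sst` hash in this test maps a string to its index, which is all `Row` needs from a shared string table. How the gem's real table assigns indexes is not shown in this diff; the sketch below assumes the simplest scheme, where each distinct string receives the next free index on first lookup.

```ruby
# Assumed behaviour: a hash whose default block assigns the next free
# index to every string it has not seen before.
sst = Hash.new { |table, string| table[string] = table.size }

rows = [%w[the same old story], %w[the old same story]]

# Replace every cell with its index in the table.
indexes = rows.map { |row| row.map { |cell| sst[cell] } }

p indexes   # => [[0, 1, 2, 3], [0, 2, 1, 3]]
p sst.size  # => 4
```

Eight cells are stored as four distinct strings, which is the saving the README describes for repetitive data; the cost is that the table stays in memory for the whole workbook.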
1 change: 1 addition & 0 deletions xlsxtream.gemspec
@@ -18,6 +18,7 @@ Gem::Specification.new do |spec|
spec.bindir = "exe"
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
spec.require_paths = ["lib"]
spec.required_ruby_version = '>= 1.9.1'

spec.add_dependency "rubyzip", ">= 1.0.0"
