XML::Mixup: A mixin for XML markup

require 'xml-mixup'

class Anything
  include XML::Mixup
end

something = Anything.new

# generate a structure
node = something.markup spec: [
  { '#pi'   => 'xml-stylesheet', type: 'text/xsl', href: '/transform' },
  { '#dtd'  => :html },
  { '#html' => [
    { '#head' => [
      { '#title' => 'look ma, title' },
      { '#elem'  => :base, href: 'http://the.base/url' },
    ] },
    { '#body' => [
      { '#h1' => 'Illustrious Heading' },
      { '#p'  => :lolwut },
    ] },
  ], xmlns: 'http://www.w3.org/1999/xhtml' }
]

# `node` will correspond to the last thing generated. In this
# case, it will be a text node containing 'lolwut'.

doc = node.document
puts doc.to_xml
# => <?xml version="1.0"?>
# => <?xml-stylesheet href="/transform" type="text/xsl"?>
# => <!DOCTYPE html>
# => <html xmlns="http://www.w3.org/1999/xhtml">
# =>   <head>
# =>     <title>look ma, title</title>
# =>     <base href="http://the.base/url"/>
# =>   </head>
# =>   <body>
# =>     <h1>Illustrious Heading</h1>
# =>     <p>lolwut</p>
# =>   </body>
# => </html>

Yet another XML markup generator?

Some time ago, I wrote a Perl module called Role::Markup::XML. I did this because I had a lot of XML to generate, and was dissatisfied with what was currently on offer. Now I have a lot of XML to generate using Ruby, and found a lot of the same things:

Structure is generated by procedure calls

Granted it's a lot nicer to do this sort of thing in Ruby, but at the end of the day, the thing generating the XML is a nested list of method calls — not a declarative data structure.

Document has to be generated all in one shot

It's not super-easy to generate a piece of the target document and then go back and generate some more (although Nokogiri::XML::Builder.with is a nice start). This plus the last point leads to all sorts of cockamamy constructs which are almost as life-sucking as writing raw DOM routines.

Hard to do surgery on existing documents

This comes up a lot: you have an existing document and you want to add even just a single node to it — say, in between two nodes just for fun. Good luck with that.

Enter XML::Mixup

  • The input consists of ordinary Ruby data objects so you can build them up ahead of time, in bulk, transform them using familiar operations, etc.,
  • Sprinkle pre-built XML subtrees anywhere into the spec so you can memoize repeating elements, or otherwise compile a document incrementally,
  • Attach new generated content anywhere: underneath a parent node, or before, after, or instead of a node at the sibling level.
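
As a rough sketch of the first point, the fragment below assembles a spec from a plain array of records using ordinary `map`. The `links` data and the `nav_spec` helper are invented for illustration; the only thing assumed about XML::Mixup is the hash convention shown in this README.

```ruby
# Hypothetical input data; nothing here comes from XML::Mixup itself.
links = [
  { href: '/about',   label: 'About' },
  { href: '/contact', label: 'Contact' },
]

# Build a spec fragment with plain Ruby: one <li><a href="...">label</a></li>
# per record, wrapped in a <ul>. (nav_spec is an illustrative helper.)
def nav_spec(links)
  items = links.map do |link|
    { '#li' => { '#a' => link[:label], href: link[:href] } }
  end
  { '#ul' => items }
end

spec = nav_spec(links)

# The result is just nested hashes and arrays, so it can be inspected,
# filtered, or merged before being handed to `markup spec: spec`.
```

Because the spec is plain data, it can be built once, cached, and spliced into a larger spec later.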

The tree spec format

At the heart of this module is a single method called markup, which, among other things, takes a :spec. The spec can be any composite of these objects, and will behave as described:

Hashes

The principal construct in XML::Mixup is the Hash. You can generate pretty much any node with it:

Elements

{ '#tag' => 'foo' }                 # => <foo/>

# or, with the element name as a symbol
{ '#element' => :foo }              # => <foo/>

# or
{ '#elem' => 'foo' }                # => <foo/>

# or, with nil as a key
{ nil => :foo }                     # => <foo/>

# or, with attributes
{ nil => :foo, bar: :hi }           # => <foo bar="hi"/>

# or, with namespaces
{ nil => :foo, xmlns: 'urn:x-bar' } # => <foo xmlns="urn:x-bar"/>

# or, with more namespaces
{ nil => :foo, xmlns: 'urn:x-bar', 'xmlns:hurr' => 'urn:x-durr' }
# => <foo xmlns="urn:x-bar" xmlns:hurr="urn:x-durr"/>

# or, with content
{ nil => [:foo, :hi] }              # => <foo>hi</foo>

# or, shove your child nodes into an otherwise content-less key
{ [:hi] => :foo, bar: :hurr }       # => <foo bar="hurr">hi</foo>

# or, if you have content and the element name is not a reserved word
{ '#html' => { '#head' => { '#title' => :hi } } }
# => <html><head><title>hi</title></head></html>

# also works with namespaces
{ '#atom:feed' => nil, 'xmlns:atom' => 'http://www.w3.org/2005/Atom' }
# => <atom:feed xmlns:atom="http://www.w3.org/2005/Atom"/>

Reserved hash keywords are: #comment, #cdata, #doctype, #dtd, #elem, #element, #pi, #processing-instruction, #tag. Note that the constructs { nil => :foo }, { nil => 'foo' }, and { '#foo' => nil }, plus [] anywhere you see nil, are all equivalent.

Attributes are sorted lexically. Composite attribute values get flattened like this:

{ nil => :foo, array: [:a, :b], hash: { e: :f, c: :d } }
# => <foo array="a b" hash="c: d e: f"/>

Note that an attribute value can also be a Proc, which is fed arbitrary arguments from the markup method. The Proc is expected to return something which can subsequently be flattened. If an attribute value is nil or ultimately resolves to nil, or an empty Array or Hash, that attribute will be omitted. nil values in arrays or hashes will also be skipped, as will empty-string values in arrays. This is different behaviour from versions prior to 0.1.10, where nil (or, e.g., []) would produce an attribute containing the empty string.

This change was made to eliminate a lot of clunky logic in application code to determine whether or not to include a given attribute. If you need to render attributes explicitly with empty strings, then explicitly pass in the empty string.
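
The flattening rules above can be approximated in a few lines of plain Ruby. This is only an illustrative re-implementation, not the module's own code: `flatten_attr_sketch` is a hypothetical name, it ignores Proc values and nesting, and it returns nil where the attribute would be omitted.

```ruby
# Illustrative sketch of the attribute-flattening rules described in the text.
# flatten_attr_sketch is a hypothetical helper, not part of XML::Mixup's API.
def flatten_attr_sketch(value)
  case value
  when nil
    nil # nil means "omit the attribute"
  when Hash
    # Skip nil values, sort by key, render each pair as "key: value".
    pairs = value.reject { |_, v| v.nil? }
                 .map { |k, v| [k.to_s, v.to_s] }
                 .sort
                 .map { |k, v| "#{k}: #{v}" }
    pairs.empty? ? nil : pairs.join(' ')
  when Array
    # Skip nils and empty strings, join the rest with spaces.
    parts = value.compact.map(&:to_s).reject(&:empty?)
    parts.empty? ? nil : parts.join(' ')
  else
    value.to_s
  end
end

flatten_attr_sketch([:a, :b])         # => "a b"
flatten_attr_sketch({ e: :f, c: :d }) # => "c: d e: f"
flatten_attr_sketch([nil, '', :x])    # => "x"
flatten_attr_sketch([])               # => nil (attribute omitted)
flatten_attr_sketch('')               # => "" (explicit empty string is kept)
```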

Processing instructions

{ '#pi' => 'xml-stylesheet', type: 'text/xsl', href: '/transform' }
# => <?xml-stylesheet href="/transform" type="text/xsl"?>

# or, if you like typing
{ '#processing-instruction' => :hurr } # => <?hurr?>

DOCTYPE declarations

{ '#dtd' => :html } # => <!DOCTYPE html>

# or (note either :public or :system can be nil)
{ '#dtd' => [:html, :public, :system] }
# => <!DOCTYPE html PUBLIC "public" "system">

# or, same thing
{ '#doctype' => :html, public: :public, system: :system }

Comments and CDATA sections

Comments and CDATA are flattened into string literals:

{ '#comment' => :whatever }     # => <!-- whatever -->

{ '#cdata' => '<what-everrr>' } # => <![CDATA[<what-everrr>]]>

Pretty straightforward?

Arrays

Parts of a spec that are arrays (or really anything that can be turned into one) are attached at the same level of the document in the sequence given, as you might expect.

Nokogiri::XML::Node objects

These are automatically cloned, but otherwise passed in as-is.

Procs, lambdas etc.

These are executed with any supplied :args, and then markup is run again over the result. (Take care not to supply a Proc that produces another Proc.)
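
The call-then-reprocess step can be pictured with a small stand-in written in plain Ruby. The `expand` function below is hypothetical and covers only Proc resolution (it generates no XML), so treat it as a sketch of the idea rather than of the module's internals.

```ruby
# Hypothetical stand-in for the Proc step only: call the Proc with the
# supplied args, then process whatever comes back in the same way.
def expand(spec, args = [])
  case spec
  when Proc  then expand(spec.call(*args), args)
  when Array then spec.map { |s| expand(s, args) }
  when Hash  then spec.transform_values { |v| expand(v, args) }
  else spec
  end
end

# A spec fragment that defers its content until the args are known.
greeting = ->(name) { { '#p' => "Hello, #{name}" } }

expanded = expand([{ '#h1' => 'Welcome' }, greeting], ['world'])
# => [{ "#h1" => "Welcome" }, { "#p" => "Hello, world" }]
```

With the real module, the equivalent would presumably be passing the lambda inside the :spec and the values via :args.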

Everything else

Turned into a text node.

Documentation

Generated and deposited in the usual place.

Installation

Come on, you know how to do this:

$ gem install xml-mixup

Or, download it off rubygems.org.

Contributing

Bug reports and pull requests are welcome at the GitHub repository.

The Future

As mentioned, this is pretty much a straight-across port of Role::Markup::XML, where it makes sense in Perl to bolt a bunch of related pseudo-private _FOO-looking instance methods onto an object so you can use them to make more streamlined methods. This may or may not make the same kind of sense with Ruby.

In particular, these methods do not touch the calling object's state. In fact they should be completely stateless and side-effect free. Likewise, they are really meant to be private. As such, it may make sense to simply bundle them as class methods and use them as such. I don't know, I haven't decided yet.

License

This software is provided under the Apache License, 2.0.