Often I find myself wanting more details about the individual cells than just their values.
e.g.
Some HTML cells contain more than a single value, and this can require additional parsing to determine the true value. For example,
421
is naively converted to 421 at the moment; in order to do this additional processing I require the HTML source of the cell.
Some formats (Excel, HTML, etc.) support additional formatting, e.g. bold text, font colour and background colour. It would be good to leave room for supporting these in future.
But we don't want to write enormous amounts of code to cover every use case, especially where a feature is limited to one or two formats. By exposing the internals of the library that parses the file (e.g. lxml's internal representation of the cell), we can let people interrogate this data without hacking on messytables directly.
So: I propose adding a "properties" attribute to messytables Cells. It would be a dictionary whose keys depend entirely on the helper library.
Currently, I:
expose internals via "_lxml", "_xlrd", "_pyxl"
expose raw HTML for the cell via "html"
expose whether a cell was spanned via "span" (HTML only so far)
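A minimal sketch of what the proposed attribute could look like, for illustration only. It does not use messytables or lxml: the `Cell` class and the `cells_from_html_row` helper are hypothetical, Python's standard `xml.etree.ElementTree` stands in for lxml (so the internals key is `"_etree"` rather than `"_lxml"`), and "span" is assumed here to mean a `colspan` greater than 1.

```python
import xml.etree.ElementTree as ET


class Cell:
    """Hypothetical cell: a plain value plus a format-specific properties dict."""

    def __init__(self, value, properties=None):
        self.value = value
        # Which keys exist depends entirely on the parser that produced the cell.
        self.properties = properties if properties is not None else {}


def cells_from_html_row(tr):
    """Hypothetical helper: build Cells from a <tr> element, exposing parser internals."""
    cells = []
    for td in tr:
        if td.tag not in ("td", "th"):
            continue
        # The naive value: all text inside the cell, concatenated.
        text = "".join(td.itertext()).strip()
        props = {
            "_etree": td,  # the parser's own element (stand-in for "_lxml")
            "html": ET.tostring(td, encoding="unicode"),  # raw source of the cell
            "span": int(td.get("colspan", "1")) > 1,  # assumed meaning of "spanned"
        }
        cells.append(Cell(text, props))
    return cells


row = ET.fromstring('<tr><td colspan="2"><b>Total</b></td><td>42</td></tr>')
for cell in cells_from_html_row(row):
    print(cell.value, cell.properties["span"], cell.properties["html"])
```

A caller who needs more than the naive value can re-parse `properties["html"]` or walk the element directly, without messytables having to anticipate every case.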
Does this sound like a good idea / terrible idea?
In practice, I imagine having a dict of data on every single cell would be pretty inefficient. When iterating over rows, it is also likely to stop those items (references to the underlying library's representation of each cell) from being garbage-collected if the caller keeps a reference to what they thought was just some text and a type object, isn't it?
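The garbage-collection concern can be shown with a small, self-contained sketch. `ParsedElement` is a hypothetical stand-in for a parser library's per-cell object, and the `Cell` class is illustrative rather than messytables' own; a `weakref` reveals whether the element is still alive while the caller holds only the cell.

```python
import gc
import weakref


class ParsedElement:
    """Hypothetical stand-in for a parser library's per-cell object."""

    def __init__(self, text):
        self.text = text


class Cell:
    """Illustrative cell that keeps a reference to the parser's object."""

    def __init__(self, value, properties):
        self.value = value
        self.properties = properties


def parse_cell():
    element = ParsedElement("421")
    # The cell stores the element in properties; the caller gets a weak ref to watch it.
    return Cell(element.text, {"_internal": element}), weakref.ref(element)


cell, element_ref = parse_cell()
gc.collect()
# The caller only wanted the text, but the properties dict pins the element in memory.
still_alive = element_ref() is not None

cell.properties.clear()  # drop the only strong reference to the element
gc.collect()
freed = element_ref() is None

print(still_alive, freed)
```

Keeping one cell around therefore keeps its parser object around as well, multiplied by however many cells a caller retains.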
Random thought: if this is currently only useful for HTML tables, what do you think of supporting it only for HTML tables and making it a no-op for other formats until a use case presents itself?
I'd drastically underestimated the memory usage of those dicts: even an empty {} per cell adds about 50% on a CSV file. A different approach is probably more appropriate.
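One possible alternative, sketched under assumptions rather than taken from messytables: give a (hypothetical) cell class `__slots__`, so instances carry no per-instance `__dict__`, and store a properties dict only when a parser actually supplies one. Every other cell returns one shared, read-only empty mapping, which also gives the "no-op for other formats" behaviour suggested in the previous comment. The `value`, `column` and `type` fields are assumptions for illustration.

```python
from types import MappingProxyType

# One shared, read-only empty mapping returned by every cell without properties.
_NO_PROPERTIES = MappingProxyType({})


class Cell:
    """Hypothetical slotted cell: no per-instance __dict__, properties on demand."""

    __slots__ = ("value", "column", "type", "_properties")

    def __init__(self, value, column=None, type=None, properties=None):
        self.value = value
        self.column = column
        self.type = type
        # Only keep a dict when a parser actually has something to expose.
        self._properties = properties if properties else None

    @property
    def properties(self):
        if self._properties is None:
            return _NO_PROPERTIES
        return self._properties


csv_cell = Cell("a")  # e.g. from a CSV reader: nothing allocated for properties
html_cell = Cell("421", properties={"html": "<td>421</td>"})  # e.g. from an HTML reader
print(len(csv_cell.properties), html_cell.properties["html"])
```

Callers can always write `cell.properties.get("html")` regardless of format, while formats that expose nothing pay only for one extra slot per cell.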