Indexing custom tables for Table Question Answering #4274
-
I have generated many tables from a PDF and stored them as separate CSV files. How do I combine these CSV files and index them into one JSON file using the Elasticsearch document store? I went through Tutorial 15 (Table QA), but it did not show how the tables were combined (indexed) into one tables.json file. Requesting help.
Replies: 2 comments 1 reply
-
Hey @MariaDavid30, you don't need to convert your CSV files into JSON to index them into a DocumentStore. Instead, you can create DataFrames directly from your CSVs using the `pandas` library, and then create Haystack Documents from those DataFrames. One thing to keep in mind: when you create the Documents, you need to set `content_type` to `table` and set `content` to the array of rows of each of your tables.

For example, this is a Document created from one table in the tutorial. `content` is the array of rows of the table, and the first item, ['Opponent', 'M', 'W', 'L', 'T', 'NR', 'Win%', 'First', 'Last'], is the header of the table:

<Document: {'content': [['Opponent', 'M', 'W', 'L', 'T', 'NR', 'Win%', 'First', 'Last'], ['Afghanistan', '2', '2', '0', '0', '0', '100.0', '2012', '2014'], ['Australia', '98', '32', '62', '1', '3', '34.21', '1975', '2017'], ['Bangladesh', '35', '31', '4', '0', '0', '88.57', '1986', '2015'], ['Canada', '2', '2', '0', '0', '0', '100.0', '1979', '2011'], ... , ['Zimbabwe', '59', '52', '4', '1', '2', '92.1', '1992', '2018'], ['Total[12]', '894', '474', '394', '8', '18', '54.56', '1973', '2018']], 'content_type': 'table', 'score': None, 'meta': {}, 'id_hash_keys': ['content'], 'embedding': None, 'id': '1de8c757-2b70-4061-b0ee-b605b839e861'}>
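A minimal sketch of that workflow, in case it helps. It is not taken from the tutorial: `load_csv_tables`, `to_haystack_documents`, and the `source_file` meta key are hypothetical names, and the second function assumes Haystack v1 (`farm-haystack`) with `pandas` installed. The first function uses only the standard-library `csv` module to read every CSV in a folder into a header-first list of rows, the same shape as the `content` in the Document above; the second wraps each table in a DataFrame and a `Document` with `content_type="table"`.

```python
import csv
from pathlib import Path


def load_csv_tables(csv_dir):
    """Map each CSV file name in csv_dir to its rows, header row first."""
    tables = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # Drop blank lines; every remaining row is a list of strings.
            rows = [row for row in csv.reader(handle) if row]
        if rows:  # skip files that contain no rows at all
            tables[path.name] = rows
    return tables


def to_haystack_documents(tables):
    """Wrap each table in a Haystack Document (assumes Haystack v1 + pandas)."""
    # Imported lazily so load_csv_tables works without these packages.
    import pandas as pd
    from haystack import Document

    return [
        Document(
            # First row is the header, the remaining rows are the data.
            content=pd.DataFrame(rows[1:], columns=rows[0]),
            content_type="table",
            meta={"source_file": name},  # hypothetical meta key
        )
        for name, rows in tables.items()
    ]
```

Assuming `document_store` is your Elasticsearch document store and `my_tables/` is the (hypothetical) folder holding the CSVs, something like `document_store.write_documents(to_haystack_documents(load_csv_tables("my_tables/")))` should then index each table as its own Document, with no intermediate tables.json file.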
-
@MariaDavid30, when you then wanted to use your data in the reader, instead of doing this:

    # Try the TableReader on one Table
    table_doc = document_store.get_document_by_id("36964e90-3735-4ba1-8e6a-bec236e88bb2")

what did you use for the variable `table_doc`?