How can I interact with the facts stored in inst? #4
This is a brilliant project and the only library I have found useful for parsing EDGAR XBRL comprehensively. I am having trouble extracting numerical data from the facts. Is there a way to convert the facts stored in inst into a structured format, e.g. a pandas DataFrame? Many thanks.
Answered by
manusimidt
Mar 18, 2021
Replies: 2 comments
Answer selected by
manusimidt
This is a quick-and-dirty approach I came up with:

import functools
import logging
import pandas as pd
from xbrl_parser.cache import HttpCache
from xbrl_parser.instance import XbrlInstance, parse_xbrl_url

DELIMITER = "."
def rgetattr(obj, path: str, default=None):
    """
    Resolve a dotted attribute path on an object.

    :param obj: Object to start from
    :param path: 'attr1.attr2.etc'
    :param default: Value returned if any attribute along the path is missing
    :return: obj.attr1.attr2.etc, or default
    """
    attrs = path.split(DELIMITER)
    try:
        return functools.reduce(getattr, attrs, obj)
    except AttributeError:
        return default

logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./cache')
# Replace the dummy header with your information!!
# services like SEC EDGAR require you to disclose information about your bot! (https://www.sec.gov/privacy.htm#security)
cache.set_headers({'From': '[email protected]', 'User-Agent': 'Tool/Version (Website)'})
xbrl_url: str = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926_htm.xml'
inst: XbrlInstance = parse_xbrl_url(xbrl_url, cache)

def get_df(inst):
    # Column order of the resulting DataFrame
    columns = [
        'entity',
        'value',
        'unit_id',
        'numerator',
        'denominator',
        'unit',
        'decimals',
        'concept.xml_id',
        'concept.schema_url',
        'concept.name',
        'concept.substitution_group',
        'concept.concept_type',
        'concept.abstract',
        'concept.nillable',
        'concept.period_type',
        'concept.balance',
        'context.xml_id',
        'context.entity',
        'context.instant_date',
        'context.start_date',
        'context.end_date',
        'metadata',
    ]
    rows = []
    for fact in inst.facts:
        # rgetattr returns None for attributes a given fact does not have
        # (e.g. numerator/denominator on a simple unit)
        row = {
            'value': rgetattr(fact, 'value'),
            'entity': rgetattr(fact, 'context.entity'),
            'unit_id': rgetattr(fact, 'unit.unit_id'),
            'unit': rgetattr(fact, 'unit.unit'),
            'numerator': rgetattr(fact, 'unit.numerator'),
            'denominator': rgetattr(fact, 'unit.denominator'),
            'decimals': rgetattr(fact, 'decimals'),
            'concept.xml_id': rgetattr(fact, 'concept.xml_id'),
            'concept.schema_url': rgetattr(fact, 'concept.schema_url'),
            'concept.name': rgetattr(fact, 'concept.name'),
            'concept.substitution_group': rgetattr(fact, 'concept.substitution_group'),
            'concept.concept_type': rgetattr(fact, 'concept.concept_type'),
            'concept.abstract': rgetattr(fact, 'concept.abstract'),
            'concept.nillable': rgetattr(fact, 'concept.nillable'),
            'concept.period_type': rgetattr(fact, 'concept.period_type'),
            'concept.balance': rgetattr(fact, 'concept.balance'),
            'context.xml_id': rgetattr(fact, 'context.xml_id'),
            'context.entity': rgetattr(fact, 'context.entity'),
            'context.instant_date': rgetattr(fact, 'context.instant_date'),
            'context.start_date': rgetattr(fact, 'context.start_date'),
            'context.end_date': rgetattr(fact, 'context.end_date'),
            # Number of segments attached to the fact's context
            'metadata': len(fact.context.segments),
        }
        rows.append(row)
    return pd.DataFrame(rows, columns=columns)

df = get_df(inst)
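
To make the missing-attribute behaviour concrete, here is a minimal sketch that runs without the library. The `SimpleNamespace` objects and their values are made-up stand-ins, not real parsed facts, and the helper is redefined here only so the snippet runs on its own. A fact whose unit has no `numerator` just yields `None` for that lookup instead of raising:

```python
import functools
from types import SimpleNamespace


def rgetattr(obj, path, default=None):
    """Follow a dotted attribute path; return default if any step is missing."""
    try:
        return functools.reduce(getattr, path.split("."), obj)
    except AttributeError:
        return default


# Hypothetical stand-in for a parsed fact (illustration only)
fact = SimpleNamespace(
    value=100.0,
    unit=SimpleNamespace(unit_id="usd", unit="iso4217:USD"),
)

unit_id = rgetattr(fact, "unit.unit_id")               # attribute exists
numerator = rgetattr(fact, "unit.numerator")           # missing on this unit
entity = rgetattr(fact, "context.entity", "n/a")       # whole branch missing

print(unit_id, numerator, entity)
```

The trade-off is that a misspelled path also comes back as `None` without any error, so it is worth checking that a column is not empty for every row.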
Sorry for the late answer. I got no notification about your question for some reason...
Thank you!
I would just loop over the facts array of the instance object, i.e.:
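
A minimal sketch of such a loop, assuming each fact exposes `concept.name` and `value` as in the DataFrame code earlier in this thread. The `Instance`, `Fact` and `Concept` classes and their values below are hypothetical stand-ins so the snippet runs on its own; with the real library, `inst` would come from `parse_xbrl_url(...)`:

```python
from dataclasses import dataclass
from typing import Any, List


# Hypothetical stand-ins for the parsed objects (illustration only)
@dataclass
class Concept:
    name: str


@dataclass
class Fact:
    concept: Concept
    value: Any


@dataclass
class Instance:
    facts: List[Fact]


inst = Instance(facts=[
    Fact(Concept("Assets"), 100.0),
    Fact(Concept("EntityRegistrantName"), "Example Corp"),
    Fact(Concept("Liabilities"), 60.0),
])

# Loop over the facts and keep only those with a numeric value
numeric_rows = []
for fact in inst.facts:
    is_number = isinstance(fact.value, (int, float)) and not isinstance(fact.value, bool)
    if is_number:
        numeric_rows.append((fact.concept.name, fact.value))

print(numeric_rows)
```

A list of tuples is used rather than a dict keyed by concept name because the same concept usually appears several times in a filing, once per reporting context.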