How can I interact with the facts stored in inst? #4

cjs365 · 2021-02-18T07:58:48Z


This is a brilliant project and the only library I have found useful for parsing EDGAR XBRL comprehensively. Now I'm having trouble extracting numerical data from the facts. Is there a way to convert the facts stored in inst into a structured format, e.g. a pandas DataFrame? Many thanks!

Answered by manusimidt

Mar 18, 2021


manusimidt · 2021-03-18T14:55:55Z

Maintainer

Sorry for the late answer; for some reason I didn't get a notification about your question...

Thank you!
I would just loop over the facts list of the instance object (inst.facts), e.g.:

from typing import List

import pandas as pd
# import path as used elsewhere in this thread (newer py-xbrl releases use the 'xbrl' package instead)
from xbrl_parser.cache import HttpCache
from xbrl_parser.instance import XbrlInstance, parse_ixbrl_url

# code for downloading and parsing the submission
cache: HttpCache = HttpCache('./../cache/', delay=500)
# SEC EDGAR requires you to identify your bot; replace these placeholder values with your own
cache.set_headers({'From': 'YOUR@EMAIL.com', 'User-Agent': 'Tool/Version (Website)'})
instance_url = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926.htm'
inst: XbrlInstance = parse_ixbrl_url(instance_url, cache)
print(inst)

# now extracting some selected facts
extracted_data: List[dict] = []
selected_facts: List[str] = ['Assets', 'Liabilities', 'StockholdersEquity']
for fact in inst.facts:
    # use some kind of filter, otherwise your dataframe will have maaaaannnyyy columns (one for every concept)
    if fact.concept.name not in selected_facts:
        continue
    # only select non-dimensional data for now (facts whose context has no segments)
    if len(fact.context.segments) > 0:
        continue
    extracted_data.append({'date': fact.context.instant_date, 'concept': fact.concept.name, 'value': fact.value})

df: pd.DataFrame = pd.DataFrame(data=extracted_data)
df.drop_duplicates(inplace=True)
# pivot the dataframe so that each concept name becomes its own column
pivot_df: pd.DataFrame = df.pivot(index='date', columns='concept', values='value')
print(pivot_df)

This creates a dataframe indexed by date, with one column per selected concept (Assets, Liabilities, StockholdersEquity).
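For reference, the reshaping that df.pivot performs can be sketched in plain Python. pivot_records is a hypothetical helper (not part of the parser or of pandas) and assumes records shaped like the extracted_data dicts in the answer. It also makes one pitfall explicit: pivot raises a ValueError when a (date, concept) pair occurs more than once, and drop_duplicates only removes rows that are identical in every column. So this sketch keeps the first value per cell and collects any conflicting values instead of failing.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Record = Dict[str, Any]
Table = Dict[Hashable, Dict[str, Any]]
Conflict = Tuple[Hashable, str, Any, Any]


def pivot_records(records: Iterable[Record]) -> Tuple[Table, List[Conflict]]:
    """Turn long records {'date', 'concept', 'value'} into {date: {concept: value}}.

    The first value seen for a (date, concept) cell wins; a later, different
    value for the same cell is reported in `conflicts` instead of raising.
    """
    table: Table = {}
    conflicts: List[Conflict] = []
    for rec in records:
        day, concept, value = rec['date'], rec['concept'], rec['value']
        row = table.setdefault(day, {})
        if concept not in row:
            row[concept] = value
        elif row[concept] != value:
            conflicts.append((day, concept, row[concept], value))
        # an identical repeat falls through silently, like drop_duplicates
    return table, conflicts


# Illustrative, made-up values (not taken from any filing)
records = [
    {'date': '2020-12-31', 'concept': 'Assets', 'value': 100},
    {'date': '2020-12-31', 'concept': 'Liabilities', 'value': 60},
    {'date': '2020-12-31', 'concept': 'Assets', 'value': 100},  # exact duplicate: ignored
    {'date': '2019-12-31', 'concept': 'Assets', 'value': 90},
    {'date': '2019-12-31', 'concept': 'Assets', 'value': 91},   # conflicting value: reported
]
table, conflicts = pivot_records(records)
print(table)      # {'2020-12-31': {'Assets': 100, 'Liabilities': 60}, '2019-12-31': {'Assets': 90}}
print(conflicts)  # [('2019-12-31', 'Assets', 90, 91)]
```

Within pandas itself, df.pivot_table(index='date', columns='concept', values='value', aggfunc='first') should avoid the error as well, but it drops the conflicting values without telling you.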

However, note that the extraction loop only works if you select instant concepts (e.g. from the balance sheet), since it reads fact.context.instant_date.
If you want facts for concepts that span a duration (e.g. from the income statement), you would have to replace "instant_date" with "end_date".
It is really difficult, if not impossible, to pack all facts from one submission into a single dataframe, because you have to deal with different dates and timeframes.
Therefore, I would recommend selecting only the concepts you want to extract (as shown in the code example).
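To handle instant and duration facts in the same loop, one option is to read whichever date attribute the context actually has. This is a minimal sketch under stated assumptions: period_key is a hypothetical helper, and the two stub classes only imitate the parser's context objects using the attribute names that appear in this thread (instant_date, start_date, end_date). It keeps the start date as well, because a filing can report several durations ending on the same day (e.g. a quarter and the year to date), and those would collide if you keyed on end_date alone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


# Hypothetical stand-ins for the parser's context classes, only so this sketch
# runs on its own; with real data you would pass fact.context instead.
@dataclass
class StubInstant:
    instant_date: date


@dataclass
class StubDuration:
    start_date: date
    end_date: date


def period_key(context) -> Tuple[Optional[date], Optional[date]]:
    """Return (start, end) for a context; start is None for instant contexts."""
    instant = getattr(context, 'instant_date', None)
    if instant is not None:
        return None, instant
    # duration contexts provide both dates; anything else falls through to (None, None)
    return getattr(context, 'start_date', None), getattr(context, 'end_date', None)


print(period_key(StubInstant(date(2020, 9, 26))))
print(period_key(StubDuration(date(2019, 9, 29), date(2020, 9, 26))))
```

In the extraction loop you could then store period_key(fact.context) instead of fact.context.instant_date and pivot on that column.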

Happy coding :)


Pablompg · 2021-05-26T14:01:22Z


This is a dirty approach I came up with:

import pandas as pd
import logging
from xbrl_parser.cache import HttpCache
from xbrl_parser.instance import XbrlInstance, parse_xbrl_url

import functools

DELIMITER = "."
def rgetattr(obj, path: str, *default):
    """
    :param obj: Object
    :param path: 'attr1.attr2.etc'
    :param default: Optional default value, at any point in the path
    :return: obj.attr1.attr2.etc
    """

    attrs = path.split(DELIMITER)
    try:
        return functools.reduce(getattr, attrs, obj)
    except AttributeError:
        return None

logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./cache')
# Replace the dummy header with your own information!
# Services like SEC EDGAR require you to disclose information about your bot (https://www.sec.gov/privacy.htm#security)
cache.set_headers({'From': 'YOUR@EMAIL.com', 'User-Agent': 'Tool/Version (Website)'})

xbrl_url: str = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926_htm.xml'
inst: XbrlInstance = parse_xbrl_url(xbrl_url, cache)


def get_df(inst: XbrlInstance) -> pd.DataFrame:
    # column order of the resulting dataframe (must match the keys of `row` below)
    columns = [
        'value',
        'entity',
        'unit_id',
        'unit',
        'numerator',
        'denominator',
        'decimals',
        'concept.xml_id',
        'concept.schema_url',
        'concept.name',
        'concept.substitution_group',
        'concept.concept_type',
        'concept.abstract',
        'concept.nillable',
        'concept.period_type',
        'concept.balance',
        'context.xml_id',
        'context.entity',
        'context.instant_date',
        'context.start_date',
        'context.end_date',
        'metadata',
    ]

    rows = []
    for fact in inst.facts:
        row = {
            'value': rgetattr(fact, 'value'),
            'entity': rgetattr(fact, 'context.entity'),
            'unit_id': rgetattr(fact, 'unit.unit_id'),
            'unit': rgetattr(fact, 'unit.unit'),
            'numerator': rgetattr(fact, 'unit.numerator'),
            'denominator': rgetattr(fact, 'unit.denominator'),
            'decimals': rgetattr(fact, 'decimals'),
            'concept.xml_id': rgetattr(fact, 'concept.xml_id'),
            'concept.schema_url': rgetattr(fact, 'concept.schema_url'),
            'concept.name': rgetattr(fact, 'concept.name'),
            'concept.substitution_group': rgetattr(fact, 'concept.substitution_group'),
            'concept.concept_type': rgetattr(fact, 'concept.concept_type'),
            'concept.abstract': rgetattr(fact, 'concept.abstract'),
            'concept.nillable': rgetattr(fact, 'concept.nillable'),
            'concept.period_type': rgetattr(fact, 'concept.period_type'),
            'concept.balance': rgetattr(fact, 'concept.balance'),
            'context.xml_id': rgetattr(fact, 'context.xml_id'),
            'context.entity': rgetattr(fact, 'context.entity'),
            'context.instant_date': rgetattr(fact, 'context.instant_date'),
            'context.start_date': rgetattr(fact, 'context.start_date'),
            'context.end_date': rgetattr(fact, 'context.end_date'),
            # number of dimensional segments (0 = non-dimensional fact)
            'metadata': len(fact.context.segments),
        }
        rows.append(row)

    # passing columns= fixes the column order (and keeps the headers even if there are no facts)
    return pd.DataFrame(rows, columns=columns)

df = get_df(inst)
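The rgetattr helper in this comment can also be built on the standard library: operator.attrgetter already accepts dotted names such as 'concept.name', so only the fallback for missing attributes has to be added. This is a sketch of that variant (it is not a function provided by the parser), with SimpleNamespace objects standing in for a parsed fact.

```python
from operator import attrgetter
from types import SimpleNamespace
from typing import Any


def rgetattr(obj: Any, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as 'concept.name', returning `default` if any step is missing."""
    try:
        # attrgetter walks 'a.b.c' itself, one getattr per dotted component
        return attrgetter(path)(obj)
    except AttributeError:
        return default


# SimpleNamespace objects standing in for a parsed fact (illustrative only)
fact = SimpleNamespace(value=100, unit=None, concept=SimpleNamespace(name='Assets'))
print(rgetattr(fact, 'concept.name'))                 # Assets
print(rgetattr(fact, 'unit.unit_id'))                 # None, because fact.unit is None
print(rgetattr(fact, 'context.instant_date', 'n/a'))  # n/a, because fact has no context
```

Because attrgetter raises AttributeError as soon as one step is missing, None-valued units and absent context attributes (e.g. instant_date on a duration context) both map to the default, which is the behaviour get_df relies on.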

