
Clustered regression

121onto edited this page Feb 20, 2019 · 3 revisions
  1. Set up your workspace:

    from __future__ import print_function
    from __future__ import absolute_import
    from __future__ import division
    
    import pandas as pd
    import numpy as np
    
    from py_metrics import caches
    from py_metrics.regress import Cluster
    
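    # Load the data and add a constant column for the intercept
    frame = pd.read_csv(caches.data_path('ddk2011.txt'))
    frame['intercept'] = 1.0

    # Standardize totalscore to mean 0 and standard deviation 1
    std = frame['totalscore'].std()
    mu = frame['totalscore'].mean()
    frame['testscore'] = (frame['totalscore'] - mu) / std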
  2. Fit the regression:

    # Initialize
    x = ['intercept', 'tracking']
    y = 'testscore'
    grp = 'schoolid'
    
    reg = Cluster(x, y, grp)
    reg.fit(frame)
    reg.summarize()
  3. Estimate a cluster-robust covariance matrix:

    vce = pd.DataFrame(
        reg.vce('cr3'),
        index=reg.x_cols,
        columns=reg.x_cols)
    print(vce)
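
For intuition about what step 3 computes, here is a minimal NumPy sketch of a cluster-robust (sandwich) covariance estimator. It is not the py_metrics implementation: the function `cluster_robust_cov`, its `kind` argument, and the synthetic data are hypothetical. The `cr3` branch follows one common textbook definition, in which each cluster's residuals are rescaled by the inverse of `I - H_gg`; `reg.vce('cr3')` may differ from it, for example by a scaling factor.

```python
import numpy as np


def cluster_robust_cov(X, y, groups, kind="cr1"):
    """Return OLS coefficients and a cluster-robust covariance matrix.

    Hypothetical helper for illustration only; not part of py_metrics.
    kind: 'cr0' (no adjustment), 'cr1' (degrees-of-freedom scaling),
          'cr3' (leverage-adjusted cluster residuals).
    """
    if kind not in ("cr0", "cr1", "cr3"):
        raise ValueError("kind must be 'cr0', 'cr1' or 'cr3'")

    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape

    # "Bread" of the sandwich: (X'X)^-1, also used for the OLS fit
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    labels = np.unique(groups)
    n_groups = len(labels)

    # "Meat": sum over clusters of (X_g' e_g)(X_g' e_g)'
    meat = np.zeros((k, k))
    for g in labels:
        mask = groups == g
        Xg, eg = X[mask], resid[mask]
        if kind == "cr3":
            # Rescale residuals: e_g <- (I - X_g (X'X)^-1 X_g')^-1 e_g
            Hgg = Xg @ bread @ Xg.T
            eg = np.linalg.solve(np.eye(len(eg)) - Hgg, eg)
        score = Xg.T @ eg
        meat += np.outer(score, score)

    cov = bread @ meat @ bread
    if kind == "cr1":
        # Small-sample scaling: G/(G-1) * (n-1)/(n-k)
        cov *= (n_groups / (n_groups - 1)) * ((n - 1) / (n - k))
    return beta, cov


# Synthetic data (assumed for illustration): 12 clusters of 5 observations
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(12), 5)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=60)

beta, cov = cluster_robust_cov(X, y, groups, kind="cr3")
se = np.sqrt(np.diag(cov))  # standard errors from the diagonal
print(beta)
print(se)
```

Standard errors are the square roots of the diagonal of the covariance matrix, which is also how the `vce` frame printed in step 3 would typically be read.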

Additional examples with more detail are available in the examples directory.
