CrunchCube.N #4

Open
jamesrkg opened this issue Dec 12, 2017 · 16 comments

Comments

@jamesrkg

I can't see this anywhere yet. It would be handy to have access to the overall N of the cube. Unsure if there are any edge cases that would make it more complicated than:

    @property
    def N(self):
        '''Return the overall N of the cube.
        '''
        return self._cube['result']['n']
@jamesrkg
Author

jamesrkg commented Dec 12, 2017

Looks like self._cube['result']['n'] is always the unweighted N? Common usage (I would think...) would be to return the weighted N if the cube was weighted; however, I suspect a use case exists for being able to easily get either when weighted results were exported. Perhaps this needs to be a method with a weighted kwarg, e.g.:

    def N(self, weighted=False):
        '''Return the overall N of the cube.
        '''
        if weighted:
            # return weighted N (derive from self._cube['result']['measures']['count']['data'] ?)
            raise NotImplementedError
        else:
            return self._cube['result']['n']
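
A minimal sketch of how the weighted branch might be filled in, under the assumption (raised in this thread but not confirmed) that the weighted N of a simple, non-multiple-response cube is the sum of its weighted counts. `CubeSketch` and the sample response dict are hypothetical stand-ins, not part of cr.cube; only the `result.n` and `result.measures.count.data` keys come from the snippet above.

```python
class CubeSketch:
    """Hypothetical stand-in for CrunchCube that holds only the response dict."""

    def __init__(self, response):
        self._cube = response

    def N(self, weighted=False):
        """Return the overall N of the cube."""
        result = self._cube['result']
        if weighted:
            # Assumption: weighted N equals the sum of the weighted counts.
            # This is unlikely to hold for array or multiple response cubes.
            return sum(result['measures']['count']['data'])
        return result['n']


# Made-up response: 10 unweighted cases whose weighted counts sum to 11.0.
response = {'result': {'n': 10, 'measures': {'count': {'data': [4.5, 6.5]}}}}
cube = CubeSketch(response)
print(cube.N())               # unweighted N
print(cube.N(weighted=True))  # weighted N under the assumption above
```

Whether missing-category counts should be excluded from the sum is left open here, as it is in the thread.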

@slobodan-ilic
Contributor

slobodan-ilic commented Dec 19, 2017

I've just seen this, I'll look into it... and after looking into it, what would you expect in the "derivation" case? In the unweighted case, the n is equal to sum(counts)... Also, please mind @malecki's question.

@jamesrkg ☝️

@malecki
Contributor

malecki commented Dec 19, 2017

What do you expect N to be for an array or multiple response, and how do you expect to use it? I can’t guarantee that this value is interpretable, but an accessor to it seems reasonable.

I would expect that what you actually want is margin(null, weighted=False) in all cases.

@jamesrkg ☝️

@slobodan-ilic
Contributor

@jamesrkg @malecki
This has been implemented as the count method on CrunchCube. I understand that N might have been more natural for your use case, but this feels more pythonic (and the linter doesn't complain as much).

I'm closing this issue, please let me know if you have any comments/suggestions.

@jamesrkg
Author

jamesrkg commented Nov 7, 2018

@slobodan-ilic I'm looking at a multi x multi cube which in Crunch looks like this:

[image: multixmulti]

But I'm having trouble getting the 649 figure:

>>> cube.count()
1293

cube.margin() is also unexpected I think:

>>> cube.margin()
[[238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]]

*Updated the cube.margin() snippet, I'd pasted in the wrong data originally.

As these questions relate to the original ticket I wanted to ask here but we can create separate tickets if this also looks suspect to you @slobodan-ilic .

@slobodan-ilic slobodan-ilic reopened this Nov 7, 2018
@slobodan-ilic
Contributor

slobodan-ilic commented Nov 7, 2018

Can you paste the link to the actual cube in the app? Also, which version of cube are you using (is it pinned)?

I'll look into this as soon as I get the link, so that I can grab the cube...

@jamesrkg ☝️

@slobodan-ilic
Contributor

I'm on it, will report here shortly.

@slobodan-ilic
Contributor

I've created a PR for this (here: #119 ), that demonstrates how to obtain the same numbers as visible in whaam. Please let me know if you need any other numbers, or if you have an idea how to calculate them.

@jamesrkg ☝️

@slobodan-ilic
Contributor

Btw, the reason behind the margin being a 2D np array is that the cube type is MR x MR. This translates to the actual dimensions (under the hood) being: MR(items) x MR(selected) x MR(items) x MR(selected). So it's actually a 4D cube (that comes from zz9 DB). So when you do a margin across a dimension (in the direction of the axis), you're actually summing across the selected dimension (the items is never summed across, because the items are independent). When this happens for all the possible dimensions, it collapses all the selections, but the MR items always remain.
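
The collapse described above can be illustrated with a small NumPy sketch. The array below is random stand-in data, not a real cube, and it assumes only two selection categories per MR dimension for simplicity; the 6 x 7 item sizes are chosen to match the margin pasted earlier in the thread.

```python
import numpy as np

# Hypothetical sizes: 6 row items, 7 column items, 2 selection categories.
n_row_items, n_col_items, n_sel = 6, 7, 2

# Axis layout assumed here: MR(items) x MR(selected) x MR(items) x MR(selected).
rng = np.random.default_rng(0)
counts = rng.integers(0, 50, size=(n_row_items, n_sel, n_col_items, n_sel))

# Sum across both selection axes; the item axes are never summed across,
# so a 2D items x items array remains.
collapsed = counts.sum(axis=(1, 3))
print(collapsed.shape)  # (6, 7)
```

This shows only the shape behaviour; how cr.cube treats individual selection categories when building the denominator is not covered by the sketch.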

Now, the thing that we show as a margin is only one row (or one column) of this. The right name for this would be the _denominator, and it's actually what's used in the cube as the private method. The margin should (probably) return either a row or a column (depending on the direction), and we might refactor it to do just that.

@jamesrkg @scanny @malecki ☝️

@jamesrkg
Author

jamesrkg commented Nov 7, 2018

@slobodan-ilic that's exactly what I'm currently doing with the margin I get back from these kinds of cubes. 👍 It's just the total margin, in this case 649 that I can't seem to get at.

@slobodan-ilic
Contributor

Hmm... Can't figure it out now. Will give it a shot first thing tomorrow...

@slobodan-ilic
Contributor

slobodan-ilic commented Nov 8, 2018

> @slobodan-ilic that's exactly what I'm currently doing with the margin I get back from these kinds of cubes. 👍 It's just the total margin, in this case 649 that I can't seem to get at.

Ok, so you can get it like this:

np.sum(cube.margin(axis=0), axis=1)

I'm not sure what this number represents though, because it's adding across subvariables (if I'm not mistaken). That's why we don't have it in cr.cube. If there's a real use for this number, we can implement it all right. Also, if you check it, the columns margin (a row) adds up to the total number, but the rows margin (a column) does not. To me, this is completely wrong in the context of showing the N measure (all of this in whaam).

@jamesrkg ☝️

P.S.
Any comments on the numbers @malecki @gshotwell @jonkeane ?

@jamesrkg
Author

jamesrkg commented Nov 8, 2018

I had assumed it would be the number of people who had anything selected in either variable, but I've calculated that another way (outside of Crunch) and the base of this crosstab is 238, as per:

>>> cube.margin()
[[238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]
 [238 238 238 238 238 238 238]]

My guess is that where both multiple_response variables are uniform_basis=True (as they are in this example) all of the values in this 2D array will be the same. When one or both are uniform_basis=False that will not be the case. I will look into this separately.

Regardless, 649, as np.sum(cube.margin(axis=0), axis=1), is therefore the margin of the discretely observed responses (i.e. not at the respondent level). I'm not sure why that figure is there though. Should it be @malecki ?
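
The distinction between a respondent-level base and a sum over observed responses can be sketched with a toy multiple response variable. The data below is made up for illustration and has nothing to do with the cube in this thread.

```python
import numpy as np

# Hypothetical data: 5 respondents (rows) x 3 MR items (columns),
# where 1 means the respondent selected that item.
selections = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
])

# Per-item counts, and their sum across items (observation level).
item_counts = selections.sum(axis=0)
observations = item_counts.sum()

# Respondent-level figure: how many people selected at least one item.
respondents_any = int(selections.any(axis=1).sum())

print(item_counts)      # [3 3 2]
print(observations)     # 8
print(respondents_any)  # 4
```

Summing across items counts a respondent once per selection, so the observation-level total can exceed the number of respondents, which is why it does not serve as a respondent-level base.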

@slobodan-ilic
Contributor

@jamesrkg @malecki What's the required action on this issue?

@jamesrkg
Author

jamesrkg commented Dec 20, 2018

I think we're awaiting confirmation of what, from Crunch's POV, the cube.N involving a multiple_response should be. I assume the answer would change depending on the value of uniform_basis? In any case it seems as though the result of np.sum(cube.margin(axis=0), axis=1) shouldn't be it, because this is not the base of the respondent-level data, it's the base of the observations (and what if both/more dimensions are multiple_response?).


3 participants