Making database models for the Lokniti question data #18

Closed
3 tasks done
JusticeV452 opened this issue May 1, 2024 · 4 comments · Fixed by #20

Comments

@JusticeV452
Member

JusticeV452 commented May 1, 2024

Now that the codebook CSV mapping question variables to question text (#13) is finished, all the data needs to be added to the database in an easy-to-use format. Based on the structure of the question response data, I propose we integrate the data across all years using three models: one representing an individual respondent ("Responder"), one for each response a "Responder" gives ("Responses"), and one for the Codebook. More details about what each model should contain:

  • "Responder" (or some other variation): @vaeyias
    Each row of the main data contains information that uniquely identifies a person, along with their responses to the survey questions.
    The Responder model holds only personal information, such as a person's state name, IDs, age, and status. The question responses are stored in the "Responses" table rather than with the responder.
  • "Responses":
    The Responses model contains every response from each responder, with attributes for the year, the question variable name, a foreign key to the responder, and a foreign key to a codebook entry.
  • "Codebook":
    The Codebook should have the same attributes as the column names in the spreadsheet created in "Make codebook for column names in lokniti question data" (#13), and it can be created straightforwardly with the existing update_db command. Implementing this model first may make it easier to build the Responder model: each responder should have a state name attribute, but the variable name holding the state name changes from year to year, so the codebook can be used to look it up.

We could have as many as three people working on this issue, each implementing one of the models above.

@vaeyias
Contributor

vaeyias commented May 1, 2024

In the lokniti-data branch, I started each model and serializer, and the Codebook model is complete. The Responders model is still in progress. Also, instead of using the "Resno" columns in the data files as the respondent number, we should use the first column, because the "Resno" columns have many repeated values (maybe resno does not mean respondent number?). I renamed the first column to "entry_no" in each file.

@JusticeV452
Member Author

JusticeV452 commented May 1, 2024

I looked at the CSVs again, and I think "resno" is unique only per state or per some combination of pc/ac/ps IDs. Regardless, the model's resno attribute should still reflect what is in the CSV, since it is one of the fields on the survey document that people had to fill out. Django automatically assigns a unique id (an auto-incrementing primary key) to every object added to a table, so there is no need for a separate id attribute per respondent if it is just the row number.
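One way to test the idea that resno is only unique within a state (or some pc/ac/ps combination) is to count repeated key tuples. This is a small stdlib sketch; `duplicate_keys` is a hypothetical helper, and the column names in the test (`state`, `resno`) are placeholders, since the real header names vary by year.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def duplicate_keys(
    rows: Iterable[Mapping[str, str]], key_columns: Sequence[str]
) -> Dict[Tuple[str, ...], int]:
    """Return every key (tuple of the given column values) seen more than once, with its count."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

In practice the rows could come from `csv.DictReader` over one of the NES files; an empty result means that column combination uniquely identifies a row in that file.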

@vaeyias
Contributor

vaeyias commented May 1, 2024

Codebook and Responder models are complete:

  • To load the Codebook, run update_db.
  • To load Responders, run load_responders on each of the four NES data files.

Some sketchy things I have noticed:

  • Some respondent numbers, ages, and ps_id values are NaN. I set the default value for all of these to 0, but there is probably a better default.

@JusticeV452
Copy link
Member Author

Great, thanks! I think you should add the field options null=True, blank=True to every field in the CSV that might be NaN. That way, those attributes can be set to None instead of 0 to signal that they are blank, and they won't match general queries on those values. If we want to find people who haven't responded, we can check explicitly with something like [model].objects.filter([attribute]__isnull=True).count()
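Django's `filter(age__isnull=True)` compiles to an SQL `IS NULL` condition, so the effect of storing None instead of 0 can be shown with the stdlib `sqlite3` module alone. This is a sketch under assumptions: `to_nullable_int`, `null_query_demo`, the `responder` table, and the sample ages are all hypothetical. A NULL age matches neither `age < 18` nor `age > 30`, whereas a default of 0 would wrongly count as "under 18".

```python
import math
import sqlite3
from typing import Optional, Tuple


def to_nullable_int(value) -> Optional[int]:
    """Map blank CSV/pandas values (None, NaN, "", "nan") to None instead of 0."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, str):
        value = value.strip()
        if value == "" or value.lower() == "nan":
            return None
    return int(float(value))


def null_query_demo() -> Tuple[int, int]:
    """Return (rows with age < 18, rows with age IS NULL) for a tiny sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responder (id INTEGER PRIMARY KEY, age INTEGER NULL)")
    raw_ages = [25, float("nan"), "", "40"]  # hypothetical sample values
    conn.executemany(
        "INSERT INTO responder (age) VALUES (?)",
        [(to_nullable_int(a),) for a in raw_ages],
    )
    under_18 = conn.execute("SELECT COUNT(*) FROM responder WHERE age < 18").fetchone()[0]
    missing = conn.execute("SELECT COUNT(*) FROM responder WHERE age IS NULL").fetchone()[0]
    conn.close()
    return under_18, missing
```

With 0 as the default, the two blank ages would have shown up in the `age < 18` count; with None they are excluded and only surface through the explicit null check.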

@czheng10 self-assigned this May 2, 2024
@vaeyias linked a pull request May 13, 2024 that will close this issue
3 participants