Question: how to use a loop to compare and extract data from dataframe #11
Comments
Hi @mattnuttall00, One nice feature of
So I'm starting off with the below.
So, if you want all of the unique columns of
The above gives the following for
And it gives the below for
So using your data set, I ran the following code.
I got this output.
The triple loop way of doing it that you used is perfectly reasonable too (it might take a bit more time than the above just because loops are slow in R -- underneath everything,
Hi @bradduthie, thanks for your comprehensive response! It was really useful, and good to know about

So I think I may not have explained the final step very well. Other people I've spoken to didn't get what I meant either, so that's totally my fault. The final step I need to do is this: rather than pull out all unique values, once R has looked at each 'Marker' within each 'Sample_Name', I need it to look at the three values in column 'Allele_1' (each Marker has 3 rows within each Sample_Name), and only if the number is the same in all 3 rows should R pull it out and put the information into the new dataframe. It only needs to pull out 1 row, so that I know the sample name, marker, and the value.

That's why the output dataframe in my original question has only 3 rows: Sample_Name 1, Marker b does not have matching values in Allele_1, nor does it have matching values in Allele_2, so it gets excluded. That's what my
was trying to do. I'm trying to tell R to see if the three values in the column 'Allele_1' (once separated by Sample_Name and Marker) are the same, and if they are, pull out a row. Although I know that code is wrong! Does that make sense? Apologies if I am explaining it really poorly. Do let me know if I'm not being clear and I will try again!
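For anyone following along, here is a minimal base R sketch of the rule as described above (one row out per Sample_Name/Marker group, only when all three Allele_1 values match). The `toy` data frame and the helper name `all_same` are invented for illustration and are not from the attached file; only the column names come from the question.

```r
# Invented toy data using the column names from the question
toy <- data.frame(
  Sample_Name = rep(c(1, 2), each = 6),
  Marker      = rep(rep(c("a", "b"), each = 3), times = 2),
  Allele_1    = c(100, 100, 100, 120, 122, 120, 100, 100, 100, 130, 130, 130),
  Allele_2    = c(110, 110, 110, 125, 126, NA, 110, 110, 110, 148, 148, NA)
)

# Hypothetical helper: TRUE only if there are no NAs and a single distinct value
all_same <- function(x) !anyNA(x) && length(unique(x)) == 1

# One small data frame per Sample_Name x Marker combination
groups <- split(toy, list(toy$Sample_Name, toy$Marker), drop = TRUE)

# Keep a group only if it has 3 rows and its Allele_1 values all agree
keep <- sapply(groups, function(g) nrow(g) == 3 && all_same(g$Allele_1))

# Pull out the first row of each kept group
result <- do.call(rbind, lapply(groups[keep], function(g) {
  g[1, c("Sample_Name", "Marker", "Allele_1")]
}))
print(result)
```

In this toy data, Sample_Name 1, Marker b has mixed Allele_1 values, so it should not appear in `result`.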
@mattnuttall00 Ah! Now I see what you mean. I think I have a way that should work -- basically, use the Sorry, I'll try to give some more meaningful code later, but the idea is this:
If that makes sense?
How 'bout this! @mattnuttall00

    library(tidyverse)

    matt <- read_csv("matt.csv") %>% as_tibble()  # reading in your data - I've changed the names
    matt <- matt %>% select(sn, m, a1, a2)        # select relevant cols - again, I've changed names

    matt %>%
      group_by(sn, m, a1) %>%
      summarise(count = n()) %>%
      filter(count >= 3)
    # this returns a df with 1 row for each sn/m/a1 that has 3 matching values
Then you'd have to do the same again with Allele_2, and combine the two results with a join or bind_cols.
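Here is a rough sketch of that count-then-combine idea carried through for both alleles. It is written in base R (`aggregate` and `merge`) rather than the tidyverse so it runs without extra packages, so treat it as a stand-in for the pipeline above, not the same code. The `toy` data and the short names `sn`, `m`, `a1`, `a2` are assumptions that mirror the renamed columns.

```r
# Invented toy data; sn/m/a1/a2 mirror the renamed columns used above
toy <- data.frame(
  sn = rep(c(1, 2), each = 6),
  m  = rep(rep(c("a", "b"), each = 3), times = 2),
  a1 = c(100, 100, 100, 120, 122, 120, 100, 100, 100, 130, 130, 130),
  a2 = c(110, 110, 110, 125, 126, NA, 110, 110, 110, 148, 148, NA)
)
toy$n <- 1  # helper column so that summing it counts rows

# Count rows per sn/m/allele value (rows with an NA allele are dropped by default)
c1 <- aggregate(n ~ sn + m + a1, data = toy, FUN = sum)
c2 <- aggregate(n ~ sn + m + a2, data = toy, FUN = sum)

# Keep only allele values seen at least 3 times within a sample/marker
c1 <- c1[c1$n >= 3, c("sn", "m", "a1")]
c2 <- c2[c2$n >= 3, c("sn", "m", "a2")]

# Full join: a sample/marker is kept if either allele qualified
both <- merge(c1, c2, by = c("sn", "m"), all = TRUE)
print(both)
```

A full join (`all = TRUE`) is used instead of binding columns because the two filtered tables can have different numbers of rows. Where only one allele qualifies, the other column comes out as `NA`.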
Hey Matt, Here is a different take - no The only thing I'm not sure about is why the third line of your desired result chose the row where Allele_2 was
If the
Another option would be to not report the
@bradduthie @anna-deasey @jmcvw the reason I would want the NA to be selected rather than 148 is that there are only two values of 148, so that allele doesn't fit the rule that there must be 3 matching values to be selected. The structure of that output table was me assuming that entire rows would need to be selected, and because Allele_1 did meet the rule, there would need to be something under Allele_2 (and it couldn't be 148). Hope that makes sense! Thanks again all :)
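To make the NA behaviour concrete, here is one possible base R sketch, assuming the rule is "exactly 3 rows, all identical, otherwise NA". The `toy` data and the `consensus` helper are made up for illustration. Note that this is stricter than a `count >= 3` filter, which would also accept 3 matches out of 4 rows.

```r
# Invented toy data using the column names from the question
toy <- data.frame(
  Sample_Name = rep(c(1, 2), each = 6),
  Marker      = rep(rep(c("a", "b"), each = 3), times = 2),
  Allele_1    = c(100, 100, 100, 120, 122, 120, 100, 100, 100, 130, 130, 130),
  Allele_2    = c(110, 110, 110, 125, 126, NA, 110, 110, 110, 148, 148, NA)
)

# Hypothetical helper: the shared value if all 3 agree, otherwise NA
consensus <- function(x) {
  if (length(x) == 3 && !anyNA(x) && length(unique(x)) == 1) x[1] else NA_real_
}

# Apply the helper to both allele columns within each Sample_Name x Marker;
# na.pass keeps rows containing NA so that the helper sees them
res <- aggregate(cbind(Allele_1, Allele_2) ~ Sample_Name + Marker,
                 data = toy, FUN = consensus, na.action = na.pass)

# Drop sample/marker combinations where neither allele met the rule
res <- res[!(is.na(res$Allele_1) & is.na(res$Allele_2)), ]
print(res)
```

With this toy data, Sample_Name 2, Marker b keeps its Allele_1 value but reports `NA` for Allele_2, because 148 appears only twice.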
Hi all,
I am going to attempt to post the first coding question! Apologies in advance if I don't quite nail the "how to post a good question", but please feel free to use this as an opportunity to offer suggestions for future users about how to post a good question. i.e. you can tell me what I could have done to make the question better - you won't hurt my feelings :)
I have a dataframe with ~10,000 observations of 35 variables. I have attached a much smaller subset as an example (attachment at the bottom). The data are outputs from genetic analyses. The structure of the subsetted data looks like this:
What I need to do is to first look at each unique 'Sample_Name', and then within that 'Sample_Name' look at each 'Marker'. And then within each 'Marker' (which are normally 3 rows in length), I want R to look row by row down the column 'Allele_1' and if the values are the same I want it to put one row of data into a new dataframe with some of the same column headers as the original data. I then need R to do the same thing, but for column 'Allele_2'.
So for example, if a portion of the original dataframe looked like this (I've excluded unnecessary cols):
I would want R to create the following dataframe:
I am guessing this needs to be done using a loop (but feel free to suggest an easier way!). I am very much a beginner with loops, but I have had a crack anyway. Surprise surprise, it hasn't worked, and I'm assuming I am wildly wrong, but what I have tried is below. (note: I have only tried to do it for Allele_1)
When I run the above code I get the error:
So I guess I am not getting that bit of code right? But potentially a lot of it isn't right...!
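Since the original loop and its error message are not reproduced here, the following is not a fix for that code. It is just a small sketch of how a nested loop for Allele_1 could be written in base R. The `toy` data are invented, and `out` and `loop_result` are placeholder names.

```r
# Invented toy data using the column names from the question
toy <- data.frame(
  Sample_Name = rep(c(1, 2), each = 6),
  Marker      = rep(rep(c("a", "b"), each = 3), times = 2),
  Allele_1    = c(100, 100, 100, 120, 122, 120, 100, 100, 100, 130, 130, 130)
)

out <- list()  # matching rows are collected here, one list element per hit

for (s in unique(toy$Sample_Name)) {
  # Only loop over the markers that occur for this sample
  for (m in unique(toy$Marker[toy$Sample_Name == s])) {
    rows <- toy[toy$Sample_Name == s & toy$Marker == m, ]
    a1   <- rows$Allele_1
    # Rule: exactly 3 rows, no NAs, and every value equal to the first
    if (nrow(rows) == 3 && !anyNA(a1) && all(a1 == a1[1])) {
      out[[length(out) + 1]] <- rows[1, c("Sample_Name", "Marker", "Allele_1")]
    }
  }
}

# Stack the collected rows into a single data frame
loop_result <- do.call(rbind, out)
print(loop_result)
```

Collecting rows in a list and calling `rbind` once at the end avoids growing a data frame inside the loop, which tends to be slow on ~10,000 rows.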
Any guidance would be most appreciated!
All PCRs Plex1 missing 002 PCR2.xlsx
John: In your example, what about these rows? Are they supposed to be removed?