Additional datapoints: domain and inclusion criteria #94
Thanks @nwagu, great suggestions. We are working on a brand new dataset collection.
We will add this in the next release of the dataset!
I agree this could be very interesting indeed; I have thought about this as well. Collecting these criteria can be a tremendous job. In the upcoming version of the dataset collection, we will ensure that every dataset has the systematic review publication attached to it. That makes it easier to search for these variables and hopefully to compile a list of inclusion criteria as well. Can you give an example of the format you would like to see? |
Okay. Looking forward to the new dataset, and I am available to contribute to it. The criteria can be adapted from the paper, for example:
So included papers can be assumed to have abstracts that answer yes to all the questions. I feel it would be easy (though manual) to get the questions from HCI review papers; I am not sure about other fields. |
Thanks for the example! I discussed this with the team, and it helps. They think it is definitely interesting, and we will look into it. One question: do you have a reference to the study you are referring to? It looks like a nice addition to our work (if the data is available as well). |
I took a paper https://jcircadianrhythms.com/articles/10.5334/jcr.183 in our new collection and asked ChatGPT to extract the questions in the form you requested.
The answer:
So this looks quite good and would be interesting to add. I also asked it to turn this into a JSON object, which is interesting:
{
"criteria": [
{
"question": "Was the study conducted using microdialysis technique?",
"exclusion": false
},
{
"question": "Did the study measure Hist and at least one of the following amino acids: Asn, Asp, GABA, Glu, Gln, Gly, Pro, or Tau?",
"exclusion": false
},
{
"question": "Was the study a retro-dialysis study?",
"exclusion": true
},
{
"question": "Did the study report baseline values without the specified molecules in the perfusion fluid?",
"exclusion": true
},
{
"question": "Was the study an extra-cerebral microdialysis study?",
"exclusion": true
},
{
"question": "Was the study a human or in vitro study?",
"exclusion": true
},
{
"question": "Did the paper contain primary study data?",
"exclusion": true
},
{
"question": "Did the study measure one or more of the molecules of interest during naturally occurring sleep stages that were validated with polysomnographic measurements?",
"exclusion": false
},
{
"question": "Did the study measure one or more of the molecules of interest during sleep deprivation?",
"exclusion": false
},
{
"question": "Did the study include any of the following terms in the title: '*sleep*', '*REM*', '*rest*', '*fatig*', or '*somn*'",
"exclusion": false
},
{
"question": "Was the study not using techniques other than microdialysis, or did it not report data on amino acids other than those we searched for?",
"exclusion": true
}
]
}
I think your idea is very nice, and we might want to include this for all datasets or as an extension in the future. |
ChatGPT to the rescue! I like the JSON format too. My example was from https://dl.acm.org/doi/pdf/10.1145/3491102.3501875. The data they have provided is only the included papers, in a PDF. |
I think ChatGPT is not the only option; a T5 model could be fine-tuned for this particular task. |
We would also very much appreciate this addition. Would it help to crowd-source this? If a bunch of people do one or two reviews each, it becomes a manageable effort. I'd volunteer to do two! This would need some instructions though, so that the result contains what you'd like, in the form you think is best. |
Please do not use ChatGPT or T5 or any other machine learning approach to create the inclusion criteria! The SYNERGY dataset serves an important purpose in that we can use it to test how well machine learning techniques work. It is therefore important that the data contained in it (including the inclusion criteria) is high quality and reliable. The example given above looks OK at first glance, but read the last one:
If a study was using only microdialysis, then it was "not using techniques other than microdialysis", so it would pass this question and be excluded! Further, the original criteria included studies measuring one or more of the molecules of interest during (1) naturally occurring sleep stages that were validated with polysomnographic measurements and/or (2) sleep deprivation. Yet the ChatGPT version splits these into two separate criteria, turning the and/or into a simple and. It also mixes up inclusion criteria with the search strategy.
I think there are a couple of tasks involved here. The first is simply to locate the text in the paper. The second is to turn this text into a structured form. I actually think just the first part would already be extremely helpful (and a task for humans, not ChatGPT). It may or may not be a good idea to use ChatGPT to turn this text into a structured format. This first example shows that it may be more difficult than it seems, but if we want to use a model for this task, we should first create a hand-labelled dataset to be able to assess how well the model can perform it. So in any case we would need to create an ideal structured version of the inclusion criteria by hand. I think 26 papers is not such a large number that this would not be feasible (at least for a sample of cases), and I don't think this number of papers justifies fine-tuning a model. I'd be happy to volunteer to extract inclusion criteria for 19% of your systematic reviews |
@mcallaghan Thanks for your response! I have copied and pasted the text containing the inclusion criteria for all SYNERGY papers into a table. The next step is to create a standardized format for presenting the criteria. Do you have ideas on how to present these? |
I observed three types of criteria: 1) Meta criteria. Examples I came across:
Examples:
|
Thanks very much for this! I think a useful first step would be to have all the inclusion criteria, exactly as they are written, in text form. For experiments using LLMs for screening, it would be interesting to see whether providing the criteria as they are written helps identify studies better. I think it could also be interesting to list each of these, perhaps in a form similar to that suggested above, although I would perhaps consider a simplification.
This is at least easier to understand, to my mind. I think it is also crucial to pay attention to double negatives. One could consider writing the exclusion criteria in negative form, e.g. "Study must not be a case study", but I think the positive form is slightly easier. Written like this, one could loop through each of the inclusion criteria and each of the exclusion criteria, ask an LLM to provide an answer for each, and then come to a decision based on the combination of these answers. One final thing is the boolean logic: I am assuming that a study is included if ALL of the inclusion criteria apply, and excluded if ANY of the exclusion criteria apply. However, this might not always be the case; if not, one might want to make the logic explicit.
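A minimal sketch of that loop, assuming a hypothetical ask_llm helper that sends one yes/no question to whatever LLM is being used (the criteria strings themselves would come from the review at hand):

from typing import Callable, Sequence

def screen_record(abstract: str,
                  inclusion: Sequence[str],
                  exclusion: Sequence[str],
                  ask_llm: Callable[[str], bool]) -> bool:
    # Include only if ALL inclusion criteria apply and NONE of the
    # exclusion criteria apply (the simple logic assumed above).
    if not all(ask_llm(f"{c}\n\nAbstract: {abstract}") for c in inclusion):
        return False
    if any(ask_llm(f"{c}\n\nAbstract: {abstract}") for c in exclusion):
        return False
    return True

If the decision logic differs per review, the two all/any checks would be replaced by whatever explicit logic the protocol specifies.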
|
It is also interesting that we have criteria like "Full-text access required". Automated approaches will never be able to replicate this unless we know exactly which journals the reviewers had access to and which papers had open full text available (on ResearchGate, requested through ResearchGate, on Google Scholar, etc.). It makes me wonder whether these criteria are all applied at the title-and-abstract level. If some distinction is made, then it may be wise to separate title-and-abstract inclusion criteria from full-text inclusion criteria. |
It would be nice to make the criteria more easily accessible for more datasets! My suggestion would be to use YAML over JSON:
In our recent paper (to be presented at IAL@ECML-PKDD 2024), we did exactly this!

criteria:
- a: |
Is the study a longitudinal/prospective study with at least three-time point assessments measuring posttraumatic stress disorder (PTSD)?
Answer with YES if it is a longitudinal/prospective study with at least three-time point assessments measuring PTSD.
Answer with NO if the study is not longitudinal or prospective or does not measure PTSD.
Answer with UNKNOWN if the study is longitudinal but the number of time-point assessments is not mentioned
- b: |
Does the study assess PTSD symptoms as a continuous variable using an eligible PTSD scale?
Here are some eligible PTSD scales (answer with YES if the scale is in this list) that measure PTSD as a continuous variable:
* Clinician Administered PTSD Scale (CAPS)
* PTSD Checklist (PCL)
[...]
We transformed the screening protocol to a more verbose version to make it more suitable for the LLM. The idea behind this formatting is as follows: every criterion has an identifier (in our work, a letter such as a or b, as in the excerpt above), followed by the criterion phrased as a question and explicit instructions on when to answer YES, NO, or UNKNOWN.
To summarize, I would propose the following format (using the example of @mcallaghan):

criteria:
- human: "Study includes human subjects"
- english: "Study is written in English"
- spanish: "Study is written in Spanish"
- invitro: "Study is a human/in vitro study"
- casestudy: "Study is a case study"
decision_function:
- main: "inclusion AND NOT exclusion"
- inclusion: "human AND (english OR spanish)"
- exclusion: "invitro OR casestudy" Encoding the decision function fully in YAML may also be an option. |
The addition of the eligibility criteria also enables us to study how well we would perform if we took the inclusion and exclusion criteria as the prior starting point, compared to the current default of a 1+1 prior. We started making these comparisons in this repository. First preliminary results (using TF-IDF and Naive Bayes) show that using these inclusion and exclusion criteria performs about on par with a 1+1 prior knowledge start. |
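A rough sketch of how such a comparison could be set up (this is an illustration, not the code from that repository; abstracts, criteria_text, and the two seed abstracts are hypothetical inputs):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def rank_with_prior(abstracts, prior_texts, prior_labels):
    # Fit TF-IDF on the prior records plus all candidate abstracts,
    # train Naive Bayes on the prior records only, and rank the
    # candidates by predicted relevance (most relevant first).
    vec = TfidfVectorizer()
    X = vec.fit_transform(list(prior_texts) + list(abstracts))
    X_prior, X_all = X[:len(prior_texts)], X[len(prior_texts):]
    clf = MultinomialNB().fit(X_prior, prior_labels)
    relevant_col = list(clf.classes_).index(1)
    scores = clf.predict_proba(X_all)[:, relevant_col]
    return scores.argsort()[::-1]

# 1+1 prior: one known relevant and one known irrelevant record.
# order_1plus1 = rank_with_prior(abstracts, [relevant_abstract, irrelevant_abstract], [1, 0])
# Criteria prior: use the eligibility criteria text as the relevant "record".
# order_criteria = rank_with_prior(abstracts, [criteria_text, irrelevant_abstract], [1, 0])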
This may not be very necessary for active learning, but it makes the data more meaningful and accessible on its own. In a structured format, it can be read using scripts without needing to go to the source of the data.
I would very much prefer it if the inclusion criteria were given as a list of criteria, all in boolean question format. This is important for a project I am working on.
And also the domain: the general field of the research, so researchers can be selective.