"primary_location:source.source.type" == "type"? #291
Comments
Thanks for the details (and cross-checking with the OpenAlex team)! I'm having some trouble following the narrative though - is the stuff about Could you give us a small reprex isolating the problem? Then we can use that as the basis for debugging.
No. "host_organization" and "issn_l" are both fine. You can see the data structure using the following code:
Thanks - I think I see what you mean. The Both are preserved in the A more minimal example:
work <- oa_fetch(identifier = "W4396214724")
work$type
#> [1] "article"
work_list <- oa_fetch(identifier = "W4396214724", output = "list")
work_list$type
#> [1] "article"
work_list$primary_location$source$type
#> [1] "journal"
I think what you have is a fair ask. We already hoist some properties of
work |>
select(starts_with("so"))
#> # A tibble: 1 × 2
#> so so_id
#> <chr> <chr>
#> 1 The R Journal https://openalex.org/S2489169438
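Until such a column exists, the nested value can be pulled out of the list output by hand. The following is a minimal sketch in base R, assuming that a multi-work result with output = "list" is a list of records shaped like the single work shown above. The `works_list` records and the `get_source_type()` helper are made up for illustration and are not part of openalexR.

```r
# Mock records mimicking the nested shape of the list output shown above;
# the ids and values are illustrative, not real API responses.
works_list <- list(
  list(id = "W1", type = "article",
       primary_location = list(source = list(type = "journal"))),
  list(id = "W2", type = "book-chapter",
       primary_location = list(source = list(type = "book series"))),
  list(id = "W3", type = "article",
       primary_location = list(source = NULL))
)

# Hypothetical helper: return the nested source type,
# or NA when any level of the path is missing.
get_source_type <- function(w) {
  st <- w$primary_location$source$type
  if (is.null(st)) NA_character_ else st
}

# One row per work, keeping the work-level type next to the source-level type.
source_types <- data.frame(
  id = vapply(works_list, function(w) w$id, character(1)),
  type = vapply(works_list, function(w) w$type, character(1)),
  source_type = vapply(works_list, get_source_type, character(1)),
  stringsAsFactors = FALSE
)
```

The resulting data frame could then be joined back to the tibble output by id, which keeps the work-level `type` and the source-level type clearly separate.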
Hi @yhan818, in this case you can actually perform the filtering within the query (instead of pulling all the data down first), which is also more efficient. We could add column
library(openalexR)
org_works_2023 <- oa_fetch(
entity = "works",
institutions.ror = c("03m2x1q45"),
from_publication_date = "2023-01-01",
to_publication_date = "2023-01-01",
primary_location.source.type = "journal",
options = list(sample = 20, seed = 1),
verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?filter=institutions.ror%3A03m2x1q45%2Cfrom_publication_date%3A2023-01-01%2Cto_publication_date%3A2023-01-01%2Cprimary_location.source.type%3Ajournal&sample=20&seed=1
#> Getting 1 page of results with a total of 20 records...
org_works_2023
#> # A tibble: 20 × 39
#> id title display_name author ab publication_date relevance_score so
#> <chr> <chr> <chr> <list> <chr> <chr> <dbl> <chr>
#> 1 https… Nove… Novel Wayfi… <df> If a… 2023-01-01 0.997 Vict…
#> 2 https… SEAR… SEARCHING F… <df> <NA> 2023-01-01 0.995 Abst…
#> 3 https… Neur… Neuropsychi… <df> To e… 2023-01-01 0.989 Leuk…
#> 4 https… Long… Longitudina… <df> Alzh… 2023-01-01 0.988 Neur…
#> 5 https… Corr… Corrigendum… <df> <NA> 2023-01-01 0.983 Magn…
#> 6 https… Tren… Trends in t… <df> Spor… 2023-01-01 0.981 Curr…
#> 7 https… The … The role of… <df> This… 2023-01-01 0.979 Ener…
#> 8 https… Rece… Recent prog… <df> ZIP1… 2023-01-01 0.979 Comp…
#> 9 https… Spec… Spectral me… <df> We r… 2023-01-01 0.975 Dele…
#> 10 https… Mult… Multiplexed… <df> The … 2023-01-01 0.974 Soft…
#> 11 https… Eval… Evaluation … <df> <NA> 2023-01-01 0.974 Arth…
#> 12 https… LATE… LATE MIOCEN… <df> <NA> 2023-01-01 0.969 Abst…
#> 13 https… Auto… Autobiograp… <df> <NA> 2023-01-01 0.966 Neur…
#> 14 https… MONI… MONITORING … <df> <NA> 2023-01-01 0.965 Abst…
#> 15 https… Edit… Editor’s Ch… <df> Scie… 2023-01-01 0.958 Mate…
#> 16 https… Maki… Making Dron… <df> Smal… 2023-01-01 0.949 Data…
#> 17 https… Disc… Discrete Ti… <df> Conv… 2023-01-01 0.949 IEEE…
#> 18 https… Audi… Audio deliv… <df> Heal… 2023-01-01 0.949 Proc…
#> 19 https… CHAR… CHARACTERIZ… <df> <NA> 2023-01-01 0.947 Abst…
#> 20 https… The … The implica… <df> Amyo… 2023-01-01 0.945 Anai…
#> # ℹ 31 more variables: so_id <chr>, host_organization <chr>, issn_l <chr>,
#> # url <chr>, pdf_url <chr>, license <chr>, version <chr>, first_page <chr>,
#> # last_page <chr>, volume <chr>, issue <chr>, is_oa <lgl>,
#> # is_oa_anywhere <lgl>, oa_status <chr>, oa_url <chr>,
#> # any_repository_has_fulltext <lgl>, language <chr>, grants <list>,
#> # cited_by_count <int>, counts_by_year <list>, publication_year <int>,
#> # cited_by_api_url <chr>, ids <list>, doi <chr>, type <chr>, …
Created on 2024-10-24 with reprex v2.0.2
Great point! In that case I agree that we can hold off on this. Between the filter strategy and the option to run a secondary search on
I am currently conducting a citation analysis, focusing on "works" data obtained from OpenAlex. This dataset is my primary source for data analysis and data mining, aimed specifically at understanding publications and citations (mainly articles).
I primarily use the “host_organization” field to analyze our publishers and selected publishers along with their associated journals. This analysis helps us identify which journals are frequently cited and determine how many times individual journals or publishers have been cited over the years.
During the process, I kept only rows where "issn_l" has a value, for the years 2019 to 2023.
# Filter rows where issn_l is neither NA nor an empty string
articles_cited <- works_cited_final[!is.na(works_cited_final$issn_l) & works_cited_final$issn_l != "", ]
As a result, ~20% of the articles have NA for "referenced_works". For example, https://openalex.org/works/W2980882411
I met with OpenAlex staff, and they said that some book chapters have an ISSN too. They recommended adding a filter on "primary_location.source.type" = "journal".
The OpenAlex technical documentation states: "Journal articles will have a primary_location.source.type of journal" (type is defined )
I checked the data I pulled, which has 38 columns. One column is named "type". Does it map to "primary_location.source.type"? If not, how can I get it?
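For illustration, the issn_l filter combined with the recommended source-type check could look like the sketch below. It assumes the data frame has a `source_type` column holding the value of primary_location.source.type. That column is hypothetical and is not one of the 38 columns I pulled, and the toy rows and ISSNs are made up.

```r
# Toy stand-in for works_cited_final; source_type is a hypothetical column
# and the ISSN values are placeholders.
works_cited_final <- data.frame(
  id = c("W1", "W2", "W3", "W4"),
  issn_l = c("0000-0001", NA, "", "0000-0002"),
  source_type = c("journal", "journal", "repository", "book series"),
  stringsAsFactors = FALSE
)

# Rows where issn_l is neither NA nor an empty string
has_issn <- !is.na(works_cited_final$issn_l) & works_cited_final$issn_l != ""

# Rows whose source is a journal (NA source types are treated as non-journal)
is_journal <- !is.na(works_cited_final$source_type) &
  works_cited_final$source_type == "journal"

articles_cited <- works_cited_final[has_issn & is_journal, ]
```

Here W4 has an ISSN but is dropped because its source is not a journal, which is the book-chapter case the OpenAlex staff described.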
Thank you very much,