Minor changes:
- Cellosaurus: Internal code tweak to improve redownload of outdated cached release file.
New functions:
- geneFusions, mutations: New functions that extract sequence annotation information about gene fusions and driver gene mutations.
- cellsPerGeneFusion, cellsPerMutation: New functions that call geneFusions or mutations internally, respectively, and return a DFrame containing logical columns per gene fusion or gene mutation.
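
The "logical columns per gene fusion" layout can be pictured with a small base R sketch. The input list, the cell line names, and the helper `membershipTable` are all hypothetical and exist only for illustration. This is not the package's implementation, and it returns a plain data.frame rather than a DFrame.

```r
## Hypothetical input: gene fusions annotated per cell line.
fusionsPerCell <- list(
    "cellA" = c("BCR-ABL1", "EWSR1-FLI1"),
    "cellB" = character(),
    "cellC" = "BCR-ABL1"
)

## Build one logical column per unique value, with one row per cell line.
## (Hypothetical helper, not part of the Cellosaurus package.)
membershipTable <- function(x) {
    keys <- sort(unique(unlist(x, use.names = FALSE)))
    mat <- vapply(
        X = keys,
        FUN = function(key) {
            vapply(
                X = x,
                FUN = function(values) key %in% values,
                FUN.VALUE = logical(1L)
            )
        },
        FUN.VALUE = logical(length(x))
    )
    ## Coerce explicitly so single-row input keeps its matrix shape.
    mat <- matrix(
        data = mat,
        nrow = length(x),
        ncol = length(keys),
        dimnames = list(names(x), keys)
    )
    as.data.frame(mat)
}

tbl <- membershipTable(fusionsPerCell)
print(tbl)
```

Each column can then be used directly as a row filter, for example `tbl[tbl[["BCR-ABL1"]], , drop = FALSE]`.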
Minor changes:
- tnbc: Added cell line exclusion rules.
Major changes:
- Removed stringi dependency in favor of AcidBase for string splitting.
- Cellosaurus: No longer converting strings to factor. Simply encoding using Rle instead. Removed factorize call in primary generator.
Minor changes:
- Updated Acid Genomics imports to use strict camel case.
- export: Updated to use generic from AcidGenerics instead of BiocIO. This variant doesn't require the unused format argument, which is preferable.
Major changes:
- Migrated selectCells from DepMapAnalysis package and added code coverage against expected match failures.
Minor changes:
- Migrated generics in use to AcidGenerics package.
- Split out excludeContaminatedCells and excludeProblematicCells to separate documentation files.
Minor changes:
- Added a vignette to the package, which shows how to use the primary generator and the mapCells function, in particular.
Major changes:
- Cellosaurus: Reworked internal code to parse and extract ATCC identifiers, which are commonly used instead of Cellosaurus identifiers to organize cell lines. Also added support for the misspellings column, which handles edge cases where cell line names are misspelled.
Minor changes:
- mapCells: Added option to return NA on map failure instead of error by setting strict = FALSE.
- mapCells: Added support for mapping by ATCC identifiers.
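
The difference between strict and non-strict mapping can be sketched in base R. The helper `lookupCells`, its arguments, and the lookup values are hypothetical; only the `strict` flag and its NA-versus-error behavior come from the note above. This is not the mapCells implementation.

```r
## Hypothetical lookup: cell line name to identifier (placeholder values).
idMap <- c("cellA" = "id1", "cellB" = "id2")

## Hypothetical helper illustrating a strict / non-strict switch.
lookupCells <- function(cells, map, strict = TRUE) {
    idx <- match(cells, names(map))
    failed <- cells[is.na(idx)]
    if (strict && length(failed) > 0L) {
        ## Strict mode: any unmatched input is an error.
        stop("Failed to map: ", toString(failed), call. = FALSE)
    }
    ## Non-strict mode: unmatched inputs are returned as NA.
    out <- unname(map[idx])
    names(out) <- cells
    out
}

lenient <- lookupCells(c("cellA", "unknown"), map = idMap, strict = FALSE)
print(lenient)

## In strict mode the same input raises an error instead.
strictResult <- try(
    lookupCells(c("cellA", "unknown"), map = idMap),
    silent = TRUE
)
```

Returning NA keeps the output aligned with the input, which is convenient when mapping a whole column and filtering failures afterwards.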
New functions:
- excludeProblematicCells: Exclude (remove) cell lines from the Cellosaurus object that are labeled as "Problematic cell line" in the comments. Note that this function is stricter than excludeContaminatedCells, because contaminated cell lines are a subset of the problematic cell lines on Cellosaurus.
- excludeContaminatedCells: Exclude cell lines that are labeled as "Problematic cell line: Contaminated" in the comments.
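
The subset relationship between the two filters can be shown with a base R sketch. The data frame, its column names, and the comment strings are made up for illustration. Only the two labels, "Problematic cell line" and "Problematic cell line: Contaminated", come from the notes above, and the package itself operates on a Cellosaurus object rather than a data.frame.

```r
## Hypothetical comment data for three cell lines.
cells <- data.frame(
    cellLine = c("cellA", "cellB", "cellC"),
    comment = c(
        "Problematic cell line: Contaminated.",
        "Problematic cell line: Misidentified.",
        "Some unrelated comment."
    )
)

## Flag lines by fixed-string match against the comment text.
isProblematic <- grepl(
    "Problematic cell line", cells[["comment"]], fixed = TRUE
)
isContaminated <- grepl(
    "Problematic cell line: Contaminated", cells[["comment"]], fixed = TRUE
)

## The stricter filter removes every problematic line.
keepStrict <- cells[!isProblematic, , drop = FALSE]
## The narrower filter removes only contaminated lines.
keepLenient <- cells[!isContaminated, , drop = FALSE]
```

Every contaminated line is also flagged as problematic, so the stricter filter never keeps a line that the narrower one removes.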
Major changes:
- Cellosaurus: Return now includes OncoTree metadata, which is mapped against the NCI thesaurus disease identifiers.
Minor changes:
- Cellosaurus generator now returns isContaminated column, which is useful for differentiating between isProblematic lines, which may simply be misidentified, versus cell lines that are really problematic due to contamination issues.
- Updated internal taxonomy parsing code to sanitize organism into full Latin name (e.g. "Homo sapiens") from "Homo sapiens (Human)" without the trailing nickname defined in the parentheses.
- Resaved example cello object.
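
The organism sanitization described above amounts to dropping a trailing parenthetical. A minimal base R sketch follows, assuming the nickname always appears as a final "(...)" group. The regular expression is an illustration, not the package's internal code, and the mouse entry is an assumed example in the same format.

```r
## Example organism strings; the first comes from the note above.
organism <- c(
    "Homo sapiens (Human)",
    "Mus musculus (Mouse)",
    "Homo sapiens"
)

## Remove optional whitespace plus a trailing "(...)" group, if present.
latinName <- sub(
    pattern = "[[:space:]]*\\([^)]*\\)$",
    replacement = "",
    x = organism
)
print(latinName)
```

Strings without a parenthetical pass through unchanged, so the step is safe to apply to already sanitized values.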
New functions:
- currentCellosaurusVersion: Check the Cellosaurus server for current release version. Currently returns as integer.
Minor changes:
- Cellosaurus: Updated key for samplingSite metadata column, which is now defined as "Derived from site" in the release 46 update.
- Updated Acid Genomics dependencies.
Minor changes:
- mapCells: Reworked our internal matching code.
- Now using new matchNested function internally.
- Consistently dispatching on DFrame instead of DataFrame virtual class.
- Split out internal .processEntry function.
Minor changes:
- Reworked DataFrame rbind step using our rbindToDataFrame function instead of data.table rbindlist.
- Updated dependencies to support new Bioconductor 3.17 release.
Minor changes:
- Cellosaurus: Fixed ncitDiseaseId and ncitDiseaseName mapping issue with accessions containing multiple matches (e.g. "CVCL_0011", "CVCL_0028").
- Switched from future.apply to parallel, for optional parallel tasks.
- Improved coverage of expected data return.
Major changes:
- Cellosaurus: Completely reworked main generator function. Now the package parses the cellosaurus.txt file internally instead of the previously used cellosaurus.obo file. We ran into OBO parser issues with the current cellosaurus.obo file (release 44). Also, only the cellosaurus.txt file contains additional useful metadata, including secondary accessions and the patient age at sampling. We have attempted to standardize metadata columns in the returned Cellosaurus object to better match the naming conventions currently used on the Cellosaurus website.
- export: Updated method to drop nested list columns (SimpleList) from the exported CSV file. Dropped columns currently include: "comments", "crossReferences", "date", "diseases", "hierarchy", "originateFromSameIndividual", "referencesIdentifiers", "strProfileData", "webPages".
- mapCells: Updated mapping engine to also support secondary accession identifiers, which is very useful for resolving redirected, previously used identifiers that are still present in the DepMap and Sanger CellModelPassports databases. Also reworked approach for handling standardized cell names at the last step, to avoid mapping issues with tricky cell line names, like ICC2 vs. ICC-2. These are non-breaking changes that are tested to map against all supported cell lines on DepMap and Sanger CellModelPassports.
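
Why standardized names are only consulted at the last step can be illustrated with a base R sketch. The `standardize` rule (uppercase, strip non-alphanumerics) is an assumption and may differ from what standardizeCells does. `mapTwoStage` and the placeholder identifiers are hypothetical. Only the ICC2 / ICC-2 pair comes from the note above.

```r
## Assumed standardization rule: uppercase and drop non-alphanumerics.
standardize <- function(x) {
    toupper(gsub("[^[:alnum:]]", "", x))
}

## Two distinct cell line names with placeholder identifiers.
knownNames <- c("ICC2", "ICC-2")
ids <- c("id1", "id2")

## After standardization the two names are indistinguishable.
collides <- identical(standardize("ICC2"), standardize("ICC-2"))

## Hypothetical two-stage mapper: exact match first, standardized fallback.
mapTwoStage <- function(cells, knownNames, ids) {
    idx <- match(cells, knownNames)
    todo <- is.na(idx)
    ## Fallback only for inputs that failed the exact match. On ambiguous
    ## keys, match() returns the first hit.
    idx[todo] <- match(standardize(cells[todo]), standardize(knownNames))
    ids[idx]
}

mapped <- mapTwoStage(c("ICC2", "ICC-2", "icc 2"), knownNames, ids)
print(mapped)
```

Exact matching resolves the two distinct names to different identifiers. Only the loosely formatted input falls through to the standardized comparison.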
Minor changes:
- Added cell name mapping code coverage against DepMap 22Q2, which differs significantly from DepMap 22Q4.
- mapCells: Now supports return of cell line name.
- Removed BT-549 cell line mapping override. Sanger CMP is currently incorrect.
- Added some additional identifier aliases to support DepMap 22Q2 coverage.
Major changes:
- mapCells: Reworked internal matching engine, and added support for manual overrides using the overrides object defined in sysdata.rda. The original mappings are defined in overrides.csv (see data-raw). Mappings are now covered against all cell lines defined in DepMap (22Q4) and Sanger CellModelPassports.
Minor changes:
- Cellosaurus: Removed option to override caching manually with cache.
- sanitizeCells: Added an additional handling rule for edge case.
- Cellosaurus object now gets saved with packageVersion in metadata.
- Resaved example cello object.
Minor changes:
- Cellosaurus: Fix for "CVCL_7082" line, which is actually named "NA".
- standardizeCells: Fix for handling of all cells in Cellosaurus database.
- mapCells: Added some additional name variant rules for better matching.
Major changes:
- Now pinning cellosaurus.obo file internally at r.acidgenomics.com server instead of downloading the latest release version from ftp.expasy.org. This change was made due to breaking changes introduced in Cellosaurus 44 release that broke the package.
Minor changes:
- Improved standardization of column names (e.g. depmapId instead of depMapId; sangerModelId instead of sangerId), for better consistency with DepMapAnalysis and CellModelPassports packages.
- Added cache override option to main Cellosaurus generator, which makes updating to latest version (e.g. 43) more intuitive than having to delete the BiocFileCache directory.
Minor changes:
- export: Hardened inheritance of S4 methods, to ensure that we class on Cellosaurus, instead of inheriting the default method for DataFrame.
Minor changes:
- Cellosaurus class now returns with sex metadata column.
- Factor columns are now automatically handled using factorize internally, and all applicable vectors are converted to Rle for improved memory efficiency.
- export: Added initial experimental method support for export of Cellosaurus metadata, that dynamically drops columns that aren't useful in CSV format.
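
Dropping columns that do not fit a flat CSV can be sketched in base R. This uses a plain data.frame with an ordinary list column as a stand-in. The package works with DFrame objects and SimpleList columns, so the code below is an illustration of the idea under that assumption, not the export method itself. All values are made up.

```r
## Hypothetical metadata with one nested (list) column.
df <- data.frame(
    cellLine = c("cellA", "cellB"),
    sex = c("Female", "Male")
)
df$comments <- list(c("first", "second"), "third")

## Detect list columns dynamically rather than hard-coding their names.
isNested <- vapply(X = df, FUN = is.list, FUN.VALUE = logical(1L))
flat <- df[, !isNested, drop = FALSE]

## Only the flat columns are written to disk.
file <- tempfile(fileext = ".csv")
write.csv(x = flat, file = file, row.names = FALSE)
roundTrip <- read.csv(file)
```

Detecting nested columns by type means newly added list columns are skipped automatically, without updating a fixed exclusion list.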
This is a major update, with breaking changes.
New S4 classes:
- Cellosaurus: Now defining this class instead of CellosaurusTable. Data is retrieved using ontologyIndex from Cellosaurus FTP server instead of querying the website directly.
Major changes:
- mapCells: Now supports return of multiple identifier key types, including Cellosaurus (default), DepMap, and Sanger (for Cell Model Passports).
- Now using taxizedb internally for NCBI taxonomy identifier matching to full Latin organism name (species; e.g. "Homo sapiens").
Minor changes:
- Bug fix for breaking change in pipette namespace.
Major changes:
- Split out basejump dependencies.
- CellosaurusTable: Added support for return of more identifier columns. Improved support for handling of non-human (e.g. mouse) cell lines.
- Updated CellosaurusTable to use R 4.2-specific formula call.
- S4 class inherits from DFrame now, due to a breaking change introduced with Bioconductor 3.15, where DataFrame no longer works.
Minor changes:
- Updated basejump dependencies and removed unnecessary stringr import.
Minor changes:
- Reworked NAMESPACE, following basejump v0.14 release series update.
- Simplified the number of dependencies, and removed need for internal dplyr code, instead using new rbindToDataFrame approach.
- Removed internal dependency on BiocParallel, so as to not query the Cellosaurus server too frequently.
Major changes:
- Renamed package from cellosaurus to Cellosaurus.
Minor changes:
- Updated dependency package version requirements.
Minor changes:
- Converted mapCells and standardizeCells functions to S4 methods that work on character class. We may define methods for these generics that work on classed objects inside the DepMapAnalysis package.
Initial release.