Major changes:
- Removed magrittr from package.
- Updated Acid Genomics dependencies.
- Now requiring R 4.3.
Major changes:
- AcidPlots is now included as a default package.
- Simplified the package so that it simply attaches the core Acid Genomics packages and relevant Bioconductor packages. Functions are no longer reexported in the NAMESPACE. For development, we recommend importing code from the primary packages rather than relying on the basejump NAMESPACE.
Minor changes:
- Updated NAMESPACE for compatibility with Bioconductor 3.17.
- Updated version pinnings on Acid Genomics dependencies.
Minor changes:
- Removed `mapToDataFrame` from NAMESPACE, which will be made defunct in a pending update to the AcidPlyr and AcidGenerics packages. We recommend using `rbindToDataFrame` instead.
Minor changes:
- Removed `requireNamespaces` from reexports, as this function will be migrated from AcidBase to goalie in a pending update.
Minor changes:
- Updated package dependencies, now requiring Bioconductor 3.16.
- `minorVersion` has been removed in favor of `majorMinorVersion`.
- Now including `euclidean` and `zscore` as reexports from AcidGenerics. Methods for these are primarily defined in AcidBase.
Minor changes:
- Hardened dependency version cutoffs.
- Added lintr exclusions for magrittr imports.
- Updated roxygen2 documentation.
Major changes:
- R 4.2 / Bioconductor 3.15 is now required.
- Reduced the number of packages imported: grDevices, grid, utils.
- No longer reexporting S4 classes or coercion methods.
- Removed AcidCLI as a core package, as this shouldn't really be used in basic analysis scripts.
- Removed dependencies on data.table and tibble.
Minor changes:
- Now exporting `droplevels2`, to avoid method collisions with changes in `DataFrame` class handling introduced in Bioconductor 3.15.
- Updated dependencies on AcidExperiment, AcidGenomes, AcidSingleCell, and pipette, to resolve breaking changes introduced by Bioconductor 3.15.
- Reworked data.table and tibble reexports, which are no longer defined in pipette package.
Major changes:
- Reverted to reexporting all useful magrittr pipes, for convenience.
- Migrated some single-cell RNA-seq functions from pipette to AcidSingleCell, which are now reexported here in basejump. These include: `CellCycleMarkers`, `CellTypeMarkers`, `importCellCycleMarkers`, `importCellTypeMarkers`.
- Reexporting new classes now defined in AcidSingleCell (previously in pointillism): `CellCycleMarkers`, `CellTypeMarkers`, `KnownMarkers`.
Minor changes:
- Removed from reexports: `compressExtPattern`, `extPattern` (removed from AcidBase; refer to goalie), and `localOrRemoteFile` (removed from pipette).
- Now reexporting the `GenomicRanges`, `GenomicRangesList`, and `IntegerRanges` virtual classes, instead of `GRanges`, `GRangesList`, and `IRanges`, respectively. Downstream S4 methods should be declared against these virtual classes. For example, never declare methods against `DFrame` directly; use the `DataFrame` virtual class instead.
- Updated supported S4 coercion methods via `as` from the pipette package.
Minor changes:
- Reworked method exports for `aggregate` and `aggregateCols`, following an update in AcidSingleCell that improves the consistency of `aggregate` methods.
- Updated dependency version cutoffs of other packages.
New functions:
- Added some reexports from AcidCLI: `abort`, `inform`, `warn`.
Removed functions:
- Removed some deprecated functions: `matchArgsToDoCall`, `matchEnsemblReleaseToURL`, `matchHumanOrthologs`, `matchesInterestingGroups`, `metadataBlacklist`, `multiassignAsEnvir`, `readSampleData`, `readTx2Gene`, `sortUnique`, `toStringUnique`, and `unlistToDataFrame`.
- Removed reexports from the methods package: `as`, `formalArgs`, `is`, `new`, and `validObject`.
Minor changes:
- Updated dependency version cutoffs.
Minor changes:
- Updated dependency version cutoffs.
- Removed `mapEnsemblBuildToUCSC` and `mapUCSCBuildToEnsembl`, which are no longer exported in the AcidGenomes package.
Minor changes:
- Added `simpleClass` reexport, following an AcidBase package update.
- Updated dependency package version cutoffs.
Minor changes:
- Updated `median` and `quantile` reexports to import from AcidGenerics instead of AcidBase. These now refer to the `median` and `quantile` generics defined in the IRanges package, which are useful for handling `NumericList` objects.
- Updated dependency version cutoffs.
Minor changes:
- Removed magrittr as an import, and removed magrittr pipes as reexports. Base R supports a native pipe (`|>`) as of the R 4.1 release.
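As a minimal base R sketch (R >= 4.1, with arbitrary example values), code that previously chained calls with the magrittr `%>%` pipe can use the native pipe instead:

```r
# Native pipe, available in base R since 4.1 (no magrittr import needed).
x <- c(1, 4, 9)

# Equivalent to sum(sqrt(x)): the left-hand side becomes the first argument.
total <- x |> sqrt() |> sum()
print(total)
```

Unlike `%>%`, the native pipe requires the right-hand side to be a function call with parentheses. For example, `x |> sqrt` is a syntax error.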
Minor changes:
- Including useful stringr reexports that are used in a number of downstream packages.
Minor changes:
- Including additional reexports used in pointillism package.
Minor changes:
- Deprecated `metadataBlacklist` in favor of `metadataDenylist`.
- Updated dependency version cutoffs.
Minor changes:
- Updated dependency versions.
- Including some additional reexports from AcidGenomes, such as `EntrezGeneInfo`.
Minor changes:
- Including `seqnames` in reexports, which is used in bcbioRNASeq.
Minor changes:
- Including `getListElement` from S4Vectors via AcidGenerics, which is used in bcbioRNASeq.
Minor changes:
- Including `URLencode` from utils via AcidBase as a reexport, which is used in the Cellosaurus package.
Minor changes:
- Now reexporting `mapToDataFrame` and `rbindToDataFrame` from AcidPlyr. Note that `unlistToDataFrame` is now deprecated in favor of `mapToDataFrame`, and will be removed in a future release.
Minor changes:
- Reexporting `formula` and `untar`, which are used in the WormBase package.
- Ensured that grDevices, stats, and utils reexports inherit from AcidBase.
- Updated dependency version cutoffs.
Minor changes:
- Added more Bioconductor reexports from BiocGenerics and S4Vectors.
- Updated dependency versions.
Minor changes:
- Migrated reexports of IRanges from pipette to AcidGenerics.
Minor changes:
- Now reexporting `end`, `start`, and `width` from AcidGenerics.
Minor changes:
- Reworked the NAMESPACE to reduce the number of imported packages.
Minor changes:
- Now reexporting additional classes defined in IRanges via pipette: `CharacterList`, `FactorList`, `IntegerList`, `LogicalList`, `NumericList`, and `RleList`.
New functions:
- Reexporting some functions used downstream in DESeqAnalysis: `capture.output`, `cbind`, `getS3method`, `model.matrix`, `rbind`, and `relevel`.
New functions:
- Added more reexports from base/recommended R packages, including grDevices, grid, methods, stats, and utils.
Minor changes:
- Now re-exporting some other useful functions from SummarizedExperiment and SingleCellExperiment, which are defined in AcidExperiment and AcidSingleCell respectively.
Reworked package to simply inherit and reexport functions from other Acid Genomics packages, rather than defining any code directly here.
Major changes:
- `camelCase` and `upperCamelCase` now default to `strict = TRUE`, reflecting the change implemented in the syntactic v0.4.4 release.
Minor changes:
- Consolidated S4 generic imports into AcidGenerics, removing import dependency on BiocGenerics.
- Removed reexport of `mapUCSCBuildToEnsembl`, defined in AcidGenomes.
Minor changes:
- Added `AsIs` method support for `sem`, which is required for dplyr `mutate` calculations on numeric classes. Note that the `numeric` method is defined in the AcidBase package.
Minor changes:
- Updated dependency version cutoffs. Applies primarily to cli and magrittr.
- Now reexporting the new `sem` function from AcidBase.
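`sem` refers to the standard error of the mean: the sample standard deviation divided by the square root of the number of observations. The sketch below uses base R and a hypothetical `semSketch` helper. The AcidBase implementation may differ, for example in how it handles `NA` values.

```r
# Hypothetical base R sketch of the standard error of the mean (SEM).
# Not the AcidBase implementation; NA handling is intentionally omitted.
semSketch <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1L)
  sd(x) / sqrt(length(x))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
# mean(x) is 5 and the squared deviations sum to 32, so sd(x) is sqrt(32 / 7).
result <- semSketch(x)
```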
Minor changes:
- Migrated `transmit` to the pipette package. Still reexporting here.
Minor changes:
- Reworked `makeSummarizedExperiment` and `makeSingleCellExperiment` generators to generate empty objects without requiring any assays to be defined.
Migrated some functions to AcidBase, AcidGenomes, AcidPlyr, and pipette packages.
Major changes:
- `Ensembl2Entrez`: Improved internal matching engine, to allow reverse matching of Entrez identifiers to Ensembl identifiers.
- Now exporting `Entrez2Ensembl`, which works like `Ensembl2Entrez` in reverse.
- New S4 class `Entrez2Ensembl`, which inherits the `Ensembl2Entrez` structure.
Minor changes:
- `makeSummarizedExperiment`: No longer requires the primary `assay` to be named "counts". That requirement isn't appropriate for `SummarizedExperiment` objects defined in the new DepMapAnalysis package.
- `makeGRangesFromEnsembl`: Include `geneSynonyms` column for supported organisms, including *Homo sapiens*.
- Deprecated `matchEnsemblReleaseToURL` in favor of `mapEnsemblReleaseToURL`.
Minor changes:
- `makeProtein2GeneFromEnsembl`: Improved error message on match failure, which now reports the protein IDs that failed to match more clearly.
New classes:
- `Protein2Gene`: `DataFrame` with `proteinID`, `geneID`, and `geneName` columns. Use the corresponding `makeProtein2GeneFromEnsembl` function to generate the object, using Ensembl protein IDs as input.
New functions:
- `getEnsDb`: Now exporting the function that was used internally to obtain an `EnsDb` object from AnnotationHub.
- `makeProtein2GeneFromEnsembl`: New utility function that takes Ensembl protein identifiers as input, and returns corresponding gene identifiers and gene names (i.e. HUGO gene symbols).
Minor changes:
- Reworked some internal code in `makeGRangesFromEnsembl` to enable export of the new `getEnsDb` function.
- Reworked internal handling of AnnotationHub and EnsDb metadata.
- `makeGRangesFromEnsembl`/`getEnsDb`: Improved sorting of Ensembl releases so that a current release greater than 99 returns as expected. Since Ensembl is now at release 101, we need to convert releases to integers internally instead of sorting them as strings.
New functions:
- `splitByLevel`: Easily split a data frame into a list using a defined factor column (`f` argument). Can include the reference level with `ref = TRUE`, which is useful for statistical calculations on pairwise contrasts.
Minor changes:
- Migrated some globals to acidbase package, for improved consistency.
New methods:
- `intersectAll`: Defined `list` method.
- `intersectionMatrix`: Defined `list` method.
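To illustrate what these `list` methods compute, here is a base R sketch using hypothetical `intersectAllSketch` and `intersectionMatrixSketch` helpers. The membership matrix layout (unique elements as rows, list elements as columns) is an assumption, not necessarily the layout basejump returns.

```r
# Hypothetical base R sketches; not the basejump/AcidBase implementations.
sets <- list(
  a = c("x", "y", "z"),
  b = c("y", "z", "w"),
  c = c("z", "y")
)

# Elements shared by every element of the list.
intersectAllSketch <- function(x) Reduce(f = intersect, x = x)

# Logical membership matrix: rows are all unique elements, columns are sets.
intersectionMatrixSketch <- function(x) {
  elements <- sort(unique(unlist(x, use.names = FALSE)))
  membership <- vapply(
    X = x,
    FUN = function(set) elements %in% set,
    FUN.VALUE = logical(length(elements))
  )
  # Wrap in matrix() so a single unique element still returns a matrix.
  matrix(
    data = membership,
    nrow = length(elements),
    dimnames = list(elements, names(x))
  )
}

shared <- intersectAllSketch(sets)
mat <- intersectionMatrixSketch(sets)
```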
Minor changes:
- Now reexporting `requireNamespaces` from the acidbase package.
Minor changes:
- Migrated `alphaThreshold` and `lfcThreshold` methods to the DESeqAnalysis package. These are not used in other packages and may not be generally applicable to the `SummarizedExperiment` class, so we are rethinking the approach here.
Minor changes:
- Maintenance release, updating minimum R dependency to 4.0.
Minor changes:
- `autopadZeros`: Migrated character method support to the syntactic package, since this is useful for low-level code run inside koopa.
Minor changes:
- `aggregate`, `aggregateCols`, `aggregateRows`: Relaxed assert checks on validity of dimnames, so we can use these internally in the acidgsea package, which needs to handle gene symbols containing syntactically invalid hyphens.
Minor changes:
- `convertGenesToSymbols`: Added method support for `GRanges` class objects. Automatically sets names as unique gene symbols.
- `HGNC2Ensembl`: Enforce TSV handling internally in the `import` call.
- `MGI2Ensembl`: Fix for column name handling.
Minor changes:
- `importSampleData`: The `pipeline = "cpi"` option is now defunct. Use `pipeline = "none", sheet = 2L` for CPI samples.
- `importSampleData`: Added optional `autopadZeros` argument for easy handling of sample identifiers, which users often input without correct padding. This converts sample_1, sample_2, ... sample_10 to sample_01, sample_02, ... sample_10, so that they sort as expected. Currently disabled by default.
- `importTx2Gene`: Added `ignoreGeneVersion` option, now enabled by default. This processes gene identifiers by default in a manner suitable for the downstream tximport-DESeq2 workflow.
- `headtail`: Removed Unicode support in favor of a simple ASCII return, to avoid build warnings in the latest R 4.0 release.
- Miscellaneous unit test updates to reflect changes in the `DataFrame` class and NCBI server updates.
- `camelCase`, `dottedCase`, `organism`, `snakeCase`, `upperCamelCase`: S4 methods for `DataFrame` are now defined directly against `DataFrame`, instead of attempting to inherit from the `DataTable` virtual class. Otherwise, these methods break on bioc-devel 3.12, which seems to have changed inheritance.
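The sorting problem that the `autopadZeros` argument addresses can be shown in base R. The `padSuffixSketch` helper below is hypothetical, not the basejump/syntactic `autopadZeros` implementation. It assumes every identifier ends in digits and pads those trailing digits to a common width.

```r
# Hypothetical helper: left-pad trailing digits to a common width.
# Assumes every element ends in one or more digits.
padSuffixSketch <- function(x) {
  prefix <- sub(pattern = "[0-9]+$", replacement = "", x = x)
  digits <- sub(pattern = "^.*[^0-9]", replacement = "", x = x)
  width <- max(nchar(digits))
  paste0(prefix, formatC(as.integer(digits), width = width, flag = "0"))
}

ids <- c("sample_1", "sample_2", "sample_10")
unpadded <- sort(ids)                 # "sample_10" sorts before "sample_2"
padded <- sort(padSuffixSketch(ids))  # zero padding restores numeric order
```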
Minor changes:
- `matchEnsemblReleaseToURL`, `matchHumanOrthologs`: Updated unit tests to reflect the Ensembl server migration, which has rendered Ensembl archives inaccessible via biomaRt until March 24th. Unit tests now check against the current release instead of a pinned archive release.
Minor changes:
- `makeSingleCellExperiment`, `makeSummarizedExperiment`: Removed `spikeNames` support due to a breaking change in SingleCellExperiment, which has removed `isSpike` in favor of `altExps`. Refer to the SingleCellExperiment documentation for current best practice for handling spike-in transcripts. This now requires slotting a separate SummarizedExperiment object containing things like ERCCs inside the main SingleCellExperiment.
Minor changes:
- Migrating `pseudobulk` methods to the pointillism package. Rethinking the approach used here to work better with per-cluster aggregation operations. Will be updated in the next pointillism release.
Major changes:
- Reworked `aggregate`, `aggregateCols`, `aggregateRows` support to reflect internal migration away from the Matrix.utils dependency.
Minor changes:
- NAMESPACE fix for unexpected removal of Matrix.utils from CRAN. Now defining the previously imported `aggregate.Matrix` function directly here in basejump.
- `aggregate`: Now defining `matrix` method.
- `aggregate*` generics now consistently use `x` instead of `object`.
Major changes:
- Migrated functions from brio (now renamed to pipette), freerange, and transformer, in preparation for Bioconductor submission and reviews.
- Updated messages to utilize the cli package.
New functions:
- `integerCounts`: Simple method support for returning a rounded integer counts matrix. Intended primarily for downstream handoff to bulk RNA-seq differential expression callers, such as DESeq2.
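In its simplest form, rounding a numeric counts matrix to integers can be sketched in base R as follows. The `integerCountsSketch` helper and the example values are hypothetical, not the basejump `integerCounts` method.

```r
# Hypothetical base R sketch: round a numeric counts matrix (e.g. estimated
# counts) and store the result as integers, keeping dimnames intact.
integerCountsSketch <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  out <- round(x)
  storage.mode(out) <- "integer"
  out
}

counts <- matrix(
  data = c(1.4, 2.6, 0, 3.2),
  nrow = 2L,
  dimnames = list(c("gene1", "gene2"), c("sample1", "sample2"))
)
res <- integerCountsSketch(counts)
```

Note that `round()` follows the IEC 60559 convention, so exact halves such as 2.5 may round to the even neighbor.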
Minor changes:
- Updated dependency package version requirements. Needed primarily for a bug fix in transformer, improving `mutateAll` functionality.
Minor changes:
- Updated package documentation to support roxygen2 7.0 update.
- Reworked `formalsList` global slightly. Using the `synesthesia` color palette for the `heatmap.color` argument doesn't always perform well enough, so I'm switching to a blue/black/yellow palette defined by `blueYellow` in acidplots instead for `heatmap.color`. The `synesthesia` palette performs really well for correlation heatmaps, and is now recommended by default via the `acid.heatmap.correlation.color` global instead.
- `filterCells`: Improved internal sampleName / sampleID handling.
New functions:
- `correlation`: Added S4 method support that mimics base `cor` methods, but is more flexible, supporting additional arguments via `...` in the generic. This way we can provide intelligent and quick correlation calculations for nested assays inside a `SummarizedExperiment` and for `DESeqResults` (see DESeqAnalysis package).
Major changes:
- Updated Bioconductor dependencies to require new 3.10 release.
Bug fixes:
- `filterCells`: Now requires an internal `decode` step to handle `Rle` evaluation, which worked without it in the Bioconductor 3.9 release.
- Updated unit tests to reflect the `SingleCellExperiment` example object resave in the acidtest 0.2.7 update, which changed the numbers.
New reexports:
- Reexporting `metadata2` and `metadata2<-` functions from transformer. These will be used internally in the pending DESeqAnalysis update.
New functions:
- Genome version detection: `ensemblVersion`, `gencodeVersion`, `refseqVersion`, `flybaseVersion`, `wormbaseVersion`. Similar shell variants are available in the koopa package.
Minor changes:
- Moved some low-level functions to new acidbase package. Updated NAMESPACE and reexports to reflect these changes.
Minor changes:
- `importSampleData`: Pipeline now defaults to "none" instead of "bcbio", since this flag is now properly hardcoded in bcbio R packages. Added new Constellation (CPI) pipeline option.
- `makeSampleData`: Now checks for all-NA columns and rows, similar to the approach in `importSampleData`. This helps improve return consistency. Automatic row name setting has been tweaked so it no longer attempts to remove the original ID column.
Minor changes:
- `makeSampleData`: Made function slightly more flexible. Now allowing automatic row name coercion from columns ("sampleID", "rowname", "rn"), similar to the approach employed by the data.table and tibble packages.
- Now exporting `stripGeneVersions` alias, which uses the same code internally as `stripTranscriptVersions`.
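Stripping versions means removing the trailing `.<digits>` suffix from Ensembl-style identifiers. Here is a minimal base R sketch. The `stripVersionsSketch` helper and the example identifiers are illustrative, not the basejump code.

```r
# Hypothetical sketch: drop a trailing ".<digits>" version suffix.
stripVersionsSketch <- function(x) {
  sub(pattern = "\\.[0-9]+$", replacement = "", x = x)
}

ids <- c("ENSG00000000003.15", "ENST00000373020.9", "ENSG00000000005")
stripped <- stripVersionsSketch(ids)
```

Identifiers without a version suffix, like the third one here, pass through unchanged. Because the same regular expression applies to gene and transcript identifiers, a single implementation can back both aliases.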
Disabled methods:
- Disabled DelayedArray class methods for `calculateMetrics`, `estimateSizeFactors`, and `nonzeroRowsAndCols` until the `is_pristine` bug in DelayedArray v0.11.8 is fixed on Bioconductor Devel (3.10). This is causing unit tests to fail otherwise. See related issue.
Minor changes:
- Migrated S4 methods that work on Bioconductor classes from syntactic to here. This keeps the syntactic package lightweight and focused only on character string sanitization.
Major changes:
- `melt`: Updated `min` and `minMethod` defaults for the `matrix` method. The `min` argument now defaults to `-Inf`, and `minMethod` now defaults to `"absolute"` instead of `"perRow"`, since this behavior is more intuitive to the user.
Minor changes:
- `nonzeroRowsAndCols`: Added `assay` argument, switching from internal `counts` usage, to make the `SummarizedExperiment` method more flexible.
Minor changes:
- `mcolnames`: Moved S4 methods previously defined in syntactic here.
- Improved website documentation.
Major changes:
- `autopadZeros`: Added improved support for detection and automatic handling of zeros in need of padding on the left side of a character vector. This addition is necessary for handling Genewiz processed FASTQ file names.
- `makeTx2Gene` functions now support the `ignoreTxVersion` argument, similar to conventions defined in the tximport package.
Minor changes:
- `importSampleData`: Updated to tentatively support a general pipeline via the "none" argument. In this case, only a `sampleID` column is required in metadata. This has been developed in conjunction with my new Genewiz to kallisto Nextflow processing pipeline being implemented at CPI.
Minor changes:
- `melt`: Added method support for the contingency `table` class.
- Removed `set_*` reexports from the magrittr package.
Updated R dependency to 3.6.
New functions:
- `melt`: Added S4 methods for melting data into long format. Currently supports `matrix`, `Matrix`, `DataFrame`, `SummarizedExperiment`, and `SingleCellExperiment`.
- `nonzeroRowsAndCols`: Quickly remove rows and columns that contain only zeros from a matrix or `SummarizedExperiment`.
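Both operations can be sketched on a plain base R matrix. The helpers `nonzeroSketch` and `meltSketch` are hypothetical and are not the basejump S4 methods, which also handle `Matrix`, `DataFrame`, and `SummarizedExperiment` input.

```r
# Hypothetical base R sketches: drop all-zero rows/columns, then melt to long format.
m <- matrix(
  data = c(0L, 0L, 0L, 5L, 0L, 2L),
  nrow = 2L,
  dimnames = list(rowname = c("gene1", "gene2"), colname = c("s1", "s2", "s3"))
)

nonzeroSketch <- function(x) {
  keepRows <- rowSums(x != 0) > 0L
  keepCols <- colSums(x != 0) > 0L
  x[keepRows, keepCols, drop = FALSE]
}

# as.table() keeps the named dimnames, which become the long-format columns.
meltSketch <- function(x) {
  as.data.frame(as.table(x), responseName = "value", stringsAsFactors = FALSE)
}

filtered <- nonzeroSketch(m)  # gene1 and s1 contain only zeros and are dropped
long <- meltSketch(filtered)  # columns: rowname, colname, value
```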
Major changes:
- Migrated `EggNOG` and `PANTHER` S4 classes to separate packages.
Minor changes:
- `calculateMetrics`: Now calls `nonzeroRowsAndCols` internally first when `prefilter = FALSE`, speeding up calculations significantly for very large `SingleCellExperiment` objects. This was added to improve loading of example unfiltered 10X Genomics Chromium data.
Deprecations:
- Deprecated `readSampleData` and `readTx2Gene` in favor of `importSampleData` and `importTx2Gene`, respectively.
Minor changes:
- `makeSummarizedExperiment`: Now automatically handles non-Ensembl gene symbols present in assays that aren't defined in `rowRanges`. This applies primarily to 10X Cell Ranger v3 output, which includes some non-Ensembl gene symbols: CD11b, CD4, CD8a, HLA-DR, IgG1, PD-1, etc. The function still intentionally errors on unannotated Ensembl features, which often indicate an accidental release version mismatch.
- Updated dependency versions.
Minor changes:
- `meltCounts`: Added initial method support for `SingleCellExperiment`. Currently requires deparsing of the count matrix to call `reshape2::melt`. Returns columns with S4 run-length encoding (Rle) applied.
- `mapGenes`: Converted warning to message when `strict = FALSE`.
New functions:
- Migrated `readSampleData` and `readTx2Gene` from bcbioBase.
- Reworked `readSampleData` internal code, but still supporting bcbio pipeline conventions (i.e. "description" column for samples) as the default. I've reworked this approach so we can also call `readSampleData` inside the Chromium package (for 10X Genomics single-cell RNA-seq data) without having to depend on the bcbio R packages.
- Reexporting new dplyr-like methods that support S4 `DataFrame`: `inner_join`, `left_join`, `right_join`, `full_join`, `anti_join`; `mutate_all`, `mutate_at`, `mutate_if`; `select_all`, `select_at`, `select_if`.
Major changes:
- Removed dplyr and magrittr dependencies in internal code, where applicable.
- `aggregateCols`: Sped up return by calling `SingleCellExperiment` rather than `makeSingleCellExperiment`. Note that the returned object no longer includes session information.
- `cell2sample`: Renamed return from `tibble` to `tbl_df`, for consistency.
- Made some previously deprecated functions that were still used in the bcbioRNASeq v0.2 release series (which has now been updated to v0.3) defunct. This applies primarily to `assert*` functions that have been reworked using a goalie approach.
- `estimateSizeFactors`: Removed option to calculate "deseq2-median-ratio" using DESeq2. We may revisit this idea in a future release.
- `makeSampleData`: Improved internal code for the `DataFrame` method.
Minor changes:
- Updated basejump dependencies, in preparation for bioconda release update supporting bcbioRNASeq.
- Now importing AnnotationDbi, BiocParallel, Biostrings, and biomaRt.
- Safe to import `select` from AnnotationDbi, now that we're no longer depending on dplyr.
- Switched to using `droplevels` instead of `relevel` on S4 objects internally, where applicable. This is better supported by the S4Vectors package.
- Simplified reexport documentation for S4 functions and methods.
Minor changes:
- `filterCells`: Improved downstream handling of the `nCells` argument. Ensured that double filtering is still allowed.
Minor changes:
- Updated goalie dependency.
- Tightened up `appendToBody` and `methodFormals` calls to have better backward compatibility with R 3.5.
- `calculateMetrics`: Bug fix for getting `rowData` via `mcols` without `use.names = TRUE`. This improves Bioconductor backward compatibility. Also updated internal code to call `hasMetrics` from goalie.
- Made `separatorBar` defunct. Use the `separator` function instead.
New functions:
- `calculateMetrics`: Migrated code here from bcbioSingleCell. Improved method to support the `DelayedArray` class for large matrices.
Minor changes:
- Improved code coverage and adjusted unit tests for breaking changes seen due to new covr update.
Major changes:
- `makeSampleData`: Switched to an S4 method that works on `data.frame` and `DataFrame` class objects. Enforcing lower camel case for column names.
- `makeSummarizedExperiment` and `makeSingleCellExperiment`: Switched to an S4 method approach that currently requires `SimpleList` input for assays. Previously, we allowed `list` input for assays, but we're tightening this up to simplify the package code base and improve consistency.
- `aggregate` methods now consistently return the primary assay named as `counts`. This follows the recommended conventions defined in SingleCellExperiment. Aggregation functions will now intentionally fail for `SummarizedExperiment` objects that don't contain an assay named `counts`.
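The core of column aggregation on a `counts` assay is summing cell-level columns by sample. Here is a base R sketch. The `counts` matrix and the `cellToSample` grouping factor are hypothetical examples, and basejump's `aggregate` methods operate on Bioconductor objects rather than plain matrices.

```r
# Hypothetical cell-level counts (genes x cells) and a cell-to-sample mapping.
counts <- matrix(
  data = c(1L, 0L, 2L, 3L, 4L, 1L),
  nrow = 2L,
  dimnames = list(c("gene1", "gene2"), c("cell1", "cell2", "cell3"))
)
cellToSample <- factor(c("sampleA", "sampleA", "sampleB"))

# rowsum() sums rows by group, so transpose to sum columns, then transpose back.
aggregated <- t(rowsum(t(counts), group = cellToSample))
```

Here sampleA is the sum of cell1 and cell2, and sampleB equals cell3. The result is still a gene-by-sample `counts` matrix, so downstream callers can rely on the assay name.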
Minor changes:
- Improved documentation consistency by offloading the `params.Rd` file to the new acidroxygen package. This will be linked in the other Acid Genomics packages.
- Updated unit tests to follow new package conventions (see above changes).
100th release!
Minor changes:
- `mapCellsToSamples`: Relaxed grep matching on `cells` input to support legacy bcbioSingleCell objects. This change was needed to improve the `updateObject` method in the upcoming bcbioSingleCell update.
Minor changes:
- Reexporting new functions in syntactic: `makeLabel`, `makeTitle`, `makeWords`.
- Updated documentation to include modification timestamp.
Minor changes:
- Added back deprecated assert checks that are required for bcbioRNASeq v0.2 release series.
Start of new release series. Version bump reflects changes in dependency packages. See the acidtest, bioverbs, freerange, syntactic, and transformer release notes for more details.
Minor changes:
- Bug fix for the `combine` method on SummarizedExperiment. Needed to ensure row names are assigned on rowData to provide backward compatibility for Bioc 3.7.
- Improved unit test exceptions on Docker and AppVeyor.
- Improved installation instructions.
- `mapGenesToRownames`: Improved matching for `SummarizedExperiment` objects that don't contain gene-to-symbol mappings defined in `rowData`.
- `meltCounts`: Improved factor handling. Also added `matrix` method support. Added advanced option to disable `minCounts` filtering, by setting it to `NULL`.
Deprecations:
- Tightened up the list of deprecated functions.
Minor changes:
- Made `theme_midnight` and `theme_paperwhite` defunct, in favor of variants in the new acidplots package.
- Decreased the number of functions reexported from the goalie package.
- Updated basejump dependency package versions.
- Improved Travis CI Docker configuration.
New functions:
- `matchEnsemblReleaseToURL`: Takes an Ensembl release version (e.g. `96`) as input and returns the corresponding archive URL.
- `matchHumanOrthologs`: Convenience function that wraps the biomaRt package to map model system gene identifiers to HGNC IDs and symbols. This is particularly useful for running orthologous GSEA with our pfgsea package.
Major changes:
- `combine` method for `SummarizedExperiment` now includes all matrices defined in the `assays` slot. Also improved support for `colData` handling on subsets where `NA` values have been removed.
Minor changes:
- `Gene2Symbol`: Modified `format` formal to use "unmodified" instead of "long", which is more intuitive.
- `makeGene2Symbol`: Added support for the `format` argument, similar to the `Gene2Symbol` generator function.
Minor changes:
- Relaxed the deprecations on some functions to provide backward compatibility support for bcbioBase and bcbioRNASeq packages: `readFileByExtension`, `readYAML`, `fixNA`.
- Now ensuring `theme_midnight` and `theme_paperwhite` are deprecated but exported with support, by suggesting the acidplots package.
- Added back defunct function warnings: `assertHasRownames`, `tx2geneFromGFF`.
Major changes:
- Now pinned to R >= 3.5.
New functions:
- Added method support for `alphaThreshold` and `lfcThreshold` against the `Annotated` class. These values get stored in the `metadata` slot of the object.
New functions:
- `showHeader`: Utility function for `show` methods defined in other packages.
Minor changes:
- Updated basejump dependencies (see `DESCRIPTION` for details).
Minor changes:
- Reworked S4 generic reexport method, in an attempt to get pkgdown to build the vignette correctly. Otherwise, `sampleData` errors.
Major changes:
- `meltCounts`: Switched from the `nonzeroGenes` formal approach to `minCounts` and `minCountsMethod`, which is more flexible.
Minor changes:
- Consolidated minimalism and firestarter code into acidplots package.
- Improved global options defined in `formalsList`.
- Added back basejump vignette.
Minor changes:
- Bug fix release for freerange update.
- `emptyRanges`: The `mcolsNames` argument has been renamed to `mcolnames`.
Major changes:
- `aggregateRows`, `aggregateCols`, `aggregateCellsToSamples`: Improved internal code for SummarizedExperiment metadata handling. Applies primarily to colData and rowData juggling for these methods.
- `interestingGroups`: Reworked to define method against the `Annotated` class. `SummarizedExperiment` inherits from this class, supporting `metadata`.
- `mapCellsToSamples`: Tightened up match assert checks.
- `mcolnames` now uses S4 methods, primarily against the `Vector` class.
- `organism`: Reworked S4 methods. Added support for `Annotated` and `DataTable` classes from S4Vectors.
Minor changes:
- Added additional unit tests, to improve code coverage.
- Now covering the aggregation functions in better detail, using a minimal SingleCellExperiment object that works with `aggregateCols`.
- Consistently use "acid" prefix instead of "basejump" for global options.
- Miscellaneous working example improvements.
- `detectLanes`: Renamed primary argument from `object` to `path`. Improved `pattern` formal to evaluate `lanePattern` global.
- For S4 generators, renamed test parameter from `basejump.test` to `acid.test`.
- `sampleData`: Improved error message when the `sampleName` factor column is missing.
- `zeroVsDepth` now returns the `depth` column as `integer` instead of `numeric`.
New functions:
- `rankedMatrix`: New utility function for quickly performing ranked matrix calculations. Particularly useful for differential expression comparison across studies using log2 fold change or Wald test statistic.
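The idea behind a ranked matrix is to replace each column's values with their ranks, so that log2 fold changes or Wald statistics from different studies share a common scale. The base R sketch below ranks in ascending order with averaged ties. The direction, tie handling, and example values are assumptions; this is not the basejump `rankedMatrix` implementation.

```r
# Hypothetical log2 fold changes (genes x studies).
lfc <- matrix(
  data = c(2.5, -1.0, 0.3, -0.2, 1.8, 0.9),
  nrow = 3L,
  dimnames = list(c("gene1", "gene2", "gene3"), c("studyA", "studyB"))
)

# Rank within each column (ascending; ties get the average rank).
ranked <- apply(lfc, MARGIN = 2L, FUN = rank, ties.method = "average")
```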
Minor changes:
- Simplified reexport methods for functions that define S4 methods.
Major changes:
- Split out Ensembl (AnnotationHub/ensembldb) annotation processing and GFF/GTF file loading utilities to the new freerange package. All of these functions remain reexported here in basejump. This includes: `annotable`, `convertUCSCBuildToEnsembl`, `detectOrganism`, `emptyRanges`, `makeGRangesFromEnsDb`, `makeGRangesFromEnsembl`, `makeGRangesFromGFF`, `makeGRangesFromGTF`.
- Split out heatmap functions to the firestarter package. This includes `plotHeatmap`, `plotCorrelationHeatmap`, and `plotQuantileHeatmap`.
- Split out all ggplot2 functions to the new minimalism package.
Minor changes:
- Offloaded `removeNA` and `sanitizeNA` code to the brio package, so these functions can be imported in the new freerange package.
- Moved `organism_mappings` internal dataset to the freerange package.
Minor changes:
- `convertSampleIDsToNames`: Removed code to assign `sampleName` column to `NULL`. This step doesn't work consistently for `DataFrame` across Bioconductor installations, and has been found to error on R 3.4 and the current bioc-devel on AppVeyor.
- Miscellaneous documentation fixes, removing extra formatting in titles.
- `makeGRangesFromGFF`: Compressed Ensembl GTF file example was erroring out on AppVeyor CI, due to Windows' poor handling of temp files on non-admin accounts. Switched to a non-gzipped example file to avoid this issue. Also removed tabular table from documentation, which currently doesn't render correctly via pkgdown.
Minor changes:
- Migrated code to Acid Genomics.
Minor changes:
- Additional bug fixes for `sampleData` and blacklisted metadata handling.
Minor changes:
- Updated dependencies, specifically brio and goalie.
- Miscellaneous documentation improvements.
Major changes:
- `makeGRangesFromGFF`: Reworked internal code, making it more modular. Added initial support for RefSeq GFF3 files. Also improved sanitization and special handling of files from FlyBase and WormBase.
Minor changes:
- `plotHeatmap` family: Bug fix needed for the internal `is.na` call on `annotationCol`, which should be wrapped with `any` to return a boolean. This errors (as it should) on R 3.6, but I missed it on R 3.5.
- Consolidating GRanges return code defined in `.makeGRanges`, which is run for both GFF file and ensembldb import. We've improved the Rle encoding steps here to work with complex GFF3 files (e.g. GENCODE).
Minor changes:
- Updated transformer package reexports to include new data.table coercion methods.
- Added additional useful compression function reexports from brio package.
Minor changes:
- Working on making the current basejump code base completely backward compatible with bcbioBase v0.4.1 and bcbioRNASeq v0.2.8 release series.
- Now reexporting `goalie::bapply` and additional useful pipes from the magrittr package.
- Keeping the now deprecated `plotGene` generic reexported, while encouraging users to update their code to use `plotCounts` instead.
- `interestingGroups`: Simplified internal assert checks, removing `matchesInterestingGroups`, which can become circular.
Minor changes:
- Code fixes to provide backward compatibility support for R 3.4. Tested using R 3.4.1 with Bioconductor 3.6 release.
- Needed to add `unname` to some assert checks for expected `logical(1)` return, which only happens in R 3.4 but not R 3.5.
- `uniteInterestingGroups`: Improved internal assert checks and name handling.
Minor changes:
- Deprecating `plotGene` in favor of `plotCounts`. This change will be reflected in future updates of packages that depend on basejump, including the bcbio R packages.
- Split out subpackage reexports into separate files.
- Reexporting `assert` from the goalie package.
Minor changes:
- `decode` and `encode` are properly reexported from brio.
- Updated Travis CI and AppVeyor CI configurations.
Minor changes:
- Note that S4Transformer package import has been renamed to transformer.
- Needed to add a `decode` call internally for some plotting functions, to ensure that run-length encoded (Rle) rowData gets handled correctly.
- Bug fix for internal `interestingGroups` handling in plot functions.
Offloaded to brio:
- `decode`, `encode`. These are useful for data sanitization. Still reexported here in basejump.
Offloaded to goalie:
- `printString`. This is a low-level function that is useful for setting the cause attribute in error messages. Still reexported here in basejump.
Deprecations:
- `sanitizeRowData` has been deprecated in favor of `atomize`.
- `sanitizeAnnotable` deprecation has been updated to point to `atomize`.
Minor changes:
- Updated basejump subpackage dependencies.
- Needed to add a decode call internally for some plotting functions, to ensure that run-length encoded (Rle) rowData gets handled correctly.
- Bug fix for internal interestingGroups handling in plot functions.
Offloaded to brio:
- decode, encode. These are useful for data sanitization. Still reexported here in basejump.
Offloaded to goalie:
- printString. This is a low-level function that is useful for setting the cause attribute in error messages. Still reexported here in basejump.
Deprecations:
- sanitizeRowData has been deprecated in favor of atomize.
- sanitizeAnnotable deprecation has been updated to point to atomize.
Minor changes:
- Consolidated reexports from basejump sub-packages into the reexports.R file.
- Miscellaneous documentation updates, improving link appearance for functions exported in other packages.
Minor changes:
- Reorganized imports in the DESCRIPTION file to make them more human readable. Note that basejump sub-packages are imported first, then Bioconductor packages, followed by CRAN packages, and required default packages.
- Split out NAMESPACE imports into a separate imports.R file.
This release defines the initial point where basejump becomes even more modular, offloading some functions to new brio, syntactic, and S4Transformer packages.
Note that all offloaded functions will continue to be reexported in basejump. If you notice a function that is missing and not correctly re-exported, please file an issue.
Note that S4Transformer has since been renamed to transformer.
Offloaded to S4Transformer:
- The as coercion methods moved to the S4Transformer package. These methods define our useful interconversions between Bioconductor and tidyverse data classes, including DataFrame and tbl_df (tibble).
- coerceS4ToList / flatFiles.
Offloaded to bb8:
- cleanSystemLibrary. This doesn't scale well to all installations and is really only intended for personal use, so the bb8 package is more appropriate.
- Documentation functions, including parseRd, RdTags, saveRdExamples, and tabular, are outside the scope of basejump.
Offloaded to brio:
- basenameSansExt.
- dots.
- export.
- import.
- initDir.
- loadData.
- localOrRemoteFile.
- pasteURL.
- realpath.
- sanitizeColData.
- sanitizeRowData.
- sanitizeSampleData.
- saveData.
- transmit.
- writeCounts.
Offloaded to goalie:
- matchArgsToDoCall.
- MethodDefinition.
- standardizeCall.
Minor changes:
- Added nullOK support to goalie assert checks, where applicable.
This release defines the initial point where basejump begins to import bioverbs.
Major changes:
- Now importing generics using our bioverbs S4 generic package. All generics previously defined in basejump will continue to be reexported, to maintain backward compatibility for reverse dependencies (revdeps).
Minor changes:
- aggregateCellsToSamples: Split out S4 method to a separate file. Previously was defined in aggregate-methods.R.
- Reorganized collapse family of functions. Refer to changes in collapse-methods.R, which is now split out to collapseToString-methods.R.
- Split the markdown family of functions back out into separate files.
- matchesGene2Symbol, matchesInterestingGroups: Reworked internal code and moved to separate files. No longer relies upon makeTestFunction from the checkmate package.
Documentation:
- Switched documentation titles to sentence case from title case. It's generally more readable.
I bumped the release series from v0.8 to v0.9 because this represents a significant change to the internal codebase: I have now switched from assertive to my new goalie assert check engine.
New functions:
- decode: Decode S4 run-length encoding (Rle).
- encode: Apply S4 run-length encoding (Rle).
- geneNames: Convenience function that returns gene names (symbols) mapped to the stable, but not human-friendly, gene identifiers.
- matchesGene2Symbol, matchesInterestingGroups: New functions designed to match corresponding Gene2Symbol objects or interestingGroups.
- pasteURL: Convenience function that generates URL strings.
- sanitizeColData: Rework of the previous sanitizeSampleData approach.
Major changes:
- Now using the goalie package instead of assertive for internal assert checks. The new goalie::assert function is more flexible in many cases. Similarly, goalie::validate is now being used in place of assertthat::validate_that.
- plotHeatmap now calculates the z-score normalization internally, rather than relying upon the codebase inside pheatmap.
- export: Improved SummarizedExperiment method to also write Gene2Symbol and Ensembl2Entrez mappings to disk, when defined. The human-friendly output formal has been renamed from human to humanize, to reflect an action (verb). This corresponds better to our humanize generic function. Also reworked some internal code that handles output of colData and rowData to disk.
- makeGRangesFromEnsembl: Switched to using S4 run-length encoding (Rle) in our metadata column (mcols) return. This functionality matches the conventions used by GenomicRanges in the GRanges return, and reduces the memory footprint of very large annotation objects.
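The row-wise z-score that plotHeatmap now computes internally can be sketched in base R. This is a minimal illustration that assumes rows are genes and columns are samples; rowZScoreSketch is a hypothetical helper name, not a basejump export, and the real function may treat zero-variance rows differently.

```r
# Hypothetical helper (not basejump code): scale each row to mean 0, sd 1.
rowZScoreSketch <- function(x) {
    stopifnot(is.matrix(x), is.numeric(x))
    # scale() works column-wise, so transpose in and transpose back out.
    z <- t(scale(t(x), center = TRUE, scale = TRUE))
    # Drop the bookkeeping attributes that scale() attaches.
    attr(z, "scaled:center") <- NULL
    attr(z, "scaled:scale") <- NULL
    z
}

counts <- matrix(
    data = c(1, 2, 3, 10, 20, 30),
    nrow = 2L,
    byrow = TRUE,
    dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3"))
)
z <- rowZScoreSketch(counts)
```

Both rows end up as -1, 0, 1 despite a tenfold difference in magnitude. A row with zero variance would return NaN here, which is one case a production implementation needs to guard against.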
Minor changes:
- Note that some internal instances of has_length aren't quite strict enough. Switch to using length(x) > 0L or the improved hasLength assert check defined in the goalie package.
- geneSynonyms: Switched to using the new pasteURL function internally instead of using paste with a / separator.
- HGNC2Ensembl generator: Switched to using pasteURL internally.
- loadData, and other related load family functions: Simplified internal code using our new goalie asserts.
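A minimal sketch of what a pasteURL-style helper does compared with a plain paste(..., sep = "/") call, assuming the goal is to avoid doubled slashes when components already carry them. The name pasteUrlSketch and its protocol argument are hypothetical and do not reproduce basejump's signature.

```r
# Hypothetical sketch, not basejump's pasteURL implementation.
pasteUrlSketch <- function(..., protocol = c("https", "http", "ftp", "none")) {
    protocol <- match.arg(protocol)
    parts <- c(...)
    stopifnot(is.character(parts), length(parts) > 0L)
    # Trim leading and trailing slashes so joining never yields "//".
    parts <- gsub("^/+|/+$", "", parts)
    url <- paste(parts, collapse = "/")
    if (!identical(protocol, "none")) {
        url <- paste0(protocol, "://", url)
    }
    url
}

url <- pasteUrlSketch("ftp.ensembl.org/", "/pub", "release-90")
```

Here url is "https://ftp.ensembl.org/pub/release-90", whereas paste with sep = "/" would have produced "ftp.ensembl.org///pub/release-90".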
Deprecations:
- Removed assertFormalGene2symbol from deprecations.
Documentation:
- Improved documentation style throughout the package, switching from the usage of scalar types like string to character(1), and boolean to logical(1). This better matches the actual data structure in R. Some other packages like checkmate also use this convention, which I think is more readable than my previous approach.
New functions:
- deg: Utility function to quickly obtain differentially expressed gene (DEG) identifiers as a character vector.
Major changes:
- plotHeatmap: Now defining row scaling internally, rather than relying on the functionality defined in pheatmap.
New functions:
- humanize: New generic that enables easy conversion to human-friendly column and/or row names. Useful for CSV file export in particular.
Minor changes:
- export: Added humanize argument support.
- Improved grep pattern matching inside makeNames functions (e.g. .sanitizeAcronyms).
- sampleData: Improved blacklist pattern matching against Seurat objects for the SingleCellExperiment method.
Minor changes:
- Now importing hasUniqueCols from goalie. Switched from the previous approach using areSamplesUnique.
- Documentation improvement for genomeBuild, used in makeGRanges functions.
- Miscellaneous documentation improvements to pass build checks.
New functions:
- relevelRowRanges, relevelColData.
Deprecations:
- markdownPlotlist. Renamed to markdownPlots.
Minor changes:
- autopadZeros: Improved internal code to keep track of names for the character method. Also added method support for SummarizedExperiment, which works on the column names (e.g. sample names) only by default.
- Ensembl2Entrez: Simplified validity check to require integer in the entrezID column. Also reworked and improved internal code that supports run-length encoding (Rle) using decode.
- export: Improved documentation for the name argument.
- makeGRangesFromEnsembl: Improved messages to user.
- plotCountsPerBiotype: Improved error messages for when the biotype isn't defined.
- plotPCA: Added support for unique sample detection with areSamplesUnique, similar to the approach used in the bcbioRNASeq quality control R Markdown.
- sampleNames: Simplified sampleName extraction, using [[ internally.
- Miscellaneous documentation fixes.
- Updated unit test for sanitizeRowData.
New functions:
- autopadZeros. Useful for padding zeros inside of a character vector.
- basenameSansExt. Quickly get the basename without the file extension for desired file path(s). Surprisingly, this isn't defined in the tools package so I wrote my own.
- cleanSystemLibrary. I ran into some shared library configuration issues on the Azure infrastructure, so this utility function is useful for checking whether an R installation has a clean library.
- plotGenderMarkers: Migrated the SummarizedExperiment method from the bcbioRNASeq package.
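The kind of padding autopadZeros performs can be sketched in base R, assuming each element ends in an integer that should be padded to a common width so that lexical sorting matches numeric order. autopadSketch is a hypothetical name; basejump's function also has a SummarizedExperiment method, which this sketch does not cover.

```r
# Hypothetical sketch, not basejump's autopadZeros implementation.
autopadSketch <- function(x) {
    stopifnot(is.character(x), all(grepl("[0-9]+$", x)))
    # Split each string into a prefix and its trailing integer.
    num <- regmatches(x, regexpr("[0-9]+$", x))
    prefix <- substr(x, 1L, nchar(x) - nchar(num))
    # Pad every number to the width of the longest one.
    width <- max(nchar(num))
    padded <- formatC(as.integer(num), width = width, flag = "0")
    out <- paste0(prefix, padded)
    # Keep names, as the changelog notes for the character method.
    names(out) <- names(x)
    out
}

padded <- autopadSketch(c("sample1", "sample2", "sample10"))
```

After padding, sort(padded) returns sample01, sample02, sample10, instead of the lexical order sample1, sample10, sample2.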
Minor changes:
- Split out call functions defined in calls.R into separate R files. Refer to dots, for example.
- Split out environment.R into separate R files. See detectHPC, for example.
- Miscellaneous documentation improvements.
Minor changes:
- Improved usage of assertthat::validate_that in S4 class validity checks. Removed former approach using the internal .valid function.
- Temporarily soft deprecated some functions that will be formally deprecated in a future release. See the deprecated.R file.
- interestingGroups doesn't attempt a validity check using validObject by default. This can be enabled instead using check = TRUE.
- sampleData: Tightened up internal assert checks.
- PANTHER: Minor tweaks to internal variable names inside .splitTerms.
- transmit: Improved messages. Temporarily disabled working example, since it consistently fails on Travis CI.
Major changes:
Migrating some additional base code that can be dispatched on either SummarizedExperiment or SingleCellExperiment from the bcbio R packages.
We're going to split out some of the single-cell RNA-seq functionality into separate packages, since a lot of my work moving forward deals with the 10X Genomics Cell Ranger platform, rather than the bcbio-supported inDrops platform.
In particular:
- barcodeRanksPerSample.
- filterCells.
- plotBarcodeRanks.
- plotCellCounts.
- plotGenesPerCell.
- plotMitoRatio.
- plotMitoVsCoding.
- plotNovelty.
- plotReadsPerCell.
- plotUMIsPerCell.
- plotUMIsVsGenes.
Minor changes:
- Migrated some plotHeatmap and plotPCA code from the bcbio R packages.
- Added new formalsList global variable, which stashes getOption defaults used in some functions, namely the save/load functions and some plotting functions.
- Switched to using formalsList along with formals declaration internally to make the parameters more consistent across functions.
- Miscellaneous fixes to working examples and documentation.
- Split out sanitization functions into separate R files (e.g. removeNA).
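The formalsList approach (a shared list of option-backed defaults that is attached to functions via formals) can be sketched as follows. formalsListSketch, saveSketch, and the hypothetical.* option names are illustrative only. basejump's actual formalsList entries and option names may differ.

```r
# Hypothetical shared defaults. Each entry is an unevaluated getOption() call,
# so the option is looked up at call time, not at definition time.
formalsListSketch <- list(
    save.dir = quote(getOption("hypothetical.save.dir", ".")),
    save.ext = quote(getOption("hypothetical.save.ext", "rds"))
)

# Declare the function with bare arguments, then attach the shared defaults.
saveSketch <- function(dir, ext) {
    file.path(dir, paste0("object.", ext))
}
formals(saveSketch)$dir <- formalsListSketch$save.dir
formals(saveSketch)$ext <- formalsListSketch$save.ext
```

Any function that pulls its defaults from the same list picks up the same option names. Because the defaults are evaluated lazily, options(hypothetical.save.dir = "results") changes the default for every such function at once.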
Major changes:
The split-out sub-package approach isn't working quite right, so the code base has been consolidated back into a single basejump package. Development toward splitting the package will continue, but a conceptual re-imagining of how to organize the functions is needed.
Major changes:
This release is the beginning of an attempt to rework the basejump codebase a bit and make the package easier to unit test on Travis CI. bcbio R packages will be pinned to v0.7.2 during this development period.
Here we are working to split out the functionality of basejump into several, smaller sub-packages:
- basejump.annotations
- basejump.assertions
- basejump.classes
- basejump.coercion
- basejump.developer
- basejump.experiment
- basejump.generics
- basejump.globals
- basejump.io
- basejump.markdown
- basejump.plots
- basejump.sanitization
New functions:
- assertAllAreURL.
- assertAllAreValidNames.
- validNames.
Major changes:
- Added a draft vignette explaining the functions available in the package.
- gene2symbol, makeGene2symbolFromEnsembl, and makeGene2symbolFromGFF functions now support the unique argument, which returns sanitized values in the geneName column, ensuring there are no duplicates. This is enabled by default (recommended) but can be disabled using unique = FALSE. This functionality was added to ensure consistent gene name handling in single-cell RNA-seq analyses.
- saveData now supports basejump.save.ext and basejump.save.compress global options, so the desired file type (e.g. RDS instead of RDA) and compression (e.g. xz instead of gzip) can be easily specified for an entire project.
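A minimal sketch of how project-wide basejump.save.ext and basejump.save.compress options can drive a saveData-style function. The two option names come from the entry above. saveDataSketch, its arguments, and the gzip fallback default are assumptions, and this is not basejump's saveData code.

```r
# Hypothetical sketch: option names from this changelog, body illustrative.
saveDataSketch <- function(
    ...,
    dir = ".",
    ext = getOption("basejump.save.ext", "rds"),
    compress = getOption("basejump.save.compress", "gzip")
) {
    ext <- match.arg(ext, choices = c("rds", "rda"))
    compress <- match.arg(compress, choices = c("gzip", "bzip2", "xz"))
    # Capture the unevaluated symbols so each file is named after its object.
    objectNames <- vapply(
        X = as.list(substitute(list(...)))[-1L],
        FUN = deparse,
        FUN.VALUE = character(1L)
    )
    objects <- list(...)
    files <- file.path(dir, paste0(objectNames, ".", ext))
    for (i in seq_along(objects)) {
        if (identical(ext, "rds")) {
            saveRDS(objects[[i]], file = files[[i]], compress = compress)
        } else {
            # save() writes named bindings, so stage the object in an env.
            env <- new.env()
            assign(objectNames[[i]], objects[[i]], envir = env)
            save(
                list = objectNames[[i]], file = files[[i]],
                envir = env, compress = compress
            )
        }
    }
    invisible(files)
}

# Set the project-wide defaults once, then call without repeating them.
old <- options(basejump.save.ext = "rds", basejump.save.compress = "xz")
counts <- matrix(1:4, nrow = 2L)
files <- saveDataSketch(counts, dir = tempdir())
options(old)
```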
Minor changes:
- sampleNames now supports assignment for the SummarizedExperiment method.
- Now exporting the lanePattern regular expression pattern as a global, which was previously defined in the bcbioBase package.
- Bug fix for as coercion method support. Need to ensure exportMethods(coerce) is included in the NAMESPACE file, otherwise tibble coercion using as(x, "tbl_df") won't work when called from an Rscript without the package library loaded. Thanks @roryk for noticing this.
- Updated gene2symbol generic to use ..., since we've added the unique = TRUE argument in this release.
- annotable: Moved to the makeGRanges.R file, and improved the internal code to export supported formals also used in makeGRangesFromEnsembl. The function should work exactly the same as in previous releases, but now with clearer supported arguments in the documentation.
- Skipping code coverage for cleanSystemLibrary, since Travis CI installs packages into the system library, which causes this check to return FALSE.
- Consistently using as(from, "tbl_df") for internal tibble coercion in all functions.
- geneSynonyms and panther: organism argument matching no longer suggests a default. The current list of supported organisms is in the documentation, and described in the internal match.arg call.
- All SummarizedExperiment methods use validObject validity checks, where applicable.
- Consolidated documentation for all makeGRanges, makeGene2symbol, and makeTx2gene functions.
- Heatmap functions: Simplified the internal code responsible for defining annotationCol and annotationColors automatically.
- sampleData: Made validity check stricter, requiring the sampleName column to be defined, otherwise the function will intentionally error.
- Now using formals internally to keep ggplot2 theme formals consistent.
- Updated example data scripts and resaved internal data.
- Updated contribution guidelines in the CONTRIBUTING.md file.
New functions:
- cleanSystemLibrary: Utility function to check whether a user has installed packages into the R system library. Refer to the .libPaths documentation for more information on library paths.
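The idea behind cleanSystemLibrary can be sketched in base R, assuming "clean" means the system library (.Library) contains only packages with an assigned Priority, i.e. base and recommended packages. cleanSystemLibrarySketch is a hypothetical name and not basejump's implementation. As noted for Travis CI above, any setup that installs packages into the system library will make a check like this return FALSE.

```r
# Hypothetical sketch, not basejump's cleanSystemLibrary implementation.
cleanSystemLibrarySketch <- function() {
    # .Library is the path to this R installation's default (system) library.
    pkgs <- utils::installed.packages(lib.loc = .Library)
    # Base and recommended packages carry a Priority value; anything without
    # one was installed there by a user or an administrator.
    extra <- pkgs[is.na(pkgs[, "Priority"]), "Package"]
    if (length(extra) > 0L) {
        message("Non-default packages in ", .Library, ": ", toString(extra))
    }
    length(extra) == 0L
}

isClean <- cleanSystemLibrarySketch()
```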
Major changes:
- Now using build instead of genomeBuild for Ensembl annotation functions. The genomeBuild argument still works but now will inform the user about the change.
Minor changes:
- Migrated prepareTemplate from the bcbioBase package. Simplified this function to copy all files inside extdata/rmarkdown/shared within a specified package. Currently in use for bcbioRNASeq, bcbioSingleCell, and the new pointillism clustering package.
- Made Ensembl release matching stricter, based on the metadata columns.
In this release, we are migrating some of the S4 generics previously exported in the bcbioBase package. We are consolidating these functions here for simplicity and stability.
New functions:
- makeSummarizedExperiment: Renamed from prepareSummarizedExperiment, previously exported in the bcbioBase package. We are using the make prefix here for consistency (see other gene annotation functions).
- Migrated S4 generics and methods from bcbioBase: flatFiles, metrics, plotCorrelationHeatmap, plotGene, plotHeatmap, plotQC, and plotQuantileHeatmap.
Major changes:
- Now using curl::has_internet internally to check for an Internet connection. This applies to the annotation functions that query web databases.
- Added coercion method support for converting a SummarizedExperiment to an unstructured list. This is the method used internally for flatFiles.
New functions:
- matchInterestingGroups: New developer function to automatically handle the interestingGroups argument used across various plotting functions and in the bcbio infrastructure packages.
Minor changes:
- Migrated separatorBar and updateMessage global exports from bcbioBase. Improved separatorBar appearance to automatically scale to the current session width, using getOption("width").
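Scaling a separator to the session width can be done by repeating a character getOption("width") times. This is a minimal sketch. separatorBarSketch and its char argument are hypothetical, and basejump's separatorBar may use a different character.

```r
# Hypothetical sketch, not basejump's separatorBar definition.
separatorBarSketch <- function(char = "=") {
    stopifnot(is.character(char), length(char) == 1L, nchar(char) == 1L)
    # Fill the console width of the current session.
    strrep(char, getOption("width"))
}

bar <- separatorBarSketch()
```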
New functions:
- convertSymbolsToGenes: Provides SummarizedExperiment method support for converting objects containing gene symbols ("geneName") as rownames back to gene identifiers ("geneID").
- eggnog: Quickly download current annotations from the EggNOG database. Useful for annotating gene-to-protein matches; currently in use with the brightworm RNAi screening package, which contains WormBase gene ID and EggNOG ID annotations.
Major changes:
- Now suggesting BiocManager instead of BiocInstaller for installation.
- broadClass now supports GRanges and SummarizedExperiment. Support for data.frame and/or DataFrame class objects has been removed.
Minor changes:
- convertGenesToSymbols and convertTranscriptsToGenes now have organism and gene2symbol arguments set to NULL by default.
- Upgraded to roxygen2 v6.1 for documentation, which improves handling of aliases in the Rd manual files.
- Added dplyr::pull to reexported functions.
- Improved package documentation by declaring the supported class(es) for each function argument.
- Moved foldChangeToLogRatio and logRatioToFoldChange constructors into numeric method declarations.
- Simplified internal code for the gene2symbol SummarizedExperiment method.
- toStringUnique now uses x instead of atomic as the primary argument.
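The fold change and log ratio conversions can be sketched as numeric functions. This assumes a common convention that this changelog does not state: fold changes below 1 are reported as negative reciprocals (a log2 ratio of -1 becomes a fold change of -2). Both function names are hypothetical, and basejump's numeric methods may differ in details.

```r
# Hypothetical sketches. The negative-reciprocal convention is an assumption.
logRatioToFoldChangeSketch <- function(x, base = 2) {
    stopifnot(is.numeric(x))
    x <- base^x
    # Report down-regulation as a negative fold change, e.g. 0.5 -> -2.
    ifelse(x < 1, -1 / x, x)
}

foldChangeToLogRatioSketch <- function(x, base = 2) {
    # Under this convention, values strictly between -1 and 1 never occur.
    stopifnot(is.numeric(x), !any(abs(x) < 1, na.rm = TRUE))
    # Undo the negative-reciprocal convention before taking the log.
    x <- ifelse(x < 0, -1 / x, x)
    log(x, base = base)
}

fc <- logRatioToFoldChangeSketch(c(-1, 0, 1))
lr <- foldChangeToLogRatioSketch(c(-2, 1, 2))
```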
Minor changes:
- Bug fix for the convertGenesToSymbols method for SummarizedExperiment. Previously, if the geneName column was a factor, this function would error. This issue has been fixed by ensuring that the symbols provided in geneName are coerced to a character vector.
- Improved conda installation instructions.
Major changes:
- Now importing the SummarizedExperiment package and providing basic method support for generics that were previously used in the bcbioBase package.
- Improved GFFv3 handling in makeGRangesFromGFF and other GFF utility functions, including makeGene2symbolFromGFF and makeTx2geneFromGFF. Note that makeGRangesFromGFF now returns additional metadata columns accessible with S4Vectors::mcols, and that these columns are now sorted alphabetically.
Migrated functions:
Previously, these functions were exported in the bcbioBase package, but they provide non-bcbio-specific functionality, and should be included here in the basejump package instead:
- assertFormalInterestingGroups.
- gene2symbol.
- interestingGroups, uniteInterestingGroups.
- sampleData, sanitizeSampleData.
- sampleNames.
- selectSamples.
Providing basic SummarizedExperiment class method support for counts.
Minor changes:
- geometricMean generic was not exported correctly.
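For reference, the quantity behind geometricMean is the exponential of the mean of the logs. The sketch below is a plain function under the assumption that inputs are positive. geometricMeanSketch and its removeNA argument are hypothetical, and the real S4 generic has its own methods and checks.

```r
# Hypothetical sketch, not basejump's geometricMean methods.
geometricMeanSketch <- function(x, removeNA = TRUE) {
    stopifnot(is.numeric(x))
    if (isTRUE(removeNA)) {
        x <- x[!is.na(x)]
    }
    # Only defined here for strictly positive values.
    stopifnot(all(x > 0))
    # Average on the log scale, then transform back.
    exp(mean(log(x)))
}

gm <- geometricMeanSketch(c(1, 10, 100))
```

The arithmetic mean of 1, 10, and 100 is 37, while the geometric mean is 10. This is why the geometric mean is less dominated by a single large count.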
Minor changes:
- Now requiring ggplot2 v3.0 internally.
- theme_midnight and theme_paperwhite now extend ggplot2::theme_linedraw, improving the consistency between these themes.
- Example data is now consistently formatted using snake case: rnaseq_counts and single_cell_counts, instead of the previous camel case conventions: rnaseqCounts, singleCellCounts.
- Camel variants of the ggplot themes are now deprecated.
- Updated internal gene synonyms data from NIH.
Minor changes:
- Removed the makeNames argument from the readFileByExtension function. Use the makeNames family of functions manually after data import instead. This helps avoid unwanted sanitization of data.
- Simplified assert checks for internal load function.
- Improved code coverage.
- AppVeyor CI updates to work with Bioconductor 3.8 devel.
Minor changes:
- Markdown function consistency improvements. Now all relevant Markdown functions use text as the primary argument, instead of object.
Minor changes:
- makeGRangesFromEnsembl now supports remapping of UCSC genome build to Ensembl. However, this isn't recommended, and will warn the user.
- convertUCSCBuildToEnsembl now returns NULL instead of erroring on genome build match failure.
- stripTranscriptVersions now matches ".", "-", and "_" version delimiters.
- Made unused dynamicPlotlist function defunct.
- Consider deprecating assertIsCharacterOrNULL and assertIsDataFrameOrNULL in a future release.
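Matching the three version delimiters comes down to a single anchored regular expression. The sketch below assumes a version is a trailing run of digits after ".", "-", or "_". stripTranscriptVersionsSketch is a hypothetical name, and basejump's pattern may be stricter (for example, restricted to Ensembl-style identifiers).

```r
# Hypothetical sketch, not basejump's stripTranscriptVersions code.
stripTranscriptVersionsSketch <- function(x) {
    stopifnot(is.character(x))
    # Drop a trailing delimiter (".", "_", or "-") followed by digits.
    sub("[._-][0-9]+$", "", x)
}

stripped <- stripTranscriptVersionsSketch(c(
    "ENST00000000233.10",
    "ENST00000000412-8",
    "ENST00000000442_1",
    "ENST00000001008"
))
```

Because the pattern only inspects the end of each string, it would also strip an identifier that legitimately ends in, say, "_2". This is why a real implementation may want to restrict the input format.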
Infrastructure changes:
- Reenable Travis CI blocklist, excluding develop branch from checks.
- Reorganized documentation of deprecated and defunct functions.
Minor changes:
- Tweaked gray color accents for theme_midnight and theme_paperwhite.
Minor changes:
- Fixed NAMESPACE issue with GenomeInfoDb::seqnames.
- Improved readGFF working example to reflect switch to GRanges return.
- Added macOS bioc-release image to Travis CI build checks.
Major changes:
- readGFF now uses rtracklayer::import internally to return the GFF file as a GRanges object instead of a data.frame.
Minor changes:
- assertIsGFF and parseGFFAttributes functions are now defunct.
- Simplified internal GFF handling code for makeGRangesFromGFF, makeGene2symbolFromGFF, and makeTx2geneFromGFF.
- Migrated sanitizeSampleData to the bcbioBase package.
- Updated Bioconductor install method for 3.7.
Minor changes:
- Improved internal S4 method code for fixNA and removeNA.
- Tweaked gray accent colors in theme_midnight and theme_paperwhite. Now using British spelling internally for ggplot code.
- Improved strip.background for theme_paperwhite, removing the black box around the labels when facet wrapping is enabled.
Minor changes:
- Improved documentation for assert check functions.
- Deprecated geomean in favor of geometricMean.
- Simplified internal code for grepString.
- Added message during hgnc2gene call.
- Miscellaneous documentation fixes.
- Moved internal constructors into the S4 method definitions, where applicable.
- Simplified default parameter definition for panther(organism = "XXX").
- Improved code coverage, using nocov where appropriate.
Minor changes:
- emptyRanges: Now using match.arg internally to capture the seqname argument.
- Removed legacy .assignCamelArgs and .assignCamelFormals internal functions.
- Improved internal handling of XLSX files in localOrRemoteFile.
New functions:
- emptyRanges enables easy creation of placeholder ranges for GRanges objects, where transgene and FASTA spike-ins are needed.
- hgnc2gene enables easy mapping of HGNC to Ensembl gene identifiers.
- mgi2gene enables easy mapping of MGI to Ensembl gene identifiers.
- panther function enables easy querying of the PANTHER website. Human, mouse, nematode worm, and fruit fly are currently supported. The specific PANTHER release (e.g. 13) can be declared using the release argument. Otherwise, the function will return the most recent annotations from the PANTHER website.
- Added isURL check function.
- readJSON adds support for JSON files. Like the other read functions, it supports both local files and remote URLs.
ggplot2 themes:
- theme_midnight and theme_paperwhite provide minimal, high contrast ggplot2 themes with bold sans serif labels.
Major changes:
- loadData now supports .rda, .rds, and .RData files. The function will error by design if multiple data extensions are detected inside the directory specified with the dir argument.
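The "one data extension per directory" rule can be sketched in base R. detectDataExtSketch is a hypothetical name, and the exact pattern and error message are assumptions rather than basejump's loadData internals.

```r
# Hypothetical sketch of the rule, not basejump's loadData code.
detectDataExtSketch <- function(dir) {
    stopifnot(dir.exists(dir))
    files <- list.files(dir, pattern = "\\.(rda|rds|RData)$")
    # Keep only the extension of each matching file.
    exts <- unique(sub("^.*\\.", "", files))
    if (length(exts) > 1L) {
        stop("Multiple data extensions detected in '", dir, "': ", toString(exts))
    }
    exts
}

dataDir <- tempfile("data-")
dir.create(dataDir)
saveRDS(1:3, file.path(dataDir, "counts.rds"))
ext <- detectDataExtSketch(dataDir)
```

Erroring instead of choosing one extension avoids silently loading a stale .rda when a newer .rds with the same object name sits next to it.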
Minor changes:
- Consolidated assert check function code.
- Moved assertive imports to the basejump-package.R file.
- Consolidated globals inside the package to the globals.R file.
- Removed internal .biocLite function. Now using requireNamespace instead, without attempting to install automatically.
- Added internal support for safe loading of RDS files.
- Switched back to using message, warning, and stop instead of the rlang equivalents.
- Improved internal method declaration using getMethod where applicable.
- multiassignAsEnvir is now recommended in place of multiassignAsNewEnvir.
- readFileByExtension will now attempt to use the rio package for file extensions that are not natively supported.
- writeCounts now uses mapply internally.
- Migrated assertFormalAnnotationCol to the bcbioBase package.
Major changes:
- Introducing new functions for the acquisition of gene and transcript annotations from Ensembl: ensembl, genes, and transcripts. These functions allow the return of GRanges, DataFrame, and data.frame class objects from AnnotationHub using ensembldb.
- Improved internal broadClass definition code to match against chromosome from Ensembl if available.
- loadDataAsName now works with unquoted names, improving consistency with loadData (non-standard evaluation).
Minor changes:
- Added new convertUCSCBuildToEnsembl function, for easy remapping of UCSC to Ensembl genome build names (e.g. hg38 to GRCh38).
- Migrated matrix methods for plotCorrelationHeatmap here from bcbioRNASeq, for improved consistency with other heatmap functions.
- Exporting a makeNames variant of base::make.names that sanitizes using underscores rather than dots.
- Converted readYAML from a generic to a standard function.
- Added support for AppVeyor CI code testing on Windows.
- Made Travis CI build checks stricter, adding support for BiocCheck.
- Added new assert checks: assertAreGeneAnnotations, assertAreTranscriptAnnotations, isAnImplicitInteger.
- Simplified working examples for assert checks to just show successes.
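A UCSC-to-Ensembl build lookup is essentially a named character vector. The sketch also follows the later change described above, where a failed match returns NULL rather than erroring. convertUCSCBuildToEnsemblSketch is a hypothetical name, and the three entries shown are only a subset of what basejump may map.

```r
# Hypothetical sketch covering a few common builds; not basejump's table.
convertUCSCBuildToEnsemblSketch <- function(build) {
    stopifnot(is.character(build), length(build) == 1L)
    map <- c(
        hg19 = "GRCh37",
        hg38 = "GRCh38",
        mm10 = "GRCm38"
    )
    if (!build %in% names(map)) {
        # Return NULL instead of erroring when there is no match.
        return(NULL)
    }
    map[[build]]
}

ensemblBuild <- convertUCSCBuildToEnsemblSketch("hg38")
```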
Deprecations:
- annotable function has been deprecated in favor of the new ensembl function.
- checkAnnotable deprecated in favor of assertIsAnnotable.
- checkGene2symbol deprecated in favor of assertIsGene2symbol.
- checkTx2gene deprecated in favor of assertIsTx2gene.
- assertFormalColorFunction deprecated in favor of assertIsHexColorFunctionOrNULL.
- initializeDir deprecated in favor of initializeDirectory.
- Defunct: summarizeRows, wash, packageSE, prepareSE, metadataTable, comp, revcomp, symbol2gene.
- Now exporting all assert checks in camel case instead of snake case, to match consistency in the rest of the package.
- Added sanitizeColData function.
- Added assertAllAreNonExisting function.
- Now exporting midnightTheme as a theme_midnight alias to match the syntax in the ggplot2 package.
- Added working examples and code coverage for all assert check functions.
- Simplified the internal collapse code for annotable to simply work on the Entrez identifier column (entrez). If a manually passed in data frame still has duplicates, the function will now abort instead of attempting to use collapseToString.
- Added ggplot2 color palette assert checks: assertColorScaleContinuousOrNULL, assertColorScaleDiscreteOrNULL, assertFillScaleContinuousOrNULL, assertFillScaleDiscreteOrNULL.
- Switch to using assertive internally for assert checks.
- Now exporting these assert check functions: assert_formal_annotation_col, assert_formal_color_function, assert_formal_compress, assert_formal_gene2symbol, assert_formal_header_level, assert_has_rownames, assert_is_a_number_or_null, assert_is_a_string_or_null, assert_is_an_implicit_integer, assert_is_an_implicit_integer_or_null, assert_is_an_integer_or_null, assert_is_annotable, assert_is_character_or_null, assert_is_data.frame_or_null, assert_is_gene2symbol, assert_is_implicit_integer, assert_is_implicit_integer_or_null, assert_is_tx2gene, has_rownames, initializeDirectory, is_implicit_integer.
- Renamed md* functions to markdown*.
- Added convertGenesToSymbols and convertTranscriptsToGenes functions. Previously some of this functionality was contained within the gene2symbol and tx2gene generics for the character method. This behavior was inconsistent with gene2symbol and tx2gene usage in the bcbio R packages, so I decided to split these out into separate functions. Now gene2symbol and tx2gene work consistently with the annotable function to return gene-to-symbol and transcript-to-gene identifier mappings in a data.frame.
- markdownHeader, markdownList, and markdownPlotlist are now exported as S4 generics. The md* function variants are now exported as aliases.
- geomean has been renamed to geometricMean.
- Miscellaneous improvements to error messages and warnings.
- Offloaded internal bcbio-specific code to a new package named bcbioBase. Consequently, this makes the basejump package leaner, meaner, and easier to manage. The following functions are now exported in that package: bcbio, checkInterestingGroups, flatFiles, interestingGroups, metrics, plotDot, plotGene, plotQC, plotViolin, prepareSummarizedExperiment, prepareTemplate, readDataVersions, readLogFile, readProgramVersions, readSampleMetadataFile, sampleMetadata, sampleYAML, sampleYAMLMetadata, sampleYAMLMetrics, and selectSamples. These functions are now deprecated here in basejump (see the deprecated.R file for more information).
- comp and revcomp have been deprecated in favor of complement and reverseComplement from the Biostrings package.
- Internally, errors, messages, and warnings now use methods from the rlang package: abort, inform, and warn, in place of stop, message, and warning, respectively.
- Improved error handling for missing files in loadData. Additionally, the file name must match the internal name in the RData file, otherwise loadData will warn the user. This is stricter than the default behavior of base::load, but helps prevent accidental overwrites in the current working environment.
- localOrRemoteFile, previously an internal function, is now exported.
- annotable now uses internal GRCh37 annotations from the annotables package, which are saved in the extdata/ directory internally. Previously, these genome annotations were accessed from lazy loaded data saved in the data/ directory of the package repository.
- annotables now checks for all packages attached by ensembldb and AnnotationHub and forces detachment at the end of the function call. Otherwise, this can result in the unwanted effect of ensembldb masking other user-loaded functions, such as the tidyverse suite (e.g. dplyr::select).
- Consistently reformatted code indents to be a multiple of 4 spaces, as recommended by Bioconductor.
- camel now handles delimited numbers a little differently. Previously, delimiters in between numbers (e.g. the commas in "1,000,000") were stripped. Sometimes this can result in confusing names. For example, if we have a column formatted in dotted case containing a decimal (e.g. "resolution.1.6"), the decimal would be stripped (e.g. "resolution16" in camel). Now, we sanitize a numeric delimiter as a lower case "x" character (e.g. "resolution1x6"). This ensures that numbers containing decimals remain semantically meaningful when sanitized with camel.
- Internally, gsub (and grepl) calls have been simplified to use the default order of "pattern, replacement, x".
- Internally, all implicit integers have been converted to explicit integers, where applicable.
- Bug fix for multiplexed sample input into readSampleMetadataFile. We were detecting the presence of the index column but should instead check against the sequence column.
- Added dynamicPlotlist and mdPlotlist plotting utilities.
- Added uniqueSymbols parameter to the annotable function.
- Added plotHeatmap functionality.
- Migrated the tpm generic from bcbioRNASeq, for future use in bcbioSingleCell.
- Added matrix method support for plotHeatmap.
- Added matrix method support for plotQuantileHeatmap, which works similarly to plotHeatmap.
- Improved matrix and dgCMatrix method support in aggregateReplicates and aggregateFeatures functions. Both of these functions now use a consistent groupings parameter, which uses a named factor to define the mappings of either samples (columns) for aggregateReplicates or genes/transcripts (rows) for aggregateFeatures.
- Update for makeNames sanitization functions. Now they will work on names(x) for vectors by default.
- Improved detectOrganism to match against "H. sapiens", etc.
- Added internal GRCh37 transcript to gene mapping.
- Improved organism matching to detect "Homo_sapiens" and "H. sapiens".
- Factors are now supported in the makeNames utilities: camel, dotted, snake, and upperCamel.
- Improved handling of NA values from LibreOffice and Microsoft Excel output in readFileByExtension. This function now sets "", NA, and #N/A strings as NA correctly.
- Renamed fc2lr to foldChangeToLogRatio and lr2fc to logRatioToFoldChange.
- Moved plotDot and plotViolin generics here from bcbioSingleCell.
- Added internal GRCh37 gene annotations.
- Moved microplate code from the wormbase package here, since it's of general interest.
- Added checkAnnotable, checkGene2symbol, checkTx2gene, and sanitizeAnnotable utility functions that will be used in the bcbio R packages.
- Added midnightTheme ggplot2 theme. Originally this was defined as darkTheme in the bcbioSingleCell package, but it can be useful for other plots and has been moved here for general bioinformatics usage. The theme now uses ggplot2::theme_minimal as the base, with some color tweaks, namely dark gray axes without white axis lines.
- Improved NAMESPACE imports to include stats::formula and utils::capture.output.
- `loadData` and `loadDataAsName` now default to `replace = TRUE`. If an object with the same name exists in the destination environment, a warning is generated.
- `collapseToString` only attempts to dynamically return the original object class on objects that aren't class `data.frame`. I updated this code to behave more nicely with grouped tibbles (`grouped_df`), which are a virtual class of `data.frame` and therefore can't be coerced using `as(object, "grouped_df")`.
- DNA sequence utility functions `comp` and `revcomp` now return `NULL` for integers and numerics.
- For `prepareSummarizedExperiment`, added support for dropping `NULL` objects in the assays list. This is useful for handling output from bcbioRNASeq when `transformLimit` is reached. In this case, the `rlog` and `vst` matrices aren't generated and are set to `NULL` in the assays list. Using `Filter(Negate(is.null), assays)`, we can drop these `NULL` objects and prevent a downstream dimension mismatch in the `SummarizedExperiment::SummarizedExperiment` call.
- Improved support for multiplexed files in `readSampleMetadataFile`. This now checks for a sequence column containing ACGT nucleotides. When those are detected, the `revcomp` column is generated; otherwise this step is skipped. This is useful for handling multiplexed sample metadata from 10X Genomics Cell Ranger single-cell RNA-seq samples.
- Updated `annotable` function to include nested Entrez identifiers in the `entrez` column. This is useful for downstream functional analysis.
- Added bcbio `plotQC` generic.
- Added back `toStringUnique` code, which is still in use in the wormbase package.
- Added deprecations for the `summarizeRows` (now `collapseToString`) and `wash` functions.
- Updated installation method to include manual installation of ensembldb. Otherwise, basejump installation fails because GenomeInfoDb and GenomeInfoDbData don't get installed completely.
- The README now suggests that users install the suggested packages.
- Updated PANTHER annotation scripts.
- Bug fix for `detectOrganism`. Now allowing a `NULL` return for unsupported organisms, with a warning.
- Added overwrite support for `saveData`. Existing files are now skipped when `overwrite = FALSE`.
- Bug fix for `readDataVersions`, which shouldn't have the column types defined, using `col_types = "ccT"`.
- Improved key value pair method for `loadDataAsName`. Rather than using a named character vector for the `mappings` argument, the user can now simply pass the key value pairs in as dots. For example, `newName1 = "oldName1", newName2 = "oldName2"`. The legacy `mappings` method will still work, as long as the dots argument has a length of 1.
- Ensembl release version now defaults to `NULL` instead of `current` for the `annotable`, `gene2symbol`, `symbol2gene`, and `tx2gene` functions.
- Allow `rowData` to be left unset in `prepareSummarizedExperiment`. This is useful for setting up objects that don't contain gene annotations.
- Removed sample selection by pattern matching (`pattern`, `patternCol` arguments) in `readSampleMetadata`. This feature wasn't fully baked and doesn't offer enough functionality to the user.
- Bump version to match bcbioRNASeq package.
- Improved unit testing coverage of `prepareSummarizedExperiment`.
- Added quiet mode support to functions that output messages, where applicable.
- Consolidated roxygen2 function imports into the `basejump-package.R` file.
- Deprecated `sampleDirs` generic.
- Improved organism detection in `detectOrganism` and added support for the chicken genome.
- Clarified warning messages in `prepareSummarizedExperiment` to make sample loading with `loadRNASeq` and `loadSingleCell` in the bcbio packages less confusing.
- Improved `NULL` returns in the `readDataVersions`, `readLogFile`, and `readProgramVersions` utility functions.
- Fixed export of `*GTF` alias functions to simply wrap the `*GFF` functions with S4 methods support.
- Improved lane split technical replicate handling in `readSampleMetadataFile`.
- Improved `camel` syntax for both lax and strict modes. Added `upperCamel` function.
- Switched from `str_` functions to base `grep` and `gsub` in internal functions.
- Improved consistency of `setMethod` calls using `signature`.
- Converted `loadRemoteData` to a standard function instead of using S4 dispatch, allowing the `envir` argument to be set properly.
- Added additional package version requirements in the `DESCRIPTION` file.
- Implicit integers are allowed.
- Using GitHub version of covr package for unit tests.
- Renamed `multiassignAsNewEnv` to `multiassignAsNewEnvir`.
- Added `*GFF` function variants for `gene2symbolFromGTF` and `tx2geneFromGTF`.
- Added shared bcbio generic functions: `aggregateReplicates`, `bcbio`, `bcbio<-`, `interestingGroups`, `metrics`, `plotGene`, `sampleDirs`, `sampleMetadata`, `selectSamples`. These functions are saved in the `bcbioGenerics.R` file.
- Incorporated bcbio utility functions shared across bcbioRNASeq and bcbioSingleCell: `readDataVersions` (deprecated `.dataVersions`), `readLogFile` (deprecated `.logFile`), `readProgramVersions` (deprecated `.programs`), `sampleYAML` (deprecated `.sampleYAML`), `sampleYAMLMetadata` (deprecated `sampleYAMLMetadata`), `sampleYAMLMetrics` (deprecated `.sampleYAMLMetrics`).
- Deprecated `metadataTable` function.
- Moved all roxygen2 documentation to `methods-*.R` files where applicable.
- Now using `assays` as the primary object in the `prepareSummarizedExperiment` generic definition.
- Improved `assignAndSaveData` to silently return the file path.
- Now consistently using clear function definitions in chain operations with the magrittr pipe (`%>%`).
- Added `.prepareSampleMetadata` utility function, for loading sample metadata from an external CSV, Excel, or YAML file.
- Added `loadData` functionality back to the package.
- Initial commit of `loadDataAsName` function.
- Improved `annotable` function documentation and support for Ensembl release versions.
- Improved sanitization rules for the `camel`, `dotted`, and `snake` name functions. Added the `strict` argument to `camel` and `dotted`.
- Improved the documentation for the `makeNames` functions by splitting each into its own separate methods file.
functions, by splitting each into their own separate methods file. - Improved S4 method support for the logRatio functions.
- Added integer support for `geomean` function. Also improved the internal code of `geomean` based on Paul McMurdie's Stack Overflow post. See the function documentation for more information.
- Added release support for `tx2gene` functions.
- Reduced the number of reexported functions (for documentation) down to the magrittr pipe (`%>%`), `Matrix`, `DataFrame`, and `tibble`.
- Added silent return of file path to `saveData` function.
- Improved documentation for tibble coercion using `as(object, "tibble")`.
- Renamed `collapse` function to `collapseToString`, to avoid NAMESPACE collisions with tidyverse packages (dplyr, glue).
- Upgraded `annotable` function to query Ensembl using the ensembldb package rather than annotables.
- Improved unit testing coverage.
- Renamed `prepareSE` to `prepareSummarizedExperiment`. Improved row and column name handling in the function. It now outputs more helpful diagnostic messages on error.
- Reworked and simplified `detectHPC` function to allow for unit testing.
- NAMESPACE improvements. Reduced the number of re-exported functions to simplify the package.
- Improved code coverage and unit testing with additional testthat checks. Specifically, added unit testing for remote download functions and improved testing for GTF file utilities.
- Code coverage now above 90%!
- Renamed `packageSE` to `prepareSE` for better semantic meaning.
- Made multiple generics more flexible by inclusion of passthrough (`...`).
- Reduced the number of deprecated functions.
- Initial commit of internal `localOrRemote` utility function.
- Initial commit of `prepareTemplate` function.
- Added additional `data-raw/` scripts.
- Added `onLoad.R` script back to ensure proper attachment of the annotables data package.
- Removed tidyverse S4 method support.
- Improved remote file handling for the `readFileByExtension`, `readGTF`, and `readYAML` functions.
- Offloaded devtools functions to personal package.
- Upgraded all functions to S4 functions where possible.
- Assign utilities were kept as S3 functions, since S4 dispatch prevents `parent.frame` assignment from working correctly.
- Deprecated snake case and British spelling variants to reduce the number of exported functions.
- Added more working examples.
- Added unit testing for annotables functions.
- Improved documentation and consistency with bcbio packages.
- Improved integration of gene annotation calls using annotables package.
- Initial support for SummarizedExperiment creation with `packageSE`.
- Added annotables common code from bcbio packages.
- Added automatic file reading using readr package.
- Combined write counts functions from bcbioRNASeq and bcbioSingleCell.
- Initial commit of `assign_data` for use in bcbioSingleCell sample loops.
- Minor NAMESPACE updates while working on bcbio packages.
- Tweaks for tidyverse S4 generic verbs. In particular, `as_tibble` now provides better consistency for rowname conversion.
- dplyr 0.7 NAMESPACE fixes and function tweaks.
- Updated exports based on wormbase package.
- Improved naming functions to dynamically handle character vectors and objects that support naming assignments.
- Added `removeNA` utility function.
- Added NAMESPACE utilities to deal with tidyverse generic verbs.
- Switched package documentation method to use roxygen2 with pkgapi.
- Added snake case function variants.
- Added back `saveData` utility functions.
- Bug fixes for dplyr 0.6.0 update and improved kable handling.
- Dependency fix for successful compilation on the HMS RC Orchestra cluster.
- Consolidated functions in the documentation.
- Improved documentation.
- Removed dependencies and transferred functions to bcbioRNASeq.
- Initial draft release.