- GNU Bourne-Again SHell
- concatenate files and print on the standard output
- change file mode bits (i.e. the permissions for whether users can read, write and execute)
- remove sections from each line of files
- display a line of text (i.e. send that line of text to STDOUT)
- print lines matching a pattern
- compression/decompression tool using Lempel-Ziv coding (LZ77)
- gzip is for compressing, and the other two (gunzip and zcat) are for decompressing. zcat decompresses to STDOUT and is simply an alias for gunzip -c
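A minimal sketch of a compress/decompress round trip. The file name seq.txt and the temporary directory are made up for illustration, and gunzip -c is used in place of zcat since the entry above describes them as equivalent.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
printf 'GATTACA\n' > "$workdir/seq.txt"

# Compress: gzip replaces seq.txt with seq.txt.gz
gzip "$workdir/seq.txt"

# Decompress to STDOUT; the .gz file itself is left in place.
roundtrip=$(gunzip -c "$workdir/seq.txt.gz")
echo "$roundtrip"
```

Because the decompressed text goes to STDOUT, it can be piped straight into another command without ever writing an uncompressed copy to disk.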
- output the first part of files
- list directory contents
- an interface to the on-line reference manuals (or "man pages")
- Write lines consisting of the sequentially corresponding lines from each FILE, separated by TABs, to standard output. With no FILE, or when FILE is -, read standard input.
- output the last part of files
- The Perl 5 language interpreter
- Utilities for the Sequence Alignment/Map (SAM) format
- Tools for processing sequences in the FASTA or FASTQ format
- sort lines of text files using one or more columns
- can perform numeric or alphabetic/alphanumeric sort
- Less is a program similar to more, but which allows backward movement in the file as well as forward movement. Also, less does not have to read the entire input file before starting, so with large input files it starts up faster than text editors like vi.
- word, line, character, and byte count
- locate a program file in the user's path
- A highly important whitespace character that splits a line of text to the next line
- A highly important whitespace character that is commonly used as a delimiter between columns (e.g. in a tsv or tab-separated values file)
- Goes before a bash variable when retrieving (but not when setting) its value
- Acts as a pattern anchor on the right-hand side when used in the context of a regular expression
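Both uses of the dollar sign can be sketched in a few lines. The variable name sample and the file name reads.fastq are hypothetical.

```shell
# Setting a variable: no $ in front of the name.
sample="reads.fastq"

# Retrieving its value: $ in front of the name.
echo "$sample"

# In a regular expression, $ anchors the pattern to the end of the line.
# grep -c prints the number of matching lines.
matches=$(echo "$sample" | grep -c 'fastq$')

# Here the line ends in .gz, so the anchored pattern no longer matches.
# grep exits with an error status when nothing matches, hence the || true.
no_matches=$(echo "${sample}.gz" | grep -c 'fastq$' || true)
echo "$matches $no_matches"
```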
- In many programming languages this reverses the logic of an operation
- e.g. != means not equal to
- Shortcut for checking if an operation or function call returns FALSE: negating it (! some_operation) will cause it to evaluate to TRUE
- e.g.
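As a rough illustration in bash (the sequence and the variable names are invented for the example), the exclamation mark can negate a command's result or form part of the not-equal test:

```shell
# grep -q succeeds (is "true") when the pattern is found.
# Putting ! in front reverses that result.
if ! echo "GATTACA" | grep -q "N"; then
    result="no N found"
else
    result="N found"
fi
echo "$result"

# != tests that two strings are not equal.
differ="no"
if [ "A" != "T" ]; then
    differ="yes"
fi
echo "$differ"
```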
- A common way to embed lines of text in code that is ignored by the compiler (i.e. create a comment)
- Used at the start of scripts in combination with ! to create the shebang line (#!)
- Also used for making hashtags although that's not relevant for this course
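A small sketch of both uses of the hash symbol, assuming a hypothetical script called hello.sh written to a temporary directory. The script starts with a shebang line naming its interpreter and contains one ordinary comment.

```shell
workdir=$(mktemp -d)

# Write a tiny script. The quoted 'EOF' keeps the text exactly as typed.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
# This line is a comment and is ignored when the script runs.
echo "hello"
EOF

# Make the script executable, then run it directly.
# The shebang line tells the system to run it with /bin/sh.
chmod +x "$workdir/hello.sh"
output=$("$workdir/hello.sh")
echo "$output"
```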
- In bash this is used for globbing
- Indicates multiplication operation in most programming languages
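To make the two meanings concrete, here is a short sketch with made-up file names. Outside of arithmetic, * is expanded by bash into matching file names; inside $(( )) it is plain multiplication.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.fastq" "$workdir/b.fastq" "$workdir/notes.txt"

# Globbing: *.fastq expands to every file name ending in .fastq
count=0
for f in "$workdir"/*.fastq; do
    count=$((count + 1))
done
echo "$count"

# Arithmetic: inside $(( )) the * means multiply.
product=$((6 * 7))
echo "$product"
```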
- Found to the left of the 1 on most keyboards
- Used in unix-like systems as a shortcut to your home directory
- Used as part of the regular expression matching operator in some languages (i.e. =~, sorta equal to)
- Used as a minus sign
- Various uses in different contexts in bash
- Convention to use this before named arguments e.g. program -i input_file
- Can be used to tell a command-line program to read from STDIN instead of an input file e.g. some_other_program | program -i -
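A sketch of both conventions using standard tools (the file names.txt and its contents are invented for the example). head takes a named argument -n, and paste treats a lone - as "read this file from STDIN".

```shell
workdir=$(mktemp -d)
printf 'chr1\nchr2\n' > "$workdir/names.txt"

# Named argument: -n tells head how many lines to print.
first=$(head -n 1 "$workdir/names.txt")
echo "$first"

# The second "file" given to paste is -, so that column comes from STDIN.
joined=$(printf '100\n200\n' | paste "$workdir/names.txt" -)
echo "$joined"
```

Not every program accepts - as a file name, so it is worth checking the man page for the tool in question.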
- used in unix-like systems to flow data between multiple processes
- Two vertical bars together almost always means logical OR
||
- In R a single
|
can mean OR in many contexts
- Adding this to the end of a command in bash sends the process to the background and frees your terminal to allow more commands to be run
- Two ampersands together mean logical AND
&&
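The pipe and ampersand behaviours described above can be tried out with harmless commands. This is only a sketch; the variable names are arbitrary.

```shell
# | sends the STDOUT of one command to the STDIN of the next.
# sort -u keeps one copy of each line, wc -l counts the lines left.
n_lines=$(printf 'b\na\nb\n' | sort -u | wc -l)
echo "$n_lines"

# && runs the second command only if the first succeeds.
and_result="skipped"
true && and_result="ran"

# || runs the second command only if the first fails.
or_result="skipped"
false || or_result="ran"

# A trailing & sends a command to the background;
# wait pauses until background jobs have finished.
sleep 1 &
wait
echo "$and_result $or_result"
```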
- The normal function of the ? key on your keyboard
- Used a LOT in unix-like operating systems to separate directory levels.
- On its own it has a special meaning as the "root" directory of a filesystem
- not to be confused with backslash
- above the return/enter key on most keyboards
- used in windows to separate directory levels
- acts as a special escape character in various programming languages as a way to hide special symbols from being interpreted by the compiler
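In bash, for instance, a backslash in front of a dollar sign stops the shell from treating it as the start of a variable. A minimal sketch with a made-up variable:

```shell
price='5'

# Without a backslash, $price is replaced by the variable's value.
expanded="cost: $price"

# With a backslash, the $ is kept as a literal character.
literal="cost: \$price"

echo "$expanded"
echo "$literal"
```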
Genomic annotation refers to the process of identifying functional elements in a genome. Annotations are also what we call any set of coordinate-based information that results from this process: in other words, the locations of functional elements and their relationships to other annotations. Examples: gene, a gene's exons, an exon's splice sites, a gene's transcripts, repetitive elements.
A commonly used tab-separated format for storing and manipulating genomic annotations. See also.
The minimal unit of binary data representation/storage. Can exist in two states (on or off, i.e. 0 or 1).
A set of eight bits. The minimum amount of information needed to store an ASCII character in a plain text file.
A data type in some programming languages used to store a single character (printable or whitespace). Examples: a, ?, \n (newline).
A simple plain-text format for storing one or more DNA, RNA or protein sequence records.
A convenience feature used by bash to select all files whose names match a more generic pattern specified by the glob. The general use is to use * to represent the variable part of the filename, so that the pattern matches the files the user wants to specify. See also
A stream of data from one program to another. In bash, a pipe of standard output (STDOUT) is achieved by separating commands with the bar | symbol.
A common data type used in virtually all programming languages. Used for values that are made up of ordered sequences of characters. A string can contain any sequence of characters, printable or whitespace. Examples: AAAAA, hello world, GATTACA.
- A user-settable shorthand way to run a command in bash. Often used to run a command that requires a complex set of parameters
- These are set using the built-in alias command, usually in the .bashrc or .bash_profile file: alias alias_name="command_to_run"
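As a quick sketch, here is an alias being defined and then inspected. The alias name ll and the command it stands for are just examples.

```shell
# Define an alias (hypothetical name ll) for a longer command.
alias ll="ls -l -h"

# Given only a name, the alias built-in prints the stored definition.
definition=$(alias ll)
echo "$definition"
```

Aliases are mainly a convenience for interactive terminals. By default bash does not expand them inside scripts, which is one reason they usually live in .bashrc rather than in script files.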
- A value (e.g. string, integer) given to a program at runtime. Typically these will specify the input for the program or the path to an input file along with specifying options or settings for the program.
- A variable that can store one or more values at defined positions (indexes). Python arrays are indexed starting at 0.
- Something you hopefully don't have very often with your significant other
- A variable or value that is provided when calling a function or running a program
- In R and Python, arguments can be positional (provided in a specific order) or explicitly named
- A variable that can store one or more key-value pairs (unordered), with each key uniquely referring to its value.
- Equivalent to a binary value. Represented as TRUE/FALSE (R) or True/False (Python)
- Vectors are variables that can store one or more values at defined positions (indexes). R vectors are indexed starting at 1.
- Names can be assigned to all elements in an R vector
- A function is a block of reusable code that is used to perform a similar or related action using a specific input or set of inputs
- In object-oriented programming languages, an object can have its own functions, which are usually described as methods.
- Because methods have direct access to the object's attributes, they do not need to be explicitly provided as arguments.