Title: | Streamlined scRNA-Seq Analysis Pipeline |
---|---|
Description: | Streamlined scRNA-Seq analysis pipeline. |
Authors: | Igor Dolgalev [cre, aut], Anna Yeaton [aut] |
Maintainer: | Igor Dolgalev <[email protected]> |
License: | MIT + file LICENCE |
Version: | 0.0.0.9004 |
Built: | 2024-11-23 04:51:05 UTC |
Source: | https://github.com/igordot/scooter |
Add assay to Seurat object.
add_seurat_assay(seurat_obj, assay, counts_matrix, log_file = NULL)
seurat_obj |
Seurat object. |
assay |
Seurat assay to add the matrix to. |
counts_matrix |
Raw counts matrix. |
log_file |
Filename for the log file. |
Seurat object of cells found in both the existing object and new data.
Function to extract data from Seurat object.
as_data_frame_seurat( seurat_obj, assay = NULL, slot = NULL, features = NULL, reduction = NULL, metadata = TRUE )
seurat_obj |
A Seurat object. |
assay |
Assay such as RNA. |
slot |
Slot such as counts. Default is scale.data. |
features |
Features from assay. |
reduction |
Character vector of reduction types. |
metadata |
Boolean. Whether to include metadata. |
A metadata file merged on cell identifiers.
Get cluster averages.
calc_clust_averages(metadata, data, group)
metadata |
Metadata. |
data |
Gene expression data. |
group |
Column in metadata. |
.
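As a rough illustration of what a per-group average involves, the following base R sketch computes the mean expression of each gene within each group. It is not scooter's implementation; the helper name `cluster_means`, the toy matrix, and the `cell` and `cluster` column names are assumptions made for this example.

```r
# Toy expression matrix (genes x cells); values are made up for illustration.
expr <- matrix(
  c(1, 2, 3, 4,
    0, 2, 4, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:4))
)

# Toy metadata with one row per cell (hypothetical column names).
meta <- data.frame(
  cell = paste0("cell", 1:4),
  cluster = c("c1", "c1", "c2", "c2")
)

# Hypothetical helper: average each gene over the cells of each group.
cluster_means <- function(expr, meta, group) {
  cells_by_group <- split(meta$cell, meta[[group]])
  sapply(cells_by_group, function(cells) {
    rowMeans(expr[, cells, drop = FALSE])
  })
}

avg <- cluster_means(expr, meta, group = "cluster")
print(avg)
```

The result is a genes-by-groups matrix, which is a convenient shape for heatmaps or for comparing clusters.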
Calculate clusters from principal components.
calculate_clusters( pcs, num_dim, log_file, num_neighbors = 30, res = NULL, algorithm = 3 )
pcs |
Data. |
num_dim |
Number of PCs to use for tSNE and UMAP. |
log_file |
Log file. |
num_neighbors |
Number of neighbors to use for UMAP. |
res |
Resolution |
algorithm |
See Seurat::FindClusters() |
.
Calculate mitochondrial percentage from Seurat object.
calculate_mito_pct(seurat_obj)
seurat_obj |
A Seurat object. |
Seurat object.
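The underlying calculation can be sketched in base R on a plain counts matrix. This is only an illustration, not scooter's code: it assumes mitochondrial genes are identified by the human-style `MT-` prefix, and the toy counts are made up.

```r
# Toy raw counts (genes x cells); values are made up for illustration.
counts <- matrix(
  c(5, 5, 45, 45,    # cell1
    0, 10, 20, 70,   # cell2
    1, 1, 4, 4),     # cell3
  nrow = 4,
  dimnames = list(
    c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
    c("cell1", "cell2", "cell3")
  )
)

# Assumption: mitochondrial genes start with "MT-" (human gene symbols).
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
pct_mito <- colSums(counts[is_mito, , drop = FALSE]) / colSums(counts) * 100
print(pct_mito)
```

For mouse data the prefix is typically `mt-`, so the pattern would need to be adjusted.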
Get variable genes and scale data.
calculate_variance( seurat_obj, assay = "RNA", nfeatures = 2000, log_file = NULL )
seurat_obj |
Seurat object. |
assay |
Assay. |
nfeatures |
Number of variable features to output. |
log_file |
A log file. |
A named list of the top features, and the scaled data.
Check identity of the Seurat object.
check_identity_column(seurat_obj, identity_column)
seurat_obj |
A Seurat object. |
identity_column |
The name of the identity column to pull from object metadata. |
The name of the identity column, potentially corrected if a clustering resolution was given.
Function to create a color vector.
create_color_vect(seurat_obj, group = "orig.ident")
seurat_obj |
A Seurat object. |
group |
Metadata column to group by, such as orig.ident. |
A vector of colors.
Create a new Seurat object from a matrix.
create_seurat_obj( counts_matrix, assay = "RNA", min_cells = 1, min_genes = 1, log_file = NULL, project = "proj" )
counts_matrix |
A matrix of raw counts. |
assay |
Seurat assay to add the data to. |
min_cells |
Include genes/features detected in at least this many cells. |
min_genes |
Include cells where at least this many genes/features are detected. |
log_file |
Filename for the logfile. |
project |
Project name for Seurat object. |
Seurat object.
Calculate differential expression for one group versus all
differential_expression_global( data, metadata, metadata_column, log_fc_threshold = 0.5, min.pct = 0.1, test.use = "wilcox", out_path = ".", write = FALSE, log_file = NULL )
data |
Gene expression data. |
metadata |
Metadata. |
metadata_column |
Column in metadata. |
log_fc_threshold |
Log fold-change threshold. |
min.pct |
Minimum percentage of cells a gene must be expressed in to be tested. |
test.use |
Test to use. |
out_path |
output path. |
write |
Boolean to write or not. |
log_file |
log file. |
.
Calculate differential expression between specific groups
differential_expression_paired( data, metadata, metadata_column, list_groups = NULL, log_fc_threshold = 0.5, min.pct = 0.1, test.use = "wilcox", out_path = ".", write = FALSE, log_file = NULL )
data |
Gene expression data. |
metadata |
Metadata. |
metadata_column |
Column in metadata. |
list_groups |
Data frame of groups to compare in the metadata column. |
log_fc_threshold |
Log fold-change threshold. |
min.pct |
Minimum percentage of cells a gene must be expressed in to be tested. |
test.use |
Test to use. |
out_path |
output path. |
write |
Boolean to write or not. |
log_file |
log file. |
.
Calculate differentially expressed genes within each subpopulation/cluster
differential_expression_per_cluster( seurat_obj, cluster_column, group_column, test = "wilcox", out_path = ".", write = TRUE, log_file = NULL )
seurat_obj |
Seurat object. |
cluster_column |
Metadata column specifying the groups to split by. |
group_column |
Metadata column specifying the groups for differential expression within each split. |
test |
Statistical method to use. |
out_path |
Output path. |
write |
Boolean to save results to disk. |
log_file |
log file. |
.
Filter out cells based on the minimum and maximum number of genes and the maximum percentage of mitochondrial reads. If cutoffs are not provided, min_genes defaults to the 0.02 quantile, max_genes to the 0.98 quantile, and max_mt to 10.
filter_data( data, log_file = NULL, min_genes = NULL, max_genes = NULL, max_mt = 10 )
data |
A tibble with metadata. |
log_file |
Log file. |
min_genes |
Minimum number of genes per cell. |
max_genes |
Maximum number of genes per cell. |
max_mt |
Maximum percentage of mitochondrial reads per cell. |
Filtered data
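The default cutoffs described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions rather than scooter's implementation: the helper name `filter_cells`, the column names `num_genes` and `pct_mito`, and the use of inclusive comparisons are all hypothetical.

```r
# Toy per-cell metadata (hypothetical column names and values).
meta <- data.frame(
  cell = paste0("c", 1:5),
  num_genes = c(100, 500, 1000, 1500, 5000),
  pct_mito = c(2, 3, 25, 4, 5)
)

# Hypothetical helper mirroring the documented defaults:
# 0.02 / 0.98 quantiles for the gene cutoffs and 10 for max_mt.
filter_cells <- function(meta, min_genes = NULL, max_genes = NULL, max_mt = 10) {
  if (is.null(min_genes)) min_genes <- quantile(meta$num_genes, 0.02)
  if (is.null(max_genes)) max_genes <- quantile(meta$num_genes, 0.98)
  keep <- meta$num_genes >= min_genes &
    meta$num_genes <= max_genes &
    meta$pct_mito <= max_mt
  meta[keep, , drop = FALSE]
}

filtered_default <- filter_cells(meta)
filtered_manual <- filter_cells(meta, min_genes = 200, max_genes = 2000)
print(filtered_manual)
```

Quantile-based defaults adapt to each dataset, while explicit cutoffs make the filtering reproducible across samples.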
Get geneset scores.
geneset_score(module_tbl, counts_raw, min_cpm = 0, limit_pct = 1)
module_tbl |
geneset table. |
counts_raw |
Raw counts |
min_cpm |
. |
limit_pct |
. |
.
Determine the color scheme. Can be specified for samples or clusters to avoid confusion.
get_color_scheme(type = "clusters")
type |
Type of scheme ("samples" or "clusters"). |
A vector of colors.
Determine the point size for reduced dimensions scatter plots (smaller for larger datasets).
get_dr_point_size(num_cells)
num_cells |
Number of cells (points on the plot) or a Seurat object. |
Numeric point size.
Get a small matrix of raw counts from the PBMC dataset.
get_test_counts_matrix()
A matrix of raw counts.
pbmc_mat <- get_test_counts_matrix()
Read in 10x Genomics Cell Ranger Matrix Market format data.
import_mtx(data_path, gene_column = 2, log_file = NULL)
data_path |
Path to directory that holds the files output from 10x. |
gene_column |
The column with the gene names. |
log_file |
Filename for the log file. |
Named list of matrices. One matrix for each data type as specified in the third column of the features.tsv file. As of Oct 3rd 2019, the two options are 'Gene Expression' and 'Antibody Capture'
Read in Gene Expression and Antibody Capture data from a 10x Genomics Cell Ranger sparse matrix or from a text file.
load_sample_counts_matrix(sample_name, path, log_file = NULL)
sample_name |
A character that will be used as a prefix for all cell names. |
path |
Path to directory containing 10x matrix, or path to a text file. |
log_file |
Filename for the log file. |
Named list of matrices. One matrix for each data type. Currently the only two data types are 'Gene Expression' and 'Antibody Capture'
Log normalize data.
log_normalize_data(data, log_file = NULL)
data |
A Seurat object. |
log_file |
log file. |
normalized data
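For readers unfamiliar with log normalization, the base R sketch below shows the standard calculation used by Seurat's "LogNormalize" method: each cell's counts are divided by that cell's total, multiplied by a scale factor, and transformed with log1p. Whether scooter applies exactly these settings is an assumption; the helper name `log_normalize` and the scale factor of 10000 are illustrative.

```r
# Toy raw counts (genes x cells); values are made up for illustration.
counts <- matrix(
  c(1, 1,
    3, 1),
  nrow = 2,
  dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2"))
)

# Hypothetical helper: per-cell library-size scaling followed by log1p.
log_normalize <- function(counts, scale_factor = 10000) {
  # t(t(x) / colSums(x)) divides every column by its own total.
  scaled <- t(t(counts) / colSums(counts)) * scale_factor
  log1p(scaled)
}

norm <- log_normalize(counts)
print(norm)
```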
Function to merge two metadata tables together.
merge_metadata(metadata1, metadata2, log_file = NULL)
metadata1 |
A Seurat object or a tibble containing metadata with either a column called "cell" with cell IDs or rownames with cell IDs. |
metadata2 |
A tibble containing metadata with either a column called "cell" with cell IDs or rownames with cell IDs. |
log_file |
A log filename. |
A metadata file merged on cell identifiers.
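Merging on cell identifiers can be sketched in base R as shown below. This is an illustration only, not scooter's code: the helper `with_cell_column` is hypothetical, and the choice of a left join (`all.x = TRUE`) is an assumption.

```r
# First table has an explicit "cell" column.
meta1 <- data.frame(
  cell = c("c1", "c2", "c3"),
  cluster = c("A", "B", "A")
)

# Second table stores the cell IDs as row names instead.
meta2 <- data.frame(
  sample = c("s2", "s1", "s1"),
  row.names = c("c3", "c1", "c2")
)

# Hypothetical helper: make sure cell IDs live in a "cell" column.
with_cell_column <- function(meta) {
  if (!"cell" %in% colnames(meta)) {
    meta$cell <- rownames(meta)
    rownames(meta) <- NULL
  }
  meta
}

# Left join on the cell identifier (assumed join type).
merged <- merge(
  with_cell_column(meta1),
  with_cell_column(meta2),
  by = "cell",
  all.x = TRUE
)
print(merged)
```

Normalizing both inputs to a shared `cell` column first keeps the join itself simple regardless of how each table stores its identifiers.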
Normalize data
normalize_data( data, method, nfeatures = 2000, metadata = NULL, assay = NULL, log_file = NULL )
data |
Input data. |
method |
Normalization method ("log" or "sct"). |
nfeatures |
. |
metadata |
. |
assay |
. |
log_file |
Log file. |
Normalized data
Plot the distribution of specified features/variables.
plot_distribution(data, features, grouping, color_scheme = NULL)
data |
Seurat object or metadata. |
features |
Vector of features to plot (such as "nGene", "nUMI", "percent.mito"). |
grouping |
X. |
color_scheme |
(optional) Vector of colors. |
A plot of the feature distributions.
Run dimensionality reduction: PCA, tSNE, and UMAP.
run_dr( data, dr_method, prefix, assay = NULL, var_features = FALSE, features = NULL, graph = NULL, num_dim_use = NULL, reduction = NULL, num_neighbors = NULL, num_pcs = NULL, ... )
data |
Data to use for dimensionality reduction. |
dr_method |
Dimensionality reduction method. |
prefix |
. |
assay |
. |
var_features |
. |
features |
. |
graph |
. |
num_dim_use |
. |
reduction |
. |
num_neighbors |
. |
num_pcs |
. |
list of dimensionality reduced/
run_pca(), run_tsne(), run_umap()
Run PCA
run_pca(data, num_pcs, prefix = "PC_")
data |
A tibble with metadata. |
num_pcs |
Number of principal components to compute. |
prefix |
Prefix. |
Named list of feature loadings, cell embeddings, and standard deviations (sdev) output from PCA.
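To make the shape of such a result concrete, here is a minimal base R sketch using `stats::prcomp()`. It is not scooter's implementation: the helper name `pca_sketch`, the list element names, and the random toy data are assumptions made for illustration.

```r
set.seed(42)

# Toy scaled expression matrix (genes x cells) filled with random values.
expr <- matrix(
  rnorm(200),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10))
)

# Hypothetical helper: PCA with cells as observations and genes as variables.
pca_sketch <- function(expr, num_pcs, prefix = "PC_") {
  fit <- prcomp(t(expr), center = TRUE, scale. = FALSE, rank. = num_pcs)
  pc_names <- paste0(prefix, seq_len(num_pcs))
  loadings <- fit$rotation   # genes x PCs
  embeddings <- fit$x        # cells x PCs
  colnames(loadings) <- pc_names
  colnames(embeddings) <- pc_names
  list(
    feature_loadings = loadings,
    cell_embeddings = embeddings,
    # prcomp() returns all standard deviations, so keep only the requested ones.
    sdev = fit$sdev[seq_len(num_pcs)]
  )
}

pca <- pca_sketch(expr, num_pcs = 3)
str(pca, max.level = 1)
```

The `prefix` argument only controls the column names of the loadings and embeddings, which is why it matters when several reductions are stored side by side.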
Run TSNE
run_tsne(data, seed.use = 1, dim.embed = 2, prefix = "tSNE_")
data |
Data to run tsne on. |
seed.use |
Random seed to use. |
dim.embed |
Number of tsne embeddings to return. |
prefix |
Prefix. |
tsne.
Run UMAP
run_umap(data, num_neighbors, min_dist = 0.3, prefix = "UMAP_")
data |
Data to run UMAP on. |
num_neighbors |
Number of neighbors. |
min_dist |
Minimum distance between points in the embedding. |
prefix |
Prefix. |
umap
Function to write Seurat counts matrix to csv.
save_seurat_counts_matrix( seurat_obj, proj_name = "", label = "", out_dir = ".", assay = "RNA", slot = "data", log_file = NULL )
seurat_obj |
A Seurat object. |
proj_name |
Name of the project that will be the prefix of the file name. |
label |
An optional label for the file. |
out_dir |
Directory in which to save csv. |
assay |
The assay within the Seurat object to retrieve data from. |
slot |
The slot within the Seurat object to retrieve data from. |
log_file |
A log filename. |
A csv file in the out_dir.
SCT normalize data.
sctransform_data(counts, metadata, nfeatures, log_file = NULL)
counts |
Raw counts. |
metadata |
Metadata for each cell. |
nfeatures |
Number of variable features to output. |
log_file |
A log file. |
A named list of the vst output, the final scaled data, and the top variable genes.
Wrapper for SeuratObject::Idents() with extra safety checks.
set_identity(seurat_obj, identity_column)
seurat_obj |
A Seurat object. |
identity_column |
The name of the identity column to pull from object metadata. |
A Seurat object with an updated identity set.
Small function to write a message to the console and to a log file.
write_message(message_str, log_file = NULL)
message_str |
A string to write as a message. |
log_file |
A log filename. |
Prints a message and writes it to the specified log file.
write_message(message_str = "Finished Step 1", log_file = "log.file.txt")
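The logging pattern used throughout the package (a message on the console plus an optional log file) can be sketched in base R as follows. This is an assumption-based illustration, not the package source: the helper name `write_message_sketch` and the append-one-line-per-call behavior are hypothetical.

```r
# Hypothetical helper: print a message and, if requested, append it to a file.
write_message_sketch <- function(message_str, log_file = NULL) {
  message(message_str)
  if (!is.null(log_file)) {
    cat(message_str, "\n", sep = "", file = log_file, append = TRUE)
  }
  invisible(message_str)
}

# Use a temporary file so the example leaves no files behind.
log_path <- tempfile(fileext = ".txt")
write_message_sketch("Finished Step 1", log_file = log_path)
write_message_sketch("Finished Step 2", log_file = log_path)

log_lines <- readLines(log_path)
print(log_lines)
```

Defaulting `log_file` to NULL lets the same call work interactively without creating any files.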