Combine & Normalize — combineNormalize • SCWorkflow

Scales and normalizes data, combines samples, runs dimensional reduction, clusters the cells, and returns a combined Seurat object.

combineNormalize(
  object,
  npcs = 30,
  SCT.level = "Merged",
  vars.to.regress = NULL,
  nfeatures = 2000,
  low.cut = 0.1,
  high.cut = 8,
  low.cut.disp = 1,
  high.cut.disp = 1e+05,
  selection.method = "vst",
  only.var.genes = FALSE,
  draw.umap = TRUE,
  draw.tsne = TRUE,
  seed.for.pca = 42,
  seed.for.tsne = 1,
  seed.for.umap = 42,
  clust.res.low = 0.2,
  clust.res.high = 1.2,
  clust.res.bin = 0.2,
  methods.pca = NULL,
  var.threshold = 0.1,
  pca.reg.plot = FALSE,
  jackstraw = FALSE,
  jackstraw.dims = 5,
  exclude.sample = NULL,
  cell.count.limit = 35000,
  reduce.so = FALSE,
  project.name = "scRNAProject",
  cell.hashing.data = FALSE
)

Arguments

object: A list of Seurat objects, one for each sample.
npcs: Select the number of principal components for your analysis. Please see the elbow plot in the previous template to figure out what number of PCs explains your variance cut-off. For example, if the elbow plot has a point at (15, 0.02), it means that 15 PCs encapsulate 98% of the variance in your data. (Default: 30)
SCT.level: Select the stage at which to apply SCTransform normalization. Merged: merge all samples and apply SCTransform on the merged object. Sample: apply SCTransform on individual samples, then merge them into a single Seurat object. (Default: "Merged")
vars.to.regress: Subtract (‘regress out’) this source of heterogeneity from the data. For example, to subtract mitochondrial effects, input "percent.mt". Options: percent.mt, nCount.RNA, S.Score, G2M.Score, CC.Difference. (Default: NULL)
nfeatures: Number of variable features. (Default: 2000)
low.cut: Set low cutoff to calculate feature means in Seurat::FindVariableFeatures. (Default: 0.1)
high.cut: Set high cutoff to calculate feature means in Seurat::FindVariableFeatures. (Default: 8)
low.cut.disp: Set low cutoff to calculate feature dispersions in Seurat::FindVariableFeatures. (Default: 1)
high.cut.disp: Set high cutoff to calculate feature dispersions in Seurat::FindVariableFeatures. (Default: 100000)
selection.method: Method to choose top variable features. Options: vst, mean.var.plot, dispersion. (Default: 'vst')
only.var.genes: If the dataset is larger than ~40k filtered cells, set to TRUE. If TRUE, only variable genes will be available for downstream analysis. If the dataset is larger than the number of cells set in "Conserve Memory Max Cell Limit" (cell.count.limit), "Only Variable Genes" is automatically set to TRUE. (Default: FALSE)
draw.umap: If TRUE, draw UMAP plot. (Default: TRUE)
draw.tsne: If TRUE, draw TSNE plot. (Default: TRUE)
seed.for.pca: Set a random seed for PCA calculation. (Default: 42)
seed.for.tsne: Set a random seed for TSNE calculation. (Default: 1)
seed.for.umap: Set a random seed for UMAP calculation. (Default: 42)
clust.res.low: Select minimum resolution for clustering plots. The lower you set this, the FEWER clusters will be generated. (Default: 0.2)
clust.res.high: Select the maximum resolution for clustering. The higher you set this number, the MORE clusters will be produced. (Default: 1.2)
clust.res.bin: Select the bins for your cluster plots. For example, if you input 0.2 as your bin, and have low/high resolution ranges of 0.2 and 0.6, then the template will produce cluster plots at resolutions of 0.2, 0.4 and 0.6. (Default: 0.2)
methods.pca: Methods available: Marchenko-Pastur (use the eigenvalue null upper bound from URD) and Elbow (find the threshold where the percent change in variation between consecutive PCs is less than X%, set in var.threshold). If none is selected (regardless of other selections), the plot will not be generated. (Default: NULL, meaning none)
var.threshold: For Elbow method, set percent change threshold in variation between consecutive PCs. (Default: 0.1)
pca.reg.plot: Opt to visualize the effect of your regression variables in a PCA plot. This option creates PCA plots with and without the regression variables applied, which can help determine whether regression is necessary to properly normalize your data. (Default: FALSE)
jackstraw: Opt to visualize your data in a Jackstraw plot. A Jackstraw plot can give more detail than an elbow plot, but it is a compute-intensive process and may not be suitable for larger datasets. (Default: FALSE)
jackstraw.dims: Number of dimensions to use for the Jackstraw plot. Recommended max 10. (Default: 5)
exclude.sample: Exclude unwanted samples from the merge step. Provide the names of the samples to be removed. To exclude several samples, separate the sample names with commas (e.g. sample1,sample2,sample3,sample4). (Default: NULL)
cell.count.limit: If the total number of cells exceeds this limit, the conserve-memory option of SCTransform will be used and only variable genes will be returned. (Default: 35000)
reduce.so: Remove any additional assays from input Seurat Objects except for the original RNA Assay. This option should be used if input Seurat Object was created outside of the NIDAP pipeline. (Default: FALSE)
project.name: Add project name to the Seurat object metadata. (Default: 'scRNAProject')
cell.hashing.data: Set to TRUE if you are using cell-hashed data. (Default: FALSE)
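
The way clust.res.low, clust.res.high and clust.res.bin combine into a set of clustering resolutions can be sketched in base R. This is only an illustration of the behaviour described for clust.res.bin, assuming the resolutions form an evenly spaced grid; resolution_grid is a hypothetical helper and is not part of SCWorkflow.

```r
# Hypothetical helper (not an SCWorkflow function): list the resolutions
# implied by the three clust.res.* arguments, assuming an evenly spaced grid.
resolution_grid <- function(clust.res.low = 0.2,
                            clust.res.high = 1.2,
                            clust.res.bin = 0.2) {
  stopifnot(clust.res.bin > 0, clust.res.low <= clust.res.high)
  # round() strips floating-point noise from the generated sequence
  round(seq(clust.res.low, clust.res.high, by = clust.res.bin), 10)
}

# The example from the clust.res.bin description: low = 0.2, high = 0.6, bin = 0.2
print(resolution_grid(0.2, 0.6, 0.2))

# The grid implied by the documented defaults
print(resolution_grid())
```

A smaller bin gives more resolutions between the same low and high bounds, and therefore more cluster plots to compare.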

Value

Seurat Objects and QC plots

Details

This is Step 3 in the basic Single-Cell RNA-seq workflow. This template will summarize the multi-dimensionality of your data into a set of "principal components" to allow for easier analysis.
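
As a rough illustration of the Elbow option described under methods.pca and var.threshold, the base R sketch below finds the first principal component after which the percent of variance explained changes by less than the threshold. It is one possible reading of that description, not the package's implementation; elbow_pc and the sdev values are hypothetical.

```r
# Hypothetical helper (not an SCWorkflow function): return the index of the
# first PC whose drop in percent variance to the next PC is below var.threshold.
elbow_pc <- function(sdev, var.threshold = 0.1) {
  pct.var <- sdev^2 / sum(sdev^2) * 100  # percent of variance per PC
  change <- -diff(pct.var)               # pct.var[i] - pct.var[i + 1]
  below <- which(change < var.threshold)
  if (length(below) == 0) {
    return(NA_integer_)                  # no elbow found at this threshold
  }
  below[1]
}

# Made-up standard deviations for seven PCs, for illustration only
sdev <- c(10, 6, 3, 2, 1.9, 1.88, 1.86)
print(elbow_pc(sdev, var.threshold = 0.1))
```

For real data, a vector of per-component standard deviations such as the sdev element returned by stats::prcomp could be passed in the same way.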