Filters cells and genes for each sample and generates QC plots to evaluate the data before and after filtering.

filterQC(
  object,
  min.cells = 20,
  filter.vdj.genes = F,
  nfeature.limits = c(NA, NA),
  mad.nfeature.limits = c(5, 5),
  ncounts.limits = c(NA, NA),
  mad.ncounts.limits = c(5, 5),
  mitoch.limits = c(NA, 8),
  mad.mitoch.limits = c(NA, 3),
  complexity.limits = c(NA, NA),
  mad.complexity.limits = c(5, NA),
  topNgenes.limits = c(NA, NA),
  mad.topNgenes.limits = c(5, 5),
  n.topgnes = 20,
  do.doublets.fitler = T,
  plot.outliers = "FALSE",
  group.column = NA,
  nfeatures = 2000,
  low_cut = 0.1,
  high_cut = 8,
  low_cut_disp = 1,
  high_cut_disp = 1e+05,
  selection_method = "vst",
  npcs = 30,
  integratedata = FALSE,
  clust_res_low = 0.2,
  clust_res_high = 1.2,
  clust_res_bin = 0.2,
  only_var_genes = FALSE,
  seed_for_PCA = 42,
  seed_for_TSNE = 1,
  seed_for_UMAP = 42
)

Arguments

object

A list of Seurat objects, one per sample.

min.cells

Filter out genes found in fewer than this number of cells. E.g. setting to 20 will remove genes found in fewer than 20 cells of a sample. (Default: 20)
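As a rough illustration of the min.cells rule, the sketch below applies the same idea to a toy count matrix in base R. The matrix, the variable names, and the threshold of 2 are made up for the example; this is not the code filterQC runs internally.

```r
# Toy count matrix: 4 genes (rows) x 5 cells (columns); values are UMI counts.
counts <- matrix(
  c(1, 0, 2, 0, 3,
    0, 0, 0, 0, 1,
    5, 4, 3, 2, 1,
    0, 0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

min_cells <- 2  # illustrative threshold, analogous to min.cells

# Number of cells in which each gene has at least one count.
cells_per_gene <- rowSums(counts > 0)

# Keep genes detected in at least min_cells cells.
filtered <- counts[cells_per_gene >= min_cells, , drop = FALSE]
rownames(filtered)
```

Here gene2 (detected in one cell) and gene4 (detected in none) are dropped, while gene1 and gene3 are kept.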

filter.vdj.genes

If TRUE, removes VDJ genes from the scRNA transcriptome assay. This prevents clustering bias in T-cells of the same clonotype. Only recommended if you are also doing TCR-seq. (Default: FALSE)

nfeature.limits

Filter out cells where the number of genes detected in the cell falls outside the selected lower or upper limits. Usage: c(lower limit, upper limit). E.g. setting to c(200,1000) will remove cells that have fewer than 200 genes or more than 1000 genes for each sample. (Default: c(NA, NA))
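The c(lower limit, upper limit) convention, with NA meaning "no limit on this side", can be sketched in base R as follows. The helper within_limits and the example values are hypothetical and only illustrate the convention; they are not part of the package.

```r
# Hypothetical helper: TRUE for values inside c(lower, upper).
# An NA bound is treated as "no limit on that side".
within_limits <- function(x, limits) {
  lower <- if (is.na(limits[1])) -Inf else limits[1]
  upper <- if (is.na(limits[2])) Inf else limits[2]
  x >= lower & x <= upper
}

# Made-up numbers of detected genes for five cells.
n_genes <- c(150, 200, 650, 1000, 2400)

keep_fixed <- within_limits(n_genes, c(200, 1000))  # drops 150 and 2400
keep_all   <- within_limits(n_genes, c(NA, NA))     # no limits, keeps every cell
```

The same pattern applies to the other fixed-limit arguments (ncounts.limits, mitoch.limits, complexity.limits, topNgenes.limits).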

mad.nfeature.limits

Set filter limits as a number of Median Absolute Deviations (MADs) from the median number of genes per cell in your sample. Usage: c(lower limit, upper limit). E.g. setting to c(3,5) will remove all cells that are more than 3 MADs below the median or more than 5 MADs above the median. (Default: c(5,5))
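To make the MAD-based limits concrete, the sketch below converts a c(lower, upper) pair of MAD multipliers into absolute cutoffs. It assumes the raw MAD (constant = 1 in stats::mad); whether filterQC uses the raw or the scaled MAD (R's default constant is 1.4826) is not stated here, so treat that choice, the helper name mad_limits, and the data as assumptions.

```r
# Hypothetical helper: turn MAD multipliers into absolute lower/upper cutoffs.
# Assumption: raw MAD (constant = 1); an NA multiplier disables that side.
mad_limits <- function(x, n_mads) {
  center <- median(x)
  spread <- mad(x, constant = 1)
  c(lower = if (is.na(n_mads[1])) -Inf else center - n_mads[1] * spread,
    upper = if (is.na(n_mads[2])) Inf else center + n_mads[2] * spread)
}

# Made-up numbers of detected genes for six cells, one obvious outlier.
n_genes <- c(900, 950, 1000, 1050, 1100, 5000)

limits <- mad_limits(n_genes, c(3, 5))
keep <- n_genes >= limits["lower"] & n_genes <= limits["upper"]
```

With these values the median is 1025 and the raw MAD is 75, so the cutoffs are 800 and 1400 and only the cell with 5000 genes is flagged.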

ncounts.limits

Filter out cells where the total number of molecules (UMIs) detected within the cell falls outside the selected limits. Usage: c(lower limit, upper limit). E.g. setting to c(200,100000) will remove cells that have fewer than 200 or more than 100000 molecules. (Default: c(NA, NA))

mad.ncounts.limits

Set filter limits as a number of Median Absolute Deviations (MADs) from the median number of molecules per cell in your sample. Usage: c(lower limit, upper limit). E.g. setting to c(3,5) will remove all cells that are more than 3 MADs below the median or more than 5 MADs above the median. (Default: c(5,5))

mitoch.limits

Filter out cells whose percentage of mitochondrial RNA falls outside the selected lower or upper limits. Usage: c(lower limit, upper limit). E.g. setting to c(NA,8) will not set a lower limit and removes cells with more than 8% mitochondrial RNA. (Default: c(NA,8))
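A minimal base-R sketch of a mitochondrial percentage filter is shown below. It assumes mitochondrial genes can be recognized by a "MT-" prefix (the human gene-symbol convention); how filterQC identifies mitochondrial genes is not described here, and the matrix is made up.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns).
counts <- matrix(
  c( 2, 10,  4,
     1, 10,  4,
    50, 40, 46,
    47, 40, 46),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                  paste0("cell", 1:3))
)

# Assumption: mitochondrial genes carry the "MT-" prefix.
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)

# Upper limit of 8%, no lower limit, as in c(NA, 8).
keep <- pct_mito <= 8
```

In this toy data the cells have 3%, 20%, and 8% mitochondrial counts, so only cell2 exceeds the limit.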

mad.mitoch.limits

Set filter limits as a number of Median Absolute Deviations (MADs) from the median percentage of mitochondrial RNA per cell in your sample. Usage: c(lower limit, upper limit). E.g. setting to c(NA,3) will not set a lower limit and will remove all cells that are more than 3 MADs above the median. (Default: c(NA,3))

complexity.limits

Complexity represents the number of genes detected per UMI. The more genes detected per UMI, the more complex the data. Filter out cells whose complexity falls outside the selected lower or upper limits. Cells that have a high number of UMIs but only a low number of genes could be dying cells, but could also represent a population of a low-complexity cell type (e.g. red blood cells). We suggest that you set the lower limit to 0.8 if samples have suspected RBC contamination. Usage: c(lower limit, upper limit). E.g. setting to c(0.8,NA) will not set an upper limit and removes cells with complexity less than 0.8. (Default: c(NA,NA))
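One common way to express genes per UMI is the ratio of their log10 values, which is the scale on which a cutoff such as 0.8 is usually quoted. The sketch below uses that definition as an assumption; the exact formula used by filterQC is not given here, and the per-cell numbers are made up.

```r
# Made-up per-cell totals: detected genes and UMIs.
n_genes <- c(cellA = 2000, cellB = 300)
n_umis  <- c(cellA = 8000, cellB = 20000)

# Assumed definition: log10(genes) / log10(UMIs).
complexity <- log10(n_genes) / log10(n_umis)

# Lower limit of 0.8, no upper limit, as in c(0.8, NA).
keep <- complexity >= 0.8
```

cellA has roughly 0.85 on this scale and is kept; cellB, with many UMIs spread over few genes, falls near 0.58 and is removed.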

mad.complexity.limits

Set filter limits as a number of Median Absolute Deviations (MADs) from the median complexity of all cells in your sample. Usage: c(lower limit, upper limit). E.g. setting to c(5,NA) will not set an upper limit and will remove all cells that are more than 5 MADs below the median. (Default: c(5,NA))

topNgenes.limits

Filter cells based on the percentage of total counts found in the top N most highly expressed genes. Outlier cells have a high percentage of their counts in just a few genes and should be removed. The same considerations outlined in "complexity.limits" apply to this filter. Usage: c(lower limit, upper limit). E.g. setting to c(NA,50) will not set a lower limit and removes cells with more than 50% of reads in the top N genes. (Default: c(NA,NA))

n.topgnes

The number of top highly expressed genes used to calculate the percentage of reads found in these genes. E.g. a value of 20 calculates the percentage of reads found in the 20 most highly expressed genes. (Default: 20)
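The top N percentage can be sketched in base R as below. The sketch assumes the top N genes are chosen separately for each cell; the helper pct_top_n and the two toy cells are hypothetical and only illustrate the metric.

```r
# Hypothetical helper: percentage of a cell's counts found in its
# n most highly expressed genes (assumption: ranked per cell).
pct_top_n <- function(cell_counts, n) {
  n_used <- min(n, length(cell_counts))
  top <- sort(cell_counts, decreasing = TRUE)[seq_len(n_used)]
  100 * sum(top) / sum(cell_counts)
}

# Two toy cells with 5 genes each: one evenly expressed, one dominated by a single gene.
counts <- cbind(
  even   = c(10, 10, 10, 10, 10),
  skewed = c(90,  4,  3,  2,  1)
)

# Percentage of counts in the top 2 genes of each cell.
pct <- apply(counts, 2, pct_top_n, n = 2)

# Upper limit of 50%, no lower limit, as in c(NA, 50).
keep <- pct <= 50
```

The evenly expressed cell has 40% of its counts in its top 2 genes and is kept, while the skewed cell has 94% and is removed.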

do.doublets.fitler

Use scDblFinder to identify and remove doublet cells. Doublets are defined as two cells that are sequenced under the same cellular barcode, for example, if they were captured in the same droplet. (Default: TRUE)

mad.topNgenes.limits

Set filter limits as a number of Median Absolute Deviations (MADs) from the median percentage of counts in the top N genes. Usage: c(lower limit, upper limit). E.g. setting to c(5,5) will remove all cells that are more than 5 MADs below or more than 5 MADs above the median percentage. (Default: c(5,5))

Value

A Seurat object and QC plots.

Details

This is Step 2 in the basic single-cell RNA-seq workflow. Multiple cell and gene filters can be selected to remove poor-quality data and noise. Returns the data as a Seurat object, along with a variety of figures to evaluate the quality of the data and the effect of the applied filters.