HuBMAP Pipelines

HIVE pipelines serve as a common processing framework for data collected from multiple sites or centers. Our primary criterion for prioritizing pipeline development is whether a pipeline can process data uniformly, or whether a data type is generated by multiple HuBMAP sites (even if they use different platforms or techniques).

Which pipelines we focus on is decided between the data-providing site and the pipeline team lead. Anyone may propose a pipeline for development and should contact the shared pipeline team so that a determination can be made. The following additional considerations affect how pipelines are prioritized:

  1. Does the data fall under the definition of what would contribute to a 3D biomolecular map?
  2. Do the data and metadata meet quality requirements?
  3. Does the data add to what already exists in the portal, and would investigators outside the consortium potentially be interested in it?
  4. Does the pipeline development team have the capacity and methods to develop the pipeline?

CODEX

The HuBMAP Consortium CODEX pipeline uses Cytokit to process CODEX datasets from raw data to OME-TIFF compliant segmentation results and compiled antigen fluorescence images.

SPRM – Spatial Process & Relationship Modeling

SPRM is a statistical modeling program used to calculate a range of descriptors from multichannel images. It can be used for any type of multichannel 2D or 3D image (e.g., CODEX, IMS).
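
To make "descriptors from multichannel images" concrete, here is a minimal sketch of one very simple descriptor: the mean intensity of each channel within each cell. It is not SPRM's code or API. The function name, the nested-list image layout, and the use of a labelled segmentation mask as input are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_intensity_per_cell(image, mask):
    """Mean intensity of every channel within every labelled cell.

    image: list of channels, each a list of rows (image[c][y][x])
    mask:  list of rows of integer cell labels (mask[y][x]); 0 is background
    Returns {cell_label: [mean for channel 0, mean for channel 1, ...]}.

    Hypothetical helper for illustration only; not part of SPRM.
    """
    n_channels = len(image)
    sums = defaultdict(lambda: [0.0] * n_channels)
    counts = defaultdict(int)
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label == 0:
                continue  # skip background pixels
            counts[label] += 1
            for c in range(n_channels):
                sums[label][c] += image[c][y][x]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


# Toy 2-channel, 2x3 image containing two cells (labels 1 and 2).
image = [
    [[1, 3, 0], [5, 7, 0]],  # channel 0
    [[2, 2, 9], [4, 4, 9]],  # channel 1
]
mask = [
    [1, 1, 0],
    [2, 2, 0],
]
print(mean_intensity_per_cell(image, mask))
```

The same per-cell loop extends to a 3D image by adding a z index. A real analysis would normally use array libraries rather than nested lists.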

Single-cell RNA sequencing

HuBMAP single-cell RNA-seq data sets are processed with a two-stage pipeline, using Salmon for transcript quantification and Scanpy for secondary analysis. This pipeline is implemented in CWL, calling command-line tools encapsulated in Docker containers.
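
The staged structure described above can be sketched as a small dependency model, in which each stage is a containerized tool that runs once its inputs exist. This is an illustration under stated assumptions, not the pipeline's CWL: the image names, the input and output labels, and the `Stage` and `execution_order` names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """One pipeline stage: a command-line tool run inside a container."""
    name: str
    image: str  # hypothetical Docker image name
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def execution_order(stages, initial_inputs):
    """Return stage names in an order where every input is available.

    Raises ValueError if some stage can never run (missing input or cycle).
    """
    available = set(initial_inputs)
    pending = list(stages)
    order = []
    while pending:
        runnable = [s for s in pending if set(s.inputs) <= available]
        if not runnable:
            missing = {i for s in pending for i in s.inputs} - available
            raise ValueError(f"unsatisfied inputs: {sorted(missing)}")
        for stage in runnable:
            order.append(stage.name)
            available.update(stage.outputs)
            pending.remove(stage)
    return order


# Hypothetical labels for the two stages; deliberately listed out of order.
stages = [
    Stage("secondary_analysis", "example/scanpy", ["count_matrix"], ["clusters"]),
    Stage("quantification", "example/salmon", ["fastq"], ["count_matrix"]),
]

print(execution_order(stages, ["fastq"]))
```

A workflow runner resolves the order of steps from their declared inputs and outputs in a broadly similar way, which is why the order in which the stages are listed does not matter in this sketch.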

Single-cell ATAC-seq

The HuBMAP Consortium uses a three-stage pipeline for scATAC-seq data sets, composed of SnapTools, SnapATAC, and chromVAR. This pipeline is written in CWL, calling command-line tools encapsulated in Docker containers.

Bulk RNA sequencing

HuBMAP bulk RNA-seq data sets are processed using Salmon for transcript quantification. This pipeline is implemented in CWL, calling command-line tools encapsulated in Docker containers.

Bulk ATAC-seq

The HuBMAP Consortium uses a two-stage pipeline for bulk ATAC-seq data sets, composed of SnapTools and MACS2. This pipeline is written in CWL, calling command-line tools encapsulated in Docker containers.