Computational Research

The SAIL computational team strives to make emerging statistical and computational techniques in single-cell analytics accessible to the wider MSKCC community. To achieve this goal, our team has three major responsibilities: 1) standardized data access and processing, 2) data analytics and methods development, and 3) organizing single-cell workshops.

At SAIL, we process single-cell omics data of different modalities every day. As such, we maintain a robust data infrastructure for managing, storing, and processing the single-cell genomics and imaging experiments entrusted to our team. We also develop new and modified bioinformatics pipelines to accommodate the cutting-edge technologies being developed by SAIL’s experimental side. We primarily use Amazon Web Services to store and process data. If you work with SAIL and need help accessing your data, or have any questions, please feel free to attend our weekly office hours, where our team is available to provide guidance and support. We also make heavy use of the high-performance computing (HPC) resources available at MSK and are proud users of and contributors to the multimodal data platform Isabl.

Once the data has been processed, SAIL provides various forms of support to help our collaborators extract novel biological insights from their data. For example, we have driven the computational analysis of collected data in close collaboration with biological (wet-lab) groups. This typically involves an in-depth analysis in which we adapt existing computational tools to better characterize the data and understand the biology (see the Publications page for examples). SAIL also benchmarks new tools developed by the single-cell field and aims to make the most effective methods available to the single-cell community at MSKCC.

We believe that the biologist who collected the data has an intuition about it that should guide its computational analysis. With this in mind, we put substantial effort into training wet-lab biologists at MSK so that they can analyze their data on their own. As part of this, SAIL regularly organizes single-cell data analysis workshops in which wet-lab biologists (graduate students and postdocs) learn about novel computational methods being developed in the field and apply them to their own data. We have compiled Python notebooks with implementations and explanations of state-of-the-art methods for single-cell data analysis (e.g., normalization, clustering, cell typing, and trajectory analysis). These notebooks give our collaborators a foundation on which to build their own analyses. Beyond the workshops, we offer continuous support through consultations during weekly office hours, where we help guide analyses and suggest ideas.

SAIL office hours

(1) Open to all MSKCC labs: Computational guidance and support for your own single-cell data analysis project (scRNA-seq, scATAC-seq).

Host: Roshan Sharma; Booking: here.

(2) Open only to labs with data generated by SAIL: Help with your data access needs, including AWS setup, data access and processing, sequence re-alignment, etc.

Host: Andrew Moorman; Booking: here.

Please note that we do not provide support for HPC or other IT-related questions.

Data Analysis Collaboration

We adapt and develop novel computational tools to analyze single-cell data and extract biological insights. To do this, we collaborate closely with several research labs at MSKCC to carry out in-depth analyses of their data. We value biological insight as a guide for statistical analysis, which enables us to understand the data and gain new knowledge from it. We work with various genomics modalities, such as single-cell RNA-sequencing, single-cell CITE-sequencing, single-cell ATAC-sequencing, and spatial genomics. Please click here for some of our publications.

Data Pre-Processing

We also provide computational support to research labs at MSKCC to align sequenced reads to the genome and process them. We integrate existing pipelines, and develop new ones, to process data from several established and novel technologies. Furthermore, we provide resources such as supplementary notebooks that enable our collaborators to perform quality control and downstream analysis of their data on their own.

Single-cell Data Analysis Workshop

We frequently organize single-cell data analysis workshops to help experimental scientists at MSKCC develop the computational skills they need to analyze the data they collect. We firmly believe that scientists who design the experiments and collect the data have a deeper understanding of the biology, and that this understanding can help guide computational analysis. The workshops consist of in-depth discussions of existing computational tools for single-cell data analysis, with Python code and examples from the literature. All code from the workshops is made available with extensive documentation. You can view the materials from previous workshops here:

  1. Workshop-2021
  2. Workshop-2023_February/March
  3. Workshop-2023_December_phase1
  4. Workshop-2024 February phase 2: Our most recent single-cell RNA + ATAC-seq data analysis workshop at MSKCC, which builds on phase 1 to cover more advanced data analysis tools.

Workshops for scientists outside MSKCC: In March 2024, we organized a week-long, NYC-wide single-cell data analysis workshop called SCALE. We invited 25 participants, and the workshop included presentations followed by hands-on sessions in which the participants implemented the code discussed. The workshop focused on key tools for analyzing scRNA-seq data. It was co-organized by SAIL with the Office of Scientific Education and Training and Dr. Richard Koche from the Epigenomics Innovation Research Lab at MSKCC. The slides and code from the SCALE workshop are accessible here. We hope to continue these efforts and broaden the range of modalities covered.