Zach Maas

ML + BioInformatics Researcher | CV
Graduate Student, BioFrontiers, Boulder CO

My work:

My PhD work focuses on what additional information we can get out of sequencing data, and how we can combine different protocols. I care deeply about understanding data quality and the interpretability of models that we use. Some things I do:

I also work across a range of interdisciplinary projects:

For much of sequencing, data is scarce, and the data we do have often has quality problems. Despite this, machine learning and statistical tools let us find results that can help both basic science and clinical research in a way that feels almost miraculous.

My favorite recent paper: Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. This feels like such a brilliantly obvious approach to figuring out a better way to interpret sequence models.

My Papers:

  1. Atlas of nascent RNA transcripts reveals enhancer to gene linkages
    (Under Review) We built a database of published nascent sequencing experiments up through 2020, with high-quality, manually annotated metadata. Using this atlas of enhancer transcripts, we find cell-type-specific markers as well as enhancer-to-gene regulatory linkages. I bootstrapped this project and did the initial data curation, the discovery of published experiments, and the database design.
  2. Internal and External Normalization of Nascent RNA Sequencing Run-On Experiments
    I developed a new model and formalized methods to estimate the efficiency of nascent sequencing samples in the absence of an external spike-in control. Surveying a large number of published experiments, we find that most experiments have under-sequenced spike-ins with high variability in their estimates of experimental efficiency.
  3. Deconvolution of Nascent Sequencing Data Using Transcriptional Regulatory Elements
    I developed the first approach to separate bulk nascent sequencing samples computationally, finding that estimating cell type mixing proportions is uniquely challenging for complex models. I also found that undifferentiated cell types cause an unusual failure mode for linear deconvolution models.
  4. Transcription Factor Enrichment Analysis (TFEA): Quantifying the activity of hundreds of transcription factors from a single experiment
    We built an algorithm to simultaneously infer differential activity of every transcription factor in the genome using nascent sequencing data (the method also works with some other protocols). I helped with method development and statistical improvements to this model, and tested it on a wide variety of data.
  5. Selective inhibition of CDK7 reveals high-confidence targets and new models for TFIIH function in transcription
    We did experiments on the inhibition of CDK7, a kinase associated with RNA Polymerase II, finding that the protein can operate as a master regulator in transcription. I worked on analysis of a variety of sequencing data as well as SILAC mass spec data in this study.
  6. TFIID enables RNA polymerase II promoter-proximal pausing
    We did in vitro experiments on the human transcription pre-initiation complex and its subunit TFIID. We found that TFIID appears to be a necessary component for promoter-proximal pausing. I worked on analysis of nascent sequencing data as well as development of new tools to handle edge cases where our data caused existing models to break.