Report: Summarize ideas and findings of the analyses

Overview of functional assays and regulatory activity comparison

We first summarized the significant regions identified by STARR-seq, MPRA, and CRISPR screens. Using MACS peak calling on the ATAC-STARR-seq input library, we identified approximately 150,000 open chromatin regions. Of these, 18,341 regions showed regulatory activity in at least two reporter assays. Within this subset, 15,285 regions were screened by at least one CRISPR assay, and 1,652 regions were significant in at least one CRISPR screen. CRISPR assays were more sensitive to promoters: they detected 20.2% of promoters, compared with 6.3% of enhancers and 4.7% of silencers. This difference suggests that CRISPR assays preferentially detect certain classes of regulatory elements, most notably promoters.
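The Python sketch below shows one way to tabulate per-class detection rates like those above. Within each element class, it counts CRISPR-significant regions among CRISPR-screened regions. The record layout, the function name hit_rate_by_class, and the toy values are hypothetical. The sketch also assumes that the denominator is the set of screened regions in each class, which the text does not state explicitly.

```python
from collections import Counter

# Hypothetical records: (element_class, screened_by_crispr, crispr_significant).
# Toy values for illustration only.
regions = [
    ("promoter", True, True),
    ("promoter", True, False),
    ("enhancer", True, True),
    ("enhancer", True, False),
    ("enhancer", True, False),
    ("silencer", True, False),
    ("silencer", False, False),
]


def hit_rate_by_class(records):
    """Fraction of CRISPR-screened regions in each class that were significant."""
    screened = Counter()
    hits = Counter()
    for element_class, was_screened, significant in records:
        if not was_screened:
            continue  # untested regions are excluded from the denominator
        screened[element_class] += 1
        if significant:
            hits[element_class] += 1
    return {c: hits[c] / screened[c] for c in screened}


rates = hit_rate_by_class(regions)
for element_class, rate in sorted(rates.items()):
    print(f"{element_class}: {rate:.1%}")
```

With these toy records, the unscreened silencer is dropped, so the silencer rate is computed from one region only.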

To compare the reporter assays, we summarized assay scores at accessible regions and computed pairwise correlations between assays. Correlations ranged from 0.2 to 0.6, with the strongest observed between WHG-STARR-seq and tiling MPRA (Spearman's ρ = 0.54). These results suggest that, although the assays detect overlapping sets of regulatory regions, assay-specific differences also exist.
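The pairwise comparison can be sketched as follows. The sketch assumes each assay has a score at the same set of accessible regions; regions missing from either assay would need to be dropped first. Spearman's ρ is computed here as the Pearson correlation of tie-averaged ranks, which is also what scipy.stats.spearmanr computes. The assay names and scores are placeholders, not real data.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder assays scored at the same five accessible regions (toy values).
scores = {
    "assay_A": [0.1, 0.5, 0.3, 0.9, 0.7],
    "assay_B": [0.2, 0.4, 0.1, 0.8, 0.9],
    "assay_C": [0.9, 0.1, 0.5, 0.2, 0.3],
}

for a, b in combinations(sorted(scores), 2):
    print(f"{a} vs {b}: rho = {spearman(scores[a], scores[b]):.2f}")
```

The tie handling matters here because many regions can share a score of zero in sparse assays.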

Our analysis showed that regulatory activity measured by reporter assays distinguishes CRISPRi hits (absolute z-score ≥ 1) from non-hits with an AUROC of 0.7–0.8. In other words, a randomly chosen CRISPRi hit has a 70–80% chance of receiving a higher reporter score than a randomly chosen non-hit. This finding underscores the potential of reporter assays to predict functional elements in CRISPRi data. Chromatin accessibility, however, was the most predictive feature for CRISPRi signals and outperformed the reporter assay scores. Ongoing analyses aim to quantify the marginal gain in predictive power from integrating data across functional characterization assays.
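The AUROC interpretation above can be made concrete with a minimal sketch of the Mann-Whitney formulation: AUROC is the fraction of (hit, non-hit) pairs in which the hit has the higher reporter score, with ties counted as one half. The sketch assumes that CRISPRi hits are defined by |z| ≥ 1 on the CRISPRi z-score and that the reporter score is the predictor. The values and the function name auroc are illustrative.

```python
def auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: CRISPRi z-scores define hits; reporter scores are the predictor.
crispri_z = [2.3, -1.4, 0.2, 0.8, -0.3, 1.1]
reporter = [1.8, 1.2, 0.4, 1.5, 0.1, 0.9]
is_hit = [abs(z) >= 1 for z in crispri_z]
print(f"AUROC = {auroc(reporter, is_hit):.2f}")
```

The pairwise loop is O(n·m) and is meant only to show the definition. For tens of thousands of regions, a rank-based computation such as sklearn.metrics.roc_auc_score is more practical.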

Compare regulatory activity across promoters, enhancers, and silencers

Functional characterization assays revealed distinct activity profiles across classes of cis-regulatory elements. MPRAs showed higher activity at promoters than at enhancers. Similarly, CRISPRi-Growth screens showed stronger regulatory effects at promoters, particularly at promoters of essential genes in K562 cells. This suggests that disrupting the promoters of essential genes has a greater impact on cell fitness than disrupting the promoters of non-essential genes.

Further comparisons of regulatory scores across chromatin states showed that CRISPRi-Growth regions were enriched for promoter-like signatures, whereas CRISPRi-HCR FlowFISH regions included both proximal and distal enhancer-like signatures. These findings are consistent with the activity differences we observed across promoters, enhancers, and silencers. In addition, ATAC-STARR-seq showed repressive effects in bivalent regions, an observation supported by negative z-scores in regions marked by H3K27me3.
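As a rough sketch of how z-scores could be summarized per chromatin state, the code below groups one assay's z-scores by state and reports the count, the median, and the fraction of negative values. A negative median with a high negative fraction in bivalent or Polycomb-repressed states would reflect the repressive pattern described above. The state labels are placeholders borrowed from common chromHMM naming, and all values are invented.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (chromatin_state, z_score) pairs for one assay; toy values.
records = [
    ("TssA", 1.9), ("TssA", 2.4), ("TssA", 0.7),
    ("EnhA", 1.1), ("EnhA", 0.3),
    ("TssBiv", -1.2), ("TssBiv", -0.4),
    ("ReprPC", -0.9), ("ReprPC", 0.2), ("ReprPC", -1.5),
]


def summarize_by_state(pairs):
    """Per-state (count, median z-score, fraction of negative z-scores)."""
    grouped = defaultdict(list)
    for state, z in pairs:
        grouped[state].append(z)
    return {
        state: (len(zs), median(zs), sum(z < 0 for z in zs) / len(zs))
        for state, zs in grouped.items()
    }


summary = summarize_by_state(records)
# Print states from most repressed (lowest median) to most active.
for state, (n, med, frac_neg) in sorted(summary.items(), key=lambda kv: kv[1][1]):
    print(f"{state:7s} n={n} median_z={med:+.2f} frac_negative={frac_neg:.2f}")
```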

Although reporter assays are episomal and therefore do not directly reflect native chromatin context, their activity differed significantly across regions with different chromatin states. This suggests that factors associated with chromatin state may influence the regulatory activity measured in reporter assays.
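The text does not name the test behind "significant differences". One standard choice for comparing a score across several groups is the Kruskal-Wallis H test, sketched below with the usual tie correction. This is an assumption, not necessarily the test used here. Under the null hypothesis, H is approximately chi-square distributed with k − 1 degrees of freedom, and scipy.stats.kruskal returns both H and the p-value. The group names and values are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (with tie correction) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical reporter z-scores grouped by chromatin state (toy values).
by_state = {
    "active_promoter": [1.9, 2.4, 0.7, 1.3],
    "enhancer": [1.1, 0.3, 0.6],
    "bivalent": [-1.2, -0.4, -0.8],
}
print(f"H = {kruskal_wallis_h(list(by_state.values())):.3f}")
```

A rank-based test avoids assuming that reporter z-scores are normally distributed within each chromatin state.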

Explore heterogeneity of regulatory regions through TF occupancy and colocalization

To investigate TF involvement in regulatory regions, our ENCODE working group explored the TF occupancy of cis-regulatory regions using sequence models and ChIP-seq data. We first examined assay-specific preferences by analyzing the distribution of TF motifs identified by the sequence models. Distinct TF motifs were enriched in specific assays: SP1, GATA, and NFY for Lenti-MPRA; AP1 and YY1 for ATAC-STARR; and ATF for WG-STARR. The preference for promoter-associated TF motifs in Lenti-MPRA regions is consistent with the differences in regulatory activity we observed across promoters, enhancers, and silencers.
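The sequence-model motif analysis itself is not reproduced here. As a simpler, swapped-in illustration of what "enriched in a specific assay" means, the sketch below computes a count-based log2 enrichment: the fraction of assay-active regions carrying a motif divided by the fraction of background regions carrying it. A pseudocount keeps zero counts finite. The motif names, counts, and the function name log2_enrichment are hypothetical.

```python
import math


def log2_enrichment(active_hits, n_active, background_hits, n_background, pseudocount=1.0):
    """log2 of (fraction of active regions with the motif) / (fraction of background
    regions with the motif); the pseudocount keeps zero counts finite."""
    frac_active = (active_hits + pseudocount) / (n_active + 2 * pseudocount)
    frac_background = (background_hits + pseudocount) / (n_background + 2 * pseudocount)
    return math.log2(frac_active / frac_background)


# Hypothetical counts for one assay: motif -> (hits among active, hits among background).
n_active, n_background = 1000, 10000
motif_counts = {"motif_A": (220, 900), "motif_B": (150, 800), "motif_C": (40, 450)}

ranked = sorted(
    motif_counts,
    key=lambda m: log2_enrichment(motif_counts[m][0], n_active, motif_counts[m][1], n_background),
    reverse=True,
)
for motif in ranked:
    a, b = motif_counts[motif]
    print(f"{motif}: log2 enrichment = {log2_enrichment(a, n_active, b, n_background):+.2f}")
```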

The working group also examined ChIP-seq data for the TFs enriched in reporter assays. We calculated feature-enrichment scores by dividing the average ChIP-seq signal at active elements by that at inactive, GC-matched background elements. Across all assays, this analysis revealed significant enrichment of factors such as EP300 (p = 7.2e-14), YY1 (p = 2.1e-14), JUN (p = 2.4e-11), and ATF (p = 5.5e-11). It also revealed enrichment of the histone modifications H3K4me3 and H3K27ac at promoter elements. All of these enrichments were highly significant (p < 0.0001, Wilcoxon rank-sum test), underscoring the consistent association of these factors with active regulatory regions.
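The feature-enrichment score described above can be sketched as follows. The GC matching here is a deliberately simple greedy nearest-GC match without replacement. This is an assumed stand-in, since the text does not say how the background was matched. The function names and data are hypothetical. The accompanying p-value would come from a Wilcoxon rank-sum test on the two signal distributions, for example via scipy.stats.ranksums, which is not repeated here.

```python
def gc_matched_background(active_gc, inactive):
    """Greedy nearest-GC matching without replacement.

    active_gc: GC fractions of active elements.
    inactive: list of (gc_fraction, signal) tuples for inactive elements.
    Returns one matched inactive tuple per active element.
    """
    pool = list(inactive)
    matched = []
    for gc in active_gc:
        if not pool:
            raise ValueError("background pool exhausted")
        best = min(range(len(pool)), key=lambda i: abs(pool[i][0] - gc))
        matched.append(pool.pop(best))
    return matched


def feature_enrichment(active, inactive):
    """Mean signal at active elements / mean signal at GC-matched inactive elements."""
    matched = gc_matched_background([gc for gc, _ in active], inactive)
    mean_active = sum(signal for _, signal in active) / len(active)
    mean_background = sum(signal for _, signal in matched) / len(matched)
    return mean_active / mean_background


# Hypothetical (gc_fraction, ChIP-seq signal) pairs for one factor (toy values).
active = [(0.60, 8.0), (0.45, 4.0)]
inactive = [(0.61, 2.0), (0.30, 10.0), (0.44, 2.0), (0.70, 1.0)]
print(f"enrichment = {feature_enrichment(active, inactive):.2f}")
```

In this toy example, matching gives a score of 3.0. Using all inactive elements (mean 3.75) would give 1.6, because the low-GC, high-signal element would no longer be excluded. This shows why the background is matched on GC content.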

Previous studies have shown that transcriptional regulation varies across regions because of TF-specific binding preferences and combinations. In addition, previous MPRA analyses of synthetic fragments have shown that different combinations of TF motifs can produce different levels of regulatory activity. Building on these findings, I plan to extend our TF analysis by integrating more than 700 ENCODE ChIP-seq datasets. I have explored the heterogeneity of regulatory elements by integrating over 500 genomic features, including functional characterization assays, ChIP-seq data, and transcription start site (TSS) annotations. Clustering analyses revealed subgroups of regulatory regions with distinct combinations of features. For instance, CTCF-bound regions could be decomposed according to combinations of chromatin organization factors, including CTCF, cohesin, and Rad21.
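The clustering over 500 features is not reproduced here. A much simpler sketch of the decomposition idea is shown instead: it binarizes each region's occupancy for a few factors and groups regions by their presence/absence pattern. The factor names follow the text, while the signal values, the 0.5 threshold, and the function name combination_groups are hypothetical.

```python
from collections import Counter


def combination_groups(regions, factors, threshold=0.5):
    """Count regions by which of the given factors are present (signal >= threshold)."""
    groups = Counter()
    for signals in regions:
        pattern = tuple(f for f in factors if signals.get(f, 0.0) >= threshold)
        groups[pattern] += 1
    return groups


# Hypothetical normalized occupancy at four CTCF-bound regions (toy values).
regions = [
    {"CTCF": 0.9, "cohesin": 0.8, "Rad21": 0.7},
    {"CTCF": 0.9, "cohesin": 0.1, "Rad21": 0.0},
    {"CTCF": 0.8, "cohesin": 0.9, "Rad21": 0.9},
    {"CTCF": 0.7, "cohesin": 0.0, "Rad21": 0.6},
]
groups = combination_groups(regions, ("CTCF", "cohesin", "Rad21"))
for pattern, n in groups.most_common():
    print(" + ".join(pattern) or "none", n)
```

Exact pattern matching scales poorly as the number of features grows, which is why clustering is used for the full feature set.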

Ongoing analyses aim to refine our understanding of the factors that define active regulatory regions, including enhancers, promoters, and silencers. We plan to extend our ChIP-seq analysis to systematically identify the key features that distinguish (1) active from inactive regions and (2) subgroups of regulatory regions that may have diverse functional roles.

Identifying associations between TF clusters and regulatory activity

Grouping TFs and studying the occupancy of TF clusters has been a key approach for enhancer discovery and characterization. We collaborated with an ENCODE working group that applied topic modeling to cluster TFs into modules based on their binding sites. Our analyses identified TF modules that were positively or negatively correlated with regulatory activity, providing additional insight into the role of TF colocalization in gene regulation.
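Ranking modules by their association with activity could look roughly like the sketch below. It assumes that each region has a weight per TF module (for example, a topic proportion from the topic model) and a regulatory activity score, and it correlates each module's weights with activity. The text does not say which correlation was used, so Pearson is an assumption here. The module names and values are hypothetical.

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def rank_modules(module_weights, activity):
    """Return (module, r) pairs sorted from most positive to most negative correlation."""
    corrs = {m: pearson(w, activity) for m, w in module_weights.items()}
    return sorted(corrs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-region module weights and regulatory activity (toy values).
activity = [2.0, 1.5, 0.2, -0.5, -1.0]
module_weights = {
    "module_1": [0.8, 0.6, 0.3, 0.1, 0.0],
    "module_2": [0.1, 0.2, 0.3, 0.2, 0.1],
    "module_3": [0.0, 0.1, 0.3, 0.6, 0.9],
}
ranking = rank_modules(module_weights, activity)
for module, r in ranking:
    print(f"{module}: r = {r:+.2f}")
```

The top and bottom of this ranking correspond to the "top positively" and "top negatively" correlated modules used in the enrichment analysis.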

Using genome-wide reporter assays, we identified the TF modules most positively and most negatively correlated with overall regulatory activity. To investigate the functional relevance of these modules, we performed enrichment analyses based on protein domain and family annotations.

The bZIP (IPR004827, IPR046347) and ETS (IPR046328, IPR000418) domains were enriched in the top positively correlated TF modules. In contrast, the following annotations were enriched in the top negatively correlated TF modules: Rad21 (IPR006909, IPR006910), SMC/SMC3 (IPR010935, IPR041741, IPR036277, IPR024704), homeobox (IPR008422, IPR001356, IPR017970, IPR009057), and histone deacetylase (IPR003084).

The bZIP enrichment is consistent with the AP1 motif enrichment identified in our sequence model analyses, since AP-1 factors are bZIP proteins. The enrichment of cohesin-related annotations (Rad21 and SMC) in negatively correlated modules aligns with the repressive effects of chromatin organization reported by the ENCODE MPRA working group.
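Domain enrichment of this kind is commonly assessed with a one-sided hypergeometric test, sketched below. The test asks how likely it is to see at least k annotated TFs among the N TFs in the top modules, given that n of all M TFs carry the annotation. Whether this exact test was used is an assumption, and the numbers are hypothetical. The same tail probability is available as scipy.stats.hypergeom.sf(k - 1, M, n, N).

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric: M items in total, n of them annotated,
    N items drawn (here, TFs in the top modules)."""
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / total


# Hypothetical numbers: 20 TFs in all modules, 4 carry a given domain,
# 5 TFs fall in the top-correlated modules, and 3 of those carry the domain.
p = hypergeom_sf(3, M=20, n=4, N=5)
print(f"P(X >= 3) = {p:.4f}")
```

When many domain and family annotations are tested, the resulting p-values would normally also be corrected for multiple testing.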

Chromatin organization and looping

CTCF and cohesin binding sites were detected in regions with discordant signals across assays, such as the FADS loci. This led us to hypothesize that chromatin organization may influence assay outcomes. To test this hypothesis, we constructed chromatin interaction networks based on accessible regions and Hi-C loop calls. Regulatory regions identified in each assay were enriched for physical interactions. Among the assays, regions identified by CRISPRi-HCR FlowFISH had the highest frequency of looped regions relative to background. Aggregate peak analysis (APA) further showed enrichment of Hi-C signal at the enhancer-promoter pairs identified by CRISPRi-HCR FlowFISH. These observations suggest a potential relationship between chromatin looping and the detection capabilities of different assays, highlighting the importance of chromatin architecture in the functional characterization of regulatory elements. Ongoing efforts aim to extend the APA analysis to regulatory regions identified by other functional characterization assays.
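Comparing the frequency of looped regions with background could be sketched as below. The sketch assumes that a region counts as "looped" if it overlaps any loop anchor, taken as both ends of each Hi-C loop call (for example, from a BEDPE file). It also assumes half-open coordinates. Anchors are merged per chromosome so that a single binary search decides overlap. All coordinates and function names are hypothetical.

```python
from bisect import bisect_left


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_anchor_index(anchors):
    """anchors: iterable of (chrom, start, end); returns chrom -> (starts, ends)."""
    by_chrom = {}
    for chrom, start, end in anchors:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_anchor(region, index):
    chrom, start, end = region
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end) - 1  # last merged anchor that starts before the region ends
    return i >= 0 and ends[i] > start


def looped_fraction(regions, index):
    return sum(overlaps_anchor(r, index) for r in regions) / len(regions)


# Hypothetical loop anchors and regions (toy coordinates).
index = build_anchor_index([("chr1", 100, 200), ("chr1", 150, 300),
                            ("chr1", 1000, 1100), ("chr2", 50, 80)])
assay_regions = [("chr1", 250, 260), ("chr1", 500, 600), ("chr2", 60, 70), ("chr3", 1, 10)]
background = [("chr1", 0, 50), ("chr1", 300, 400), ("chr1", 1050, 1060), ("chr2", 90, 100)]

fg = looped_fraction(assay_regions, index)
bg = looped_fraction(background, index)
print(f"assay {fg:.2f} vs background {bg:.2f} (ratio {fg / bg:.1f})")
```

Merging the anchors first keeps the merged intervals sorted and non-overlapping. That is what makes checking only the nearest preceding anchor sufficient.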