Data dictionary

Below are the main data used in the analyses. We express our sincere appreciation to the research teams and individuals involved in conducting, processing, and sharing the experiments and data that underpin our analyses.

Summary: FCC assays done in K562

Read more about the FCC data

  • ATAC-STARR (ASTARR): sequenced fragments (processed by Alex and Revathy; experiment done by Keith)
  • WHG-STARR (WSTARR): sequenced fragments (processed by Alex; experiment done by Kari)
  • Tiling MPRA (TMPRA): raw barcode count table (processed by Hannah)
  • Lenti-MPRA (LMPRA): log2 fold change of each element (shared by Vikram from the Lenti-MPRA study)
  • CRISPRi-HCR FlowFISH (CRISPRi-HCRFF): z scores (processed by Jin Woo)
  • CRISPRi-Growth screen (CRISPRi-Growth): z scores (processed by Jin Woo)
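
The list above mixes three kinds of values: raw counts, log2 fold changes, and z scores. As a minimal sketch of how these quantities relate in general, the snippet below computes a log2 fold change from an output/input count pair and standardizes a set of values into z scores. These are generic textbook definitions, not the pipelines used by the contributors named above; the function names, the pseudocount, and the example counts are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def log2_fold_change(output_count, input_count, pseudocount=1.0):
    """log2 ratio of output (e.g. RNA) to input (e.g. DNA) counts.

    The pseudocount is an assumption that avoids division by zero.
    """
    return math.log2((output_count + pseudocount) / (input_count + pseudocount))


def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical counts for three elements: (output count, input count)
counts = {"element_a": (30, 15), "element_b": (7, 7), "element_c": (3, 15)}

lfc = {name: log2_fold_change(out, inp) for name, (out, inp) in counts.items()}
z = z_scores(list(lfc.values()))
```

The actual normalization and scoring steps differ by assay, so the per-assay sections below remain the reference for what each shared file contains.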

ATAC-STARR-seq (ASTARR)

https://www.encodeproject.org/functional-characterization-experiments/ENCSR312UQM/

WHG-STARR-seq (WSTARR)

  • More information
    • https://www.encodeproject.org/functional-characterization-experiments/ENCSR661FOW/

Tiling MPRA (TMPRA)

ENCODE FCC K562 Tiling MPRA from the Tewhey lab

  • ENCODE MPRA K562 OL43-GATA/MYC region
    • Homo sapiens K562, 24 hours post-nucleic acid delivery time genetically modified (episome) using transient transfection for multiple loci
  • [ENCODE MPRA]

https://www.encodeproject.org/search/?type=File&searchTerm=OL43&file_format=bed

MPRA of multiple loci in K562

  • Homo sapiens K562, 48 hours post-nucleic acid delivery time genetically modified (episome) using transient transfection for multiple loci
  • Loci: FEN1, FADS1, FADS2, FADS3
  • Lab: Ryan Tewhey, JAX
  • Project: ENCODE

Functional Characterization Experiment ENCSR394HXI

ryan-tewhey:mpra-OL13-fads-k562

MPRA of multiple loci in K562

  • Homo sapiens K562, 24 hours post-nucleic acid delivery time genetically modified (episome) using transient transfection for multiple loci
  • Loci: GATA1, MYC
  • Lab: Ryan Tewhey, JAX
  • Project: ENCODE

Functional Characterization Experiment ENCSR917SFD

ryan-tewhey:mpra-OL43-gata_myc_ctrl-k562

MPRA of multiple loci in K562

  • Homo sapiens K562, 24 hours post-nucleic acid delivery time genetically modified (episome) using transient transfection for multiple loci
  • Loci: LMO2, HBE1, RBM38, HBA2, BCL11A
  • Lab: Ryan Tewhey, JAX
  • Project: ENCODE

Functional Characterization Experiment ENCSR363XER

ryan-tewhey:mpra-OL45-common_ctrl-k562

This analysis uses both the unnormalized and the normalized data processed and shared by Hannah from the Tewhey lab.
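
The three Tiling MPRA experiments above are identified by ENCODE accession and lab alias. As a small sketch, the snippet below builds their portal URLs using the same path pattern as the ASTARR and WSTARR links earlier in this section. The helper name `experiment_url` is hypothetical, and the `?format=json` suffix reflects our understanding of the ENCODE REST API convention for requesting metadata as JSON; check the portal documentation before relying on it.

```python
from urllib.parse import urljoin

ENCODE_BASE = "https://www.encodeproject.org/"

# Accessions and lab aliases as listed in this section
TMPRA_EXPERIMENTS = {
    "ENCSR394HXI": "ryan-tewhey:mpra-OL13-fads-k562",
    "ENCSR917SFD": "ryan-tewhey:mpra-OL43-gata_myc_ctrl-k562",
    "ENCSR363XER": "ryan-tewhey:mpra-OL45-common_ctrl-k562",
}


def experiment_url(accession, as_json=False):
    """Build the portal URL of a functional characterization experiment."""
    path = f"functional-characterization-experiments/{accession}/"
    url = urljoin(ENCODE_BASE, path)
    # Assumption: appending ?format=json asks the portal for JSON metadata
    return url + "?format=json" if as_json else url


for accession, alias in TMPRA_EXPERIMENTS.items():
    print(alias, experiment_url(accession))
```

Keeping the accessions in one mapping makes it easy to cross-check the aliases against the portal records.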

Lenti-MPRA (LMPRA)

Lentivirus-based MPRA (lentiMPRA) produces “in-genome” readouts (Gordon et al., 2020).

CRISPRi-HCR FlowFISH (CRISPRi-HCRFF)

CRISPRi Growth screen (CRISPRi-Growth)

Summary: CREs/Peaks identified in K562

  • CRE calls for reporter assays (STARR/MPRA)
    • Unified processing pipeline for peak calling in high-throughput reporter assays
  • CRE calls for CRISPRi-HCR FlowFISH
    • CRISPR activity screen analysis (CASA) on CRISPRi-HCR FlowFISH data
  • CRE calls for CRISPRi-Growth screen
    • Calling DHS regions using DESeq analysis for CRISPRi-Growth
  • CRISPR Enhancer-Gene (E2G) benchmark/prediction
    • Benchmark data and prediction results of the enhancer-gene linking model
  • ATAC peaks
    • MACS peaks called on the ASTARR input libraries

https://github.com/ENCODE-DCC/encValData/blob/master/as/mpra_starr.as

CRE calls for reporter assays (STARR/MPRA)

A unified processing pipeline has been developed by Junke from the Yu Lab to standardize enhancer calling for high-throughput reporter assays.

CRE calls for CRISPRi-HCR FlowFISH

CRISPR activity screen analysis (CASA) on CRISPRi-HCR FlowFISH data

The CASA analysis pipeline, developed by the Sabeti Lab, has been applied to the CRISPRi-HCR FlowFISH data to identify regulatory elements. The significant regions from the Reilly lab's K562 FlowFISH CRISPRi screens can be downloaded from the ENCODE portal.

CRE calls for CRISPRi-Growth screen

Calling DHS regions using DESeq analysis for CRISPRi-Growth

For the CRISPRi-Growth data, DHS (DNase I hypersensitive site) regions with a significant effect on cell fitness were identified using a DESeq analysis performed by Alex.

ENCODE-rE2G predictions of enhancer-gene regulatory interactions (CRISPR-E2G)

CRISPR E2G Benchmark

CRISPR E2G Prediction

  • ENCODE: ENCSR328LMT
    • Description:
      • ENCODE-rE2G (extended) predictions of enhancer-gene regulatory interactions for K562
    • Annotation type:
      • element gene regulatory interaction predictions
    • Experimental input (Assay: DNase-seq, ChIP-seq, Hi-C, ChIA-PET):
      • ENCSR000EOT (K562 DNase-seq)
      • ENCSR668LDD (K562 ChIP-seq: H3K4me3)
      • ENCSR000AKP (K562 ChIP-seq: H3K27ac)
      • ENCSR000EGE (K562 ChIP-seq: EP300)
      • ENCSR000DWE (K562 ChIP-seq: CTCF)
      • ENCSR597AKG (K562 ChIA-PET: CTCF)
      • ENCSR545YBD (K562 in situ Hi-C)
      • ENCSR479XDG (K562 intact Hi-C)

ATAC peaks

MACS peaks called on the ASTARR input libraries, processed by Alex.

Summary: Hi-C assays done in K562

Read more about the Hi-C data

  • in situ Hi-C
  • intact Hi-C
  • Deep intact Hi-C

in situ Hi-C

ENCODE experiment ENCSR545YBD: in situ Hi-C in Homo sapiens K562

Summary: Genomic/Transcriptomic information of K562

  • Chromatin states (cCREs / ChromHMM)
  • ChIP-seq (TF/Histone) data
  • TF binding modules
  • Accessible regions (ATAC/DHS regions)
  • TSS annotation
  • Gene expression (RNA-seq)

Chromatin states

Read more about the chromatin states data

ChIP-seq (TF/Histone) data

  • K562 ChIP-seq table from the ENCODE flagship

  • ENCODE ChIP-seq (Histone)

    • Link:
      • https://www.encodeproject.org/search/?type=Experiment&searchTerm=chipseq&assay_title=Histone+ChIP-seq&biosample_ontology.term_name=K562&perturbed=false&assembly=GRCh38&files.file_type=bed+narrowPeak&files.file_type=bigWig
  • ENCODE ChIP-seq (Transcription Factors)

    • Link:
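
The histone ChIP-seq search link above is an ENCODE portal query whose filters are ordinary URL parameters. As a sketch, the snippet below rebuilds that exact URL from a list of key/value pairs, which makes each filter explicit and easier to adjust. The variable names are our own; the parameter names and values are copied from the link.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.encodeproject.org/search/"

# A list of pairs (not a dict) because files.file_type appears twice
histone_params = [
    ("type", "Experiment"),
    ("searchTerm", "chipseq"),
    ("assay_title", "Histone ChIP-seq"),
    ("biosample_ontology.term_name", "K562"),
    ("perturbed", "false"),
    ("assembly", "GRCh38"),
    ("files.file_type", "bed narrowPeak"),
    ("files.file_type", "bigWig"),
]

# urlencode encodes spaces as "+", matching the link as written
histone_url = f"{SEARCH_ENDPOINT}?{urlencode(histone_params)}"
print(histone_url)
```

Writing the query this way keeps the cell line, assembly, and file-type filters visible in one place.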

TF binding modules

  • Shannon et al. examine TF binding patterns across the genome by clustering the TFs into different modules/groups.

Accessible regions (ATAC/DHS regions)

  • K562 Accessible regions (ATAC/DHS regions) used for testing assay coverage

TSS annotation

Read more about the TSS annotation used in this documentation

Gene expression (RNA-seq)

Gene annotation

Description

  • List of genes used as positive controls, intersection of Biomen (2014) and Hart (2015) essentials. Each entry is separated by a newline.
  • The scores of these genes are used as the dependent distribution for inferring dependency probability.
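
The description above defines the positive-control list as the intersection of two essential-gene sets stored one gene per line. A minimal sketch of that operation follows. The helper name `read_gene_list` is hypothetical, and the gene names are placeholders rather than entries from the actual lists.

```python
def read_gene_list(text):
    """Parse a newline-separated gene list into a set, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


# Placeholder contents standing in for the two essential-gene files
set_a_text = "GENE_A\nGENE_B\nGENE_C\n"
set_b_text = "GENE_B\n\nGENE_A\nGENE_D\n"

# Genes present in both lists, sorted for a stable order
positive_controls = sorted(read_gene_list(set_a_text) & read_gene_list(set_b_text))
print(positive_controls)
```

For real files, the same function can be applied to the text returned by reading each file.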

Release Citation

  • Current DepMap Release data, including CRISPR Screens, PRISM Drug Screens, Copy Number, Mutation, Expression, and Fusions
  • DepMap, Broad (2024). DepMap 24Q2 Public. Figshare+. Dataset. https://doi.org/10.25452/figshare.plus.25880521.v1

More Information