Open chromatin regions

Applying MACS on ATAC-STARR-seq input libraries

To define the accessible regions used for this studies, we applied the MACS peak calling at the ATAC-STARR input libraries.

The input libraries of ATAC-STARR libraries were built on the genomic fragments after Tn5 digestion. The input library was transfected in K562 cell for testing regulatory activities.

The sequencing of input libraries results in X billions of unique fragments across six technical replicates.

The MACS tool was applied to ASTARR input libraries to acquire the chromatin accessible regions. This process was conducted by Alex.

There are two sets of accessible regions in this - less stringent region set: union of base pairs across replicates - more stringent region set: overlap/interaction of base pairs across all replicates, only the base pairs that exist in all replicates are retained

Note: write the description of length using Mode and IQR, 90 percentile - Union: Mode = 422 bp, Median = 800 bp, 1031-435 = 596 bp, 90 percentile = 1475 bp - Overlap: Mode = 342 bp, Median = 597 bp, 909-401 = 508 bp, 90 percentile) = 1293 bp

Detail numbers

ATAC (Union)
#{Region} = 246,852
Length (Min) = 218 bp
Length (1st Q) = 435 bp
Length (Median) = 665 bp
Length (Mean) = 800 bp
Length (Mode) = 422 bp
Length (3rd Q) = 1031 bp
Length (Max) = 6251 bp
Length (90 percentile) = 1475 bp

ATAC (Overlap or intersection)
#{Region} = 150,042
Length (Min) = 1 bp
Length (1st Q) = 401 bp
Length (Median) = 597 bp
Length (Mean) = 712 bp
Length (Mode) = 342 bp
Length (3rd Q) = 909 bp
Length (Max) = 6172 bp
Length (90 percentile) = 1293 bp

Distribution of length

[TODO] Use figure panel to include the distribution of GC content and accessibility (TPM) https://quarto.org/docs/authoring/figures.html

Distribution of GC content

Distribution of TSS proximity

ENCODE K562 ATAC/DNase peaks

Distribution of length

Comparison: Chromosome distribution plots of accessible regions

Size comparison

Assay Index_Experiment Index_File Method Count_Row Count_Region
ATAC-STARR-Input ENCSR312UQM . Overlap peaks by bp across replicates 150,042 150,042
ATAC-STARR-Input ENCSR312UQM . Union peaks by bp across replicates 246,852 246,852
DNase-seq ENCSR000EKS ENCFF274YGF peaks 118,721 118,721
DNase-seq ENCSR000EOT ENCFF185XRG peaks 159,277 159,277
ATAC-seq ENCSR483RKN ENCFF558BLC pseudoreplicated peaks 203,874 107,082
ATAC-seq ENCSR483RKN ENCFF925CYR IDR thresholded peaks 123,009 51,861
ATAC-seq ENCSR868FGK ENCFF333TAT pseudoreplicated peaks 269,800 161,693
ATAC-seq ENCSR868FGK ENCFF948AFM IDR thresholded peaks 181,340 90,015