Prepare ENCODE ATAC/DNase Peaks 01

Generate a shell script that downloads the selected ENCODE ATAC-seq and DNase-seq files (K562, GRCh38)

Set environment

Code
suppressMessages(suppressWarnings(source("../run_config_project_sing.R")))
show_env()
You are working on        Singularity: singularity_proj_encode_fcc 
BASE DIRECTORY (FD_BASE): /data/reddylab/Kuei 
REPO DIRECTORY (FD_REPO): /data/reddylab/Kuei/repo 
WORK DIRECTORY (FD_WORK): /data/reddylab/Kuei/work 
DATA DIRECTORY (FD_DATA): /data/reddylab/Kuei/data 

You are working with      ENCODE FCC 
PATH OF PROJECT (FD_PRJ): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC 
PROJECT RESULTS (FD_RES): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results 
PROJECT SCRIPTS (FD_EXE): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts 
PROJECT DATA    (FD_DAT): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data 
PROJECT NOTE    (FD_NBK): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks 
PROJECT DOCS    (FD_DOC): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs 
PROJECT LOG     (FD_LOG): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log 
PROJECT REF     (FD_REF): /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references 

Set global variables

Code
TXT_FOLDER_REF = "encode_open_chromatin"
TXT_FOLDER_OUT = "encode_open_chromatin"
Code
txt_fdiry  = file.path(FD_REF, TXT_FOLDER_REF)
vec = dir(txt_fdiry)
for (txt in vec){cat(txt, "\n")}
ENCODE_K562_hg38_ATAC_DNase.tsv 
K562.ENCSR000EKS.ENCAN694OCK.metadata.tsv 
K562.ENCSR000EOT.ENCAN780RWD.metadata.tsv 
K562.ENCSR483RKN.ENCAN217QUL.metadata.tsv 
K562.ENCSR868FGK.ENCAN824UKX.metadata.tsv 

Import metadata from reference file

Code
### set file path
txt_folder = TXT_FOLDER_REF
txt_fdiry  = file.path(FD_REF, txt_folder)
txt_fname = "ENCODE_K562_hg38_ATAC_DNase.tsv"
txt_fpath = file.path(txt_fdiry, txt_fname)

### read table
dat = read_tsv(txt_fpath, show_col_types = FALSE)

### show and assign
dat_metadata_selected = dat
print(dim(dat))
fun_display_table(dat)
[1] 10 11
Assay | Biosample | Index_Experiment | Index_Process | Index_File | File_Type | Output_Type | isogenic_replicate | Genome | File_Summary | Lab
ATAC | K562 | ENCSR868FGK | ENCODE4 v1.9.1 GRCh38 (ENCAN824UKX) | ENCFF357GNC | bigWig | signal p-value | 1, 2, 3 | hg38 | call-macs2_signal_track_pooled/rep.pooled.pval.signal.bigwig | Michael Snyder, Stanford
ATAC | K562 | ENCSR868FGK | ENCODE4 v1.9.1 GRCh38 (ENCAN824UKX) | ENCFF333TAT | bed narrowPeak | pseudoreplicated peaks | 1, 2, 3 | hg38 | call-overlap_ppr/pooled-pr1_vs_pooled-pr2.overlap.bfilt.narrowPeak.gz | Michael Snyder, Stanford
ATAC | K562 | ENCSR868FGK | ENCODE4 v1.9.1 GRCh38 (ENCAN824UKX) | ENCFF948AFM | bed narrowPeak | IDR thresholded peaks | 1, 2, 3 | hg38 | call-idr_ppr/pooled-pr1_vs_pooled-pr2.idr0.05.bfilt.narrowPeak.gz | Michael Snyder, Stanford
ATAC | K562 | ENCSR483RKN | ENCODE4 v1.9.1 GRCh38 (ENCAN217QUL) | ENCFF600FDO | bigWig | signal p-value | 1, 2 | hg38 | call-macs2_signal_track_pooled/rep.pooled.pval.signal.bigwig | Michael Snyder, Stanford
ATAC | K562 | ENCSR483RKN | ENCODE4 v1.9.1 GRCh38 (ENCAN217QUL) | ENCFF558BLC | bed narrowPeak | pseudoreplicated peaks | 1, 2 | hg38 | call-overlap_ppr/pooled-pr1_vs_pooled-pr2.overlap.bfilt.narrowPeak.gz | Michael Snyder, Stanford
ATAC | K562 | ENCSR483RKN | ENCODE4 v1.9.1 GRCh38 (ENCAN217QUL) | ENCFF925CYR | bed narrowPeak | IDR thresholded peaks | 1, 2 | hg38 | call-idr_ppr/pooled-pr1_vs_pooled-pr2.idr0.05.bfilt.narrowPeak.gz | Michael Snyder, Stanford
DNase | K562 | ENCSR000EKS | ENCODE4 v3.0.0 GRCh38 (ENCAN694OCK) | ENCFF972GVB | bigWig | read-depth normalized signal | 1 | hg38 | call-starch_to_bigwig/normalized.nuclear.0.05.density.bw | Gregory Crawford, Duke
DNase | K562 | ENCSR000EKS | ENCODE4 v3.0.0 GRCh38 (ENCAN694OCK) | ENCFF274YGF | bed narrowPeak | peaks | 1 | hg38 | call-compress/nuclear.0.001.peaks.narrowpeaks.bed.gz | Gregory Crawford, Duke
DNase | K562 | ENCSR000EOT | ENCODE4 v3.0.0-alpha.2 GRCh38 (ENCAN780RWD) | ENCFF414OGC | bigWig | read-depth normalized signal | 1 | hg38 | call-starch_to_bigwig/normalized.nuclear.0.05.density.bw | John Stamatoyannopoulos, UW
DNase | K562 | ENCSR000EOT | ENCODE4 v3.0.0-alpha.2 GRCh38 (ENCAN780RWD) | ENCFF185XRG | bed narrowPeak | peaks | 1 | hg38 | call-compress/nuclear.0.001.peaks.narrowpeaks.bed.gz | John Stamatoyannopoulos, UW
Code
### set file path
txt_folder = TXT_FOLDER_REF
txt_fdiry  = file.path(FD_REF, txt_folder)
txt_fname = "*metadata.tsv"
txt_fglob = file.path(txt_fdiry, txt_fname)
vec_txt_fpath = Sys.glob(txt_fglob)

### read table
lst = lapply(vec_txt_fpath, function(txt_fpath){
    dat = read_tsv(
        txt_fpath, 
        show_col_types = FALSE,
        col_types = cols(
            `Biological replicate(s)` = col_character(),
            .default = col_guess()
        )
    )
    return(dat)
})
dat = bind_rows(lst)

### show and assign
dat_metadata_import = dat
print(dim(dat))
fun_display_table(head(dat))
[1] 85 59
File accession File format File type File format type Output type File assembly Experiment accession Assay Donor(s) Biosample term id Biosample term name Biosample type Biosample organism Biosample treatments Biosample treatments amount Biosample treatments duration Biosample genetic modifications methods Biosample genetic modifications categories Biosample genetic modifications targets Biosample genetic modifications gene targets Biosample genetic modifications site coordinates Biosample genetic modifications zygosity Experiment target Library made from Library depleted in Library extraction method Library lysis method Library crosslinking method Library strand specific Experiment date released Project RBNS protein concentration Library fragmentation method Library size range Biological replicate(s) Technical replicate(s) Read length Mapped read length Run type Paired end Paired with Index of Derived from Size Lab md5sum dbxrefs File download URL Genome annotation Platform Controlled by File Status s3_uri Azure URL File analysis title File analysis status Audit WARNING Audit NOT_COMPLIANT Audit ERROR
ENCFF070TML bigBed narrowPeak bigBed narrowPeak peaks GRCh38 ENCSR000EKS DNase-seq /human-donors/ENCDO000AAD/ EFO:0002067 K562 cell line Homo sapiens NA NA NA NA NA NA NA NA NA NA DNA NA NA NA NA NA 2011-04-19 ENCODE NA see document NA 1 1_1, 1_2, 1_3 NA NA NA NA NA NA /files/ENCFF274YGF/ 2141861 ENCODE Processing Pipeline 9a8290ef2eec9deb327d002be3c1b224 NA https://www.encodeproject.org/files/ENCFF070TML/@@download/ENCFF070TML.bigBed NA NA NA released s3://encode-public/2020/11/22/9cad70d7-20c7-4f15-a615-86a97c6a85f6/ENCFF070TML.bigBed https://datasetencode.blob.core.windows.net/dataset/2020/11/22/9cad70d7-20c7-4f15-a615-86a97c6a85f6/ENCFF070TML.bigBed?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D ENCODE4 v3.0.0 GRCh38 released low spot score, mixed read lengths NA NA
ENCFF972GVB bigWig bigWig NA read-depth normalized signal GRCh38 ENCSR000EKS DNase-seq /human-donors/ENCDO000AAD/ EFO:0002067 K562 cell line Homo sapiens NA NA NA NA NA NA NA NA NA NA DNA NA NA NA NA NA 2011-04-19 ENCODE NA see document NA 1 1_1, 1_2, 1_3 NA NA NA NA NA NA /files/ENCFF635XTF/, /files/ENCFF257HEE/ 747881155 ENCODE Processing Pipeline 1b0432087f9087c0a9e4f0f5a9d08deb NA https://www.encodeproject.org/files/ENCFF972GVB/@@download/ENCFF972GVB.bigWig NA NA NA released s3://encode-public/2020/11/22/953ca40d-376f-4043-8dc9-710dd64a5ea9/ENCFF972GVB.bigWig https://datasetencode.blob.core.windows.net/dataset/2020/11/22/953ca40d-376f-4043-8dc9-710dd64a5ea9/ENCFF972GVB.bigWig?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D ENCODE4 v3.0.0 GRCh38 released low spot score, mixed read lengths NA NA
ENCFF274YGF bed narrowPeak bed narrowPeak peaks GRCh38 ENCSR000EKS DNase-seq /human-donors/ENCDO000AAD/ EFO:0002067 K562 cell line Homo sapiens NA NA NA NA NA NA NA NA NA NA DNA NA NA NA NA NA 2011-04-19 ENCODE NA see document NA 1 1_1, 1_2, 1_3 NA NA NA NA NA NA /files/ENCFF635XTF/, /files/ENCFF257HEE/ 1348006 ENCODE Processing Pipeline 9262ab2b89cd60d05deb831cdd6b509e NA https://www.encodeproject.org/files/ENCFF274YGF/@@download/ENCFF274YGF.bed.gz NA NA NA released s3://encode-public/2020/11/22/e2abbea1-48ca-4895-b2da-82c7e622fde9/ENCFF274YGF.bed.gz https://datasetencode.blob.core.windows.net/dataset/2020/11/22/e2abbea1-48ca-4895-b2da-82c7e622fde9/ENCFF274YGF.bed.gz?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D ENCODE4 v3.0.0 GRCh38 released low spot score, mixed read lengths NA NA
ENCFF414OGC bigWig bigWig NA read-depth normalized signal GRCh38 ENCSR000EOT DNase-seq /human-donors/ENCDO000AAD/ EFO:0002067 K562 cell line Homo sapiens NA NA NA NA NA NA NA NA NA NA DNA NA NA NA NA NA 2018-05-09 ENCODE NA NA NA 1 1_1, 1_2 NA NA NA NA NA NA /files/ENCFF205FNC/, /files/ENCFF180EJG/ 703947460 ENCODE Processing Pipeline ac6a28c1889b241d0f4f9ba28a1e514d NA https://www.encodeproject.org/files/ENCFF414OGC/@@download/ENCFF414OGC.bigWig NA NA NA released s3://encode-public/2020/11/18/e5e1f67e-5af2-484a-b399-23bd39f2352d/ENCFF414OGC.bigWig https://datasetencode.blob.core.windows.net/dataset/2020/11/18/e5e1f67e-5af2-484a-b399-23bd39f2352d/ENCFF414OGC.bigWig?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D ENCODE4 v3.0.0-alpha.2 GRCh38 released NA NA extremely low read depth
ENCFF327DFG bigBed narrowPeak bigBed narrowPeak peaks GRCh38 ENCSR000EOT DNase-seq /human-donors/ENCDO000AAD/ EFO:0002067 K562 cell line Homo sapiens NA NA NA NA NA NA NA NA NA NA DNA NA NA NA NA NA 2018-05-09 ENCODE NA NA NA 1 1_1, 1_2 NA NA NA NA NA NA /files/ENCFF185XRG/ 2742316 ENCODE Processing Pipeline 4995a0e5fe97e15cfd57b9d3fe0b243b NA https://www.encodeproject.org/files/ENCFF327DFG/@@download/ENCFF327DFG.bigBed NA NA NA released s3://encode-public/2020/11/18/0a02f295-c201-4022-b70d-f57e57883d92/ENCFF327DFG.bigBed https://datasetencode.blob.core.windows.net/dataset/2020/11/18/0a02f295-c201-4022-b70d-f57e57883d92/ENCFF327DFG.bigBed?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D ENCODE4 v3.0.0-alpha.2 GRCh38 released NA NA extremely low read depth
ENCFF185XRG bed narrowPeak bed narrowPeak peaks GRCh38 ENCSR000EOT DNase-seq /human-donors/ENCDO000AAD/ EFO:0002067 K562 cell line Homo sapiens NA NA NA NA NA NA NA NA NA NA DNA NA NA NA NA NA 2018-05-09 ENCODE NA NA NA 1 1_1, 1_2 NA NA NA NA NA NA /files/ENCFF205FNC/, /files/ENCFF180EJG/ 1821516 ENCODE Processing Pipeline 04653af177917b3dda96b9454fd8f90e NA https://www.encodeproject.org/files/ENCFF185XRG/@@download/ENCFF185XRG.bed.gz NA NA NA released s3://encode-public/2020/11/18/977b8c4d-588a-47dc-9371-b702ce9715c2/ENCFF185XRG.bed.gz https://datasetencode.blob.core.windows.net/dataset/2020/11/18/977b8c4d-588a-47dc-9371-b702ce9715c2/ENCFF185XRG.bed.gz?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D ENCODE4 v3.0.0-alpha.2 GRCh38 released NA NA extremely low read depth

Explore table

Check genome assembly

Code
dat = dat_metadata_import
table(dat$`File assembly`)

GRCh38 
    85 

Check biosample

Code
dat = dat_metadata_import
table(dat$`Biosample term name`)

K562 
  85 
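
The two `table()` calls above are inspected by eye. As a minimal sketch, the same checks could be written as assertions that stop the notebook if an unexpected assembly or biosample slips in. The toy table `dat_toy` and the flag names are placeholders, not objects from this notebook; in practice `dat_metadata_import` would take its place.

```r
# Toy stand-in for dat_metadata_import (hypothetical rows, same column names)
dat_toy = data.frame(
    `File assembly`       = c("GRCh38", "GRCh38", "GRCh38"),
    `Biosample term name` = c("K562", "K562", "K562"),
    check.names = FALSE
)

# TRUE only if every row has the expected value
is_grch38 = all(dat_toy$`File assembly` == "GRCh38")
is_k562   = all(dat_toy$`Biosample term name` == "K562")

# Halt with an error if either expectation is violated
stopifnot(is_grch38, is_k562)
```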

Check selected files against the imported metadata

Code
vec1 = dat_metadata_selected$Index_File
vec2 = dat_metadata_import$`File accession`
all(vec1 %in% vec2)
TRUE
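
The check above returns a single TRUE or FALSE. If it ever returned FALSE, it would help to know which accessions are missing. A minimal sketch using `setdiff()` follows; the accession values are made-up placeholders standing in for `dat_metadata_selected$Index_File` and ``dat_metadata_import$`File accession` ``.

```r
# Hypothetical accessions (placeholders, not real ENCODE files)
vec_selected = c("ENCFF000AAA", "ENCFF000BBB", "ENCFF000CCC")
vec_imported = c("ENCFF000AAA", "ENCFF000CCC", "ENCFF000DDD")

# Accessions that were selected but are absent from the imported metadata
vec_missing = setdiff(vec_selected, vec_imported)

if (length(vec_missing) > 0) {
    cat("Missing accessions:", paste(vec_missing, collapse = ", "), "\n")
} else {
    cat("All selected accessions were found.\n")
}
```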

Arrange metadata tables

Helper function

Code
fun_simplify_table = function(dat){
    
    ### add shorter aliases for the needed columns
    dat = dat %>% 
        dplyr::mutate(
            Index_Experiment = `Experiment accession`,
            Index_File       = `File accession`,
            File_Format      = `File format`,
            File_Type        = `File type`,
            Output_Type      = `Output type`,
            Genome           = `File assembly`,
            #Target           = str_remove(`Experiment target`, "-human"),
            Bio_Replicates   = `Biological replicate(s)`,
            Analysis         = `File analysis title`,
            File_Name        = basename(`File download URL`),
            File_URL         = `File download URL`
        )
    
    ### select the needed columns
    dat = dat %>%
        dplyr::select(
            Assay,
            Index_Experiment,
            Index_File,
            File_Format,
            File_Type,
            Output_Type,
            Genome,
            #Target,
            Bio_Replicates,
            Analysis,
            md5sum,
            File_Name,
            File_URL
        )

    ### return the simplified table
    return(dat)
}

Simplify the metadata table

Code
### arrange and simplify the table
dat = dat_metadata_import
dat = fun_simplify_table(dat)

### subset by selected files
vec = dat_metadata_selected$Index_File
dat = dat %>% dplyr::filter(Index_File %in% vec)

### assign and show
dat_metadata_simplify = dat
print(dim(dat))
fun_display_table(dat)
[1] 10 12
Assay | Index_Experiment | Index_File | File_Format | File_Type | Output_Type | Genome | Bio_Replicates | Analysis | md5sum | File_Name | File_URL
DNase-seq | ENCSR000EKS | ENCFF972GVB | bigWig | bigWig | read-depth normalized signal | GRCh38 | 1 | ENCODE4 v3.0.0 GRCh38 | 1b0432087f9087c0a9e4f0f5a9d08deb | ENCFF972GVB.bigWig | https://www.encodeproject.org/files/ENCFF972GVB/@@download/ENCFF972GVB.bigWig
DNase-seq | ENCSR000EKS | ENCFF274YGF | bed narrowPeak | bed | peaks | GRCh38 | 1 | ENCODE4 v3.0.0 GRCh38 | 9262ab2b89cd60d05deb831cdd6b509e | ENCFF274YGF.bed.gz | https://www.encodeproject.org/files/ENCFF274YGF/@@download/ENCFF274YGF.bed.gz
DNase-seq | ENCSR000EOT | ENCFF414OGC | bigWig | bigWig | read-depth normalized signal | GRCh38 | 1 | ENCODE4 v3.0.0-alpha.2 GRCh38 | ac6a28c1889b241d0f4f9ba28a1e514d | ENCFF414OGC.bigWig | https://www.encodeproject.org/files/ENCFF414OGC/@@download/ENCFF414OGC.bigWig
DNase-seq | ENCSR000EOT | ENCFF185XRG | bed narrowPeak | bed | peaks | GRCh38 | 1 | ENCODE4 v3.0.0-alpha.2 GRCh38 | 04653af177917b3dda96b9454fd8f90e | ENCFF185XRG.bed.gz | https://www.encodeproject.org/files/ENCFF185XRG/@@download/ENCFF185XRG.bed.gz
ATAC-seq | ENCSR483RKN | ENCFF600FDO | bigWig | bigWig | signal p-value | GRCh38 | 1, 2 | ENCODE4 v1.9.1 GRCh38 | 044f8568fb4c7735cf01e4f8d0d3e5b9 | ENCFF600FDO.bigWig | https://www.encodeproject.org/files/ENCFF600FDO/@@download/ENCFF600FDO.bigWig
ATAC-seq | ENCSR483RKN | ENCFF558BLC | bed narrowPeak | bed | pseudoreplicated peaks | GRCh38 | 1, 2 | ENCODE4 v1.9.1 GRCh38 | 3cd5e262b185a5459cfeaa5a2f1a2f18 | ENCFF558BLC.bed.gz | https://www.encodeproject.org/files/ENCFF558BLC/@@download/ENCFF558BLC.bed.gz
ATAC-seq | ENCSR483RKN | ENCFF925CYR | bed narrowPeak | bed | IDR thresholded peaks | GRCh38 | 1, 2 | ENCODE4 v1.9.1 GRCh38 | a2b384cec478736092661d14eef3cef9 | ENCFF925CYR.bed.gz | https://www.encodeproject.org/files/ENCFF925CYR/@@download/ENCFF925CYR.bed.gz
ATAC-seq | ENCSR868FGK | ENCFF357GNC | bigWig | bigWig | signal p-value | GRCh38 | 1, 2, 3 | ENCODE4 v1.9.1 GRCh38 | 00b10e391e8f2361990a9b4f94ac055c | ENCFF357GNC.bigWig | https://www.encodeproject.org/files/ENCFF357GNC/@@download/ENCFF357GNC.bigWig
ATAC-seq | ENCSR868FGK | ENCFF948AFM | bed narrowPeak | bed | IDR thresholded peaks | GRCh38 | 1, 2, 3 | ENCODE4 v1.9.1 GRCh38 | 76bc2c332fd638d7cb45490d5404e352 | ENCFF948AFM.bed.gz | https://www.encodeproject.org/files/ENCFF948AFM/@@download/ENCFF948AFM.bed.gz
ATAC-seq | ENCSR868FGK | ENCFF333TAT | bed narrowPeak | bed | pseudoreplicated peaks | GRCh38 | 1, 2, 3 | ENCODE4 v1.9.1 GRCh38 | 0f7a6c13e23c2e3fc8716153a89ed481 | ENCFF333TAT.bed.gz | https://www.encodeproject.org/files/ENCFF333TAT/@@download/ENCFF333TAT.bed.gz

Prepare download files

Helper function

Code
fun_map_assay_label = function(txt){
    vec1 = c("ATAC-seq", "DNase-seq")
    vec2 = c("ATAC",     "DNase")
    res  = fun_str_map_detect(txt, vec1, vec2, .default=txt)
    return(res)
}
Code
fun_map_file_ext = function(txt){
    vec1 = c("bigWig", "bed narrowPeak")
    vec2 = c("bw",     "bed.gz")
    res  = fun_str_map_detect(txt, vec1, vec2, .default=txt)
    return(res)
}
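
Both helpers call `fun_str_map_detect`, which is not defined in this notebook (it presumably comes from the sourced project configuration). The sketch below is only a guess at its behaviour, inferred from the outputs shown later: each input string is replaced by the label of the first pattern it contains, and falls back to `.default` otherwise. The name `fun_str_map_detect_sketch` and its argument names are hypothetical; the project's real helper may differ.

```r
# Hypothetical stand-in for fun_str_map_detect(); NOT the project's implementation
fun_str_map_detect_sketch = function(txt, patterns, replacements, .default = txt) {
    # start from the default, recycled to the length of the input
    res = rep_len(.default, length(txt))
    # track which elements already matched so that the first pattern wins
    assigned = rep(FALSE, length(txt))
    for (idx in seq_along(patterns)) {
        # literal (non-regex) substring detection
        hit = grepl(patterns[idx], txt, fixed = TRUE) & !assigned
        res[hit] = replacements[idx]
        assigned = assigned | hit
    }
    return(res)
}

# Usage: unmatched inputs are passed through unchanged
vec_label = fun_str_map_detect_sketch(
    c("ATAC-seq", "DNase-seq", "ChIP-seq"),
    c("ATAC-seq", "DNase-seq"),
    c("ATAC",     "DNase")
)
print(vec_label)
```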

Rename filename

Code
### rename filename
dat = dat_metadata_simplify
dat = dat %>% dplyr::mutate(
    File_Name = paste(
        "K562",
        "hg38",
        Index_Experiment,
        Index_File,
        fun_map_assay_label(Assay),
        fun_map_file_ext(File_Format),
    sep = ".")
)

### assign and show
dat_metadata_arrange = dat
fun_display_table(dat)
Assay | Index_Experiment | Index_File | File_Format | File_Type | Output_Type | Genome | Bio_Replicates | Analysis | md5sum | File_Name | File_URL
DNase-seq | ENCSR000EKS | ENCFF972GVB | bigWig | bigWig | read-depth normalized signal | GRCh38 | 1 | ENCODE4 v3.0.0 GRCh38 | 1b0432087f9087c0a9e4f0f5a9d08deb | K562.hg38.ENCSR000EKS.ENCFF972GVB.DNase.bw | https://www.encodeproject.org/files/ENCFF972GVB/@@download/ENCFF972GVB.bigWig
DNase-seq | ENCSR000EKS | ENCFF274YGF | bed narrowPeak | bed | peaks | GRCh38 | 1 | ENCODE4 v3.0.0 GRCh38 | 9262ab2b89cd60d05deb831cdd6b509e | K562.hg38.ENCSR000EKS.ENCFF274YGF.DNase.bed.gz | https://www.encodeproject.org/files/ENCFF274YGF/@@download/ENCFF274YGF.bed.gz
DNase-seq | ENCSR000EOT | ENCFF414OGC | bigWig | bigWig | read-depth normalized signal | GRCh38 | 1 | ENCODE4 v3.0.0-alpha.2 GRCh38 | ac6a28c1889b241d0f4f9ba28a1e514d | K562.hg38.ENCSR000EOT.ENCFF414OGC.DNase.bw | https://www.encodeproject.org/files/ENCFF414OGC/@@download/ENCFF414OGC.bigWig
DNase-seq | ENCSR000EOT | ENCFF185XRG | bed narrowPeak | bed | peaks | GRCh38 | 1 | ENCODE4 v3.0.0-alpha.2 GRCh38 | 04653af177917b3dda96b9454fd8f90e | K562.hg38.ENCSR000EOT.ENCFF185XRG.DNase.bed.gz | https://www.encodeproject.org/files/ENCFF185XRG/@@download/ENCFF185XRG.bed.gz
ATAC-seq | ENCSR483RKN | ENCFF600FDO | bigWig | bigWig | signal p-value | GRCh38 | 1, 2 | ENCODE4 v1.9.1 GRCh38 | 044f8568fb4c7735cf01e4f8d0d3e5b9 | K562.hg38.ENCSR483RKN.ENCFF600FDO.ATAC.bw | https://www.encodeproject.org/files/ENCFF600FDO/@@download/ENCFF600FDO.bigWig
ATAC-seq | ENCSR483RKN | ENCFF558BLC | bed narrowPeak | bed | pseudoreplicated peaks | GRCh38 | 1, 2 | ENCODE4 v1.9.1 GRCh38 | 3cd5e262b185a5459cfeaa5a2f1a2f18 | K562.hg38.ENCSR483RKN.ENCFF558BLC.ATAC.bed.gz | https://www.encodeproject.org/files/ENCFF558BLC/@@download/ENCFF558BLC.bed.gz
ATAC-seq | ENCSR483RKN | ENCFF925CYR | bed narrowPeak | bed | IDR thresholded peaks | GRCh38 | 1, 2 | ENCODE4 v1.9.1 GRCh38 | a2b384cec478736092661d14eef3cef9 | K562.hg38.ENCSR483RKN.ENCFF925CYR.ATAC.bed.gz | https://www.encodeproject.org/files/ENCFF925CYR/@@download/ENCFF925CYR.bed.gz
ATAC-seq | ENCSR868FGK | ENCFF357GNC | bigWig | bigWig | signal p-value | GRCh38 | 1, 2, 3 | ENCODE4 v1.9.1 GRCh38 | 00b10e391e8f2361990a9b4f94ac055c | K562.hg38.ENCSR868FGK.ENCFF357GNC.ATAC.bw | https://www.encodeproject.org/files/ENCFF357GNC/@@download/ENCFF357GNC.bigWig
ATAC-seq | ENCSR868FGK | ENCFF948AFM | bed narrowPeak | bed | IDR thresholded peaks | GRCh38 | 1, 2, 3 | ENCODE4 v1.9.1 GRCh38 | 76bc2c332fd638d7cb45490d5404e352 | K562.hg38.ENCSR868FGK.ENCFF948AFM.ATAC.bed.gz | https://www.encodeproject.org/files/ENCFF948AFM/@@download/ENCFF948AFM.bed.gz
ATAC-seq | ENCSR868FGK | ENCFF333TAT | bed narrowPeak | bed | pseudoreplicated peaks | GRCh38 | 1, 2, 3 | ENCODE4 v1.9.1 GRCh38 | 0f7a6c13e23c2e3fc8716153a89ed481 | K562.hg38.ENCSR868FGK.ENCFF333TAT.ATAC.bed.gz | https://www.encodeproject.org/files/ENCFF333TAT/@@download/ENCFF333TAT.bed.gz

Check results

Code
dat = dat_metadata_arrange
dat = dat %>% dplyr::select(Index_Experiment, Index_File, File_Format, File_Name)
fun_display_table(dat)
Index_Experiment | Index_File | File_Format | File_Name
ENCSR000EKS | ENCFF972GVB | bigWig | K562.hg38.ENCSR000EKS.ENCFF972GVB.DNase.bw
ENCSR000EKS | ENCFF274YGF | bed narrowPeak | K562.hg38.ENCSR000EKS.ENCFF274YGF.DNase.bed.gz
ENCSR000EOT | ENCFF414OGC | bigWig | K562.hg38.ENCSR000EOT.ENCFF414OGC.DNase.bw
ENCSR000EOT | ENCFF185XRG | bed narrowPeak | K562.hg38.ENCSR000EOT.ENCFF185XRG.DNase.bed.gz
ENCSR483RKN | ENCFF600FDO | bigWig | K562.hg38.ENCSR483RKN.ENCFF600FDO.ATAC.bw
ENCSR483RKN | ENCFF558BLC | bed narrowPeak | K562.hg38.ENCSR483RKN.ENCFF558BLC.ATAC.bed.gz
ENCSR483RKN | ENCFF925CYR | bed narrowPeak | K562.hg38.ENCSR483RKN.ENCFF925CYR.ATAC.bed.gz
ENCSR868FGK | ENCFF357GNC | bigWig | K562.hg38.ENCSR868FGK.ENCFF357GNC.ATAC.bw
ENCSR868FGK | ENCFF948AFM | bed narrowPeak | K562.hg38.ENCSR868FGK.ENCFF948AFM.ATAC.bed.gz
ENCSR868FGK | ENCFF333TAT | bed narrowPeak | K562.hg38.ENCSR868FGK.ENCFF333TAT.ATAC.bed.gz
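
The new file names follow a fixed dot-separated scheme: biosample, genome, experiment accession, file accession, assay label, extension. As a rough sketch of how downstream code might recover those fields, the example below splits one of the names listed above. The field names in `lst_field` are illustrative choices, and the approach assumes none of the first five fields contains a dot.

```r
# One of the renamed files from the table above
txt_fname = "K562.hg38.ENCSR000EKS.ENCFF274YGF.DNase.bed.gz"

# Split on literal dots; the extension may itself contain a dot (bed.gz)
vec_part = strsplit(txt_fname, ".", fixed = TRUE)[[1]]

lst_field = list(
    Biosample        = vec_part[1],
    Genome           = vec_part[2],
    Index_Experiment = vec_part[3],
    Index_File       = vec_part[4],
    Assay            = vec_part[5],
    # everything after the fifth field is rejoined as the extension
    Extension        = paste(vec_part[-(1:5)], collapse = ".")
)
str(lst_field)
```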

Checksum table

Code
### get md5sum for each file
dat = dat_metadata_arrange
dat = dat %>% dplyr::select(md5sum, File_Name)

### assign and show
dat_download_checksum = dat
fun_display_table(dat)
md5sum File_Name
1b0432087f9087c0a9e4f0f5a9d08deb K562.hg38.ENCSR000EKS.ENCFF972GVB.DNase.bw
9262ab2b89cd60d05deb831cdd6b509e K562.hg38.ENCSR000EKS.ENCFF274YGF.DNase.bed.gz
ac6a28c1889b241d0f4f9ba28a1e514d K562.hg38.ENCSR000EOT.ENCFF414OGC.DNase.bw
04653af177917b3dda96b9454fd8f90e K562.hg38.ENCSR000EOT.ENCFF185XRG.DNase.bed.gz
044f8568fb4c7735cf01e4f8d0d3e5b9 K562.hg38.ENCSR483RKN.ENCFF600FDO.ATAC.bw
3cd5e262b185a5459cfeaa5a2f1a2f18 K562.hg38.ENCSR483RKN.ENCFF558BLC.ATAC.bed.gz
a2b384cec478736092661d14eef3cef9 K562.hg38.ENCSR483RKN.ENCFF925CYR.ATAC.bed.gz
00b10e391e8f2361990a9b4f94ac055c K562.hg38.ENCSR868FGK.ENCFF357GNC.ATAC.bw
76bc2c332fd638d7cb45490d5404e352 K562.hg38.ENCSR868FGK.ENCFF948AFM.ATAC.bed.gz
0f7a6c13e23c2e3fc8716153a89ed481 K562.hg38.ENCSR868FGK.ENCFF333TAT.ATAC.bed.gz

Generate download scripts

Each download command follows the template: wget -O FILE URL
Code
### setup download file wget command
dat = dat_metadata_arrange
dat = dat %>% dplyr::mutate(
        CMD = paste(
            "wget", "--append-output=run_download.log.txt", "-O", File_Name, File_URL
        )
    )

### prepend the log-reset command; the column name becomes the shebang line when the table is written
dat = dat %>% dplyr::select(CMD)
dat = rbind('echo -n "" > run_download.log.txt', dat)
colnames(dat) = "#!/bin/bash"

### assign and show
dat_download_script = dat
fun_display_table(dat)
#!/bin/bash
echo -n "" > run_download.log.txt
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR000EKS.ENCFF972GVB.DNase.bw https://www.encodeproject.org/files/ENCFF972GVB/@@download/ENCFF972GVB.bigWig
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR000EKS.ENCFF274YGF.DNase.bed.gz https://www.encodeproject.org/files/ENCFF274YGF/@@download/ENCFF274YGF.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR000EOT.ENCFF414OGC.DNase.bw https://www.encodeproject.org/files/ENCFF414OGC/@@download/ENCFF414OGC.bigWig
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR000EOT.ENCFF185XRG.DNase.bed.gz https://www.encodeproject.org/files/ENCFF185XRG/@@download/ENCFF185XRG.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR483RKN.ENCFF600FDO.ATAC.bw https://www.encodeproject.org/files/ENCFF600FDO/@@download/ENCFF600FDO.bigWig
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR483RKN.ENCFF558BLC.ATAC.bed.gz https://www.encodeproject.org/files/ENCFF558BLC/@@download/ENCFF558BLC.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR483RKN.ENCFF925CYR.ATAC.bed.gz https://www.encodeproject.org/files/ENCFF925CYR/@@download/ENCFF925CYR.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR868FGK.ENCFF357GNC.ATAC.bw https://www.encodeproject.org/files/ENCFF357GNC/@@download/ENCFF357GNC.bigWig
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR868FGK.ENCFF948AFM.ATAC.bed.gz https://www.encodeproject.org/files/ENCFF948AFM/@@download/ENCFF948AFM.bed.gz
wget --append-output=run_download.log.txt -O K562.hg38.ENCSR868FGK.ENCFF333TAT.ATAC.bed.gz https://www.encodeproject.org/files/ENCFF333TAT/@@download/ENCFF333TAT.bed.gz
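
The script above is assembled as a one-column table whose column name doubles as the shebang line. A possibly simpler alternative, sketched here under assumptions, is to build a character vector and write it with `writeLines()`. The toy file names and `example.org` URLs are placeholders, and `shQuote()` is an addition not present in the original commands; it only guards against shell metacharacters in names or URLs.

```r
# Toy table with the two columns the command needs (placeholder values)
dat_toy = data.frame(
    File_Name = c("sample_A.bw", "sample_B.bed.gz"),
    File_URL  = c("https://example.org/A.bigWig", "https://example.org/B.bed.gz")
)

# One wget command per row, all appending to the same log file
txt_log = "run_download.log.txt"
vec_cmd = paste(
    "wget", paste0("--append-output=", txt_log),
    "-O", shQuote(dat_toy$File_Name), shQuote(dat_toy$File_URL)
)

# Shebang first, then the log reset, then the downloads
vec_script = c(
    "#!/bin/bash",
    paste0('echo -n "" > ', txt_log),
    vec_cmd
)

# Write to a temporary location so the sketch leaves the project untouched
txt_fpath = file.path(tempdir(), "run_download_files_sketch.sh")
writeLines(vec_script, txt_fpath)
cat(readLines(txt_fpath), sep = "\n")
```

Writing plain lines avoids any dependence on how a table writer treats quotes or column headers.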

Save results

Code
### set output path
txt_folder = TXT_FOLDER_OUT
txt_fdiry  = file.path(FD_DAT, "external", txt_folder)

### create the directory (and any missing parent directories) if it does not exist
dir.create(txt_fdiry, showWarnings = FALSE, recursive = TRUE)

### write download file
txt_fname  = "run_download_files.sh"
txt_fpath  = file.path(txt_fdiry, txt_fname)

dat = dat_download_script
write_tsv(dat, txt_fpath)

### write checksum file
txt_fname  = "checksum_md5sum.txt"
txt_fpath  = file.path(txt_fdiry, txt_fname)

dat = dat_download_checksum
write_tsv(dat, txt_fpath, col_names = FALSE)

### write metadata table
txt_fname  = "metadata.tsv"
txt_fpath  = file.path(txt_fdiry, txt_fname)

dat = dat_metadata_arrange
write_tsv(dat, txt_fpath)
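
Once the download script has been run, the checksum file can be used to confirm that each file arrived intact. One way to do this from R is sketched below with `tools::md5sum()`. The toy files, the temporary directory, and the `Match` column are stand-ins so the example runs without any downloads; a real check would read `checksum_md5sum.txt` and point at the download folder instead.

```r
# Create two small placeholder files in a temporary directory
txt_fdiry = tempdir()
vec_fname = c("toy_A.txt", "toy_B.txt")
writeLines("placeholder content A", file.path(txt_fdiry, vec_fname[1]))
writeLines("placeholder content B", file.path(txt_fdiry, vec_fname[2]))

# Expected checksums in the same two-column layout as the checksum file;
# the second value is deliberately wrong to show a detected mismatch
dat_expect = data.frame(
    md5sum = c(
        unname(tools::md5sum(file.path(txt_fdiry, vec_fname[1]))),
        "00000000000000000000000000000000"
    ),
    File_Name = vec_fname
)

# Recompute checksums; tools::md5sum() returns NA for files that do not exist
vec_observed = unname(tools::md5sum(file.path(txt_fdiry, dat_expect$File_Name)))
dat_expect$Match = !is.na(vec_observed) & (vec_observed == dat_expect$md5sum)
print(dat_expect[, c("File_Name", "Match")])
```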