Prepare STARR-seq data 01

Get K562 STARR-seq data files from the Reddy lab

In this analysis, we use the fragment count data generated and processed by the Reddy lab. The ASTARR data in K562 were designed and generated by Keith, and the WSTARR data in K562 were generated by Kari. For more information, see the data dictionary page.

Set environment

Code
source ../run_config_project.sh
show_env
You are working on             Duke Server: HARDAC
BASE DIRECTORY (FD_BASE):      /data/reddylab/Kuei
REPO DIRECTORY (FD_REPO):      /data/reddylab/Kuei/repo
WORK DIRECTORY (FD_WORK):      /data/reddylab/Kuei/work
DATA DIRECTORY (FD_DATA):      /data/reddylab/Kuei/data
CONTAINER DIR. (FD_SING):      /data/reddylab/Kuei/container

You are working with           ENCODE FCC
PATH OF PROJECT (FD_PRJ):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC
PROJECT RESULTS (FD_RES):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results
PROJECT SCRIPTS (FD_EXE):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts
PROJECT DATA    (FD_DAT):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data
PROJECT NOTE    (FD_NBK):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks
PROJECT DOCS    (FD_DOC):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs
PROJECT LOG     (FD_LOG):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log
PROJECT APP     (FD_APP):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/app
PROJECT REF     (FD_REF):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references
PROJECT IMAGE   (FP_PRJ_SIF):  /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/app/singularity_proj_encode_fcc.sif
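
The listing printed by show_env only shows what the variables are set to; it does not confirm that the folders exist. A minimal sketch of such a check is given below. The helper name check_dirs is hypothetical and is not part of run_config_project.sh.

```shell
# Hypothetical helper (not defined in run_config_project.sh):
# confirm that each given path is an existing directory.
check_dirs() {
    local dir status=0
    for dir in "$@"; do
        if [ -d "${dir}" ]; then
            echo "ok: ${dir}"
        else
            # An unset or empty variable arrives here as an empty string
            echo "missing: ${dir}"
            status=1
        fi
    done
    return "${status}"
}

# Example call after sourcing the project config:
# check_dirs "${FD_BASE}" "${FD_PRJ}" "${FD_DAT}"
```

The function returns a non-zero status if any directory is missing, so it can guard the later cells.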

Check existence of STARR data

WSTARR

Code
ls ${FD_WGS_WSTARR}
fragments  metadata  motifs  peaks  processed_raw_reads  qc  raw_reads
Code
echo ${FD_WGS_WSTARR_FRAGS}
echo 
for FPATH in ${FP_WGS_WSTARR_FRAGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments

A001-input-K562-rep1.masked.dedup.fragments.counts.txt.gz
A001-input-K562-rep2.masked.dedup.fragments.counts.txt.gz
A001-input-K562-rep3.masked.dedup.fragments.counts.txt.gz
A001-input-K562-rep4.masked.dedup.fragments.counts.txt.gz
A001-K562-rep1.masked.dedup.fragments.counts.txt.gz
A001-K562-rep2.masked.dedup.fragments.counts.txt.gz
A001-K562-rep3.masked.dedup.fragments.counts.txt.gz
Code
echo ${FD_WGS_WSTARR_INP_BAM}
echo 
for FPATH in ${FP_WGS_WSTARR_INP_BWIGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
/data/reddylab/kstrouse/superstarr/input_libs/A001/nextseq/processing/starr_seq/A001_nextseq-pe

rep1.f3q10.sorted.dedup.rpkm.bw
rep2.f3q10.sorted.dedup.rpkm.bw
rep3.f3q10.sorted.dedup.rpkm.bw
rep4.f3q10.sorted.dedup.rpkm.bw
Code
echo ${FD_WGS_WSTARR_OUT_BAM_rep01}
echo ${FD_WGS_WSTARR_OUT_BAM_rep23}
echo 
for FPATH in ${FP_WGS_WSTARR_OUT_BWIGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
/data/reddylab/kstrouse/superstarr/output_libs/A001_K562/A001_K562_20201124/combined_reads/processing/starr_seq/A001_K562_20201124_combined-pe
/data/reddylab/kstrouse/superstarr/output_libs/A001_K562/A001_K562_20210213/processing/starr_seq/Strouse_6825_210223A5-pe

A001-K562-rep1.f3q10.sorted.dedup.rpkm.bw
A001-K562-rep2.f3q10.sorted.dedup.rpkm.bw
A001-K562-rep3.f3q10.sorted.dedup.rpkm.bw

ASTARR (KS91)

Code
echo ${FD_WGS_ASTARR_KS91_INP}
echo ${FD_WGS_ASTARR_KS91_OUT}
/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal
/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
Code
for FPATH in ${FP_WGS_ASTARR_KS91_FRAGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.txt.gz
Code
for FPATH in ${FP_WGS_ASTARR_KS91_FRAGS[@]}; do
    ls -l ${FPATH}
done
-rw-r--r-- 1 aeb84 reddylab 3146501530 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2/KS91_K562_hg38_ASTARRseq_Input_rep1.masked.dedup.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 3995769876 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2/KS91_K562_hg38_ASTARRseq_Input_rep2.masked.dedup.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 4291888489 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2/KS91_K562_hg38_ASTARRseq_Input_rep3.masked.dedup.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 4037119129 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2/KS91_K562_hg38_ASTARRseq_Input_rep4.masked.dedup.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 3938483893 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2/KS91_K562_hg38_ASTARRseq_Input_rep5.masked.dedup.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 3550102461 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2/KS91_K562_hg38_ASTARRseq_Input_rep6.masked.dedup.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 320000484 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 588618835 Apr  4 13:26 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.corrected.txt.gz
-rw-r--r-- 1 aeb84 reddylab 595652591 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 672333423 Apr  4 13:29 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.corrected.txt.gz
-rw-r--r-- 1 aeb84 reddylab 672333423 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.txt.gz
-rw-r--r-- 1 aeb84 reddylab 1096240707 Apr  4 14:08 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.corrected.txt.gz
-rw-r--r-- 1 aeb84 reddylab 1107286720 Jul  7  2022 /data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis/KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.txt.gz
Code
for FPATH in ${FP_WGS_ASTARR_KS91_BWIGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.corrected.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.corrected.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep5.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep6.f3q10.sorted.with_umis.dedup.cpm.bw
Code
for FPATH in ${FP_WGS_ASTARR_KS91[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.corrected.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.corrected.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep5.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep6.f3q10.sorted.with_umis.dedup.cpm.bw

ASTARR (KS274)

Code
echo ${FD_WGS_ASTARR_KS274_OUT}
/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
Code
for FPATH in ${FP_WGS_ASTARR_KS274_OUT_FRAGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
K562_ASTARR_repeat_rep1.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep2.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep3.f3q10.fragments.bedpe
Code
for FPATH in ${FP_WGS_ASTARR_KS274_OUT_BWIGS[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
K562_ASTARR_repeat_rep1.f3q10.sorted.with_umis.dedup.rpkm.bw
K562_ASTARR_repeat_rep2.f3q10.sorted.with_umis.dedup.rpkm.bw
K562_ASTARR_repeat_rep3.f3q10.sorted.with_umis.dedup.rpkm.bw
Code
for FPATH in ${FP_WGS_ASTARR_KS274[@]}; do
    ls ${FPATH} | xargs -n 1 basename
done
K562_ASTARR_repeat_rep1.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep2.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep3.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep1.f3q10.sorted.with_umis.dedup.rpkm.bw
K562_ASTARR_repeat_rep2.f3q10.sorted.with_umis.dedup.rpkm.bw
K562_ASTARR_repeat_rep3.f3q10.sorted.with_umis.dedup.rpkm.bw
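
The listings above are checked by eye. As a hedged alternative, the sketch below counts how many of the expected files are present and non-empty. The helper name count_present is an assumption made for illustration; the array names in the example comment are the ones used in the cells above.

```shell
# Hypothetical helper: count the files that exist and have non-zero size.
count_present() {
    local fpath found=0 total=0
    for fpath in "$@"; do
        total=$((total + 1))
        if [ -s "${fpath}" ]; then
            found=$((found + 1))
        else
            echo "missing or empty: ${fpath}" >&2
        fi
    done
    echo "${found}/${total} files present"
    # Succeed only when every file was found
    [ "${found}" -eq "${total}" ]
}

# Example calls with the arrays from the cells above:
# count_present "${FP_WGS_WSTARR_FRAGS[@]}"
# count_present "${FP_WGS_ASTARR_KS91[@]}"
# count_present "${FP_WGS_ASTARR_KS274[@]}"
```

Missing paths go to standard error, so the one-line summary stays easy to read in the notebook output.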

Set up data directories and copy the final processed STARR-seq data

Create data directories

PROJECT/data/processed
├── STARR_ATAC_K562_Reddy_KS91_210401
│   ├── fragments
│   └── peaks
│
├── STARR_ATAC_K562_Reddy_KS274_240311
│   └── fragments
│
├── STARR_WHG_K562_Reddy_A001_Alex
│   └── fragments
│
└── STARR_WHG_K562_Reddy_A001_Kari
    └── superstarr
        ├── input_libs
        └── output_libs
Code
mkdir -p ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/fragments
mkdir -p ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/peaks
mkdir -p ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS274_240311/fragments
mkdir -p ${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Alex/fragments

FDIRY=${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Kari/superstarr
mkdir -p ${FDIRY}/input_libs
mkdir -p ${FDIRY}/output_libs
Code
ls -1 ${FD_DAT}/processed/STARR*
ls -1 ${FD_DAT}/processed/STARR*/superstarr
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS274_240311:
fragments

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS91_210401:
fragments
peaks

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Alex:
fragments

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Kari:
superstarr
input_libs
output_libs
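
The repeated mkdir -p calls above could also be driven from a list of subfolders. The following is only a sketch; make_tree is a hypothetical helper name, and the folder names in the example comment are the ones created above.

```shell
# Hypothetical helper: create every given subfolder under a root folder.
make_tree() {
    local root="$1" sub
    shift
    for sub in "$@"; do
        # -p creates parent folders as needed and is a no-op if the folder exists
        mkdir -p "${root}/${sub}"
    done
}

# Example with the folder names used above:
# make_tree "${FD_DAT}/processed" \
#     STARR_ATAC_K562_Reddy_KS91_210401/fragments \
#     STARR_ATAC_K562_Reddy_KS91_210401/peaks \
#     STARR_ATAC_K562_Reddy_KS274_240311/fragments \
#     STARR_WHG_K562_Reddy_A001_Alex/fragments \
#     STARR_WHG_K562_Reddy_A001_Kari/superstarr/input_libs \
#     STARR_WHG_K562_Reddy_A001_Kari/superstarr/output_libs
```

Keeping the layout in one list makes it easier to compare against the directory tree shown earlier.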

Copy data files

Copy WSTARR fragment counts

Code
FD_OUT=${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Alex/fragments

for FPATH in ${FP_WGS_WSTARR_FRAGS[@]}; do
    FDIRY=$(dirname  ${FPATH})
    FNAME=$(basename ${FPATH})
    cp ${FPATH} ${FD_OUT}/${FNAME}
    
    echo ${FDIRY}
    echo ${FNAME}
    echo
done
/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-input-K562-rep1.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-input-K562-rep2.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-input-K562-rep3.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-input-K562-rep4.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-K562-rep1.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-K562-rep2.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/data/starr_seq/fragments
A001-K562-rep3.masked.dedup.fragments.counts.txt.gz
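
The same copy loop is repeated for each data set in the cells that follow. A possible way to factor it out is sketched below, with quoted paths and a check that each source exists. The function name copy_files is hypothetical, and the -p flag (preserve timestamps and mode) is an added choice that the original loops do not use.

```shell
# Hypothetical helper: copy each file into a destination folder and
# report sources that do not exist instead of failing silently.
copy_files() {
    local dest="$1" fpath fname status=0
    shift
    mkdir -p "${dest}"
    for fpath in "$@"; do
        fname=$(basename "${fpath}")
        if [ -f "${fpath}" ]; then
            cp -p "${fpath}" "${dest}/${fname}"   # -p keeps timestamps and mode
            echo "copied: ${fname}"
        else
            echo "not found: ${fpath}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Example with the variables from the cell above:
# copy_files "${FD_OUT}" "${FP_WGS_WSTARR_FRAGS[@]}"
```

Quoting the expansions keeps the loop safe if a path ever contains spaces.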

Copy WSTARR bigwigs

Code
FD_OUT=${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Kari/superstarr/input_libs

for FPATH in ${FP_WGS_WSTARR_INP_BWIGS[@]}; do
    FDIRY=$(dirname  ${FPATH})
    FNAME=$(basename ${FPATH})
    cp ${FPATH} ${FD_OUT}/${FNAME}
    
    echo ${FDIRY}
    echo ${FNAME}
    echo
done
/data/reddylab/kstrouse/superstarr/input_libs/A001/nextseq/processing/starr_seq/A001_nextseq-pe
rep1.f3q10.sorted.dedup.rpkm.bw

/data/reddylab/kstrouse/superstarr/input_libs/A001/nextseq/processing/starr_seq/A001_nextseq-pe
rep2.f3q10.sorted.dedup.rpkm.bw

/data/reddylab/kstrouse/superstarr/input_libs/A001/nextseq/processing/starr_seq/A001_nextseq-pe
rep3.f3q10.sorted.dedup.rpkm.bw

/data/reddylab/kstrouse/superstarr/input_libs/A001/nextseq/processing/starr_seq/A001_nextseq-pe
rep4.f3q10.sorted.dedup.rpkm.bw
Code
FD_OUT=${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Kari/superstarr/output_libs

for FPATH in ${FP_WGS_WSTARR_OUT_BWIGS[@]}; do
    FDIRY=$(dirname  ${FPATH})
    FNAME=$(basename ${FPATH})
    cp ${FPATH} ${FD_OUT}/${FNAME}
    
    echo ${FDIRY}
    echo ${FNAME}
    echo
done
/data/reddylab/kstrouse/superstarr/output_libs/A001_K562/A001_K562_20201124/combined_reads/processing/starr_seq/A001_K562_20201124_combined-pe
A001-K562-rep1.f3q10.sorted.dedup.rpkm.bw

/data/reddylab/kstrouse/superstarr/output_libs/A001_K562/A001_K562_20210213/processing/starr_seq/Strouse_6825_210223A5-pe
A001-K562-rep2.f3q10.sorted.dedup.rpkm.bw

/data/reddylab/kstrouse/superstarr/output_libs/A001_K562/A001_K562_20210213/processing/starr_seq/Strouse_6825_210223A5-pe
A001-K562-rep3.f3q10.sorted.dedup.rpkm.bw

Copy ASTARR fragment counts and bigwigs

Code
FD_OUT=${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/fragments

for FPATH in ${FP_WGS_ASTARR_KS91[@]}; do
    FDIRY=$(dirname  ${FPATH})
    FNAME=$(basename ${FPATH})
    cp ${FPATH} ${FD_OUT}/${FNAME}
    
    echo ${FDIRY}
    echo ${FNAME}
    echo
done
/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.dedup.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.corrected.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.corrected.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.corrected.txt.gz

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.txt.gz

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.exclude_dups.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.exclude_dups.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.exclude_dups.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.exclude_dups.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.exclude_dups.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal/merged2
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.exclude_dups.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.sorted.with_umis.dedup.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.corrected.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.sorted.with_umis.dedup.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.corrected.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep5.f3q10.sorted.with_umis.dedup.cpm.bw

/data/reddylab/Alex/encode4_duke/processing/starr_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-umis
KS91_K562_hg38_ASTARRseq_Output_rep6.f3q10.sorted.with_umis.dedup.cpm.bw
Code
FD_OUT=${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS274_240311/fragments

for FPATH in ${FP_WGS_ASTARR_KS274[@]}; do
    FDIRY=$(dirname  ${FPATH})
    FNAME=$(basename ${FPATH})
    cp ${FPATH} ${FD_OUT}/${FNAME}
    
    echo ${FDIRY}
    echo ${FNAME}
    echo
done
/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
K562_ASTARR_repeat_rep1.f3q10.fragments.bedpe

/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
K562_ASTARR_repeat_rep2.f3q10.fragments.bedpe

/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
K562_ASTARR_repeat_rep3.f3q10.fragments.bedpe

/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
K562_ASTARR_repeat_rep1.f3q10.sorted.with_umis.dedup.rpkm.bw

/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
K562_ASTARR_repeat_rep2.f3q10.sorted.with_umis.dedup.rpkm.bw

/data/reddylab/Keith/encode4_duke/processing/starr_seq/240311_KS274_ASTARR_Output_Nextseq-pe-umis
K562_ASTARR_repeat_rep3.f3q10.sorted.with_umis.dedup.rpkm.bw

Copy ASTARR peak calls

Code
FD_OUT=${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/peaks

for FPATH in ${FP_WGS_ASTARR_KS91_INP_PEAKS[@]}; do
    FDIRY=$(dirname  ${FPATH})
    FNAME=$(basename ${FPATH})
    cp ${FPATH} ${FD_OUT}/${FNAME}
    
    echo ${FDIRY}
    echo ${FNAME}
    echo
done
/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal
KS91_K562_hg38_ASTARRseq_Input.all_reps.masked.union_narrowPeak.q5.bed

/data/reddylab/Alex/encode4_duke/processing/atac_seq/210401_KS91_K562ASTARR_NovaSeq.hg38-pe-blacklist-removal
KS91_K562_hg38_ASTARRseq_Input.q5.in_all.max_overlaps.bed

Check results

Check that the data were copied correctly.

Code
ls -d ${FD_DAT}/processed/STARR*
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS274_240311
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS91_210401
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Alex
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Kari
Code
ls -1 ${FD_DAT}/processed/STARR*
ls -1 ${FD_DAT}/processed/STARR*/superstarr
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS274_240311:
fragments

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS91_210401:
fragments
peaks

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Alex:
fragments

/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Kari:
superstarr
input_libs
output_libs
Code
ls -1 ${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Alex/fragments
A001-input-K562-rep1.masked.dedup.fragments.counts.txt.gz
A001-input-K562-rep2.masked.dedup.fragments.counts.txt.gz
A001-input-K562-rep3.masked.dedup.fragments.counts.txt.gz
A001-input-K562-rep4.masked.dedup.fragments.counts.txt.gz
A001-K562-rep1.masked.dedup.fragments.counts.txt.gz
A001-K562-rep2.masked.dedup.fragments.counts.txt.gz
A001-K562-rep3.masked.dedup.fragments.counts.txt.gz
Code
ls -1 ${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Kari/superstarr/input_libs
rep1.f3q10.sorted.dedup.rpkm.bw
rep2.f3q10.sorted.dedup.rpkm.bw
rep3.f3q10.sorted.dedup.rpkm.bw
rep4.f3q10.sorted.dedup.rpkm.bw
Code
ls -1 ${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Kari/superstarr/output_libs
A001-K562-rep1.f3q10.sorted.dedup.rpkm.bw
A001-K562-rep2.f3q10.sorted.dedup.rpkm.bw
A001-K562-rep3.f3q10.sorted.dedup.rpkm.bw
Code
ls -1 ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/fragments
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep1.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep2.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep3.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep4.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep5.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.dedup.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Input_rep6.masked.exclude_dups.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep1.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.corrected.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep2.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep3.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.corrected.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.fragments.counts.txt.gz
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.corrected.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep4.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep5.f3q10.sorted.with_umis.dedup.cpm.bw
KS91_K562_hg38_ASTARRseq_Output_rep6.f3q10.sorted.with_umis.dedup.cpm.bw
Code
ls -1 ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/peaks
KS91_K562_hg38_ASTARRseq_Input.all_reps.masked.union_narrowPeak.q5.bed
KS91_K562_hg38_ASTARRseq_Input.q5.in_all.max_overlaps.bed
Code
ls -1 ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS274_240311/fragments
K562_ASTARR_repeat_rep1.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep1.f3q10.sorted.with_umis.dedup.rpkm.bw
K562_ASTARR_repeat_rep2.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep2.f3q10.sorted.with_umis.dedup.rpkm.bw
K562_ASTARR_repeat_rep3.f3q10.fragments.bedpe
K562_ASTARR_repeat_rep3.f3q10.sorted.with_umis.dedup.rpkm.bw
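
Listing file names confirms that the copies exist but not that their contents match the sources. One possible content check, sketched under the assumption that the source files are still readable, compares each pair byte by byte with cmp. The helper name verify_copies is hypothetical. On multi-gigabyte files this comparison can take a while.

```shell
# Hypothetical helper: compare each source file with its copy in a
# destination folder, byte by byte.
verify_copies() {
    local dest="$1" fpath fname status=0
    shift
    for fpath in "$@"; do
        fname=$(basename "${fpath}")
        # cmp -s prints nothing; it exits non-zero if the files differ
        # or if either file cannot be read
        if cmp -s "${fpath}" "${dest}/${fname}"; then
            echo "identical: ${fname}"
        else
            echo "differs: ${fname}"
            status=1
        fi
    done
    return "${status}"
}

# Example with the arrays and folders used above:
# verify_copies "${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Alex/fragments" \
#     "${FP_WGS_WSTARR_FRAGS[@]}"
```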

Check folder size

Code
du -sh ${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Alex
9.8G    /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Alex
Code
du -sh ${FD_DAT}/processed/STARR_WHG_K562_Reddy_A001_Kari
8.4G    /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_WHG_K562_Reddy_A001_Kari
Code
du -sh ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/fragments
44G     /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS91_210401/fragments
Code
du -sh ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS91_210401/peaks
4.1M    /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS91_210401/peaks
Code
du -sh ${FD_DAT}/processed/STARR_ATAC_K562_Reddy_KS274_240311
5.5G    /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/processed/STARR_ATAC_K562_Reddy_KS274_240311
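
The per-folder du calls above could also be collected into one loop. A minimal sketch follows; report_sizes is a hypothetical helper name, and the STARR prefix in the example comment matches the folder names used in this notebook.

```shell
# Hypothetical helper: print the disk usage of every entry in a parent
# folder whose name starts with the given prefix.
report_sizes() {
    local parent="$1" prefix="$2" entry
    for entry in "${parent}/${prefix}"*; do
        [ -e "${entry}" ] || continue   # the glob matched nothing
        du -sh "${entry}"
    done
}

# Example for the folders created in this notebook:
# report_sizes "${FD_DAT}/processed" STARR
```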