Prepare ENCODE Chromatin State 02

Download the data

Set environment

Code
source ../run_config_project.sh
show_env
You are working on             Duke Server: RCC
BASE DIRECTORY (FD_BASE):      /data/reddylab/Kuei
REPO DIRECTORY (FD_REPO):      /data/reddylab/Kuei/repo
WORK DIRECTORY (FD_WORK):      /data/reddylab/Kuei/work
DATA DIRECTORY (FD_DATA):      /data/reddylab/Kuei/data
CONTAINER DIR. (FD_SING):      /data/reddylab/Kuei/container

You are working with           ENCODE FCC
PATH OF PROJECT (FD_PRJ):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC
PROJECT RESULTS (FD_RES):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results
PROJECT SCRIPTS (FD_EXE):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts
PROJECT DATA    (FD_DAT):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data
PROJECT NOTE    (FD_NBK):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks
PROJECT DOCS    (FD_DOC):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs
PROJECT LOG     (FD_LOG):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log
PROJECT REF     (FD_REF):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references
PROJECT IMAGE   (FP_PRJ_SIF):  /data/reddylab/Kuei/container/project/singularity_proj_encode_fcc.sif
PROJECT CONF.   (FP_CNF):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts/config_project.sh

Set global variables

Code
TXT_FOLDER="encode_chromatin_states"

Execute

Run download script

Code
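FD_OUT=${FD_DAT}/external/${TXT_FOLDER}

# Move to the output folder, where the download script is stored, and make it executable
cd "${FD_OUT}"
chmod +x ./run_download_files.sh

# Download the ENCODE files listed in the script
./run_download_files.sh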
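The download script itself is not shown in this notebook. Judging from the execution log reviewed below, each file is fetched from `https://www.encodeproject.org/files/<ACCESSION>/@@download/<ACCESSION>.bed.gz` and stored under a name of the form `<cell>.<assembly>.<experiment>.<file>.<label>.bed.gz`. The following is a minimal sketch under that assumption; the helper names `encode_download_url` and `encode_local_name` are hypothetical and are not taken from `run_download_files.sh`.

```shell
# Hypothetical helpers; the actual run_download_files.sh may be written differently.

# Build the ENCODE download URL for a file accession (pattern inferred from the wget log)
encode_download_url() {
    acc="$1"
    printf 'https://www.encodeproject.org/files/%s/@@download/%s.bed.gz\n' "${acc}" "${acc}"
}

# Build the local file name: <cell>.<assembly>.<experiment>.<file>.<label>.bed.gz
encode_local_name() {
    printf '%s.%s.%s.%s.%s.bed.gz\n' "$1" "$2" "$3" "$4" "$5"
}

# Example (not executed here; requires network access):
# wget -a run_download.log.txt \
#     -O "$(encode_local_name K562 hg38 ENCSR913HQX ENCFF286VQG cCREs)" \
#     "$(encode_download_url ENCFF286VQG)"
```

Keeping the experiment and file accessions in the local name makes each file traceable back to its ENCODE record after renaming.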

Run checksum

Code
FN_EXE=run_checksum_files.sh
FP_EXE=${FD_EXE}/${FN_EXE}

FD_OUT=${FD_DAT}/external/${TXT_FOLDER}
FP_INP=${FD_OUT}/checksum_md5sum.txt
FP_OUT=${FD_OUT}/checksum_results.txt

# Arguments: project config, working folder, MD5 checksum list, output report
${FP_EXE} ${FP_CNF} ${FD_OUT} ${FP_INP} ${FP_OUT}
Hostname:           plp-rcc-node-25
Slurm Array Index: 
Time Stamp:         05-15-25+16:54:52

Change directory:
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/external/encode_chromatin_states

Checksum files...


Done!
Run Time: 0 seconds
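The contents of `run_checksum_files.sh` are not shown, but the report reviewed later (`<file>: OK` per line) matches the output format of GNU `md5sum -c`. A minimal sketch of the core check, assuming `checksum_md5sum.txt` is in standard `md5sum` format (`<hash>  <file>`) and that absolute paths are passed for the list and report; the function name `check_md5` is hypothetical, and the real script additionally prints the hostname, time stamp and run time.

```shell
# Verify the files in an md5sum-format list and write a per-file report.
# Usage: check_md5 <directory> <checksum list> <report>
# Returns md5sum's exit status (non-zero if any file fails).
check_md5() {
    dir="$1"; list="$2"; report="$3"
    # Run in a subshell so the caller's working directory is unchanged
    (
        cd "${dir}" || exit 1
        # md5sum -c prints "<file>: OK" or "<file>: FAILED" for each entry
        md5sum -c "${list}" > "${report}" 2>&1
    )
}
```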

Copy additional cCRE silencer annotations

Code
FD_INP=${FD_REF}/encode_chromatin_states
ls ${FD_INP}
 ccres_v4.silencer.rest.tsv
 ccres_v4.silencer.starr.tsv
 ENCODE_K562_hg38_chromatin_states.tsv
'Human epigenomes with ChromHMM state (DAC, Kaili Fan).xlsx'
 K562.ENCSR365YNI.ENCAN395TNA.metadata.tsv
 K562.ENCSR913HQX.ENCAN130HDM.metadata.tsv
Code
FD_INP=${FD_REF}/encode_chromatin_states

cp ${FD_INP}/ccres_v4* ${FD_OUT}/

Review

Check output files

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}

cd ${FD_OUT}
ls -sh {*bed.gz,*.tsv} | wc -l
ls -sh {*bed.gz,*.tsv}
4
480K ccres_v4.silencer.rest.tsv
364K ccres_v4.silencer.starr.tsv
4.0M K562.hg38.ENCSR365YNI.ENCFF106BGJ.ChromHMM.bed.gz
 32M K562.hg38.ENCSR913HQX.ENCFF286VQG.cCREs.bed.gz
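The `ls | wc -l` count confirms that four files match the patterns, but not which four. A small sketch that checks each expected file by name and flags missing or empty files; the function name `check_expected_files` is hypothetical, and the file names in the usage comment are the four listed above.

```shell
# Report any expected file that is missing or empty; returns non-zero if any is.
# Usage: check_expected_files <directory> <file> [<file> ...]
check_expected_files() {
    dir="$1"; shift
    missing=0
    for fn in "$@"; do
        if [ ! -s "${dir}/${fn}" ]; then
            echo "Missing or empty: ${fn}"
            missing=1
        fi
    done
    return "${missing}"
}

# Example:
# check_expected_files "${FD_OUT}" \
#     ccres_v4.silencer.rest.tsv \
#     ccres_v4.silencer.starr.tsv \
#     K562.hg38.ENCSR365YNI.ENCFF106BGJ.ChromHMM.bed.gz \
#     K562.hg38.ENCSR913HQX.ENCFF286VQG.cCREs.bed.gz
```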
Code
FN_OUT=K562.hg38.ENCSR913HQX.ENCFF286VQG.cCREs.bed.gz
FP_OUT=${FD_OUT}/${FN_OUT}
zcat ${FP_OUT} | head -n 3
chr1    10033   10250   EH38E2776516    0   .   10033   10250   225,225,225 Low-DNase   All-data/Full-classification
chr1    10385   10713   EH38E2776517    0   .   10385   10713   225,225,225 Low-DNase   All-data/Full-classification
chr1    16097   16381   EH38E3951272    0   .   16097   16381   225,225,225 Low-DNase   All-data/Full-classification
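The cCRE file is a BED9 table with two extra columns; the tenth column carries a classification label (for example `Low-DNase` in the rows above). A quick way to see how the records are distributed across labels, assuming the file is tab-delimited as BED files normally are; the function name `count_ccre_classes` is hypothetical.

```shell
# Count cCRE records per classification label (column 10), most frequent first.
# Usage: count_ccre_classes <file.bed.gz>
count_ccre_classes() {
    zcat "$1" | cut -f10 | sort | uniq -c | sort -k1,1nr
}
```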
Code
FN_OUT=K562.hg38.ENCSR365YNI.ENCFF106BGJ.ChromHMM.bed.gz
FP_OUT=${FD_OUT}/${FN_OUT}
zcat ${FP_OUT} | head -n 3
chr1    0   16000   Quies   1   .   0   16000   220,220,220
chr1    16000   16200   TxWk    1   .   16000   16200   63,154,80
chr1    16200   17400   Quies   1   .   16200   17400   220,220,220
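The ChromHMM file assigns one state (column 4) to each interval. Because BED intervals are half-open, each interval covers `end - start` bp, so per-state coverage is a simple sum; for the three rows above, Quies covers (16000 - 0) + (17400 - 16200) = 17200 bp and TxWk covers 200 bp. A sketch assuming tab-delimited input; the function name `chromhmm_state_bp` is hypothetical.

```shell
# Sum genomic coverage (bp) per ChromHMM state (column 4), largest first.
# Usage: chromhmm_state_bp <file.bed.gz>
chromhmm_state_bp() {
    zcat "$1" \
        | awk 'BEGIN { FS = OFS = "\t" } { bp[$4] += $3 - $2 } END { for (s in bp) print s, bp[s] }' \
        | sort -k2,2nr
}
```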
Code
FN_OUT=ccres_v4.silencer.rest.tsv
FP_OUT=${FD_OUT}/${FN_OUT}
head -n 3 ${FP_OUT}
# REST + cCREs                  
Chr Start   End cCRE accession  cCRE class  Silencer class
chr10   100680786   100681128   EH38E4018829    CA-H3K4me3  REST+ silencer
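Unlike the BED files, the silencer tables begin with a `#` title line and a column-header line, so they cannot be passed directly to BED-based tools. A sketch that drops both lines and keeps the first four columns as BED4 (chrom, start, end, cCRE accession), assuming the columns are tab-delimited; the function name `silencer_tsv_to_bed` is hypothetical.

```shell
# Convert a cCRE silencer table to BED4 (chrom, start, end, cCRE accession).
# Usage: silencer_tsv_to_bed <table.tsv> > <out.bed>
silencer_tsv_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }          # skip the title comment line
         $1 == "Chr" { next }   # skip the column header line
         { print $1, $2, $3, $4 }' "$1"
}
```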
Code
FN_OUT=ccres_v4.silencer.starr.tsv
FP_OUT=${FD_OUT}/${FN_OUT}
head -n 3 ${FP_OUT}
# STARR silencer cCREs                          
Chr Start   End cCRE accession  cCRE class  CAPRA quantification    P-value Threshold
chr13   22970420    22970595    EH38E4082602    CA  -2.7481 1.15E-09    Stringent
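The STARR table carries a `Threshold` label in its eighth column. Only `Stringent` appears in the row shown, so other threshold levels are not confirmed here. A sketch that keeps rows matching a requested level and emits BED4, assuming tab-delimited columns; the function name `filter_starr_by_threshold` is hypothetical.

```shell
# Keep STARR silencer cCREs whose Threshold (column 8) equals the requested level,
# output as BED4 (chrom, start, end, cCRE accession).
# Usage: filter_starr_by_threshold <table.tsv> <level>    e.g. level = Stringent
filter_starr_by_threshold() {
    awk -v level="$2" 'BEGIN { FS = OFS = "\t" }
        /^#/ || $1 == "Chr" { next }
        $8 == level { print $1, $2, $3, $4 }' "$1"
}
```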

Check checksum results

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}
FP_OUT=${FD_OUT}/checksum_results.txt

cat ${FP_OUT}
grep "FAILED" ${FP_OUT} && echo "FAILED" || echo "All PASSED"
K562.hg38.ENCSR913HQX.ENCFF286VQG.cCREs.bed.gz: OK
K562.hg38.ENCSR365YNI.ENCFF106BGJ.ChromHMM.bed.gz: OK
All PASSED
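Searching for `FAILED` reports "All PASSED" whenever no line contains that word, which also happens if the report is empty. A slightly stricter sketch requires a non-empty report in which every line ends with `: OK`; the function name `all_checksums_ok` is hypothetical.

```shell
# Pass only if the report is non-empty and every line ends with ": OK".
# Usage: all_checksums_ok <checksum_results.txt>
all_checksums_ok() {
    report="$1"
    [ -s "${report}" ] || { echo "Empty or missing report"; return 1; }
    if grep -v -q ': OK$' "${report}"; then
        # Print the offending lines before reporting failure
        grep -v ': OK$' "${report}"
        echo "FAILED"
        return 1
    fi
    echo "All PASSED"
}
```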

Check execution log

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}

head -n 10 ${FD_OUT}/run_download.log.txt
--2025-05-15 16:54:45--  https://www.encodeproject.org/files/ENCFF286VQG/@@download/ENCFF286VQG.bed.gz
Resolving www.encodeproject.org (www.encodeproject.org)... 34.211.244.144
Connecting to www.encodeproject.org (www.encodeproject.org)|34.211.244.144|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://encode-public.s3.amazonaws.com/2022/11/23/179cba53-f3d4-47af-af56-c4e5a3cabeac/ENCFF286VQG.bed.gz?response-content-disposition=attachment%3B%20filename%3DENCFF286VQG.bed.gz&AWSAccessKeyId=ASIATGZNGCNXRX2474FR&Signature=twUgoJS1RV7mWE9XodrFfmLkrtc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEHwaCXVzLXdlc3QtMiJHMEUCIQDOXOVSzF%2FWVrtWc21NXZ5RyaF614VEtjwLsjLI%2B6KjYAIgcRI17y%2FaSz0E7U%2B904Y%2FJcWVQIzy644ckWo1IrvtlGsqswUINRAAGgwyMjA3NDg3MTQ4NjMiDDj%2FcRBQJB4wGfNpMiqQBVmNbfT61m2dOZvL%2FKEDS673CKGrNvm28%2Fnrg%2B%2FmERskT2Yi4AQs3lBgIh1OK2INDnWG1WU3cj11UzTpT2e%2BzzFtlUuD%2BnIw0hq57gtGBa3sQVKa%2BP%2FfMpDVIxjnZaKtlgGh9Oe61Nn%2BLiNWTqYrbdPv3DFxH5Gly%2BTQoy%2BIKfoya2gst5ITmj0e2xFsYcWOSHUQZAnxtYmRjd6sceNywc3tnKmUBWExaE1LfkX5XHXtbqRMD9ZlrVghLRDhbNqWtKo0AhLzhCYsruplapuG3485DIG6iz83yYSSVCL7tlYe3olocd1Rm01jxRNMyKNiWY8G3q4tET1bwPphNqOWv1a8nbhQ99nbiDtgJmuHLiM8inIiO3QIzg9xVeNshhjCdCiTMJJrPGEcdvN9qkYlW%2Bkh2U%2FcA9AVoxBvwyU1PV4ElJuSxxrUBX16NXLHCAaTtp4AIiEyhUCy4gifjy%2BncBJLjzhz8z9E1O2EJTW0cg59eR%2FwI2ySq08Wm0tuhEKrcBoDZjnmhGJoaP3WY9atO3meWzggGJF0RBsYL9Eu%2FUtmvIQJcGxz7UwJLiXP%2BSrwsy8hRXgHvaByE2jX3Fdf3Dvo9FTRhx39vJSpiOGM9W3gs5M7%2FKGp0wFwsHMIDCGrMPPESbdB%2FTUqwCrD7ZXwWo7HYkfAi2dUXuN3edgAu%2BpKqQQBBi387kLfD2hiMfM997wte9Rxzpwj6oSf%2BusdWhEFk5wB0cxTxWdYMpJEidAJyedQFtk8oEqMkfO5kWhfCsy%2FhHtWCnCwjpvnPcP2jTanHOtolRlzRxBnIkkorBkLsCJskHD%2B8sBdayPhHs%2FajJAvzzF1gZAX5sam2g9sfa5YsHsb3Aa6VZNnfcEE6Np6MJqNmcEGOrEB0O3nrW4Ero9TYHHlwiP7kWKDbHQFFlJbUZmmUxyH8FvLwZvdFLo0pAdASWyM2DLy3%2F0Rc2cmnzdp3MEBVMQ1krrlWuoswSr9AZcFaLj50KmngBbO4nyahMRcjGVfj8jSIf9CUFEaqnJes6eqL%2Fezrxb4vPFHq7hVxVX1tqTBnBrD3PA%2BVyBHRe5o%2BUD42u%2F%2FElYDT6%2FEbcaCVbU%2FO6mhywZtwVQcgwOeOZD29RqwD7c0&Expires=1747472088 [following]
--2025-05-15 16:54:48--  https://encode-public.s3.amazonaws.com/2022/11/23/179cba53-f3d4-47af-af56-c4e5a3cabeac/ENCFF286VQG.bed.gz?response-content-disposition=attachment%3B%20filename%3DENCFF286VQG.bed.gz&AWSAccessKeyId=ASIATGZNGCNXRX2474FR&Signature=twUgoJS1RV7mWE9XodrFfmLkrtc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEHwaCXVzLXdlc3QtMiJHMEUCIQDOXOVSzF%2FWVrtWc21NXZ5RyaF614VEtjwLsjLI%2B6KjYAIgcRI17y%2FaSz0E7U%2B904Y%2FJcWVQIzy644ckWo1IrvtlGsqswUINRAAGgwyMjA3NDg3MTQ4NjMiDDj%2FcRBQJB4wGfNpMiqQBVmNbfT61m2dOZvL%2FKEDS673CKGrNvm28%2Fnrg%2B%2FmERskT2Yi4AQs3lBgIh1OK2INDnWG1WU3cj11UzTpT2e%2BzzFtlUuD%2BnIw0hq57gtGBa3sQVKa%2BP%2FfMpDVIxjnZaKtlgGh9Oe61Nn%2BLiNWTqYrbdPv3DFxH5Gly%2BTQoy%2BIKfoya2gst5ITmj0e2xFsYcWOSHUQZAnxtYmRjd6sceNywc3tnKmUBWExaE1LfkX5XHXtbqRMD9ZlrVghLRDhbNqWtKo0AhLzhCYsruplapuG3485DIG6iz83yYSSVCL7tlYe3olocd1Rm01jxRNMyKNiWY8G3q4tET1bwPphNqOWv1a8nbhQ99nbiDtgJmuHLiM8inIiO3QIzg9xVeNshhjCdCiTMJJrPGEcdvN9qkYlW%2Bkh2U%2FcA9AVoxBvwyU1PV4ElJuSxxrUBX16NXLHCAaTtp4AIiEyhUCy4gifjy%2BncBJLjzhz8z9E1O2EJTW0cg59eR%2FwI2ySq08Wm0tuhEKrcBoDZjnmhGJoaP3WY9atO3meWzggGJF0RBsYL9Eu%2FUtmvIQJcGxz7UwJLiXP%2BSrwsy8hRXgHvaByE2jX3Fdf3Dvo9FTRhx39vJSpiOGM9W3gs5M7%2FKGp0wFwsHMIDCGrMPPESbdB%2FTUqwCrD7ZXwWo7HYkfAi2dUXuN3edgAu%2BpKqQQBBi387kLfD2hiMfM997wte9Rxzpwj6oSf%2BusdWhEFk5wB0cxTxWdYMpJEidAJyedQFtk8oEqMkfO5kWhfCsy%2FhHtWCnCwjpvnPcP2jTanHOtolRlzRxBnIkkorBkLsCJskHD%2B8sBdayPhHs%2FajJAvzzF1gZAX5sam2g9sfa5YsHsb3Aa6VZNnfcEE6Np6MJqNmcEGOrEB0O3nrW4Ero9TYHHlwiP7kWKDbHQFFlJbUZmmUxyH8FvLwZvdFLo0pAdASWyM2DLy3%2F0Rc2cmnzdp3MEBVMQ1krrlWuoswSr9AZcFaLj50KmngBbO4nyahMRcjGVfj8jSIf9CUFEaqnJes6eqL%2Fezrxb4vPFHq7hVxVX1tqTBnBrD3PA%2BVyBHRe5o%2BUD42u%2F%2FElYDT6%2FEbcaCVbU%2FO6mhywZtwVQcgwOeOZD29RqwD7c0&Expires=1747472088
Resolving encode-public.s3.amazonaws.com (encode-public.s3.amazonaws.com)... 52.92.228.217, 52.218.233.187, 52.92.165.145, ...
Connecting to encode-public.s3.amazonaws.com (encode-public.s3.amazonaws.com)|52.92.228.217|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32698783 (31M) [binary/octet-stream]
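In the log excerpt, the ENCODE portal answers with a `307 Temporary Redirect` to an S3 URL, which then returns `200 OK`. If the other download follows the same pattern (an assumption, since only the first ten lines are shown), the full log should contain one redirect and one `200 OK` per file. A sketch that tallies the HTTP status lines wget wrote; the function name `summarize_http_status` is hypothetical.

```shell
# Tally the HTTP status lines in a wget log
# (lines of the form "HTTP request sent, awaiting response... 200 OK").
# Usage: summarize_http_status <run_download.log.txt>
summarize_http_status() {
    grep -o 'awaiting response\.\.\. .*' "$1" \
        | sed 's/^awaiting response\.\.\. //' \
        | sort | uniq -c
}
```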