Prepare ENCODE RNA-seq 02

Download the data

Set environment

Code
source ../run_config_project.sh
show_env
You are working on             Duke Server: RCC
BASE DIRECTORY (FD_BASE):      /data/reddylab/Kuei
REPO DIRECTORY (FD_REPO):      /data/reddylab/Kuei/repo
WORK DIRECTORY (FD_WORK):      /data/reddylab/Kuei/work
DATA DIRECTORY (FD_DATA):      /data/reddylab/Kuei/data
CONTAINER DIR. (FD_SING):      /data/reddylab/Kuei/container

You are working with           ENCODE FCC
PATH OF PROJECT (FD_PRJ):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC
PROJECT RESULTS (FD_RES):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/results
PROJECT SCRIPTS (FD_EXE):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts
PROJECT DATA    (FD_DAT):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data
PROJECT NOTE    (FD_NBK):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/notebooks
PROJECT DOCS    (FD_DOC):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/docs
PROJECT LOG     (FD_LOG):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/log
PROJECT REF     (FD_REF):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/references
PROJECT IMAGE   (FP_PRJ_SIF):  /data/reddylab/Kuei/container/project/singularity_proj_encode_fcc.sif
PROJECT CONF.   (FP_CNF):      /data/reddylab/Kuei/repo/Proj_ENCODE_FCC/scripts/config_project.sh

Set global variables

Code
TXT_FOLDER="encode_rnaseq"

Execute

Run download script

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}

cd ${FD_OUT}
chmod +x ./run_download_files.sh

./run_download_files.sh
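
The contents of run_download_files.sh are not shown here. As a rough sketch of what one download step might look like, the helper below builds a URL in the pattern that appears in the first line of run_download.log.txt. The function name `encode_download_url`, the idea that the script renames files on download, and the wget call in the comment are assumptions for illustration, not the actual script.

```shell
# Hypothetical helper: build the ENCODE download URL for a file accession.
# The URL pattern is copied from the first request in run_download.log.txt.
encode_download_url() {
    local accession="$1"   # e.g. ENCFF421TJX
    local extension="$2"   # e.g. tsv
    printf 'https://www.encodeproject.org/files/%s/@@download/%s.%s\n' \
        "${accession}" "${accession}" "${extension}"
}

# Assumed usage (not run here): save under a descriptive name and
# append wget's messages to the log file reviewed later in this note.
# wget -O "K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq_total.tsv" \
#      -a run_download.log.txt "$(encode_download_url ENCFF421TJX tsv)"
```

wget follows the 307 redirect to the S3 bucket by default, which matches the redirect visible in the log.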

Run checksum

Code
FN_EXE=run_checksum_files.sh
FP_EXE=${FD_EXE}/${FN_EXE}

#FN_LOG=checksum.encode_rnaseq.txt
#FP_LOG=${FD_LOG}/${FN_LOG}

FD_OUT=${FD_DAT}/external/${TXT_FOLDER}
FP_INP=${FD_OUT}/checksum_md5sum.txt
FP_OUT=${FD_OUT}/checksum_results.txt

${FP_EXE} ${FP_CNF} ${FD_OUT} ${FP_INP} ${FP_OUT}
Hostname:           plp-rcc-node-25
Slurm Array Index: 
Time Stamp:         04-24-25+17:55:29

Change directory:
/data/reddylab/Kuei/repo/Proj_ENCODE_FCC/data/external/encode_rnaseq

Checksum files...


Done!
Run Time: 1 seconds
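
The wrapper run_checksum_files.sh is not shown. Judging from its output (a directory change followed by a checksum step) and from the `file: OK` lines in checksum_results.txt, its core is probably close to `md5sum -c`. The sketch below is written under that assumption; `verify_md5_manifest` is a hypothetical name.

```shell
# Hypothetical core of the checksum step: verify the files in a folder
# against an md5sum-format manifest and write one "<file>: OK" or
# "<file>: FAILED" line per entry to the results file.
verify_md5_manifest() {
    local dir="$1" manifest="$2" results="$3"
    # md5sum -c resolves file names relative to the current directory,
    # so change into the data folder inside a subshell first.
    ( cd "${dir}" && md5sum -c "${manifest}" ) > "${results}"
}

# Assumed usage with the variables defined above (not run here):
# verify_md5_manifest "${FD_OUT}" "${FP_INP}" "${FP_OUT}"
```

`md5sum -c` exits with a non-zero status if any file fails, so the function's return value can be used to stop a pipeline early.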

Review

Check output files

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}

cd ${FD_OUT}
ls -sh {*.tsv,*.bw} | wc -l
ls -sh {*.tsv,*.bw}
3
 11M K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq_total.tsv
108M K562.hg38.ENCSR615EEK.ENCFF585HTZ.RNAseq_total.strand_pos.bw
128M K562.hg38.ENCSR615EEK.ENCFF876JOV.RNAseq_total.strand_neg.bw
Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}
FN_OUT=K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq_total.tsv
FP_OUT=${FD_OUT}/${FN_OUT}

head -n 3 "${FP_OUT}"
gene_id transcript_id(s)    length  effective_length    expected_count  TPM FPKM    posterior_mean_count    posterior_standard_deviation_of_count   pme_TPM pme_FPKM    TPM_ci_lower_bound  TPM_ci_upper_bound  TPM_coefficient_of_quartile_variation   FPKM_ci_lower_bound FPKM_ci_upper_bound FPKM_coefficient_of_quartile_variation
10904   10904   93.00   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0   0   0   0   0   0
12954   12954   94.00   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0   0   0   0   0   0
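
The header above is wide and hard to read on one line. A small sketch for inspecting it, assuming the file is tab-delimited with a single header row: the first helper lists each column name with its 1-based index, and the second counts data rows whose value in a chosen column is above zero. Both function names are hypothetical.

```shell
# Print each column name of a tab-separated file with its 1-based index.
list_tsv_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Count data rows (header skipped) whose value in column $2 is above zero.
count_rows_above_zero() {
    local file="$1" col="$2"
    awk -F'\t' -v c="${col}" \
        'NR > 1 && ($c + 0) > 0 { n++ } END { print n + 0 }' "${file}"
}

# Assumed usage (not run here); TPM is the 6th column in the header above:
# list_tsv_columns "${FP_OUT}"
# count_rows_above_zero "${FP_OUT}" 6
```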

Check checksum results

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}
FP_OUT=${FD_OUT}/checksum_results.txt

cat ${FP_OUT}
grep "FAILED" "${FP_OUT}" && echo "FAILED" || echo "All PASSED"
K562.hg38.ENCSR615EEK.ENCFF421TJX.RNAseq_total.tsv: OK
K562.hg38.ENCSR615EEK.ENCFF585HTZ.RNAseq_total.strand_pos.bw: OK
K562.hg38.ENCSR615EEK.ENCFF876JOV.RNAseq_total.strand_neg.bw: OK
All PASSED
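
One caveat with the grep check: an empty or truncated results file contains no "FAILED" line either, so it would also report "All PASSED". A slightly stricter sketch, assuming the results are in `md5sum -c` format, counts the lines explicitly and succeeds only if at least one file was checked and every line ends in ": OK". The name `summarize_checksums` is hypothetical.

```shell
# Summarize an md5sum -c results file.
# Returns 0 only if there is at least one line and every line ends in ": OK".
summarize_checksums() {
    local results="$1" n_ok n_bad
    # grep -c prints the count even when it is 0 (and then exits 1),
    # hence the "|| true" to keep the status from propagating.
    n_ok="$(grep -c ': OK$' "${results}" || true)"
    n_bad="$(grep -vc ': OK$' "${results}" || true)"
    echo "OK: ${n_ok}, not OK: ${n_bad}"
    [ "${n_ok}" -gt 0 ] && [ "${n_bad}" -eq 0 ]
}

# Assumed usage (not run here):
# summarize_checksums "${FP_OUT}" && echo "All PASSED" || echo "FAILED"
```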

Check execution log

Code
FD_OUT=${FD_DAT}/external/${TXT_FOLDER}

head -n 10 ${FD_OUT}/run_download.log.txt
--2025-04-24 17:55:18--  https://www.encodeproject.org/files/ENCFF421TJX/@@download/ENCFF421TJX.tsv
Resolving www.encodeproject.org (www.encodeproject.org)... 34.211.244.144
Connecting to www.encodeproject.org (www.encodeproject.org)|34.211.244.144|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://encode-public.s3.amazonaws.com/2020/10/30/2033273a-286f-4c94-a652-9d75098cdfb5/ENCFF421TJX.tsv?response-content-disposition=attachment%3B%20filename%3DENCFF421TJX.tsv&AWSAccessKeyId=ASIATGZNGCNX7SWO4ZQN&Signature=3Nq55kAiuNFA6kOwV5cpiBiu3As%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEIb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJIMEYCIQD%2F5ut0yD%2BA%2FNlCiDNj5MSOsul3igqARG3VQugvvsbIrwIhAKm567vigDQmDqdhizRca4dvZjITXDXdb08VJTSbFKVZKrMFCB8QABoMMjIwNzQ4NzE0ODYzIgz0cZB26wbGUGvh514qkAWTn7DQ6tOMUFhQE040K18mPYvtIlX1s8RYrKLRLXM9nM%2ByDvitV8cYV7NXfoRiI9EdfksnHVMjEu%2FDwvoeFGm7y9t6oamEekLWb%2Bc6zqxj7T0ReSXqOT0VJyblrhHLkiXw0xRgdnja0VU0j%2B%2FAb8TrU%2Fq3OZCdNicA5AdNIoNDwE%2FjMGFRrf22zC5DnloCMHKuzde%2FjMKdd5jxYEBj5sGm6vevIaoQ5d%2FoCcpWIlmtXqyrVA9ACQ3LxyfgRrFpvoxZwBWQlDc8ChcJJtU1USqLoy40MyfRfiXESF5jtgfodVRKENUK7FEP3R4w%2FOEnBsxF7LDU4piK4NzUCiifxXqKxKD5%2FUWCzE8JSXNMcGxCaH9bLqNy72poqQEqdlpUSZnIfUPN9QdEFblFKyM%2B7cATisFoDK4oxq840IxEc7gyfiriMpqdRdUY4HekwiHkzF3EsEC0oW6zIAXJxsPeIrj24WnphrbERq9Bz24hPlu1WDQu9%2F5ZOmaxYm5IqBfwwdW6m6AyBSRaSg5Jh1rO1NT%2FN%2B4y%2Br6zW46DiNQN7LVrPmrTtmWWm0kUqADn0N6CeUMXP5TF8p1xHC44IRY8Ye0jvIWOGnOqNMEu045o9vFF96qSpnDu%2FVuLDfGJINLoWCIjltnj2kLpYItYtzl%2BNhYWRUJEtGarKiNwM6lF47vAkI5fdLbQ7PGqEdTDxfwtFWpb4sVebHlfV6f8BZv2q%2BPcQiC5CQ%2BK0XNxB%2FxksNNOdF7jFj4H97PtNBzz9MSdu6IaciCYh4VlFBOJSI0Tfvo2gxmGj73R2g7Uq3a2gjZU2XcOGLXb%2ByXiroHzkOFisq3sJ4d7td1mhCxNXzeYlnF2f9mF5R6eu84olzFmdlmmXDDk3arABjqwAb9pQw7JoOLzcIzYC5ezMuqAUWXtRHbVcy0EZQcpZKVpGdtYYrzIGIbVo87Wtxl6%2Bq975OGBIQOTlRf2Pl76%2FQ02KDIW6znNnK%2F69TWj%2FIU0QY8acceFg5ohndY4ErmzQMOlZ6VtTHMVS%2FSAatHNiOZq%2B1aE86AW4B8VPQGvcTKpYSSlj0dKJMVt10bwRTkcRXjzMDqWYIjGs3%2B9vnOV%2BQ5VvcJDF%2FYuoR%2FpFLuBmxN%2B&Expires=1745661318 [following]
--2025-04-24 17:55:18--  https://encode-public.s3.amazonaws.com/2020/10/30/2033273a-286f-4c94-a652-9d75098cdfb5/ENCFF421TJX.tsv?response-content-disposition=attachment%3B%20filename%3DENCFF421TJX.tsv&AWSAccessKeyId=ASIATGZNGCNX7SWO4ZQN&Signature=3Nq55kAiuNFA6kOwV5cpiBiu3As%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEIb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJIMEYCIQD%2F5ut0yD%2BA%2FNlCiDNj5MSOsul3igqARG3VQugvvsbIrwIhAKm567vigDQmDqdhizRca4dvZjITXDXdb08VJTSbFKVZKrMFCB8QABoMMjIwNzQ4NzE0ODYzIgz0cZB26wbGUGvh514qkAWTn7DQ6tOMUFhQE040K18mPYvtIlX1s8RYrKLRLXM9nM%2ByDvitV8cYV7NXfoRiI9EdfksnHVMjEu%2FDwvoeFGm7y9t6oamEekLWb%2Bc6zqxj7T0ReSXqOT0VJyblrhHLkiXw0xRgdnja0VU0j%2B%2FAb8TrU%2Fq3OZCdNicA5AdNIoNDwE%2FjMGFRrf22zC5DnloCMHKuzde%2FjMKdd5jxYEBj5sGm6vevIaoQ5d%2FoCcpWIlmtXqyrVA9ACQ3LxyfgRrFpvoxZwBWQlDc8ChcJJtU1USqLoy40MyfRfiXESF5jtgfodVRKENUK7FEP3R4w%2FOEnBsxF7LDU4piK4NzUCiifxXqKxKD5%2FUWCzE8JSXNMcGxCaH9bLqNy72poqQEqdlpUSZnIfUPN9QdEFblFKyM%2B7cATisFoDK4oxq840IxEc7gyfiriMpqdRdUY4HekwiHkzF3EsEC0oW6zIAXJxsPeIrj24WnphrbERq9Bz24hPlu1WDQu9%2F5ZOmaxYm5IqBfwwdW6m6AyBSRaSg5Jh1rO1NT%2FN%2B4y%2Br6zW46DiNQN7LVrPmrTtmWWm0kUqADn0N6CeUMXP5TF8p1xHC44IRY8Ye0jvIWOGnOqNMEu045o9vFF96qSpnDu%2FVuLDfGJINLoWCIjltnj2kLpYItYtzl%2BNhYWRUJEtGarKiNwM6lF47vAkI5fdLbQ7PGqEdTDxfwtFWpb4sVebHlfV6f8BZv2q%2BPcQiC5CQ%2BK0XNxB%2FxksNNOdF7jFj4H97PtNBzz9MSdu6IaciCYh4VlFBOJSI0Tfvo2gxmGj73R2g7Uq3a2gjZU2XcOGLXb%2ByXiroHzkOFisq3sJ4d7td1mhCxNXzeYlnF2f9mF5R6eu84olzFmdlmmXDDk3arABjqwAb9pQw7JoOLzcIzYC5ezMuqAUWXtRHbVcy0EZQcpZKVpGdtYYrzIGIbVo87Wtxl6%2Bq975OGBIQOTlRf2Pl76%2FQ02KDIW6znNnK%2F69TWj%2FIU0QY8acceFg5ohndY4ErmzQMOlZ6VtTHMVS%2FSAatHNiOZq%2B1aE86AW4B8VPQGvcTKpYSSlj0dKJMVt10bwRTkcRXjzMDqWYIjGs3%2B9vnOV%2BQ5VvcJDF%2FYuoR%2FpFLuBmxN%2B&Expires=1745661318
Resolving encode-public.s3.amazonaws.com (encode-public.s3.amazonaws.com)... 52.92.139.57, 52.92.187.233, 52.92.249.1, ...
Connecting to encode-public.s3.amazonaws.com (encode-public.s3.amazonaws.com)|52.92.139.57|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11061127 (11M) [binary/octet-stream]
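
The signed S3 redirect URLs make the raw log hard to scan. As a sketch, assuming the log keeps wget's standard message format shown above, the helpers below print only the HTTP status of each request and show the head of the log truncated to a fixed width. Both function names are hypothetical.

```shell
# Print the HTTP status reported after each request in a wget log.
log_http_statuses() {
    sed -n 's/^HTTP request sent, awaiting response\.\.\. //p' "$1"
}

# Show the first N lines of a file, each cut to a maximum width.
head_truncated() {
    local file="$1" n="${2:-10}" width="${3:-100}"
    head -n "${n}" "${file}" | cut -c "1-${width}"
}

# Assumed usage (not run here):
# log_http_statuses "${FD_OUT}/run_download.log.txt" | sort | uniq -c
# head_truncated "${FD_OUT}/run_download.log.txt" 10 100
```

For a successful download, each file is expected to show one redirect status followed by `200 OK`, as in the excerpt above.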