build video from images with a bash for loop using ffmpeg - bash

I am trying to take ~50K images and turn them into a movie using ffmpeg. I am running this on an HPC setup, hence the Slurm directives. My attempt does not work because the sheer volume of images runs into a hard limit (the expanded argument list is too long). I also cannot just give ffmpeg a start number, since the pipeline rejects some images, so the filenames do not form a proper sequence.
I know a loop could circumvent both issues, but I am not sure how to use one with ffmpeg so that it builds one long movie.
The cat approach has worked for shorter movies, but I just have too many images now.
#!/bin/bash
#SBATCH -p general
#SBATCH -N 1
#SBATCH -t 03-00:00:00
#SBATCH --mem=8g
#SBATCH -n 1
#SBATCH --mail-type=BEGIN,REQUEUE,END,FAIL
#SBATCH --mail-user=<snip>
img_dir='foo/bar/1/2/123456'
folder='fooo'
singularity exec /$img_dir/foo_container cat /$img_dir/processed_images/$folder/*.jpeg | ffmpeg -f image2pipe -i pipe:.jpeg -vf "crop=trunc(iw/2)*2:trunc(ih/2)*2" /$img_dir/processed_images/$folder/$folder.mp4
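A sketch of one way around the argument-length limit, assuming the images sort into the correct order by filename: stream them into ffmpeg from a shell loop instead of expanding one huge glob on a single command line. The paths and container name are the placeholders from the question, and the loop runs plain cat on the host for simplicity; if cat has to run inside the container, the same loop can be wrapped in singularity exec ... bash -c '...'.
#!/bin/bash
img_dir='foo/bar/1/2/123456'
folder='fooo'
# The glob is expanded inside the shell loop, not passed to an external
# command, so the "argument list too long" limit never applies; each image
# is written to the pipe one at a time, in sorted filename order.
for f in /$img_dir/processed_images/$folder/*.jpeg; do
    cat "$f"
done | ffmpeg -f image2pipe -i pipe:.jpeg \
    -vf "crop=trunc(iw/2)*2:trunc(ih/2)*2" \
    /$img_dir/processed_images/$folder/$folder.mp4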

Related

parallel with multiple scripts

I have multiple scripts that are chained together and use each other's output. I have several input files in the sample directory that I would like to process in parallel.
Any idea how this is best done?
sample_folder=${working_dir}/samples
input_bam=${sample_folder}/${sample}.bam
samtools fastq -@ 40 $input_bam > ${init_fastq}
trim_galore -o ${sample_folder} $init_fastq
script.py ${preproc_fastq} > ${out_20}
What I started with:
parallel -j 8 script.py -i {} -o ?? -n8 ::: ./sample/*.bam
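One possible layout, assuming the three steps always run per BAM in this order: wrap the chain in a function and let GNU parallel call it once per input file. The directory layout and the trimmed-FASTQ filename produced by trim_galore are assumptions here, not tested values.
#!/bin/bash
working_dir=/path/to/project            # placeholder
sample_folder=${working_dir}/samples

process_sample() {
    local bam=$1
    local sample
    sample=$(basename "$bam" .bam)
    local init_fastq=${sample_folder}/${sample}.fastq

    # Step 1: BAM -> FASTQ (4 threads per samtools job)
    samtools fastq -@ 4 "$bam" > "$init_fastq"
    # Step 2: trimming; assumed to write ${sample}_trimmed.fq into the sample folder
    trim_galore -o "$sample_folder" "$init_fastq"
    # Step 3: downstream script on the trimmed FASTQ (output name assumed)
    script.py "${sample_folder}/${sample}_trimmed.fq" > "${sample_folder}/${sample}.out"
}
export -f process_sample
export sample_folder

# Run up to 8 samples at a time, one complete chain per BAM file.
parallel -j 8 process_sample {} ::: "${sample_folder}"/*.bam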

Anyone know what's causing this linux error?

I'm trying to run DeepVariant via its Singularity container on the HPC; however, I get this error and I can't figure it out!
Code:
#!/bin/bash --login
#SBATCH -J AmyHouseman_deepvariant
#SBATCH -o %x.stdout.%J.%N
#SBATCH -e %x.stderr.%J.%N
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p c_compute_wgp
#SBATCH --account=scw1581
#SBATCH --mail-type=ALL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=HousemanA@cardiff.ac.uk # Where to send mail
#SBATCH --array=1-33
#SBATCH --time=02:00:00
#SBATCH --time=072:00:00
#SBATCH --mem-per-cpu=32GB
module purge
module load singularity
module load parallel
set -eu
cd /scratch/c.c21087028/
BIN_VERSION="1.3.0"
singularity pull docker://google/deepvariant:"${BIN_VERSION}"
sed -n "${SLURM_ARRAY_TASK_ID}p" Polyposis_Exome_Analysis/fastp/All_fastp_input/List_of_33_exome_IDs | parallel -j 1 "singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
docker://google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WES \
--ref=Polyposis_Exome_Analysis/bwa/index/HumanRefSeq/GRCh38_latest_genomic.fna \
--reads=Polyposis_Exome_Analysis/samtools/index/indexed_picardbamfiles/{}PE_markedduplicates.bam \
--output_vcf=Polyposis_Exome_Analysis/deepvariant/vcf/{}PE_output.vcf.gz \
--output_gvcf=Polyposis_Exome_Analysis/deepvariant/gvcf/{}PE_output.vcf.gz \
--intermediate_results_dir=Polyposis_Exome_Analysis/deepvariant/intermediateresults/{}PE_output_intermediate"
Error:
FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://google/deepvariant:1.3.0: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 52.0.218.102:443: connect: network is unreachable
I've asked a lot of people, and I'm still stuck! Thanks, Amy
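The "network is unreachable" message comes from the singularity pull line: on many clusters the compute nodes have no outbound internet access, so pulling from Docker Hub inside the batch job fails. A common workaround, sketched below, is to pull the image once on a login node (which does have network access) and point the job at the resulting local .sif file; the filename deepvariant_1.3.0.sif is just what singularity pull produces by default for this tag, and all other arguments are copied from the question.
# 1) Once, on a login node with internet access:
BIN_VERSION="1.3.0"
cd /scratch/c.c21087028/
singularity pull docker://google/deepvariant:"${BIN_VERSION}"   # writes deepvariant_1.3.0.sif

# 2) In the Slurm job script, drop the pull and run the local image
#    instead of the docker:// URI:
sed -n "${SLURM_ARRAY_TASK_ID}p" Polyposis_Exome_Analysis/fastp/All_fastp_input/List_of_33_exome_IDs | \
    parallel -j 1 "singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
        deepvariant_${BIN_VERSION}.sif \
        /opt/deepvariant/bin/run_deepvariant \
        --model_type=WES \
        --ref=Polyposis_Exome_Analysis/bwa/index/HumanRefSeq/GRCh38_latest_genomic.fna \
        --reads=Polyposis_Exome_Analysis/samtools/index/indexed_picardbamfiles/{}PE_markedduplicates.bam \
        --output_vcf=Polyposis_Exome_Analysis/deepvariant/vcf/{}PE_output.vcf.gz \
        --output_gvcf=Polyposis_Exome_Analysis/deepvariant/gvcf/{}PE_output.vcf.gz \
        --intermediate_results_dir=Polyposis_Exome_Analysis/deepvariant/intermediateresults/{}PE_output_intermediate"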

Why is it not possible to run wget with the background option in a Slurm script?

I use the script below to download files. Without -b, wget downloads the files one by one. With -b, wget goes into the background, so the downloads can run simultaneously. Unfortunately, the script does not work under Slurm: it only works there without -b.
Script for downloading files
#!/bin/bash
mkdir data
cd data
for i in 11 08 15 26 ;
do
wget -c -b -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz
wget -c -b -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz
done
cd ..
Slurm Script
#!/bin/bash
#SBATCH --job-name=mytestjob # create a short name for your job
#SBATCH --nodes=2 # node count
#SBATCH --ntasks=2 # total number of tasks across all nodes
#SBATCH --cpus-per-task=2 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G # memory per cpu-core (4G is default)
#SBATCH --time=10:01:00 # total run time limit (HH:MM:SS)
#SBATCH --array=1-2 # job array with index values 1, 2
#Execution
bash download.sh
On the terminal: sbatch slurmsript.sh (it doesn't work; no job id is printed).
You can download multiple files at the same time with curl.
In your case, this should work:
# Create an empty bash array of urls.
urls=()
# Add each url to the array, such that '-O' and the url are separate
# items in the array. This is necessary so that the curl command will
# look like 'curl -O <url1> -O <url2> ...', since the -O command must
# be provided for each url.
for i in 11 08 15 26; do
urls+=( "-O" "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz" )
urls+=( "-O" "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz" )
done
# Simultaneously download from all urls.
curl --silent -C - "${urls[@]}"
To explain each of the curl options:
--silent is the equivalent of wget's -q. Disables curl's progress meter.
-C - is the equivalent of wget's -c. It tells curl to automatically find out where/how to resume a transfer.
-O tells curl to write the output to a file with the same name as the remote file (this is the behavior of wget). This must be specified for each url.
Alternatively, you might want to consider installing and using aria2.
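If you go the aria2 route, a minimal sketch (assuming the aria2c binary is installed or available as a module) is to list the URLs in a file and let it fetch several at once:
# Build a URL list, one per line (same files as the wget loop).
for i in 11 08 15 26; do
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz"
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz"
done > urls.txt

# -i reads URLs from the file, -j 4 downloads up to 4 at a time,
# -c resumes partial downloads (like wget -c).
aria2c -c -j 4 -i urls.txt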

Limit cpu limit of process in a loop

I am trying to run ffmpeg in a loop over multiple files. I want only one instance to run at a time, and it should use at most 50% of the CPU. I've been trying cpulimit, but it isn't playing nicely with the loop.
for i in {1..9}; do cpulimit -l 50 -- ffmpeg <all the options>; done
This spawns all nine jobs at once, and they are all owned by init so I have to open htop to kill them.
for i in {1..9}; do ffmpeg <all the options> & cpulimit -p $! -l 50; done
This hangs; Ctrl+C just moves on to the next loop iteration, and the ffmpeg instances can only be killed with SIGKILL.
Using a queue is the way to go. A simple solution that I use is Task Spooler. You can also limit the number of threads ffmpeg uses with -threads. Here's some code for you:
ts sh -c "ffmpeg -i INPUT.mp4 -threads 4 OUTPUT.mp4"
You can set the max number of simultaneous tasks to 1 with: ts -S 1
To see the current queue just run ts
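Putting it together, a queued version of the loop might look like the sketch below (assuming the Task Spooler binary is installed as ts; on some distributions it is called tsp instead). The output filename is just a placeholder.
# Allow only one job to run at a time; everything else waits in the queue.
ts -S 1
for f in *.mp4; do
    ts ffmpeg -i "$f" -threads 4 "converted_$f"
done
ts    # list the queue and the state of each job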
You should run cpulimit in the foreground; that way the loop works as expected.
$ cpulimit --help
...
-f --foreground launch target process in foreground and wait for it to exit
This works for me.
for file in *.mp4; do
cpulimit -f -l 100 -- ffmpeg -i "$file" <your options>
done
If you want the -threads option to affect the encoder, put it after the -i argument and before the output filename; placed before -i it only tells the decoder to use a single thread. So to keep everything on a single thread, pass -threads 1 both before and after -i, like this:
ffmpeg -threads 1 -i INPUT.mp4 -threads 1 OUTPUT.mp4

Why is a job submitted using qsub reported as an unknown job id?

I need to create a PBS script to run long-term jobs on a server with 256 GB of RAM and two CPUs, each with 12 cores and 24 threads, giving 48 computing units in total. I tried to do it, but I think something is wrong.
I created a PBS script named run_trinity and submitted it to the server with the qsub command (qsub run_trinity.sh) from the same directory that contains my program (Trinity) and data, and it returned something like 47.chpc. But when I try to check the status of the job with qstat, it says: unknown job id 47.chpc. I'm a biology student and really new to this field; could you please help me figure out what happened? Here is my PBS script:
#!/bin/bash
#PBS -N run_trinity
#PBS -l nodes=1:ppn=6
#PBS -l walltime=100:00:00
#PBS -l mem=200gb
#PBS -j oe
#Set stack size to unlimited
ulimit -s unlimited
cd /home/mary/software/trinityrnaseq_r20140717
perl /home/mary/software/trinityrnaseq_r20140717/Trinity.pl --seqType fq --JM 200G --normalize_reads --left reads8_1.fq.gz --right reads8_2.fq.gz --SS_lib_type FR --CPU 6 --full_cleanup --output /home/mary/software/trinityrnaseq_r20140717
Looking forward to hearing your solutions.
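A few standard checks that might narrow this down (plain PBS commands; exact behaviour depends on whether the scheduler is PBS Pro or Torque). A job that finishes or crashes immediately disappears from the default qstat listing, which can look like an "unknown job id":
qstat -u $USER        # list only your jobs that are still queued or running
qstat -x 47.chpc      # PBS Pro: include finished/historical jobs
ls run_trinity.o47    # joined stdout/stderr file (-j oe) left in the submission directory
tracejob 47           # scan the server logs for the job (may require admin rights)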
