Youtube-dl -o format: can I use conditions - bash

I see so many questions on youtube-dl's -f flag. I want to know about the -o flag. I basically want to do:
-o '%(artist ? artist : "Unknown Artist")s/%(album ? album : "Unknown Album")s/%(track ? track : title)s.%(ext)s'
or if you come from some languages, this might make more sense:
-o '%(artist ?? "Unknown Artist")s/%(album ?? "Unknown Album")s/%(track ?? title)s.%(ext)s'
So how can I do this? The goal is to prevent youtube-dl from making default folders like NA, or to use a more appropriate field when applicable, or even to pass in shell args to use as defaults when the tags don't exist.
EDIT
Since it was requested, here is what prompted this:
I am trying to download this playlist. Let's look at these two videos in particular:
https://youtu.be/1HTjrjjmBPU
https://youtu.be/6nllogf68FE
When I run
youtube-dl -o '%(artist)s/%(album)s/%(track)s.%(ext)s' <url>
the resulting output directories are
The Oh Hellos/Dear Wormwood/Prelude.f137.mp4
NA/NA/NA.f137.mp4
The first one is completely acceptable, but the second one is obviously not. I would like to apply conditional formatting so that the output for the first video stays the same, and the second video is saved as either
The Oh Hellos/Dear Wormwood/Exeunt.f137.mp4
or
Unknown Artist/Unknown Album/The Oh Hellos - Exeunt.f137.mp4

You have three options:
1. Use --output-na-placeholder Unknown to make Unknown directories instead of NA directories.
2. Create symbolic links from NA to Unknown Artist and Unknown Album:
   mkdir "Unknown Artist"; ln -s "Unknown Artist" NA; cd "Unknown Artist"; mkdir "Unknown Album"; ln -s "Unknown Album" NA
3. Use yt-dlp, which supports defaults in the output template:
   yt-dlp -o '%(artist|Unknown Artist)s/%(album|Unknown Album)s/%(track|Unknown Track)s.%(ext)s' <url>
   Tested: [download] Destination: Unknown Artist/Unknown Album/Unknown Track.mp4
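The question also asked about falling back to another field and about passing defaults in from the shell. Here is a minimal sketch of how that might look with yt-dlp's output-template syntax. It assumes a reasonably recent yt-dlp, where a comma inside the parentheses lists alternative fields (track, then title) and | introduces a literal default. The helper name build_template and its two arguments are made up for illustration.

```shell
# Hypothetical helper: build a yt-dlp output template with shell-supplied defaults.
# $1 = fallback artist, $2 = fallback album (both optional).
build_template() {
  artist_default=${1:-Unknown Artist}
  album_default=${2:-Unknown Album}
  # "|" gives a literal default; "track,title" falls back to the title field.
  printf '%s\n' "%(artist|${artist_default})s/%(album|${album_default})s/%(track,title)s.%(ext)s"
}

template=$(build_template "The Oh Hellos" "Dear Wormwood")
printf '%s\n' "$template"

# To download (needs yt-dlp and network access; $url is whatever you want to fetch):
# yt-dlp -o "$template" "$url"
```

Defaults that contain characters such as ) or | would likely need extra care, which this sketch does not attempt.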

Related

How to download entire youtube playlist with embedded chapter information on ytdl / ytdlp?

I use this to download individual files with chapter information. ▶️
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" -o "%USERPROFILE%\Desktop\%(title)s-%(id)s.%(ext)s" --embed-chapters https://www.youtube.com/watch?v=
I tried doing 🤔
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" -o --yes-playlist "%USERPROFILE%\Desktop\%(title)s-%(id)s.%(ext)s" --embed-chapters https://www.youtube.com/watch?v=ywyQ_eNNCJU&list=PLI84Sf0aDgazRojpYTLTXFE6Iaf5bkYr_
But it gives out error 😔
ERROR: Fixed output name but more than one file to download: --yes-playlist
'list' is not recognized as an internal or external command, operable program or batch file.
What is the command line to download the entire Youtube playlist with chapter information embedded in each video using ytdl / ytdlp ? ❓
Please help 🙏🏼
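The --yes-playlist parameter is in the wrong position. -o takes the next argument as its value, so in your command --yes-playlist became the (fixed) output name, which explains the first error.
Put it before -o. The second error comes from the unquoted & in the URL: cmd.exe treats & as a command separator and tries to run list=... as a separate command, so quote the URL as well:
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" --yes-playlist -o "%USERPROFILE%\Desktop\%(title)s-%(id)s.%(ext)s" --embed-chapters "https://www.youtube.com/watch?v=ywyQ_eNNCJU&list=PLI84Sf0aDgazRojpYTLTXFE6Iaf5bkYr_"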
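To see why the position of --yes-playlist matters, here is a toy option parser written in shell. It is not yt-dlp's actual parser, only a sketch of the usual convention that an option such as -o consumes whatever argument follows it. The function name parse is made up for illustration.

```shell
# Toy parser (illustration only): -o takes the next argument as its value.
parse() {
  out=""
  playlist=no
  while [ $# -gt 0 ]; do
    case $1 in
      -o) out=$2; shift 2 ;;               # whatever follows -o becomes the output name
      --yes-playlist) playlist=yes; shift ;;
      *) shift ;;                           # ignore anything else
    esac
  done
  printf 'output=%s playlist=%s\n' "$out" "$playlist"
}

parse -o --yes-playlist 'template'   # the flag is swallowed as the output name
parse --yes-playlist -o 'template'   # the flag is seen as a flag
```

In the first call the flag never reaches the --yes-playlist branch, which mirrors the "Fixed output name" error above.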

how to fix featureCounts in miniconda (Linux) with error "featureCounts: invalid option -- 'r'"

featureCounts was called under Miniconda in the Linux subsystem on a Windows 10 computer.
featureCounts -a /mnt/d/.../__.txt -F SAF -readExtensions3 200 -o ___.tsv -O file1.bam file2.bam file3p.bam file4.bam file5.bam file6.bam file7.bam file8.bam
This always results in an error message
featureCounts: invalid option -- 'r'
Version 2.0.1
Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
## Mandatory arguments:
-a <string> Name of an annotation file. GTF/GFF format by default. See...
This is followed by the full list of required and optional arguments for featureCounts.
Does anyone know what the error message "invalid option -- 'r'" means, and how I can fix it?
Is there any difference between calling featureCounts in command prompt (or Terminal on Mac) and calling it in Linux/miniconda3?
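It seems that you mistyped the option name: it should be --readExtension3, with two dashes in front of it and without the 's' at the end. With a single dash, the letters are read as a bundle of one-letter options, and -r is not one of them, hence the message. I had a similar problem with --fraction, which led me here!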
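The wording of the message suggests getopt-style parsing, where everything after a single dash is read as a run of one-letter options. The sketch below reproduces that behaviour with the shell's getopts builtin. The option string ':a:o:F:O' is only a stand-in covering the short options used in the question, not featureCounts' real option table, and check_opts is a made-up name.

```shell
# Illustration only: parse arguments the way a getopt-style program would.
check_opts() {
  OPTIND=1
  # Leading ":" = silent mode, so we can print our own message.
  while getopts ':a:o:F:O' opt "$@"; do
    case $opt in
      \?) printf "invalid option -- '%s'\n" "$OPTARG"; return 1 ;;
    esac
  done
  echo "ok"
}

# "-readExtensions3" is read as -r -e -a ...; "r" is unknown, so parsing stops there.
check_opts -readExtensions3 200 || true
# Known short options parse cleanly.
check_opts -a annotation.txt -F SAF -O || true
```

A long option written with two dashes is matched as a whole word instead, which is why --readExtension3 works.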

snakemake running nanopolish and making it wait until previous rule is done

Hi, I can run the different steps of nanopolish with Snakemake. But when I run the workflow, it fails with an error saying that the index file created in the bwa rule isn't available yet. After giving this error, it creates the very file the error was about. If I run Snakemake again without removing any files, it works, because the file is there. How can I tell Snakemake to wait with the next step until the first one is done? I have searched for ways to solve this problem, and all I could find was priority and ruleorder; I have used both, but it still doesn't work. Here is the script that I use.
ruleorder: bwa > nanopolish

rule bwa:
    input:
        "nanopolish/assembly.fasta"
    output:
        "nanopolish/draft.fa"
    conda:
        "envs/nanopolish.yaml"
    priority:
        50
    shell:
        "bwa index {input} - > {output}"

rule nanopolish:
    input:
        "nanopolish/assembly.fasta",
        "zipped/zipped.gz"
    output:
        "nanopolish/reads.sorted.bam"
    conda:
        "envs/nanopolish.yaml"
    shell:
        "bwa mem -x ont2d {input} | samtools sort -o {output} -T reads.tmp"
You should take another look at the docs to properly understand the idea behind Snakemake: rules describe how to create output files from input files.
A rule is not executed until all of its inputs exist, so all you have to do is add the output of the bwa rule to the inputs of the nanopolish rule:
rule nanopolish:
    input:
        "nanopolish/assembly.fasta",
        "nanopolish/draft.fa",  # <-- output of bwa
        "zipped/zipped.gz"
Note that {input} in the shell command then expands to all three files, so you will probably want to refer to the individual files instead, for example with named inputs.
ruleorder and priority are not relevant to your problem.

Getting GATK argument error and dont understand?

Hello bash programmers, I am using GATK and trying to loop through my BAM files and do local realignment using my target_intervals and known indels. Below is the code I am trying. I am hoping someone can help with the error and correct my code.
# do the local realignment.
echo "local realignment..."
for file in `ls -d adp/map/*marked_duplicates.bam`
do
    java -jar ~/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
        -T IndelRealigner \
        -R ~/flybase/fb-r5.57/dmel-all-chromosome-r5.57.fasta \
        -I $file \
        -known adp/map/*indel_intervals.vcf \
        -targetIntervals adp/map/*target_intervals.list \
        -o ${file}_realigned_reads.bam
done
wait
# Create a new index file.
echo "indexing the realigned bam file..."
for file in `ls -d adp/map/*realigned_reads.bam`
do
    ~/software/samtools-1.2/samtools index $file
done
ERROR: when looking this up, it appears to be a coding issue, and I am not seeing it.
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.3-0-g37228af):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Invalid argument value 'adp/map/360M_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 10.
##### ERROR Invalid argument value 'adp/map/517_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 11.
##### ERROR Invalid argument value 'adp/map/517M_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 12.
##### ERROR Invalid argument value 'adp/map/900_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 13.
##### ERROR Invalid argument value 'adp/map/900M_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 14.
At least part of the problem is the * in your commands. The shell expands each glob into a list of file names before GATK starts, so GATK sees one value for the argument followed by stray extra values; that is what the "Invalid argument value ... at position N" lines are reporting. To pass multiple values to an argument, specify the argument multiple times.
i.e. instead of
-known adp/map/*indel_intervals.vcf
you need to specify each file with a separate argument
-known adp/map/first_file.indel_intervals.vcf
-known adp/map/second_file.indel_intervals.vcf
There may be other issues as well. For instance, I'm not certain that -targetIntervals can take multiple files as input. Also, that's a very old version of GATK; you might want to upgrade to 3.8.
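As a rough sketch of the "one flag per file" idea, the function below builds the argument list in the shell and prints the resulting command instead of running it. Several things here are assumptions and not something the question confirms: that each BAM has its own intervals file named after it with a _target_intervals.list suffix (suggested by the file names in the error message), that every *indel_intervals.vcf next to the BAM should be passed as -known, and the bare GenomeAnalysisTK.jar path. The name print_realign_cmd is made up.

```shell
# Dry-run sketch: print the GATK command for one BAM instead of running it.
# $1 = BAM file, $2 = reference FASTA.
print_realign_cmd() {
  bam=$1
  ref=$2
  dir=$(dirname "$bam")
  set -- -T IndelRealigner -R "$ref" -I "$bam"
  # Repeat -known once per VCF; the glob is expanded here, one file per flag.
  for vcf in "$dir"/*indel_intervals.vcf; do
    [ -e "$vcf" ] || continue   # skip the literal pattern when nothing matches
    set -- "$@" -known "$vcf"
  done
  # Assumption: per-sample intervals file named "<bam>_target_intervals.list".
  set -- "$@" -targetIntervals "${bam}_target_intervals.list" \
              -o "${bam}_realigned_reads.bam"
  printf '%s ' java -jar GenomeAnalysisTK.jar "$@"
  printf '\n'
}

# Possible use (prints one command per BAM; review before running anything):
# for bam in adp/map/*marked_duplicates.bam; do
#   print_realign_cmd "$bam" ~/flybase/fb-r5.57/dmel-all-chromosome-r5.57.fasta
# done
```

Printing the command first makes it easy to check that each -known and -targetIntervals flag is followed by exactly one file.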

what ps -e -o user:20, pid means?

I have an existing script that contains this line: PID = ps -e -o user:20,pid,cmd
Could anybody explain the meaning of this line to me? I am a bit confused by the user:20 part.
Thanks!
ps is the command used to show the processes currently running on the system.
-e is a "short" option which means that all processes should be listed.
-o user:20,pid,cmd is an option which sets the format of the printed lines, i.e. we want the first column to contain the usernames (of the process owners) in a column 20 characters wide, the second column to show process IDs, and the third column to contain the commands, with their arguments, that were used to start the processes. Just that.
Also, you can simply run this yourself in your terminal: ps -e -o user:20,pid,cmd and see what happens.
From ps's man page:
-o format
User-defined format. format is a single argument in the form of a blank-separated or comma-separated list, which offers a way to specify
individual output columns. The recognized keywords are described in the STANDARD FORMAT SPECIFIERS section below. Headers may be renamed (ps
-o pid,ruser=RealUser -o comm=Command) as desired. If all column headers are empty (ps -o pid= -o comm=) then the header line will not be
output. Column width will increase as needed for wide headers; this may be used to widen up columns such as WCHAN (ps -o pid,wchan=WIDE-
WCHAN-COLUMN -o comm). Explicit width control (ps opid,wchan:42,cmd) is offered too. The behavior of ps -o pid=X,comm=Y varies with
personality; output may be one column named "X,comm=Y" or two columns named "X" and "Y". Use multiple -o options when in doubt. Use the
PS_FORMAT environment variable to specify a default as desired; DefSysV and DefBSD are macros that may be used to choose the default UNIX or
BSD columns.
Explicit width control (ps opid,wchan:42,cmd) is offered too.
So you'll get a user column with 20-char's width.
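The effect of a width suffix can be mimicked with printf, which may make the padding easier to see than in real ps output. This is only an analogy for how a fixed-width column behaves, not how ps is implemented; pad_col is a made-up helper.

```shell
# Left-justify $2 in a field of $1 characters, then print "|" so the
# padding is visible.
pad_col() {
  printf "%-${1}s|\n" "$2"
}

pad_col 20 root         # "root" padded out to 20 characters
pad_col 10 root         # "root" padded out to 10 characters
pad_col 10 jdeveloper   # exactly 10 characters, so no padding is added

# On a Linux system with procps, the real thing can be compared directly:
# ps -e -o user:20,pid
# ps -e -o user:10,pid
```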
ps - This command reports a snapshot of the current processes.
-e - This option selects all processes. Identical to -A.
-o - This option specifies a user-defined output format.
user:20 - This sets the width of the USER column to 20 characters, so shorter user names are padded with spaces. The examples below show the difference.
jdeveloper@jdeveloper ~ $ ps -e -o user:20,pid
USER                   PID
root                  2926
jdeveloper            2948
root                  3255
root                  3570
root                  3802
jdeveloper            3825
jdeveloper            3860
Now let's try a column width of 10 characters.
jdeveloper@jdeveloper ~ $ ps -e -o user:10,pid
USER         PID
root        2926
jdeveloper  2948
root        3255
root        3570
root        3802
jdeveloper  3825
jdeveloper  3863
Find out more about the ps command in its man page. Try
jdeveloper@jdeveloper ~ $ man ps
Hope this helps.
