I've seen many similar questions, but none really works for me.
I have a simple script that downloads the latest two video episodes from one channel, so I can watch them later on mobile in offline mode. It uses the yt-dlp command with the -w switch so it doesn't overwrite a video that already exists. I would like to colorize "Destination" so I can see the file name more clearly, and "has already been downloaded" when the file has already been downloaded.
So my script looks like this:
cd /tmp
EPI=$(curl -s https://rumble.com/c/robbraxman | grep -m 2 "href=/v" | awk '{print $5}' | awk -F\> '{print $1}')
for i in ${EPI//href=/http://rumble.com}; do
    printf '\e[1;38;5;212m Downloading \e[1;38;5;196m %s\e[m \n' "$i"
    yt-dlp -f mp4-480p -w "$i"
done
The output of yt-dlp looks like this:
[Rumble] Extracting URL: http://rumble.com/v2820uk-let-me-show-you-brax.me-privacy-focused-app.html
[Rumble] v2820uk-let-me-show-you-brax.me-privacy-focused-app.html: Downloading webpage
[RumbleEmbed] Extracting URL: https://rumble.com/embed/v25g3g0
[RumbleEmbed] v25g3g0: Downloading JSON metadata
[info] v25g3g0: Downloading 1 format(s): mp4-480p
[download] Let Me Show You Brax.Me - Privacy Focused App [v25g3g0].mp4 has already been downloaded
[download] 100% of 20.00MiB
[Rumble] Extracting URL: http://rumble.com/v2820uk-let-me-show-you-brax.me-privacy-focused-app.html
[Rumble] v2820uk-let-me-show-you-brax.me-privacy-focused-app.html: Downloading webpage
[RumbleEmbed] Extracting URL: https://rumble.com/embed/v25g3g0
[RumbleEmbed] v25g3g0: Downloading JSON metadata
[info] v25g3g0: Downloading 1 format(s): mp4-480p
[download] Destination: Let Me Show You Brax.Me - Privacy Focused App [v25g3g0].mp4
[download] 3.8% of 525.67MiB at 5.79MiB/s ETA 01:27^C
So I would probably need to pipe the output somewhere and then color it. yt-dlp itself colors the progress line while downloading. I've tried putting grc in front of yt-dlp, but it didn't color anything.
You can colorize specific things you want using grep:
yt-dlp 'URL' | grep --color -P "^|(Destination.*|.*has already been downloaded)"
Basically it will match all lines with ^, so every line is still printed; however, since that match contains no visible characters, those lines are not colored. Then you just add the parts you want colored after the | in the regex. The --color flag may not be strictly necessary if your grep is already aliased to --color=auto; adding it explicitly makes sure that is not the issue.
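If you want the two patterns in different colors rather than grep's single match color, a sed sketch can do it (this assumes GNU sed, which understands \x1b escapes in the replacement; the codes themselves are plain ANSI SGR sequences):
yt-dlp -f mp4-480p -w "$i" |
    sed -e 's/Destination: .*/\x1b[1;32m&\x1b[0m/' \
        -e 's/.* has already been downloaded/\x1b[1;33m&\x1b[0m/'
Here & stands for the whole matched text, so the "Destination: ..." part comes out green and the "has already been downloaded" lines yellow.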
I want to download a number of files to my server using wget; the 492 files are listed here:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736
so I want to copy the URLs of all the files in the "File Name" column, save them into a file, and fetch them with wget.
So how can I copy all those URLs from that column?
Thanks for reading :)
Since you've tagged bash, this should work.
wget -O- is used to output the data to the standard output, where it's greppable. (curl would do that by default.)
grep -oE is used to capture the URLs (which happily are in a regular enough format that a simple regexp works).
Then, wget -i is used to read URLs from the file generated. You might wish to add -nc or other suitable partial-fetch flags; those files are pretty hefty.
wget -O- 'https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736' | grep -oE 'http://ftp.sra.ebi.ac.uk/[^"]+' > urls.txt
wget -i urls.txt
First, I recommend using a more specific and robust implementation... but in case you are against a wall and in a hurry:
$: curl -s 'https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736' |
     sed -En '/href="http:\/\/.*clean.fastq.gz"/{s/^.*href="([^"]+)".*/\1/;p;}' |
     while read -r url; do wget "$url"; done
This is a quick and dirty rough first pass, but it will give you something to work with.
If you aren't in a screaming hurry, try writing something more robust and step-wise in perl or python.
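Even staying in bash, you can make that quick pass a bit more defensive; a sketch along the same lines (the same curl+sed extraction, plus wget's standard resume/retry flags):
#!/usr/bin/env bash
set -euo pipefail      # stop at the first failure instead of plowing on
url='https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736'
# extract every clean.fastq.gz link into urls.txt
curl -fsS "$url" |
    sed -En 's/^.*href="(http:\/\/[^"]*clean\.fastq\.gz)".*$/\1/p' > urls.txt
# -c resumes partial downloads, -t 3 retries each file up to three times
wget -c -t 3 -i urls.txt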
I have a directory full of .mp3 files whose filenames contain a YouTube video ID.
In particular, each ID starts right after a - and ends at the .mp3 extension.
However, there is a problem.
Some YouTube IDs have -'s in them, and some of the titles have -'s in them too.
I need to extract only the video ID from each filename, i.e. the dQw4w9WgXcQ part of https://www.youtube.com/watch?v=dQw4w9WgXcQ.
The filename of the video downloaded with youtube-dl is:
Rick Astley - Never Gonna Give You Up-dQw4w9WgXcQ.mp3
The title of the video is:
Rick Astley - Never Gonna Give You Up
What I'm trying to accomplish is to collect the IDs of everything I have already downloaded into a text file that tells youtube-dl not to re-download them (a download archive).
How would I go about doing this? (Preferably with a bash sed command, but at this point I am willing to try anything.)
It's easier than you think: the greedy .* followed by - will eat all the -s up to the last one:
# first get the titles and ids into a tab-separated multiline string
both=$(find * -name "*.mp3" | sed 's/\(.*\)-\(.*\)\.mp3/\1\t\2/')
# then cut it into two multiline strings
titles=$(echo "$both" | cut -f1)
ids=$(echo "$both" | cut -f2)
# or process each title-id pair one-by-one
echo "$both" | while IFS=$'\t' read -r title id; do
    echo "$title"
    echo "$id"
done
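Since the stated goal is a download archive, you can turn those ids straight into one: youtube-dl's archive file holds one "<extractor> <id>" pair per line, which for these files is always youtube. A minimal sketch (the playlist URL is a placeholder):
# one "youtube <id>" line per file already on disk
find . -name "*.mp3" | sed 's/.*-\(.*\)\.mp3/youtube \1/' > archive.txt
# youtube-dl then skips everything listed in the archive
youtube-dl --download-archive archive.txt "$playlist_url"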
I'm trying to make a YouTube music player on my Raspberry Pi, and I've gotten stuck at this point:
wget downloads a page, for example https://www.youtube.com/results?search_query=test, to the file output.html.
Links in that page are stored in strings like this: <a href="/watch?v=DDzfeTTigKo"
Now when I try to grep for them with cat site | grep -B 0 -A 0 watch?v=
it prints a wall of text from that file, and I just want the specific lines like the one mentioned above, saved to a file site2.
Is this possible?
Try this with GNU grep:
grep -o '"/watch?v=[^"]*"' file.html
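To get the matches into site2 as the question asks, just redirect the output; for example, assuming the page was saved as output.html as in the question (the sed step stripping the quotes is optional):
grep -o '"/watch?v=[^"]*"' output.html | sed 's/"//g' > site2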
I want to extract just the first filename from a remote zip archive without downloading the entire zip. In particular, I'm trying to get the build number of dartium (link to zip file). Since the file is quite large, I don't want to download the entire thing.
If I download the entire thing, unzip -l reports the first file as being: 0 2013-04-07 12:18 dartium-lucid64-inc-21033.0/. I want to get just this filename so I can parse out the 21033 portion as the build number.
I was doing this (total hack):
_url="https://storage.googleapis.com/dartium-archive/continuous/dartium-lucid64.zip"
curl -s "$_url" | head -c 256 | sed -n "s:.*dartium-lucid64-inc-\([0-9]\+\).*:\1:p"
It was working when I had my shell in ASCII mode, but I recently switched it to UTF-8 and it seems sed is now honoring that, which breaks my script.
I thought about hacking it by doing:
export LANG=
curl -s ...
But that seemed like an even bigger hack.
Is there a better way?
Firstly, you can request just a byte range with curl's -r option.
Next, use strings to extract the printable strings from the binary stream.
Add q after p (inside the match block) so sed quits after printing the first occurrence:
curl -s "$_url" -r 0-256 | strings | sed -n "/dartium-lucid64-inc/{s:.*dartium-lucid64-inc-\([0-9]\+\).*:\1:p;q;}"
Or this:
curl -s "$_url" -r 0-256 | strings | sed -n "/dartium-lucid64/{s:.*-\([^-]\+\)\/.*:\1:p;q;}"
This should be a bit faster and more reliable. It also extracts the full version, including the subversion (if you need it).
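For what it's worth, the reason a small range suffices: in a zip archive the first local file header sits at offset 0, the filename length is a little-endian 16-bit value at offset 26, and the name itself starts at offset 30. So you can read the name directly instead of fishing with strings; a sketch (assumes a little-endian host, which matches zip's on-disk byte order, plus standard od and dd):
curl -s "$_url" -r 0-511 -o head.bin   # the first 512 bytes are plenty
# filename length: unsigned 16-bit value at offset 26 of the local file header
len=$(od -An -tu2 -j26 -N2 head.bin | tr -d ' ')
# the filename itself starts at offset 30
dd if=head.bin bs=1 skip=30 count="$len" 2>/dev/null; echo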
To get the dimensions of a file, I can do:
$ mediainfo '--Inform=Video;%Width%x%Height%' ~/Desktop/lawandorder.mov
1920x1080
However, if I give a URL instead of a file, it returns (none):
$ mediainfo '--Inform=Url;%Width%x%Height%' 'http://url/lawandorder.mov'
(none)
How would I correctly pass a URL to MediaInfo?
You can also use curl | head to partially download the file before running mediainfo.
Here's an example of getting the dimensions of a 12 MB file from the web, where only roughly the first 10 KiB needs to be downloaded:
curl --silent http://www.jhepple.com/support/SampleMovies/MPEG-2.mpg \
| head --bytes 10K > temp.mpg
mediainfo '--Inform=Video;%Width%x%Height%' temp.mpg
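An equivalent that doesn't depend on head cutting the pipe is curl's --range option (a sketch; 0-10239 is the first 10 KiB):
curl --silent --range 0-10239 http://www.jhepple.com/support/SampleMovies/MPEG-2.mpg > temp.mpg
mediainfo '--Inform=Video;%Width%x%Height%' temp.mpg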
To do this, I needed to re-compile MediaInfo from source with the --with-libcurl option:
$ ./CLI_Compile.sh --with-libcurl
$ cd MediaInfo/Project/GNU/CLI
$ make install
Then I used this command to get video dimensions via http:
$ mediainfo '--Inform=Video;%Width%x%Height%' 'http://url/lawandorder.mov'
Note: this took a considerable amount of time to return a result. I'd recommend using ffmpeg if the file is not local.
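Along those lines, here's a sketch with ffprobe (it ships with ffmpeg and fetches only what it needs over HTTP; the URL is the question's placeholder):
ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height -of csv=s=x:p=0 \
        'http://url/lawandorder.mov'
This prints the dimensions in the same WIDTHxHEIGHT form, e.g. 1920x1080.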