get veehd url in bash/python? - bash

Can anybody figure out how to get the .avi URL of a veehd[dot]com video by giving the video's page to a script? It can be Bash, or Python, or common programs in Ubuntu.
The site makes you install an extension, and I've tried looking at the code but I can't figure it out.

This worked for me:
#!/bin/bash
URL=$1   # page with the video
# Pull the src of the "playeriframe" frame that the player loads
FRAME=$(wget -q -O - "$URL" | sed -n -e '/playeriframe.*do=d/{s/.*src : "//;s/".*//p;q}')
# The first link inside that frame points at the .avi stream
STREAM=$(wget -q -O - "http://veehd.com$FRAME" | sed -n -e '/<a/{s/.*href="//;s/".*//p;q}')
echo "$STREAM"
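If you save the script above as, say, veehd.sh, usage would look like this (the script name and the video URL below are placeholders, not real values):
# Hypothetical usage: extract the stream URL from a video page, then
# hand it straight to wget to download the .avi.
STREAM=$(./veehd.sh "http://veehd.com/video/000000")
wget -O video.avi "$STREAM"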

Related

How to use curl get image right filename and extension

My links look like the one below:
https://cdn.sspai.com/2022/06/22/article/a88df95f3401d5b6c9d716bf31eeef33?imageView2/2/w/1120/q/90/interlace/1/ignore-error/1
If I use Chrome to open this link and press Cmd + S,
I get the right filename and the right extension, .png.
But if I use the bash command below, the saved file has no extension:
curl -J -O https://cdn.sspai.com/2022/06/22/article/a88df95f3401d5b6c9d716bf31eeef33?imageView2/2/w/1120/q/90/interlace/1/ignore-error/1
I just want to download the image with the right filename and extension:
a88df95f3401d5b6c9d716bf31eeef33.png
The same problem occurs with other image links, for example:
https://cdn.sspai.com/article/fa848601-4cdf-38b0-b020-7afd6efc4a7e.jpg?imageMogr2/auto-orient/quality/95/thumbnail/!800x400r/gravity/Center/crop/800x400/interlace/1
You can get the name from the URL itself.
url="YOUR-URL"
file="`echo "${url}" | sed 's|\?.*|.jpg|' | xargs basename`"
curl -o "${file}.tmp" "${url}"
mv "${file}.tmp" "${file}"
Hope it helps
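If the URL path has no extension at all (as in the first link, where the browser infers .png from the response headers), a rough alternative is to take the extension from the Content-Type that curl reports. This is only a sketch and assumes the server returns an image/* type:
url="YOUR-URL"
name="$(basename "${url%%\?*}")"
# Download once and capture the Content-Type the server reports
type="$(curl -s -o "${name}.tmp" -w '%{content_type}' "$url")"
ext="${type#image/}"                    # e.g. image/png -> png
mv "${name}.tmp" "${name}.${ext%%;*}"   # drop any "; charset=..." suffix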

wget to display filename and percentage downloaded

I am trying to use wget in a bash script and display a custom download percentage per file so that the user knows the process is running and what has been downloaded. The code below seems to download the files, but no percentage or filename is displayed. I am not sure why and cannot seem to figure it out. Thank you :).
list
xxxx://www.xxx.com/xxx/xxxx/xxx/FilterDuplicates.html
xxxx://www.xxx.com/xxx/xxxx/xxx/file1.bam
xxxx://www.xxx.com/xxx/xxxx/xxx/file2.bam
xxxx://www.xxx.com/xxx/xxxx/xxx/file1.vcf.gz
xxxx://www.xxx.com/xxx/xxxx/xxx/file2.vcf.gz
bash script that uses the list to download all files
# download all from list
download() {
    local url=$1
    echo -n " "
    wget --progress=dot "$url" 2>&1 | grep --line-buffered "%" | sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    echo -ne "\b\b\b\b"
    echo " starting download"
}
cd "/home/user/Desktop/folder/subfolder"
wget -i /home/cmccabe/list --user=xxx --password=xxx --xxx \
    xxxx://www.xxx.com/xxx/xxxx/xxx/ 2>&1 -o wget.log | download
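A simpler way to get what the question asks for (a sketch, not a fix of the script above): loop over the list yourself and let wget print the filename and its own progress bar. This assumes wget 1.16 or newer for --show-progress; the list path and credentials are placeholders taken from the question.
# Read the list line by line and show the filename plus wget's own
# progress bar for each download (needs wget >= 1.16 for --show-progress).
while read -r url; do
    [ -n "$url" ] || continue
    echo "downloading: ${url##*/}"
    wget -q --show-progress --user=xxx --password=xxx "$url"
done < /home/cmccabe/list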

Saving files to separate html file before using grep/sed

I'm working on a project that lets me navigate some urls. Right now I have:
#!/bin/bash
for file in $1
do
    wget $1 >> output.html
    cat output.html | grep -o '<a .*href=.*>' |
    sed -e 's/<a /\n<a /g' |
    sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
    grep 'http'
done
I want the user to be able to run the script as follows:
./navigator google.com
which will save the page source of the URL into a new HTML file, then run my grep/sed pipeline on it and save the result to another file.
Right now I'm struggling with saving the page source into a new HTML file. Help!
To create a new file for each URL, use the URL in the output filename passed to wget's -O option:
#!/bin/bash
for url; do
    out="output-$url.html"
    wget -q "$url" -O "$out"
    grep -o '<a .*href=.*>' "$out" |
    sed -e 's/<a /\n<a /g' |
    sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
    grep 'http'
done
PS: As per the comments above, -q was added to the wget call to make it completely quiet.
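One caveat (my own addition, not part of the original answer): if a URL contains slashes, e.g. google.com/intl/en, they end up in the output filename and wget will fail to create it. A small tweak is to replace them first:
# Replace any slashes in the URL with underscores so the output
# filename contains no path separators.
out="output-${url//\//_}.html"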

wget to parse a webpage in shell

I am trying to extract URLs from a webpage using wget. I tried this:
wget -r -l2 --reject=gif -O out.html www.google.com | sed -n 's/.*href="\([^"]*\).*/\1/p'
It displays FINISHED
Downloaded: 18,472 bytes in 1 files
but it does not display the web links. If I do it separately:
wget -r -l2 --reject=gif -O out.html www.google.com
sed -n 's/.*href="\([^"]*\).*/\1/p' < out.html
Output
http://www.google.com/intl/en/options/
/intl/en/policies/terms/
It is not displaying all the links, such as:
http://www.google.com
http://maps.google.com
https://play.google.com
http://www.youtube.com
http://news.google.com
https://mail.google.com
https://drive.google.com
http://www.google.com
http://www.google.com
http://www.google.com
https://www.google.com
https://plus.google.com
Moreover, I want to get links from the 2nd level and deeper. Can anyone give a solution for this?
Thanks in advance
The -O file option captures the output of wget and writes it to the specified file, so there is no output going through the pipe to sed.
You can say -O - to direct wget output to standard output.
If you don't want to use grep, you can try
sed -n "/href/ s/.*href=['\"]\([^'\"]*\)['\"].*/\1/gp"

Passing curl results to wget with bash

I have a small script that I'd like to use with cron.
Purpose: get a webpage with links, extract the dates from the links, and download the files.
The script below is not working 100% and I can't see the problem.
#!/bin/bash
for i in $(curl http://107.155.72.213/anarirecap.php 2>&1 | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep '_whole_1_3000.mp4'); do
    GAMEDAY=$(echo "$i" | grep -Eo '[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2}')
    wget "$i" --output-document="$GAMEDAY.mp4"
done
It gets the webpage ("curl http://...etc") - works.
$GAMEDAY extracts the date - works.
The wget part is not working when I add $GAMEDAY. Am I blind... what am I missing?
Look at your output format here:
wget "$i" -O 2015/05/12.mp4
This is looking for a directory named 2015 with a subdirectory named 05 in which to place the file 12.mp4. Those directories don't exist, so you get 2015/05/12.mp4: No such file or directory.
If you want to replace the /s with underscores:
wget -O "${GAMEDAY//\//_}" "$i"
Alternately, if you want to create the directories if they don't exist:
mkdir -p -- "$(dirname "$GAMEDAY")"
wget -O "$GAMEDAY" "$i"
