Passing curl results to wget with bash

I have a small script that I'd like to use with cron.
Purpose: get a webpage with links, extract the dates from the links, and download the files.
The script below is not working 100% and I can't see the problem.
#!/bin/bash
for i in $(curl http://107.155.72.213/anarirecap.php 2>&1 | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep '_whole_1_3000.mp4'); do
    GAMEDAY=$(echo "$i" | grep -Eo '[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2}')
    wget "$i" --output-document="$GAMEDAY.mp4"
done
It gets the webpage ("curl http://...etc") - works.
$GAMEDAY - extracts the date - works.
The wget part is not working when I add $GAMEDAY. Am I blind... what am I missing?

Look at your output format here:
wget "$i" -O 2015/05/12.mp4
This is looking for a directory named 2015 with a subdirectory named 05 in which to place the file 12.mp4. Those directories don't exist, so you get 2015/05/12.mp4: No such file or directory.
If you want to replace the /s with underscores:
wget -O "${GAMEDAY//\//_}" "$i"
Alternately, if you want to create the directories if they don't exist:
mkdir -p -- "$(dirname "$GAMEDAY")"
wget -O "$GAMEDAY" "$i"
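Putting it together, a corrected version of the original loop might look like this (a sketch using the underscore variant so no directories are needed; the .mp4 extension is restored from the question):
#!/bin/bash
for i in $(curl http://107.155.72.213/anarirecap.php 2>&1 | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | grep '_whole_1_3000.mp4'); do
    GAMEDAY=$(echo "$i" | grep -Eo '[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2}')
    wget "$i" --output-document="${GAMEDAY//\//_}.mp4"   # e.g. 2015_05_12.mp4
done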

Related

How to pipe multiple commands to bash?

I want to check some files on a remote website.
Here is a bash command that generates the commands to calculate the files' md5 sums:
[root]# head -n 3 zrcpathAll | awk '{print $3}' | xargs -I {} echo wget -q -O - -i {}e \| md5sum\;
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum;
wget -q -O - -i https://example.com/zrc/e1bd7171263adb95fb6f732864ceb556.zrce | md5sum;
wget -q -O - -i https://example.com/zrc/5300b80d194f677226c4dc6e17ba3b85.zrce | md5sum;
Then I pipe the generated commands to bash, but only the first command is executed.
[root]# head -n 3 zrcpathAll | awk '{print $3}' | xargs -I {} echo wget -q -O - -i {}e \| md5sum\; | bash -v
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum;
3d2f0e76e04444f4ec456ef9f11289ec -
[root]#
Would you please try the following instead:
while read -r _ _ url _; do
wget -q -O - "$url"e | md5sum
done < <(head -n 3 zrcpathAll)
We should not put -i in front of "$url" here.
[Explanation of the -i option]
The wget manpage says:
-i file
--input-file=file
Read URLs from a local or external file. [snip]
If this function is used, no URLs need be present on the command line. [snip]
If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html. Furthermore, the file's location will be implicitly used as base href if none was specified.
where the file would contain lines of URLs such as:
https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce
https://example.com/zrc/e1bd7171263adb95fb6f732864ceb556.zrce
https://example.com/zrc/5300b80d194f677226c4dc6e17ba3b85.zrce
Whereas if we use the option as -i url, wget first downloads url, expecting it to be a file that contains a list of URLs like the one above. In our case, the url is itself the target to download, not a list of URLs, so wget fails with the error: No URLs found in url.
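To illustrate with one of the URLs above:
# correct: the URL is itself the download target
wget -q -O - https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum
# wrong here: -i tells wget to treat the downloaded document as a list of URLs,
# so it fails with "No URLs found" because the .zrce file is not a URL list
wget -q -O - -i https://example.com/zrc/3d2f0e76e04444f4ec456ef9f11289ec.zrce | md5sum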
Even if wget fails, why does the command output just one line, not three lines of md5sum results?
This seems to be because the head command immediately flushes the remaining lines when the piped subprocess fails.

Bash: Parse URLs from a file, process them, and then remove them from the file

I am trying to automate a procedure where the system will fetch the contents of a file (one URL per line), use wget to grab the files from the site (an https folder), and then remove the line from the file.
I have made several tries, but the sed part (at the end) cannot match the string (I tried escaping characters) and remove it from the file!
cat File
https://something.net/xxx/data/Folder1/
https://something.net/xxx/data/Folder2/
https://something.net/xxx/data/Folder3/
My line of code is:
cat File | xargs -n1 -I # bash -c 'wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "#" -P /mnt/USB/ && sed -e 's|#||g' File'
It works up until the sed -e 's|#||g' File part..
Thanks in advance!
Don't use cat if possible; it's bad practice and can be a problem with big files... You can change
cat File | xargs -n1 -I # bash -c
to
for siteUrl in $( < "File" ); do
It would be more correct, and simpler, to use sed with double quotes... My variant:
scriptDir=$( dirname -- "$0" )
for siteUrl in $( < "$scriptDir/File.txt" )
do
    if [[ -z "$siteUrl" ]]; then break; fi # stop on an empty line
    wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "$siteUrl" -P /mnt/USB/ && sed -i "s|$siteUrl||g" "$scriptDir/File.txt"
done
#beliy's answer looks good!
If you want a one-liner, you can do:
while read -r line; do \
wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf \
--no-parent --restrict-file-names=nocontrol --user=test \
--password=pass --no-check-certificate "$line" -P /mnt/USB/ \
&& sed -i -e '\|'"$line"'|d' "File.txt"; \
done < File.txt
EDIT:
You need to add a \ in front of the first pipe so that sed accepts | as the address delimiter.
I believe you just need to use double quotes after sed -e. Instead of:
'...&& sed -e 's|#||g' File'
you would need
'...&& sed -e '"'s|#||g'"' File'
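Applied to the original command, the quoting fix would look something like this (a sketch; note that sed also needs -i, not just -e, to actually edit File, and s|#||g blanks the line rather than deleting it - the \|...|d address in the one-liner above deletes it entirely):
cat File | xargs -n1 -I # bash -c 'wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "#" -P /mnt/USB/ && sed -i '"'s|#||g'"' File'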
I see what you're trying to do, but I don't understand the sed command with the pipes in it. Maybe it's some fancy format that I don't know.
Anyway, I think the sed command should look like this...
sed -e 's/#//g'
This command will remove all # from the stream.
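For example:
printf 'a#b#c\n' | sed -e 's/#//g'   # prints: abc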
I hope this helps!

wget to display filename and percentage downloaded

I am trying to use wget in a bash script and display a custom download percentage per file, so that the user knows the process is running and what has been downloaded. The code below seems to download the files, but no percentage or filename is displayed. I am not sure why and cannot seem to figure it out. Thank you :).
list
xxxx://www.xxx.com/xxx/xxxx/xxx/FilterDuplicates.html
xxxx://www.xxx.com/xxx/xxxx/xxx/file1.bam
xxxx://www.xxx.com/xxx/xxxx/xxx/file2.bam
xxxx://www.xxx.com/xxx/xxxx/xxx/file1.vcf.gz
xxxx://www.xxx.com/xxx/xxxx/xxx/file2.vcf.gz
bash script that uses list to download all files
# download all from list
download() {
    local url=$1
    echo -n " "
    wget --progress=dot $url 2>&1 | grep --line-buffered "%" | sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    echo -ne "\b\b\b\b"
    echo " starting download"
}
cd "/home/user/Desktop/folder/subfolder"
wget -i /home/cmccabe/list --user=xxx --password=xxx --xxx \
xxxx://www.xxx.com/xxx/xxxx/xxx/ 2>&1 -o wget.log | download
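One way to get a per-file name and percentage is to run one wget per URL instead of piping a single wget -i run into the function. A minimal sketch, reusing the question's own progress parsing (the credentials are placeholders):
while read -r url; do
    printf '%s ' "${url##*/}"   # show the filename being fetched
    wget --progress=dot --user=xxx --password=xxx "$url" 2>&1 \
        | grep --line-buffered "%" | sed -u -e "s,\.,,g" \
        | awk '{printf("\b\b\b\b%4s", $2)}'
    printf '\n'
done < /home/cmccabe/list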

curl complex usage with pattern

I'm trying to get 2 files using curl based on some pattern but that doesn't seem to work:
Files:
SystemOut_15.04.01_21.12.36.log
SystemOut_15.04.01_15.54.05.log
curl -f -k -u "login:password" https://myserver/cgi-bin/logviewer/index.cgi?getlogfile=SystemOut_15.04.01_21.12.36.log'&'server=qwerty123.com'&'numlines=100000000'&'appenv=MBL%20-%20PROD'&'directory=/app/WAS/was85/profiles/node/logs/mbl-server1
I know there is the -A option, but it doesn't work since my file name is inside the link.
How can I extract those 2 files using a pattern?
I worked it out myself. One curl gets the list of logs on the webpage; another downloads those files.
The code looks like:
for file in $(curl -f -k -u "user:pwd" https://selfservice.pwj.com/cgi-bin/logviewer/index.cgi?listdirectory=/app/smx_client_mob/data/log'&'appenv=MBL%20-%20PROD'&'server=xshembl04pap.she.pwj.com \
    | grep href | sed 's/.*href="//' | sed 's/".*//' | sed 's/javascript:getLog//g' | sed "s/['();]//g" | grep -i 'service' | grep '^[a-zA-Z].*'); do
    curl -o "$file" -f -k -u "user:pwd" https://selfservice.pwj.com/cgi-bin/logviewer/index.cgi?getlogfile="$file"'&'server=xshembl04pap.she.pwj.com'&'numlines=100000000'&'appenv=MBL%20-%20PROD'&'directory=/app/smx_client_mob/data/log
done
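The same download step is easier to read with the whole URL in double quotes instead of escaping each & separately:
curl -o "$file" -f -k -u "user:pwd" "https://selfservice.pwj.com/cgi-bin/logviewer/index.cgi?getlogfile=${file}&server=xshembl04pap.she.pwj.com&numlines=100000000&appenv=MBL%20-%20PROD&directory=/app/smx_client_mob/data/log"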

get veehd url in bash/python?

Can anybody figure out how to get the .avi URL of a veehd[dot]com video, by providing the page of the video in a script? It can be BASH, or Python, or common programs in Ubuntu.
They make you install an extension, and I've tried looking at the code, but I can't figure it out.
This worked for me:
#!/bin/bash
URL=$1 # page with the video
FRAME=$(wget -q -O - "$URL" | sed -n -e '/playeriframe.*do=d/{s/.*src : "//;s/".*//p;q}')
STREAM=$(wget -q -O - "http://veehd.com$FRAME" | sed -n -e '/<a/{s/.*href="//;s/".*//p;q}')
echo "$STREAM"
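Hypothetical usage, assuming the script is saved as veehd.sh (the page URL below is made up):
bash veehd.sh "http://veehd.com/video/12345_some-video"   # prints the stream URL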
