I have a line in a bash file like:
curl -L $domain/url1 options
The domain is read from another text file that contains domains like:
abc.com
google.com
yahoo.com
And I have another, separate file which contains a large number of further URLs:
url1
url2
url3
....
url1000
I want to substitute each URL and build lines like:
curl -L abc.com/url1 options
curl -L abc.com/url2 options
curl -L abc.com/url3 options
....
curl -L abc.com/url1000 options
It is taking too much time manually, so I want to automate this process.
Use a proper loop in bash, reading the file with input redirection:
while IFS= read -r url; do
curl -L abc.com/"$url" options
done <url_file
would be sufficient, or the same loop as a one-liner:
while IFS= read -r url; do curl -L abc.com/"$url" options; done <url_file
For your updated requirement to loop over two files, you need to open separate file descriptors and read from them:
while IFS= read -r domain <&3; do
    while IFS= read -r url <&4; do
        curl -L "$domain"/"$url" options
    done 4<url.txt
done 3<domain.txt
The above should work fine in any POSIX shell, since it involves no bashisms; you could just put it in a script with a #!/bin/sh shebang.
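For example, a minimal sketch of such a script, assuming the domain list is in domain.txt, the URL list is in url.txt, and options stands for whatever curl options you are already passing:

#!/bin/sh
# For every domain, fetch every URL path listed in url.txt.
while IFS= read -r domain <&3; do
    while IFS= read -r url <&4; do
        curl -L "$domain"/"$url" options
    done 4<url.txt
done 3<domain.txt

Save it as, say, fetch_all.sh (name is just an example), make it executable with chmod +x fetch_all.sh, and run ./fetch_all.sh.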
I have a hash file containing several md5 hashes.
I want to create a bash script to curl virustotal to check if the hashes are known.
#!/bin/bash
for line in "hash.txt";
do
echo $line; curl -s -X GET --url 'https://www.virustotal.com/vtapi/v2/file/report?apikey=a54237df7c5c38d58d2240xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxcc0a0d7&resource='$line'';
done
but it's not working.
Could you help me, please?
Better to use a while loop. Your for loop would only run once, because bash interprets "hash.txt" as a literal string, not as a file to read. Try this:
while read -r line; do
echo "$line"
curl -s -X GET --url "https://www.virustotal.com/vtapi/v2/file/report?apikey=a54237df7c5c38d58d2240xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxcc0a0d7&resource=$line"
done <"/path/to/hash.txt"
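As a small follow-up sketch, the same loop with the (redacted) API key kept in a variable so the URL stays readable; the variable name is just an example:

#!/bin/bash
# Substitute your real VirusTotal API key here.
api_key="a54237df7c5c38d58d2240xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxcc0a0d7"

while IFS= read -r line; do
    echo "$line"
    curl -s -X GET --url "https://www.virustotal.com/vtapi/v2/file/report?apikey=${api_key}&resource=${line}"
done < "/path/to/hash.txt"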
I'm building a .sh script to run curls based on the items placed (one per line) in a file fileWithItems.txt.
This is the script I built:
declare -a array
# assuming fileWithItems.txt, which contains one element per line to be used in the URL, is in the same folder as this .sh
mapfile -t array < fileWithItems.txt
host="localhost"
port="PORT"
i=0
while [ ${i} -lt ${#array[@]} ] ; do
curl -X PUT "$host:$port/path1/${array[$i]}/refresh" > log.txt
((i++))
done
It seems that the curl command is not being built properly. How could it be optimized?
To elaborate further on my comments, you can do it like this:
host="localhost"
port="PORT"
while IFS= read -r line; do
curl -X PUT "$host:$port/path1/$line/refresh"
done < fileWithItems.txt > log.txt
Please note the placement of > log.txt after done, so that you don't overwrite the same file on every iteration.
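To illustrate the difference, a short sketch using the same names as above:

# Redirecting inside the loop truncates log.txt on every iteration,
# so only the output of the last curl survives:
while IFS= read -r line; do
    curl -X PUT "$host:$port/path1/$line/refresh" > log.txt
done < fileWithItems.txt

# Redirecting after done applies to the whole loop,
# so log.txt collects the output of every curl call:
while IFS= read -r line; do
    curl -X PUT "$host:$port/path1/$line/refresh"
done < fileWithItems.txt > log.txt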
I am trying:
1. wget -i url.txt
and
2. wget -O output.ext
How do I combine both? I want to download the URLs listed in url.txt and save them, as separate files, with the names I specify.
In this situation, I think, you need two files with the same number of lines, to map each URL to a corresponding name:
url.txt (source file containing your urls, example content given here):
https://svn.apache.org/repos/asf/click/trunk/examples/click-spring-cayenne/README.txt
https://svn.apache.org/repos/asf/click/trunk/examples/click-spring-cayenne/README.txt
output_names.txt (filenames you want to assign):
readme1.txt
readme2.txt
Then you iterate over both files and pass the contents to wget, e.g. with the following script:
#!/bin/bash
IFS=$'\n' read -d '' -r -a url < "$1"
IFS=$'\n' read -d '' -r -a output < "$2"
len=${#url[@]}
for ((i=0;i<$len;i++))
do
wget "${url[$i]}" -O "${output[$i]}"
done
Call:
./script url.txt output_names.txt
Define all the URLs in url.txt and give this a try to see if this is what you need:
for url in $(cat url.txt); do wget "$url" -O "$url.out"; done
If your URLs contain slashes (i.e. path components), this variant replaces each slash with an underscore so the output name is a valid filename:
for url in $(cat url.txt); do wget "$url" -O "$(echo "$url" | sed 's/\//_/g').out"; done
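As an alternative sketch, the same slash-to-underscore substitution can be done with bash parameter expansion instead of sed, reading the file line by line:

while IFS= read -r url; do
    # ${url//\//_} replaces every / in the URL with _
    wget "$url" -O "${url//\//_}.out"
done < url.txt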
What I want to do: find all the products (URLs) which are not redirected.
To get the final URL after redirection I'm using curl command as follows:
curl -Ls -o /dev/null -w %{url_effective} "$URL"
This is working fine. Now I want to iterate over the URLs, find which ones are not redirected, and display them as the program's output. I have the following code:
result=""
productIds=$1
for productId in $(echo $productIds | sed "s/,/ /g")
do
echo "Checking product: $productId";
URL="http://localhost/?go=$productId";
updatedURL=`curl -Ls -o /dev/null -w %{url_effective} "$URL"`
echo "URL : $URL, UpdatedUrl: $updatedURL";
if [ "$URL" == "$updatedURL" ]
then
result="$result$productId,";
fi
done
The curl command works only for the first product. From the 2nd product to the last, URL and updatedURL are always the same. I can't understand why. The productId changes in every iteration, so I don't think it can be anything related to caching.
I've also tried the following variants of the curl call:
updatedURL=$(curl -Ls -o /dev/null -w %{url_effective} "$URL")
updatedURL="$(curl -Ls -o /dev/null -w %{url_effective} "$URL")"
Edit: After trying debug mode and a lot of different approaches, I noticed a pattern: if I manually run the following in a terminal:
curl -Ls -o /dev/null -w %{url_effective} "http://localhost/?go=32123"
then those URLs work fine in the shell script. But if I don't hit them manually first, curl does not work for those products via the shell script either.
Just add #!/bin/bash as the first line of the script. It then produces the required output. The invocation should be like this: bash file.sh 123,456,789,221
Invocation via the Bourne shell, sh file.sh 123,456,789,221, does require code changes. Do let me know if you require that too :)
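A minimal sketch of the layout (file.sh is just the name used in the invocation above; the loop is the one from the question, unchanged):

#!/bin/bash
result=""
productIds=$1
# ... rest of the loop from the question, unchanged ...

Then call it as: bash file.sh 123,456,789,221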
I would suggest changing your loop to something like this:
IFS=, read -ra productIds <<<"$1"
for productId in "${productIds[@]}"; do
    url=http://localhost/?go=$productId
    num_redirects=$(curl -Ls -o /dev/null -w %{num_redirects} "$url")
    if [ "$num_redirects" -eq 0 ]; then
        printf 'productId %s has no redirects\n' "$productId"
    fi
done
This splits the first argument passed to the script into an array, using a comma as the delimiter. The number of redirects is stored in a variable; when that number is zero, the message is printed.
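For example, assuming the script is saved as check.sh (a hypothetical name), it would be invoked the same way as before:

bash check.sh 123,456,789,221

and it prints one line per productId whose URL produced no redirects.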
I have to admit that I can't see anything inherently broken with your original approach so it's possible that there is something extra going on that we're not aware of. If you could provide a reproducible test case then we would be able to help you more effectively.
I have a list of URLs which I would like to feed into wget using --input-file.
However I can't work out how to control the --output-document value at the same time,
which is simple if you issue the commands one by one.
I would like to save each document as the MD5 of its URL.
cat url-list.txt | xargs -P 4 wget
xargs is there because I also want to make use of its max-procs feature (-P) for parallel downloads.
Don't use cat. You can have xargs read from a file. From the man page:
--arg-file=file
-a file
    Read items from file instead of standard input. If you use this
    option, stdin remains unchanged when commands are run. Otherwise,
    stdin is redirected from /dev/null.
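Putting that together with the -P 4 from your question, a sketch of one way to do it (the md5sum-based naming matches what you described; adjust to taste):

# One wget per URL, up to 4 in parallel; each file is named after the MD5 of its URL.
xargs -a url-list.txt -P 4 -I {} sh -c \
    'wget "$1" -O "$(printf %s "$1" | md5sum | cut -d" " -f1)"' _ {}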
How about using a loop?
while read -r line
do
md5=$(echo "$line" | md5sum | cut -d' ' -f1)
wget ... "$line" ... --output-document "$md5" ......
done < url-list.txt
In your question you use -P 4 which suggests you want your solution to run in parallel. GNU Parallel http://www.gnu.org/software/parallel/ may help you:
cat url-list.txt | parallel 'wget {} --output-document "`echo {}|md5sum`"'
You can do that like this :
cat url-list.txt | while read url;
do
wget "$url" -O "$(echo "$url" | md5)";
done
good luck