Quick search to find active URLs - bash

I'm trying to use cURL to find active redirections and save the results to a file. I know the URL is active when it redirects at least once to a specific website. So I came up with:
if (( $( curl -I -L "https://mywebpage.com/id=00001&somehashnumber&si=0" | grep -c "/something/" ) > 1 )) ; then echo "https://mywebpage.com/id=00001&somehashnumber&si=0" | grep -o -P 'id=.{0,5}' >> id.txt; else echo 404; fi
And it works, but how to modify it to check id range from 00001 to 99999?

You'll want to wrap the whole operation in a for loop and use a formatted sequence to print the ids you'd like to test. Without knowing too much about the task at hand, I would write something like this to test the ids:
$ for i in $(seq -f "%05g" 1 99999); do curl --silent "example.com/id=$i" --write-out "$i %{response_code}\n" --output /dev/null; done
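To also keep the redirect check from the question and save the matching ids, a minimal sketch (reusing the question's placeholder URL, its "/something/" marker, and the 5-digit id range) could look like this:
for i in $(seq -f "%05g" 1 99999); do
    # follow redirects (-L), fetch headers only (-I), and count header lines mentioning "/something/"
    if (( $(curl -sIL "https://mywebpage.com/id=$i&somehashnumber&si=0" | grep -c "/something/") > 1 )); then
        echo "id=$i" >> id.txt
    fi
done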

Related

Calling bash script from bash script

I have made two programs and I'm trying to call one from the other, but this is what appears on my screen:
cp: cannot stat ‘PerShip/.csv’: No such file or directory
cp: target ‘tmpship.csv’ is not a directory
I don't know what to do. Here are the programs. Could somebody help me, please?
#!/bin/bash
shipname=$1
imo=$(grep "$shipname" shipsNAME-IMO.txt | cut -d "," -f 2)
cp PerShip/$imo'.csv' tmpship.csv
dist=$(octave -q ShipDistance.m 2>/dev/null)
grep "$shipname" shipsNAME-IMO.txt | cut -d "," -f 2 > IMO.txt
idnumber=$(cut -b 4-10 IMO.txt)
echo $idnumber,$dist
#!/bin/bash
rm -f shipsdist.csv
for ship in $(cat shipsNAME-IMO.txt | cut -d "," -f 1)
do
./FindShipDistance "$ship" >> shipsdist.csv
done
cat shipsdist.csv | sort | head -n 1
The code and error messages presented suggest that the second script is calling the first with an empty command-line argument. That would certainly happen if the input file shipsNAME-IMO.txt contained any empty lines or otherwise any lines with an empty first field. An empty line at the beginning or end would do it.
I suggest
using the read command to read the data, and manipulating IFS to parse out comma-delimited fields
validating your inputs and other data early and often
making your scripts behave more pleasantly in the event of predictable failures
More generally, using internal Bash features instead of external programs where the former are reasonably natural.
For example:
#!/bin/bash
# Validate one command-line argument
[[ -n "$1" ]] || { echo empty ship name 1>&2; exit 1; }
# Read and validate an IMO corresponding to the argument
IFS=, read -r dummy imo tail < <(grep -F -- "$1" shipsNAME-IMO.txt)
[[ -f PerShip/"${imo}.csv" ]] || { echo no data for "'$imo'" 1>&2; exit 1; }
# Perform the distance calculation and output the result
cp PerShip/"${imo}.csv" tmpship.csv
dist=$(octave -q ShipDistance.m 2>/dev/null) ||
  { echo "failed to compute ship distance for '${imo}'" 1>&2; exit 1; }
echo "${imo:3:7},${dist}"
and
#!/bin/bash
# Note: the original shipsdist.csv will be clobbered
while IFS=, read -r ship tail; do
    # Ignore any empty ship name, however it might arise
    [[ -n "$ship" ]] && ./FindShipDistance "$ship"
done < shipsNAME-IMO.txt |
    tee shipsdist.csv |
    sort |
    head -n 1
Note that making the while loop in the second script part of a pipeline will cause it to run in a subshell. That is sometimes a gotcha, but it won't cause any problem in this case.
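For illustration (a hypothetical snippet, not part of the original answer), a variable modified inside a piped while loop is lost once the loop ends:
count=0
printf 'a\nb\n' | while read -r x; do count=$((count+1)); done
echo "$count"   # prints 0: the loop ran in a subshell, so the increment never reached the parent shell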

How to run multiple curl requests in parallel with multiple variables

Set Up
I currently have the below script working to download files with curl, using a ref file with multiple variables. When I created the script it suited my needs; however, as the ref file has grown and the data I am requesting via curl takes longer to generate, my script is now taking too long to complete.
Objective
I want to update this script so that curl requests and downloads multiple files as they become ready, as opposed to waiting for each file to be requested and downloaded sequentially.
I've had a look around and seen that I could use either xargs or parallel to achieve this; however, based on the past questions, YouTube videos and other forum posts I've seen, I haven't been able to find an example that explains whether this is possible using more than one variable.
Can someone confirm whether this is possible and which tool is better suited to achieve it? Is my current script in the right configuration, or do I need to amend a lot of it to shoehorn these commands in?
I suspect this may be a question that's been asked previously and I may have just not found the right one.
account-list.tsv
client1 account1 123 platform1 50
client2 account1 234 platform1 66
client3 account1 344 platform1 78
client3 account2 321 platform1 209
client3 account2 321 platform2 342
client4 account1 505 platform1 69
download.sh
#!/bin/bash
set -eu
user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
cheese=$(pwd)
curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name=$user -d passwd=$pwd
while true; do
while IFS=$' ' read -r client account accountid platform platformid
do
curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
done < account-list.tsv
[ "$curr" \< "$D1" ] || break
curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grade data for past date ranges.
done
exit
Using GNU Parallel it looks something like this to fetch 100 entries in parallel:
#!/bin/bash
set -eu
user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
cheese=$(pwd)
curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name=$user -d passwd=$pwd
fetch_one() {
    client="$1"
    account="$2"
    accountid="$3"
    platform="$4"
    platformid="$5"
    curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
    curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
}
export -f fetch_one
export curr   # parallel runs fetch_one in child shells, so curr must be exported for them to see it
while true; do
cat account-list.tsv | parallel -j100 --colsep '\t' fetch_one
[ "$curr" \< "$D1" ] || break
curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grade data for past date ranges.
done
exit
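Since the question also asks about xargs, roughly the same thing can be done with GNU xargs and its -P option. This is only a sketch under the same assumptions as above (fetch_one and curr are exported, and the fields in account-list.tsv contain no embedded whitespace); it would replace the parallel line inside the while loop:
# take 5 whitespace-separated fields per invocation (one line of the file) and run up to 100 jobs at a time;
# the fields become "$1".."$5" of fetch_one inside the child shell
xargs -a account-list.tsv -n 5 -P 100 bash -c 'fetch_one "$@"' _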
One (relatively) easy way to run several processes in parallel is to wrap the guts of the call in a function and then call the function inside the while loop, making sure to put the function call in the background, eg:
# function definition
docurl () {
    curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
    curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
}
# call the function within OP's inner while loop
while true; do
    while IFS=$' ' read -r client account accountid platform platformid
    do
        docurl &    # put the function call in the background so we can continue loop processing while the function call is running
    done < account-list.tsv
    wait            # wait for all background calls to complete
    [ "$curr" \< "$D1" ] || break
    curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grade data for past date ranges.
done
One issue with this approach is that a large volume of curl calls may bog down the underlying system and/or cause the remote system to reject 'too many' concurrent calls. In that case it will be necessary to limit the number of concurrent curl calls.
One idea would be to keep a counter of the number of currently running (backgrounded) curl calls and when we hit a limit we wait for a background process to complete before spawning a new one, eg:
max=5   # limit of 5 concurrent/backgrounded calls
ctr=0
while true; do
    while IFS=$' ' read -r client account accountid platform platformid
    do
        docurl &
        ctr=$((ctr+1))
        if [[ "${ctr}" -ge "${max}" ]]
        then
            wait -n         # wait for a background process to complete
            ctr=$((ctr-1))
        fi
    done < account-list.tsv
    wait                    # wait for last ${ctr} background calls to complete
    [ "$curr" \< "$D1" ] || break
    curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grade data for past date ranges.
done

Bash, loop unexpected stop

I'm having problems with this last part of my bash script. It receives input from 500 web addresses and is supposed to fetch the server information from each. It works for a bit but then just stops at around the 45th element. Any thoughts on my loop at the end?
#initializing variables
timeout=5
headerFile="lab06.output"
dataFile="fortune500.tsv"
dataURL="http://www.tech.mtu.edu/~toarney/sat3310/lab09/"
dataPath="/home/pjvaglic/Documents/labs/lab06/data/"
curlOptions="--fail --connect-timeout $timeout"
#creating the array
declare -a myWebsitesarray
#obtaining the data file
wget $dataURL$dataFile -O $dataPath$dataFile
#getting rid of the crap from dos
sed -n "s/^m//" $dataPath$dataFile
readarray -t myWebsitesarray < <(cut -f3 -d$'\t' $dataPath$dataFile)
myWebsitesarray=("${myWebsitesarray[@]:1}")
websitesCount=${#myWebsitesarray[*]}
echo "There are $websitesCount websites in $dataPath$dataFile"
#echo -e ${myWebsitesarray[200]}
#printing each line in the array
for line in ${myWebsitesarray[*]}
do
echo "$line"
done
#run each website URL and gather header information
for line in "${myWebsitearray[#]}"
do
((count++))
echo -e "\\rPlease wait... $count of $websitesCount"
curl --head "$curlOptions" "$line" | awk '/Server: / {print $2 }' >> $dataPath$headerFile
done
#display results
echo "Results: "
sort $dataPath$headerFile | uniq -c | sort -n
It would certainly help if you actually passed the --connect-timeout option to curl. As written, you are currently passing the single argument --fail --connect-timeout $timeout rather than 3 distinct arguments --fail, --connect-timeout, and $timeout. This is one instance where you should not quote the variable. IOW, use:
curl --head $curlOptions "$line"
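Alternatively (a sketch, not from the original answer), you can keep the options in a bash array; each element then expands as its own argument even when quoted:
# store the options as separate array elements so they are passed to curl as distinct arguments
curlOptions=(--fail --connect-timeout "$timeout")
curl --head "${curlOptions[@]}" "$line" | awk '/Server: / {print $2 }' >> "$dataPath$headerFile"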

Efficiently find PIDs of many processes started by services

I have a file with many service names, some of them are running, some of them aren't.
foo.service
bar.service
baz.service
I would like to find an efficient way to get the PIDs of the running processes started by the services (for the ones that aren't running, a 0, -1, or empty result is valid).
Desired output example:
foo.service:8484
bar.service:
baz.service:9447
(bar.service isn't running).
So far I've managed to do the following: (1)
cat t.txt | xargs -I {} systemctl status {} | grep 'Main PID' \
| awk '{print $3}'
With the following output:
8484
9447
But I can't tell which service every PID belongs to.
(I'm not bound to use xargs, grep or awk.. just looking for the most efficient way).
So far I've managed to do the following: (2)
for f in `cat t.txt`; do
v=`systemctl status $f | grep 'Main PID:'`;
echo "$f:`echo $v | awk '{print \$3}'`";
done;
-- this gives me my desired result. Is it efficient enough?
I ran into a similar problem and found a leaner solution:
systemctl show --property MainPID --value $SERVICE
returns just the PID of the service, so your example can be simplified down to
for f in `cat t.txt`; do
echo "$f:`systemctl show --property MainPID --value $f`";
done
You could also do:
while read -r line; do
    statuspid="$(sudo service $line status | grep -oP '(?<=(process|pid)\s)[0-9]+')"
    appendline=""
    [[ -n $statuspid ]] && appendline="${line}:${statuspid}" || appendline="${line}:"
    echo "$appendline" >> services-pids.txt
done < services.txt
To use within a variable, you could also have an associative array:
declare -A servicearray=()
while read -r line; do
    statuspid="$(sudo service $line status | grep -oP '(?<=(process|pid)\s)[0-9]+')"
    [[ -n $statuspid ]] && servicearray[$line]="$statuspid"
done < services.txt
# Echo output of array to command line
for i in "${!servicearray[@]}"; do  # Iterating over the keys of the associative array
    # Note key-value syntax
    echo "service: $i | pid: ${servicearray[$i]}"
done
Making it more efficient:
List all processes with their command names and PIDs once; this may give us more than one PID per command name, which might be useful:
ps -eo pid,comm
So:
psidcommand=$(ps -eo pid,comm)
while read -r line; do
    # Get all PIDs whose command name is $line
    statuspids=$(echo "$psidcommand" | grep -oP "[0-9]+(?=\s$line)")
    # Note that ${statuspids//$'\n'/,} replaces the newlines between PIDs with commas
    [[ -n $statuspids ]] && appendline="${line}:${statuspids//$'\n'/,}" || appendline="${line}:"
    echo "$appendline" >> services-pids.txt
done < services.txt
OUTPUT:
kworker:5,23,28,33,198,405,513,1247,21171,22004,23749,24055
If you're confident your file has the full name of the process, you can replace the:
grep -oP "[0-9]+(?=\s$line)"
with
grep -oP "[0-9]+(?=\s$line$)" # Note the extra "$" at the end of the lookahead
to make sure it's an exact match (in the grep without trailing $, line "mys" would match with "mysql"; in the grep with trailing $, it would not, and would only match "mysql").
Building on Yorik.sar's answer, you first want to get the MainPID of a service, like so:
for SERVICE in ...<service names>...
do
    MAIN_PID=`systemctl show --property MainPID --value $SERVICE`
    if test ${MAIN_PID} != 0
    then
        ALL_PIDS=`pgrep -g $MAIN_PID`
        ...
    fi
done
So using systemctl gives you the PID of the main process controlled by your daemon. Then the pgrep gives you the daemon and a list of all the PIDs of the processes that daemon started.
Note: if the processes are user processes, you have to use the --user on the systemctl command line for things to work:
MAIN_PID=`systemctl --user show --property MainPID --value $SERVICE`
Now you have the data you are interested in, in the MAIN_PID and ALL_PIDS variables, so you can print the results like so:
if test -n "${ALL_PIDS}"
then
    echo "${SERVICE}: ${ALL_PIDS}"
fi
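Putting the pieces together, a minimal sketch that prints the exact service:PID format from the question (assuming one service name per line in t.txt, as in the earlier loops) might look like:
while read -r svc; do
    pid=$(systemctl show --property MainPID --value "$svc")
    [[ $pid == 0 ]] && pid=""    # MainPID is reported as 0 when the unit is not running
    printf '%s:%s\n' "$svc" "$pid"
done < t.txt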

shell if statement always returning true

I want to check if my VPN is connected to a specific country. The VPN client has a status option but sometimes it doesn't return the correct country, so I wrote a script to check if I'm for instance connected to Sweden. My script looks like this:
#!/bin/bash
country=Sweden
service=expressvpn
while true; do
if ((curl -s https://www.iplocation.net/find-ip-address | grep $country | grep -v "grep" | wc -l) > 0 )
then
echo "$service connected!!!"
else
echo "$service not connected!"
$service connect $country
fi;
sleep 5;
done
The problem is, it always says "service connected", even when it isn't. When I enter the curl command manually, wc -l returns 0 if it didn't find Sweden and 1 when it does. What's wrong with the if statement?
Thank you
Peter
(( )) enters a math context -- anything inside it is interpreted as a mathematical expression. (You want your code to be interpreted as a math expression -- otherwise, > 0 would be creating a file named 0 and storing wc -l's output in that file, not comparing the output of wc -l to 0).
Since you aren't using )) on the closing side, this is presumably exactly what's happening: You're storing the output of wc -l in a file named 0, and then using its exit status (successful, since it didn't fail) to decide to follow the truthy branch of the if statement. [Just adding more parens on the closing side won't fix this, either, since curl -s ... isn't valid math syntax].
Now, if you want to go the math approach, what you can do is run a command substitution, which replaces the command with its output; that is a math expression:
# smallest possible change that works -- but don't do this; see other sections
if (( $(curl -s https://www.iplocation.net/find-ip-address | grep $country | grep -v "grep" | wc -l) > 0 )); then
...if your curl | grep | grep | wc becomes 5, then after the command substitution this looks like:
if (( 5 > 0 )); then
...and that does what you'd expect.
That said, this is silly. You want to know if your target country is in curl's output? Just check for that directly with shell builtins alone:
if [[ $(curl -s https://www.iplocation.net/find-ip-address) = *"$country"* ]]; then
echo "Found $country in output of curl" >&2
fi
...or, if you really want to use grep, use grep -q (which suppresses output), and check its exit status (which is zero, and thus truthy, if and only if it successfully found a match):
if curl -s https://www.iplocation.net/find-ip-address | grep -q -e "$country"; then
echo "Found $country in output of curl with grep" >&2
fi
This is more efficient in part because grep -q can stop as soon as it finds a match -- it doesn't need to keep reading more content -- so if your file is 16KB long and the country name is in the first 1KB of output, then grep can stop reading from curl (and curl can stop downloading) as soon as that first match 1KB in is seen.
The result of the curl -s https://www.iplocation.net/find-ip-address | grep $country | grep -v "grep" | wc -l pipeline is text. You are comparing text with a number, which is why your if statement does not work.
This might solve your problem;
if [ "$(curl -s https://www.iplocation.net/find-ip-address | grep $country | grep -v "grep" | wc -l)" == "0" ]; then ...
That worked, thank you for your help. This is what my script looks like now:
#!/bin/bash
country=Switzerland
service=expressvpn
while true; do
if curl -s https://www.iplocation.net/find-ip-address | grep -q -e "$country"; then
echo "Found $country in output of curl with grep" >&2
echo "$service not connected!!!"
$service connect Russia
else
echo "$service connected!"
fi;
sleep 5;
done

Resources