I have been struggling with this task forever and still can't figure out what went wrong. The program doesn't seem to download ANY PDFs. At the same time, I checked the file that stores the final links - everything is stored correctly. I also checked $PDFURL - it holds the correct values. Any bash fans ready to help?
#!/bin/bash
# create a temporary directory where all the work will be conducted
TMPDIR=$(mktemp -d /tmp/chiheisen.XXXXXXXXXX)
echo "$TMPDIR"
# no arguments given - error
if [ "$#" -eq 0 ]; then
    exit 1
fi
# argument given, but wrong format
URL="$1"
# URL regex
URL_REG='(https?|ftp|file)://[-A-Za-z0-9\+&##/%?=~_|!:,.;]*[-A-Za-z0-9\+&##/%=~_|]'
if [[ ! $URL =~ $URL_REG ]]; then
    exit 1
fi
# go to the directory created
cd "$TMPDIR" || exit 1
# download the html page
curl -s "$1" > htmlfile.html
# grep only links into temp.txt
grep -o -E 'href="([^"#]+)\.pdf"' htmlfile.html | cut -d'"' -f2 > temp.txt
# iterate through lines in the file and try to download
# the pdf files that are there
while read -r PDFURL; do
    # if this is an absolute URL, download the file directly
    if [[ $PDFURL == http* ]]
    then
        curl -s -f -O "$PDFURL"
        err="$?"
        if [ "$err" -ne 0 ]
        then
            echo ERROR "$(basename "$PDFURL")" >&2
        else
            echo "$(basename "$PDFURL")"
        fi
    else
        # update url - it is always relative to the first parameter of the script
        PDFURLU="$1/$(basename "$PDFURL")"
        curl -s -f -O "$PDFURLU"
        err="$?"
        if [ "$err" -ne 0 ]
        then
            echo ERROR "$(basename "$PDFURLU")" >&2
        else
            echo "$(basename "$PDFURLU")"
        fi
    fi
done < temp.txt
# delete the files
rm htmlfile.html
rm temp.txt
P.S. Another minor problem I have just spotted. Maybe the problem is with the regex in the if? I would much rather have something like this there:
if [[ $PDFURL =~ (https?|ftp|file):// ]]
but this doesn't work. There are no unwanted parentheses in it, so why?
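For reference, that construct is valid, but only under bash: [[ ... =~ ... ]] is a bashism and fails when the script is run with a plain /bin/sh. A minimal standalone sketch, with a hypothetical test URL, showing the match under bash:
#!/bin/bash
# quick test of the =~ match; requires bash, a plain sh will choke on [[
PDFURL="https://example.com/paper.pdf"   # hypothetical value
if [[ $PDFURL =~ (https?|ftp|file):// ]]; then
    echo "absolute URL"
else
    echo "relative URL"
fi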
P.P.S. I also ran this script on URLs beginning with http, and it gave the desired output. However, it still doesn't pass the test.
I am a beginner trying to write a script that takes a config file (example below) and sets the rights for users; if a user or group doesn't exist, they get added.
For every line in the file, I cut out the user or the group and check whether they exist.
Right now I only check for users.
#!/bin/bash
function SetRights()
{
    if [[ $# -eq 1 && -f $1 ]]
    then
        for line in $1
        do
            var1=$(cut -d: -f2 $line)
            var2=$(cat /etc/passwd | grep $var1 | wc -l)
            if [[ $var2 -eq 0 ]]
            then
                sudo useradd $var1
            else
                setfacl -m $line
            fi
        done
    else
        echo Enter the correct path of the configuration file.
    fi
}
SetRights $1
The config file looks like this:
u:TestUser:- /home/temp
g:TestGroup:rw /home/temp/testFolder
u:TestUser2:r /home/temp/1234.txt
The output:
grep: TestGroup: No such file or directory
grep: TestUser: No such file or directory
"The useradd help menu"
If you could give me a hint about what I should research, I would be very grateful.
Is it possible to reset var1 and var2? Using unset didn't work for me, and I couldn't find anything saying variables can only be set once.
It's not clear how you are looping over the contents of the file -- if $1 contains the file name, you should not be seeing the errors you report.
But anyway, here is a refactored version which hopefully avoids your problems.
# Avoid Bash-only syntax for function definition
SetRights() {
    # Indent function body
    # Properly quote "$1"
    if [[ $# -eq 1 && -f "$1" ]]
    then
        # Read lines in file
        while read -r acl file
        do
            # Parse out user
            user=${acl#*:}
            user=${user%:*}
            # Avoid useless use of cat
            # Anchor regex correctly
            if ! grep -q "^$user:" /etc/passwd
            then
                # Quote user
                sudo useradd "$user"
            else
                setfacl -m "$acl" "$file"
            fi
        done <"$1"
    else
        # Error message to stderr
        echo Enter the correct path of the configuration file. >&2
        # Signal failure to the caller
        return 1
    fi
}
# Properly quote argument
SetRights "$1"
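To see what the two parameter expansions are doing, here is a quick standalone trace on one hypothetical config line:
# first field of a sample line, as read into 'acl' by the loop above
acl='u:TestUser:-'
user=${acl#*:}    # strip up to the first ':'  -> 'TestUser:-'
user=${user%:*}   # strip from the last ':'    -> 'TestUser'
echo "$user"      # prints: TestUser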
Some weeks ago I found on this site a very useful bash script that downloads images from Google image results (download images from google with command line).
Although the script is quite complicated for me, I made some simple modifications so as not to rename the results, keeping their original names.
However, since last week the script has stopped working... probably Google updated the code or something, and the script's regexes no longer parse the results. I don't know enough about Google's code, web programming, or regular expressions to see what is wrong; I made some educated guesses, but it still didn't work.
My (non-working) tweaked script is this:
#! /bin/bash
# function to create all dirs til file can be made
function mkdirs {
    file="$1"
    dir="/"
    # convert to full path
    if [ "${file##/*}" ]; then
        file="${PWD}/${file}"
    fi
    # dir name of following dir
    next="${file#/}"
    # while not filename
    while [ "${next//[^\/]/}" ]; do
        # create dir if doesn't exist
        [ -d "${dir}" ] || mkdir "${dir}"
        dir="${dir}/${next%%/*}"
        next="${next#*/}"
    done
    # last directory to make
    [ -d "${dir}" ] || mkdir "${dir}"
}
# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift
# parse arguments
count=${1}
shift
query="$*"
[ -z "$query" ] && exit 1 # insufficient arguments
# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'
# construct google link
link="www.google.cz/search?q=${query}\&tbm=isch"
# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | sed 's/.*imgurl=\([^&]*\)\&.*/\1/' | head -n $count | tail -n1)
imagelink="${imagelink%\%*}"
# get file extension (.png, .jpg, .jpeg)
ext=$(echo $imagelink | sed "s/.*\(\.[^\.]*\)$/\1/")
# set default save location and file name, change this!!
dir="$PWD"
file="google image"
# get optional second argument, which defines the file name or dir
if [[ $# -eq 2 ]]; then
    if [ -d "$2" ]; then
        dir="$2"
    else
        file="${2}"
        mkdirs "${dir}"
        dir=""
    fi
fi
# construct image link: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir}/${file}"
# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
    i=0
    while [[ -e "${google_image}(${i})${ext}" ]] ; do
        ((i++))
    done
    google_image="${google_image}(${i})${ext}"
else
    google_image="${google_image}${ext}"
fi
# get actual picture; the tweak: no -O option, so the original name is kept
wget --max-redirect 0 -q "${imagelink}"
# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"
# successful execution, exit code 0
exit 0
One way to investigate: pass the -x option to bash so as to get a trace of your script; that is, change /bin/bash to /bin/bash -x in the shebang, or simply invoke your script with
bash -x <yourscript>
You can also annotate your script with echo commands to track some variables.
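For example, with a hypothetical invocation like the one below, every command is echoed with a + prefix as it runs, so you can see exactly which values the variables receive:
bash -x ./getimages.sh 1 kittens 2> trace.log
# trace.log then contains lines along the lines of:
# + count=1
# + shift
# + query=kittens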
I am trying to do the following in bash:
get my external IP
read first line of a file
compare both values
if it is not the same, delete the file and recreate it with the current address
I really don't know why this fails; all my script does is output my current address and the first line of the file (which, by the way, is simply "asd" for testing).
#!/bin/bash
IP= curl http://ipecho.net/plain
OLD= head -n 1 /Users/emse/Downloads/IP/IP.txt
if [ "$IP" = "$OLD" ]; then
exit
else
rm /Users/emse/Downloads/IP/IP.txt
$IP> /Users/emse/Downloads/IP/IP.txt
exit
fi
Some obvious problems in your script:
Don't put spaces on either side of the equal sign if you want to do an assignment.
You want the output of curl and head, so wrap them in backticks (`).
You want to write $IP into the file, not execute its content as a command, so echo it.
The script becomes:
#!/bin/bash
IP=`curl http://ipecho.net/plain`
OLD=`head -n 1 /Users/emse/Downloads/IP/IP.txt`
if [ "$IP" = "$OLD" ]; then
exit
else
rm /Users/emse/Downloads/IP/IP.txt
echo $IP > /Users/emse/Downloads/IP/IP.txt
exit
fi
Excellent answer, qingbo; just a tad bit of refinement:
#!/bin/bash
IP=`curl http://ipecho.net/plain`
OLD=`head -n 1 /Users/emse/Downloads/IP/IP.txt`
if [ "$IP" != "$OLD" ]; then
echo $IP > /Users/emse/Downloads/IP/IP.txt # > creates/truncates/replaces IP.txt
fi
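As an aside, the $(...) form of command substitution does the same job as backticks and nests more cleanly; the same two assignments as a sketch:
IP=$(curl -s http://ipecho.net/plain)    # -s just silences curl's progress meter
OLD=$(head -n 1 /Users/emse/Downloads/IP/IP.txt)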
I want to have two youtube-dl processes (or as many as possible) running in parallel. Please show me how. Thanks in advance.
#!/bin/bash
#package: youtube-dl axel
#file that contains youtube links
FILE="/srv/backup/temp/youtube.txt"
#number of lines in FILE
COUNTER=`wc -l $FILE | cut -f1 -d' '`
#download destination
cd /srv/backup/transmission/completed
if [[ -s $FILE ]]; then
    while [ $COUNTER -gt 0 ]; do
        #get video link
        URL=`head -n 1 $FILE`
        #get video name
        NAME=`youtube-dl --get-filename -o "%(title)s.%(ext)s" "$URL" --restrict-filenames`
        #real video url
        vURL=`youtube-dl --get-url $URL`
        #remove first link
        sed -i 1d $FILE
        #download file
        axel -n 10 -o "$NAME" $vURL &
        #update number of lines
        COUNTER=`wc -l $FILE | cut -f1 -d' '`
    done
else
    exit 0
fi
This ought to work with GNU Parallel:
cd /srv/backup/transmission/completed
parallel -j0 'axel -n 10 -o $(youtube-dl --get-filename -o "%(title)s.%(ext)s" "{}" --restrict-filenames) $(youtube-dl --get-url {})' :::: /srv/backup/temp/youtube.txt
Learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Solution
You need to run your command in a subshell, i.e. put your command into ( cmd ) &.
Definition
A shell script can itself launch subprocesses. These subshells let the
script do parallel processing, in effect executing multiple subtasks
simultaneously.
Code
For you it will look like this, I guess (I added quotes around $vURL):
( axel -n 10 -o "$NAME" "$vURL" ) &
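A minimal self-contained sketch of the pattern, runnable as-is:
#!/bin/bash
# two subshells run concurrently; 'wait' blocks until both have finished
( sleep 2; echo "slow job done" ) &
( sleep 1; echo "fast job done" ) &
wait
echo "all jobs finished"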
I don't know if it is the best way, but you can define a function and then call it in the background,
something like this:
#!/bin/bash
#package: youtube-dl axel
#file that contains youtube links
FILE="/srv/backup/temp/youtube.txt"
# define a function
download_video() {
    sleep 3
    echo "$1"
}
while read -r line; do
    # call it in background, with &
    download_video "$line" &
done < "$FILE"
The script ends quickly, but the functions still run in the background; after 3 seconds the echoes will show up.
I also used read in a while loop to simplify the file reading.
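If you want the script itself to block until every background download has finished, a wait after the loop does it; a sketch of the same loop:
while read -r line; do
    download_video "$line" &
done < "$FILE"
wait   # returns once all background jobs have exited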
Here's my take on it. By avoiding several commands you should see some minor improvement in speed, though it might not be noticeable. I did add error checking, which can save you time on broken URLs.
#file that contains youtube links
FILE="/srv/backup/temp/youtube.txt"
while read URL ; do
    [ -z "$URL" ] && continue
    #get video name
    if NAME=$(youtube-dl --get-filename -o "%(title)s.%(ext)s" "$URL" --restrict-filenames) ; then
        #real video url
        if vURL=$(youtube-dl --get-url "$URL") ; then
            #download file
            axel -n 10 -o "$NAME" "$vURL" &
        else
            echo "Could not get vURL from $URL"
        fi
    else
        echo "Could not get NAME from $URL"
    fi
done < "$FILE"
By request, here's my proposal for parallelizing the vURL and NAME fetching as well as the download. Note: since the download depends on both vURL and NAME, there is no point in creating three processes; two gives you about the best return. Below I've put the NAME fetch in its own process, but if it turned out that vURL was consistently faster, there might be a small payoff in swapping it with the NAME fetch. (That way the while loop in the download process won't waste even a second sleeping.) Note 2: this is fairly crude and untested; it's just off the cuff and probably needs work. And there's probably a much cooler way in any case. Be afraid...
#!/bin/bash
#file that contains youtube links
FILE="/srv/backup/temp/youtube.txt"

GetName () { # URL, filename
    if NAME=$(youtube-dl --get-filename -o "%(title)s.%(ext)s" "$1" --restrict-filenames) ; then
        # Create a sourceable file with NAME value
        echo "NAME='$NAME'" > "$2"
    else
        echo "Could not get NAME from $1"
    fi
}

Download () { # URL, filename
    if vURL=$(youtube-dl --get-url "$1") ; then
        # Wait to see if GetName's file appears
        timeout=300 # Wait up to 5 minutes, adjust this if needed
        while (( timeout-- )) ; do
            if [ -f "$2" ] ; then
                source "$2"
                rm "$2"
                #download file
                if axel -n 10 -o "$NAME" "$vURL" ; then
                    echo "Download of $NAME from $1 finished"
                    return 0
                else
                    echo "Download of $NAME from $1 failed"
                fi
            fi
            sleep 1
        done
        echo "Download timed out waiting for file $2"
    else
        echo "Could not get vURL from $1"
    fi
    return 1
}

filebase="tempfile${$}_"
filecount=0
while read URL ; do
    [ -z "$URL" ] && continue
    filename="$filebase$filecount"
    [ -f "$filename" ] && rm "$filename" # Just in case
    (( filecount++ ))
    ( GetName "$URL" "$filename" ) &
    ( Download "$URL" "$filename" ) &
done < "$FILE"
I have a cron job that runs a script every day at a specific time. The script converts a large file (about 2GB) in a specific folder. The problem is that my colleague doesn't always put the file in the folder before the time set in the cron job.
Please help me add commands to the script, or write a second script, to:
Check if the file exists in the folder.
If it does, check the file size every minute (I would like to avoid converting a file that is still incoming).
If the file size stays unchanged for 2 minutes, start the conversion script.
Here are the important lines of the script so far:
cd /path-to-folder
for i in *.mpg; do avconv -i "$i" "out-$i.mp4" ; done
Thanks for the help!
NEW CODE AFTER COMMENTS:
There is a file in the folder!
#! /bin/bash
cdate=$(date +%Y%m%d)
dump="/path/folder1"
base=$(ls "$dump")
if [ -n "$file"]
then
    file="$dump/$base"
    size=$(stat -c '%s' "$file")
    count=0
    while sleep 10
    do
        size0=$(stat -c '%s' "$file")
        if [ $size=$size0 ]
        then $((count++))
            count=0
        fi
        if [ $count = 2 ]
        then break
        fi
    done
    # file has been stable for two minutes. Start conversion.
    CONVERSION CODE
fi
MESSAGE IN TERMINAL: Maybe error???
script.sh: 17: script.sh: arithmetic expression: expecting primary: "count++"
file=/work/daily/dump/name_of_dump_file
if [ -f "$file" ]
then
    # size=$(ls -l "$file" | awk '{print $5}')
    size=$(stat -c '%s' "$file")
    count=0
    while sleep 60
    do
        size0=$(stat -c '%s' "$file")
        if [ $size = $size0 ]
        then : $((count++))
        else size=$size0
             count=0
        fi
        if [ $count = 2 ]
        then break
        fi
    done
    # File has been stable for 2 minutes - start conversion
fi
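One note on the : $((count++)) line, since it explains the "expecting primary" message above: the colon is the shell's no-op command (without it, the expanded number would itself be executed as a command), and the ++ operator is an extension that some plain sh implementations such as dash reject. A portable sketch:
# POSIX-portable increment; works where $((count++)) does not
count=0
count=$((count + 1))
echo "$count"   # prints 1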
Given the slightly revised requirements (described in the comments), and assuming that the file names do not contain spaces, newlines, or other similarly awkward characters, you can use:
dump="/work/daily/dump"    # folder 1
base=$(ls "$dump")
if [ -n "$base" ]
then
    file="$dump/$base"
    ...code as before...
    # File has been stable for 2 minutes - start conversion
    dir2="/work/daily/conversion"    # folder 2
    file2="$dir2/$(basename $base .mpg).xyz"
    convert -i "$file" -o "$file2"
    mv "$file" "/work/daily/originals"    # folder 3
    ncftpput other.coast.example.com /work/daily/input "$file2"
    mv "$file2" "/work/daily/converted"    # folder 4
fi
If there's nothing in the folder, the process exits. If you want it to wait until there is a file to convert, then you need a loop around the file test:
while file=$(ls "$dump")
      [ -z "$file" ]
do sleep 60
done
This uses a little-known feature of shell loops; you can stack the commands in the control, but it is the exit status of the last one that controls the loop.
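A tiny runnable illustration of that feature, with made-up values:
# both commands in the control list run on every iteration,
# but only the exit status of the last one decides whether the body runs
i=0
while i=$((i + 1))
      [ "$i" -le 3 ]
do
    echo "iteration $i"
done
# prints: iteration 1, iteration 2, iteration 3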
Well, I finally made some working code as follows:
#!/bin/bash
cdate=$(date +%Y%m%d)
folder1="/path-to-folder1"
cd "$folder1"
while file=$(ls "$folder1")
      [ -z "$file" ]
do sleep 5 && echo "There is no file in the folder at $cdate."
done
echo "There is a file in the folder at $cdate"
size1=$(stat -c '%s' "$file")
echo "The size1 is $size1 at $cdate"
size2=$(stat -c '%s' "$file")
echo "The size2 is $size2 at $cdate"
if [ $size1 = $size2 ]
then
    echo "file is stable at $cdate. Do conversion."
Is the next line the right one to loop the same script???
else sh /home/user/bin/exist-stable.sh
fi
The right code, after the comments below, is:
else exec /home/user/bin/exist-stable.sh
fi