How to parse the output of `ls -l` into multiple variables in bash? - bash

There are a few answers on this topic already, but pretty much all of them say that it's bad to parse the output of ls -l, and therefore suggest other methods.
However, I'm using ncftpls -l, and so I can't use things like shell globs or find – I think I have a genuine need to actually parse the ls -l output. Don't worry if you're not familiar with ncftpls; its output is in exactly the same format as plain ls -l.
There is a list of files at a public remote ftp directory, and I don't want to burden the remote server by re-downloading each of the desired files every time my cronjob fires. I want to check, for each one of a subset of files within the ftp directory, whether the file exists locally; if not, download it.
That's easy enough, I just use
tdy=`date -u '+%Y%m%d'`_
# Today's files
for i in $(ncftpls 'ftp://theftpserver/path/to/files' | grep ${tdy}); do
    if [ ! -f $i ]; then
        ncftpget "ftp://theftpserver/path/to/files/${i}"
    fi
done
But I came upon the issue that sometimes the cron job will download a file that hasn't finished uploading, and so when it fires next, it skips the partially downloaded file.
So I wanted to add a check to make sure that for each file that I already have, the local file size matches the size of the same file on the remote server.
I was thinking along the lines of parsing the output of ncftpls -l and using awk, something like
for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
...
x=filesize # somehow get the file size and the filename
y=filename # from $i on each iteration and store in variables
...
done
but I can't seem to get both the filename and the filesize from the server into local variables on the same iteration of the loop; $i alternates between $9 and $5 in the awk string with each iteration.
If I could manage to get the filename and filesize into separate variables with each iteration, I could simply use stat -c "%s" $i to get the local size and compare it with the remote size. Then it's a simple ncftpget on each remote file that I don't already have. I tinkered with syncing programs like lftp too, but didn't have much luck and would rather do it this way.
Any help is appreciated!

A for loop splits on any whitespace it sees: space, tab, or newline. So IFS needs to be set before the loop (there are plenty of existing questions about this).
IFS=$'\n' && for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
    echo "$i" | awk '{print $NF}'   # filesize
    echo "$i" | awk '{NF--; print}' # filename
    # filenames may contain spaces, so it is better to take the size from the last awk column
done
A better way, I think, is to use while instead of for:
ls -l | while read i
do
    echo "$i" | awk '{print $9, $5}'
    # split them into separate variables if you want
    x=$(echo "$i" | awk '{print $5}')
    y=$(echo "$i" | awk '{print $9}')
done
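Building on that, here is a minimal sketch for the original problem, assuming ncftpls -l really produces standard ls -l columns (size in field 5, name in field 9) and that GNU stat is available locally. The field names in the read are just placeholders, and name soaks up everything after the timestamp, so filenames with spaces survive:

ncftpls -l 'ftp://theftpserver/path/to/files' | grep "${tdy}" |
while read -r perms links owner group size month day time name; do
    # download if the file is missing locally or its size differs from the remote size
    if [ ! -f "$name" ] || [ "$(stat -c '%s' "$name")" -ne "$size" ]; then
        ncftpget "ftp://theftpserver/path/to/files/${name}"
    fi
done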

Related

while-read loop broken on ssh-command

I have a bash script that moves backup files to a remote location. On a few occasions the temporary HDDs on the remote server had no space left, so I added an md5 check to compare local and remote files.
However, the remote ssh breaks the while loop (i.e. it runs only for the first item listed in the dir_list file).
# populate /tmp/dir_list
(while read dirName
do
    # create archive files for sub-directories
    # populate listA variable with archive-file names
    ...
    for fileName in $listA; do
        scp /PoolZ/__Prepared/${dirName}/$fileName me@server:/archiv/${dirName}/
        md5_local=`md5sum /PoolZ/__Prepared/${dirName}/${fileName} | awk '{ print $1 }'`
        tmpRemoteName=`printf "%q\n" "$fileName"` # some file-names have strange characters
        md5_remote=`ssh me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName | awk '{ print $1 }'`
        if [[ $md5_local == $md5_remote ]]; then
            echo "Checksum of ${fileName}: on local ${md5_local}, on remote ${md5_remote}."
            mv -f /PoolZ/__Prepared/${dirName}/$fileName /PoolZ/__Backuped/${dirName}/
        else
            echo "Checksum of ${fileName}: on local ${md5_local}, on remote ${md5_remote}."
            # write eMail
        fi
    done
done) < /tmp/dir_list
When started, the script prints the same md5 sums for the first directory listed in dir_list. The files are also copied, both locally and remotely, to the expected directories, and then the script quits.
If I remove the line:
md5_remote=`ssh me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName | awk '{ print $1 }'`
then the md5 comparison obviously no longer works, but the whole script goes through the entire list from dir_list.
I also tried to use double-quotes:
md5_remote=`ssh me@server "md5sum /archiv/${dirName}/${tmpRemoteName}" | awk '{ print $1 }'`
but there was no difference (broken dirName-loop).
I went so far as to replace the md5_remote... line with a remote ls command without any shell variables, and eventually I even tried a line that doesn't assign to the md5_remote variable at all, i.e.:
ssh me#server "ls /dir/dir/dir/ | head -n 1"
Every variant that contains an ssh command breaks the while loop. I have no idea why ssh should break a bash loop. Any suggestions are welcome.
I'm plainly stupid. I found the answer on — what a surprise — stackoverflow.com:
ssh breaks out of while-loop in bash
As suggested there, I redirected the ssh command's stdin from /dev/null and it works now:
md5_remote=`ssh me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName < /dev/null | awk '{ print $1 }'`
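For context: ssh inherits the loop's stdin, so it reads the rest of /tmp/dir_list itself and the while loop never sees the remaining lines. Redirecting ssh's stdin from /dev/null (as above) prevents that; ssh's -n option does the same thing, so an equivalent variant would be:

# -n redirects ssh's stdin from /dev/null, so it can't eat the loop's input
md5_remote=`ssh -n me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName | awk '{ print $1 }'`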

Save multiple variables from bash script to text file

I have written a simple bash script to count the number of lines in a collection of text files, storing each line count in a variable using a for loop. I would like to print each variable to the same text file, so that I can access all the line counts at once, from the same file.
My code is:
for f in *Daily.txt; do
    lines=$(cat $f | wc -l);
    lines=$(($num_lines -1));
    echo $lines > /destdrive/linesTally2014.txt;
done
When I run this, the only output I receive is of the final file, not all the other files.
If anyone could help me with this I would really appreciate it. I am new to bash scripting, so please excuse this novice question.
Your > redirection truncates and recreates the file on each iteration. Move the I/O redirection to after the done. Use:
for f in *Daily.txt
do
    echo $(( $(wc -l < $f) - 1))
done > /destdrive/linesTally2014.txt
This avoids the variable; if you have a need for it, you can use a fixed version of the original code (use $lines throughout, instead of using $num_lines once). Note that the code in the question has a UUoC (Useless Use of cat) that this version avoids.
You can avoid the loop with
wc -l *Daily.txt | awk '{ print $1 }' > /destdrive/linesTally2014.txt
or (when you want 1 less)
wc -l *Daily.txt | awk '{ print $1 -1 }' > /destdrive/linesTally2014.txt
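One caveat worth noting: when wc -l is given more than one file it appends a final "total" line, whose count would land in the tally as well. A hedged filter, assuming no input file is literally named total:

wc -l *Daily.txt | awk '$2 != "total" { print $1 - 1 }' > /destdrive/linesTally2014.txt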
The above suggestions are probably better, but the problem you're having with your script is your use of the > for redirection, which overwrites the file. Use >> and it will append to the file.
echo $lines >> /destdrive/linesTally2014.txt
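Putting the two fixes together, here is a hedged sketch of the corrected original loop: keep the variable, use $lines throughout, append with >>, and truncate the tally file once up front so repeated runs don't accumulate old results.

# start with an empty tally file, then append one count per input file
: > /destdrive/linesTally2014.txt
for f in *Daily.txt; do
    lines=$(( $(wc -l < "$f") - 1 ))
    echo "$lines" >> /destdrive/linesTally2014.txt
done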

Bash script working with second column from txt but keep first column in result as relevant

I am trying to write a bash script to ease a process with IP information gathering.
Right now I have made a script which runs through the single column of IP addresses in multiple files, looks up geo and host information, and stores it in a new file.
What would also be nice is a script that generates a result from files with 3 columns: date, time, IP address. The separator is a space.
I tried this and that, but no luck. I am a total newbie :)
This is my original script:
#!/usr/bin/env bash
find *.txt -print0 | while read -d $'\0' file;
do
    for i in $( cat "$file")
    do echo -e "$i,"$( geoiplookup -f "/usr/share/GeoIP/GeoLiteCity.dat" $i | cut -d' ' -f6,8-9)" "$(nslookup $i | grep name | awk '{print $4}')"" >> "res/res-"$file".txt";
    done
done
Input file example
2014-03-06 12:13:27 213.102.145.172
2014-03-06 12:18:24 83.177.253.118
2014-03-25 15:42:01 213.102.155.173
2014-03-25 15:55:47 213.101.185.223
2014-03-26 15:21:43 90.130.182.2
Can you please help me on this?
It's not entirely clear what the current code is attempting to do, but here is a hopefully useful refactoring which could be at least a starting point.
#!/usr/bin/env bash
find *.txt -print0 | while read -d $'\0' file;
do
    while read date time ip; do
        geo=$(geoiplookup -f "/usr/share/GeoIP/GeoLiteCity.dat" "$ip" |
            cut -d' ' -f6,8-9)
        addr=$(nslookup "$ip" | awk '/name/ {print $4}')
        #addr=$(dig +short -x "$ip")
        echo "$ip $geo $addr"
    done <"$file" >"res/res-$file.txt"
done
My copy of nslookup does not output four fields, but I assume that part of your script is correct. The output from dig +short is better suited to machine processing, so maybe switch to that instead. Perhaps geoiplookup also offers an option to output machine-readable results, or maybe there is an alternative interface which does.
I assume it was a mistake that your script would output partially comma-separated, partially whitespace-separated results, so I changed that, too. Maybe you should use CSV or JSON instead if you intend for other tools to be able to read this output.
Trying to generate a file named res/res-$file.txt will only work if file is not in any subdirectory, so I'm guessing you will want to fix that with basename; or perhaps the find loop should be replaced with a simple for file in *.txt instead.
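Since the question's title asks to keep the first columns in the result, note that the refactored loop already reads them into date and time; a one-line tweak to the echo inside the loop would carry them through (shown here as a suggestion, not part of the original answer):

# keep the date and time columns alongside the lookup results
echo "$date $time $ip $geo $addr"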

Using the output of awk as the list of names in a for loop

How can I pass the output of awk to a for file in loop?
for file in awk '{print $2}' my_file; do echo $file done;
my_file contains the names of the files that should be displayed (echoed).
I get just a
>
instead of my normal prompt.
Use backticks or $(...) to substitute the output of a command:
for file in $(awk '{print $2}' my_file)
do
echo "$file"
done
for file in $(awk '{print $2}' my_file); do echo "$file"; done
The notation to use is $(...) or Command Substitution.
for file in $(awk '{print $2}' my_file)
do
echo $file
done
This assumes that you do more in the body of the loop than just echo, since otherwise you could leave the loop out altogether:
awk '{print $2}' my_file
Or, if you miss typing semicolons and don't like to spread code over multiple lines for readability, then you can use:
for file in $(awk '{print $2}' my_file); do echo $file; done
You will also find in (mostly older) code the backticks used:
for file in `awk '{print $2}' my_file`
do
echo $file
done
Quite apart from being difficult to use in the Markdown used to format comments (and questions and answers) on Stack Overflow, the backticks are not as friendly, especially when nested, so you should recognize them and understand them but not use them.
Incidentally, the reason you got the > prompt is that this command line:
for file in awk '{print $2}' my_file; do echo $file done;
is missing a semicolon before the done. The shell was still waiting for the done. Had you typed done and return, you would have seen the output:
awk done
{print $2} done
my_file done
Using backticks or $(awk ...) for command substitution is an acceptable solution for a small number of files; however, consider using xargs for single commands or pipes, or a simple while read ... loop for more complex tasks (it works for simple ones too):
awk '...' | while read -r FILENAME; do
    # do work with each file here using "$FILENAME"
done
This allows processing to start as each filename is produced, instead of waiting for the whole awk script to complete, and it handles a larger set of filenames (you can only pass so many arguments to a for x in ...; do loop). This will typically speed up your scripts and allows the same kinds of operations you would get in a for-in loop, without its limitations.
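For completeness, a minimal sketch of the xargs alternative mentioned above, assuming the names in column 2 contain no whitespace; echo is just a stand-in for whatever single command you want to run once per file:

# run one command per filename taken from column 2 of my_file
awk '{print $2}' my_file | xargs -n 1 echo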

awk execute same command on different files one by one

Hi, I have 30 txt files in a directory, each containing 4 columns.
How can I execute the same command on each file, one by one, and direct the output to a different file for each?
The command I am using is below, but it is being applied to all the files at once and giving a single output. All I want is to process each file one by one and direct the outputs to new files.
start=$1
patterns=''
for i in $(seq -43 -14); do
patterns="$patterns /cygdrive/c/test/kpi/SIGTRAN_Load_$(exec date '+%Y%m%d' --date="-${i} days ${start}")*"; done
cat /cygdrive/c/test/kpi/*$patterns | sed -e "s/\t/,/g" -e "s/ /,/g"| awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}'| sed -e "s/ /0/g"| sort -t, -k1,2> /cygdrive/c/test/kpi/SIGTRAN_Load.csv
Something like this:
for fileName in /path/to/files/foo*.txt
do
    mangleFile "$fileName"
done
will mangle a list of files you give via globbing. If you want to generate the file name patterns as in your example, you can do it like this:
for i in $(seq -43 -14)
do
    for fileName in /cygdrive/c/test/kpi/SIGTRAN_Load_"$(exec date '+%Y%m%d' --date="-${i} days ${start}")"*
    do
        mangleFile "$fileName"
    done
done
This way the code stays much more readable, even if shorter solutions may exist.
mangleFile will then of course be the awk call, or whatever you would like to do with each file.
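For illustration, a hedged sketch of what such a mangleFile could look like here, wrapping the question's pipeline for a single input file so each file gets its own output; the output naming is only an example:

# apply the question's pipeline to one file and write one CSV per input
mangleFile () {
    sed -e "s/\t/,/g" -e "s/ /,/g" "$1" |
        awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}' |
        sed -e "s/ /0/g" | sort -t, -k1,2 > "$1.csv"
}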
Use the following idiom:
for file in *
do
    ./your_shell_script_containing_the_above.sh "$file" > some_unique_id
done
You need to run a loop on all the matching files:
for i in /cygdrive/c/test/kpi/*$patterns; do
    tr '[:space:]\n' ',\n' < "$i" | awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}'| sed -e "s/ /0/g"| sort -t, -k1,2 > "/cygdrive/c/test/kpi/SIGTRAN_Load-$i.csv"
done
PS: I haven't tried to refactor your piped commands much; they could probably be shortened too.
