Simple bash script using "set" command - bash

I am supposed to make a script that prints all sizes and file-names in the current directory, ordered by size, using the "set" command.
#!/bin/bash
touch /tmp/unsorted
IFS='#'
export IFS
ls -l | tr -s " " "#" | sed '1d' > /tmp/tempLS
while read line
do
##set probably goes here##
echo $5 $9 >> /tmp/unsorted
done < /tmp/tempLS
sort -n /tmp/unsorted
rm -rf /tmp/unsorted
By logic, this is the script that should work, but it produces only blank lines.
After discussion with my classmates, we think that the "set" command must go first in the while loop. The problem is that we can't understand what the "set" command does or how to use it. Please help. Thank you.

ls -l | while read -r line; do
set -- $line
echo $5 $9
done | sort -n
or simply
ls -l | awk '{print $5, $9}' | sort -n

set changes shell options and positional parameters. With options, it adjusts your current environment for specific situations, for example, to adjust the current globbing rules. With arguments, it replaces the positional parameters $1, $2, ... with the words that follow, which is what the loop above relies on.
Sometimes it is necessary to adjust the environment in a script so that an option is set correctly later on. Since the script runs in its own shell process, the options you adjust will have no effect outside of the script.
This link has a vast amount of info on the various commands and options available.
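As a minimal sketch of the positional-parameter use of set, assuming the usual nine-column `ls -l` layout: the function name `size_and_name` and the sample line are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print fields 5 and 9 of one `ls -l`-style line.
size_and_name() {
    set -f        # option use of set: turn globbing off so a literal * is not expanded
    set -- $1     # argument use of set: the unquoted line is split into $1, $2, ...
    set +f        # turn globbing back on
    printf '%s %s\n' "$5" "$9"
}

# Sample input, not real output from any system.
size_and_name '-rw-r--r-- 1 alice staff 1234 Jan  5 10:00 notes.txt'
```

With the default IFS the sample line splits on runs of whitespace, so field 5 is the size and field 9 the name. A file name containing spaces would spill into $10 and beyond.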

Related

give a file without changing the name in script [duplicate]

I start with a file, file.txt, which contains several pieces of information that I extract using the grep command, as you can see in the script.
What I want is to give the script the file to work on instead of file.txt, without editing the file name in the script each time. For example, if the file is named Me.txt, I don't want to go into the script and write Me.txt in each grep command, especially if I have dozens of commands.
Is there a way to do this?
#!/bin/bash
grep teste file.txt > testline.txt
awk '{print $2}' testline.txt > test.txt
echo '#'
echo '#'
grep remote file.txt > remoteline.txt
awk '{print $3}' remoteline.txt > remote.txt
echo '#'
echo '#'
grep adresse file.txt > adresseline.txt
awk '{print $2}' adresseline.txt > adresse.txt
Using a parameter, as many contributors here suggested, is of course the obvious approach, and the one usually taken in such cases, so I want to extend this idea:
If you do it naively as
filename=$1
you have to supply the name on every invocation. You can improve on this by providing a default value for the case the parameter is missing:
filename=${1:-file.txt}
But sometimes, while working on a specific task, you need the same filename over and over for a while, and the default value happens not to be the one you need. Another way to pass information to a program is via the environment. If you set the filename by
filename=${MOOFOO:-file.txt}
it means that - assuming your script is called myscript.sh - if you invoke your script by
MOOFOO=myfile.txt myscript.sh
it uses myfile.txt, while if you call it by
myscript.sh
it uses the default file.txt. You can also set MOOFOO in your shell, as
export MOOFOO=myfile.txt
and then, even a lone execution of
myscript.sh
will use myfile.txt instead of the default file.txt.
The most flexible approach is to combine both, and this is what I often do in such a situation. If you write in your script
filename=${1:-${MOOFOO:-file.txt}}
it takes the name from the 1st parameter, but if there is no parameter, takes it from the variable MOOFOO, and if this variable is also undefined, uses file.txt as the last fallback.
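The fallback order can be checked with a small sketch. The function `pick_filename` is a hypothetical wrapper around the same expansion, used here only so the three cases can be run side by side.

```shell
#!/bin/bash
# Hypothetical helper wrapping filename=${1:-${MOOFOO:-file.txt}}
pick_filename() {
    printf '%s\n' "${1:-${MOOFOO:-file.txt}}"
}

unset MOOFOO
pick_filename            # no argument, no variable: prints file.txt
MOOFOO=myfile.txt
pick_filename            # variable only: prints myfile.txt
pick_filename Me.txt     # argument given: prints Me.txt
```

Because `:-` is used rather than `-`, an empty argument or an empty MOOFOO also falls through to the next default.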
You should pass the filename as a command line parameter so that you can call your script like so:
script <filename>
Inside the script, you can access the command line parameters in the variables $1, $2,.... The variable $# contains the number of command line parameters passed to the script, and the variable $0 contains the path of the script itself.
As with all variables, you can choose to put the variable name in curly brackets which has advantages sometimes: ${1}, ${2}, ...
#!/bin/bash
if [ $# = 1 ]; then
filename=${1}
else
echo "USAGE: $(basename "${0}") <filename>" >&2
exit 1
fi
grep teste "${filename}" > testline.txt
awk '{print $2}' testline.txt > test.txt
echo '#'
echo '#'
grep remote "${filename}" > remoteline.txt
awk '{print $3}' remoteline.txt > remote.txt
echo '#'
echo '#'
grep adresse "${filename}" > adresseline.txt
awk '{print $2}' adresseline.txt > adresse.txt
By the way, you don't need two different files to achieve what you want, you can just pipe the output of grep straight into awk, e.g.:
grep teste "${filename}" | awk '{print $2}' > test.txt
but then again, awk can do the regex match itself, reducing it all to just one command:
awk '/teste/ {print $2}' "${filename}" > test.txt
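Going one step further, a single awk pass could write all three result files. This is only a sketch: the function name `extract_fields` and its output-directory parameter are assumptions added for illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: $1 is the input file, $2 the directory for the results.
extract_fields() {
    awk -v dir="$2" '
        /teste/   { print $2 > (dir "/test.txt") }
        /remote/  { print $3 > (dir "/remote.txt") }
        /adresse/ { print $2 > (dir "/adresse.txt") }
    ' "$1"
}
```

It would be called as `extract_fields Me.txt .` to write into the current directory. One difference from the grep version: awk only creates an output file when at least one line matches, whereas the shell redirection always creates it, possibly empty.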

How to parse the output of `ls -l` into multiple variables in bash?

There are a few answers on this topic already, but pretty much all of them say that it's bad to parse the output of ls -l, and therefore suggest other methods.
However, I'm using ncftpls -l, and so I can't use things like shell globs or find – I think I have a genuine need to actually parse the ls -l output. Don't worry if you're not familiar with ncftpls, the output returns in exactly the same format as if you were just using ls -l.
There is a list of files at a public remote ftp directory, and I don't want to burden the remote server by re-downloading each of the desired files every time my cronjob fires. I want to check, for each one of a subset of files within the ftp directory, whether the file exists locally; if not, download it.
That's easy enough, I just use
tdy=`date -u '+%Y%m%d'`_
# Today's files
for i in $(ncftpls 'ftp://theftpserver/path/to/files' | grep ${tdy}); do
if [ ! -f $i ]; then
ncftpget "ftp://theftpserver/path/to/files/${i}"
fi
done
But I came upon the issue that sometimes the cron job will download a file that hasn't finished uploading, and so when it fires next, it skips the partially downloaded file.
So I wanted to add a check to make sure that for each file that I already have, the local file size matches the size of the same file on the remote server.
I was thinking along the lines of parsing the output of ncftpls -l and using awk, something like
for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
...
x=filesize # somehow get the file size and the filename
y=filename # from $i on each iteration and store in variables
...
done
but I can't seem to get both the filename and the filesize from the server into local variables on the same iteration of the loop; $i alternates between $9 and $5 in the awk string with each iteration.
If I could manage to get the filename and filesize into separate variables with each iteration, I could simply use stat -c "%s" $i to get the local size and compare it with the remote size. Then it's a simple ncftpget on each remote file that I don't already have. I tinkered with syncing programs like lftp too, but didn't have much luck and would rather do it this way.
Any help is appreciated!
A for loop over $(...) splits the output on any whitespace (space, tab, or newline), which is why $i alternates between the two fields. So IFS needs to be set to a newline before the loop (there are a lot of questions about ...)
IFS=$'\n' && for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
echo $i | awk '{print $NF}' # filesize
echo $i | awk '{NF--; print}' # filename
# you may have spaces in filenames, so is better to use last column for awk
done
I think the better way is to use while rather than for:
ls -l | while read -r i
do
echo "$i" | awk '{print $9, $5}'
# split them if you want
x=$(echo "$i" | awk '{print $5}') # filesize
y=$(echo "$i" | awk '{print $9}') # filename
done
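Since read splits each line on IFS and assigns the words to the variables it is given, the awk calls can be avoided by naming one variable per column. A minimal sketch, assuming the nine-column `ls -l` layout; the function name `list_sizes` and the sample listing are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read `ls -l`-style lines on stdin, print "size<TAB>name".
list_sizes() {
    local perms links owner group size month day time name
    while read -r perms links owner group size month day time name; do
        # Skip lines without a ninth field, such as the "total N" header.
        [ -n "$name" ] || continue
        printf '%s\t%s\n' "$size" "$name"
    done
}

# Sample listing, not real server output.
printf '%s\n' 'total 8' \
    '-rw-r--r-- 1 ftp ftp 2048 Jan  5 10:00 20240105_a file.dat' | list_sizes
```

The last variable receives the rest of the line, so a file name with spaces stays in one piece. In the loop body, "$size" could then be compared with the output of stat -c "%s" on the local copy.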

Loop over list of files to merge according their names

Files in the directory look like that:
A_1_email.txt
A_1_phone.txt
A_2_email.txt
A_2_phone.txt
B_1_email.txt
B_1_phone.txt
B_2_email.txt
B_2_phone.txt
What I want:
To merge files A_1_email.txt and A_1_phone.txt; to merge files B_1_email.txt and B_1_phone.txt and so on.
What I mean by that: if the first two fields of the file names match (for example A with A and 1 with 1), then merge the files.
How I tried to do this:
ls * | cut -d "_" -f 1-2 | sort | uniq -c | awk '{print $2}' > names && for name in $(cat names); do
And I am lost here, really don't know how should I go on further.
The following are based on @MichaelJ.Barber's answer (which had the excellent idea of using join), but with the specific intention to avoid the dangerous practice of parsing the output of ls:
# Simple loop: avoids subshells, pipelines
for file in *_email.txt; do
if [[ -r "$file" && -r "${file%_*}_phone.txt" ]]; then
join "$file" "${file%_*}_phone.txt"
fi
done
or
##
# Use IFS and a function to avoid contaminating the global environment.
joinEmailPhone() {
local IFS=$'\n'
local -x LC_COLLATE=C # Ensure glob expansion sorting makes sense.
# According to `man (1) bash`, globs expand sorted "alphabetically".
# If we use LC_COLLATE=C, we don't need to sort again.
# Use an awk test (!seen[$0]++) to ensure uniqueness and a parameter expansion instead of cut
awk '!seen[$0]++{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }' <<< "${*%_*}" | sh
}
joinEmailPhone *
But in all probability (again assuming LC_COLLATE=C) you can get away with:
printf 'join %s %s\n' * | sh
I'll assume that the files all have tab-separated name-value pairs, where the value is email or phone as appropriate. If that's not the case, do some pre-sorting or otherwise modify as appropriate.
ls *_{email,phone}.txt |
cut -d "_" -f1-2 | # could also do this with variable expansion
sort -u |
awk '{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }' |
sh
What this does is to identify the unique prefixes for the files and use 'awk' to generate shell commands for joining the pairs, which are then piped into sh to actually run the commands.
You may use printf '%s\n' *_{email,phone}.txt | ... instead of ls *_... in the given scenario, i.e. when no newline characters are to be expected in file names. At least one external command less!
Use a for loop to iterate over the email files, using the read command
with the proper value of IFS to split the file name into the necessary parts.
Note that this does use one non-POSIX feature that bash provides, namely
using a here-string (<<<) to pass the value of $email to the read command.
for email in *_email.txt; do
IFS=_ read -r fst snd rest <<< "$email" # rest receives the remainder, "email.txt"
phone=${fst}_${snd}_phone.txt
# merge $email and $phone
done
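One detail of read matters here: the last variable named receives everything left on the line, so a file name with three underscore-separated parts needs a third variable to absorb the tail. A small sketch, with `phone_for` as a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: derive the phone file name from an email file name.
phone_for() {
    local fst snd rest
    # With IFS=_ the name A_1_email.txt splits into A, 1 and email.txt.
    IFS=_ read -r fst snd rest <<< "$1"
    printf '%s_%s_phone.txt\n' "$fst" "$snd"
}

phone_for A_1_email.txt    # prints A_1_phone.txt
```

Setting IFS on the same line as read limits the change to that one command, so the rest of the script keeps the default word splitting.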

Reading a file line by line in ksh

We use a package called Autosys, which comes with some specific commands. I have a list of variables which I would like to pass to one of the Autosys commands one by one.
For example, one such variable is var1, and using this var1 I would like to launch a command like this
autosys_showJobHistory.sh var1
Now when I launch the below written command, it gives me the desired output.
echo "var1" | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
But if i put the var1 in a file say Test.txt and launch the same command using cat, it gives me nothing. I have the impression that command autosys_showJobHistory.sh does not work in that case.
cat Test.txt | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
What am I doing wrong in the second command?
Wrote all of below, and then noticed your grep statement.
Recall that ksh doesn't support .. as an indicator for 'expand this range of values'. (I assume that's your intent.) It's also made ambiguous by your lack of quoting the arguments to grep. If you were using syntax that the shell would convert, you wouldn't really know what regular expression is being sent to grep. It's always better to quote arguments unless you know for sure that you need the unquoted values. Try rewriting as
grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012'
Also, are you deliberately using the 'match any char' operator '.' OR do you want to only match a period char? If you want to only match a period, then you need to escape it like \..
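The difference between these patterns can be tried out in isolation. The following is a sketch with a hypothetical helper `matches` and invented test strings; real autosys_showJobHistory.sh output may look different.

```shell
#!/bin/sh
# Hypothetical helper: $1 is a regular expression, $2 the text to test.
matches() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

matches '1[1..6]:[0..9][0..9]' '12:34'     # no:  [1..6] means one of 1, . or 6
matches '1[1-6]:[0-9][0-9]'    '12:34'     # yes: [1-6] and [0-9] are ranges
matches '24.12.2012'   '24x12y2012'        # yes: unescaped . matches any character
matches '24\.12\.2012' '24x12y2012'        # no:  \. matches only a literal period
matches '24\.12\.2012' '24.12.2012'        # yes
```

Single quotes keep the shell from touching the brackets and backslashes, so grep receives exactly the pattern shown.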
Finally, if any of the files you're processing were created on a Windows machine and then transferred to Unix/Linux, it is very likely that the line endings (Ctrl-M Ctrl-J, i.e. \r\n) are causing you problems. Clean up your PC-based files (or anything that was sent via ftp) with dos2unix file [file2 ...].
If the above doesn't help, you'll have to "divide and conquer" to debug your problem.
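Where dos2unix is not installed, the carriage returns can also be removed with tr. This is a swapped-in alternative rather than the tool recommended above, and `strip_cr` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: delete every carriage return from stdin.
strip_cr() {
    tr -d '\r'
}

# A Windows-style line: "var1" followed by \r\n.
printf 'var1\r\n' | strip_cr
```

It would be used as `strip_cr < Test.txt > Test.unix.txt`, writing to a new file because redirecting onto the input file would truncate it first.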
When I did the following tests, I got the expected output
$ echo "var1" | while read line ; do print "line=${line}" ; done
line=var1
$ vi Test.txt
$ cat Test.txt
var1
$ cat Test.txt | while read line ; do print "line=${line}" ; done
line=var1
Unrelated to your question, but certain to cause comment, is your use of the cat command in this context, which will bring you the UUOC (Useless Use of cat) award. That can be rewritten as
while read line ; do print "line=${line}" ; done < Test.txt
But to solve your problem, now turn on the shell debugging/trace options, either by changing the top line of the script (the shebang line) like
#!/bin/ksh -vx
Or by using a matched pair to track the status on just these lines, i.e.
set -vx
while read line; do
print -u2 -- "#dbg: Line=${line}XX"
autosys_showJobHistory.sh $line \
| grep '1[1-6]:[0-9][0-9]' \
| grep '24.12.2012' \
| tail -1
done < Test.txt
set +vx
I've added an extra debug step, the print -u2 -- .... (-u2 writes to stderr; -- ends option processing for print.)
Now you can make sure no extra space or tab chars are creeping in, by looking at that output.
They shouldn't matter, as you have left your $line unquoted. As part of your testing, I'd recommend quoting it like "${line}".
Then I'd comment out the tail and the grep lines. You want to see what step is causing this to break, right? So does the autosys script by itself still produce the intermediate output you're expecting? Then does autosys + 1 grep produce output as expected, + 2 greps, + tail? You should be able to easily see where you're losing your output.
IHTH

how to print user1 from user1#10.129.12.121 using shell scripting or sed

I wanted to print the name from the entire address by shell scripting. So user1#12.12.23.234 should give output "user1" and similarly 11234#12.123.12.23 should give output 11234
Reading from the terminal:
$ IFS=# read user host && echo "$user"
<user1#12.12.23.234>
user1
Reading from a variable:
$ address='user1#12.12.23.234'
$ cut -d# -f1 <<< "$address"
user1
$ sed 's/#.*//' <<< "$address"
user1
$ awk -F# '{print $1}' <<< "$address"
user1
Using shell parameter expansion:
EMAIL='user#server.com'
echo "${EMAIL%#*}"
This is done by the shell itself, so it is probably faster since it doesn't fork a process to handle the editing. The ${var%pattern} form is specified by POSIX, so it should also work in other POSIX-compatible sh implementations, not only in Bash.
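The same family of expansions can return either side of the separator. A minimal sketch, with `user_part` and `host_part` as hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers built on suffix and prefix removal.
user_part() {
    printf '%s\n' "${1%%#*}"   # remove the longest suffix that starts at a '#'
}
host_part() {
    printf '%s\n' "${1#*#}"    # remove the shortest prefix that ends at a '#'
}

user_part 'user1#12.12.23.234'    # prints user1
host_part 'user1#12.12.23.234'    # prints 12.12.23.234
```

Inside ${...} the first % or # is the operator and whatever follows is the pattern, so the '#' in the pattern is not treated as the start of a comment.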
Using sed:
echo "$EMAIL" | sed -e 's/#.*//'
This tells sed to replace the # character, and everything after it up to the end of the line, with nothing, i.e. it removes everything from the # onwards.
This option is probably better if you have multiple emails stored in a file, then you can do something like
sed -e 's/#.*//' emails.txt > users.txt
Hope this helps =)
I tend to use expr for this kind of thing:
address='user1#12.12.23.234'
expr "$address" : '\([^#]*\)'
This is a use of expr for its pattern matching and extraction abilities. Translated, the above says: Please print out the longest prefix of $address that doesn't contain an #.
The expr tool is covered by POSIX, so this should be pretty portable.
As a note, some historical versions of expr will interpret an argument with a leading - as an option. If you care about guarding against that, you can add an extra letter to the beginning of the string, and just avoid matching it, like so:
expr "x$address" : 'x\([^#]*\)'

Resources