Bash comparing values

I'm getting the size of a file from a remote webserver and saving the result to a variable called remote. I get this using:
remote=`curl -sI $FILE | grep -i Length | awk '/Content/{print $(NF-0)}'`
Once I've downloaded the file I'm getting the local files size with:
local=`stat --print="%s" $file`
If I echo remote and local, they contain the same value.
I'm trying to run an if statement on them:
if [ "$local" -ne "$remote" ]; then
But it always shows the error message, and never reports that they match.
Can someone advise what I'm doing wrong?
Thanks

curl's output uses the network format for text, meaning that lines are terminated by a carriage return followed by linefeed; unix tools (like the shell) expect lines to end with just linefeed, so they treat the CR as part of the content of the line, and often get confused. In this case, what's happening is that the remote variable is getting the content length and a CR, which isn't valid in a numeric expression, hence errors. There are many ways to strip the CR, but in this case it's probably easiest to have awk do it along with the field extraction:
remote=$(curl -sI "$remotefile" | grep -i Length | awk '/Content/{sub("\r","",$NF); print $NF}')
BTW, I also took the liberty of replacing backticks with $( ) -- this is easier to read, and doesn't have some of the oddities with escapes that backticks have, so it's the preferred syntax for capturing command output. Oh, and (NF-0) is equivalent to just NF, so I simplified that. As @Jason pointed out in a comment, it's safest to use lower- or mixed-case for variable names, and to put double-quotes around references to them, so I did that by changing $FILE to "$remotefile". You should do the same with the local filename variable.
You could also drop the grep command and have awk search for /^Content-Length:/ to simplify it even further.
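Putting the pieces together, here's a minimal sketch of the whole check (the URL and file names are placeholders, and the local variable is renamed localsize for clarity):

#!/bin/bash
remoteurl="http://example.com/file.bin"   # placeholder URL
localfile="file.bin"                      # placeholder file name

# awk does the matching itself (no grep needed) and strips the CR
remote=$(curl -sI "$remoteurl" | awk '/^[Cc]ontent-[Ll]ength:/{sub("\r","",$NF); print $NF}')
localsize=$(stat --print="%s" "$localfile")

if [ "$localsize" -ne "$remote" ]; then
    echo "size mismatch: local=$localsize remote=$remote" >&2
else
    echo "sizes match"
fi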


How can I make awk print and awk printf work the way they used to? [duplicate]

I'm migrating many bash shell scripts from old versions of Raspbian and Ubuntu to the current Raspbian version. I've made a brand-new system installation, including various configuration (text) files that I've created for these scripts. I found to my horror that awk-print and awk-printf APPEAR to have changed in the latest version, as evidenced by bash variable-type errors when the values are used. What's going on?
Now that I know the answer, I can explain what happened so others can avoid it. That's why I said awk-print APPEARS to have changed. It didn't, as I discovered when I checked the version of awk on all three machines. Running:
awk -W version
on all three systems gave the same version, mawk 1.3.3 Nov 1996.
When a text file is small, I find it simplest to cat the file into a variable, grep that variable for a keyword that identifies a particular line (and by extension a particular variable), and use 'tr' and 'awk print' to split the line and assign the value to a variable. Here's an example line from which I want to assign '5' to a variable:
"keyword=5"<line terminator>
That line is one of several read from a text file, so there's at least one line terminator after each line. That line terminator is the key to the problem.
I execute the following commands to read the file, find the line with 'keyword', split the line at '=', and assign the value from that line to bar:
file_contents="$(cat "$filename")"
bar="$(echo -e "$file_contents" | grep "keyword" | tr "=" " " | awk '{print $2}')"
Here's the subtle part. Unbeknownst to me, in the process of creating the new system, the line terminators in some of my text files changed from Linux format, with a single terminator (\n) per line, to DOS format, with two terminator characters (\r\n) per line. When, working from the keyboard, I grepped the text file to get the desired line, the value that awk-print assigned to 'bar' carried a trailing carriage return (\r). This character does NOT appear on screen (echo supplies a newline of its own). It's only evident if one executes:
echo ${#bar}
to get the length of the string, or does:
echo -e "$bar"
The hidden terminator shows up as one additional character.
So, the solution to the problem was either to use 'fromdos' to remove the second line terminator before processing the files, or to remove the unwanted '\r' that was being assigned to each variable. One helpful comment noted that 'cat -vE "$file"' would show every character in the file. Sure enough, the dual terminators were present.
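For instance, here's a quick way to see those hidden characters (a small sketch; the file name is made up):

printf 'keyword=5\r\n' > dosfile.txt   # simulate one DOS-format line
cat -vE dosfile.txt                    # shows: keyword=5^M$  (^M is the CR, $ the line end)
fromdos dosfile.txt                    # strips the CRs in place (from the 'tofrodos' package)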
Another helpful comment noted that my use of pipes was causing multiple sub-processes to run when I parsed each line, slowing execution time, and that a bashism:
${foo//*=/}
could avoid it. That bashism helped parse each line but did not remove the offending '\r'. A second bashism:
${foo//$'\r'/}
removed that '\r'.
CASE SOLVED
#!/bin/sh -x
echo "value=5" | tr "=" "\n" > temp   # temp now holds two lines: "value" and "5"
echo "1,2p" | ed -s temp              # ed prints lines 1 and 2: the name, then the value
I have come to view Ed as UNIX's answer to the lightsaber.
I found a format string, '"%c", $2', to use with printf in the current awk, but I have to use '"%s", $2' in the old version. Note '%c' vs '%s'.
%c behavior depends on the type of argument you feed it: if it is numeric, you get the character corresponding to the given ASCII code; if it is a string, you get its first character. For example,
mawk 'BEGIN{printf "%c", 42}' emptyfile
gives the output
*
and
mawk 'BEGIN{printf "%c", "HelloWorld"}' emptyfile
gives the output
H
Apparently your 2nd field is a digit plus some junk characters, which is considered a string, so the second behavior applies. But is taking the first character the correct action in all your use-cases? Does that behavior meet your requirements for multi-digit numbers, e.g. 555?
(tested in mawk 1.3.3)
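A quick way to see why this matters for your case (a sketch; a numeric 555 fed to %c gives implementation-dependent results, so the string form is shown):

mawk 'BEGIN{printf "%c\n", "555"}'   # prints 5 -- only the first character
mawk 'BEGIN{printf "%s\n", "555"}'   # prints 555 in full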
I found the problem thanks to several of the responses. It's rudimentary, I know, but I grepped a text file to extract a line with a keyword. I used tr to split the line and awk print to extract one argument, a numeric value, from that. That text file, once copied to the new machine, had a CR LF at the end of each line. Originally it had just a newline character, which worked fine. But with the CR LF pair, every numeric value that I assigned to a variable using awk print carried a trailing carriage return. This was not obvious onscreen, caused every arithmetic statement and numeric IF statement using the value to fail, and caused the issues I reported about awk print.

How do I trim whitespace, but not newlines, from subshell output in bash?

There are many tens, maybe a hundred or more previous questions that seem "identical" to this already here, but after extensive search, I found NOTHING that even came close to working - though I did learn quite a lot - and so I decided to just RTFM and figure this out on my own.
The Problem
I wanted to search the output of a ps auxwww command to find processes of interest, and the issue was that I couldn't simply use cut to extract the exact data I wanted from them. ps, it turns out, tries to columnate the output, adding either extra spaces or tabs that get in the way of using cut to get the correct data.
So, since I'm not a master at bash, I did a search... The answers I found were all focused either on variables - a "backup strategy" from my point of view that didn't itself solve the whole problem - or on trimming only leading or trailing space, or all "whitespace" including newlines. NOPE, Won't Work For Cut! And neither will removing trailing newlines and so forth.
So, restated, the question is: how do we efficiently end up with the whitespace reduced to a single space between other characters, without eliminating newlines?
Below, I will give my answer, but I welcome others to give theirs - who knows, maybe someone has a better answer?!
Answer:
At least MY answer - please leave your own, too! - was to do this:
ps auxwww | grep <program> | tr -s '[:blank:]' | cut -d ' ' -f <field_of_interest>
This worked great!
Obviously, there are many ways to adapt this to other needs.
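For instance, to pull just the PID column for a given process (a sketch; the [s]shd bracket trick keeps grep from matching its own entry in the ps output):

ps auxwww | grep '[s]shd' | tr -s '[:blank:]' ' ' | cut -d ' ' -f 2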
As an alternative to all of the pipes and grep with cut, you could simply use awk. The benefit of using awk with the default field-separator (FS), which breaks on whitespace, is that it considers any number of whitespace characters between fields as a single separator.
So using awk does away with needing tr -s to "squeeze" whitespace to define fields. Further, awk gives far greater control over field matching using regular expressions, rather than having to rely on grep of a full line and cut to locate pre-determined field numbers (though to some extent you will still have to tell awk which field of the ps output you are interested in).
Using bash, you can also eliminate the pipe | by using process substitution to send the output of ps auxwww to awk on stdin using redirection, e.g. awk ... < <(ps auxwww) for a single tidy command line.
To get your "program" and "field_of_interest" into awk you have two options. You can initialize awk variables using the -v var=value option (there can be multiple -v options given), or you can use the BEGIN rule to initialize the variables. The only difference is that with -v you can provide a shell variable for value and no whitespace is allowed around the = sign, while within BEGIN any whitespace is ignored.
So in your case a couple of examples to get the virtual memory size for firefox processes, you could use:
awk -v prog="firefox" -v fnum="5" '
$11 ~ prog {print $fnum}
' < <(ps auxwww)
(above if you had myprog=firefox as a shell variable, you could use -v prog="$myprog" to initialize the prog variable for awk)
or using the BEGIN rule, you could do:
awk 'BEGIN {prog = "firefox"; fnum = "5"}
$11 ~ prog {print $fnum }
' < <(ps auxwww)
In each command above, awk locates the COMMAND field from ps (field 11) and checks whether it contains firefox; if so, it outputs field no. 5, the virtual memory size used by each process.
Both work fine as one-liners as well, e.g.
awk -v prog="firefox" -v fnum="5" '$11 ~ prog {print $fnum}' < <(ps auxwww)
Don't get me wrong, the pipeline is perfectly fine; it will just be slower. For short commands with limited output there won't be much difference, but when the output is large, awk can provide orders-of-magnitude improvement over having tr, grep and cut each read over the same records three times.
The reason is that each pipe, and the process on each side of it, requires a separate process to be spawned by the shell, so minimizing their use improves the efficiency of your script. If the data and the processes are small, there isn't much of a difference. However, if you are reading a 3G file three times over, that is a difference of orders of magnitude: hours versus minutes or seconds.
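You can get a feel for the difference with a rough sketch like this (the generated file and patterns are arbitrary):

seq 3000000 > big.txt                                  # build a large test file
time { tr -s ' ' < big.txt | grep 5 | cut -d ' ' -f 1 >/dev/null; }
time { awk '/5/ {print $1}' big.txt >/dev/null; }      # one process, one pass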
I had to use single quotes on CentOS Linux to get tr working as described above:
ps -o ppid= $$ | tr -d '[:space:]'
You can reduce the number of pipes using this Perl one-liner, which uses Perl regexes instead of a separate grep process. This combines grep, tr and cut in a single command, with an easy way to manipulate the output (@F is the array of fields, 0-indexed):
Examples:
# Start an example process to provide the input for `ps` in the next commands:
/Applications/Emacs.app/Contents/MacOS/Emacs-x86_64-10_14 --geometry 109x65 /tmp/foo &
# Print single space-delimited output of `ps` for all emacs processes:
ps auxwww | perl -lane 'print "#F" if $F[10] =~ /emacs/i'
# Prints:
# bar 72144 0.0 0.5 4610272 82320 s006 SN 11:15AM 0:01.31 /Applications/Emacs.app/Contents/MacOS/Emacs-x86_64-10_14 --geometry 109x65 /tmp/foo
# Print emacs PID and file name opened with emacs:
ps auxwww | perl -lane 'print join "\t", @F[1, -1] if $F[10] =~ /emacs/i'
# Prints:
# 72144 /tmp/foo
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)

bash for with awk command inside

I have this piece of code in a bash script:
for file in "$(ls | grep .*.c)"
do
cat $file |awk '/.*open/{print $0}'|awk -v nomeprog=$file 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
done
This gives me this error:
awk: cmd. line:1: file.c
awk: cmd. line:1: ^ syntax error
I get this error when there is more than one .c file in the folder; with just one file it works.
Overall, you should probably follow Charles Duffy's recommendation to use more appropriate tools for the task. But I'd like to go over why the current script isn't working and how to fix it, as a learning exercise.
Also, two quick recommendations for shell script checking & troubleshooting: run your scripts through shellcheck.net to point out common mistakes, and when debugging put set -x before the problem section (and set +x after), so the shell will print out what it thinks is going on as the script runs.
The problem is due to how you're using the file variable. Let's look at what this does:
for file in "$(ls | grep .*.c)"
First, ls prints a list of files in the current directory, one per line. ls is really intended for interactive use, and its output can be ambiguous and hard to parse correctly; in a script, there are almost always better ways to get lists of filenames (and I'll show you one in a bit).
The output of ls is piped to grep .*.c, which is wrong in a number of ways. First, since that pattern contains a wildcard character ("*"), the shell will try to expand it into a list of matching filenames. If the directory contains any hidden .c files (names with a leading "."), the shell will replace the pattern with a list of those, and nothing is going to work right at all. Always quote the pattern argument to grep to prevent this.
But the pattern itself (".*.c") is also wrong; it searches for any number of arbitrary characters (".*"), followed by a single arbitrary character ("." -- this is in a regex, so "." is not treated literally), followed by a "c". And it searches for this anywhere in the line, so any filename that contains a "c" somewhere other than the first position will match. The pattern you want would be something like '[.]c$' (note that I wrapped it in single-quotes, so the shell won't try to treat $ as a variable reference like it would in double-quotes).
Then there's another problem, which is (part of) the problem you're actually experiencing: the output of that ls | grep is expanded in double-quotes. The double-quotes around it tell the shell not to do its usual word-split-and-wildcard-expand thing on the result. The common (but still wrong) thing to do here is to leave off the double-quotes, because word-splitting will probably break the list of filenames up into individual filenames, so you can iterate over them one-by-one. (Unless any filenames contain funny characters, in which case it can give weird results.) But with double-quotes it doesn't split them, it just treats the whole thing as one big item, so your loop runs once with file set to "src1.c\nsrc2.c\nsrc3.c" (where the \n's represent actual newlines).
This is the sort of trouble you can get into by parsing ls. Don't do it, just use a shell wildcard directly:
for file in *.c
This is much simpler, avoids all the confusion about regex pattern syntax vs wildcard pattern syntax, ambiguity in ls's output, etc. It's simple, clear, and it just works.
That's probably enough to get it to work for you, but there are a couple of other things you really should fix if you're doing something like this. First, you should double-quote variable references (i.e. use "$file" instead of just $file). This is another part of the error you're getting; look at the second awk command:
awk -v nomeprog=$file 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
With file set to "src1.c\nsrc2.c\nsrc3.c", the shell will do its word-split-and-wildcard-expand thing on it, giving:
awk -v nomeprog=src1.c src2.c src3.c 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
awk will thus set its nomeprog variable to "src1.c", and then try to run "src2.c" as an awk command (on input files named "src3.c" and "BEGIN{FS=..."). "src2.c" is, of course, not a valid awk command, so you get a syntax error.
This sort of confusion is typical of the chaos that can result from unquoted variable references. Double-quote your variable references.
The other thing, which is much less important, is that you have a useless use of cat. Anytime you have the pattern:
cat somefile | somecommand
(and it's just a single file, not several that need to be concatenated together), you should just use:
somecommand <somefile
and in some cases like awk and grep, the command itself can take input filename(s) directly as arguments, so you can just use:
somecommand somefile
so in your case, rather than
cat "$file" | awk '/.*open/{print $0}' | awk -v nomeprog="$file" 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
I'd just use:
awk '/.*open/{print $0}' "$file" | awk -v nomeprog="$file" 'BEGIN{FS="(";printf "the file e %s with the open call:", nameprog}//{ print $2}'
(Although, as Charles Duffy pointed out, even that can be simplified quite a lot.)
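For reference, here is a sketch of the loop with all of the fixes above applied (glob instead of parsing ls, quoted variables, no cat); note the original printed nameprog while the variable was named nomeprog, which is also fixed here:

for file in *.c
do
    awk -v nomeprog="$file" '
        BEGIN { FS = "("; printf "the file %s with the open calls:\n", nomeprog }
        /open/ { print $2 }
    ' "$file"
done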

grep pipe searching for one word, not line

For some reason I cannot get this to output just the version from this line. I suspect it has something to do with how grep interprets the dash.
This command:
admin#DEV:~/TEMP$ sendemail
Yields the following:
sendemail-1.56 by Brandon Zehm
More output below omitted
The first line is of interest. I'm trying to store the version in a variable.
TESTVAR=$(sendemail | grep '\s1.56\s')
Does anyone see what I am doing wrong? Thanks
TESTVAR is just empty. Even without TESTVAR, the output is empty.
I just tried the following too, thinking this might work.
sendemail | grep '\<1.56\>'
I just tried it again while editing, and I think I have another issue. Perhaps I'm not handling the output correctly. It's outputting the entire line, but I can see that grep is finding 1.56 because it highlights it in the line.
$ TESTVAR=$(echo 'sendemail-1.56 by Brandon Zehm' | grep -Eo '1.56')
$ echo $TESTVAR
1.56
The point is grep -Eo '1.56'
from grep man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output
line.
Your regular expression doesn't match the form of the version. You have specified that the version is surrounded by spaces, yet in front of it you have a dash.
Replace the first \s with the capitalized form \S, or with an explicit set of characters, and it should work.
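For example (a sketch; GNU grep understands \s and \S):

echo 'sendemail-1.56 by Brandon Zehm' | grep -o '\S1.56\s'         # matches "-1.56 "
echo 'sendemail-1.56 by Brandon Zehm' | grep -o '[0-9]\+\.[0-9]\+' # matches just 1.56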
I'm wondering: In your example you seem to know the version (since you grep for it), so you could just assign the version string to the variable. I assume that you want to obtain any (unknown) version string there. The regular expression for this in sed could be (using POSIX character classes):
sendemail |sed -n -r '1 s/sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/ p'
The -n suppresses the normal default output of every line; -r enables extended regular expressions; the leading 1 tells sed to only work on line 1 (I assume the version appears in the first line). I anchored the version number to the telltale string sendemail- so that potential other numbers elsewhere in that line are not matched. If the program name changes or the hyphen goes away in future versions, this wouldn't match any longer though.
Both the grep solution above and this one have the disadvantage of reading the whole output, which (as emails go these days) may be long. In addition, grep would find all other lines in the program's output which contain the pattern (if it's indeed emails, somebody might discuss this very problem in them, with examples!). If the version is indeed on the first line, piping through head -1 first would be efficient and prudent.
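Combining both suggestions into one line (a sketch):

version=$(sendemail | head -1 | sed -n -r 's/^sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/p')
echo "$version"   # -> 1.56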
jayadevan@jayadevan-Vostro-2520:~$ echo $sendmail
sendemail-1.56 by Brandon Zehm
jayadevan@jayadevan-Vostro-2520:~$ echo $sendmail | cut -f2 -d "-" | cut -f1 -d" "
1.56

How to take string from a file name and use it as an argument

If a file name is in this format
assignment_number_username_filename.extension
Ex.
assignment_01_ssaha_homework1.txt
I need to extract just the username to use it in the rest of the script.
How do I take just the username and use it as an argument?
This is close to what I'm looking for but not exactly:
Extracting a string from a file name
if someone could explain how sed works in that scenario that would be just as helpful!
Here's what I have so far; I haven't used cut in a while so I'm getting error messages while trying to refresh myself.
#!/bin/sh
a = $1
grep $a /home | cut -c 1,2,4,5 echo $a`
You probably need command substitution, plus echo plus sed. You need to know that sed regular expressions can remember portions of the match. And you need to know basic regular expressions. In context, this adds up to:
filename="assignment_01_ssaha_homework1.txt"
username=$(echo "$file" | sed 's/^[^_]*_[^_]*_\([^_]*\)_[^.]*\.[^.]*$/\1/')
The $(...) notation is command substitution. The commands in between the parentheses are run and the output is captured as a string. In this case, the string is assigned to the variable username.
In the sed command, the overall command applies a particular substitution (s/match/replace/) operation to each line of input (here, that will be one line). The [^_]* components of the regular expression match a sequence of (zero or more) non-underscores. The \(...\) part remembers the enclosed regex (the third sequence of non-underscores, aka the user name). The switch to [^.]* at the end recognizes the change in delimiter from underscore to dot. The replacement text \1 replaces the entire name with the remembered part of the pattern. In general, you can have several remembered subsections of the pattern. If the file name does not match the pattern, you'll get the input as output.
In bash, there are ways of avoiding the echo; you might well be able to use some of the more esoteric (meaning 'not available in other shells') mechanisms to extract the data. The version above, though, will work on the majority of modern POSIX-derived shells (Korn, Bash, and others).
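For instance, a here-string (one such bash-only mechanism) lets you drop the echo:

username=$(sed 's/^[^_]*_[^_]*_\([^_]*\)_[^.]*\.[^.]*$/\1/' <<< "$filename")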
filename="assignment_01_ssaha_homework1.txt"
username=$(echo "$file" | awk -F_ '{print $3}')
Just bash:
filename="assignment_01_ssaha_homework1.txt"
tmp=${filename%_*}      # strip the shortest '_*' suffix -> assignment_01_ssaha
username=${tmp##*_}     # strip the longest '*_' prefix  -> ssaha
http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
