backslash not last character on line - tcsh

I cannot run simple regular expression commands in a Linux lab. Because I do not understand the root of the problem, I can only provide command-line examples. The following two examples work:
% fgrep -h TestVar */config_file.txt
% sed -n /TestVar/p */config_file.txt
However, the following two examples do not work:
% awk '/TestVar/ { print }' */config_file.txt
% perl -ne 'if (/TestVar/) { print }' */config_file.txt
Both errors indicate a backslash in the wrong place. Specifically, the very simple AWK command outputs the following:
awk: ^ backslash not last character on line
If I simplify the AWK command further, it will work as follows:
% awk /TestVar/ */config_file.txt
If I temporarily switch my login shell from TCSH (the default) to BASH, all the commands work. Last, if I run the AWK or PERL commands on a single file, the REGEX command also works in the original TCSH shell.
The simple example script is something I would never use in my day-to-day operations. I am typically parsing through or editing multiple files using much more complex REGEX commands. The issue is I have been unable to get any variation of my day-to-day REGEX commands to work in AWK or PERL, using the lab default TCSH shell. My REGEX commands work in all other labs, which also use TCSH. So there must be something specific to this lab environment that is causing the issue.
Any help would be greatly appreciated!

Related

sed substitution: substitute string is a variable needing expansion AND contains slashes

I am fighting with sed to do a substitution where the substitute string contains slashes. This general topic has been discussed on Stack Overflow before. But, AFAICT, I have a new wrinkle that hasn't been addressed in previous questions.
Let's say I have a file, ENVIRO.tmpl, which has several lines, one of which is
Loaded modules: SUPPLY_MODULES_HERE
I want to replace SUPPLY_MODULES_HERE in an automated fashion with a list of loaded modules. (At this point, if anyone has a better way to do this than sed, please let me know!) My first effort here is to define an environment variable and use sed to put it into the file:
> modules=$(module list 2>&1)
> sed "s/SUPPLY_MODULES_HERE/${modules}/" ENVIRO.tmpl > ENVIRO.txt
(The 2>&1 being needed because module list sends its output to STDERR, for reasons I can't begin to understand.) However, as is often the case, the modules have slashes in them. For example
> echo ${modules}
gcc/9.2.0 mpt/2.20
The slashes kill my command because sed can't understand the expression and thinks my substitution command is "unterminated".
So I do the usual thing and use some other character for the command delimiter:
> modules=$(module list 2>&1)
> sed "s|SUPPLY_MODULES_HERE|${modules}|" ENVIRO.tmpl > ENVIRO.txt
and I still get an "unterminated 's'" error.
So I replace double quotes with single quotes:
> sed 's|SUPPLY_MODULES_HERE|${modules}|' ENVIRO.tmpl > ENVIRO.txt
and now I get no error, but the line in ENVIRO.txt looks like
Loaded modules: ${modules}
Not what I was hoping for.
So, AFAICT, I need double quotes to expand the variable, but I need single quotes to make the alternative delimiters work. But I need both at the same time. How do I get this?
UPDATE: Gordon Davisson's comment below got to the root of the matter: "echo ${modules} can be highly misleading". Examining $modules with declare -p shows that it actually contains a newline (or, more generally, some kind of line break). What I did was add an extra step to strip the line breaks out of the variable. With that change, everything worked fine. An alternative would be to convince sed to keep the line breaks and substitute the multi-line value into the text as such, but I haven't been able to make that work. Any takers?
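As a sketch of that extra step (the `module list` output is simulated here, since it varies by system), stripping the line breaks with tr before handing the value to sed makes the substitution work with an alternative delimiter:

```shell
# Simulated `module list` output; the real command writes to stderr
modules=$(printf 'gcc/9.2.0\nmpt/2.20')
# Strip line breaks so sed sees a single-line replacement string
modules=$(printf '%s' "$modules" | tr '\n' ' ')
printf 'Loaded modules: SUPPLY_MODULES_HERE\n' > ENVIRO.tmpl
# | as the delimiter avoids clashing with the slashes in the module names
sed "s|SUPPLY_MODULES_HERE|${modules}|" ENVIRO.tmpl > ENVIRO.txt
cat ENVIRO.txt
```

This prints `Loaded modules: gcc/9.2.0 mpt/2.20`. Note this assumes the module names themselves contain no `|` or `&` characters, which sed would treat specially.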
sed is not the best tool here, because both its regex syntax and its delimiter can collide with characters in the replacement text.
It is easier to use an awk command that doesn't require any regular expression at all:
awk -v kw='SUPPLY_MODULES_HERE' -v repl="$(module list 2>&1)" '
  n = index($0, kw) {
    $0 = substr($0, 1, n-1) repl substr($0, n+length(kw))
  } 1
' file
The index function does a plain string search in awk, with no regex involved; it returns the position of the first match, or 0 if there is none.
The substr function extracts the text before and after the search keyword so the replacement can be spliced in between.
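For example, with a hypothetical one-line template and a replacement value containing slashes:

```shell
# Sample template line with the placeholder keyword
printf 'Loaded modules: SUPPLY_MODULES_HERE\n' > ENVIRO.tmpl
# index() finds the keyword; substr() splices the replacement in;
# the trailing 1 prints every line, modified or not
awk -v kw='SUPPLY_MODULES_HERE' -v repl='gcc/9.2.0 mpt/2.20' '
  n = index($0, kw) {
    $0 = substr($0, 1, n-1) repl substr($0, n+length(kw))
  } 1
' ENVIRO.tmpl
```

This prints `Loaded modules: gcc/9.2.0 mpt/2.20`; the slashes in the replacement need no escaping because no regex is involved.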

How does : # use perl eval 'exec perl -S $0 ${1+"$@"}' set the perl version?

I have to debug an old perl script which, apparently, is setting the version of perl in a way which I do not understand...
: # use perl
eval 'exec perl -S $0 ${1+"$@"}'
if 0;
perl -h says that -S means "look for programfile using PATH environment variable". $0 is the current program. And I've read that $@ means "The Perl syntax error message from the last eval command." But why are they adding 1 to that? And how does this all fit together? What is it doing?
Part of what I have to debug has to do with the fact that it's picking an older version of perl that I don't want. For everything else, we use #!/usr/bin/env perl which, I suspect, may be doing the same thing. I also suspect that my solution may lie in fixing $PATH (or preventing the code that's goofing it up from goofing it up). But I'd like to go at this with a better understanding of how it's picking the version now.
Thanks for any help!
That's intended to run whatever version of perl is first in your path by treating the script first as a shell script that then executes perl. In this context, ${1+"$@"} is the arguments (if any) passed to the script.
From the bash manual:
${parameter:+word}
If parameter is null or unset, nothing is substituted, otherwise the expansion of word is substituted.
and
Omitting the colon results in a test only for a parameter that is unset
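A quick way to see what ${1+"$@"} does (show is just an illustrative helper function): when at least one argument is set it expands to all of the arguments, correctly quoted; with no arguments it expands to nothing at all, avoiding the spurious empty string that a bare "$@" produced under some very old shells.

```shell
# Print each argument on its own line, bracketed, via ${1+"$@"}
show() {
  for a in ${1+"$@"}; do
    printf '[%s]\n' "$a"
  done
}
show one "two words" three
```

This prints `[one]`, `[two words]`, `[three]` on separate lines; the embedded space survives intact. Calling `show` with no arguments prints nothing.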
There's a similar example in perlrun:
This example works on many platforms that have a shell compatible with Bourne shell:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}'
if $running_under_some_shell;
The system ignores the first line and feeds the program to /bin/sh, which proceeds to try to execute the Perl program as a shell script. The shell executes the second line as a normal shell command, and thus starts up the Perl interpreter. On some systems $0 doesn't always contain the full pathname, so the -S tells Perl to search for the program if necessary. After Perl locates the program, it parses the lines and ignores them because the variable $running_under_some_shell is never true. If the program will be interpreted by csh, you will need to replace ${1+"$@"} with $*, even though that doesn't understand embedded spaces (and such) in the argument list. To start up sh rather than csh, some systems may have to replace the #! line with a line containing just a colon, which will be politely ignored by Perl.
Using /usr/bin/env is another way to do the same thing, yes.

Bash grep different outputs

I have a funny issue with grep. Basically, I am trying to match certain control characters in a file and get the count.
grep -ocbUaE $"\x07\|\x08\|\x0B\|\x0C\|\x1A\|\x1B" <file>
Funny enough, on the command line it matches all control characters and returns the correct count, but if I use it in a bash script, it doesn't match anything.
Any ideas what I am doing wrong?
Tested on: MacOS and CentOS - same issue.
Thank you for your help!
I think you should change your command to:
grep -cUaE $'[\x07\x08\x0B\x0C\x1A\x1B]' file
I removed the extra output flags, which get ignored when -c is present. I assume that you include -U and -a for a reason.
The other changes are to use $'' with single quotes (you don't want a double-quoted string here), and replace your series of ORs with a bracket expression, which matches if any one of the characters match.
Note that C-style strings $'' don't work in all shells, so if you want to use bash you should call your script like bash script.sh and/or include the shebang #!/bin/bash if it is executable. sh script.sh does not behave in the same way as bash script.sh.
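As a quick check (the file name and contents are made up for illustration), a file whose first and third lines contain control characters gives a count of 2, since -c counts matching lines rather than individual matches:

```shell
# Build a sample file: lines 1 and 3 contain control characters (BEL, ESC)
printf 'a\007b\nplain\nc\033d\n' > ctrl.txt
# The bracket expression matches if any one of the listed bytes appears
grep -cUaE $'[\x07\x08\x0B\x0C\x1A\x1B]' ctrl.txt
```

This prints `2`. If you need the number of individual matches rather than matching lines, combine -o with `wc -l` instead of using -c.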

Error with AWK command in ubuntu terminal

I am new to Linux and AWK as well. I have a file in my home folder called testing.txt and I am trying to read the file using this awk command:
arjun@arjun-Aspire-4741:~$ awk ´{print $1}´ testing.txt
And I am getting this as output
awk: ´{print
awk: ^ invalid char '´' in expression
arjun@arjun-Aspire-4741:~$
The problem is that you've used acute accents (´) instead of quotes (in this case only single quotes are appropriate):
awk '{print $1}' testing.txt
instead of
awk ´{print $1}´ testing.txt
In shell, strings in double quotes " can contain expressions with special meaning (such as backticks and variables) which will be expanded before the string is processed as part of the full shell command. Strings in single quotes ' are taken literally; to put it another way, the string is passed through without any interpretation. That's why you should use single quotes when writing awk scripts, because the awk field operator $ looks the same as the shell's variable dereference operator. There are no other valid string-delimiting characters*.
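A minimal illustration of the difference:

```shell
x=world
echo "hello $x"   # double quotes: the shell expands $x, prints "hello world"
echo 'hello $x'   # single quotes: passed literally, prints "hello $x"
```

This is why `awk '{print $1}'` works: the single quotes stop the shell from touching $1 before awk sees it.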
I initially thought you'd used backticks (thanks to Andras Deak for spotting my error).
Backticks have a special meaning in shell (equivalent to wrapping something in $(...)): execute this string as a command, and evaluate to its output (stdout). This is done before your main command is executed.
So, if I do
cat `echo myfile`
this turns into
cat myfile
which then executes.
You can read more about shell behaviour in a few places:
http://www.grymoire.com/Unix/Sh.html
http://www.tldp.org/LDP/Bash-Beginners-Guide/html/index.html
* ignoring that spaces are also technically string-delimiters

How to use multiple arguments for awk with a shebang (i.e. #!)?

I'd like to execute a gawk script with --re-interval using a shebang. The "naive" approach of
#!/usr/bin/gawk --re-interval -f
... awk script goes here
does not work, since gawk is called with the single first argument "--re-interval -f" (not split at the whitespace), which it does not understand. Is there a workaround for that?
Of course you can either not call gawk directly but wrap it into a shell script that splits the first argument, or make a shell script that then calls gawk and put the script into another file, but I was wondering if there was some way to do this within one file.
The behaviour of shebang lines differs from system to system - at least on Cygwin the arguments are not split at whitespace. I just care about how to do it on a system that behaves like that; the script is not meant to be portable.
The shebang line has never been specified as part of POSIX, SUS, LSB or any other specification. AFAIK, it hasn't even been properly documented.
There is a rough consensus about what it does: take everything between the ! and the \n and exec it. The assumption is that everything between the ! and the \n is a full absolute path to the interpreter. There is no consensus about what happens if it contains whitespace.
1. Some operating systems simply treat the entire thing as the path. After all, in most operating systems, whitespace or dashes are legal in a path.
2. Some operating systems split at whitespace and treat the first part as the path to the interpreter and the rest as individual arguments.
3. Some operating systems split at the first whitespace and treat the front part as the path to the interpreter and the rest as a single argument (which is what you are seeing).
4. Some don't support shebang lines at all.
Thankfully, 1. and 4. seem to have died out, but 3. is pretty widespread, so you simply cannot rely on being able to pass more than one argument.
And since the location of commands is also not specified in POSIX or SUS, you generally use up that single argument by passing the executable's name to env so that it can determine the executable's location; e.g.:
#!/usr/bin/env gawk
[Obviously, this still assumes a particular path for env, but there are only very few systems where it lives in /bin, so this is generally safe. The location of env is a lot more standardized than the location of gawk or even worse something like python or ruby or spidermonkey.]
Which means that you cannot actually use any arguments at all.
Although not exactly portable, starting with coreutils 8.30 and according to its documentation you will be able to use:
#!/usr/bin/env -S command arg1 arg2 ...
So given:
$ cat test.sh
#!/usr/bin/env -S showargs here 'is another' long arg -e "this and that " too
you will get:
% ./test.sh
$0 is '/usr/local/bin/showargs'
$1 is 'here'
$2 is 'is another'
$3 is 'long'
$4 is 'arg'
$5 is '-e'
$6 is 'this and that '
$7 is 'too'
$8 is './test.sh'
and in case you are curious showargs is:
#!/usr/bin/env sh
echo "\$0 is '$0'"
i=1
for arg in "$@"; do
echo "\$$i is '$arg'"
i=$((i+1))
done
Original answer here.
This seems to work for me with (g)awk.
#!/bin/sh
arbitrary_long_name==0 "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@"
# The real awk program starts here
{ print $0 }
Note the #! runs /bin/sh, so this script is first interpreted as a shell script.
At first, I simply tried "exec" "/usr/bin/gawk" "--re-interval" "-f" "$0" "$@", but awk treated that as a command and printed out every line of input unconditionally. That is why I put in the arbitrary_long_name==0 - it's supposed to fail all the time. You could replace it with some gibberish string. Basically, I was looking for a false condition in awk that would not adversely affect the shell script.
In the shell script, the arbitrary_long_name==0 defines a variable called arbitrary_long_name and sets it equal to the string "=0".
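You can verify that harmless assignment in any Bourne-style shell:

```shell
# The doubled = means: assign the literal string "=0" to the variable
arbitrary_long_name==0
echo "$arbitrary_long_name"
```

This prints `=0`, so as far as the shell is concerned the line is just a variable assignment followed by an exec, while awk sees a comparison that is always false.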
I came across the same issue, with no apparent solution because of the way the whitespaces are dealt with in a shebang (at least on Linux).
However, you can pass several options in a shebang, as long as they are short options and they can be concatenated (the GNU way).
For example, you can not have
#!/usr/bin/foo -i -f
but you can have
#!/usr/bin/foo -if
Obviously, that only works when the options have short equivalents and take no arguments.
Under Cygwin and Linux everything after the path in the shebang gets passed to the program as one argument.
It's possible to hack around this by using another awk script inside the shebang:
#!/usr/bin/gawk {system("/usr/bin/gawk --re-interval -f " FILENAME); exit}
This will execute {system("/usr/bin/gawk --re-interval -f " FILENAME); exit} in awk.
And this will execute /usr/bin/gawk --re-interval -f path/to/your/script.awk in your systems shell.
#!/bin/sh
''':'
exec YourProg -some_options "$0" "$@"
'''
The above shell shebang trick is more portable than /usr/bin/env.
In the gawk manual (http://www.gnu.org/manual/gawk/gawk.html), the end of section 1.14 notes that you should only use a single argument when running gawk from a shebang line. It says that the OS will treat everything after the path to gawk as a single argument. Perhaps there is another way to specify the --re-interval option? Perhaps your script can reference your shell in the shebang line, run gawk as a command, and include the text of your script as a "here document".
Why not use bash and gawk itself, to skip past shebang, read the script, and pass it as a file to a second instance of gawk [--with-whatever-number-of-params-you-need]?
#!/bin/bash
gawk --re-interval -f <(gawk 'NR>3' "$0") "$@"
exit
{
print "Program body goes here"
print $1
}
(The same could naturally also be accomplished with e.g. sed or tail, but I think there's some kind of beauty in depending only on bash and gawk itself.)
Just for fun: there is the following quite weird solution that reroutes stdin and the program through file descriptors 3 and 4. You could also create a temporary file for the script.
#!/bin/bash
exec 3>&0
exec <<-EOF 4>&0
BEGIN {print "HALLO"}
{print \$1}
EOF
gawk --re-interval -f <(cat 0>&4) 0>&3
One thing is annoying about this: the shell does variable expansion on the script, so you have to quote every $ (as done in the second line of the script) and probably more than that.
For a portable solution, invoke the standard Bourne shell (/bin/sh) with your shebang, and invoke awk directly, passing the program on the command line (built from a here document via command substitution) rather than via -f:
#!/bin/sh
gawk --re-interval "$(cat <<'EOF'
PROGRAM HERE
EOF
)"
Note: no -f argument to awk. That leaves stdin available for awk to read input from. Assuming you have gawk installed and on your PATH, that achieves everything I think you were trying to do with your original example (assuming you wanted the file content to be the awk script and not the input, which I think your shebang approach would have treated it as).