syntax error near unexpected token `{

​​​​​defineColumns() {
​​​​​​​ shift
local dirfun=${​​​​​​​​1:-"/var/log/was/dial"}​​​​​​​​
local basefun=${​​​​​​​​2:-"$logdir/party_info.$(date +%y%m%d%H%M%S)"}​​​​​​​​
touch $logbase_bcdb
info $dirfun $basefun
# Run steps sequentially
loadData
}
I am writing a shell script with multiple functions. The function above is throwing this error:
syntax error near unexpected token `{
What is wrong with the code?

There are non-printing characters in the code: Unicode U+200B ZERO WIDTH SPACE. Remove them and you should be fine.
Firstly, to see them, you could use cat -A, but it shows these characters as M-bM-^@M-^K, which is confusing IMO. I'd rather read the Python ascii() representation, so here's a quick script:
import fileinput
for line in fileinput.input():
    print(ascii(line))
Save that as ascii_lines.py then run with the name of your script:
$ python3 ascii_lines.py filename.sh
'\u200b\u200b\u200b\u200b\u200bdefineColumns() {\n'
'\u200b\u200b\u200b\u200b\u200b\u200b\u200b shift\n'
' local dirfun=${\u200b\u200b\u200b\u200b\u200b\u200b\u200b\u200b1:-"/var/log/was/dial"}\u200b\u200b\u200b\u200b\u200b\u200b\u200b\u200b\n'
' local basefun=${\u200b\u200b\u200b\u200b\u200b\u200b\u200b\u200b2:-"$logdir/party_info.$(date +%y%m%d%H%M%S)"}\u200b\u200b\u200b\u200b\u200b\u200b\u200b\u200b\n'
' touch $logbase_bcdb\n'
' info $dirfun $basefun\n'
' # Run steps sequentially\n'
' loadData\n'
'}\n'
Then to remove them, you could use sed, though it doesn't know Unicode, so I'm using a Bash $'' string here to solve that.
$ sed -i $'s/\u200b//g' filename.sh
Afterwards:
$ python3 ascii_lines.py filename.sh
'defineColumns() {\n'
' shift\n'
' local dirfun=${1:-"/var/log/was/dial"}\n'
' local basefun=${2:-"$logdir/party_info.$(date +%y%m%d%H%M%S)"}\n'
' touch $logbase_bcdb\n'
' info $dirfun $basefun\n'
' # Run steps sequentially\n'
' loadData\n'
'}\n'
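If your sed or shell doesn't handle the $'\u200b' escape (it needs a reasonably recent Bash), a Perl one-liner can delete the character by code point directly; a sketch along the same lines, assuming perl is available:
$ perl -CSD -i -pe 's/\x{200B}//g' filename.sh
The -CSD switches make Perl treat the streams as UTF-8, so \x{200B} matches the zero width space as a single character rather than as three separate bytes.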

Related

How do I escape an argument of a bash script in awk?

I have the following issue. I want to run a script with 2 parameters like:
./Myscript.sh $1 $2
$1 is a number, nothing special, but $2 is actually a message that looks like this:
“My message`-12355, this is a message !56432-I am sure it`s a message-46583”.
This message was actually extracted with an awk from some log files. Myscript.sh executes a curl HTTP POST and uses $1 and $2 as parameters for creating the JSON in the curl command, like
-d '{"number":"$1","message":"$2"}'
My question is how do I “escape” the argument $2 since the message contains special characters?
Thanks
I'm calling Myscript.sh from another script in an awk command using:
system(./Myscript.sh “$1” \”$2\”)
I was thinking of using backslashes to “escape” them, but this does not seem to work. Any ideas or help would be great. Thanks a lot!
I suggest using " and not “.
system("./Myscript.sh \"" $1 "\" \"" $2 "\"")
You have an exclamation mark ("!") in your shell $2 - I'd safely single-quote the arguments if I were you ::
sh: ./Myscript.sh: No such file or directory
system() command :::
'./Myscript.sh' '1114111'
'My message`-12355, this is a
message !56432-I'\''m sure it`s a message-46583'
# gawk profile, created Mon Jan 9 05:37:09 2023
# Rule(s)
1  ' {
1      print "system() command :::\f\f\r\t",
           system_cmd = escSQ($1) (_ = " ")
           escSQ($2)_ escSQ($3)
1      system(system_cmd)
   }
3  function escSQ(__,_) {
3      _ = "\47"
3      gsub(_,"&\\" (_)_,__)
3      return (_)(__)_
   }'
PS: this approach is okay only if none of the command items need to be interpreted by the shell, e.g. ~ (as a prefix for the script itself) or $?, etc.
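For readers who'd rather start from runnable code than from the profile dump above, here is a cleaned-up sketch of the same idea (my reconstruction, assuming the three fields are tab-separated; escSQ wraps a value in single quotes and escapes any embedded ones):

awk -F'\t' '
{
    system_cmd = escSQ($1) " " escSQ($2) " " escSQ($3)
    system(system_cmd)
}
function escSQ(s,  q) {
    q = "\47"                # \47 is the single-quote character
    gsub(q, q "\\\\" q q, s) # replace each quote with quote, backslash-quote, quote
    return q s q             # wrap the whole value in single quotes
}' file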

Bash: decode string with url escaped hex codes [duplicate]

I'm looking for a way to turn this:
hello &lt; world
to this:
hello < world
I could use sed, but how can this be accomplished without using cryptic regex?
Try recode (archived page; GitHub mirror; Debian page):
$ echo '&lt;' | recode html..ascii
<
Install on Linux and similar Unix-y systems:
$ sudo apt-get install recode
Install on Mac OS using:
$ brew install recode
With perl:
cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
With php from the command line:
cat foo.html | php -r 'while(($line=fgets(STDIN)) !== FALSE) echo html_entity_decode($line, ENT_QUOTES|ENT_HTML401);'
An alternative is to pipe through a web browser -- such as:
echo '&#33;' | w3m -dump -T text/html
This worked great for me in cygwin, where downloading and installing distributions are difficult.
This answer was found here
Using xmlstarlet:
echo 'hello &lt; world' | xmlstarlet unesc
A python 3.2+ version:
cat foo.html | python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
This answer is based on: Short way to escape HTML in Bash? which works fine for grabbing answers (using wget) on Stack Exchange and converting HTML to regular ASCII characters:
sed 's/&nbsp;/ /g; s/&amp;/\&/g; s/&lt;/\</g; s/&gt;/\>/g; s/&quot;/\"/g; s/&#39;/\'"'"'/g; s/&ldquo;/\"/g; s/&rdquo;/\"/g;'
Edit 1: April 7, 2017 - Added left double quote and right double quote conversion. This is part of a bash script that web-scrapes SE answers and compares them to local code files here: Ask Ubuntu -
Code Version Control between local files and Ask Ubuntu answers
Edit June 26, 2017
Using sed was taking ~3 seconds to convert HTML to ASCII on a 1K line file from Ask Ubuntu / Stack Exchange. As such I was forced to use Bash built-in search and replace for ~1 second response time.
Here's the function:
LineOut="" # Make global
HTMLtoText () {
    LineOut=$1 # Parm 1= Input line
    # Replace external command: Line=$(sed 's/&amp;/\&/g; s/&lt;/\</g;
    # s/&gt;/\>/g; s/&quot;/\"/g; s/&#39;/\'"'"'/g; s/&ldquo;/\"/g;
    # s/&rdquo;/\"/g;' <<< "$Line") -- With faster builtin commands.
    LineOut="${LineOut//&nbsp;/ }"
    LineOut="${LineOut//&amp;/&}"
    LineOut="${LineOut//&lt;/<}"
    LineOut="${LineOut//&gt;/>}"
    LineOut="${LineOut//&quot;/'"'}"
    LineOut="${LineOut//&#39;/"'"}"
    LineOut="${LineOut//&ldquo;/'"'}" # TODO: ASCII/ISO for opening quote
    LineOut="${LineOut//&rdquo;/'"'}" # TODO: ASCII/ISO for closing quote
} # HTMLtoText ()
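A minimal usage sketch (my addition, not part of the original answer) that feeds a file through the function line by line:

while IFS= read -r Line; do
    HTMLtoText "$Line"    # sets the global $LineOut
    printf '%s\n' "$LineOut"
done < foo.html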
On macOS, you can use the built-in command textutil (which is a handy utility in general):
echo '👋 hello &lt; world 🌐' | textutil -convert txt -format html -stdin -stdout
outputs:
👋 hello < world 🌐
To support the unescaping of all HTML entities only with sed substitutions would require too long a list of commands to be practical, because every Unicode code point has at least two corresponding HTML entities.
But it can be done using only sed, grep, the Bourne shell and basic UNIX utilities (the GNU coreutils or equivalent):
#!/bin/sh
htmlEscDec2Hex() {
    file=$1
    [ ! -r "$file" ] && file=$(mktemp) && cat >"$file"
    printf -- \
        "$(sed 's/\\/\\\\/g;s/%/%%/g;s/&#[0-9]\{1,10\};/\&#x%x;/g' "$file")\n" \
        $(grep -o '&#[0-9]\{1,10\};' "$file" | tr -d '&#;')
    [ x"$1" != x"$file" ] && rm -f -- "$file"
}

htmlHexUnescape() {
    printf -- "$(
        sed 's/\\/\\\\/g;s/%/%%/g
        ;s/&#x\([0-9a-fA-F]\{1,8\}\);/\&#x0000000\1;/g
        ;s/&#x0*\([0-9a-fA-F]\{4\}\);/\\u\1/g
        ;s/&#x0*\([0-9a-fA-F]\{8\}\);/\\U\1/g' )\n"
}
htmlEscDec2Hex "$1" | htmlHexUnescape \
| sed -f named_entities.sed
Note, however, that a printf implementation supporting \uHHHH and \UHHHHHHHH sequences is required, such as the GNU utility’s. To test, check for example that printf "\u00A7\n" prints §. To call the utility instead of the shell built-in, replace the occurrences of printf with env printf.
This script uses an additional file, named_entities.sed, in order to support the named entities. It can be generated from the specification using the following HTML page:
<!DOCTYPE html>
<head><meta charset="utf-8" /></head>
<body>
<p id="sed-script"></p>
<script type="text/javascript">
const referenceURL = 'https://html.spec.whatwg.org/entities.json';
function writeln(element, text) {
    element.appendChild( document.createTextNode(text) );
    element.appendChild( document.createElement("br") );
}
(async function(container) {
    const json = await (await fetch(referenceURL)).json();
    container.innerHTML = "";
    writeln(container, "#!/usr/bin/sed -f");
    const addLast = [];
    for (const name in json) {
        const characters = json[name].characters
            .replace("\\", "\\\\")
            .replace("/", "\\/");
        const command = "s/" + name + "/" + characters + "/g";
        if ( name.endsWith(";") ) {
            writeln(container, command);
        } else {
            addLast.push(command);
        }
    }
    for (const command of addLast) { writeln(container, command); }
})( document.getElementById("sed-script") );
</script>
</body></html>
Simply open it in a modern browser, and save the resulting page as text as named_entities.sed. This sed script can also be used alone if only named entities are required; in this case it is convenient to give it executable permission so that it can be called directly.
Now the above shell script can be used as ./html_unescape.sh foo.html, or inside a pipeline reading from standard input.
For example, if for some reason it is needed to process the data by chunks (it might be the case if printf is not a shell built-in and the data to process is large), one could use it as:
nLines=20
seq 1 $nLines $(grep -c $ "$inputFile") | while read n
do sed -n "$n,$((n+nLines-1))p" "$inputFile" | ./html_unescape.sh
done
Explanation of the script follows.
There are three types of escape sequences that need to be supported:
&#D; where D is the decimal value of the escaped character’s Unicode code point;
&#xH; where H is the hexadecimal value of the escaped character’s Unicode code point;
&N; where N is the name of one of the named entities for the escaped character.
The &N; escapes are supported by the generated named_entities.sed script which simply performs the list of substitutions.
The central piece of this method for supporting the code point escapes is the printf utility, which is able to:
print numbers in hexadecimal format, and
print characters from their code point’s hexadecimal value (using the escapes \uHHHH or \UHHHHHHHH).
The first feature, with some help from sed and grep, is used to reduce the &#D; escapes into &#xH; escapes. The shell function htmlEscDec2Hex does that.
The function htmlHexUnescape uses sed to transform the &#xH; escapes into printf’s \u/\U escapes, then uses the second feature to print the unescaped characters.
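As a quick sanity check of the two stages (a hypothetical run of my own, assuming the script has been saved as html_unescape.sh and named_entities.sed has been generated as described above), a decimal and a hexadecimal escape for the section sign should both come out as §:

$ printf '&#167; &#xA7;\n' | ./html_unescape.sh
§ §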
I like the Perl answer given in https://stackoverflow.com/a/13161719/1506477.
cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
But it produced an unequal number of lines on plain text files (and I don't know Perl well enough to debug it).
I like the python answer given in https://stackoverflow.com/a/42672936/1506477 --
python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
but it creates a list [ ... for l in sys.stdin] in memory, which is prohibitive for large files.
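(As an aside, the buffering can be avoided in plain Python with a streaming variant of that one-liner; a sketch of mine, not from the quoted answers:
python3 -c 'import html, sys; sys.stdout.writelines(map(html.unescape, sys.stdin))'
It writes each line as it is read instead of materializing a list.)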
Here is another easy pythonic way without buffering in memory: using awkg.
$ echo 'hello &lt; &#58; &quot; world' | \
awkg -b 'from html import unescape' 'print(unescape(R0))'
hello < : " world
awkg is a Python-based awk-like line processor. You may install it using pip (https://pypi.org/project/awkg/):
pip install awkg
-b is like awk's BEGIN{} block: it runs once at the beginning.
Here we just did from html import unescape.
Each line record is in the R0 variable, for which we did
print(unescape(R0))
Disclaimer:
I am the maintainer of awkg
I have created a sed script based on the list of entities, so it should handle most of the entities.
sed -f htmlentities.sed < file.html
My original answer got some comments saying that recode does not work for UTF-8 encoded HTML files. This is correct: recode supports only HTML 4. The encoding HTML is an alias for HTML_4.0:
$ recode -l | grep -iw html
HTML-i18n 2070 RFC2070
HTML_4.0 h h4 HTML
The default encoding for HTML 4 is Latin-1. This has changed in HTML 5. The default encoding for HTML 5 is UTF-8. This is the reason why recode does not work for HTML 5 files.
HTML 5 defines the list of entities here:
https://html.spec.whatwg.org/multipage/named-characters.html
The definition includes a machine readable specification in JSON format:
https://html.spec.whatwg.org/entities.json
The JSON file can be used to perform a simple text replacement. The following example is a self-modifying Perl script, which caches the JSON specification in its DATA chunk.
Note: For some obscure compatibility reasons, the specification allows entities without a terminating semicolon. Because of that, the entities are sorted by length in reverse order to make sure that the correct entities are replaced first, so that they do not get destroyed by the entities without the ending semicolon.
#! /usr/bin/perl
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);
use LWP::Simple;
use JSON::Parse qw(parse_json);
my $entities;
INIT {
    if (eof DATA) {
        my $data = tell DATA;
        open DATA, '+<', $0;
        seek DATA, $data, 0;
        my $entities_json = get 'https://html.spec.whatwg.org/entities.json';
        print DATA $entities_json;
        truncate DATA, tell DATA;
        close DATA;
        $entities = parse_json ($entities_json);
    } else {
        local $/ = undef;
        $entities = parse_json (<DATA>);
    }
}
local $/ = undef;
my $html = <>;
for my $entity (sort { length $b <=> length $a } keys %$entities) {
    my $characters = $entities->{$entity}->{characters};
    $html =~ s/$entity/$characters/g;
}
print $html;
__DATA__
Example usage:
$ echo '&#x1F60A; &amp; ٱلْعَرَبِيَّة' | ./html5-to-utf8.pl
😊 & ٱلْعَرَبِيَّة
With Xidel:
echo 'hello &lt; &#58; &quot; world' | xidel -s - -e 'parse-html($raw)'
hello < : " world

zsh sed expanding a variable with special characters and keeping them

I'm trying to store a string in a variable, then expand that variable in a sed command.
Several of the values I'm going to put in the variable before calling the command will have parentheses (with and without backslashes before the left parenthesis, but never before the right), newlines and other special characters. Also, the string will have double quotes around it in the file that's being searched, and I'd like to use those to limit matching to only the string I'm querying.
The command needs to be able to match those special characters in the file. I'm using zsh on macOS, although if the command were compatible with bash 4.2 that'd be a nice bonus. Echoing to xargs is fine too. Also, if awk would be better for this, I have no requirement to use sed.
Something like...
sed 's/"\"$(echo -E - ${val})\""/"${key}.localized"/g' "${files}"
Given that $val is the variable I described above, $key has no spaces (but underscores) and $files is an array of file paths (preferably compatible with spaces, but not required).
Example Input values for $val...
... "something \(customStringConvertible) here" ...
... "something (notVar) here" ...
... "something %# here" ...
... "something # 100% here" ...
... "something for $100.00" ...
Example Output:
... "some_key".localized ...
I was using the sed command to replace the examples above. The text I'm overwriting them with is pretty straightforward.
The key problem I'm having is getting the command to match the special characters literally instead of expanding them and then trying to match.
Thanks in advance for any assistance.
awk is better since it provides functions that work with literal strings:
$ val='something \(customStringConvertible) here' awk 'index($0,ENVIRON["val"])' file
... "something \(customStringConvertible) here" ...
$ val='something for $100.00' awk 'index($0,ENVIRON["val"])' file
... "something for $100.00" ...
The above was run on this input file:
$ cat file
... "something \(customStringConvertible) here" ...
... "something (notVar) here" ...
... "something %# here" ...
... "something # 100% here" ...
... "something for $100.00" ...
With sed you'd have to follow the instructions at Is it possible to escape regex metacharacters reliably with sed to try to fake sed out.
It's not clear what your real goal is so edit your question to provide concise, testable sample input and expected output if you need more help. Having said that, it looks like you're doing a substitution so maybe this is what you want:
$ old='"something for $100.00"' new='here & there' awk '
s=index($0,ENVIRON["old"]) { print substr($0,1,s-1) ENVIRON["new"] substr($0,s+length(ENVIRON["old"])) }
' file
... here & there ...
or if you prefer:
$ old='"something for $100.00"' new='here & there' awk '
BEGIN { old=ENVIRON["old"]; new=ENVIRON["new"]; lgth=length(old) }
s=index($0,old) { print substr($0,1,s-1) new substr($0,s+lgth) }
' file
or:
awk '
BEGIN { old=ARGV[1]; new=ARGV[2]; ARGV[1]=ARGV[2]=""; lgth=length(old) }
s=index($0,old) { print substr($0,1,s-1) new substr($0,s+lgth) }
' '"something for $100.00"' 'here & there' file
... here & there ...
See How do I use shell variables in an awk script? for info on how I'm using ENVIRON[] vs ARGV[] above.

Bash Script printing different output of same command running on "cmd line window" and "reading from csv file"

I have a shell script which reads an input command (with arguments) from a csv file and executes it.
The command is /path/ABCD_CALL GI 30-JUN-2010 '' 98994-01
Here '' is a pair of single quotes without a space between them.
In the csv file I am using /opt/isis/infosys/src/gq19iobl/IOBL_CALL GI 30-JUN-2010 \'' \'' 98994-01
to escape the single quotes.
Below is the shell script:
IFS=","
cat modules.csv | while read line;
do
d="${line}"
eval "$d"
done
This command shows hundreds of records as output on the console window.
The issue I am facing is: when I type the same command manually and run it from the command window, I can see all the output records; but when I run the same command from the csv using the shell script above, I get only 1 record, which shows an error array.
I applied debugging using
set -x
trap read debug
There I can see the output below:
+ cat modules.csv
+ read line
' d='/path/ABCD_CALL GI 30-JUN-2010 '\'''\'' '\'''\'' 98994-01
' eval '/path/ABCD_CALL GI 30-JUN-2010 '\'''\'' '\'''\'' 98994-01
++ /path/ABCD_CALL GI 30-JUN-2010 '' '' $'98994-01\r'
------------- ABCD RESULT SUMMARY -------------
ABCD return message : MESSAGE ARRAY must be checked for ERRORS and WARNINGS. and ABCD returned a 1
Total balances : 0.
Total errors : 1.
error_array[0]
and so on, with other details of the error.
What should I do to see the same output when reading the same data from csv?
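Note the $'98994-01\r' in the set -x trace above: the line read from the csv ends in a carriage return, so the file almost certainly has DOS (CRLF) line endings, and the last argument passed to the program is 98994-01 followed by a literal CR. A minimal sketch of a fix (my suggestion, not from the original thread) is to strip the CR before the eval:

while IFS= read -r line; do
    line=${line%$'\r'}   # drop a trailing carriage return, if present
    eval "$line"
done < modules.csv

Alternatively, convert the file once with dos2unix modules.csv.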

Net::OpenSSH command remote with multi pipeline

I have a problem when attempting to run multiple commands in a pipeline on a remote Linux host, with Perl and the module Net::OpenSSH.
use Net::OpenSSH;
my $new_connect = Net::OpenSSH->new($D_CONNECT{'HOST'}, %OPT);
my $file = "file.log.gz";
my ($output2, $error2) = $new_connect->capture({ timeout => 10 }, "gunzip -c /path/to/file/$file | tail -n1 | awk '/successfully completed/ {print \$NF}'");
The output that I get is:
bash: -c: line 1: syntax error near unexpected token `|'
bash: -c: line 1: `|tail -n1 |awk '/successfully completed/ {print $NF}''
Any idea or suggestion? Thanks.
Fcs
Probably a quoting error. Just let Net::OpenSSH take care of the quoting for you:
my ($output2, $error2) = $new_connect->capture({ timeout => 10 },
'gunzip', '-c', "/path/to/file/$file", \\'|',
'tail', '-n1', \\'|',
'awk', '/successfully completed/ {print $NF}');
Note how the pipes (|) are passed as a double reference so that they are passed unquoted to the remote shell. The module documentation has a section on quoting.
That looks like the error message you'd get if you had a newline at the end of your $file string, causing the pipe character to be at the start of a second line (interpreted as the start of a second command).
This test demonstrates the same error:
bash -c 'echo foo
| cat'
So I guess your bug doesn't really occur with $file = "file.log.gz"; whatever your real $file is, you need to chomp it (chomp $file;) to strip the trailing newline.
A bigger mystery is why bash says the error is on line 1 of the -c. ash, ksh, and zsh all correctly report it on line 2.
