Shell script, escape newlines but emit others? - bash

Given a filename, I want to write a shell-script which emits the following, and pipes it into a process:
Content-Length:<LEN><CR><LF>
<CR><LF>
{ "jsonrpc":"2.0", "params":{ "text":"<ESCAPED-TEXT>" } }
where <ESCAPED-TEXT> is the content of the file but its CRs, LFs and quotation marks have been escaped as \r and \n and \" (and I guess all other JSON escapes will eventually be needed as well), and where <LEN> is the length of final JSON line that includes the escaped text.
Here's my current bash-script solution. It works but is ugly as heck.
(
TXT=`cat ~/a.py | sed -E -e :a -e '$!N; s/\n/\\\n/g; ta' | sed 's/"/\\\"/g'`
CMD='{"jsonrpc":"2.0", "params":{ "text":{"'${TXT}'"}} }'
printf "Content-Length: ${#CMD}\r\n\r\n"
echo -n "${CMD}"
) | pyls
Can anyone suggest how to do this cleaner, please?
This sed script only replaces LFs, not CRs. It accumulates each line into the buffer and then does a s//g to replace all LFs in it. I couldn't figure out anything cleaner that still worked on both Linux and OSX/BSD.
I used both printf and echo. First printf because I do want to emit the CRLFCRLF after the Content-Length header, and you apparently need printf for that because the behavior of echo with escapes isn't uniform across platforms. Next echo because I don't want the \r and \n literals inside TXT to be unescaped, which printf would do.
Context: there's a standard called "Language Server Protocol". Basically you run something like the pyls I'm running here, and you pipe in JsonRPC to it over stdin, and it pipes back stuff. Different people have written language servers for Python (the pyls I'm using here), and C#, and C++, and Typescript, and PHP, and OCaml, and Go, and Java, and each person tends to write their language server in their own language.
I want to write a test-harness which can send some example JsonRPC packets into any such server.
I figured it'd be better to write my test-harness in just the common basic shell-scripting stuff that's available on all platforms out of the box. That way everyone can use my test-harness against their language server. (If I wrote it on Python instead, say, it'd be easier for me to write, but it would force the C# folks to learn+install python just to run it, and likewise the Typescript, PHP, OCaml, Go and other folks.)

a.py:
print("alfa")
print("bravo")
Awk script:
{
gsub("\r", "\\r")
gsub("\42", "\\\42")
z = z $0 "\\n"
}
END {
printf "Content-Length: %d\r\n", length(z) + 42
printf "\r\n"
printf "{\42jsonrpc\42: \0422.0\42, \42params\42: {\42text\42: \42%s\42}}", z
}
Result:
Content-Length: 81
{"jsonrpc": "2.0", "params": {"text": "print(\"alfa\")\r\nprint(\"bravo\")\r\n"}}

Can anyone suggest how to do this cleaner, please?
I guess all other JSON escapes will eventually be needed as well
If I already had Python at my disposal, I'd try really, really hard to use the standard Python JSON encoder, at least for the string escaping part. Why hack together something that kind of works when you can use something known to work that you already are halfway familiar with?
If I didn't have Python, I like Steve Penny's solution. Rules of thumb:
1. to process sets of files, use the shell
2. to process data in a file, use awk
3. if sed can't do it trivially, see rule #2
If you know a little awk, his solution is easy to understand almost at a glance. I would call that "cleaner". If you don't know awk, this would seem to be an excellent opportunity to become acquainted.

I think the main problem with your script is not using format strings with printf. The usual way that printf is used is with various special characters in the format string (like %s, %b, etc) and a list of additional arguments that are substituted into the format string.
That is, when you say "[I used] echo because I don't want the \r and \n literals to be unescaped, which printf would do", the problem is just not using printf "%s" "$string".
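The difference between escapes in the format string and escapes in an argument can be sketched with a small helper. This is only a sketch and not part of the original answer: frame_message is a hypothetical name, and the LC_ALL=C line is an added assumption so that ${#body} counts bytes rather than characters, which is what a Content-Length header normally describes.

```shell
#!/usr/bin/env bash
# frame_message (hypothetical name): print a body preceded by a
# Content-Length header and an empty CRLF line.
frame_message() {
  local body=$1
  local LC_ALL=C   # assumption: count bytes, not characters, in ${#body}
  # \r\n in the FORMAT is expanded to CR LF;
  # backslashes in the %s ARGUMENT are printed untouched.
  printf 'Content-Length: %d\r\n\r\n%s' "${#body}" "$body"
}

# The literal \n inside the JSON text survives as two characters.
frame_message '{"text":"a\nb"}'
```

Piping the result through od -c is an easy way to confirm that only the header contains real CR and LF bytes.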
Anyway, here's an idea of how to use this stuff to get everything done in bash with no external tools:
escapes=('\n' '\r' '\"') # the escapes we want to put into the output
txt=$(< ~/a.py); # read the file into a variable
for esc in "${escapes[@]}"; do
# escapes are evaluated in a %b string w/ printf
# using -v puts the result into a variable
printf -v lit '%b' "$esc"
# use built-in ${string//pattern/replacement} expansion
txt=${txt//$lit/$esc}
done
txt='{"jsonrpc":"2.0", "params":{ "text":"'$txt'" } }'
# escapes in the format string are expanded
# but escapes in the argument substituted for %s are not
printf 'Content-Length: %s\r\n\r\n%s' "${#txt}" "$txt"

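The question notes that the other JSON escapes will eventually be needed. A minimal sketch of how the same parameter-expansion technique could cover a few more of them follows. The function name json_escape is hypothetical, and the sketch assumes that only backslash, double quote, LF, CR and tab occur; other control characters would still need \u00XX escapes, which it does not attempt.

```shell
#!/usr/bin/env bash
# json_escape (hypothetical name): print $1 escaped for use inside a
# JSON string literal. Covers only \ " LF CR TAB (an assumption).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # line feed
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Build the JSON body around the escaped text.
body='{"jsonrpc":"2.0", "params":{ "text":"'$(json_escape 'print("alfa")')'" } }'
printf '%s\n' "$body"
```

Escaping the backslash before anything else matters: done last, it would double the backslashes that the earlier replacements had just inserted.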
Related

insert text allocated in a variable before the first empty line

I have some text files $f resembling the following
function
%blah
%blah
%blah
code here
I want to append the following text before the first empty line:
%
%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
%3.0 Unported License. See notes at the end of this file for more information.
I tried the following:
top=$(cat ./PATH/text.txt)
top="${top//$'\n'/\\n}"
sed -i.bak 's#^$#'"$top"'\\n#' $f
where the second line (I think) preserves the new line in the text and the third line (I think) substitutes the first empty line with the text plus a new empty line.
Two problems:
1- My code appends the following text:
%n%This work is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike n%3.0 Unported License. See notes
at the end of this file for more information.\n
2- It appends it at end of the file.
Can someone please help me understand the problems with my code?
If you are using GNU sed, the following would work.
Use ^$ to find the empty line and then use sed to replace it with the text that you want.
# Define your replacement text in a variable
a="%\n%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike\n%3.0 Unported License. See notes at the end of this file for more information."
Note, $a should include those \n that will be directly interpreted by sed as newlines.
$ sed "0,/^$/s//$a/" inputfile.txt
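In the above syntax, the GNU-specific address range 0,/^$/ ends at the first line that matches ^$ (even if that is line 1), so the substitution is applied only to the first empty line. The empty pattern in s// reuses that same regex.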
Output:
function
%blah
%blah
%
%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
%3.0 Unported License. See notes at the end of this file for more information.
%blah
code here
test
You've included bash and sed tags in your question. Since I can't seem to come up with a way of doing this in sed, here's a bash-only solution. It's likely to perform the worst of all working solutions you might find.
The following works with your sample input:
$ while read -r x; do [[ -z "$x" ]] && cat boilerplate; printf '%s\n' "$x"; done < src
This will however insert the boilerplate before EVERY blank line, which is probably not what you're after. Instead, we should probably make this more than a one-liner:
#!/usr/bin/env bash
y=true
while read -r x; do
if [[ -z "$x" ]] && $y; then
cat boilerplate
y=false
fi
printf '%s\n' "$x"
done < src
Note that unlike the code in your question, this doesn't store your boilerplate in a variable, it just cats it "at the right time".
Note that this sends the combined output to stdout. If your goal is to modify the original file, you'll need to wrap this in something that moves around temporary files. (Note that sed's -i option also doesn't really edit files in place, it only hides the moving-around-temp-files from you.)
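The temp-file wrapping mentioned above could look roughly like this. It is a sketch under assumptions: insert_before_first_blank is a hypothetical name, mktemp is assumed to be available (it is on Linux and macOS), and the result is copied back with cat so that the target keeps its permissions.

```shell
#!/usr/bin/env bash
# insert_before_first_blank (hypothetical name): insert the contents of
# file $2 before the first empty line of file $1, rewriting $1.
insert_before_first_blank() {
  local target=$1 boilerplate=$2 tmp line pending=true
  tmp=$(mktemp) || return 1
  # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    if $pending && [[ -z $line ]]; then
      cat "$boilerplate"
      pending=false
    fi
    printf '%s\n' "$line"
  done < "$target" > "$tmp"
  cat "$tmp" > "$target"   # copy back instead of mv to keep permissions
  rm -f "$tmp"
}
```

A call such as insert_before_first_blank "$f" ./PATH/text.txt would then modify the file named by $f. No backup is kept, so copy the file first if you want one.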
The following alternatives are probably a better idea.
A similar solution to the bash one might be achieved with better performance using awk:
awk 'NR==FNR{b=b $0 ORS;next} /^$/&&!y{printf "%s",b;y++} 1' boilerplate src
This awk solution obviously reads your boilerplate into a variable, though it's not a shell variable.
Notwithstanding non-standard platform-specific extensions, awk does not have any facility for editing files "in place" either. A portable solution using awk would still need to push temp files around.
And of course, the following old standard of ed is great to keep in your back pocket:
printf 'H\n/^$/\n-\n.r boilerplate\nw\nq\n' | ed src
In bash, of course, you could always use a here-string, which might be clearer:
$ ed src <<< $'H\n/^$/\n-\n.r boilerplate\nw\nq\n'
The ed command is a non-stream version of sed. Or rather, sed is the stream version of ed, which has been around since before the dinosaurs and is still going strong.
The commands we're using are separated by newlines and fed to ed's standard input. You can discard stdout if you feel the urge. The commands shown here are:
H - instruct ed to print more useful errors, if it gets any.
/^$/ - search for the first empty line.
- - GO BACK ONE LINE. Awesome, right?
.r boilerplate - Read your boilerplate at the current line,
w - and write the file.
q - Quit.
Note that this does not keep a .bak file. You'll need to do that yourself if you really want one.
And if, as you suggested in comments, the filename you're reading is to be constructed from a variable, note that variable expansion does not happen inside ANSI-C quoting ($' .. '). You can either switch quoting mechanisms mid-script:
ed "$file" <<< $'H\n/^$/\n-\n.r ./TATTOO_'"$currn"$'/top.txt\nw\nq\n'
Or you could put the ed script in a variable constructed by printf:
printf -v scr 'H\n/^$/\n-\n.r ./TATTOO_%s/top.txt\nw\nq\n' "$currn"
ed "$file" <<< "$scr"
Adding the text to a variable so you can interpolate the variable is wasteful and an unnecessary complication. sed can easily read the contents of a file by itself.
sed -i.bak '1r./PATH/text.txt' "$f"
Unfortunately, this part of sed is poorly standardized, so you may have to experiment a little bit. Some dialects require a newline (perhaps, or perhaps not, preceded by a backslash) before the filename.
sed -i.bak '1r\
./PATH/text.txt' "$f"
(Notice also the double quotes around the file name. You generally always want double quotes around variables which contain file names.)
Adapting a known recipe, we can extend this to apply to the first empty line instead of the first line.
sed -i.bak -e '/^$/!b' -e 'r./PATH/text.txt' -e :a -e '$!{' -e n -e ba -e } "$f"
This adds the boilerplate after the first empty line but perhaps that's acceptable. Refactoring it to replace it or add an empty line after should not be too challenging anyway. (Maybe use sed -n and instead explicitly print everything except the empty line.)
In brief terms, this skips to the end (simply prints) up until we find the first empty line. Then, we read and print the file, and go into a loop which prints the remainder of the file without returning to the beginning of the script.
Here is a sed script that I think works. It takes the extra bit to be inserted from a shell variable.
b='##\n## comment piece\n##'
sed --posix -ne '
1,/^$/ {
/^$/ {
x;
/^true$/ !{
x
s/^$/true/
i\
'"$b"'
};
x;
s/^.*$//
}
}
p
' file1
With the examples using a range of 1,/^$/, an empty first line would result in the disclaimer being printed twice. To avoid this, I've set it up to put a flag in the hold space ( x; s/^$/true/ ) that I can swap into the pattern space to check whether this is the first blank line. Once there's a match for a blank line, i\ inserts the comment ($b) in front of the pattern space.
Thanks to ghoti for the initial plan.

Replace File Path Text After Specific String

I have an audacity.cfg file in which I want to script the substitution of two plugin paths. The paths were previously different, so I need to insert the updated ones. I will provide one below.
First, I want to locate this text, which begins the line in question:
FFmpegLibPath
Next, I want to replace that entire line with:
FFmpegLibPath=/Library/Application Support/audacity/libs/libavformat.55.dylib
That's it. It should not be so difficult, but it is. I have done lots of experimenting using sed and awk, but have not been able to get anything to work. While there are LOTS of examples of this online and in this forum, none of them have worked. They all produce errors relating to escape characters, as well as some random other things. I have spent hours experimenting and researching, but have not made any headway.
I realize that the slashes and spaces are likely causing issues, and I have spent considerable time attempting to solve this. I've tried all sorts of things, but as I've said, nothing works.
Does anyone have any ideas about this?
Thanks in advance for your help.
Edit:
I am running MacOS 10.10.5, and one of the things I saw in my research was using GNU sed, because some arguments do not work without it. While I am sure that would produce a better result, I cannot use it because my users would not have it. I think this is part of the reason why this is so difficult, because many of the solutions I have seen are utilizing arguments that I cannot use.
If everything else fails, you can always use the old-school ed solution. :) :)
#!/bin/bash
{
printf 'H\n'
printf '/^FFmpegLibPath[ \t=]/\n'
printf '%s\n' c 'FFmpegLibPath=/Library/Application Support/audacity/libs/libavformat.55.dylib' . w q
} | ed -s "/path/to/audacity.cfg" >/dev/null
The quotes and spaces are mandatory.
The above searches for the line starting with FFmpegLibPath followed by a space, a tab, or =. This avoids collisions with similar prefixes such as FFmpegLibPath2.
If such collisions are not possible, the above could be simply written as:
ed -s "/path/to/audacity.cfg" >/dev/null <<'EOF'
H
/^FFmpegLibPath/
c
FFmpegLibPath=/Library/Application Support/audacity/libs/libavformat.55.dylib
.
w
q
EOF
or
printf '%s\n' H '/^FFmpegLibPath/' c 'FFmpegLibPath=/Library/Application Support/audacity/libs/libavformat.55.dylib' . w q |
ed -s "/path/to/audacity.cfg" >/dev/null
You can escape the special character (forward slash) and assign it to a variable:
REPL=$(sed 's/[\/]/\\&/g' <<< "/Library/Application Support/audacity/libs/libavformat.55.dylib")
& is sed's meta-character to represent the pattern that was matched.
sed -E "s/(FFmpegLibPath=).+/\1$REPL/" audacity.cfg
Option -E is used to support extended regular expressions
output:
etc
FFmpegLibPath=/Library/Application Support/audacity/libs/libavformat.55.dylib
etc
etc
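The escaping step can be folded into a reusable helper. The following is a sketch rather than the answerer's code: set_cfg_key is a hypothetical name, the key is assumed to contain no regex metacharacters, and the file is rewritten through a temporary file so that neither GNU nor BSD/macOS sed needs its -i option. Besides /, it also escapes \ and &, which are special in the replacement part of s///.

```shell
#!/usr/bin/env bash
# set_cfg_key (hypothetical name): replace the whole "key=..." line in a
# config file. Assumes $2 (the key) has no regex metacharacters.
set_cfg_key() {
  local file=$1 key=$2 value=$3 repl tmp
  # Escape & / \ so sed treats the value as literal replacement text.
  # The | delimiter keeps the / inside the bracket unambiguous.
  repl=$(printf '%s\n' "$value" | sed 's|[&/\]|\\&|g')
  tmp=$(mktemp) || return 1
  sed "s/^${key}=.*/${key}=${repl}/" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For the case in the question this would be called as set_cfg_key audacity.cfg FFmpegLibPath '/Library/Application Support/audacity/libs/libavformat.55.dylib'.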
If you preferred to maintain the updates in a separate text file:
cfg_update.txt
key_name1=value
key_name2=value
key_name3=value
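# split each line at the first "=" (IFS is changed only for read)
while IFS='=' read -r KEY VALUE; do
  # -i.bak works with both GNU and BSD/macOS sed; a bare -i fails on macOS
  sed -i.bak -E "s/($KEY=).+/\1$VALUE/" audacity.cfg
done < cfg_update.txt
Option -i.bak edits the file in place and keeps a backup with the .bak suffix. As in the single-key example, any / in a value must be escaped first.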
Finally, be sure to make a backup before your tests, good luck!

Bash/Shell | How to prioritize quote from IFS in read

I'm working with a hand-filled file and I am having trouble parsing it.
My input file cannot be altered, and the language of my code can't change from bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell the read that the " " are more important than the IFS. I have read about the eval command but I can't take that risk.
Finally, this is a directory file and the troublesome field is the description one, so it could have basically anything in it.
The original file looks like this:
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found that it causes another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[@]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< $objectClass
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function (the negative substring length in ${BASH_REMATCH[1]:1:-1} additionally needs bash 4.2 or later): [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
Notes:
1. I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
2. Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
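To use the fields by position, as the question's read call tries to, the same regex can fill an array instead of printing. This is a sketch built on the function above, not the answerer's code: parse_csv_line and the global array fields are hypothetical names, and the regex is kept in a variable so that its quoting does not depend on how [[ ... =~ ... ]] treats backslashes.

```shell
#!/usr/bin/env bash
# parse_csv_line (hypothetical name): split one CSV line into the global
# array "fields", honouring quoted fields and doubled quotes.
parse_csv_line() {
  local v=,$1 dq='"'
  local re='^,(([^",]*)|"(([^"]|"")*)")'
  fields=()
  while [[ $v =~ $re ]]; do
    if [[ ${BASH_REMATCH[1]} == "$dq"* ]]; then
      # quoted field: take the inner text and undouble the quotes
      fields+=("${BASH_REMATCH[3]//$dq$dq/$dq}")
    else
      fields+=("${BASH_REMATCH[2]}")
    fi
    v=${v:${#BASH_REMATCH[0]}}   # drop the part just consumed
  done
}

parse_csv_line '"top:account","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon"'
printf 'description: %s\n' "${fields[5]}"
```

Unlike the printing version, empty fields keep their position here, so the index of the description column stays stable from line to line.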
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern

Bash: Pass Command Substitution to External Program (or Function) without Word Splitting

In the below code, I would like to have the output of pyfg() passed exactly as echoed (i.e. with the space between -htns and crl being interpreted literally, not as whitespace, by aoeu()) to aoeu(). Of course, the problem is that in aoeu(), $1 is -htns, $2 is crl, and $3, which I don't want at all in this case, is qjkx. I know this example is thoroughly useless, but the real application to which I'm trying to apply this calls an external program in place of the below aoeu(), so I do need something like what's below.
#!/bin/bash
# pass_space_function.sh
aoeu() {
echo "$1" "$2"
}
pyfg() {
echo "-htns crl" "qjkx"
}
aoeu $(pyfg)
My running the above outputs:
$ ./pass_space_function.sh
-htns crl
My desired output is:
$ ./pass_space_function.sh
-htns crl qjkx
To be clear, I do understand exactly why my code isn't working; what I'm not so sure about is how to make it do what I want it to do.
EDIT:
#!/bin/bash
aoeu() {
echo 1:"$1" 2:"$2" 3:"$3"
}
pyfg() {
# These variables might be user-provided.
wvz="/usr/lib/scarychacacters_\"##$:%^&:*(){}[]; a o ;u ;::e i y f.so.4"
bm="/space space space"
snt="/var/cache/normalpath"
printf "%q %q %q" "$wvz" "$bm" "$snt"
}
aoeu $(pyfg)
That code returns, for me, 1:/usr/lib/scarychacacters_\"##\$:%\^\&:\*\(\)\{\}\[\]\;\ 2:a\ 3:o\. It's obviously splitting at the whitespace in $wvz.
The key to correct quoting lies in understanding what happens.
That echo "-htns crl" "qjkx" for example will print just a byte stream to its stdout, so it will be just -htns crl qjkx in the end. The information that -htns crl were grouped more closely than qjkx is lost.
To avoid this loss you can use printf "%q":
pyfg() {
printf "%q %q" "-htns crl" "qjkx"
}
This will generate quoted output: -htns\ crl qjkx which means to the shell the same as "-htns crl" "qjkx" (whether the space is escaped with a backslash or quoted with double quotes does not make a difference).
The next aspect is the use of $() to pass the output of one program to the next.
The typical way is to put that in double quotes:
aoeu "$(pyfg)"
This way everything is passed without interpretation which is desirable in most cases.
In your case, however, you might want to make the output of pyfg quoted instead of quoting the output of pyfg; notice the important difference: The first means that pyfg produces quoted output (as shown above), the second means that pyfg produces output which gets quoted later. The second does not help if the output of pyfg already lost the information which parts belong together.
If you now just leave out the double quotes, the output unfortunately gets split at whitespace (the characters in $IFS), even if a space is escaped with a backslash, because backslashes that come out of an expansion are not interpreted as quoting. So, instead, you need to use eval in this case to force the shell to interpret the value of $(pyfg) with the normal shell evaluation mechanism:
eval aoeu "$(pyfg)"
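The question mentions that some values might be user-provided, and in that case eval may be unwelcome. One alternative, which is a different technique from the %q approach above, is to let the producer terminate each argument with a NUL byte and have the caller collect them into an array. A minimal sketch, reusing the names aoeu and pyfg from the question with made-up sample values:

```shell
#!/usr/bin/env bash
aoeu() {
  printf '1:%s 2:%s 3:%s\n' "$1" "$2" "$3"
}

# pyfg ends every argument with a NUL byte instead of separating
# them with spaces, so spaces inside an argument carry no meaning.
pyfg() {
  printf '%s\0' "/usr/lib/odd name; a o.so.4" "/space space space" "/var/cache/normalpath"
}

# Collect the NUL-terminated items into an array, one element each.
args=()
while IFS= read -r -d '' arg; do
  args+=("$arg")
done < <(pyfg)

aoeu "${args[@]}"
```

Nothing in the data is ever parsed as shell syntax here, which is the main reason to prefer this over eval when the input is not trusted. An external program can be called in place of aoeu in the same way.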
EDIT: This works
#!/bin/bash
# pass_space_function.sh
aoeu() {
echo $1 x $2
}
pyfg() {
echo "'-htns crl' 'qjkx'"
}
eval aoeu $(pyfg)

Replacing HTML ascii codes via a bash script?

I need a way to replace HTML ASCII codes like &#33; with their correct character in bash.
Is there a utility I could run my output through to do this, or something along those lines?
$ echo '&#33;' | recode html/..
!
$ echo '&lt;&#8734;&gt;' | recode html/..
<∞>
I don't know of an easy way, here is what I suppose I would do...
You might be able to script a browser into reading the file in and then saving it as text. If lynx supports html character entities then it might be worth looking in to. If that doesn't work out...
The general solution to something like this is done with sed. You need a "higher order" edit for this, as you would first start with an entity table and then you would edit that table into an edit script itself with a multiple-step procedure. Something like:
. . .
s/&Dagger;/‡/g<br />
s/&#8221;/”/g<br />
. . .
Then, encapsulate this as html, read it in to a browser, and save it as text in the character set you are targeting. If you get it to produce lines like:
s/&lt;/</g
then you win. A bash script that calls sed or ex can be driven by the substitute commands in the file.
Here is my solution with the standard Linux toolbox.
$ foo="This is a line feed&#010;And e acute:&#233; with a grinning face &#128512;."
$ echo "$foo"
This is a line feed&#010;And e acute:&#233; with a grinning face &#128512;.
$ eval "$(printf '%s' "$foo" | sed 's/^/printf "/;s/&#0*\([0-9]*\);/\$( [ \1 -lt 128 ] \&\& printf "\\\\$( printf \"%.3o\\201\" \1)" || \$(which printf) \\\\U\$( printf \"%.8x\" \1) )/g;s/$/\\n"/')" | sed "s/$(printf '\201')//g"
This is a line feed
And e acute:é with a grinning face 😀.
You see that it works for all kinds of escapes: the line feed, e acute (é), which is 2 bytes in UTF-8, and even the newer emoticons, which are in the extended planes (4 bytes in UTF-8).
This command ALSO works with dash, a trimmed-down shell (the default /bin/sh on Ubuntu), and it is compatible with bash and with shells like ash, as used on Synology devices.
If you don't mind sticking with bash and dropping the compatibility, you can make it much simpler.
Bits used should be in any decent Linux box (or OS X?)
- which
- printf (GNU and builtin)
- GNU sed
- eval (shell builtin)
The bash-only version needs neither which nor the GNU printf.
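The answer does not show the bash-only variant, so the following is only a guess at what one could look like. It assumes bash 4.2 or later (for the \U escape in printf) and, for code points above 127, a UTF-8 locale. The name decode_numeric_entities is hypothetical, and named entities such as &amp; are deliberately left alone.

```shell
#!/usr/bin/env bash
# decode_numeric_entities (hypothetical name): replace decimal numeric
# character references such as &#33; with the character they denote.
decode_numeric_entities() {
  local s=$1 out='' char hex
  local re='^(.*)&#([0-9]+);(.*)$'
  # The greedy (.*) makes each match pick the LAST reference in $s;
  # decoded text is moved into $out so it is never scanned again.
  while [[ $s =~ $re ]]; do
    # 10# forces base 10 so that leading zeros are not read as octal
    printf -v hex '%08x' "$((10#${BASH_REMATCH[2]}))"
    printf -v char "\\U$hex"
    out=$char${BASH_REMATCH[3]}$out
    s=${BASH_REMATCH[1]}
  done
  printf '%s\n' "$s$out"
}

decode_numeric_entities 'This is a line feed&#010;And an exclamation mark&#33;'
```

Because already decoded text is set aside rather than rescanned, an input like &#38;#33; comes out as the literal text &#33; instead of being decoded twice.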
