How do you stop sh from interpreting the '\' character? [duplicate] - bash

I have a script where I am attempting to read from a manifest file, translate DOS paths in that manifest to UNIX paths, and then operate on those files. Here is a snippet of code that I am trying to debug:
while read line
do
srcdir=$(printf '%s' "$line" | awk -F \\ -v OFS=/ '{ gsub(/\r|^[ \t]+|[ \t]+$/, "") } !NF { next } /^\\\\/ { sub(/^.*\\prj\\/, "\\prj\\") } { $1 = $1 } 1')
done < manifest.txt
My input file looks like this:
$ cat manifest.txt
\\server\mount\directory
When I debug my little shell snippet, I get the following:
+ read line
++ printf %s '\servermountdirectory
'
++ awk -F '\' -v OFS=/ '{ gsub(/\r|^[ \t]+|[ \t]+$/, "") } !NF { next } /^\\\\/ { sub(/^.*\\prj\\/, "\\prj\\") } { $1 = $1 } 1'
+ srcdir=\servermountdirectory
So... Either at read or at printf, the \ characters are being interpreted as escape characters -- how do I work around that?
Note... I know I could just run the while loop in awk... the thing is that in my real program, I have other things inside that while loop that need to be done with "$srcdir" -- and for this, sh is the right tool... So I really need a solution in sh.

From posix read:
By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.
and:
-r
Do not treat a <backslash> character in any special way. Consider each <backslash> to be part of the input line.
Just:
while read -r line; do
Also remember that without IFS= this will not preserve leading and trailing whitespace.
Remember to always do read -r. Here is a good read: bashfaq How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?.
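A minimal sketch of that advice applied to the manifest loop, assuming the only goal is to turn backslashes into forward slashes. The function name convert_manifest is hypothetical, and tr stands in here for the longer awk command from the question:

```shell
# Hypothetical helper: reads DOS-style paths on stdin and prints them
# with forward slashes, one per line.
convert_manifest() {
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes;
    # the || test also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Strip a trailing carriage return left over from DOS line endings.
        line=${line%"$(printf '\r')"}
        # Skip empty lines.
        [ -n "$line" ] || continue
        printf '%s\n' "$line" | tr '\\' '/'
    done
}

printf '%s\n' '\\server\mount\directory' | convert_manifest
```

With a plain read line in place of IFS= read -r line, the backslashes would be consumed before the loop body ever sees them, which matches the debug trace in the question.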
Also remember that reading a file line by line is very inefficient in bash. It's way better to process the whole file using commands, tools, streams and pipes. If you have to read the file line by line, let the "preprocessing" stage parse the whole file, then read it line by line:
awk .... manifest.txt |
while read -r srcdir; do
echo "$srcdir"
done
or with process substitution (a bash feature), if you need the loop to run in the same shell:
while read -r srcdir; do
echo "$srcdir"
done < <(awk ... manifest.txt)
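The practical difference between the two forms shows up when the loop sets a variable that is needed afterwards. The following is a small sketch, assuming bash with default settings (no lastpipe); the variable names piped and kept are made up for the illustration, and printf stands in for the awk preprocessing step:

```shell
#!/bin/bash
# Pipe: in bash the loop runs in a subshell, so changes to "piped"
# are not visible after the loop ends.
piped=0
printf '%s\n' 'a\b' 'c\d' | while IFS= read -r srcdir; do
    piped=$((piped + 1))
done

# Process substitution: the loop runs in the current shell, so
# "kept" still holds the count after the loop.
kept=0
while IFS= read -r srcdir; do
    kept=$((kept + 1))
done < <(printf '%s\n' 'a\b' 'c\d')

printf 'piped=%s kept=%s\n' "$piped" "$kept"
```

If the loop body only prints or acts on files, either form works; the second matters only when state has to survive the loop.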

Related

convert a file content using shell script

Hello everyone, I'm a beginner in shell scripting. On a daily basis I need to convert a file's data to another format. I usually do it manually in a text editor, but I often make mistakes, so I decided to write a simple script that can do the work for me.
The file's content like this
/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5
to this:
a2>a3
b2>b3
c2>c3
The script should ignore the first line and print the second and third values separated by '>'
I'm half way there, and here is my code
#!/bin/bash
#while Loops
i=1
while IFS=\" read t1 t2 t3
do
test $i -eq 1 && ((i=i+1)) && continue
echo $t1|cut -d\, -f2 | { tr -d '\n'; echo \>$t2; }
done < $1
The problem in my code is that the last line isn't printed unless the file ends with a newline (\n).
I also want the output to be written to a new CSV file (I tried to redirect standard output to my new file, but only the last echo ends up there).
Can someone please help me out? Thanks in advance.
Rather than treating the double quotes as a field separator, it seems cleaner to just delete them (assuming that is valid). Eg:
$ < input tr -d '"' | awk 'NR>1{print $2,$3}' FS=, OFS=\>
a2>a3
b2>b3
c2>c3
If you cannot just strip the quotes as in your sample input but those quotes are escaping commas, you could hack together a solution but you would be better off using a proper CSV parsing tool. (eg perl's Text::CSV)
Here's a simple pipeline that will do the trick:
sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>'
Here, we're just removing the first line (as desired), selecting fields 2 & 3 (based on a comma field separator), removing the double quotes and mapping the remaining , to >.
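For reference, here is the same pipeline wrapped in a function and run on the sample data from the question. This is only a sketch: the name csv_to_pairs is hypothetical, and it assumes, as the pipeline does, that no field contains an embedded comma:

```shell
# Hypothetical wrapper around the pipeline above; reads CSV on stdin.
csv_to_pairs() {
    sed '1d' |            # drop the header line
        cut -d, -f2-3 |   # keep fields 2 and 3
        tr -d '"' |       # remove the double quotes
        tr ',' '>'        # turn the remaining comma into >
}

printf '%s\n' '/release201209' 'a1,a2,"a3",a4,a5' 'b1,b2,"b3",b4,b5' |
    csv_to_pairs
```

To get the result into a new file, redirect the whole pipeline once, for example csv_to_pairs < data.txt > result.csv, rather than redirecting each echo inside a loop, which presumably is why only the last line survived in the original attempt.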
Use this Perl one-liner:
perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Bash - Search and Replace operation with reporting the files and lines that got changed

I have a input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before space is the old value which needs to be replaced by the new value after the space recursively in a folder named "test". I am able to do this using below shell script.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which i can get a report of the files changed like absolute path of the file & the exact lines that got changed ?
PS - I understand that sed and other stream editors don't support this functionality out of the box. I don't want to use version control, as it would be overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
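To see what the two escaping helpers actually produce, here is a short sketch that runs them on made-up values containing sed metacharacters. The sample strings are assumptions for illustration; the helpers are copied from the script above and need bash (for the here-strings):

```shell
#!/bin/bash
# Helpers as defined in the answer above.
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }

# Every character except ^ is wrapped in a bracket expression,
# so "." can only match a literal dot.
escapeRegex 'a.b'

# &, / and \ are backslash-escaped so they are taken literally
# on the replacement side.
escapeSubst 'x/y&z'

# Used together: only the literal "a.b" is replaced, "axb" is left alone.
printf '%s\n' 'a.b axb' |
    sed "s/$(escapeRegex 'a.b')/$(escapeSubst 'x/y&z')/g"
```

Without the escaping, the dot in a value such as abc.com would match any character, and an & in the new value would be expanded to the matched text.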
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes old --> new line to /dev/stdout (Bash-specific), which we in turn append to change.log file. The -printf action in find outputs a "header" line with file name, for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in the hold space (h), change it to new, and append that new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap the hold and pattern spaces (x), so we can run an s command that converts it to old --> new. After writing that to stdout with w, we swap the new value back from the hold space into the pattern space (x), so it gets written (in place) to the file being processed.
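The hold-space steps can be tried in isolation on standard input, without touching any files. The sketch below is a simplified variant, not the script above: it uses -n with p instead of -i with w /dev/stdout, so it prints only the "old --> new" report lines. The function name log_changes is hypothetical, and both arguments are assumed to be already escaped for sed:

```shell
# Hypothetical helper: prints "old line --> new line" for every input
# line that matches.
# $1: regex (already escaped), $2: replacement (already escaped)
log_changes() {
    sed -n "/$1/{
h
s//$2/g
H
x
s/\n/ --> /
p
}"
}

printf '%s\n' 'hostname=abc.com' 'other=1' |
    log_changes 'abc\.com' 'xyz.com'
```

The empty regex in s//.../ reuses the last regex that was matched, which is the address pattern, so the old value only has to be spelled out once.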
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. You can then look for any backup files, which indicate which files were changed, and diff those with the originals. Once you're done inspecting the diff, simply remove the backup files.
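A rough sketch of that inspection step, assuming the replacement was run with sed -i.bak so that each backup sits next to its original. The function name report_changes is hypothetical. Because sed may write a backup for every file it processes, changed or not, the sketch compares each pair with cmp first and reports only the ones that differ:

```shell
# Hypothetical helper: list changed files and the lines that differ.
# $1: directory on which "sed -i.bak ..." was run
report_changes() {
    find "$1" -type f -name '*.bak' | while IFS= read -r bak; do
        orig=${bak%.bak}
        # Report only pairs whose contents actually differ.
        if ! cmp -s "$bak" "$orig"; then
            printf '[%s]\n' "$orig"
            # diff exits non-zero when files differ; that is expected here.
            diff "$bak" "$orig" || :
        fi
    done
}

# Example (not run here): report_changes test
```

Lines starting with < in the diff output are the old content and lines starting with > are the new content. Afterwards the backups can be removed with find test -type f -name '*.bak' -exec rm {} +.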
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
There are several problems with your script; just replace it all with the following (using GNU awk instead of GNU sed for in-place editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[@]}"
You'll find that is orders of magnitude faster than what you were doing.
That still has the issue your existing script does of failing when the "test.txt" strings contain regexp or backreference metacharacters and modifying previously-modified strings and handling partial matches - if that's an issue let us know as it's easy to work around with awk (and extremely difficult with sed!).
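One possible shape of such a workaround for the metacharacter part is to replace literal strings with index() and substr() instead of gsub(), so that nothing in test.txt is ever interpreted as a regexp or as a replacement escape. This is an assumption about what the answer has in mind, not its actual code; the function name literal_replace is hypothetical, and the sketch reads the data from standard input rather than editing files in place:

```shell
# Hypothetical helper: literal (non-regexp) search and replace.
# $1: map file with "old new" pairs; text to convert comes from stdin.
literal_replace() {
    awk '
    NR==FNR { map[$1] = $2; next }      # first file: load the pairs
    {
        for (old in map) {
            out = ""; rest = $0
            # Copy text up to each literal match, then the new value.
            while ((i = index(rest, old)) > 0) {
                out = out substr(rest, 1, i - 1) map[old]
                rest = substr(rest, i + length(old))
            }
            $0 = out rest
        }
        print
    }' "$1" -
}

map=$(mktemp)
printf '%s\n' 'a.b x&y' > "$map"
printf '%s\n' 'a.b axb' | literal_replace "$map"
rm -f "$map"
```

This does not address the other two caveats (re-modifying already replaced text and partial matches); those depend on the order in which the pairs are applied and would need extra handling.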
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[@]}"

Multiline CSV: output on a single line, with double-quoted input lines, using a different separator

I'm trying to get a multiline output from a CSV into one line in Bash.
My CSV file looks like this:
hi,bye
hello,goodbye
The end goal is for it to look like this:
"hi/bye", "hello/goodbye"
This is currently where I'm at:
INPUT=mycsvfile.csv
while IFS=, read col1 col2 || [ -n "$col1" ]
do
source=$(awk '{print;}' | sed -e 's/,/\//g' )
echo "$source";
done < $INPUT
The output is on every line and I'm able to change the , to a / but I'm not sure how to put the output on one line with quotes around it.
I've tried BEGIN:
source=$(awk 'BEGIN { ORS=", " }; {print;}'| sed -e 's/,/\//g' )
But this only outputs the last line, and omits the first hi/bye:
hello/goodbye
Would anyone be able to help me?
Just do the whole thing (mostly) in awk. The final sed is just here to trim some trailing cruft and inject a newline at the end:
< mycsvfile.csv awk '{print "\""$1, $2"\""}' FS=, OFS=/ ORS=", " | sed 's/, $//'
If you're willing to install trl, a utility of mine, the command can be simplified as follows:
input=mycsvfile.csv
trl -R '| ' < "$input" | tr ',|' '/,'
trl transforms multiline input into double-quoted single-line output separated by ,<space> by default.
-R '| ' (temporarily) uses |<space> as the separator instead; this assumes that your data doesn't contain | instances, but you can choose any character that you know not to be part of your data.
tr ',|' '/,' then translates all , instances (field-internal to the input lines) into / instances, and all | instances (the temporary separator) into , instances, yielding the overall result as desired.
Installation of trl from the npm registry (Linux and macOS)
Note: Even if you don't use Node.js, npm, its package manager, works across platforms and is easy to install; try
curl -L https://git.io/n-install | bash
With Node.js installed, install as follows:
[sudo] npm install trl -g
Note:
Whether you need sudo depends on how you installed Node.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
The -g ensures global installation and is needed to put trl in your system's $PATH.
Manual installation (any Unix platform with bash)
Download this bash script as trl.
Make it executable with chmod +x trl.
Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).
$ awk -F, -v OFS='/' -v ORS='"' '{$1=s ORS $1; s=", "; print} END{printf RS}' file
"hi/bye", "hello/goodbye"
There is no need for a bash loop, which is invariably slow.
sed and tr can do this more efficiently:
input=mycsvfile.csv
sed 's/,/\//g; s/.*/"&", /; $s/, $//' "$input" | tr -d '\n'
s/,/\//g replaces all (g) , instances with / instances (escaped as \/ here).
s/.*/"&", / encloses the resulting line in "...", followed by ,<space>:
regex .* matches the entire pattern space (the potentially modified input line)
& in the replacement string represents that match.
$s/, $// removes the undesired trailing ,<space> from the final line ($)
tr -d '\n' then simply removes the newlines (\n) from the result, because sed invariably outputs each line with a trailing newline.
Note that the above command's single-line output will not have a trailing newline; simply append ; printf '\n' if it is needed.
In awk:
$ awk '{sub(/,/,"/");gsub(/^|$/,"\"");b=b (NR==1?"":", ")$0}END{print b}' file
"hi/bye", "hello/goodbye"
Explained:
$ awk '
{
sub(/,/,"/") # replace comma
gsub(/^|$/,"\"") # add quotes
b=b (NR==1?"":", ") $0 # buffer to add delimiters
}
END { print b } # output
' file
I'm assuming you just have 2 lines in your file? If you have alternating 2 line pairs, let me know in comments and I will expand for that general case. Here is a one-line awk conversion for you:
# NOTE: I am using the octal ascii code for the
# double quote char (\42=") in my printf statement
$ awk '{gsub(/,/,"/")}NR==1{printf("\42%s\42, ",$0)}NR==2{printf("\42%s\42\n",$0)}' file
output:
"hi/bye", "hello/goodbye"
Here is my attempt in awk:
awk 'BEGIN{ ORS = " " }{ a++; gsub(/,/, "/"); gsub(/[a-z]+\/[a-z]+/, "\"&\""); print $0; if (a == 1){ print "," }}{ if (a==2){ printf "\n"; a = 0 } }'
This also works if your input has more than two lines. If you need some explanation, feel free to ask :)

bash script to modify and extract information

I am creating a bash script to modify and summarize information with grep and sed. But it gets stuck.
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
#Extract lines starting with ">#HWI"
ONLY=`grep -v ^\>#HWI`
#replaces A and G with R in lines
ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`
grep R $ONLYR | wc -l
The correct way to write a shell script to do what you seem to be trying to do is:
awk '
!/^>#HWI/ {
gsub(/[AG]/,"R")
if (/R/) {
++cnt
}
}
END { print cnt+0 }
' "$@"
Just put that in the file myscript.sh and execute it as you do today.
To be clear - the bulk of the above code is an awk script, the shell script part is the first and last lines where the shell just calls awk and passes it the input file names.
If you WANT to have intermediate variables then you can create/print them with:
awk '
!/^>#HWI/ {
only = $0
onlyR = only
gsub(/[AG]/,"R",onlyR)
print "only:", only
print "onlyR:", onlyR
if (onlyR ~ /R/) {
++cnt
}
}
END { print cnt+0 }
' "$@"
The above will work robustly, portably, and efficiently on all UNIX systems.
First of all, and as @fedorqui commented, you're not providing grep with a source of input against which it will perform line matching.
Second, there are some problems in your script, which will result in unwanted behavior in the future, when you decide to manipulate some data:
Store matching lines in an array, or a file from which you'll later read values. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since variable names are case-sensitive, this convention avoids accidentally overriding environment and internal variables.
Here's a better version of your script, considering these points, but with an open question regarding what you were trying to do in the last line : grep R $ONLYR | wc -l :
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
input_file=$1
# Read lines not matching the provided regex, from $input_file
mapfile -t only < <(grep -v '^>#HWI' "$input_file")
#replaces A and G with R in lines
for((i=0;i<${#only[@]};i++)); do
only[i]="${only[i]//[AG]/R}"
done
# DEBUG
printf '%s\n' "Here are the lines, after replace:"
printf '%s\n' "${only[@]}"
# I'm not sure what you were trying to do here. Am I guessing right that you wanted
# to count the number of R's in ALL lines ?
# grep R $ONLYR | wc -l

sed from pattern till end of file inside a for loop

I'm writing a bash script that would allow me to take a certain amount of text from a file and add some other text before that for a list of files.
directory=$(pwd)
for f in *test.txt
do
filename=$(basename $f .txt)
printf "%%sum=4 \n"> input.temp
printf "file=$directory"/"$filename".txt" \n">> input.temp
printf "some commands \n">> input.temp
printf "\n" >> input.temp
printf "description \n">> input.temp
sed -n "/0 1/,$p" "$f" >> input.temp;
mv input.temp $filename.temp
done
I have a problem with the sed command inside the for loop. I looked around and people suggest adding double quotes which I did but to no avail. I think it might be the $p.
I hope this is clear enough. If it's not, I'll try to explain better.
sed -n "/0 1/,$p" "$f" >> input.temp; does not work
sed -n '/0 1/,$p' "$f" >> input.temp; does not work
sed -n "/0 1/,\$p" "$f" >> input.temp; does not work
FYI I'm not trying to find something else that works. I want to fix this exact input. I sound like an asshole I'm sure.
Sample input
%sum=8
file=otherpath/filename.txt
some other commands
another description
0 1
0.36920852 -0.56246512
0.77541848 0.05756533
2.05409026 0.62333039
2.92655258 0.56906375
2.52034254 -0.05096652
1.24167014 -0.61673008
-0.60708600 -0.99443872
0.10927459 0.09899803
3.90284624 1.00103940
3.18648588 -0.09239788
0.93151968 -1.09013674
2.50047427 1.30468389
2.19361322 2.54108378
3.18742399 0.34152442
3.38679424 1.11276220
1.56936488 3.27250306
1.81754180 4.19564055
1 2 1.5 6
2 3 1.5
3 4
4 5 1.5
5 6 1.5
6 11 1.0
7
8
9
10
11
12
13 16
14
15
16 17
17
My desired output is basically this file from "0 1" till the end preceded by the stuff I put inside the printf.
UPDATE: If you're interested, the two scripts tripleee and Ed Morton provided work perfectly well. The problem in my script was me leaving out the -i option from the sed line (for inplace).
sed -n "/0 1/,$p" "$f" >> input.temp
should be replaced by
sed -ni '/0 1/,$p' "$f"
I see you updated your question and provided some additional information in your comments so try this, uses GNU awk 4.* for -i inplace:
awk -i inplace -v directory="$(pwd)" '
FNR==1 {
print "%sum=4 "
print "file=" directory "/" FILENAME
print "some commands "
print ""
print "description "
found = 0
}
/0 1/ { found = 1 }
found
' *test.txt
If you don't have GNU awk then the technically correct way to do it is using xargs but it's simpler using a shell loop for the file manipulation (moving) part:
for file in *test.txt
do
awk -v directory="$(pwd)" '
FNR==1 {
print "%sum=4 "
print "file=" directory "/" FILENAME
print "some commands "
print ""
print "description "
found = 0
}
/0 1/ { found = 1 }
found
' "$file" > tmp && mv tmp "$file"
done
Like others have already commented, you basically just need to use single quotes instead of double, because $p in double quotes gets replaced with the value of the shell variable p by the shell, before sed executes (in practice, probably an empty string).
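The two quoting styles can be compared side by side on a few made-up lines. This is a minimal sketch; the function names are hypothetical, and both read standard input when no file is given:

```shell
# Single quotes: the shell passes $p to sed untouched.
tail_from_marker() {
    sed -n '/0 1/,$p' "$@"
}

# Double quotes: the dollar sign must be escaped, otherwise the shell
# expands $p (usually to nothing) before sed ever runs.
tail_from_marker_dq() {
    sed -n "/0 1/,\$p" "$@"
}

printf '%s\n' 'description' '0 1' '0.36 -0.56' | tail_from_marker
printf '%s\n' 'description' '0 1' '0.36 -0.56' | tail_from_marker_dq
```

Both calls print everything from the first line containing "0 1" to the end of the input. With an unescaped "$p" and p unset, sed receives /0 1/, with nothing after the comma and reports a syntax error instead.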
However, you might also want to investigate doing it all in sed. You might then instead stick with the double quotes (because there are other variables you do want to substitute) and instead escape the dollar sign in $p with a backslash to protect it from the shell.
directory=$(pwd) # just do this once before the loop; the value doesn't change
for f in *test.txt; do
# no braces
filename=$(basename "$f" .txt)
sed -n "1i\\
%sum=4\\
file=$directory/$filename.txt\\
some commands\\
\\
description
/0 1/,\$p" "$f" >inputout.temp2 # no pointless separate temp file
done
In practice, I imagine you would like for the output file to be different in each iteration (maybe "$filename.temp" instead?) but what you do about that is up to you, obviously. As it is now, the file will contain the output from the last iteration.
