bash script: write string with double quotes and blanks to file

I am trying to use sed to read a line from an ASCII file, parse it, and write it, slightly changed, to a defined line number in an output file.
The line format in the input file is as follows:
linenumber:designator,"variable text content"
e.g.
3:string1,"this is text of string 1"
So the outfile should look as follows in line 3:
string1,"this is text of string 1"
The line includes the double quotes and the blanks. All old lines are moved one line down.
The user is responsible for providing a proper input file with respect to the order of lines, and has to take into account that lines in the output file are moved down with each new line from the input file. The script does not know about any ordering except for the line number given in the input file.
A script shall read all lines and put the content of those lines into an output file at the given line numbers,
including double quotes and blanks
without the line number part and the colon
The command I use successfully with the shell is e.g.:
sed -i '3istring1,"this is text of string 1"' outfile
No trouble with quotes, double quotes and blanks there.
Using the bash script
while read line
do
linenum=$(echo $line | cut -f1 -d:)
linestr=$(echo $line | cut -f2 -d:)
sedcmd="sed -i '"
sedcmd=${sedcmd}${linenum}
sedcmd=${sedcmd}i
sedcmd=${sedcmd}${linestr}
sedcmd=${sedcmd}"' outfile"
echo "---> $sedcmd"
$sedcmd
done < script/new_records.txt
prints exactly the same sed command via echo, but then returns with:
sed: -e expression #1, char 1: unknown command: `''
Apparently, executing the sed command from within a bash script is different from executing it directly in the bash shell.
I tried a variety of escape sequences ("\" before quotes, double quotes, and blanks), but rather randomly, and none of them was successful.
What do I have to do in order to write the string including blanks and double quotes to a specified line in a text file?

# Assuming OutFile exists and has enough lines
while read ThisLine
do
LineNum=$(echo "${ThisLine}" | cut -f1 -d ":" )
echo "${ThisLine##*:}" > /tmp/LineContent.txt
sed -i -n "${LineNum} !{p;b;};r /tmp/LineContent.txt" OutFile
done < script/new_records.txt
This is not the best approach, because it relies on a lot of assumptions: the output file must already have enough lines, every input line must parse cleanly (what about escaped characters inside the quoted string?), and so on.
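For comparison, here is a minimal sketch of the loop from the question without building the command in a string: quotes stored inside a variable are never re-parsed by the shell, so expanding the text directly inside a double-quoted sed argument avoids the problem entirely (this assumes GNU sed, as in the question):
while IFS= read -r line
do
linenum=${line%%:*}            # everything before the first colon
linestr=${line#*:}             # everything after it, quotes and blanks intact
sed -i "${linenum}i${linestr}" outfile
done < script/new_records.txt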

Okay, I'll give it a shot. If I understand correctly what you're trying to do, and if you're certain the input file is not malformed, then
sed -i -f <(sed 's/:/i/' insertions.txt) datafile.txt
is the most straightforward way. This works because with an input specification of
number:text
all one has to do is replace the : with an i to get a sed command that says: "When handling line number, insert text". The <() bit is bash-style process substitution; it expands to the name of a FIFO from which the output of the inner command can be read.
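For example, with the line format from the question, the inner sed turns each insertion spec directly into the command that worked interactively (a sketch, assuming insertions.txt holds the question's input):
$ cat insertions.txt
3:string1,"this is text of string 1"
$ sed 's/:/i/' insertions.txt
3istring1,"this is text of string 1"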
It might be prudent to guard against mistakes by saying something like
sed -i -f <(sed '/^[0-9]\+:/!d; s/:/i/' insertions.txt) datafile.txt
This removes all lines from insertions.txt that don't begin with a number followed by a colon because those are obviously broken.
Note that this all-in-one-go approach treats line numbers as positions in the original input file. That is to say, given an insertions file with content
2:foo,"bar "
4:baz,"qux "
baz,"qux " will appear in line 5 of the output (before line 4 of the input). If this is not desired, sed will have to be called multiple times to handle each insertion individually, as in
while IFS= read -r insertion; do
sed -i "${insertion/:/i}" datafile.txt
done < insertions.txt
${insertion/:/i} is another bashism that replaces the first : in a shell variable with i and expands to the result, i.e., if insertion=1:2:3, then ${insertion/:/i} is 1i2:3.
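To see the numbering difference concretely, a sketch with made-up data:
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' > datafile.txt
printf '%s\n' '2:foo,"bar "' '4:baz,"qux "' > insertions.txt
while IFS= read -r insertion; do
sed -i "${insertion/:/i}" datafile.txt
done < insertions.txt
# datafile.txt now has foo,"bar " on line 2 and baz,"qux " on line 4;
# the all-in-one call above would have left baz,"qux " on line 5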

Related

Why is an empty double quote appearing in the file at the last record?

I have 10 files, each containing vertical single-column data, that I consolidated into one file
with the data in horizontal form.
file 1 :
A
B
C
B
file 2 :
P
W
R
S
file 3 :
E
U
C
S
Similarly, there will be more files like the ones above.
I consolidated all the files using the script below:
cd /path/
#storing all file names to array_list to club data of all into one file
array_list=`awk -F'/' '{print $2}'`
for i in ${array_list[@]}
do
sed 's/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, >> /path/consolidate.txt
done
Output obtained from above script :
"A","B","C","B"
"P","W","R","S",""
"E","U","C","S"
Why does the second line have "" as the last entry -> "P","W","R","S",""
when there are only four values in file 2? It should be: "P","W","R","S"
Is it happening because of the empty line at the end of file 2?
A solution would be appreciated.
I assume it is indeed from an empty line. You could remove such 'mistakes' by
updating your script to include sed 's/,""$//' like:
sed 's/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, | sed 's/,""$//' >> /path/consolidate.txt
Explanation of the above command, piece by piece
Substitute a double quote for two double quotes (the g option means do this
for every match on each line, rather than just the first match):
sed 's/"/""/g;
We use a semi-colon to tell sed that we will issue another command. The next
substitute command to sed matches the entire line, and replaces it with itself,
but surrounded by double quotes (the & represents the matched pattern):
s/.*/"&"/'
This is an argument to the above sed command, expanding the variable i in the
for loop:
/path/$i
The above commands produce some output ('stdout'), which would by default be
sent to the terminal. Instead of that, we use it as input ('stdin') to a
subsequent command (this is called a 'pipeline'):
|
The next command joins the lines of 'stdin' by replacing the newline characters
with , delimiters (by default the delimiter would be a tab):
paste -s -d,
We pipe the 'stdout' of the last command into another command (continuing the
pipeline):
|
The next command is another sed, this time substituting any occurrences of
,"" that happen at the end of the line (in sed, $ means end of line) with
nothing (in effect deleting the matched pattern):
sed 's/,""$//'
The output of the above pipeline is appended to our text file (>> appends,
whilst > overwrites):
>> /path/consolidate.txt
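Alternatively, assuming the stray "" really does come from a trailing empty line, you could drop empty lines before quoting, so no trailing cleanup is needed (a sketch):
sed '/^$/d; s/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, >> /path/consolidate.txt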

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by a delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest writing this in bash?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This would read the file twice, but handling text files in the shell, i.e. storing their content in variables, should generally be limited to small files anyway. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you don't need to store the content in shell variables or read the file twice.
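For instance, a minimal one-pass sketch (assuming GNU awk for the multi-character RS, and that printing the two parts is all that's needed):
awk 'NR==1 { part1=$0 } NR==2 { part2=$0 }
END { printf "part1:\n%spart2:\n%s", part1, part2 }' RS='---\n' file.txt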
A solution using sed:
foo=$(sed -n '/^---$/q;p' file.txt)
bar=$(sed -n '1,/^---$/b;p' file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line;
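With the sample file from the question, the two commands behave like this (a sketch of a shell session):
$ sed -n '/^---$/q;p' file.txt
aa
bbb
$ sed -n '1,/^---$/b;p' file.txt
cccc
dd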
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
If the coreutils version is >= 8.22, the --suppress-matched option can be used and the sed post-processing is not required:
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}"
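If staying in the shell is preferable, a pure-bash sketch (assuming exactly one --- separator line):
# accumulate lines before the separator into part1, after it into part2
part1='' part2='' seen_separator=0
while IFS= read -r line; do
  if [[ $line == '---' ]]; then
    seen_separator=1
  elif (( seen_separator )); then
    part2+=${part2:+$'\n'}$line
  else
    part1+=${part1:+$'\n'}$line
  fi
done < file.txt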

Looping and grep writes output for the last line only

I am looping through the lines in a text file and running grep with each line across directories, like below:
while IFS="" read -r p || [ -n "$p" ]
do
echo "This is the field: $p"
grep -ilr "$p" * >> Result.txt
done < fields.txt
But the above writes the results for the last line of the file only, not for the other lines.
If I manually execute the command with any of the other lines, it works (which means matches were found). Is there anything I am missing here? Thanks.
The fields.txt looks like this
annual_of_measure__c
attached_lobs__c
apple
When the file fields.txt
has the DOS/Windows line-ending convention, consisting of two characters (carriage return AND line feed), and
that file is processed by Unix tools expecting Unix line endings, consisting of only one character (line feed),
then the line read by the read command and stored in the variable $p is, for the first line, annual_of_measure__c\r (note the additional \r for the carriage return). grep will then not find a match.
From your description in the question and the confirmation in the comments, it seems that the last line in fields.txt has no line ending at all, so the variable $p is the ordinary string apple and grep can find a match on the last line of the file.
There are tools for converting line endings, e.g. dos2unix, or tr -d '\r'.
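A minimal sketch of working around it inside the loop itself (assuming the only problem is a trailing carriage return):
while IFS= read -r p || [ -n "$p" ]
do
p=${p%$'\r'}    # strip a trailing carriage return, if present
echo "This is the field: $p"
grep -ilr "$p" * >> Result.txt
done < fields.txt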

How to append to specific lines in a flat file using shell script

I have a flat file that contains something like this:
11|30646|654387|020751520
11|23861|876521|018277154
11|30645|765418|016658304
Using shell script, I would like to append a string to certain lines in this file, if those lines contain a specific string.
For example, in the above file, for lines containing 23861, I would like to append a string "Processed" at the end, so that the file becomes:
11|30646|654387|020751520
11|23861|876521|018277154|Processed
11|30645|765418|016658304
I could use sed to append the string to all lines in the file, but how do I do it for specific lines?
I'd do it this way
sed '/|23861|/{s/$/|Something/;}' file
This is similar to Marcelo's answer but doesn't require extended expressions and is, I think, a little cleaner.
First, match lines having 23861 between pipes (note that in a basic regular expression | is an ordinary character; escaping it as \| would turn it into GNU sed's alternation operator)
/|23861|/
Then, on those lines, replace the end-of-line with the string |Something
{s/$/|Something/;}
If you want to do more than one of these you could simply list them
sed '/|23861|/{s/$/|Something/;};/|30645|/{s/$/|SomethingElse/;}' file
Use the following awk-script:
$ awk '/23861/ { $0=$0 "|Processed" } {print}' input
11|30646|654387|020751520
11|23861|876521|018277154|Processed
11|30645|765418|016658304
or, using sed:
$ sed 's/\(.*23861.*$\)/\1|Processed/' input
11|30646|654387|020751520
11|23861|876521|018277154|Processed
11|30645|765418|016658304
Use the substitution command:
sed -i~ -E 's/(\|23861\|.*)/\1|Processed/' flat.file
(Note: the -i~ performs the substitution in-place. Just leave it out if you don't want to modify the original file.)
You can use the shell
while read -r line
do
case "$line" in
*23861*) line="$line|Processed";;
esac
echo "$line"
done < file > tempo && mv tempo file
sed is just a stream version of ed, which has a similar command set but was designed to edit files in place (interactively, in principle, but you wouldn't want to use it that way unless all you had was a teletype). Something like
field_2_value=23861
appended_text='|processed'
line_match_regex="^[^|]*|$field_2_value|"
ed "$file" <<EOF
g/$line_match_regex/s/$/$appended_text/
wq
EOF
should get you there.
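With the values above, the here-document that ed receives expands to (a sketch):
g/^[^|]*|23861|/s/$/|processed/
wq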
Note that the $ in .../s/$/... is not expanded by the shell, unlike $line_match_regex and $appended_text, because there's no such thing as $/; instead it's passed through as-is to ed, which interprets it as the end-of-line anchor ($ being regex-speak for "end of line").
The syntax to do the same job in sed, should you ever want to do this to a stream rather than a file in place, is very similar except that you don't need the leading g before the regex address:
sed -e "/$line_match_regex/s/$/$appended_text/" "$input_file" >"$output_file"
You need to be sure that the values you put in field_2_value and appended_text never contain slashes, because ed's g and s commands use those for delimiters.
If they might do, and you're using bash or some other shell that allows ${name//search/replace} parameter expansion syntax, you could fix them up on the fly by substituting \/ for every / during expansion of those variables. Because bash also uses / as a substitution delimiter and also uses \ as a character escape, this ends up looking horrible:
appended_text='|n/a'
ed "$file" <<EOF
g/${line_match_regex//\//\\/}/s/$/${appended_text//\//\\/}/
wq
EOF
but it does work. Note that both ed and sed require a trailing / after the replacement text in s/search/replace/, while bash's ${name//search/replace} syntax doesn't.

bash: cat multiple files' content into a single string without newlines

I have some files whose names start with eg_, and each contains only a single line.
eg_01.txt:
#china:129.00
eg_02.txt
#uk:219.98
eg_03.txt
#USA:341.90
......
I am expecting to cat them into a single line to send in a URL, like:
#china:129.00#uk:219.98#USA:341.90
I use
echo "`cat eg_*`"
It gives me output that looks like a string, but it actually contains newlines:
"#china:129.00
#uk:219.98
#USA:341.90"
Is there any other way I can construct the expected string, getting rid of the newlines and even the spaces? Is cat alone enough to do this?
Thanks in advance.
You could always pipe it to tr
tr "\n" " "
That removes all newlines on stdin and replaces them with spaces
EDIT: as suggested by Bart Sas, you could also remove newlines with tr -d
tr -d "\n"
(note: just specifying an empty string to tr for the second argument won't do)
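Putting it together with the files from the question (a sketch; the eg_ file names are as described above):
url=$(cat eg_* | tr -d '\n')
echo "$url"    # -> #china:129.00#uk:219.98#USA:341.90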
Using only one command
url=$(awk '{printf "%s",$0}' eg*)
In Perl, you'd do it like this:
perl -pe'chomp' eg*.txt
The -p says "loop through the input files and run whatever code is specified by the -e switch". The chomp in Perl removes any trailing newline.
