why empty double quote is coming in file at last record | shell | - shell

I have 10 files which contain one columnar vertical data that i converted to consolidate one file
with data in horizontal form
file 1 :
A
B
C
B
file 2 :
P
W
R
S
file 3 :
E
U
C
S
similarly like above their will be remaing files
I consolidated all files using below script
cd /path/
#storing all file names to array_list to club data of all into one file
array_list=`( awk -F'/' '{print $2}' )`
for i in {array_list[#]}
do
sed 's/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, >> /path/consolidate.txt
done
Output obtained from above script :
"A","B","C","B"
"P","W","R","S",""
"E","U","C","S"
Why the second line as last entry -> "" -> "P","W","R","S",""
when their are only four values in file 2 , it should be : "P","W","R","S"
Is it happening because of empty line in that file 2 at last ?
Solution will be appreciated

I assume it is indeed from an empty line. You could remove such 'mistakes' by
updating your script to include sed 's/,""$//' like:
sed 's/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, | sed 's/,""$//' >> /path/consolidate.txt
Explanation of the above command, piece by piece
Substitute a double quote for two double quotes (the g option means do this
for every match on each line, rather than just the first match):
sed 's/"/""/g;
We use a semi-colon to tell sed that we will issue another command. The next
substitute command to sed matches the entire line, and replaces it with itself,
but surrounded by double quotes (the & represents the matched pattern):
s/.*/"&"/'
This is an argument to the above sed command, expanding the variable i in the
for loop:
/path/$i
The above commands produce some output ('stdout'), which would by default be
sent to the terminal. Instead of that, we use it as input ('stdin') to a
subsequent command (this is called a 'pipeline'):
|
The next command joins the lines of 'stdin' by replacing the newline characters
with , delimiters (be default the delimiter would be a tab):
paste -s -d,
We pipe the 'stdout' of the last command into another command (continuing the
pipeline):
|
The next command is another sed, this time substituting any occurrences of
,"" that happen at the end of the line (in sed, $ means end of line) with
nothing (in effect deleting the matched patter):
sed 's/,""$//'
The output of the above pipeline is appended to our text file (>> appends,
whilst > overwrites):
>> /path/consolidate.txt

Related

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest write this in bash ?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This would read the file twice, but handling text files in the shell, i.e. storing their content in variables should generally be limited to small files. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you don't need to store the content in shell variables, and read the file twice.
A solution using sed:
foo=$(sed '/^---$/q;p' -n file.txt)
bar=$(sed '1,/^---$/b;p' -n file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line;
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
If version of coreutils >= 8.22, --suppress-matched option can be used and sed processing is not required, like
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}".

Add file content in another file after first match only

Using bash, I have this line of code that adds the content of a temp file into another file, after a specific match:
sed -i "/text_to_match/r ${tmpFile}" ${fileName}
I would like it to add the temp file content only after the FIRST match.
I tried using addresses:
sed -i "0,/text_to_match//text_to_match/r ${tmpFile}" ${fileName}
But it doesn't work, saying that "/" is an unknown command.
I can make addresses work if I use a standard replacement "s/to_replace/with_this/", but I can't make it work with this sed command.
It seems like I can't use addresses if my sed command starts with / instead of a letter.
I'm not stuck with addresses, as long as I can insert the temp file content into another file only once.
You're getting that error because if you have an address range (ADDR1,ADDR2) you can't put another address after it: sed expects a command there and / is not a command.
You'll want to use some braces here:
$ seq 20 > file
$ echo "new content" > tmpFile
$ sed '0,/5/{/5/ r tmpFile
}' file
outputs the new text only after the first line with '5'
1
2
3
4
5
new content
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I found I needed to put a newline after the filename. I was getting this error otherwise
sed: -e expression #1, char 0: unmatched `{'
It appears that sed takes the whole rest of the line as the filename.
Probably more tidy to write
sed '0,/5/ {
/5/ r tmpFile
}' file
Full transparency: I don't use sed except for very simple tasks. In reality I would use awk for this job
awk '
{print}
!seen && $0 ~ patt {
while (getline line < f) print line
close(f)
seen = 1
}
' patt="5" f=tmpFile file
Glenn Jackman provided with an excellent answer to why the OP's attempt did not work.
In continuation to Glenn Jackman's answer, if you want to have the command on a single line, you should use branching so that the r command is at the end.
Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be followed by a <semicolon>, optional <blank> characters, and another editing command. However, when an s editing command is used with the w flag, following it with another command in this manner produces undefined results. [source: POSIX sed Standard]
The r,R,w,W commands parse the filename until end of the line. If whitespace, comments or semicolons are found, they will be included in the filename, leading to unexpected results.[source: GNU sed manual]
which gives:
sed -e '1,/pattern/{/pattern/ba};b;:a;r rfile' file
GNU sed also allows s///e to shell out. So there's this one-liner using Glenn's tmpFile and file.
sed '0,/5/{//{p;s/.*/cat tmpFile/e}}' file
// to repeat the previous pattern match (helps if it's longer than /5/)
p to print the matching line
s/.*/cat tmpFile/e to empty the pattern buffer and stick a the cat tmpFile shell command in there and e execute it and dump the output in the stream
You have 2 forward slashes together, right next to each other in the second sed example.

bash script: write string with double quotes and blanks to file

I try to use sed to read a line from an ASCII file, parse it and write it slightly changed to a defined line number in an output file.
The line format in the input file is as follows:
linenumber:designator,"variable text content"
e.g.
3:string1,"this is text of string 1"
So the outfile should look as follows in line 3:
string1,"this is text of string 1"
The line includes the double quotes and the blanks. All old lines are moved one line down.
The user is responsible to provide a proper input file regarding the order of lines and has to consider that lines in the output file are moved down with each new line in the input file. The script does not know about any order except for the line number given in the input file.
A script shall read all lines and put the content of those lines into an outputfile at the given line numbers
including double quotes and blanks
without the line number part and the colon
The command I use successfully with the shell is e.g.:
sed -i '3istring1,"this is text of string 1"' outfile
No trouble with quotes, double quotes and blanks there.
Using the bash script
while read line
do
linenum=$(echo $line | cut -f1 -d:)
linestr=$(echo $line | cut -f2 -d:)
sedcmd="sed -i '"
sedcmd=${sedcmd}${linenum}
sedcmd=${sedcmd}i
sedcmd=${sedcmd}${linestr}
sedcmd=${sedcmd}"' outfile"
echo "---> $sedcmd"
$sedcmd
done < script/new_records.txt
shows exactly the same sed command with echo but returns with:
sed: -e expression #1, char 1: unknown command: `''
Apparently executing the sed command from within a bash script is different from executing it directly in the bash shell.
I tried a variety of escape sequences "\" before quotes, double quotes and blanks...but rather randomly, and neither of those was successful.
What do I have to do in order to write the string including blanks and double quotes to a specified line in a text file?
# Assuming OutFile exist and have enough line
while read ThisLine
do
LineNum=$(echo "${ThisLine}" | cut -f1 -d ":" )
echo "${ThisLine##*:}" > /tmp/LineContent.txt
sed -i -n "${LineNum} !{p;b;};r /tmp/LineContent.txt" OutFile
done < script/new_records.txt
Not the best thing because you assume lot of issue like enough line in outfile, no problem reading the line (what about escaped char in quoted string, ...) could occur
Okay, I'll give it a shot. If I understand what you're trying to do correctly, and if you're certain the code input file is not malformed, then
sed -i -f <(sed 's/:/i/' insertions.txt) datafile.txt
is the most straightforward way. This works because with an input specification of
number:text
all one has to do to is to replace the : with an i to get a sed command that says: "When handling line number, insert text". The <() bit is bash-style command substitution that expands to the name of a FIFO from which the output of the command can be read.
It might be prudent to guard against mistakes by saying something like
sed -i -f <(sed '/^[0-9]\+:/!d; s/:/i/' insertions.txt) datafile.txt
This removes all lines from insertions.txt that don't begin with a number followed by a colon because those are obviously broken.
Note that this all-in-one-go approach treats line numbers as they were in the input file. That is to say, given an insertions file with content
2:foo,"bar "
4:baz,"qux "
baz,"qux " will appear in line 5 of the output (before line 4 of the input). If this is not desired, sed will have to be called multiple times to handle each insertion individually, as in
while read insertion; do
sed -i "${insertion/:/i}" datafile.txt
done < insertions.txt
${insertion/:/i} is another bashism that replaces the first : in a shell variable with i and expands to the result, i.e., if insertion=1:2:3, then ${insertion/:/i} is 1i2:3.

vim and sed single liner: negate search, multiple search the substituted line, open file and highlight in vim editor

Following is what I am want to do in one line:
1. Search for lines that do not match a pattern in file ak.txt, substitute each line with \<the negate matched line\>\|:
$ sed -n '/.*[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]$/!s:.*:\\\<&\\\>\\\|:p' ak.txt
Output:
\<ee integration_01.32.00\>\|
\<ration_01.2.000\>\|
\<gon_1.21.000\>\|
\<dkei_ting-on_1.2.00\>\|
\<see integration instructions\>\|
Above works, I don't want last two characters on last line though ...
2. Asking 'VIM' to multiple highlight the substituted line searches, throws error, below is command:
$ vim -c "/`sed -n '/.*[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]$/!s:.*:\\\<&\\\>\\\|:p' ak.txt`" -c 'set hls' ak.txt
Error:
bash: .: unrecognized history modifier
I guess we got what I am trying to get, How do we tell BASH through VIM and SED?
If you must do it in one line, then:
sed -n '/.*[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]*$/!{ s:.*:\\\<&\\\>\\\|:; $s/\\|$//; p; }' ak.txt
or:
sed -n '/.*[0-9][0-9]\.[0-9][0-9]\.[0-9]\{3\}*$/!{ s:.*:\\\<&\\\>\\\|:; $s/\\|$//; p; }' ak.txt
This takes your basic operation, but uses the { and } to group 3 actions. The first is your substitution operation, unchanged (I was tempted to change the : into /, but didn't). The second only matches the last line of the file, and removes the trailing \| that seems to be your primary objective. The third operation prints the line.
This would run into problems if the last line actually matched dd.dd.ddd at the end (where d represents a digit). If that's a problem, you'll probably have to use two sed commands (you can write it on a single line if that's what you want, but I'd not write it on a single line):
sed -n '/.*[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]$/!s:.*:\\\<&\\\>\\\|:p' ak.txt |
sed '$s/\\|$//'
The first line is exactly your original command; the second simply deletes the \| at the end of the last line.

'grep +A': print everything after a match [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3 http://www.yahoo.com
But I don't know how to get the file after line number 3. Also, I know there is a flag in grep -A print the lines after your match. However, you need to specify how many lines you want after the match. I am wondering is there something to get around that issue. Like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, if you refer to a variable, that variable will be created and is either zero or empty string, depending on context. Before encounter yahoo, variable y is 0, so the script does not print anything. After encounter yahoo, y is 1, so every line after that will be printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the -e directive means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get index of the "yahoo" word
index=`grep -n "yahoo" filepath | cut -d':' -f1`
# Get the total number of lines in the file
totallines=`wc -l filepath | cut -d' ' -f1`
# Subtract totallines with index
result=`expr $total - $index`
# Gives the desired output
grep -A $result "yahoo" filepath

Resources