How to join all lines with '\n' in bash script [duplicate] - bash

This question already has answers here:
Replace newlines with literal \n
(6 answers)
Closed 3 years ago.
I'm writing a bash script that calls vim to modify another file and then joins all the lines in the file using '\n'.
Code I tried in script:
vi filefff (then I modify the text in filefff)
cat filefff
new=$(cat filefff | sed 'N;N;s/\n/\\n/g')
echo $new
Here is the problem:
For example, suppose the file contains two lines, aa and bb:
aa
bb
then I change the file to:
aa
bb
cc
dd
ee
The result of echo $new is aa"\n"bb cc"\n"dd ee"\n". The command only joined some of the lines.
And then I append some more lines:
aa
bb
cc
dd
ee
ff
gg
hh
The result is aa"\n"bb cc"\n"dd ee"\n"ff; the 'hh' is gone.
So I'd like to know why and how to join all the lines with '\n', no matter how many lines I'm going to append to the file.

As an enhancement to the 'sed' or 'tr' solutions suggested in the comments, which can produce a VERY long line, consider the following options. They produce more human-friendly output by capping the maximum line length (200 in the examples below). Note that they join the lines with spaces rather than with a literal \n.
# Use fold to limit line length
cat filefff | tr '\n' ' ' | fold -w200
# Use fmt to combine lines
cat filefff | fmt -w200
# Use xargs to format
cat filefff | xargs -s200
Note that 'fmt' treats an empty line as a paragraph break, so it keeps the line break there.
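To address the original question directly: N;N appends two more lines to the pattern space on every cycle, so the substitution only ever sees groups of three lines. When N finds no next line, GNU sed prints the pattern space and exits without running the s command, while POSIX lets sed exit without printing at all, which may be why the last line disappeared. A minimal sketch that joins every line regardless of how many there are, assuming the file name filefff from the question (the scratch directory and sample data are only for this demo):

```shell
# Work in a scratch directory (an assumption for this demo).
tmpdir=$(mktemp -d)
printf '%s\n' aa bb cc dd ee > "$tmpdir/filefff"

# Print every line preceded by a literal \n, except the first one.
# Inside awk, the string "\\n" is a backslash followed by the letter n.
new=$(awk '{ printf "%s%s", sep, $0; sep = "\\n" } END { print "" }' "$tmpdir/filefff")

# Quote the expansion so the shell does not word-split it.
printf '%s\n' "$new"
rm -rf "$tmpdir"
```

printf is used instead of echo here because some echo implementations interpret \n themselves.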

Related

Shell script: Insert multiple lines into a file ONLY after a specified pattern appears for the FIRST time. (The pattern appears multiple times)

I want to insert multiple lines into a file using shell script. Let us consider my original file: original.txt:
aaa
bbb
ccc
aaa
bbb
ccc
aaa
bbb
ccc
.
.
.
and my insert file: toinsert.txt
111
222
333
Now I have to insert the three lines from the 'toinsert.txt' file ONLY after the line 'ccc' appears for the FIRST time in the 'original.txt' file. Note: the 'ccc' pattern appears more than one time in my 'original.txt' file. After inserting ONLY after the pattern appears for the FIRST time, my file should change like this:
aaa
bbb
ccc
111
222
333
aaa
bbb
ccc
aaa
bbb
ccc
.
.
.
I should do the above insertion using a shell script. Can someone help me?
Note2: I found a similar case, with a partial solution:
sed -i -e '/ccc/r toinsert.txt' original.txt
which actually does the insertion multiple times (for every time the ccc pattern shows up).
Use ed, not sed, to edit files:
printf "%s\n" "/ccc/r toinsert.txt" w | ed -s original.txt
It inserts the contents of the other file after the first line containing ccc, but unlike your sed version, only after the first.
This might work for you (GNU sed):
sed '0,/ccc/!b;/ccc/r insertFile' file
Use a range:
If the current line falls outside the range 0,/ccc/ (that is, it comes after the first occurrence of ccc), branch to the end of the script and implicitly print as usual.
Otherwise, if the current line contains ccc, insert the lines from insertFile.
N.B. This uses the address 0 which allows the regexp to occur on line 1 and is specific to GNU sed.
or:
sed -e '/ccc/!b;r insertFile' -e ':a;n;ba' file
Use a loop:
If a line does not contain ccc, no further processing and print as usual.
Otherwise, insert lines from insertFile and then using a loop, fetch/print the remaining lines until the end of the file.
N.B. The r command insists on being delimited from other sed commands by a newline. The -e option simulates this effect and thus the sed commands are split across two -e options.
or:
sed 'x;/./{x;b};x;/ccc/!b;h;r insertFile' file
Use a flag:
If the hold space is not empty (the flag has already been set), no further processing and print as usual.
Otherwise, if the line does not contain ccc, no further processing and print as usual.
Otherwise, copy the current line to the hold space (set the flag) and insert lines from insertFile.
N.B. In all cases the r command inserts lines from insertFile after the current line is printed.
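For systems without GNU sed, the same "insert once, after the first match" idea can be sketched in awk. This is an alternative technique, not one of the sed variants above. It assumes the file names original.txt and toinsert.txt from the question; the scratch directory, the sample data, and the variable names buf and done are only for illustration:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' aaa bbb ccc aaa bbb ccc > "$tmpdir/original.txt"
printf '%s\n' 111 222 333 > "$tmpdir/toinsert.txt"

# First file (NR == FNR): buffer the lines to insert.
# Second file: print every line; after the first line matching ccc,
# emit the buffer once and set a flag so later matches are ignored.
result=$(awk '
  NR == FNR      { buf = buf $0 ORS; next }
                 { print }
  !done && /ccc/ { printf "%s", buf; done = 1 }
' "$tmpdir/toinsert.txt" "$tmpdir/original.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Unlike sed -i, this writes to standard output, so the result has to be redirected to a new file and moved into place. The NR == FNR idiom also assumes that toinsert.txt is not empty.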

why empty double quote is coming in file at last record | shell |

I have 10 files, each containing a single column of data, which I consolidated into one file
with the data laid out horizontally.
file 1 :
A
B
C
B
file 2 :
P
W
R
S
file 3 :
E
U
C
S
The remaining files look similar.
I consolidated all the files using the script below:
cd /path/
#storing all file names to array_list to club data of all into one file
array_list=`( awk -F'/' '{print $2}' )`
for i in {array_list[#]}
do
sed 's/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, >> /path/consolidate.txt
done
Output obtained from above script :
"A","B","C","B"
"P","W","R","S",""
"E","U","C","S"
Why does the second line end with an extra "" entry, as in "P","W","R","S",""?
Since there are only four values in file 2, it should be "P","W","R","S".
Is it happening because of an empty line at the end of file 2?
A solution would be appreciated.
I assume it is indeed from an empty line. You could remove such 'mistakes' by
updating your script to include sed 's/,""$//' like:
sed 's/"/""/g; s/.*/"&"/' /path/$i | paste -s -d, | sed 's/,""$//' >> /path/consolidate.txt
Explanation of the above command, piece by piece
Replace each double quote with two double quotes (the g option means do this
for every match on each line, rather than just the first match):
sed 's/"/""/g;
We use a semi-colon to tell sed that we will issue another command. The next
substitute command to sed matches the entire line, and replaces it with itself,
but surrounded by double quotes (the & represents the matched pattern):
s/.*/"&"/'
This is an argument to the above sed command, expanding the variable i in the
for loop:
/path/$i
The above commands produce some output ('stdout'), which would by default be
sent to the terminal. Instead of that, we use it as input ('stdin') to a
subsequent command (this is called a 'pipeline'):
|
The next command joins the lines of 'stdin' by replacing the newline characters
with , delimiters (by default the delimiter would be a tab):
paste -s -d,
We pipe the 'stdout' of the last command into another command (continuing the
pipeline):
|
The next command is another sed, this time substituting any occurrence of
,"" at the end of the line (in sed, $ means end of line) with
nothing (in effect deleting the matched pattern):
sed 's/,""$//'
The output of the above pipeline is appended to our text file (>> appends,
whilst > overwrites):
>> /path/consolidate.txt
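If the stray "" really does come from an empty line, another option is to drop empty lines before quoting instead of trimming the result afterwards. A minimal sketch under that assumption; the scratch directory, the sample file2, and the variable name row are hypothetical:

```shell
tmpdir=$(mktemp -d)
# A sample "file 2" with a trailing empty line, as suspected in the question.
printf '%s\n' P W R S '' > "$tmpdir/file2"

# /^$/d deletes empty lines before the quoting substitutions run.
row=$(sed '/^$/d; s/"/""/g; s/.*/"&"/' "$tmpdir/file2" | paste -s -d, -)

printf '%s\n' "$row"
rm -rf "$tmpdir"
```

This removes empty lines anywhere in the file, not only at the end, so it is not suitable if empty fields in the middle are meaningful. Lines that contain only spaces or tabs would need /^[[:space:]]*$/d instead.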

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest writing this in bash?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This would read the file twice, but handling text files in the shell, i.e. storing their content in variables should generally be limited to small files. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you don't need to store the content in shell variables or read the file twice.
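A multi-character RS such as '---\n' works in gawk and mawk, but POSIX leaves its behaviour unspecified. A sketch of a variant that avoids RS altogether, assuming the delimiter is a line containing exactly --- and using the variable names part1 and part2 from the question (the scratch directory and sample file are only for this demo):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' aa bbb --- cccc dd > "$tmpdir/file.txt"

# Part 1: print lines until the delimiter line, then stop reading.
part1=$(awk '/^---$/ { exit } { print }' "$tmpdir/file.txt")

# Part 2: print only the lines seen after the first delimiter line.
part2=$(awk 'found { print } /^---$/ { found = 1 }' "$tmpdir/file.txt")

printf 'part1:\n%s\npart2:\n%s\n' "$part1" "$part2"
rm -rf "$tmpdir"
```

As with the version above, command substitution strips trailing newlines, so each variable ends without a final newline.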
A solution using sed:
foo=$(sed '/^---$/q;p' -n file.txt)
bar=$(sed '1,/^---$/b;p' -n file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line.
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
With coreutils 8.22 or later, the --suppress-matched option makes the sed post-processing unnecessary:
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}"

Add file content in another file after first match only

Using bash, I have this line of code that adds the content of a temp file into another file, after a specific match:
sed -i "/text_to_match/r ${tmpFile}" ${fileName}
I would like it to add the temp file content only after the FIRST match.
I tried using addresses:
sed -i "0,/text_to_match//text_to_match/r ${tmpFile}" ${fileName}
But it doesn't work, saying that "/" is an unknown command.
I can make addresses work if I use a standard replacement "s/to_replace/with_this/", but I can't make it work with this sed command.
It seems like I can't use addresses if my sed command starts with / instead of a letter.
I'm not stuck with addresses, as long as I can insert the temp file content into another file only once.
You're getting that error because if you have an address range (ADDR1,ADDR2) you can't put another address after it: sed expects a command there and / is not a command.
You'll want to use some braces here:
$ seq 20 > file
$ echo "new content" > tmpFile
$ sed '0,/5/{/5/ r tmpFile
}' file
This outputs the new text only after the first line containing '5':
1
2
3
4
5
new content
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I found I needed to put a newline after the filename. I was getting this error otherwise
sed: -e expression #1, char 0: unmatched `{'
It appears that sed takes the whole rest of the line as the filename.
Probably more tidy to write
sed '0,/5/ {
/5/ r tmpFile
}' file
Full transparency: I don't use sed except for very simple tasks. In reality I would use awk for this job
awk '
{print}
!seen && $0 ~ patt {
while ((getline line < f) > 0) print line  # > 0 stops the loop on EOF and on read errors
close(f)
seen = 1
}
' patt="5" f=tmpFile file
Glenn Jackman provided an excellent answer explaining why the OP's attempt did not work.
In continuation to Glenn Jackman's answer, if you want to have the command on a single line, you should use branching so that the r command is at the end.
Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be followed by a <semicolon>, optional <blank> characters, and another editing command. However, when an s editing command is used with the w flag, following it with another command in this manner produces undefined results. [source: POSIX sed Standard]
The r,R,w,W commands parse the filename until end of the line. If whitespace, comments or semicolons are found, they will be included in the filename, leading to unexpected results.[source: GNU sed manual]
which gives:
sed -e '1,/pattern/{/pattern/ba};b;:a;r rfile' file
GNU sed also allows s///e to shell out. So there's this one-liner using Glenn's tmpFile and file.
sed '0,/5/{//{p;s/.*/cat tmpFile/e}}' file
// to repeat the previous pattern match (helps if it's longer than /5/)
p to print the matching line
s/.*/cat tmpFile/e to replace the contents of the pattern space with the shell command cat tmpFile, and the e flag to execute it and place its output in the stream
You have 2 forward slashes together, right next to each other in the second sed example.

In shell, how to process this line, in order to extract the field that I want

I have some lines in a flat file. Take two lines, for instance:
1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world
j sd az 20140505 sd G 14-MAY-2014 hello world haha
As you may have noticed, I can rely on neither character counts nor the number of spaces, because the lines are not aligned and the fourth field sometimes looks like 20140505 and sometimes like 05 may 2014. What I want is to match the G or the 14-MAY-2014, so that I can easily get the fields that follow: hello world or hello world haha. Can anyone help me? Thank you!
Assuming your lines are in a file called test.txt:
cat test.txt | sed -r 's/^.*-[0-9]{4}\s//'
This uses GNU sed on a Linux system. There are many other ways. Here I simply remove everything up to and including the date from the beginning of the line.
sed -r 's/^.*-[0-9]{4}\s//'
-r = extended regular expressions, which make things like the quantifier {4} possible
's/ ... //' = s is for substitute;
it matches the first part and replaces it with the second.
Since the second part is empty, the match is removed.
^ = start of line
.* = any character, any number of times
-[0-9]{4} = a dash, followed by four digits ([0-9]), the year part of the date
\s = any white space
You can make use of lookbehind regex of perl:
perl -lne '/(?<=14-MAY-2014)(.*)/ && print $1' file
It will print everything after 14-MAY-2014, including the leading space.
You can also use grep if it supports -P:
grep -Po '(?<=14-MAY-2014)(.*)' file
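The approaches above either depend on GNU extensions (\s, grep -P) or hard-code one date. A sketch that matches any DD-MON-YYYY date using only POSIX character classes, assuming the file name test.txt from the answer above and a sed that accepts -E (the scratch directory and the variable name tail_fields are only for this demo):

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.txt" <<'EOF'
1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world
j sd az 20140505 sd G 14-MAY-2014 hello world haha
EOF

# Remove everything up to and including a DD-MON-YYYY date and the
# whitespace that follows it; [[:space:]] is the POSIX spelling of \s.
tail_fields=$(sed -E 's/^.*[0-9]{2}-[A-Z]{3}-[0-9]{4}[[:space:]]+//' "$tmpdir/test.txt")

printf '%s\n' "$tail_fields"
rm -rf "$tmpdir"
```

Because .* is greedy, the substitution cuts after the last date of that shape on each line, which is the desired behaviour for the two sample lines.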
