How do I merge different text files? - bash

I have 3 txt files:
file1.txt:
11
file2.txt:
22
file3.txt:
33
I want to combine the 3 text files into a single file and put a comma between them.
endfile.txt should be as follows:
11,22,33
I tried:
cat file1.txt; cat file2.txt; cat file3.txt > endfile.txt
That writes the contents line by line, but I want them side by side with a comma between them.
Could you help?

cat file1.txt | cat - file2.txt | cat - file3.txt | tr '\n' ',' | head --bytes -1
Here tr replaces every newline with a comma, and head --bytes -1 (GNU head) trims the trailing comma that tr leaves behind.

A very easy approach simply uses printf:
(printf "%s" $(cat file1.txt); printf ",%s" $(cat file2.txt file3.txt)) > endfile.txt
This results in 11,22,33 in endfile.txt. The two printf groupings prevent a comma from being written before 11, and the whole line is executed in a subshell so the output of all the commands is redirected to endfile.txt. You may also want to write a final '\n' after file3.txt so that endfile.txt ends with a POSIX line ending.
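A self-contained run of this approach, with the sample files recreated inline and the trailing newline added:

```shell
# Recreate the question's sample files for a self-contained demo.
printf '11\n' > file1.txt
printf '22\n' > file2.txt
printf '33\n' > file3.txt

# The first printf writes the bare first value; the second prepends a comma
# to each remaining value (printf reuses its format for every argument);
# the final printf adds a POSIX line ending.
(printf "%s" "$(cat file1.txt)"; printf ",%s" $(cat file2.txt file3.txt); printf "\n") > endfile.txt

cat endfile.txt   # 11,22,33
```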

My answer is the following:
$ cat *.txt | sed -z 's/\n\(.\)/,\1/g'
If you need the files in an explicit order:
$ cat file{1,2,3}.txt | sed -z 's/\n\(.\)/,\1/g'
CAUTION
My sed is version 4.8.
$ sed --version | head -n 1
sed (GNU sed) 4.8

Use paste:
cat file1.txt file2.txt file3.txt | paste -sd, > endfile.txt
Note that with multiple file operands, paste -s joins each file's lines separately (one output line per file), so the files are fed to it as a single stream.
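A quick self-contained check (files created inline); since -s concatenates the lines of each input file separately, the three files are combined into one stream first:

```shell
printf '11\n' > file1.txt
printf '22\n' > file2.txt
printf '33\n' > file3.txt

# -s joins lines serially; -d, sets the delimiter to a comma.
cat file1.txt file2.txt file3.txt | paste -sd, > endfile.txt
cat endfile.txt   # 11,22,33
```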

Related

How to merge multiple files in order and append filename at the end in bash

I have multiple files like this:
BOB_1.brother_bob12.txt
BOB_2.brother_bob12.txt
..
BOB_35.brother_bob12.txt
How to join these files in order from {1..36} and append filename at the end of each row? I have tried:
for i in *.txt; do sed 's/$/ '"$i"'/' $i; done > outfile #joins but not in order
cat $(for((i=1;i<38;i++)); do echo -n "BOB_${i}.brother_bob12.txt "; done) # joins in order but no filename at the end
file sample:
1 345 378 1 3 4 5 C T
1 456 789 -1 2 3 4 A T
Do not do cat $(....). You may just:
for ((i=1;i<38;i++)); do
f="BOB_${i}.brother_bob12.txt"
sed "s/$/ $f/" "$f"
done
You may also do:
printf "%s\n" bob.txt BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -i sed 's/$/ {}/' '{}'
You may use:
for i in {1..36}; do
fn="BOB_${i}.brother_bob12.txt"
[[ -f $fn ]] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
Note that it will append FILENAME as the last field of every record. If this is not what you want, then show your expected output in the question.
This might work for you (GNU sed):
sed -n 'p;F' BOB_{1..36}.brother_bob12.txt | sed 'N;s/\n/ /' >newFile
This uses two invocations of sed: the first appends the file name after each line of each file, and the second replaces the newline between each pair of lines with a space.

How to apply 'awk' for all files in folder?

I am new to awk, so please pardon my ignorance. I am using awk to extract tag values from a file. The following code works for a single file:
awk -F"<NAME>|</NAME>" '{print $2; exit;}' file.txt
but I am not sure how I can run it for all files in a folder.
A sample file looks as follows:
<HEADER><H1></H1></HEADER><BODY><NAME>XYZ</NAME><DATE>2015-12-11</DATE></BODY>
#!/bin/bash
STRING=ABC
DATE=$(date +%Y/%m/%d | tr '/' '-')
changedate(){
for a in $(ls /root/Working/awk/*)
do
for b in $(awk -F"<NAME>|</NAME>" '{print $2;}' "$a")
do
if [ "$b" == "$STRING" ]; then
for c in $(awk -F"<DATE>|</DATE>" '{print $2;}' "$a")
do
sed "s/$c/$DATE/g" "$a";
done
else
echo "Strings are not a match";
fi
done
done
}
changedate
When you run it:
root@revolt:~# cat /root/Working/awk/*
<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>DEF</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>GHI</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>JKL</NAME><DATE>2015-12-11</DATE></BODY>
String in code is set to ABC
root@revolt:~# ./ANSWER
<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-07-24</DATE></BODY>
Strings are not a match
Strings are not a match
Strings are not a match
String in code is set to DEF
root@revolt:~# ./ANSWER
Strings are not a match
<HEADER><H1></H1></HEADER><BODY><NAME>DEF</NAME><DATE>2015-07-24</DATE></BODY>
Strings are not a match
Strings are not a match
Alright. In this script you set STRING=ABC, or whatever your desired string is. You could also extend it to check a list of strings.
The DATE variable produces today's date in Y/m/d form; the tr command then replaces the forward slashes with hyphens to match the format in your files.
First we create a function called "changedate". Within this function we nest a few for loops. The first loop iterates over every entry in /root/Working/awk/ and assigns each one to the variable a in turn.
The next loop grabs, for each file, the text between the NAME tags and prints it. Notice we still use "$a" as the file, because that is the path of the current file. Then an if statement checks for your string: if it matches, another for loop substitutes the date in file "$a"; if it doesn't, we echo "Strings are not a match".
Lastly, we call the "changedate" function, which runs the entire looping sequence above.
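The same idea can be sketched without parsing ls output, which is safer for odd file names. The sample data below is invented to keep the sketch self-contained, and a temp directory stands in for /root/Working/awk:

```shell
#!/bin/bash
# Invented demo data standing in for /root/Working/awk/*.
dir=$(mktemp -d)
printf '<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-12-11</DATE></BODY>\n' > "$dir/a.txt"
printf '<HEADER><H1></H1></HEADER><BODY><NAME>DEF</NAME><DATE>2015-12-11</DATE></BODY>\n' > "$dir/b.txt"

STRING=ABC
DATE=$(date +%Y-%m-%d)

out=$(
  for f in "$dir"/*; do                                  # glob instead of $(ls ...)
    name=$(awk -F"<NAME>|</NAME>" '{print $2}' "$f")     # value between the NAME tags
    if [ "$name" = "$STRING" ]; then
      old=$(awk -F"<DATE>|</DATE>" '{print $2}' "$f")    # value between the DATE tags
      sed "s/$old/$DATE/g" "$f"                          # print the line with today's date
    else
      echo "Strings are not a match"
    fi
  done
)
printf '%s\n' "$out"
```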
To answer your question about running awk on multiple files somewhat generically, imagine we have these files:
$ cat file1.txt
<HEADER><H1></H1></HEADER><BODY><NAME>XYZ</NAME><DATE>2015-12-11</DATE></BODY>
$ cat file2.txt
<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-12-11</DATE></BODY>
$ cat file3.txt
<HEADER><H1></H1></HEADER><BODY><NAME>123</NAME><DATE>2015-12-11</DATE></BODY>
One thing you can do is simply supply awk with multiple files as with almost any command (like ls *.txt):
$ awk -F"<NAME>|</NAME>" '{print $2}' *.txt
XYZ
ABC
123
Awk just reads lines from each file in turn. As mentioned in the comments, be careful with exit because it stops processing altogether after the first match:
$ awk -F"<NAME>|</NAME>" '{print $2; exit}' *.txt
XYZ
However, if for efficiency or some other reason you want to stop processing the current file and move on immediately to the next one, you can use the gawk-only nextfile:
$ # GAWK ONLY!
$ gawk -F"<NAME>|</NAME>" '{print $2; nextfile}' *.txt
XYZ
ABC
123
Sometimes the results on multiple files are not useful without knowing
which lines came from which file. For that you can use the built in FILENAME
variable:
$ awk -F"<NAME>|</NAME>" '{print FILENAME, $2}' *.txt
file1.txt XYZ
file2.txt ABC
file3.txt 123
Things get trickier when you want to modify the files you are working
on. Imagine you want to convert the name to lower case:
$ awk -F"<NAME>|</NAME>" '{print tolower($2)}' *.txt
xyz
abc
123
With traditional awk, the usual pattern is to save to a temp file and copy the temp file back over the original (obviously you want to be careful with this; keep copies of the originals!):
$ cat file1.txt
<HEADER><H1></H1></HEADER><BODY><NAME>XYZ</NAME><DATE>2015-12-11</DATE></BODY>
$ awk -F"<NAME>|</NAME>" '{ sub($2,tolower($2)); print }' file1.txt > tmp && mv tmp file1.txt
$ cat file1.txt
<HEADER><H1></H1></HEADER><BODY><NAME>xyz</NAME><DATE>2015-12-11</DATE></BODY>
To use this style on multiple files, it's probably easier to drop back to
the shell and run awk in a loop on single files:
$ cat file1.txt file2.txt file3.txt
<HEADER><H1></H1></HEADER><BODY><NAME>XYZ</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>123</NAME><DATE>2015-12-11</DATE></BODY>
$ for f in file*.txt; do
> awk -F"<NAME>|</NAME>" '{ sub($2,tolower($2)); print }' $f > tmp && mv tmp $f
> done
$ cat file1.txt file2.txt file3.txt
<HEADER><H1></H1></HEADER><BODY><NAME>xyz</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>abc</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>123</NAME><DATE>2015-12-11</DATE></BODY>
Finally, with gawk you have the option of in-place editing (much like sed -i):
$ cat file1.txt file2.txt file3.txt
<HEADER><H1></H1></HEADER><BODY><NAME>XYZ</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>123</NAME><DATE>2015-12-11</DATE></BODY>
$ # GAWK ONLY!
$ gawk -v INPLACE_SUFFIX=.sav -i inplace -F"<NAME>|</NAME>" '{ sub($2,tolower($2)); print }' *.txt
$ cat file1.txt file2.txt file3.txt
<HEADER><H1></H1></HEADER><BODY><NAME>xyz</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>abc</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>123</NAME><DATE>2015-12-11</DATE></BODY>
The recommended INPLACE_SUFFIX variable tells gawk to make backups of
each file with that extension:
$ cat file1.txt.sav file2.txt.sav file3.txt.sav
<HEADER><H1></H1></HEADER><BODY><NAME>XYZ</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>ABC</NAME><DATE>2015-12-11</DATE></BODY>
<HEADER><H1></H1></HEADER><BODY><NAME>123</NAME><DATE>2015-12-11</DATE></BODY>

Using SED -n with variables in a script

I am trying to use sed in a script but it keeps failing. The commands on their own seem to work so I am not sure what I am doing wrong:
This is in Terminal in OS X (bash)
NOTE: file1.txt contains multiple lines of text
N=1
sed -n $Np < file1.txt > file2.txt
CONTENT=$(cat file2.txt)
echo $CONTENT
If I change $N to 1, it works perfectly
sed -n 1p <file1.txt >file2.txt
CONTENT=$(cat file2.txt)
echo $CONTENT
gives me
content of file2.txt
So basically, I am trying to copy the text from line 1 of a file to the start of line 2 of a file ... IF line 2 does not already start with the content of line 1.
The shell doesn't know that you want $N and not $Np.
Do this: ${N}p
Change:
sed -n $Np < file1.txt > file2.txt
to
sed -n ${N}p < file1.txt > file2.txt
Your code has no clue what the variable Np is...
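A self-contained illustration with an invented three-line file:

```shell
printf 'alpha\nbeta\ngamma\n' > file1.txt
N=2
# The braces mark where the variable name ends, so sed sees "2p".
sed -n "${N}p" < file1.txt > file2.txt
cat file2.txt   # beta
```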
Since N was an integer, I ended up using the following.
sed -n `expr $N`p < file1 > file2
This also allows me to get the next line in a file using
sed -n `expr $N+1`p < file1 > file2
Thanks for your help!!!
You should use >> for append redirection. > overwrites the original file.
How to append the output to a file?

extract multiple lines of a file unix

I have a file A with 400,000 lines. I have another file B that has a bunch of line numbers.
File B:
-------
98
101
25012
10098
23489
I have to extract those line numbers specified in file B from file A. That is I want to extract lines 98,101,25012,10098,23489 from file A. How to extract these lines in the following cases.
File B is an explicit file.
File B arrives from a pipe; e.g., grep -n pattern somefile.txt produces file B.
I wanted to use sed -n 'x'p fileA. However, I don't know how to supply the 'x' from a file, and I don't know how to pipe the value of 'x' from a command.
sed can print the line numbers you want:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p'
bar
If you want multiple lines:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p;3p'
bar
baz
To transform a set of lines to a sed command like this, use sed for beautiful sedception:
$ printf $'98\n101' | sed -e 's/$/;/'
98;
101;
Putting it all together:
sed -ne "$(sed -e 's/$/p;/' B)" A
Testing:
$ cat A
1
22
333
4444
$ cat B
1
3
$ sed -ne "$(sed -e 's/$/p;/' B)" A
1
333
QED.
awk fits this task better:
When fileA is a file:
awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB fileA
When fileA's content comes from a pipe:
cat fileA | awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB -
If you want fileB as a file or from a pipe, the same awk command works:
awk '...' fileB fileA
and
cat fileB | awk '...' - fileA
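The NR==FNR idiom can be verified with throwaway files (contents invented for the demo): while awk reads fileB, NR==FNR holds and each line is remembered as a key; for fileA, awk prints only records whose per-file line number FNR was remembered.

```shell
printf '1\n22\n333\n4444\n' > fileA
printf '1\n3\n' > fileB

# Remember fileB's lines as wanted line numbers, then print those lines of fileA.
awk 'NR==FNR{a[$0]=1;next} a[FNR]' fileB fileA   # prints lines 1 and 3 of fileA
```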

bash script: check if all words from one file are contained in another, otherwise issue error

I was wondering if you could help. I am new to bash scripting.
I want to be able to compare two lists. File1.txt will contain a list of a lot of parameters and file2.txt will only contain a section of those parameters.
File1.txt
dbipAddress=192.168.175.130
QAGENT_QCF=AGENT_QCF
QADJUST_INVENTORY_Q=ADJUST_INVENTORY_Q
QCREATE_ORDER_Q=CREATE_ORDER_Q
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
File2.txt
AGENT_QCF
ADJUST_INVENTORY_Q
CREATE_ORDER_Q
I want to check if all the Qs in file1.txt are contained in file2.txt (after the =). If they aren't, then the bash script should stop and echo a message.
So, in the example above the script should stop as File2.txt does not contain the following Q: LOAD_INVENTORY_Q.
The Qs in file1.txt or file2.txt do not follow any particular order.
The following command will print out lines in file1.txt with values (anything appearing after =) that do not appear in file2.txt.
[me@home]$ awk -F= 'FNR==NR{keys[$0];next};!($2 in keys)' file2.txt file1.txt
dbipAddress=192.168.175.130
QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
Breakdown of the command:
awk -F= 'FNR==NR{keys[$0];next};!($2 in keys)' file2.txt file1.txt
- -F= changes the field delimiter to '='.
- FNR==NR{keys[$0];next} runs only while reading the first file (file2.txt); it stores each line as a key in the keys[] array.
- !($2 in keys) then selects lines in file1.txt whose second column (delimited by '=') does not exist in the keys[] array.
To do something more elaborate, say if you wish to run the command as a pre-filter to make sure the file is valid before proceeding with your script, you can use:
awk -F= 'FNR==NR{K[$0];N++;next};!($2 in K) {print "Line "(NR-N)": "$0; E++};END{exit E}' file2.txt file1.txt
ERRS=$?
if [ $ERRS -ne 0 ]; then
# errors found, do something ...
fi
That will print out all lines (with line numbers) in file1.txt that do not fit the bill, and it returns an exit code equal to the number of non-conforming lines. That way your script can detect the errors easily by checking $? and responding accordingly.
Example output:
[me@home]$ awk -F= 'FNR==NR{K[$0];N++;next};!($2 in K) {print "Line "(NR-N)": "$0;E++};END{exit E}' file2.txt file1.txt
Line 1: dbipAddress=192.168.175.130
Line 5: QLOAD_INVENTORY_Q=LOAD_INVENTORY_Q
[me@home]$ echo $?
2
You can use cut to get only the part after =. comm can be used to output the lines contained in the first file but not the second one:
grep ^Q File1.txt | cut -d= -f2- | sort | comm -23 - <(sort File2.txt)
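A self-contained run of this pipeline, with the question's files recreated inline (a temp file replaces the process substitution for portability):

```shell
# Recreate the question's two files.
printf 'dbipAddress=192.168.175.130\nQAGENT_QCF=AGENT_QCF\nQADJUST_INVENTORY_Q=ADJUST_INVENTORY_Q\nQCREATE_ORDER_Q=CREATE_ORDER_Q\nQLOAD_INVENTORY_Q=LOAD_INVENTORY_Q\n' > File1.txt
printf 'AGENT_QCF\nADJUST_INVENTORY_Q\nCREATE_ORDER_Q\n' > File2.txt

# comm needs sorted inputs; -23 keeps lines unique to the first input,
# i.e. Q values present in File1.txt but missing from File2.txt.
sort File2.txt > File2.sorted
grep '^Q' File1.txt | cut -d= -f2- | sort | comm -23 - File2.sorted   # LOAD_INVENTORY_Q
```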
The following command line expression will filter out the lines that occur in file2.txt but not file1.txt:
cat file1.txt | grep -Fvf file2.txt | grep '^Q'
explanation:
-F : match patterns exactly (no expansion etc.) ; much faster
-v : only print lines that don't match
-f : get your patterns from the file specified
| grep '^Q' : pipe the output into grep, and look for lines that start with "Q"
This isn't exactly "stop the bash script when..." since it will process and print every mismatch; also, it doesn't test that there's an "=" in front of the pattern - but I hope it's useful.
Here's another way:
missing=($(comm -23 <(awk -F= '/^Q/ {print $2}' file1.txt | sort) <(sort file2.txt)))
if (( ${#missing[@]} )); then
echo >&2 "The following items are missing from file2.txt:"
printf '%s\n' "${missing[#]}"
exit 1
fi
Assuming that the relevant lines in file1.txt always start with a Q:
grep "^Q" file1.txt | while IFS= read -r line
do
what=${line#*=}
grep -Fxq "$what" file2.txt || echo "error: $what not found"
done
Output:
error: LOAD_INVENTORY_Q not found
