How to find and merge specific lines from one file B into another file A in Linux (shell), given that lines in file B can increase or decrease

File A:
abc
bcd
def
ghi
jkl
File B:
bcd
def
klm
Desired output:
abc
bcd
def
klm
ghi
jkl

Give this awk one-liner a try:
awk '!a[$0]++' fileA fileB > output
It prints each distinct line once, keeping the first occurrence, which is close to your example output.
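For the sample files it prints:
$ awk '!a[$0]++' fileA fileB
abc
bcd
def
ghi
jkl
klm
Note that klm, the line unique to fileB, lands at the end rather than between def and ghi, so this only approximates the desired output.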

cat A B | sort -u will remove the repeated lines, but it also sorts everything. @Kent's answer is more elegant, but even so, the output doesn't quite satisfy your description.

Related

Trying to merge 2 files but ignore new lines

I'm trying to merge two lists: only copy over the common lines' differences, and ignore new lines. It might be easier to explain with this:
a.txt        b.txt
abc          123
def          abc.^$234,~12
ghi          abcdd
jkl          asdf
mnn          ghi.^$321,~11
opq          jkl
             mnn^$qws
             zxy
Becomes:
output.txt:
abc.^$234,~12
def
ghi.^$321,~11
jkl
mnn^$qws
opq
Trying to combine two lists, copying common lines while dropping new lines.
This might work for you (GNU sed):
sed -nE '1{x;s/.*/cat file2/e;x};G;s/^([^\n]+)(\n.*)*\n(\1\>[^\n]*).*/\3/;P' file1
Slurp file2 into the hold space and then append it to each line in file1.
If the word in file1 matches a word in file2, print the contents of that line in file2. Otherwise, print the current line in file1.
You could try the diff and patch commands; they might help you.
diff -u old_file new_file > change.diff
patch old_file < change.diff
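A quick round trip with throwaway files (the names here are just for illustration) shows how the pair works:
$ printf 'abc\ndef\nghi\n' > old_file
$ printf 'abc\nxyz\nghi\n' > new_file
$ diff -u old_file new_file > change.diff
$ patch old_file < change.diff
old_file now has the same content as new_file.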
Your requirements aren't at all clear, but this will produce the expected output you posted given the sample input you posted, so it may be what you're looking for:
$ awk -F'[^[:alnum:]]' 'NR==FNR{a[$1]=$0; next} {print ($1 in a ? a[$1] : $1)}' b.txt a.txt
abc.^$234,~12
def
ghi.^$321,~11
jkl
mnn^$qws
opq
Using awk:
$ awk '
NR==FNR {
    a[$0]
    next
}
{
    for (i in a)
        if (index(i, $0)) {
            print i
            next
        }
    print
}' b a
Output:
abc.^$234,~12
def
ghi.^$321,~11
jkl
mnn^$qws
opq

BASH - Split file into several files based on conditions

I have a file (input.txt) with the following structure:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
...
I would like to split this file into multiple files (day.txt, month.txt, ...). Each new text file would contain all "header" lines (the ones starting with >) and their content (the lines between two header lines).
day.txt would therefore be:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
and month.txt:
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
I cannot use split -l in this case because the number of lines is not the same for each category (day, month, etc.). However, each subcategory has the same number of lines (three).
EDIT: As per the OP's request, adding one more solution now.
awk -F'[>_]' '/^>/{file=$2".txt"} {print > file}' Input_file
Explanation:
awk -F'[>_]' '          ##Set the field separator to > or _ for every line.
/^>/{ file=$2".txt" }   ##For a line starting with >, set the variable file to the 2nd field plus ".txt".
{ print > file }        ##Print the current line to the file named by the variable file (creating it if needed).
' Input_file            ##The input file name goes here.
The following awk may help you with the same:
awk '/^>day/{file="day.txt"} /^>month/{file="month.txt"} {print > file}' Input_file
You can set the record separator to > and then just set the file name based on the category given by $1.
$ awk -v RS=">" 'NF {f=$1; sub(/_.*$/, ".txt", f); printf ">%s", $0 > f}' input.txt
$ cat day.txt
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
$ cat month.txt
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
Here's a generic solution for the >name_number format:
$ awk 'match($0, /^>[^_]+_/){k = substr($0, RSTART+1, RLENGTH-2);
if(!(k in a)){close(op); a[k]; op=k".txt"}}
{print > op}' ip.txt
match($0, /^>[^_]+_/) if line matches >name_ at start of line
k = substr($0, RSTART+1, RLENGTH-2) save the name portion
if(!(k in a)) if the key is not found in array
a[k] add key to array
op=k".txt" output file name
close(op) closes the previous output file, in case there are too many files to keep open at once
print > op print input record to filename saved in op
Since each subcategory is composed of the same number of lines, you can use grep's -A / --after-context flag to specify how many lines to print after a matching header.
So if you know the list of categories in advance, you just have to grep for the headers of their subcategories and redirect them, with their content, to the correct file:
lines_by_subcategory=3 # number of lines *after* a subcategory's header
for category in "month" "day"; do
    grep ">$category" -A "$lines_by_subcategory" input.txt >> "$category.txt"
done
Note that this isn't the most efficient solution, as it must scan the input once for each category. Other solutions could instead read the content once and redirect each subcategory to its respective file in a single pass, as sketched below.
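A minimal single-pass sketch of that idea in bash, assuming (as in the sample) that every header starts with > and the category name precedes the underscore:
while IFS= read -r line; do
    if [[ $line == ">"* ]]; then
        out=${line#>}        # strip the leading >
        out=${out%%_*}.txt   # keep the category, e.g. day.txt
    fi
    printf '%s\n' "$line" >> "$out"
done < input.txt
Each line is appended, so delete any stale output files before re-running.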

Print into one file the query result of a database file

I have two files:
abc
ghi
and the second (aka database file)
abc 123
def 456
ghi 789
and I want to query the database file so that, where there is a match, the database's second column is printed as a second column in the first file.
So my output would be
abc 123
ghi 789
Logically, I understand what I have to do, but I lack the bash commands for it...
My attempt was to use join with -1, but I do not understand how to implement it...
What's wrong with join?
$ cat 1
abc
ghi
$ cat 2
abc 123
def 456
ghi 789
$ join 1 2
abc 123
ghi 789
Then, if you want to store it somewhere, just redirect stdout.
join is a little overkill here (as it requires sorted input) because file1 has just one column. Can you not use grep -f?
grep -Fwf file1 file2
-F treats the content of file1 as strings, not patterns
-w looks for the whole word to match
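With the sample files (file1 being the two-line key file, file2 the database) this prints exactly the desired output:
$ grep -Fwf file1 file2
abc 123
ghi 789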

Can't do operation with awk command [duplicate]

This question already has answers here:
Print second-to-last column/field in `awk`
(10 answers)
Closed 7 years ago.
What do we do if the input varies each time, and based on that input we have to perform another operation on the output of the first command? Please refer to the example below.
Suppose I execute a command x on the terminal and it gives me the output below (space-separated):
abc efg hij klm nop qrs uvw
abc efg hij klm qrs uvw
Sometimes there are 7 columns and sometimes there are only 6.
I pipe this output to an awk command to print the 6th column (i.e. qrs); it returns the correct result in the first case, but in the second case it prints uvw.
If you want the last-but-one column, you can use the NF variable:
awk '{print $(NF-1)}' file
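For the sample input this prints qrs for both lines:
$ awk '{print $(NF-1)}' <<EOF
abc efg hij klm nop qrs uvw
abc efg hij klm qrs uvw
EOF
qrs
qrs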
See this awk and its output:
awk '{print NF, $NF, $6}' <<EOF
abc efg hij klm nop qrs uvw
abc efg hij klm qrs uvw
EOF
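Its output is:
7 uvw qrs
6 uvw uvw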
awk counts fields from 1, so everything here is correct: $NF is always the last field, while $6 is only qrs when the line has 7 fields.

How can I add a new line to a large file every n characters in terminal (one liner sed)?

What am I missing here?
file.txt:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
in Terminal:
> sed "s/.\{3\}/&\n/g" < file.txt > new-file.txt
result: new-file.txt
ABCnDEFnGHInJKLnMNOnPQRnSTUnVWXnYZ
Expected Result:
ABC
DEF
...
VWX
YZ
Use sed (GNU sed, which does interpret \n in the replacement):
$ sed 's/.../&\n/g' file.txt
Or use grep:
$ grep -oE '.{1,3}' file.txt
result:
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
dd can do it too; conv=unblock treats the input as fixed-size cbs-byte records and turns each record into a line (stripping trailing spaces):
$ echo abcdefghi | dd cbs=3 conv=unblock 2>/dev/null
abc
def
ghi
Just with bash:
while IFS= read -r -n 3 chars; do printf "%s\n" "$chars"; done < file.txt > new-file.txt
An option, although maybe not quite correct depending on your input file, is the GNU coreutils fold. It wraps lines so that no line is more than w characters long, e.g.:
$ <<< 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' fold -w3
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
One way to do it is to explicitly hit the Enter key while typing the sed command; BSD/macOS sed does not interpret \n in the replacement text (which is why your original command produced literal n characters), but a literal newline works everywhere:
$ sed 's/.\{3\}/&\
/g' < file.txt > new-file.txt
$ cat new-file.txt
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
The following ended up working for me (perl, unlike BSD sed, interprets \n in the replacement; -0777 slurps the whole file):
perl -0777 -pe 's/(.{3})/\1\n/sg' < file.txt > new-file.txt
Still not sure why the original didn't work.
Thanks for your help.
