How to replace columns (matching pattern) using awk? - bash

I am trying to use awk to edit files but I can't manage to do it without creating intermediate files.
Basically I want to search file2, file3 and so on using column 1 of file1, and replace the 2nd column on lines whose 1st column matches. (Note that file2 and file3 may contain other stuff.)
I have
File1.txt
aaa 111
aaa 222
bbb 333
bbb 444
File2.txt
zzz zzz
aaa 999
zzz zzz
aaa 888
File3.txt
bbb 000
bbb 001
yyy yyy
yyy yyy
Desired output
aaa 999
aaa 888
bbb 000
bbb 001

This does what you specified, but I guess there are many edge cases it does not cover.
$ awk 'NR==FNR{a[$1]; next} $1 in a' file{1..3}
aaa 999
aaa 888
bbb 000
bbb 001
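The one-liner above relies on the common NR==FNR idiom: the condition is true only while awk reads the first file. A commented sketch of the same idea, wrapped in a shell function (the name filter_by_keys is made up for illustration), might look like this:

```shell
# filter_by_keys KEYFILE FILE...
# Print the lines of FILE... whose first column also appears in KEYFILE.
filter_by_keys() {
  awk '
    NR == FNR { keys[$1]; next }   # first file only: remember column 1
    $1 in keys                     # later files: print lines with a known key
  ' "$@"
}
```

One of the uncovered edge cases: if the key file is empty, NR==FNR also holds for the second file, so nothing is printed from it.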

Related

Joining two files that both have duplicate rows

I am trying to join two files that have identical column 1 and different column 2:
File1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
File2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
When I try to join them with
join File1 File2 > File3
I get
aaa 1 2
bbb 3 2
bbb 3 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 1 0
ccc 1 1
ccc 1 1
ccc 1 0
ccc 0 1
ccc 0 1
ccc 0 0
join is expanding the duplicates when all I want it to do is go line by line, so the output should be
aaa 1 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 0 0
How do I tell join to ignore duplicates and just combine the files line-by-line?
EDIT: This is being done in a loop with multiple files that all have the same column 1 but different column 2. I am joining the first two files into a temporary file and then looping through the other files joining with that temporary file.
Based on a suggestion from @Andre Wildberg, this worked best:
paste File1 <(cut -d " " -f 2 File2)
This allowed me to loop through a list of files:
cat File1 > tmp
for file in $files
do
paste tmp <(cut -d " " -f 2 $file) > tmpf
mv tmpf tmp
done
mv tmp FinalFile
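The same loop can be sketched as a reusable function that avoids fixed temporary file names. The function name combine_columns and its argument order are assumptions for illustration; it pipes cut into paste instead of using process substitution so that it also runs in a plain POSIX shell:

```shell
# combine_columns OUTFILE FIRST [MORE...]
# Copy FIRST, then append column 2 of each further file, line by line.
combine_columns() {
  cc_out=$1; shift
  cc_tmp=$(mktemp) || return 1
  cat "$1" > "$cc_tmp"; shift
  for cc_file in "$@"; do
    cc_next=$(mktemp) || return 1
    # "-" makes paste read the cut output from standard input
    cut -d ' ' -f 2 "$cc_file" | paste "$cc_tmp" - > "$cc_next"
    mv "$cc_next" "$cc_tmp"
  done
  mv "$cc_tmp" "$cc_out"
}
```

As with the original loop, paste joins the added columns with a tab, and the files are assumed to line up row for row.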
Assumptions:
all files have the same number of rows
all files have the same values in the first column for the same numbered row
the final result set can fit into memory
Sample input:
$ for f in f{1..4}
do
echo "############ $f"
cat $f
done
############ f1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
############ f2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
############ f3
aaa 12
bbb 12
bbb 12
ccc 11
ccc 11
ccc 10
############ f4
aaa 202
bbb 202
bbb 202
ccc 201
ccc 201
ccc 200
One awk idea:
awk '
FNR==NR { a[FNR]=$0; next }
{ a[FNR]=a[FNR] OFS $2 }
END { for (i=1;i<=FNR;i++)
print a[i]
}
' f1 f2 f3 f4
This generates:
aaa 1 2 12 202
bbb 3 2 12 202
bbb 3 2 12 202
ccc 1 1 11 201
ccc 1 1 11 201
ccc 0 0 10 200
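The awk idea above depends on every file carrying the same first-column value on each row. A small check along the following lines could confirm that before merging; check_keys is a hypothetical helper, and it only compares keys row by row (it does not detect a later file that is shorter than the first):

```shell
# check_keys FIRST OTHER...
# Report rows whose first column differs from the same row of FIRST.
check_keys() {
  awk '
    FNR == NR { key[FNR] = $1; next }   # first file: remember the key of each row
    $1 != key[FNR] {                    # later files: compare against that key
      printf "%s:%d: key %s does not match %s\n", FILENAME, FNR, $1, key[FNR]
      bad = 1
    }
    END { exit bad }                    # non-zero exit status if anything differed
  ' "$@"
}
```

It could be used as a guard, for example: check_keys f1 f2 f3 f4 && echo "keys line up".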

Can Anyone help me with Computer theory?

Hey guys I was given the following:
Consider the language S* where S = {aa, aaa}. Describe all the ways that a^12 can be written as the concatenation of factors in S.
I got 0 on this question even though it seemed pretty straightforward.
I would interpret it as asking in how many different ways I can reach 12 'a's.
Which would be 12, as you said:
one for only aa:
aa aa aa aa aa aa
one for only aaa
aaa aaa aaa aaa
and (5*4)/2 = 10 for three aa and two aaa (the number of ways to place the two aaa among five factors):
aa aa aa aaa aaa
aa aa aaa aa aaa
aa aaa aa aa aaa
aaa aa aa aa aaa
aa aa aaa aaa aa
aa aaa aa aaa aa
aaa aa aa aaa aa
aa aaa aaa aa aa
aaa aa aaa aa aa
aaa aaa aa aa aa
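The count of 12 can be cross-checked with a short recurrence: a factorization of a^n must end in either aa or aaa, so ways(n) = ways(n-2) + ways(n-3), with ways(0) = 1. A minimal sketch in awk, using a made-up function name count_ways:

```shell
# count_ways N
# Number of ways to write a^N as a concatenation of factors aa and aaa.
count_ways() {
  awk -v n="$1" 'BEGIN {
    n = n + 0                       # force a numeric value
    w[0] = 1                        # the empty string has exactly one factorization
    for (i = 1; i <= n; i++)
      w[i] = (i >= 2 ? w[i-2] : 0) + (i >= 3 ? w[i-3] : 0)
    print w[n]
  }'
}
```

Running count_ways 12 prints 12, which matches the 1 + 1 + 10 listed above.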

How to find any decrement in the column?

I am trying to find out the decrements in a column and if found then print the last highest value.
For example:
From 111 to 445 the column increases continuously, but the final 333 is less than the number before it.
111 aaa
112 aaa
112 aaa
113 sdf
115 aaa
222 ddd
333 sss
333 sss
444 sss
445 sss
333 aaa <<<<<< this is less than the number above it (445)
If any such scenario is found then print 445 sss
Like this, for example:
$ awk '{if (before>$1) {print before_line}} {before=$1; before_line=$0}' a
445 sss
What is it doing? It keeps the previous value in the variable before and compares it with the current one. If the previous value is bigger, it prints the previous line.
It works for many cases as well:
$ cat a
111 aaa
112 aaa
112 aaa
113 sdf
115 aaa <--- this
15 aaa
222 ddd
333 sss
333 sss
444 sss
445 sss <--- this
333 aaa
$ awk '{if (before>$1) {print before_line}} {before=$1; before_line=$0}' a
115 aaa
445 sss
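The same logic can be written out with comments. This is only a sketch of the approach above, with an invented function name; adding 0 to $1 forces a numeric comparison, which is an extra precaution not present in the original one-liner:

```shell
# print_before_drops [FILE...]
# Print each line that is immediately followed by a smaller value in column 1.
print_before_drops() {
  awk '
    NR > 1 && $1 + 0 < prev { print prev_line }   # value dropped: report the previous line
    { prev = $1 + 0; prev_line = $0 }             # remember this line for the next comparison
  ' "$@"
}
```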
Store each number in a variable called prevNumber, then when you read the next one do a check, e.g. if (newNumber < prevNumber) print prevNumber;
I don't really know what language you are using.
You can say:
awk '$1 > max {max=$1; maxline=$0}; END{ print maxline}' inputfile
For your input, it'd print:
445 sss

Replace column by comparing with the other column

I have a file like this
1 CC AAA
1 Na AAA
1 Na AAA
1 Na AAA
1 Na AAA
1 CC BBB
1 Na BBB
1 Na BBB
1 xa BBB
1 CC CCC
1 Na CCC
1 da CCC
I would like to remove column 2 and replace it with "01" for AAA, "02" for BBB, and so on for the entire file. The final output should look like this:
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 02 BBB
1 02 BBB
1 02 BBB
1 02 BBB
1 03 CCC
1 03 CCC
1 03 CCC
I don't have any clue how to make this work. Please help me if possible. Here every CC starts a new group; that is, the change from AAA to BBB can be tracked only by the CC in the 2nd column.
One way of doing it in awk:
awk '$3!=a&&NF{a=$3;x=sprintf("%02d",++x);print $1,x,$3;next}$3==a&&NF{print $1,x,$3;next }1' inputFile
Here's one way using awk:
awk '$3 != r { ++i } { $2 = sprintf ("%02d", i) } { r = $3 }1' OFS="\t" file
I've set the OFS to a tab-char, but you can choose what you like. Results:
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 01 AAA
1 02 BBB
1 02 BBB
1 02 BBB
1 02 BBB
1 03 CCC
1 03 CCC
1 03 CCC
Seems like you want:
awk '$2=="CC" { a+=1 } {$2=sprintf("%02d",a)} 1' input
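For readers who find the one-liners dense, the column-3-based counter can be spelled out as follows. This is a sketch rather than a replacement for the answers above; the function name renumber_groups is invented, and it starts a new group whenever column 3 changes (not when column 2 is CC, as the last one-liner does):

```shell
# renumber_groups [FILE...]
# Replace column 2 with a zero-padded group number that increases
# every time the value in column 3 changes.
renumber_groups() {
  awk '
    NF == 0 { print; next }              # pass blank lines through untouched
    $3 != prev { n++; prev = $3 }        # new value in column 3: start the next group
    { $2 = sprintf("%02d", n); print }   # overwrite column 2 with the group number
  ' "$@"
}
```

Because $2 is assigned, awk rebuilds each line with single spaces (the default OFS) between fields.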

awk: multi-line output from a single string with a simple replacement

000 000 000 000 (4 fields each of which is a group of 3 zeros separated by a space)
Process to generate 4 new lines
100 000 000 000
000 100 000 000
000 000 100 000
000 000 000 100
On each line, one group of three zeros is replaced by 100.
How can I do this?
tom
$ echo '000 000 000 000' | awk '{for (i=1;i<=NF;i++) {f=$i; $i="100"; print; $i=f}}'
100 000 000 000
000 100 000 000
000 000 100 000
000 000 000 100
Edit:
The fields are iterated over using the for loop. Each field ($i - for the field number i) is saved to a temporary variable f. Then the contents of the field are replaced. The record ($0) is printed. The field is returned to its previous value using the temporary value.
It might be easier to follow if this data was used: 001 002 003 004. Then the output would look like:
100 002 003 004
001 100 003 004
001 002 100 004
001 002 003 100
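The save, replace, print, restore cycle described above can be laid out step by step. A minimal sketch, assuming a hypothetical function name each_field_replaced that reads standard input and takes the replacement text as an optional argument (default 100):

```shell
# each_field_replaced [REPLACEMENT]
# For every input line, print one copy per field with that field replaced.
each_field_replaced() {
  awk -v repl="${1:-100}" '{
    for (i = 1; i <= NF; i++) {
      saved = $i      # keep the original field
      $i = repl       # swap in the replacement; awk rebuilds $0
      print
      $i = saved      # restore the field before moving on
    }
  }'
}
```

For example, echo '001 002 003 004' | each_field_replaced 100 produces the four lines shown above.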
Here's a shell script version using sed:
data='001 002 003 004' # or 000 000 000 000
for i in {1..4}; do echo "$data" | sed "s/\<[0-9]\{3\}\>/100/$i"; done
or
count=$(echo "$data" | wc -w)
for ((i=1;i<=count;i++)); do echo "$data" | sed "s/\<[0-9]\{3\}\>/100/$i"; done
or Bash without any external utilities:
data=(001 002 003 004) # use an array
count=${#data[@]}
for ((i=0;i<count;i++)); do f=${data[i]}; data[i]=100; echo "${data[@]}"; data[i]=$f; done
Or many other possibilities.
