Extract particular data from multiple files in UNIX - shell

Extract a particular column value from multiple files
ls -ltr
-rwxr-xr-x 4 dc staff 131 Feb 27 21:15 test.txt
-rwxr-xr-x 4 dc staff 134 Feb 25 21:15 test1.txt
test.txt and test1.txt (similar structure) contain a table structure like
cat test.txt
RECORD #1 DETAILS
sl no. regno name age
1 20 ABC 10
cat test1.txt
RECORD #2 DETAILS
sl no. regno name age
1 21 DEF 11
I want to extract the 2nd column value from all .txt files and store it in another file.
Output.txt should be
test.txt 20
test1.txt 21

It's not exactly clear what you are looking for, but if you just want to print the second column of the 4th line (and that is the ambiguity, as it's not clear if you always want the data from line 4, or the data from 3 lines after ^RECORD, or data from the line after each occurrence of "sl no.", etc), you could do:
$ awk 'FNR == 4 { print FILENAME, $2 }' test.txt test1.txt
or, if you are using an awk that does not support FILENAME (at the moment, I'm not sure if that is standard or a gnu extension) and you are not using csh or one of its cousins, you could do:
$ for n in test.txt test1.txt; do printf '%s ' "$n"; awk 'NR==4{print $2}' "$n"; done

awk 'FNR > 2 {print FILENAME, $2}' *txt > Output.txt
might work for you (it assumes the data always sits on the third line of each file). But if you want to make sure that only the lines after the header are printed, regardless of where the header sits, you can do it like this:
awk 'fname != FILENAME {p=0; fname=FILENAME}
/sl no. regno name age/ {p++; next}
p>0 {print FILENAME, $2}' *txt > Output.txt
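As a self-contained check of the header-aware version, the sample files from the question can be recreated with here-documents:

```shell
# Recreate the two sample files from the question.
cat > test.txt <<'EOF'
RECORD #1 DETAILS
sl no. regno name age
1 20 ABC 10
EOF

cat > test1.txt <<'EOF'
RECORD #2 DETAILS
sl no. regno name age
1 21 DEF 11
EOF

# Reset the flag p at the start of each file, switch it on once the
# header line is seen, then print "filename 2nd-column" for the rest.
awk 'fname != FILENAME {p=0; fname=FILENAME}
     /sl no. regno name age/ {p++; next}
     p>0 {print FILENAME, $2}' test.txt test1.txt > Output.txt

cat Output.txt
```

Output.txt then contains exactly the two lines asked for: test.txt 20 and test1.txt 21.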

Related

Update values in column in a file based on values from an array using bash script

I have a text file with the following details.
#test.txt
team_id team_level team_state
23 2
21 4
45 5
I have an array in my code teamsstatearr=(12 34 45 ...) and I want to add the values from the array as the third column. The array could have many elements and the test.txt file is just a small portion of what I have shown above.
Details of the file contents:
The text file has only the three headers shown, separated by tabs. The number of rows in the file is equal to the number of items in the array.
Thus my test.txt would look like the following.
team_id team_level team_state
23 2 12
21 4 34
45 5 45
(many more rows are present)
What I have tried so far (I don't see the third column in the file being updated with the values):
# Write the issue ids to file
for item in "${teamstatearr[@]}"
do
printf '%s\n' "item id in loop: ${item}"
awk -F, '{$2=($item)}1' OFS='\t', test.txt
done
I would appreciate it if anyone could help me find the easiest and most efficient way to do it.
If you don't mind a slightly different table layout, you could do:
teamsstatearr=(12 34 45)
{
# print header
head -n1 test.txt
# combine the remaining lines of test.txt and the array values
paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
# use `column -t` to format the output as table
} | column -t
Output:
team_id team_level team_state
23 2 12
21 4 34
45 5 45
To write the output to the same file, you can redirect the output to a new file and overwrite the original file with mv:
teamsstatearr=(12 34 45)
{
head -n1 test.txt
paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
} | column -t > temp && mv temp test.txt
If you have sponge from the moreutils package installed, you could do this without a temporary file:
teamsstatearr=(12 34 45)
{
head -n1 test.txt
paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
} | column -t | sponge test.txt
Or using awk and column (with the same output):
teamsstatearr=(12 34 45)
awk -v str="${teamsstatearr[*]}" '
BEGIN{split(str, a)} # split `str` into array `a`
NR==1{print; next} # print header
{print $0, a[++cnt]} # print current line and next array element
' test.txt | column -t
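To check the awk part in isolation, before column -t reformats it, a minimal run with the sample data might look like this (output uses awk's default single-space OFS):

```shell
# Recreate the sample input file from the question.
cat > test.txt <<'EOF'
team_id team_level team_state
23 2
21 4
45 5
EOF

teamsstatearr=(12 34 45)

# Pass the whole array as one string, split it inside awk,
# and append one element per data row.
awk -v str="${teamsstatearr[*]}" '
BEGIN{split(str, a)}   # split `str` into array `a`
NR==1{print; next}     # print header unchanged
{print $0, a[++cnt]}   # append the next array element
' test.txt
```

This prints the header followed by 23 2 12, 21 4 34 and 45 5 45.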

Select the rows of a text file based on the values in the column

In my text file, I have something like below and I want to select the rows in which the second column has the value of "1".
flower 1 12
tree 2 13
car 3 14
sun 1 20
I have tried something like this: awk -F, 'int($1) == 1' test.txt > output.txt and the output was empty. What am I doing wrong?
With -F, awk looks for commas, and since the input has none, the whole line is a single field, so int($1) is 0 for every line. Let awk split on whitespace (the default) and compare the second field:
awk '$2 == "1" {print}' test.txt > output.txt
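A quick self-contained check with the sample rows (whitespace-separated, so awk's default field splitting applies):

```shell
# Recreate the sample input from the question.
cat > test.txt <<'EOF'
flower 1 12
tree 2 13
car 3 14
sun 1 20
EOF

# Default FS splits on whitespace, so $2 is the middle column.
awk '$2 == "1"' test.txt > output.txt

cat output.txt
```

output.txt then contains the two matching rows: flower 1 12 and sun 1 20.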

Add line numbers for duplicate lines in a file

My text file would read as:
111
111
222
222
222
333
333
My resulting file would look like:
1,111
2,111
1,222
2,222
3,222
1,333
2,333
Or the resulting file could alternatively look like the following:
1
2
1
2
3
1
2
I've specified a comma as a delimiter here, but it doesn't matter what the delimiter is; I can modify that at a future date. In reality, I don't even need the original text file contents, just the line numbers, because I can just paste the line numbers against the original text file.
I am just not sure how I can go through numbering the lines based on repeated entries.
All items in list are duplicated at least once. There are no single occurrences of a line in the file.
$ awk -v OFS=',' '{print ++cnt[$0], $0}' file
1,111
2,111
1,222
2,222
3,222
1,333
2,333
Use a variable to save the previous line, and compare it to the current line. If they're the same, increment the counter, otherwise set it back to 1.
awk '{if ($0 == prev) counter++; else counter = 1; prev=$0; print counter}'
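The same idea, runnable end to end on the sample input:

```shell
# Recreate the sample input from the question.
cat > file <<'EOF'
111
111
222
222
222
333
333
EOF

# counter restarts at 1 whenever the current line differs from the
# previous one; prev carries the previous line between records.
awk '{if ($0 == prev) counter++; else counter = 1; prev = $0; print counter}' file
```

This prints 1 2 1 2 3 1 2, one number per line, matching the second desired output. Note it relies on duplicates being adjacent, which holds for this input.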
Perl solution:
perl -lne 'print ++$c{$_}' file
-n reads the input line by line
-l strips the newline from each input line and appends one to each print
++$c{$_} increments the value assigned to the contents of the current line $_ in the hash table %c.
Software tools method, given textfile as input:
uniq -c textfile | cut -d' ' -f7 | xargs -L 1 seq 1
Shell loop-based variant of the above:
uniq -c textfile | while read a b ; do seq 1 $a ; done
Output (of either method):
1
2
1
2
3
1
2

Print lines whose 1st and 4th column differ

I have a file with a bunch of lines of this form:
12 AAA 423 12 BBB beta^11 + 3*beta^10
18 AAA 1509 18 BBB -2*beta^17 - beta^16
18 AAA 781 12 BBB beta^16 - 5*beta^15
Now I would like to print only lines where the 1st and the 4th column differ (the columns are space-separated) (the values AAA and BBB are fixed). I know I can do that by getting all possible values in the first column and then use:
for i in $values; do
cat file.txt | grep "^$i" | grep -v " $i BBB"
done
However, this runs through the file as many times as there are different values in the first column. Is there a way to do that in a single pass? I think I can do the comparison; my main problem is that I have no idea how to extract the space-separated columns.
This is something quite straight forward for awk:
awk '$1 != $4' file
With awk, you refer to the first field with $1, the second with $2 and so on. This way, you can compare the first and the fourth with $1 != $4. If this is true (that is, $1 and $4 differ), awk performs its default action: print the current line.
For your sample input, this works:
$ awk '$1 != $4' file
18 AAA 781 12 BBB beta^16 - 5*beta^15
Note you can define a different field separator with -v FS="...". This way, you can tell awk that your lines contain fields tab / comma / ... separated. All together it would be like this: awk -v FS="\t" '$1 != $4' file.
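For instance, on a made-up comma-separated sample of the same shape (the data lines here are hypothetical):

```shell
# Two comma-separated sample lines: 1st and 4th fields equal vs. different.
printf '%s\n' '12,AAA,423,12,BBB' '18,AAA,781,12,BBB' > file.csv

# With FS set to a comma, $1 and $4 are the 1st and 4th comma-separated fields.
awk -v FS=',' '$1 != $4' file.csv
```

Only the second line is printed, since 18 differs from 12.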

Find difference in second field, report using first field (awk)

I have 2 (dummy) files
file1.txt
Tom 25
John 27
Bob 22
Justin 37
Nick 19
Max 42
file2.txt
Tom 25
John 40
Bob 22
Justin 37
Nick 19
Max 24
I want to compare the Second field of these files (the numbers). Then If they are different, report using the First field (Names). So the expected output would be the following.
John's age in file1.txt is different from file2.txt
Max's age in file1.txt is different from file2.txt
I don't know if my approach is good, but I first parse the ages into another file and compare them. If they are different, I look up the line number of the difference. Then I go back to the original file and parse the name of the person from THAT line.
I run the following code in shell.
$ awk '{print $2}' file1.txt > tmp1.txt
$ awk '{print $2}' file2.txt > tmp2.txt
$
$ different=$(diff tmp1.txt tmp2.txt | awk '{$1=""; print $0}')
$
$ if [ "${different}" ]; then
$ #This is to get the line number where the ages are different
$ #so that I can go to THAT line in file1.txt and get the first field.
$ awk 'NR==FNR{a[$0];next}!($0 in a){print FNR}' tmp1.txt tmp2.txt > lineNumber.txt
$ fi
However, I am blocked here. I don't know if my approach is right or there's an easier approach.
Thanks a lot
awk 'NR==FNR{a[$1]=$2;next} $2!=a[$1]{print "Age of "$1" is different"}' file1 file2
awk '
NR==FNR{a[$1]=$2;next}
a[$1] != $2 {print $1"\047s age in "ARGV[1]" is different from "ARGV[2]}
' file1.txt file2.txt
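Runnable end to end with the dummy files from the question:

```shell
# Recreate the two dummy files.
cat > file1.txt <<'EOF'
Tom 25
John 27
Bob 22
Justin 37
Nick 19
Max 42
EOF

cat > file2.txt <<'EOF'
Tom 25
John 40
Bob 22
Justin 37
Nick 19
Max 24
EOF

# First pass (NR==FNR) stores file1's ages keyed by name; the second
# pass compares. \047 is an octal escape for the single quote.
awk '
NR==FNR{a[$1]=$2;next}
a[$1] != $2 {print $1"\047s age in "ARGV[1]" is different from "ARGV[2]}
' file1.txt file2.txt
```

This prints the two expected lines, for John and Max.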
If both files list the same names, something like this works:
join file{1,2}.txt | awk '$2 != $3 { print "Age of " $1 " is different" }'
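Note that join expects both inputs sorted on the join field, and the dummy files above are not sorted by name. A sketch that sorts them on the fly (process substitution assumed, i.e. bash/ksh/zsh):

```shell
# Recreate the two dummy files.
cat > file1.txt <<'EOF'
Tom 25
John 27
Bob 22
Justin 37
Nick 19
Max 42
EOF

cat > file2.txt <<'EOF'
Tom 25
John 40
Bob 22
Justin 37
Nick 19
Max 24
EOF

# join merges on the name field; $2 and $3 are then the two ages.
join <(sort file1.txt) <(sort file2.txt) |
  awk '$2 != $3 { print "Age of " $1 " is different" }'
```

The output comes out in sorted name order: John first, then Max.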
