Update values in a column in a file based on values from an array using a bash script

I have a text file with the following details.
#test.txt
team_id team_level team_state
23 2
21 4
45 5
I have an array in my code, teamsstatearr=(12 34 45 ...), and I want to be able to add the values from the array as the third column. The array can have many elements; the test.txt shown above is just a small portion of the file.
Details of the file contents:
The text file has only three headers, separated by tabs. The number of rows in the file is equal to the number of items in the array.
Thus my test.txt would look like the following.
team_id team_level team_state
23 2 12
21 4 34
45 5 45
(many more rows are present)
What I have tried so far (the file never gets updated with the values in the third column):
# Write the issue ids to file
for item in "${teamsstatearr[@]}"
do
printf '%s\n' "item id in loop: ${item}"
awk -F, '{$2=($item)}1' OFS='\t', test.txt
done
I would appreciate it if anyone could help me find the easiest and most efficient way to do this.

If you don't mind a slightly different table layout, you could do:
teamsstatearr=(12 34 45)
{
  # print header
  head -n1 test.txt
  # combine the remaining lines of test.txt and the array values
  paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
  # use `column -t` to format the output as a table
} | column -t
Output:
team_id team_level team_state
23 2 12
21 4 34
45 5 45
To write the output to the same file, you can redirect the output to a new file and overwrite the original file with mv:
teamsstatearr=(12 34 45)
{
  head -n1 test.txt
  paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
} | column -t > temp && mv temp test.txt
If you have sponge from the moreutils package installed, you could do this without a temporary file:
teamsstatearr=(12 34 45)
{
  head -n1 test.txt
  paste <(tail -n+2 test.txt) <(printf '%s\n' "${teamsstatearr[@]}")
} | column -t | sponge test.txt
Or using awk and column (with the same output):
teamsstatearr=(12 34 45)
awk -v str="${teamsstatearr[*]}" '
BEGIN{split(str, a)} # split `str` into array `a`
NR==1{print; next} # print header
{print $0, a[++cnt]} # print current line and next array element
' test.txt | column -t
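As with the paste variant, the result of this pipeline can be written back to test.txt through a temporary file (or sponge); a sketch reusing the temp-file pattern shown above:
teamsstatearr=(12 34 45)
awk -v str="${teamsstatearr[*]}" '
BEGIN{split(str, a)}
NR==1{print; next}
{print $0, a[++cnt]}
' test.txt | column -t > temp && mv temp test.txt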

Related

Using BASH, selecting row and column [CUT command?]

1 A 18 -180
2 B 19 -180
3 C 20 -150
50 D 21 -100
128 E 22 -130
10 F 23 -0
10 G 23 -0
In the above file, I can easily print out a column with cut:
cat /file_directory | cut -d' ' -f3
In that case, the output would be the third column.
But what I want to do is something different: I want to pick an element depending on the value in another column. So if I pick B in the second column, the printout would be the element at [row where column 2 = "B"][column 3] = [2][3], which is 19 only, nothing else. How do I do that?
Use awk:
$ awk '$2 == "B" {print $3}' file.txt
19
awk splits each row into fields (by default using runs of whitespace as the field delimiters). Each rule has two parts: a pattern to select a line, and an action to take on a selected line. In the above, we check if the 2nd column ($2) has the value "B"; for each line for which that is true, we print the value in the 3rd column.
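If the lookup value and the column to print come from shell variables, the same pattern can be parameterized with awk's -v option. A minimal sketch; key and col are illustrative names, not from the original:
key=B
col=3
awk -v key="$key" -v col="$col" '$2 == key {print $col}' file.txt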
#!/bin/sh
cat sofile+.txt | tr -s ' ' > sofile1+.txt
mv sofile1+.txt sofile+.txt
cat > edcommands+.txt << EOF
/B/
EOF
line=$(ed -s sofile+.txt < edcommands+.txt)
echo ${line} | cut -d' ' -f2,3
rm ./edcommands+.txt
sofile+.txt is what contains your data.
You might also need to install ed for this, since it isn't in most distributions by default any more, sadly.

Is there a way to use the cut command in BASH to print specific columns but with characters?

I know I can use -f1 to print a column, but is there a way for cut to look through the columns for a specific string and print out that column?
Not entirely clear if this is what you're looking for, but:
$ cat input
Length,Color,Height,Weight,Size
1,2,1,4,5
7,7,1,7,7
$ awk 'NR==1{for(i=1;i<=NF+1;i++) if($i==h) break; next} {print $i}' h=Color FS=, input
2
7
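The same idea spelled out over several lines with comments (an equivalent sketch of the one-liner above):
awk -v h=Color 'BEGIN { FS = "," }
NR == 1 {                       # header line
    for (i = 1; i <= NF; i++)   # scan the header fields
        if ($i == h) break      # i is now the wanted column number
    next                        # do not print the header itself
}
{ print $i }                    # print that column for every data line
' input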
You can figure out the column number with a small function like this:
function select_column() {
    file="$1"
    sep="$2"
    col_name="$3"
    # get the separators before the field:
    separators=$(head -n 1 "${file}" | sed -e"s/\(.*${sep}\|^\)${col_name}\(${sep}.*\|$\)/\1/g" | tr -d -c "${sep}")
    # add one, because before the n-th field there are n-1 separators
    ((field_no=${#separators}+1))
    # now just call cut and skip the first row by using tail -n +2
    cut -d "${sep}" -f ${field_no} "${file}" | tail -n +2
}
When called with:
select_column testfile.csv "," subno
it outputs:
10
76
55
83
30
53
67
25
52
16
57
86
2
75
28
on the following testfile.csv:
rand2,no,subno,rand1
john,8017610,10,96
ringo,5673276,76,42
ringo,9260555,55,19
john,7565683,83,72
ringo,8833230,30,35
paul,1571553,53,55
john,9972467,67,80
ringo,922025,25,88
paul,9908052,52,1
john,6264216,16,19
paul,4350857,57,3
paul,7253386,86,50
john,3426002,2,57
ringo,1437775,75,85
paul,4384228,28,77

Add line numbers for duplicate lines in a file

My text file would read as:
111
111
222
222
222
333
333
My resulting file would look like:
1,111
2,111
1,222
2,222
3,222
1,333
2,333
Or the resulting file could alternatively look like the following:
1
2
1
2
3
1
2
I've specified a comma as a delimiter here, but it doesn't matter what the delimiter is; I can modify that at a future date. In reality, I don't even need the original text file contents, just the line numbers, because I can paste the line numbers against the original text file.
I am just not sure how to go about numbering the lines based on repeated entries.
All items in list are duplicated at least once. There are no single occurrences of a line in the file.
$ awk -v OFS=',' '{print ++cnt[$0], $0}' file
1,111
2,111
1,222
2,222
3,222
1,333
2,333
Use a variable to save the previous line, and compare it to the current line. If they're the same, increment the counter, otherwise set it back to 1.
awk '{if ($0 == prev) counter++; else counter = 1; prev=$0; print counter}'
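The same prev/counter idea in plain bash, for completeness. A sketch that reads from a file named file (an assumption) and prints only the counter, like the second desired output:
prev=
counter=0
while IFS= read -r line; do
    if [[ $line == "$prev" ]]; then
        ((counter++))          # same line as before: keep counting
    else
        counter=1              # new line: restart at 1
    fi
    prev=$line
    printf '%s\n' "$counter"
done < file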
Perl solution:
perl -lne 'print ++$c{$_}' file
-n reads the input line by line
-l strips the newline from each input line and adds one back on output
++$c{$_} increments the value assigned to the contents of the current line $_ in the hash table %c.
Software tools method, given textfile as input:
uniq -c textfile | cut -d' ' -f7 | xargs -L 1 seq 1
Shell loop-based variant of the above:
uniq -c textfile | while read a b ; do seq 1 $a ; done
Output (of either method):
1
2
1
2
3
1
2
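Note that the -f7 in the cut variant depends on how uniq -c pads its counts; if that is a concern, the count can be expanded directly in awk instead (a sketch, not from the original answers, with the same output):
uniq -c textfile | awk '{for (i = 1; i <= $1; i++) print i}'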

Creating a mapping count

I have this data with two columns
Id Users
123 2
123 1
234 5
234 6
34 3
I want to create this count mapping from the given data like this
123 3
234 11
34 3
How can I do it in bash?
You have to use associative arrays, something like
declare -A newmap
newmap["123"]=2
newmap["123"]=$(( ${newmap["123"]} + 1))
Obviously you have to iterate through your input: see if the entry exists, then add to it; otherwise initialize it.
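A sketch filling in that loop; the file name file and the skipping of the Id Users header are assumptions, and the output order of an associative array is unspecified:
declare -A newmap
while read -r id users; do
    [[ $id == Id ]] && continue                   # skip the header line (assumption)
    newmap[$id]=$(( ${newmap[$id]:-0} + users ))  # add, initializing to 0 if unset
done < file
for id in "${!newmap[@]}"; do
    echo "$id ${newmap[$id]}"
done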
It will be easier with awk.
Solution 1: Doesn't expect the file to be sorted. Keeps the aggregated sums (one entry per Id) in memory.
awk '{a[$1]+=$2}END{for(x in a) print x,a[x]}' file
34 3
234 11
123 3
What we are doing here is using the first column as the key and adding the second column as the value. In the END block we iterate over our array and print each key-value pair.
If you have the Id Users header line in your input file and want to exclude it from the output, then add an NR>1 condition by saying:
awk 'NR>1{a[$1]+=$2}END{for(x in a) print x,a[x]}' file
NR>1 tells awk to skip the first line. NR contains the current line number, so we instruct awk to start building our array from the second line onwards.
Solution 2: Expects the file to be sorted. Does not store the file in memory.
awk 'NR>1 && $1!=prev{print prev, sum; sum=0} {prev=$1; sum+=$2} END{print prev, sum}' file
123 3
234 11
34 3
If you have the Id Users header line in your input file and want to exclude it from the output, then shift the NR conditions by one:
awk 'NR>2 && $1!=prev{print prev, sum; sum=0} NR>1{prev=$1; sum+=$2} END{print prev, sum}' file
123 3
234 11
34 3
A Bash (4.0+) solution:
declare -Ai count
while read a b ; do
    count[$a]+=b
done < "$infile"
for idx in ${!count[@]}; do
    echo "${idx} ${count[$idx]}"
done
For a sorted output the last line should read
done | sort -n

Extract particular data from multiple files in UNIX

Extract a particular column value from multiple files
ls -ltr
-rwxr-xr-x 4 dc staff 131 Feb 27 21:15 test.txt
-rwxr-xr-x 4 dc staff 134 Feb 25 21:15 test1.txt
test.txt and test1.txt (similar structure) contain a table structure like
cat test.txt
RECORD #1 DETAILS
sl no. regno name age
1 20 ABC 10
cat test1.txt
RECORD #2 DETAILS
sl no. regno name age
1 21 DEF 11
I want to extract the 2nd column value from all .txt files and store it in another file.
Output.txt should be
test.txt 20
test1.txt 21
It's not exactly clear what you are looking for, but if you just want to print the second column of the 3rd line (and that is the ambiguity: it's not clear whether you always want the data from line 3, the data from 2 lines after ^RECORD, or the data from the line after each occurrence of "sl no.", etc.), you could do:
$ awk 'FNR == 3 { print FILENAME, $2 }' test.txt test1.txt
or, if you are using an awk that does not support FILENAME (at the moment, I'm not sure if that is standard or a gnu extension) and you are not using csh or one of its cousins, you could do:
$ for n in test.txt test1.txt; do printf '%s ' "$n"; awk 'NR==3{ print $2 }' "$n"; done
awk 'FNR > 2 {print FILENAME, $2}' *txt > Output.txt
Might work for you. But if you want to make sure that only the parts after the sl no. header line are printed (rather than relying on line numbers), you can do it like:
awk 'fname != FILENAME {p=0; fname=FILENAME}
/sl no. regno name age/ {p++; next}
p>0 {print FILENAME, $2}' *txt > Output.txt
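For the record, with the two sample files from the question the pattern-based version leaves Output.txt holding the requested mapping (the order depends on how the shell expands *txt):
cat Output.txt
test.txt 20
test1.txt 21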
