Bash Iterate through repeated values in file - bash

I have a file with this format:
User_ID , Place_ID , Rating
U32 , 1305 , 2
U32 , 1276 , 2
U32 , 1789 , 3
U65 , 1985 , 1
U65 , 1305 , 1
U65 , 1276 , 2
I would like to sort this file by Place_ID, iterate over the rows that share a Place_ID, and add up their ratings. Once the last row for a Place_ID has been added, I want to check whether the average is greater than x and, if so, push the Place_ID into an array.
Ex: Place_ID 1305: (2 + 1) / 2 = 1.5 > 1 ----> ids+=($id)
Place_ID 1276: (2 + 2) / 2 = 2 > 1 -----> ids+=($id)
I have tried with
test5 () {
id=0
count=0
rating=0
ids=()
ratings=()
for i in `sort -t',' -k 2 ratings.csv`
do
aux=`echo "$i"| cut -f2 -d','`
if (( $id != $aux )); then
if (( $rating != 0 )); then
rating=`echo "scale=1; $rating / $count" | bc -l`
if (( $(echo "$rating >= 1" | bc -l) )); then
ids+=($id)
ratings+=($rating)
fi
fi
id=$aux
count=0
rating=0
else
rating=$(($rating + `echo "$i"| cut -f3 -d','`))
count=$(($count + 1))
fi
done
echo ${#ids[@]}
echo ${#ratings[@]}
}
EDIT: I think it works, but is there a way to make it better? Something that doesn't force me to use as many if's and count.
Thanks for the help.

This is another option that uses fewer ifs:
#!/bin/bash
sum=()
count=()
while read -r line; do
place=$(echo "$line" | cut -d',' -f2)
rating=$(echo "$line" | cut -d',' -f3)
sum[$place]=$(echo "$rating + ${sum[$place]-0}" | bc -l)
count[$place]=$((count[$place] + 1))
done < <( sed 1d ratings.csv | sort -t',' -k 2 | tr -d '[:blank:]' )
ratings=()
for place in "${!sum[@]}"; do
ratings[$place]=$(echo "scale=1; ${sum[$place]} / ${count[$place]}" | bc -l)
done
# ratings at this point has the ratings for each place
echo ${!ratings[@]} # place ids
echo ${ratings[@]} # ratings
I'm assuming your ratings.csv has a header row, which is why this uses sed 1d ratings.csv to skip it.
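If you only need the list of Place_IDs whose average rating exceeds a threshold, the whole thing can probably be pushed into awk, which keeps the per-place sums and counts in associative arrays so no pre-sorting or if-chains are needed. This is a minimal sketch, not a drop-in replacement: the function name places_above is made up for illustration, and it assumes the same comma-separated layout with a header row as in the question.

```shell
#!/usr/bin/env bash
# Sketch: print every Place_ID whose average rating is greater than a threshold.
# places_above is a hypothetical helper name; $1 is the CSV file, $2 the threshold.
places_above() {
    local file=$1 threshold=$2
    # Drop the header row, strip blanks, then accumulate sum and count per place.
    sed 1d "$file" | tr -d '[:blank:]' |
        awk -F, -v x="$threshold" '
            { sum[$2] += $3; n[$2]++ }
            END { for (p in sum) if (sum[p] / n[p] > x) print p }
        ' | sort -n   # awk iterates in arbitrary order, so sort the IDs
}
```

To collect the result into a bash array (bash 4 or later), something like mapfile -t ids < <(places_above ratings.csv 1) should work.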

Related

Bash - adding numbers from array matrix

When adding numbers in a matrix, I get this error:
line 271: 1 2 3 4: syntax error in expression (error token is "2 3 4")
My add function:
add()
{
#Reading matrices into temp files
while read line1 <&3 && read line2 <&4
do
echo "$line1" | tr "\n" "\t" >> "temp50"
echo "$line2" | tr "\n" "\t" >> "temp60"
done 3<$temp1 4<$fileTwo
echo >>"temp50"
echo >>"temp60"
cat "temp60" >> "temp50"
i=1
x=1
while [ $i -le $totalNum ]
do
sum=0
cut -f $i "temp50" > "temp55"
while read num
do
sum=$(($sum + $num))
done <"temp55"
echo "$sum" | tr "\n" "\t" >> "temp65"
#try and remove hanging tab
if [[ "$x" -eq "$numcolOne" ]]
then
rev "temp65" > "temp222"
cat "temp222" | cut -c 1- >"temp333"
rev "temp333">"temp65"
x=0
fi
i=$((i+1))
x=$((x+1))
done
Matrix array (temp1):
1 2 3 4
5 6 7 8
Function is supposed to add the matrix array in the temp1 file; sample output:
2 4 6 8
10 12 14 16
Appreciate anyone's help!
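One likely cause of the error, assuming the matrix files are space-separated: cut -f splits on tabs by default and prints lines without a tab unchanged, so a whole row such as "1 2 3 4" can end up inside $(( )). A rough sketch that avoids the temp files by reading each row into a bash array is below; add_matrices is a hypothetical name, and it assumes both files have the same shape and contain plain integers.

```shell
#!/usr/bin/env bash
# Sketch: element-wise sum of two whitespace-separated integer matrices.
# add_matrices is a hypothetical name; $1 and $2 are the two matrix files.
add_matrices() {
    local -a row1 row2 out
    local i
    # Read one row from each file per iteration, split into array elements.
    while read -r -a row1 <&3 && read -r -a row2 <&4; do
        out=()
        for i in "${!row1[@]}"; do
            out+=( $(( row1[i] + row2[i] )) )
        done
        # "${out[*]}" joins with the first character of IFS, here a tab.
        ( IFS=$'\t'; printf '%s\n' "${out[*]}" )
    done 3<"$1" 4<"$2"
}
```

Calling add_matrices "$temp1" "$temp1" on the sample matrix would give the two rows shown in the question, tab-separated and without a trailing tab.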

extract a numeric substring and add value to it

I have a string like 1001.2001.3001.5001.6001 or 1001-2001-3001-5001-6001. How to extract the 4th string i.e., 5001, add a value like 121 to it and put it back in the same string. The output should be like 1001.2001.3001.5122.6001 or 1001-2001-3001-5122-6001. I have to achieve this in Linux bash scripting.
Try this
#!/bin/bash
str=$1
if [[ $(echo $str | grep '\.' | wc -l) == 1 ]]
then
str1=$(echo $str | cut -d '.' -f 1,2,3)
str2=$(echo $str | cut -d '.' -f 4 | awk {'print $1+121'})
str3=$(echo $str | cut -d '.' -f 5)
echo $str1.$str2.$str3
elif [[ $(echo $str | grep - | wc -l) == 1 ]]
then
str1=$(echo $str | cut -d '-' -f 1,2,3)
str2=$(echo $str | cut -d '-' -f 4 | awk {'print $1+121'})
str3=$(echo $str | cut -d '-' -f 5)
echo $str1-$str2-$str3
else
echo "do nothing"
fi
Pass a string as parameter
No pipes, no forks, no cutting, no awking, just plain POSIX shell:
$ s=1001.2001.3001.5001.6001
$ oldIFS=$IFS
$ IFS=.-
$ set -- $s
$ case $s in
> (*.*) echo "$1.$2.$3.$(($4 + 121)).$5";;
> (*-*) echo "$1-$2-$3-$(($4 + 121))-$5";;
> esac
1001.2001.3001.5122.6001
$ IFS=$oldIFS
One liner
value=121 ; str='1001.2001.3001.5001.6001' ; token="$(echo "$str" | cut -f 4 -d '.')" ; newtoken=$(( $token + $value )) ; newstr="$(echo "$str" | sed -e "s/$token/$newtoken/g" | tr '.' '-')" ; echo "$newstr"
Breakdown:
value=121 # <- Increment
str='1001.2001.3001.5001.6001' # <- Initial String
token="$(echo "$str" | cut -f 4 -d '.')" # <- Extract the 4th field with . sep
newtoken=$(( $token + $value )) # <- Add value and save to $newtoken
newstr="$(echo "$str" \
| sed -e "s/$token/$newtoken/g" \
| tr '.' '-')" # <- Replace 4th field with $newtoken
# and translate "." to "-"
echo "$newstr" # <- Echo new string
Works in:
Bash
sh
FreeBSD
Busybox
Using out of the box tools
If the field separator can either be . or -, then do something like
echo "1001.2001.3001.5001.6001" | awk 'BEGIN{FS="[.-]";OFS="-"}{$4+=121}1'
1001-2001-3001-5122-6001
However, if you need to match the regex FS or field separator with OFS then you need to have gawk installed
echo "1001.2001.3001.5001.6001" |
gawk 'BEGIN{FS="[.-]"}{split($0,a,FS,seps)}{$4+=121;OFS=seps[1]}1'
1001.2001.3001.5122.6001
Resetting the argument list with the values is probably the preferred way, as is setting IFS to the delimiter, reading the values into an array, and adding the desired value to the array element at issue. You can also do it with a simple loop that looks for the delimiters and skips characters until the desired segment is found (4 in your case, which is when the delimiter count is 3). Appending each digit of that segment until the next delimiter is found gives you the base value, and adding your desired 121 to the completed number completes the script, e.g.
#!/bin/bash
str=${1:-"1001.2001.3001.5001.6001"} ## string
ele=${2:-4} ## element to add value to [1, 2, 3, ...]
add=${3:-121} ## value to add to element
cnt=0 ## flag to track delimiters found
num=
## for each character in str
for ((i = 0; i < ${#str}; i++))
do
if [ "${str:$i:1}" = '.' -o "${str:$i:1}" = '-' ] ## is it '.' or '-'
then
(( cnt++ )) ## increment count
(( cnt == ele )) && break ## if equal to ele, break
## check each char is a valid digit 0-9
elif [ "0" -le "${str:$i:1}" -a "${str:$i:1}" -le "9" ]
then
(( cnt == (ele - 1) )) || continue ## if not the one of interest, continue
num="$num${str:$i:1}" ## append digit to num
fi
done
((num += add)) ## add the amount to num
printf "num: %d\n" $num ## print results
Example Use/Output
$ bash parsenum.sh
num: 5122
$ bash parsenum.sh "1001.2001.3001.5001.6001" 2
num: 2122
$ bash parsenum.sh "1001.2001.3001.5001.6001" 2 221
num: 2222
Look things over and let me know if you have any questions.
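The array-based alternative mentioned above (set IFS to the delimiter, read the fields into an array, change one element, join again) could look roughly like this. It is only a sketch: bump_field is a made-up name, it assumes the string uses a single delimiter that is either . or -, and unlike the loop above it prints the whole string with the original delimiter rather than just the number.

```shell
#!/usr/bin/env bash
# Sketch: add a value to the Nth field of a '.'- or '-'-delimited string,
# keeping the original delimiter. bump_field is a hypothetical name.
bump_field() {
    local str=$1 n=${2:-4} add=${3:-121}
    local delim=. IFS          # IFS is made local so the caller's value is untouched
    local -a parts
    if [[ $str == *-* ]]; then delim=-; fi
    IFS=$delim read -r -a parts <<< "$str"
    # 10# forces base 10 so a field such as 0050 is not parsed as octal.
    parts[n-1]=$(( 10#${parts[n-1]} + add ))
    IFS=$delim
    printf '%s\n' "${parts[*]}"   # join the fields with the delimiter
}
```

For example, bump_field 1001-2001-3001-5001-6001 prints 1001-2001-3001-5122-6001.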

Rounding up to 3 decimal points just truncates the rest

read n
i=0
sum=0
while [ $i -lt $n ]
do
read X
sum=`expr $X + $sum `
i=`expr $i + 1 `
done
echo "scale = 3; $sum/$n" | bc -l
My code above rounds down to the lower value, whereas I want the higher one.
E.g. if the answer is 4696.9466, it gives 4696.946, whereas 4696.947 is what I want. Please suggest any edits.
You may pipe your bc to printf :
echo "scale = 4; $sum/$n" | bc -l | xargs printf '%.*f\n' 3
From your example:
$ echo "scale = 4; 4696.9466" | bc -l | xargs printf '%.*f\n' 3
4696,947
Change last line of your script from echo "scale = 3; $sum/$n" | bc -l to
printf %.3f $(echo "$sum/$n" | bc -l)
printf will round it off correctly. For example,
$ sum=1345
$ n=7
$ echo "$sum/$n" | bc -l
192.14285714285714285714
$ printf %.3f $(echo "$sum/$n" | bc -l)
192.143
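The underlying reason is that bc truncates the quotient to scale digits, whereas printf rounds to the requested precision. If bc is not needed for anything else, the summing and the rounding can both be done in awk instead. This is a swapped-in variant rather than what the answers above do; avg3 is a made-up name, and LC_ALL=C is set on the assumption that you want a '.' decimal separator regardless of locale. Like any binary floating-point rounding, exact ties may not always round up.

```shell
#!/usr/bin/env bash
# Sketch: average of the arguments, rounded (not truncated) to 3 decimals.
# avg3 is a hypothetical helper name.
avg3() {
    (( $# )) || return 1      # nothing to average
    # One value per line; awk sums them and its printf rounds to 3 decimals.
    printf '%s\n' "$@" |
        LC_ALL=C awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}
```

With the value from the question, avg3 4696.9466 prints 4696.947.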

Assigning variables to values in a text file with 3 columns, line by line

I've got a .txt file with three columns, each separated by a tab, and 264 rows called PowerCoords.txt. Each row contains an x (column 1), y (column2) and z (column3) coordinate. I want to go through this file, line by line, assign each value to X,Y, and Z, and then input those variables into another function.
I'm new to bash, and I don't understand how to specify that I want the value in Row 1, Column 2 to be the variable Y, and so on...
I know this is likely super simple and I could do it in a flash in Matlab, but I'm trying to keep everything in bash.
while read x y z; do
echo x=$x y=$y z=$z
done < input.txt
The above requires that none of your columns contain any whitespace.
EDIT:
In response to comments, here is one technique to handle numbering the lines:
nl -ba < input.txt | while read line x y z rest; do
fslmaths ~/data/standard/MNI152_T1_2mm -mul 0 \
-add 1 -roi $x 1 $y 1 $z 1 0 1 point -odt float > NewFile$line
done
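If the fields themselves may contain spaces, a variant that splits on tabs only is sketched below. The names for_each_coord and the callback are hypothetical. Note one assumption: because tab counts as IFS whitespace, consecutive tabs are collapsed, so this does not preserve empty fields.

```shell
#!/usr/bin/env bash
# Sketch: read tab-separated x/y/z rows and pass each triple, plus its line
# number, to a callback. for_each_coord and the callback are hypothetical names.
for_each_coord() {
    local file=$1 callback=$2 x y z n=0
    # The || [[ -n $x ]] part also handles a final line without a newline.
    while IFS=$'\t' read -r x y z || [[ -n $x ]]; do
        n=$((n + 1))
        "$callback" "$n" "$x" "$y" "$z"
    done < "$file"
}
```

The callback would be a function of your own that receives the line number and the three values as $1 to $4, for example one that runs the fslmaths command and writes to NewFile$1.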
William Pursell's answer is much smarter, but with my straightforward beginner's mind I tried the following some time ago:
#!/bin/bash
data="data.dat"
datalength=`wc $data | awk '{print $1;}'`
for (( i=1; i<=$datalength; i++ )) ;do
x=`cat $data | awk '{print $1;}' | sed -n "$i"p | sed -e 's/[eE]+*/\\*10\\^/'` ; x=`echo "$x" | bc -l` ; echo "x$i=$x";
y=`cat $data | awk '{print $2;}' | sed -n "$i"p | sed -e 's/[eE]+*/\\*10\\^/'` ; y=`echo "$y" | bc -l` ; echo "y$i=$y";
z=`cat $data | awk '{print $3;}' | sed -n "$i"p | sed -e 's/[eE]+*/\\*10\\^/'` ; z=`echo "$z" | bc -l` ; echo "z$i=$z";
# do something with xyz:
fslmaths ~/data/standard/MNI152_T1_2mm -mul 0 -add 1 -roi $x 1 $y 1 $z 1 0 1 point -odt float > NewFile$i
done
The bc and the sed -e 's/[eE]+*/\\*10\\^/' have to be added if you want to use floating point numbers and for the case where the input also uses exponential notation.
I had a similar problem, but for lots of input data those bash scripts are very slow, so I migrated to Perl. In Perl it would look like this:
#!/usr/bin/perl -w
use strict;
open (IN, "data.dat") or die "Error opening";
my $i=0;
for my $line (<IN>){
$i++;
open(OUT, ">NewFile$i.out");
chomp $line;
(my $x,my $y,my $z) = split '\t',$line;
print "$x $y $z\n";
# do something with xyz:
my $f = `fslmaths ~/data/standard/MNI152_T1_2mm -mul 0 -add 1 -roi $x 1 $y 1 $z 1 0 1 point -odt float`;
print OUT "f= $f\n";
close OUT;
}
close IN;

How can I align the columns of tables in Bash?

I want to format text as a table. I tried echoing with a '\t' separator, but it was misaligned.
Desired output:
a very long string..........  112232432  anotherfield
a smaller string              123124343  anotherfield
Use the column command:
column -t -s' ' filename
printf is great, but people forget about it.
$ for num in 1 10 100 1000 10000 100000 1000000; do printf "%10s %s\n" $num "foobar"; done
         1 foobar
        10 foobar
       100 foobar
      1000 foobar
     10000 foobar
    100000 foobar
   1000000 foobar
$ for((i=0;i<array_size;i++));
do
printf "%10s %10d %10s\n" "${stringarray[$i]}" "${numberarray[$i]}" "${anotherfieldarray[$i]}"
done
Notice I used %10s for strings. The s is the important part: it tells printf to format the argument as a string. The 10 in the middle is the minimum field width in characters. %d is for integers.
See man 1 printf for more info.
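If you don't know the column width in advance, printf can take it from the argument list with %*s. A minimal sketch, assuming the three parallel arrays from the loop above are already populated (print_rows is a made-up name):

```shell
#!/usr/bin/env bash
# Sketch: print three parallel arrays as columns, sizing the first column to
# its longest entry. Array names follow the loop above; print_rows is hypothetical.
print_rows() {
    local i width=0
    # First pass: find the longest string in the first column.
    for i in "${!stringarray[@]}"; do
        if (( ${#stringarray[i]} > width )); then
            width=${#stringarray[i]}
        fi
    done
    # Second pass: %-*s reads the width from the arguments; '-' left-aligns.
    for i in "${!stringarray[@]}"; do
        printf '%-*s  %10d  %s\n' "$width" "${stringarray[i]}" \
            "${numberarray[i]}" "${anotherfieldarray[i]}"
    done
}
```

The first column is left-aligned and padded to the widest entry, while the numbers stay right-aligned in a 10-character field.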
function printTable()
{
local -r delimiter="${1}"
local -r data="$(removeEmptyLines "${2}")"
if [[ "${delimiter}" != '' && "$(isEmptyString "${data}")" = 'false' ]]
then
local -r numberOfLines="$(wc -l <<< "${data}")"
if [[ "${numberOfLines}" -gt '0' ]]
then
local table=''
local i=1
for ((i = 1; i <= "${numberOfLines}"; i = i + 1))
do
local line=''
line="$(sed "${i}q;d" <<< "${data}")"
local numberOfColumns='0'
numberOfColumns="$(awk -F "${delimiter}" '{print NF}' <<< "${line}")"
# Add Line Delimiter
if [[ "${i}" -eq '1' ]]
then
table="${table}$(printf '%s#+' "$(repeatString '#+' "${numberOfColumns}")")"
fi
# Add Header Or Body
table="${table}\n"
local j=1
for ((j = 1; j <= "${numberOfColumns}"; j = j + 1))
do
table="${table}$(printf '#| %s' "$(cut -d "${delimiter}" -f "${j}" <<< "${line}")")"
done
table="${table}#|\n"
# Add Line Delimiter
if [[ "${i}" -eq '1' ]] || [[ "${numberOfLines}" -gt '1' && "${i}" -eq "${numberOfLines}" ]]
then
table="${table}$(printf '%s#+' "$(repeatString '#+' "${numberOfColumns}")")"
fi
done
if [[ "$(isEmptyString "${table}")" = 'false' ]]
then
echo -e "${table}" | column -s '#' -t | awk '/^\+/{gsub(" ", "-", $0)}1'
fi
fi
fi
}
function removeEmptyLines()
{
local -r content="${1}"
echo -e "${content}" | sed '/^\s*$/d'
}
function repeatString()
{
local -r string="${1}"
local -r numberToRepeat="${2}"
if [[ "${string}" != '' && "${numberToRepeat}" =~ ^[1-9][0-9]*$ ]]
then
local -r result="$(printf "%${numberToRepeat}s")"
echo -e "${result// /${string}}"
fi
}
function isEmptyString()
{
local -r string="${1}"
if [[ "$(trimString "${string}")" = '' ]]
then
echo 'true' && return 0
fi
echo 'false' && return 1
}
function trimString()
{
local -r string="${1}"
sed 's,^[[:blank:]]*,,' <<< "${string}" | sed 's,[[:blank:]]*$,,'
}
SAMPLE RUNS
$ cat data-1.txt
HEADER 1,HEADER 2,HEADER 3
$ printTable ',' "$(cat data-1.txt)"
+-----------+-----------+-----------+
| HEADER 1  | HEADER 2  | HEADER 3  |
+-----------+-----------+-----------+
$ cat data-2.txt
HEADER 1,HEADER 2,HEADER 3
data 1,data 2,data 3
$ printTable ',' "$(cat data-2.txt)"
+-----------+-----------+-----------+
| HEADER 1  | HEADER 2  | HEADER 3  |
+-----------+-----------+-----------+
| data 1    | data 2    | data 3    |
+-----------+-----------+-----------+
$ cat data-3.txt
HEADER 1,HEADER 2,HEADER 3
data 1,data 2,data 3
data 4,data 5,data 6
$ printTable ',' "$(cat data-3.txt)"
+-----------+-----------+-----------+
| HEADER 1  | HEADER 2  | HEADER 3  |
+-----------+-----------+-----------+
| data 1    | data 2    | data 3    |
| data 4    | data 5    | data 6    |
+-----------+-----------+-----------+
$ cat data-4.txt
HEADER
data
$ printTable ',' "$(cat data-4.txt)"
+---------+
| HEADER  |
+---------+
| data    |
+---------+
$ cat data-5.txt
HEADER
data 1
data 2
$ printTable ',' "$(cat data-5.txt)"
+---------+
| HEADER  |
+---------+
| data 1  |
| data 2  |
+---------+
REF LIB at: https://github.com/gdbtek/linux-cookbooks/blob/master/libraries/util.bash
To get exactly the output you want, format the file like this:
a very long string..........\t 112232432\t anotherfield\n
a smaller string\t 123124343\t anotherfield\n
And then use:
$ column -t -s $'\t' FILE
a very long string..........   112232432   anotherfield
a smaller string               123124343   anotherfield
It's easier than you think.
If you are working with a semicolon-separated file that also has a header:
$ (head -n1 file.csv && sort file.csv | grep -v <header>) | column -s";" -t
If you are working with an array (using tab as separator):
for((i=0;i<array_size;i++));
do
echo "${stringarray[$i]}"$'\t'"${numberarray[$i]}"$'\t'"${anotherfieldarray[$i]}" >> tmp_file.csv
done;
column -t -s $'\t' tmp_file.csv
awk solution that deals with stdin
Since column is not POSIX, maybe this is:
mycolumn() (
file="${1:--}"
if [ "$file" = - ]; then
file="$(mktemp)"
cat > "${file}"
fi
awk '
FNR == 1 { if (NR == FNR) next }
NR == FNR {
for (i = 1; i <= NF; i++) {
l = length($i)
if (w[i] < l)
w[i] = l
}
next
}
{
for (i = 1; i <= NF; i++)
printf "%*s", w[i] + (i > 1 ? 1 : 0), $i
print ""
}
' "$file" "$file"
if [ "$1" = - ]; then
rm "$file"
fi
)
Test:
printf '12 1234 1
12345678 1 123
1234 123456 123456
' > file
Test commands:
mycolumn file
mycolumn <file
mycolumn - <file
Output for all:
      12   1234      1
12345678      1    123
    1234 123456 123456
See also:
Using awk to align columns in text file?
AWK: go through the file twice, doing different tasks
I am not sure where you were running this, but the code you posted would not produce the output you gave, at least not in the Bash version that I'm familiar with.
Try this instead:
stringarray=('test' 'some thing' 'very long long long string' 'blah')
numberarray=(1 22 7777 8888888888)
anotherfieldarray=('other' 'mixed' 456 'data')
array_size=4
for((i=0;i<array_size;i++))
do
echo ${stringarray[$i]} $'\x1d' ${numberarray[$i]} $'\x1d' ${anotherfieldarray[$i]}
done | column -t -s$'\x1d'
Note that I'm using the group separator character (0x1D) instead of tab, because if you are getting these arrays from a file, they might contain tabs.
Just in case someone wants to do that in PHP, I posted a gist on GitHub:
https://gist.github.com/redestructa/2a7691e7f3ae69ec5161220c99e2d1b3
Simply call:
$output = $tablePrinter->printLinesIntoArray($items, ['title', 'chilProp2']);
You may need to adapt the code if you are using a PHP version older than 7.2.
After that, call echo or writeLine depending on your environment.
The below code has been tested and does exactly what is requested in the original question.
Parameters:
%30s: a column of 30 characters, text right-aligned.
%10d: integer notation; %10s will also work.
stringarray[0]="a very long string.........."
# 28Char (max length for this column)
numberarray[0]=1122324333
# 10digits (max length for this column)
anotherfield[0]="anotherfield"
# 12Char (max length for this column)
stringarray[1]="a smaller string....."
numberarray[1]=123124343
anotherfield[1]="anotherfield"
printf "%30s %10d %13s" "${stringarray[0]}" ${numberarray[0]} "${anotherfield[0]}"
printf "\n"
printf "%30s %10d %13s" "${stringarray[1]}" ${numberarray[1]} "${anotherfield[1]}"
# a var string with spaces has to be quoted
printf "\n Next line will fail \n"
printf "%30s %10d %13s" ${stringarray[0]} ${numberarray[0]} "${anotherfield[0]}"
  a very long string.......... 1122324333  anotherfield
         a smaller string.....  123124343  anotherfield
column -t skips empty fields when a line starts with a delimiter character or when there are two or more consecutive delimiter characters:
$ printf %s\\n a,b,c a,,c ,b,c|column -s, -t
a b c
a c
b c
Therefore I use this awk function instead (it requires gawk because it uses arrays of arrays):
$ tab(){ awk '{if(NF>m)m=NF;for(i=1;i<=NF;i++){a[NR][i]=$i;l=length($i);if(l>b[i])b[i]=l}}END{for(h in a){for(i=1;i<=m;i++)printf("%-"(b[i]+n)"s",a[h][i]);print""}}' n="${2-1}" "${1+FS=$1}"|sed 's/ *$//';}
$ printf %s\\n a,b,c a,,c ,b,c|tab ,
a b c
a c
b c
If your data doesn't contain the equals sign ("=") anywhere in it, you can use that as a shell-friendly delimiter for column without having to escape anything.
By setting FS to match either a tab ("\t") with any number of spaces (" ") or tabs on either side of it, or a contiguous run of two or more spaces, the input fields may also contain single spaces:
echo "${inputdata2}" |
mawk NF=NF OFS== FS='  +|[ \t]*\t[ \t]*' |
column -s= -t
a very long string..........  112232432  anotherfield
a smaller string              123124343  anotherfield
If the data does contain the equals sign, use a combined separator that is very unlikely to exist in typical data:
gawk -e NF=NF OFS='\301\372\5' FS='  +|[ \t]*\t[ \t]*' |
LC_ALL=C column -s$'\301\372\5' -t
a very long string..........  112232432  anotherfield
a smaller string              123124343  anotherfield
And if your data only has 2 columns, and you have a ballpark sense of how wide the first field is, you can use this \r trick for nice on-screen formatting (but note that the tabs don't become runs of spaces if you need to send the output down a pipe):
# each \t is 8 spaces at a console terminal
mawk NF=2 FS='  +|[ \t]*\t[ \t]*' OFS='\r\t\t\t\t'
a very long string..........    112232432
a smaller string                123124343

Resources