I have csv file like this:
-0.106992, -0.106992, -0.059528, -0.059528, -0.028184, -0.028184, 0.017793, 0.017793, 0.0, 0.220367
-0.094557, -0.094557, -0.063707, -0.063707, -0.020796, -0.020796, 0.003707, 0.003707, 0.200767, 0.200767
-0.106038, -0.106038, -0.056540, -0.056540, -0.015119, -0.015119, 0.032954, 0.032954, 0.237774, 0.237774
-0.049499, -0.049499, -0.006934, -0.006934, 0.026562, 0.026562, 0.067442, 0.067442, 0.260149, 0.260149
-0.081001, -0.081001, -0.039581, -0.039581, -0.008817, -0.008817, 0.029912, 0.029912, 0.222084, 0.222084
-0.046782, -0.046782, -0.000180, -0.000180, 0.030788, 0.030788, 0.075928, 0.075928, 0.266452, 0.266452
-0.082107, -0.082107, -0.026791, -0.026791, 0.001874, 0.001874, 0.052341, 0.052341, 0.249779, 0.249779
enter image description here
I want to get the minimum value from each row.
Expected output must be:
-0.106992
-0.094557
-0.106038
-0.049499
-0.08100
-0.046782
-0.082107
I tried get it by awk but awk doesn't give minimum values:
awk command:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
output:
-0.028184,
-0.020796,
-0.015119,
-0.006934,
-0.008817,
-0.000180,
-0.026791,
Perl makes short work of this:
perl -MList::Util=min -F', ' -E 'say min #F' file.csv
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
Using any awk in any shell on every Unix box whether you have blanks after each comma or not:
$ awk -F', *' '{min=$1; for (i=2;i<=NF;i++) if ($i<min) min=$i; print min}' file
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
with ruby :-D
ruby -F', ' -ane 'puts $F.map(&:to_f).min' file.csv
Your code is correct:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
Except that you must add a comma to the field separator:
awk -F '[[:blank:],]' '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
[[:blank:],] is spaces, tabs, and commas.
Related
I'm currently trying to write a bash/shell script that pulls data from a .csv and reverses all the string values in every other column and outputs to a new csv. I have a script that grabs every other column but I'm not sure how to reverse the strings in those columns.
awk 'BEGIN{FS=","} {s=$NF;
for (i=1; i<=NF; i+=2)
printf ("%s%c", $i, i + 2 <= NF ? "," : "\n")
}' input.csv > output.csv
awk to the rescue!
$ seq 100 141 | pr -6ats, |
awk -F, 'function rev(x) {r="";
for(j=length(x);j;j--) r=r substr(x,j,1);
return r}
BEGIN {OFS=FS}
{for(i=1;i<NF;i+=2) $i=rev($i)}1'
001,101,201,103,401,105
601,107,801,109,011,111
211,113,411,115,611,117
811,119,021,121,221,123
421,125,621,127,821,129
031,131,231,133,431,135
631,137,831,139,041,141
$ cat file
abc,def,ghi,klm
$ rev file
mlk,ihg,fed,cba
$ rev file |
awk 'BEGIN{FS=OFS=","} NR==FNR{split($0,a); next} {for (i=1; i<=NF; i+=2) $i=a[NF-i+1]} 1' - file
cba,def,ihg,klm
I am trying to split a file (testfile.csv) that contains the following:
1,2,4,5,6,7,8,9
a,b,c,d,e,f,g,h
q,w,e,r,t,y,u,i
a,s,d,f,g,h,j,k
z,x,c,v,b,n,m,z
into a file
1,2
a,b
q,w
a,s
z,x
and another file
4,5
c,d
e,r
d,f
c,v
but I cannot seem to do that in awk using an iterative solution.
awk -F, '{print $1, $2}'
awk -F, '{print $3, $4}'
does it for me but I would like a looping solution.
I tried
awk -F, '{ for (i=1;i< NF;i+=2) print $i, $(i+1) }' testfile.csv
but it gives me a single column. It appears that I am iterating over the first row and then moving onto the second row skipping every other element of that specific row.
You can use cut:
$ cut -d, -f1,2 file > file_1
$ cut -d, -f3,4 file > file_2
If you are going to use awk be sure to set the OFS so that the columns remain a CSV file:
$ awk 'BEGIN{FS=OFS=","}
{print $1,$2 >"f1"; print $3,$4 > "f2"}' file
$ cat f1
1,2
a,b
q,w
a,s
z,x
$cat f2
4,5
c,d
e,r
d,f
c,v
Is there a quick and dirty way of renaming the resulting files with the first row and first column (like first file would be 1.csv, second file would be 4.csv:
awk 'BEGIN{FS=OFS=","}
FNR==1 {n1=$1 ".csv"; n2=$3 ".csv"}
{print $1,$2 >n1; print $3,$4 > n2}' file
awk -F, '{ for (i=1; i < NF; i+=2) print $i, $(i+1) > i ".csv"}' tes.csv
works for me. I was trying to get the output in bash which was all jumbled up.
It's do-able in bash, but it will be much slower than awk:
f=testfile.csv
IFS=, read -ra first < <(head -1 "$f")
for ((i = 0; i < (${#first[#]} + 1) / 2; i++)); do
slice_file="${f%.csv}$((i+1)).csv"
cut -d, -f"$((2 * i + 1))-$((2 * (i + 1)))" "$f" > "$slice_file"
done
with sed:
sed -r '
h
s/(.,.),./\1/w file1.txt
g
s/.,.,(.,.),./\1/w file2.txt' file.txt
I have a csv file which looks like this:
ID_X,1,2,7,8
ID_Y,6,9,3,5
ID_Z,7,12,4,4
My goal is to create a csv file with the sum of all the values in each single column (from second column on), so in this case, that file will look like this:
SUM,14,23,14,17
So far, I am able to do it for one column at a time using awk. For instance, for the first column with numbers:
awk 'BEGIN {FS=OFS=","} ; {sum+=$2} END {print sum}' test.txt
14
Is there any way to achieve what I am looking for?
Many thanks!
You are almost there.
With awk you could say:
awk ' BEGIN {FS=OFS=","}
{for (i=2; i<=NF; i++) {sum[i]+=$i} len=NF}
END {$1="SUM"; for (i=2; i<=len; i++) $i=sum[i]; print}
' file.csv
Using datamash:
echo -n SUM,; datamash -t, sum 2,3,4,5 < file.csv
Using numsum:
printf 'SUM%.0s,%s,%s,%s,%s\n' `numsum -s, -c file.csv`
or, if the number of columns in file.csv is variable:
numsum -s, -c file.csv | sed 's/^0/SUM/;y/ /,/'
Output:
SUM,14,23,14,17
In bash how to perform a string rename deleting all words that contains a number:
name_befor_proc="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
result:
name_after_proc="art-of-medusa.jpg"
In sed, remove everything between - that contains a number.
sed 's/[^-]*[0-9][^-\.]*-\{0,1\}//g;s/-\././' test
art-of-medusa.jpg
I guess there is no generic solution, also you can use the following python script for your particular use case
name = "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
ext = name.split(".")[1]
def contains_number(word):
for i in "0123456789":
if i in word:
return False
return True
final = '-'.join([word for word in name.split('-') if contains_number(word)])
if ext not in final:
final += "."+ext
print final
output:
art-of-medusa.jpg
It is not trivial!
awk -F"." -v sep="-" '
{n=split($1,a,sep)
for (i=1; i<=n; i++)
{if (a[i] ~ /[0-9]/) delete a[i]}
n=length(a)
for (i in a)
printf "%s%s", a[i], (++c<n?sep:"")
printf "%s%s\n", FS, $2}'
Split the string (up to the dot) and loop through the pieces. If one contains a digit, remove it. Then, rejoin the array and print accordingly.
Test
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
art-of-medusa.jpg
Testing with "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg" to make sure other words are also matched:
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg"
art-of-medusa-a-b.jpg
You can use gnu-awk for this:
s="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
name_after_proc=$(awk -v RS='[.-]' '!/[[:digit:]]/{printf r $1} {r=RT}' <<< "$s")
echo "$name_after_proc"
art-of-medusa.jpg
Two possible solutions:
Using Sed:
sed 's/[a-zA-Z0-9]*[0-9][a-zA-Z0-9]*/ /g' filename
Using grep:
grep -wo -E [a-zA-Z]+ foo | xargs filename
I am trying to make a script for text editing. In this case I have a text file named text.csv, which reads:
first;48548a;48954a,48594B
second;58757a;5875b
third;58756a;58576b;5867d;56894d;45864a
I want to make text format to like this:
first;48548a
first;48954a
first;48594B
second;58757a
second;5875b
third;58756a
third;58576b
third;5867d
third;56894d
third;45864a
What is command should I use to make this happen?
I'd do this in awk.
Assuming your first line should have a ; instead of a ,:
$ awk -F\; '{for(n=2; n<=NF; n++) { printf("%s;%s\n",$1,$n); }}' input.txt
Untested.
Here is a pure bash solution that handles both , and ;.
while IFS=';,' read -a data; do
id="${data[0]}"
data=("${data[#]:1}")
for item in "${data[#]}"; do
printf '%s;%s\n' "$id" "$item"
done
done < input.txt
UPDATED - alternate printing method based on chepner's suggestion:
while IFS=';,' read -a data; do
id="${data[0]}"
data=("${data[#]:1}")
printf "$id;%s\n" "${data[#]}"
done < input.txt
awk -v FS=';' -v OFS=';' '{for (i = 2; i <= NF; ++i) { print $1, $i }}'
Explanation: awk implicitly splits data into records(by default separeted by newline, i.e. line == record) which then are split into numbered fields by given field separator(FS for input field separator and OFS for output separator).
For each record this script prints first field(which is record name), along with i-th field, and that's exactly what you need.
while IFS=';,' read -a data; do
id="${data[0]}"
data=("${data[#]:1}")
printf "$id;%s\n" "${data[#]}"
done < input.txt
or
awk -v FS=';' -v OFS=';' '{for (i = 2; i <= NF; ++i) { print $1, $i }}'
And
$ awk -F\; '{for(n=2; n<=NF; n++) { printf("%s;%s\n",$1,$n); }}' input.txt
thanks all for your suggestions, :d. It's really give me a new knowledge..