I have a file with the following structure:
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 75 159 dsadasd/2 0 +
B 78 852 dsadasd/1 0 -
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 87 52 dsadasd/2 0 +
A 52 15 dsadasd/1 0 -
I would like to sort it by the 4th field (which effectively means sorting by the number after the slash), two lines at a time, to get the following result:
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 78 852 dsadasd/1 0 -
B 75 159 dsadasd/2 0 +
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 52 15 dsadasd/1 0 -
A 87 52 dsadasd/2 0 +
TIA
There should be an easier way, but this works:
$ awk '{c+=p!=$1; p=$1; print c "\t" $0}' file | sort -k1,1n -k5 | cut -f2-
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 78 852 dsadasd/1 0 -
B 75 159 dsadasd/2 0 +
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 52 15 dsadasd/1 0 -
A 87 52 dsadasd/2 0 +
This creates a group ID that increments whenever the first field changes, sorts by that ID first and then by the key field, and finally removes the dummy group ID.
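The same idea can be sketched in Python, assuming whitespace-separated fields and that a group is a run of consecutive lines sharing the first field. The function name sort_within_runs is made up for illustration, and the comparison is a plain string comparison on the 4th field:

```python
from itertools import groupby

def sort_within_runs(lines, key_field=3):
    """Sort each run of consecutive lines sharing field 1 by the given field.

    key_field is a 0-based index, so 3 means the 4th whitespace-separated field.
    """
    result = []
    # groupby only merges *adjacent* lines, so the two "A" runs stay separate.
    for _, run in groupby(lines, key=lambda line: line.split()[0]):
        result.extend(sorted(run, key=lambda line: line.split()[key_field]))
    return result

sample = [
    "A 35 74 dsadasd/1 0 +",
    "A 95 74 dsadasd/2 0 -",
    "B 75 159 dsadasd/2 0 +",
    "B 78 852 dsadasd/1 0 -",
    "C 12 789 dsadasd/1 0 +",
    "C 91 546 dsadasd/2 0 -",
    "A 87 52 dsadasd/2 0 +",
    "A 52 15 dsadasd/1 0 -",
]
print("\n".join(sort_within_runs(sample)))
```

As with the awk pipeline, two neighbouring pairs that happen to share the same first field would be merged into one group of four.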
awk + sort
$ awk ' { $(NF+1)=int((NR+1)/2) } 1 ' angel.txt | sort -k7,7n -k4,4 | awk ' {$NF=""}1 '
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 78 852 dsadasd/1 0 -
B 75 159 dsadasd/2 0 +
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 52 15 dsadasd/1 0 -
A 87 52 dsadasd/2 0 +
$ cat angel.txt
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 75 159 dsadasd/2 0 +
B 78 852 dsadasd/1 0 -
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 87 52 dsadasd/2 0 +
A 52 15 dsadasd/1 0 -
$
Try Perl. Note that this preserves the spacing in your input:
perl -0777 -ne ' while( /(.+?)\n(.+?)\n/gms ) { $a=$1;$b=$2; (split(/\s+/,$a))[3] gt (split(/\s+/,$b))[3] ? print "$b\n$a\n" : print "$a\n$b\n" }'
with inputs
$ cat angel.txt
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 75 159 dsadasd/2 0 +
B 78 852 dsadasd/1 0 -
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 87 52 dsadasd/2 0 +
A 52 15 dsadasd/1 0 -
$ perl -0777 -ne ' while( /(.+?)\n(.+?)\n/gms ) { $a=$1;$b=$2; (split(/\s+/,$a))[3] gt (split(/\s+/,$b))[3] ? print "$b\n$a\n" : print "$a\n$b\n" }' angel.txt
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 78 852 dsadasd/1 0 -
B 75 159 dsadasd/2 0 +
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 52 15 dsadasd/1 0 -
A 87 52 dsadasd/2 0 +
$
In awk:
$ awk '{
k=NR%2; a[k]=$4; b[k]=$0 # store compare value and
} # record for 0 and 1
!(NR%2) { # on even we compare
print b[(a[0]>a[1])] ORS b[(a[0]<=a[1])] # and print the smaller first
}' file
A 35 74 dsadasd/1 0 +
A 95 74 dsadasd/2 0 -
B 78 852 dsadasd/1 0 -
B 75 159 dsadasd/2 0 +
C 12 789 dsadasd/1 0 +
C 91 546 dsadasd/2 0 -
A 52 15 dsadasd/1 0 -
A 87 52 dsadasd/2 0 +
This should work with awk:
awk '{if(p==""){p=$0;p4=$4}
else{
if(p4>$4){print $0"\n"p}
else{print p"\n"$0};p=p4=""
}}' file
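For comparison, the fixed two-line blocking used by the last few answers can be sketched in Python. This is only an illustration: sort_pairs is a hypothetical name, fields are assumed to be whitespace-separated, and the 4th field is compared as a string.

```python
def sort_pairs(lines, key_field=3):
    """Order every block of two lines by the chosen 0-based field."""
    result = []
    for start in range(0, len(lines), 2):
        # A trailing unpaired line forms a block of one and is kept as-is.
        pair = lines[start:start + 2]
        result.extend(sorted(pair, key=lambda line: line.split()[key_field]))
    return result

pairs_input = [
    "B 75 159 dsadasd/2 0 +",
    "B 78 852 dsadasd/1 0 -",
    "A 87 52 dsadasd/2 0 +",
    "A 52 15 dsadasd/1 0 -",
]
print("\n".join(sort_pairs(pairs_input)))
```

Because the blocks are defined purely by position, this does not depend on the first field at all, unlike the group-ID approach.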
I would like to extract certain parts of an image. Let's say, only those parts that are indexed by ones in some kind of template or frame.
GRAYPIC = reshape(randperm(169), 13, 13);
FRAME = ones(13);
FRAME(5:9, 5:9) = 0;
FRAME_OF_GRAYPIC = []; % the new pic that only shows the frame extracted
I can achieve this using a for loop:
for X = 1:13
    for Y = 1:13
        value = FRAME(X, Y);
        switch value
            case 1
                FRAME_OF_GRAYPIC(X,Y) = GRAYPIC(X,Y);
            case 0
                FRAME_OF_GRAYPIC(X,Y) = 0;
        end
    end
end
imshow(mat2gray(FRAME_OF_GRAYPIC));
However, is it possible to do this with some kind of vectorized operation, e.g.:
FRAME_OF_GRAYPIC = GRAYPIC(FRAME==1);
Unfortunately, this doesn't work.
Any suggestions?
Thanks a lot for your answers,
best,
Clemens
Too long for a comment...
GRAYPIC = reshape(randperm(169), 13, 13);
FRAME = ones(13);
FRAME(5:9, 5:9) = 0;
FRAME_OF_GRAYPIC = zeros(size(GRAYPIC)); % MUST preallocate new pic the right size
FRAME = logical(FRAME); % ... FRAME = (FRAME == 1)
FRAME_OF_GRAYPIC(FRAME) = GRAYPIC(FRAME);
Three things to note here:
FRAME must be a logical array. Create it with true()/false(), or cast it using logical(), or select a value to be true using FRAME = (FRAME == true_value);
You must preallocate your final image to the proper dimensions, otherwise it will turn into a vector.
You need the image indices on both sides of the assignment:
FRAME_OF_GRAYPIC(FRAME) = GRAYPIC(FRAME);
Output:
FRAME_OF_GRAYPIC =
38 64 107 63 27 132 148 160 88 59 102 69 81
14 108 76 58 49 55 51 19 158 52 100 153 39
79 139 12 115 147 154 96 112 82 73 159 146 93
169 2 71 25 33 149 138 150 129 117 65 97 17
43 111 37 142 0 0 0 0 0 128 84 86 22
9 137 127 45 0 0 0 0 0 68 28 46 163
42 11 31 29 0 0 0 0 0 152 3 85 36
50 110 165 18 0 0 0 0 0 144 143 44 109
114 133 1 122 0 0 0 0 0 80 167 157 145
24 116 60 130 53 77 156 35 6 78 90 30 140
74 120 40 26 106 166 121 34 98 57 56 13 48
8 155 4 16 124 75 123 23 105 66 7 141 70
89 113 99 101 54 20 94 72 83 168 61 5 10
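The masking logic itself is language-independent. As a rough sketch, here is the same "keep where the mask is true, zero elsewhere" step written in plain Python with nested lists. The name apply_mask is hypothetical, and the image is filled with 1..169 in order rather than a random permutation so that the result is reproducible:

```python
def apply_mask(image, mask, fill=0):
    """Return a copy of image keeping only the cells where mask is true."""
    # Every row and column of the result is produced, so the output has the
    # full shape of the input, with `fill` wherever the mask is false.
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

size = 13
# Values 1..169 laid out row by row (assumption: stands in for randperm(169)).
graypic = [[r * size + c + 1 for c in range(size)] for r in range(size)]
# True everywhere except a 5x5 hole; MATLAB's 5:9 is 4..8 when counting from 0.
frame = [[not (4 <= r <= 8 and 4 <= c <= 8) for c in range(size)]
         for r in range(size)]

frame_of_graypic = apply_mask(graypic, frame)
print(frame_of_graypic[4])
```

The mask holds booleans rather than 0/1 numbers, which mirrors the point above about FRAME having to be a logical array.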
I wrote this in response to Reddit's daily programmer challenge, and I would like to get some of your feedback on it to improve the code (it seems to work). The challenge is as follows:
We are given a list of numbers in a "short-hand" range notation where only the significant part of the next number is written because we know the numbers are always increasing (ex. "1,3,7,2,4,1" represents [1, 3, 7, 12, 14, 21]). Some people use different separators for their ranges (ex. "1-3,1-2", "1:3,1:2", "1..3,1..2" represent the same numbers [1, 2, 3, 11, 12]) and they sometimes specify a third digit for the range step (ex. "1:5:2" represents [1, 3, 5]).
NOTE: For this challenge range limits are always inclusive.
Our job is to return a list of the complete numbers.
The possible separators are: ["-", ":", ".."]
Sample input:
104..02
545,64:11
Sample output:
104 105 106...200 201 202 # truncated for simplicity
545 564 565 566...609 610 611 # truncated for simplicity
My solution:
BEGIN { FS = "," }
function next_value(current_value, previous_value) {
regexp = current_value "$"
while(current_value <= previous_value || !(current_value ~ regexp)) {
current_value += 10
}
return current_value;
}
{
j = 0
delete number_list
for(i = 1; i <= NF; i++) {
# handle fields with ranges
if($i ~ /-|:|\.\./) {
split($i, range, /-|:|\.\./)
if(range[1] > range[2]) {
if(j != 0) {
range[1] = next_value(range[1], number_list[j-1])
range[2] = next_value(range[2], range[1])
}
else
range[2] = next_value(range[2], range[1]);
}
if(range[3] == "")
number_to_iterate_by = 1;
else
number_to_iterate_by = range[3];
range_iterator = range[1]
while(range_iterator <= range[2]) {
number_list[j] = range_iterator
range_iterator += number_to_iterate_by
j++
}
}
else {
number_list[j] = $i
j++
}
}
# apply increasing range logic and print
for(i = 0; i < j; i++ ) {
if(i == 0) {
if(NR != 1) printf "\n"
current_value = number_list[i]
}
else {
previous_value = current_value
current_value = next_value(number_list[i], previous_value)
}
printf "%s ", current_value
}
}
END { printf "\n" }
This is BASH (Not AWK).
I believe it is a valid answer because the original challenge doesn't specify a language.
#!/bin/bash
mkord(){ local v=$1 dig base
max=$2
(( dig=10**${#v} , base=max/dig*dig , v+=base ))
while (( v < max )); do (( v+=dig )); done
max=$v
}
while read line; do
line="${line//[,\"]/ }" line="${line//[:-]/..}"
IFS=' ' read -a arr <<<"$line"
max=0 a='' res=''
for val in "${arr[@]//../ }"; do
IFS=" " read v1 v2 v3 <<<"$val"
(( a==0 )) && max=$v1
[[ $v1 ]] && mkord "$v1" "$max" && v1=$max
[[ $v2 ]] && mkord "$v2" "$max" && v2=$max
res=$res${a:+,}${v2:+\{}$v1${v2:+\.\.}$v2${v3:+\.\.}$v3${v2:+\}}
a=1
done
(( ${#arr[@]} > 1 )) && res={$res}
eval set -- $res
echo "\"$*\""
done <"infile"
If the source of the tests is:
$ cat infile
"1,3,7,2,4,1"
"1-3,1-2"
"1:5:2"
"104-2"
"104..02"
"545,64:11"
The result will be:
"1 3 7 12 14 21"
"1 2 3 11 12"
"1 3 5"
"104 105 106 107 108 109 110 111 112"
"104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202"
"545 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611"
This gets the list done in 7 milliseconds.
My solution uses gawk's RT variable (which holds the input text that matched the pattern in RS) and a next_n function that uses the modulo operation to find the next number based on the last one.
cat range.awk
BEGIN{
RS="\\.\\.|,|:|-"
start = ""
end = 0
temp = ""
}
function next_n(n, last){
mod = last % (10**length(n))
if(mod < n) return last - mod + n
return last + ((10**length(n))-mod) + n
}
{
if(RT==":" || RT==".." || RT=="-"){
if(start=="") start = next_n($1,end)
else temp = $1
}else{
if(start != ""){
if(temp==""){
end = next_n($1,start)
step = 1
}else {
end = next_n(temp,start)
step = $1
}
for(i=start; i<=end; i+=step) printf "%s ", i
start = ""
temp = ""
}else{
end = next_n($1,end)
printf "%s ", end
}
}
}
END{
print ""
}
TEST 1
echo "104..02" | awk -f range.awk
OUTPUT 1
104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202
TEST 2
echo "545,64:11" | awk -f range.awk
OUTPUT 2
545 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611
TEST 3
echo "2..5,7,2-1,2:1,0-3,2-7,8..0,4,4,2..1" | awk -f range.awk
OUTPUT 3
2 3 4 5 7 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 40 41 42 43 52 53 54 55 56 57 58 59 60 64 74 82 83 84 85 86 87 88 89 90 91
TEST 4 with step
echo "1:5:2,99,88..7..3" | awk -f range.awk
OUTPUT 4
1 3 5 99 188 191 194 197
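The modulo trick in next_n may be easier to follow outside awk. Below is a minimal Python sketch of the same approach, not a port of the script: expand is a hypothetical helper, negative numbers are not handled (a leading "-" would be read as a separator), and the next item after a range is taken relative to the range's end value, as in the awk version.

```python
import re

def next_n(n_text, last):
    """Smallest number greater than `last` whose trailing digits are n_text."""
    scale = 10 ** len(n_text)   # "02" has two significant places, so scale is 100
    n = int(n_text)
    mod = last % scale
    if mod < n:
        return last - mod + n
    return last - mod + scale + n

def expand(spec):
    """Expand a short-hand list such as "545,64:11" into full numbers."""
    numbers = []
    last = 0
    for item in spec.split(","):
        parts = re.split(r"\.\.|-|:", item)
        start = next_n(parts[0], last)
        if len(parts) == 1:          # a single value, not a range
            numbers.append(start)
            last = start
            continue
        end = next_n(parts[1], start)
        step = int(parts[2]) if len(parts) > 2 else 1
        numbers.extend(range(start, end + 1, step))  # range limits are inclusive
        last = end
    return numbers

print(expand("1:5:2,99,88..7..3"))
```

Keeping the tokens as strings until next_n is what preserves the leading zero in inputs like "104..02".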
I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include the blank parts of these lists too (i.e., "[,,,]")
This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
. . . .
. . . .
. . . .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format.
Anyone know how to do this, have any suggestions, or thoughts on scripting this?
Since you have your lists in python, just do it in python:
l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
    print("\t".join(i))
produces
30 28 -7 0
30 6
32 6
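One caveat: zip stops at the shortest list, and the columns in the question have different lengths. A possible variation, sketched here under the assumption that each line looks like "[a,b,c]," as in the question, parses the raw text and pads short columns with itertools.zip_longest. The helper name columns_to_rows and the abbreviated sample data are made up for illustration:

```python
from itertools import zip_longest

def columns_to_rows(lines):
    """Turn bracketed, comma-separated lists into rows, padding short columns."""
    columns = []
    for line in lines:
        # Drop whitespace, the trailing comma between lists, and the brackets.
        body = line.strip().rstrip(",").strip("[]")
        columns.append(body.split(","))  # empty strings mark the blank cells
    return list(zip_longest(*columns, fillvalue=""))

# Abbreviated stand-in for the file contents in the question.
raw = [
    "[30,30,32,35],",
    "[28,6,6,50],",
    "[-7,,,43,71],",
    "[0,,,3]",
]
for row in columns_to_rows(raw):
    print("\t".join(row))
```

Using a tab as the separator keeps the blank cells visible as empty fields instead of letting later values slide left.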
awk based solution:
awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i: $i}
END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
Another solution, but it works only for a file with exactly 4 lines:
$ paste \
<(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
68 87 28 1.5
88 99 13 0.5
97 110 13 0.5
105 116 10 0
107 119 11 0.5
107 120 12 0.5
105 117 11 0.5
101 114 13 0.5
93 113 22 1
88 103 17 0.5
80 82 3 0
69 6 -0.5
55 47 -15 -0.5
-20 2.5
38
71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
<(sed -n '1{s|,|\n|g;p}' t2) \
<(sed -n '2{s|,|\n|g;p}' t2) \
<(sed -n '3{s|,|\n|g;p}' t2) \
<(sed -n '4{s|,|\n|g;p}' t2)
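If a file named data contains the data given in the problem (exactly as shown above), then the following command line will transpose it into columns. Note that the empty entries are lost once the commas become spaces, so the blanks are not preserved and later values shift up, as the example output shows: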
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30 28 -7 0
30 6 43 3
32 6 71 5
35 50 30 1.5
34 58 23 1
43 56 28 1.5
52 64 13 0.5
68 87 13 0.5
88 99 10 0
97 110 11 0.5
105 116 12 0.5
107 119 11 0.5
107 120 13 0.5
105 117 22 1
101 114 17 0.5
93 113 3 0
88 103 -15 -0.5
80 82 -20 -0.5
69 6 38 2.5
55 47 71
I'm trying to write a program that reads each line, calculates that line's average, and stores it in an array. For example, the program will read the first line, add up all the numbers, and divide by 24 to get the average, which will be stored in Avg_list[1].
When I try to run the program, I get the following error and have no idea why it doesn't work. Can someone identify the problem?
Code: in Ksh
c=0
while read -r line ; do
v=$line
set -- $v
((g=($2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22+$23+$24+$25+$26)/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
daily.txt
CPU 55 54 54 54 54 54 54 54 54 54 54 54 54 54 54 55 54 54 55 56 57 54 57 54
CPEAK 56 56 57 55 58 56 56 56 57 55 60 56 55 56 55 56 58 55 57 56 63 56 72 57
RAM 97 97 97 97 97 96 96 96 96 96 96 93 91 89 86 84 90 90 95 97 97 97 97 97
RPEAK 97 97 97 97 97 97 96 96 96 96 96 96 92 90 91 81 94 89 97 97 97 97 97 97
Error note:
while read -r line ; do
v=$line
set -- $v
((g=($2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22+$23+$24+$25+$26)/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
+ 0< daily.txt
+ read -r line
+ v=CPU 54 54 54 54 54 54 54 54 54 54 54 54 54 54 55 54 54 55 56 57 54 57 54 54
+ set -- CPU 54 54 54 54 54 54 54 54 54 54 54 54 54 54 55 54 54 55 56 57 54 57 54 54
+ (( g=(54+54+54+54+54+54+54+54+CPU0+CPU1+CPU2+CPU3+CPU4+CPU5+CPU6+CPU7+CPU8+CPU9+540+541+542+543+544+545+546)/24 ))
PerformanceAM.sh[21]: g=(54+54+54+54+54+54+54+54+CPU0+CPU1+CPU2+CPU3+CPU4+CPU5+CPU6+CPU7+CPU8+CPU9+540+541+542+543+544+545+546)/24: 0403-009 The specified number is not valid for this command.
EDIT
while read -r line ; do
v=$line
set -- $v
((g=${2}+${3}+${4}+${5}+${6}+${7}+${8}+${9}+${10}+${11}+${12}+${13}+${14}+${15}+${16}+${17}+${18}+${19}+${20}+${21}+${22}+${23}+${24}+${25}+${26})/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
New error:
while read -r line ; do
v=$line
set -- $v
((g=${2}+${3}+${4}+${5}+${6}+${7}+${8}+${9}+${10}+${11}+${12}+${13}+${14}+${15}+${16}+${17}+${18}+${19}+${20}+${21}+${22}+${23}+${24}+${25}+${26})/24)
PerformanceAM.sh[18]: 0403-057 Syntax error at line 21 : `/24' is not expected.
Thanks for your suggestions! When I tried using brackets I got this error. I'm now even more confused; it seems like it's not collecting the numbers at all.
#!/bin/ksh
c=0
while read -r line ; do
v=$line
set -- $v
((g=(${2}+${3}+${4}+${5}+${6}+${7}+${8}+${9}+${10}+${11}+${12}+${13}+${14}+${15}+${16}+${17}+${18}+${19}+${20}+${21}+${22}+${23}+${24}+${25})/24))
echo $g
Avg_list[${c}]=$g
((c=c+1))
done < daily.txt
You were missing one ( at g=${2} and the arguments only go until ${25} not ${26}.
This should do it:
while read -r line; do
sum=0
total=0
for x in $line; do
# if x is numeric
if echo "$x" | grep -E '^[0-9]*$' > /dev/null ; then
((sum=sum+x))
((total=total+1))
else
echo -n "$x "
fi
done
if [ $total = 0 ]; then
echo
else
echo $((sum/total))
fi
done < daily.txt
This follows my general principle of never making long lists of $1 $2... This solution works for any number of integers per line, and it also prints out the line label (a feature easy to remove if you want).
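To make the "any number of integers per line" idea concrete outside the shell, here is a small Python sketch of the same logic. The function name line_averages is hypothetical, and the integer division is chosen only to mirror the shell's $((sum/total)):

```python
import re

def line_averages(lines):
    """Return (label, average) per line, averaging only the integer tokens."""
    results = []
    for line in lines:
        tokens = line.split()
        numbers = [int(tok) for tok in tokens if re.fullmatch(r"[0-9]+", tok)]
        label = " ".join(tok for tok in tokens if not re.fullmatch(r"[0-9]+", tok))
        # Integer division, as in the shell; None marks a line with no numbers.
        average = sum(numbers) // len(numbers) if numbers else None
        results.append((label, average))
    return results

cpu_line = ("CPU 55 54 54 54 54 54 54 54 54 54 54 54 "
            "54 54 54 55 54 54 55 56 57 54 57 54")
print(line_averages([cpu_line, "X 1 2 3"]))
```

Nothing here depends on there being exactly 24 values, which is the point of avoiding a fixed list of positional parameters.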
For reference purposes, here's how awk can be used to solve this
array=( $(awk '{sum=0; for (i=2;i<=25; i++) sum=sum+$i; printf "%.0f ",sum/24 }' daily.txt ) )
Given a list of numbers, it's a little tidier to use reverse-polish notation for the calculations:
c=0
while read line; do
set -- $line
shift
script="3 k $* + + + + + + + + + + + + + + + + + + + + + + + 24 / p"
Avg_list[c++]=$( dc -e "$script" )
done < daily.txt
Then
printf "%s\n" "${Avg_list[@]}"
produces
54.458
57.250
94.333
94.875
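For readers unfamiliar with dc, the script string is just a postfix (reverse-Polish) expression evaluated on a stack. The following Python sketch imitates that evaluation for the two operators used here. It is an illustration only: rpn and average_script are hypothetical names, and truncating to three decimals reflects my understanding of how dc applies "3 k" to division.

```python
from decimal import Decimal, ROUND_DOWN

def rpn(expression, places=3):
    """Evaluate a postfix expression containing numbers, + and /."""
    stack = []
    for token in expression.split():
        if token == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif token == "/":
            right, left = stack.pop(), stack.pop()
            stack.append(left / right)
        else:
            stack.append(Decimal(token))  # operands are pushed onto the stack
    # Truncate (not round) to `places` decimals (assumed dc behaviour).
    return stack.pop().quantize(Decimal(10) ** -places, rounding=ROUND_DOWN)

def average_script(values):
    """Build 'v1 v2 ... + + ... n /' so the number of '+' always fits the data."""
    pluses = ["+"] * (len(values) - 1)
    return " ".join(list(values) + pluses + [str(len(values)), "/"])

cpu = ("55 54 54 54 54 54 54 54 54 54 54 54 "
       "54 54 54 55 54 54 55 56 57 54 57 54").split()
print(rpn(average_script(cpu)))
```

Generating the run of "+" tokens from the number of values avoids having to count them by hand in the script string.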