I am getting some strange behavior from the sort utility on Ubuntu 18.04.2. Here is the sequence of commands I issued. How can I ensure a numeric sort on all the columns? Columns 1, 2, 3 and 4 should each be in order.
$ cat zz
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
2 2 2 2
10 10 10 10
1 1 10 1
1 1 100 1
$ cat zz | sort
0 0 0 0
0 1 0 0
1 0 0 0
10 10 10 10
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
$ cat zz | sort -n
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
10 10 10 10
$ cat zz | sort -n -k1,3
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
10 10 10 10
Desired output (with numeric sorting):
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
1 1 10 1
1 1 100 1
2 2 2 2
10 10 10 10
What options should I use with sort to get my desired output, i.e. sorted in numeric order on every column?
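A note on what the outputs above suggest: with -n and no per-column keys, sort reads only the leading number of the key (here the whole line, or fields 1 to 3 taken as one string), and ties fall back to a locale-dependent comparison of the entire line. With GNU sort, giving each column its own numeric key should produce the desired order: sort -k1,1n -k2,2n -k3,3n -k4,4n zz. The same idea as a minimal Python sketch, where the function name numeric_sort is only illustrative:

```python
def numeric_sort(lines):
    """Sort whitespace-separated rows by every column, compared as integers."""
    # A tuple of ints compares column 1 first, then column 2, and so on,
    # which is the per-column numeric ordering asked for in the question.
    return sorted(lines, key=lambda line: tuple(int(f) for f in line.split()))


rows = [
    "0 0 0 0", "0 1 0 0", "1 0 0 0", "1 1 0 0", "1 1 1 0",
    "1 1 1 1", "2 2 2 2", "10 10 10 10", "1 1 10 1", "1 1 100 1",
]

for row in numeric_sort(rows):
    print(row)
```

The printed rows come out in the order listed under "Desired output" above.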
I have to generate variables entry_1, entry_2 and entry_3, where entry_i takes the value 1 in a given month if id i had entry=1 in that month.
Example.
id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
Would anyone be so kind as to propose an idea of how to implement a loop in order to do this?
I am thinking of something like this:
forvalues i=1(1)3 {
gen entry`i'=0
replace entry`i'=1 if on that particular month id=`i' had entry=1
}
You could do something like this (although your data don't quite look right for the question you're asking):
forvalues i = 1/3 {
gen entry_`i' = id == `i' & entry == 1
}
This generates a dummy variable entry_i for each i in the forvalues loop where entry_i = 1 if id is i and entry is 1, and 0 otherwise.
The code can be simplified down to at most one loop.
clear
input id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
end
forval j = 1/4 {
egen entry`j' = total(entry & id == `j'), by(month)
}
list id month entry entry? , sepby(id)
+--------------------------------------------------------+
| id month entry entry1 entry2 entry3 entry4 |
|--------------------------------------------------------|
1. | 1 1 1 1 0 0 0 |
2. | 1 2 0 0 0 0 0 |
3. | 1 3 0 0 1 1 0 |
4. | 1 4 0 0 0 0 0 |
|--------------------------------------------------------|
5. | 2 1 0 1 0 0 0 |
6. | 2 2 0 0 0 0 0 |
7. | 2 3 1 0 1 1 0 |
8. | 2 4 0 0 0 0 0 |
|--------------------------------------------------------|
9. | 3 1 0 1 0 0 0 |
10. | 3 2 0 0 0 0 0 |
11. | 3 3 1 0 1 1 0 |
12. | 3 4 0 0 0 0 0 |
+--------------------------------------------------------+
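For readers who do not use Stata, the by-month logic of the egen call can be sketched in Python. This is only an illustration under the assumption that the data are held as (id, month, entry) tuples; the name entry_flags is made up for the example.

```python
def entry_flags(rows, ids):
    """For each row, append one flag per id j: 1 if id j had entry == 1
    in that row's month, 0 otherwise."""
    # Collect the (id, month) pairs in which an entry happened.
    entered = {(i, m) for i, m, e in rows if e == 1}
    return [
        (i, m, e) + tuple(1 if (j, m) in entered else 0 for j in ids)
        for i, m, e in rows
    ]


# id, month, entry taken from the example in the question
data = [
    (1, 1, 1), (1, 2, 0), (1, 3, 0), (1, 4, 0),
    (2, 1, 0), (2, 2, 0), (2, 3, 1), (2, 4, 0),
    (3, 1, 0), (3, 2, 0), (3, 3, 1), (3, 4, 0),
]

for row in entry_flags(data, ids=[1, 2, 3]):
    print(*row)
```

Every row of month 1 gets entry_1 = 1 and every row of month 3 gets entry_2 = entry_3 = 1, matching the example table in the question.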
I have a PLINK ped file that looks like this:
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I am interested in counting how many times the string "2" occurs between the 7th field (included) and the last field. Then, if the number of occurrences is:
0: add 1 (being absent) to the new last field
1: add 2 (being present) to the new last field
2: add 2 (being present) to the new last field
The output would be:
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I know that to count the occurrences of a string in every field I can use:
grep -n -o "2" file1 | sort -n | uniq -c | cut -d : -f 1
And that I can merge the 2 results using:
paste -d' ' file1 file2 > file3
But I don't know how to count the occurrences between two fields.
Thank you in advance for helping me!
You can use awk, which is well suited to this kind of row- and column-based data:
awk '{c=0; for(i=7; i<=NF; i++) if ($i==2) c++; if (c<2) c++; print $0, c}' file
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Perl to the rescue:
perl -ape 's/$/" " . (1 + !! grep 2 == $_, @F[6 .. $#F])/e'
-p reads the input line by line and prints the result
-a splits each input line on whitespace into the @F array
grep in scalar context returns the number of matches; !! (double negation) turns that count into 0 or 1, and adding 1 gives the requested 1 or 2
s/// substitutes $ (end of line) with the result of evaluating the code in the replacement part (that's what /e does)
You could use awk:
awk '{s=0;for(i=7;i<=NF;i++) if($i==2) s+=1; s=s==0?1:2; print $0, s;}' data.txt
Explanations:
The instructions between the {} are executed on each line of the file.
NF is the number of fields in the line. They are numbered 1 to NF and you can access them with the $n notation.
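The same field-counting logic can be written in Python as a rough sketch. It assumes whitespace-separated fields and treats any positive count as "present"; the names presence_flag and annotate are made up for the example.

```python
def presence_flag(line, first_field=7):
    """Return 2 if the string "2" occurs in any field from first_field
    (1-based, inclusive) to the last field, otherwise 1."""
    fields = line.split()
    # awk numbers fields from 1, so field 7 is index 6 in the Python list.
    count = fields[first_field - 1:].count("2")
    return 2 if count > 0 else 1


def annotate(lines):
    """Append the presence flag to each line as a new last field."""
    return [f"{line} {presence_flag(line)}" for line in lines]


sample = [
    "ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1",
]

for out in annotate(sample):
    print(out)
```

The 2s in fields 5 and 6 are ignored because the slice starts at field 7.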
I have a question about connected-component labelling. I have a binary image (only 0 and 1) and I run a Matlab function on it:
f=
1 0 0 1 0 0 0 1 0 0
1 1 0 1 1 1 0 0 1 0
0 0 0 0 0 0 0 1 1 1
1 0 0 0 1 0 1 0 1 1
1 1 0 0 0 0 0 1 1 1
0 0 0 1 0 0 1 0 0 0
0 0 0 1 0 1 1 0 1 1
1 1 0 0 1 0 0 0 1 0
1 1 0 1 1 1 0 1 0 0
1 1 0 0 1 0 0 0 1 0
[L num]=bwlabel(f);
Suppose that it gives me the matrix:
1 0 0 4 0 0 0 5 0 0
1 1 0 4 4 4 0 0 5 0
0 0 0 0 0 0 0 5 5 5
2 0 0 0 6 0 5 0 5 5
2 2 0 0 0 0 0 5 5 5
0 0 0 5 0 0 5 0 0 0
0 0 0 5 0 5 5 0 7 7
3 3 0 0 5 0 0 0 7 0
3 3 0 5 5 5 0 7 0 0
3 3 0 0 5 0 0 0 7 0
But as you can see in this result, the labels are numbered in column order. Now I want to change this to row order, which means label 4 becomes 2, label 5 becomes 3, and so on.
The order should be left -> right and top -> down. How can I do that (change the order of reading)?
Thank you so much
f=f';
[L num]=bwlabel(f);
L=L';
Does this solve your problem?
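If the label matrix already exists and re-running the labelling is not convenient, an alternative is to renumber the labels in the order they are first met in a left-to-right, top-to-bottom scan. Below is a small Python sketch of that idea (not Matlab). It assumes the labels are positive integers with 0 as background, and relabel_row_major is a made-up name.

```python
def relabel_row_major(labels):
    """Renumber the nonzero labels 1, 2, 3, ... in the order in which they
    first appear when the matrix is read row by row, left to right."""
    mapping = {}  # old label -> new label
    result = []
    for row in labels:
        new_row = []
        for value in row:
            if value == 0:
                new_row.append(0)  # background stays background
                continue
            if value not in mapping:
                mapping[value] = len(mapping) + 1
            new_row.append(mapping[value])
        result.append(new_row)
    return result


# Labels numbered column by column: the top-right component is number 3.
column_order = [
    [1, 0, 3],
    [1, 0, 3],
    [0, 0, 0],
    [2, 2, 0],
]

for row in relabel_row_major(column_order):
    print(*row)
```

Here the top-right component becomes 2 and the bottom-left one becomes 3, because that is the order in which a row-wise scan reaches them.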
I don't know if this question is considered to be related to stackoverflow (I'm sorry if it's not but I have searched and did not find an answer anywhere).
I have coded a full adder
Output:
Truth Table :
a1 a2 b1 b2 S1 S2 C
______________________________
0 0 0 0 0 0 0
0 0 0 1 0 1 0
0 0 1 0 1 0 0
0 0 1 1 1 1 0
0 1 0 0 0 1 0
0 1 0 1 0 0 1
0 1 1 0 1 1 0
0 1 1 1 1 0 1
1 0 0 0 1 0 0
1 0 0 1 1 1 0
1 0 1 0 0 1 0
1 0 1 1 0 0 1
1 1 0 0 1 1 0
1 1 0 1 1 0 1
1 1 1 0 0 0 1
1 1 1 1 0 1 1
If somebody has ever calculated this, can they tell me if my output is correct?
a1 a2 b1 b2 S1 S2 C a b s c
______________________________
0 0 0 0 0 0 0 0 0 0 0 nothing plus nothing is nothing
0 0 0 1 0 1 0 0 2 2 0 nothing plus two is two
0 0 1 0 1 0 0 0 1 1 0 nothing plus one is one
0 0 1 1 1 1 0 0 3 3 0 nothing plus three is three
0 1 0 0 0 1 0 2 0 2 0 two plus nothing is two
0 1 0 1 0 0 1 2 2 0 1 two plus two is four (four not in 0-3)
0 1 1 0 1 1 0 2 1 3 0 two plus 1 is three
0 1 1 1 1 0 1 2 3 1 1 two plus three is five (one and four)
1 0 0 0 1 0 0 1 0 1 0 one plus nothing is one
1 0 0 1 1 1 0 1 2 3 0 one plus two is three
1 0 1 0 0 1 0 1 1 2 0 one plus one is two
1 0 1 1 0 0 1 1 3 0 1 one plus three is four
1 1 0 0 1 1 0 3 0 3 0 three plus nothing is three
1 1 0 1 1 0 1 3 2 1 1 three plus two is five (one and four)
1 1 1 0 0 0 1 3 1 0 1 three plus one is four
1 1 1 1 0 1 1 3 3 2 1 three plus three is 6 (two and four)
Looks right. Ordering your 16 rows a little differently would make them flow in a more logical order.
It's an adder! Just check if it's adding. Let's take this row:
a2 a1 b2 b1 C S2 S1
1 0 1 1 1 0 1
Here I have reordered the columns in an easier to read manner: higher order bits first.
The a input is 10 = 2 (base 10). The b input is 11 = 3 (base 10). The output is 101, which
is 5 (base 10). So this one is right: 2 + 3 == 5.
I'll let you check the other rows.
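Checking the other rows by hand gets tedious, so here is a short Python sketch that does it mechanically. It assumes, as in the annotated table above, that a1, b1 and S1 are the low-order bits and C is the carry, worth 4; the function name row_is_correct is made up.

```python
# Truth table copied from the question: a1 a2 b1 b2 S1 S2 C
TABLE = """\
0 0 0 0 0 0 0
0 0 0 1 0 1 0
0 0 1 0 1 0 0
0 0 1 1 1 1 0
0 1 0 0 0 1 0
0 1 0 1 0 0 1
0 1 1 0 1 1 0
0 1 1 1 1 0 1
1 0 0 0 1 0 0
1 0 0 1 1 1 0
1 0 1 0 0 1 0
1 0 1 1 0 0 1
1 1 0 0 1 1 0
1 1 0 1 1 0 1
1 1 1 0 0 0 1
1 1 1 1 0 1 1
"""


def row_is_correct(a1, a2, b1, b2, s1, s2, c):
    """True if the output bits encode the sum of the two 2-bit inputs."""
    a = a1 + 2 * a2            # a1 is the low-order bit of a
    b = b1 + 2 * b2            # b1 is the low-order bit of b
    total = s1 + 2 * s2 + 4 * c
    return a + b == total


rows = [tuple(int(x) for x in line.split()) for line in TABLE.strip().splitlines()]
bad_rows = [row for row in rows if not row_is_correct(*row)]
print("all rows correct" if not bad_rows else bad_rows)
```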
Not sure if there is any name for this algorithm I'm currently developing - "growing neighbourhood algorithm" sounds like an appropriate name. So what is my problem about?
I would like to draw a stroke around an alpha transparent image to outline it. The size of the stroke should be user-definable.
I have an array which is filled with zeros and ones. Consider each item of the array as a cell, like in the Game of Life. An item with 0 is empty (transparent pixel), an item with 1 is a first generation cell (non-transparent pixel), and the number of generations is defined by the size of the surrounding stroke.
This example depicts a rectangle surrounded by alpha values:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Then I would like to let the ones grow a new generation by surrounding every 0-generation Moore neighbour. That is the second generation (stroke with 1px), so after growing the array looks as follows:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 2 2 2 2 2 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 2 2 2 2 2 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
3rd and 4th generation (stroke with 3px):
4 4 4 4 4 4 4 4 4 4
4 3 3 3 3 3 3 3 3 4
4 3 2 2 2 2 2 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 2 2 2 2 2 3 4
4 3 3 3 3 3 3 3 3 4
4 4 4 4 4 4 4 4 4 4
So far so good. I'm achieving this simple task by the following code snippet:
for (int gen = 1; gen <= 4; gen++)
{
    for (int x = 1; x < arrayWidth - 1; x++)
    {
        for (int y = 1; y < arrayHeight - 1; y++)
        {
            // See if this cell is in the current generation.
            if (_generation[x + arrayWidth * y] == gen)
            {
                // Generate next generation.
                for (int i = x - 1; i <= x + 1; i++)
                {
                    for (int j = y - 1; j <= y + 1; j++)
                    {
                        if (_generation[i + arrayWidth * j] == 0 || _generation[i + arrayWidth * j] > gen)
                        {
                            _generation[i + arrayWidth * j] = gen + 1;
                        }
                    }
                }
            }
        }
    }
}
This approach works perfectly for simple shapes like a rectangle for example. But how can I do this for an ellipse? As soon as we have kind of a stair pattern in the cells, I'm getting messy results:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 0 0 0 0 0
0 0 0 0 1 1 1 1 1 1 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 1 1 1 1 1 1 0 0 0 0
0 0 0 0 0 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 2 2 2 2 2 2 0 0 0 0
0 0 0 2 2 1 1 1 1 2 2 0 0 0
0 0 2 2 1 1 1 1 1 1 2 2 0 0
0 2 2 1 1 1 1 1 1 1 1 2 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 2 1 1 1 1 1 1 1 1 2 0 0
0 0 2 2 1 1 1 1 1 1 2 2 0 0
0 0 0 2 2 1 1 1 1 2 2 0 0 0
0 0 0 0 2 2 2 2 2 2 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 3 3 3 3 3 3 3 3 0 0 0
0 0 3 3 2 2 2 2 2 2 3 3 0 0
0 3 3 2 2 1 1 1 1 2 2 3 3 0
3 3 2 2 1 1 1 1 1 1 2 2 3 3
3 2 2 1 1 1 1 1 1 1 1 2 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 2 1 1 1 1 1 1 1 1 2 2 3
3 3 2 2 1 1 1 1 1 1 2 2 3 3
0 3 3 2 2 1 1 1 1 2 2 3 3 0
0 0 3 3 2 2 2 2 2 2 3 3 0 0
0 0 0 3 3 3 3 3 3 3 3 0 0 0
When applying this algorithm to an ellipse, the outline looks kinda weird because of this problem (left: algorithm result, right: requested result):
The problem here is that I do not want to have those 2 2 and 3 3 duplicate blocks which occur every time I have this "stair" pattern:
1 0 0 0 0 0 0 1
0 1 0 0 0 0 1 0
0 0 1 0 0 1 0 0
0 0 0 1 1 0 0 0
I want the above 2nd and 3rd generation calculations to look like this:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 2 2 0 0 0 0 0
0 0 0 0 2 1 1 1 1 2 0 0 0 0
0 0 0 2 1 1 1 1 1 1 2 0 0 0
0 0 2 1 1 1 1 1 1 1 1 2 0 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 0 2 1 1 1 1 1 1 1 1 2 0 0
0 0 0 2 1 1 1 1 1 1 2 0 0 0
0 0 0 0 2 1 1 1 1 2 0 0 0 0
0 0 0 0 0 2 2 2 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 3 3 3 3 0 0 0 0 0
0 0 0 0 3 2 2 2 2 2 3 0 0 0
0 0 0 3 2 1 1 1 1 2 3 0 0 0
0 0 3 2 1 1 1 1 1 1 2 3 0 0
0 3 2 1 1 1 1 1 1 1 1 2 3 0
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
0 3 2 1 1 1 1 1 1 1 1 2 3 0
0 0 3 2 1 1 1 1 1 1 2 3 0 0
0 0 0 3 2 1 1 1 1 2 3 0 0 0
0 0 0 3 2 2 2 2 2 2 3 0 0 0
0 0 0 0 3 3 3 3 3 3 0 0 0 0
I've tried numerous methods to filter out those duplicate cell blocks, but I can't find an easy and generic solution for solving the problem.
Any ideas how to get stroke/outline like I get from Photoshop or Paint.NET?
Thanks!
Cheers
P
The proper name for this is dilation; look up morphological operations. Try dilating with a circular (disk-shaped) structuring element, which should give you the requested result.
Here is some Matlab code that shows how it is done:
im = imcircle(70);
im = padarray(im,[20,20]);
figure;imshow(im);
im2 = imdilate(im,strel('disk',8));
figure;imshow(im2);
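To show what dilation with a disk does on a 0/1 grid like the ones in the question, here is a minimal Python sketch without any image library. It is a plain illustration of the idea, not what imdilate does internally; the names disk_offsets, dilate and stroke are made up, and the radius is assumed to be in pixels.

```python
def disk_offsets(radius):
    """All (dx, dy) offsets whose Euclidean distance from (0, 0) is <= radius."""
    r = int(radius)
    return [
        (dx, dy)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dx * dx + dy * dy <= radius * radius
    ]


def dilate(grid, radius):
    """Binary dilation: every set cell stamps a disk onto the output."""
    height, width = len(grid), len(grid[0])
    offsets = disk_offsets(radius)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if grid[y][x]:
                for dx, dy in offsets:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        out[ny][nx] = 1
    return out


def stroke(grid, radius):
    """Mark the outline with 2: cells added by dilation that were empty before."""
    grown = dilate(grid, radius)
    return [
        [2 if grown[y][x] and not grid[y][x] else grid[y][x]
         for x in range(len(grid[0]))]
        for y in range(len(grid))
    ]


# A single opaque pixel in the middle of a 5x5 transparent image.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 1

for row in stroke(image, 1):
    print(*row)
```

With radius 1 the stroke is a plus shape around the pixel, because the diagonal neighbours are farther than 1 away. Growing every Moore neighbour, as in the question's loop, corresponds to a square structuring element instead, which is where the doubled cells on the staircase edges come from.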