UNIX - Count occurrences of character per line between two fields and add new column with result

UNIX - Count occurrences of character per line between two fields and add new column with result - bash

I have a PLINK ped file that looks like this:
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I am interested in counting how many time the string "2" occurs between the 7th field (included) and the last field. Then, if the number of occurrences is:
0: add 1 (being absent) to the new last field
1: add 2 (being present) to the new last field
2: add 2 (being present) to the new last field
The output would be:
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I know that to count the occurence of a string in every field I can use:
grep -n -o "2" file1 | sort -n | uniq -c | cut -d : -f 1
And that I can merge the 2 results using:
paste -d' ' file1 file2 > file3
But I don't know how to count the occurrences between two fields.
Thank you in advance for helping me!

You can use awk to check for column, row based data:
awk '{c=0; for(i=7; i<=NF; i++) if ($i==2) c++; if (c<2) c++; print $0, c}' file
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Perl to the rescue:
perl -ape 's/$/" " . (1 + !! grep 2 == $_, #F[6 .. $#F])/e'
-p reads the input line by line and prints the result
-a splits each input line on whitespace into the #F array
grep in scalar context returns the count, by !! (double negation) we change it to 0 or 1, and by adding 1 we make it into 1 and 2 as requested
s/// substitutes $ (end of line) with the result of the code in the replacement part (that's what /e does)

You could use awk:
awk '{s=0;for(i=7;i<=NF;i++) if($i==2) s+=1; s=s==0?1:2; print $0, s;}' data.txt
Explanations:
The instructions between the {} are executed on each line of the file.
NF is the number of fields in the line. They are numbered 1 to NF and you can access them with the $n notation.

Related

numeric vs alphanumeric sort on ubuntu 18.04.2

I am getting some strange behavior on sort utility on Ubuntu 18.04.2. Here's some sequence of commands issued. How can I ensure numeric sort for all the columns? column 1, 2, 3, 4 should be in order.
$ cat zz
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
2 2 2 2
10 10 10 10
1 1 10 1
1 1 100 1
$ cat zz | sort
0 0 0 0
0 1 0 0
1 0 0 0
10 10 10 10
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
$ cat zz | sort -n
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
10 10 10 10
$ cat zz | sort -n -k1,3
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 100 1
1 1 10 1
1 1 1 1
2 2 2 2
10 10 10 10
Desired output (with numeric sorting):
0 0 0 0
0 1 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
1 1 10 1
1 1 100 1
2 2 2 2
10 10 10 10
What options should I use in sort to get my desired output i.e. sorted in numeric order

Create 2D Circle from Existing Grid

We have a matrix:
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
I want to retain the data in these fields, but in the shape of a 2D-circle:
0 1 1 1 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
0 1 1 1 0
But this also scales up:
0 0 0 1 1 1 1 0 0 0
0 0 1 1 1 1 1 1 0 0
0 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0 0
What would the greatest way to approach this?

cy and cx are center points
r is the radius
tiles is an empty grid
function MakeCircle(tiles, cx, cy, r):
for x in range(cx - r, cx + r):
for y in range(cy - r, cy + r):
if (distance(cx, cy, x, y) <= r):
tiles[x][y] = 1
return(tiles)
function distance(x1, y1, x2, y2):
return(sqrt((x1 - x2)**2) + (y1 - y2)**2))
This dynamically creates a circle of ones in a square-matrix regardless of size of matrix.

how to record properties of other variables in stata

I have to generate variables entry_1, entry_2 and entry_3 which will adopt the value 1 if id_i for that particular month had entry=1.
Example.
id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
Would anyone be so kind to propose an idea of how to implement a loop in order to do this?
I am thinking of something like this:
forvalues i=1(1)3 {
gen entry`i'=0
replace entry`i'=1 if on that particular month id=`i' had entry=1
}

You could do something like this (although your data don't quite look right for the question you're asking):
forvalues i = 1/3 {
gen entry_`i' = id == `i' & entry == 1
}
This generates a dummy variable entry_i for each i in the forvalues loop where entry_i = 1 if id is i and entry is 1, and 0 otherwise.

The code can be simplified down to at most one loop.
clear
input id month entry entry_1 entry_2 entry_3
1 1 1 1 0 0
1 2 0 0 0 0
1 3 0 0 1 1
1 4 0 0 0 0
2 1 0 1 0 0
2 2 0 0 0 0
2 3 1 0 1 1
2 4 0 0 0 0
3 1 0 1 0 0
3 2 0 0 0 0
3 3 1 0 1 1
3 4 0 0 0 0
end
forval j = 1/4 {
egen entry`j' = total(entry & id == `j'), by(month)
}
list id month entry entry? , sepby(id)
+--------------------------------------------------------+
| id month entry entry1 entry2 entry3 entry4 |
|--------------------------------------------------------|
1. | 1 1 1 1 0 0 0 |
2. | 1 2 0 0 0 0 0 |
3. | 1 3 0 0 1 1 0 |
4. | 1 4 0 0 0 0 0 |
|--------------------------------------------------------|
5. | 2 1 0 1 0 0 0 |
6. | 2 2 0 0 0 0 0 |
7. | 2 3 1 0 1 1 0 |
8. | 2 4 0 0 0 0 0 |
|--------------------------------------------------------|
9. | 3 1 0 1 0 0 0 |
10. | 3 2 0 0 0 0 0 |
11. | 3 3 1 0 1 1 0 |
12. | 3 4 0 0 0 0 0 |
+--------------------------------------------------------+

full adder gate

I don't know if this question is considered to be related to stackoverflow (I'm sorry if it's not but I have searched and did not find an answer anywhere).
I have coded a full adder
Output:
Truth Table :
a1 a2 b1 b2 S1 S2 C
______________________________
0 0 0 0 0 0 0
0 0 0 1 0 1 0
0 0 1 0 1 0 0
0 0 1 1 1 1 0
0 1 0 0 0 1 0
0 1 0 1 0 0 1
0 1 1 0 1 1 0
0 1 1 1 1 0 1
1 0 0 0 1 0 0
1 0 0 1 1 1 0
1 0 1 0 0 1 0
1 0 1 1 0 0 1
1 1 0 0 1 1 0
1 1 0 1 1 0 1
1 1 1 0 0 0 1
1 1 1 1 0 1 1
If somebody has ever calculated this, can they tell me if my output is correct

a1 a2 b1 b2 S1 S2 C a b s c
______________________________
0 0 0 0 0 0 0 0 0 0 0 nothing plus nothing is nothing
0 0 0 1 0 1 0 0 2 2 0 nothing plus two is two
0 0 1 0 1 0 0 0 1 1 0 nothing plus one is one
0 0 1 1 1 1 0 0 3 3 0 nothing plus three is three
0 1 0 0 0 1 0 2 0 2 0 two plus nothing is two
0 1 0 1 0 0 1 2 2 0 1 two plus two is four (four not in 0-3)
0 1 1 0 1 1 0 2 1 3 0 two plus 1 is three
0 1 1 1 1 0 1 2 3 1 1 two plus three is five (one and four)
1 0 0 0 1 0 0 1 0 1 0 one plus nothing is one
1 0 0 1 1 1 0 1 2 3 0 one plus two is three
1 0 1 0 0 1 0 1 1 2 0 one plus one is two
1 0 1 1 0 0 1 1 3 0 1 one plus three is four
1 1 0 0 1 1 0 3 0 3 0 three plus nothing is three
1 1 0 1 1 0 1 3 2 1 1 three plus two is five (one and four)
1 1 1 0 0 0 1 3 1 0 1 three plus one is four
1 1 1 1 0 1 1 3 3 2 1 three plus three is 6 (two and four)
Looks right. Ordering your 16 rows a little differently would make them flow in a more logical order.

It's an adder! Just check if it's adding. Let's take this row:
a2 a1 b2 b1 C S2 S2
1 0 1 1 1 0 1
Here I have reordered the columns in an easier to read manner: higher order bits first.
The a input is 10 = 2 (base 10). The b input is 11 = 3 (base 10). The output is 101, which
is 5 (base 10). So this one is right: 2 + 3 == 5.
I'll let you check the other rows.

Algorithm for drawing an outline or stroke around any alpha transparent image

Not sure if there is any name for this algorithm I'm currently developing - "growing neighbourhood algorithm" sounds like an appropriate name. So what is my problem about?
I would like to draw a stroke around an alpha transparent image to outline it. The size of the stroke should be user-definable.
I have an array which is filled by zeros and ones, consider each item of the array as a cell like in Game of Life. An item with 0 is empty (transparent pixel), an item with 1 is a first generation cell (non transparent pixel), the number of generations is defined by the size of the surrounding stroke.
This example depicts an rectangle surrounded by alpha values:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Then I would like to let the ones grow a new generation by surrounding every 0-generation Moore neighbour. It's the second generation (stroke with 1px) - thus the array looks after growing as follows:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 2 2 2 2 2 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 1 1 1 1 2 0 0
0 0 2 2 2 2 2 2 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
3rd and 4th generation (stroke with 3px):
4 4 4 4 4 4 4 4 4 4
4 3 3 3 3 3 3 3 3 4
4 3 2 2 2 2 2 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 1 1 1 1 2 3 4
4 3 2 2 2 2 2 2 3 4
4 3 3 3 3 3 3 3 3 4
4 4 4 4 4 4 4 4 4 4
So far so good. I'm achieving this simple task by the following code snippet:
for (int gen = 1; gen <= 4; gen++)
{
for (int x = 1; x < arrayWidth - 1; x++)
{
for (int y = 1; y < arrayHeight - 1; y++)
{
// See if this cell is in the current generation.
if (_generation[x + arrayWidth * y] == gen)
{
// Generate next generation.
for (int i = x - 1; i <= x + 1; i++)
{
for (int j = y - 1; j <= y + 1; j++)
{
if (_generation[i + arrayWidth * j] == 0 || _generation[i + arrayWidth * j] > gen)
{
_generation[i + arrayWidth * j] = gen + 1;
}
}
}
}
}
}
}
This approach works perfectly for simple shapes like a rectangle for example. But how can I do this for an ellipse? As soon as we have kind of a stair pattern in the cells, I'm getting messy results:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 0 0 0 0 0
0 0 0 0 1 1 1 1 1 1 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 1 1 1 1 1 1 0 0 0 0
0 0 0 0 0 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 2 2 2 2 2 2 0 0 0 0
0 0 0 2 2 1 1 1 1 2 2 0 0 0
0 0 2 2 1 1 1 1 1 1 2 2 0 0
0 2 2 1 1 1 1 1 1 1 1 2 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 2 1 1 1 1 1 1 1 1 2 0 0
0 0 2 2 1 1 1 1 1 1 2 2 0 0
0 0 0 2 2 1 1 1 1 2 2 0 0 0
0 0 0 0 2 2 2 2 2 2 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 3 3 3 3 3 3 3 3 0 0 0
0 0 3 3 2 2 2 2 2 2 3 3 0 0
0 3 3 2 2 1 1 1 1 2 2 3 3 0
3 3 2 2 1 1 1 1 1 1 2 2 3 3
3 2 2 1 1 1 1 1 1 1 1 2 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 2 1 1 1 1 1 1 1 1 2 2 3
3 3 2 2 1 1 1 1 1 1 2 2 3 3
0 3 3 2 2 1 1 1 1 2 2 3 3 0
0 0 3 3 2 2 2 2 2 2 3 3 0 0
0 0 0 3 3 3 3 3 3 3 3 0 0 0
When applying this algorithm to an ellipse, the outline looks kinda weird because of this problem (left: algorithm result, right: requested result):
The problem here is that I do not want have those 2 2 and 3 3 duplicate blocks which occur every time I have this "stair" pattern:
1 0 0 0 0 0 0 1
0 1 0 0 0 0 1 0
0 0 1 0 0 1 0 0
0 0 0 1 1 0 0 0
I want the above 2nd and 3rd generation calculations look like this:
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 2 2 0 0 0 0 0
0 0 0 0 2 1 1 1 1 2 0 0 0 0
0 0 0 2 1 1 1 1 1 1 2 0 0 0
0 0 2 1 1 1 1 1 1 1 1 2 0 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 2 1 1 1 1 1 1 1 1 1 1 2 0
0 0 2 1 1 1 1 1 1 1 1 2 0 0
0 0 0 2 1 1 1 1 1 1 2 0 0 0
0 0 0 0 2 1 1 1 1 2 0 0 0 0
0 0 0 0 0 2 2 2 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 3 3 3 3 0 0 0 0 0
0 0 0 0 3 2 2 2 2 2 3 0 0 0
0 0 0 3 2 1 1 1 1 2 3 0 0 0
0 0 3 2 1 1 1 1 1 1 2 3 0 0
0 3 2 1 1 1 1 1 1 1 1 2 3 0
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
3 2 1 1 1 1 1 1 1 1 1 1 2 3
0 3 2 1 1 1 1 1 1 1 1 2 3 0
0 0 3 2 1 1 1 1 1 1 2 3 0 0
0 0 0 3 2 1 1 1 1 2 3 0 0 0
0 0 0 3 2 2 2 2 2 2 3 0 0 0
0 0 0 0 3 3 3 3 3 3 0 0 0 0
I've tried numerous methods to filter out those duplicate cell blocks, but I can't find an easy and generic solution for solving the problem.
Any ideas how to get stroke/outline like I get from Photoshop or Paint.NET?
Thanks!
Cheers
P

The proper name is dilation, check out morphological operations. You should try dilation with circle element, this will give you the requested result.
Here is a Matlab code that shows how it is done:
im = imcircle(70);
im = padarray(im,[20,20]);
figure;imshow(im);
im2 = imdilate(im,strel('disk',8));
figure;imshow(im2);

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

UNIX - Count occurrences of character per line between two fields and add new column with result - bash

You could use awk: awk '{s=0;for(i=7;i<=NF;i++) if($i==2) s+=1; s=s==0?1:2; print $0, s;}' data.txt Explanations: The instructions between the {} are executed on each line of the file. NF is the number of fields in the line. They are numbered 1 to NF and you can access them with the $n notation.

Related

numeric vs alphanumeric sort on ubuntu 18.04.2

Create 2D Circle from Existing Grid

how to record properties of other variables in stata

full adder gate

Algorithm for drawing an outline or stroke around any alpha transparent image

Categories

Resources