How to recode this string variable into a new variable?


I want to recode my variable Ucod in Stata, which has >100000 different observations, into 3-4 classified values stored in a new variable.
The problem is that I don't want to enter all the values of Ucod by hand. For example, I want to use an if condition such as: if a value in Ucod starts with I (e.g., I234, I345, I587), recode the whole value to CVD.
I have tried the strpos() function with different conditions, but I was unsuccessful.
Attaching picture of my data and variable Ucod

You could just use gen and a series of replace commands:
gen ucod_category = 0 if ucod >= "I00" & ucod <= "I519"
replace ucod_category = 1 if ucod >= "I60" & ucod <= "I698"
Then label these categories as CVD, Stroke, etc. String comparison sorts in the expected way for your ICD-10 codes with missing decimal points (e.g. "I519" < "I60").
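The range comparison above relies on strings being compared character by character. As an illustration only, here is a minimal Python sketch of the same idea. The names CATEGORY_RANGES and classify_ucod are made up for this example, and the two ranges are simply the ones used in the Stata commands above.

```python
# Hypothetical helper illustrating lexicographic range checks on ICD-10 codes
# written without the decimal point (e.g. "I519" stands for I51.9).
CATEGORY_RANGES = [
    ("CVD", "I00", "I519"),
    ("Stroke", "I60", "I698"),
]


def classify_ucod(code):
    """Return the first category whose [low, high] string range contains code."""
    for label, low, high in CATEGORY_RANGES:
        # Strings compare character by character, so "I519" < "I60"
        # because '1' < '6' at the third position.
        if low <= code <= high:
            return label
    return None  # code falls outside every listed range


if __name__ == "__main__":
    for code in ["I234", "I519", "I60", "C349"]:
        print(code, classify_ucod(code))
```

Codes outside every range come back as None, which plays the role of a missing value in the Stata version.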
However it might be more convenient to convert ucod into a number (with first digit 0 for A, 1 for B etc.) so that you can recode it with labels in a single command:
gen ucod_numeric = (ascii(substr(ucod, 1, 1)) - 65) * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)
recode ucod_numeric (800/851.9=0 "CVD") (860/869.8=1 "Stroke"), generate(ucod_category)
Again, this should sort in the expected order: I519 (which becomes 851.9) < I60 (860).
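To make the arithmetic concrete, here is a small Python sketch of the same letter-plus-digits conversion. The function name ucod_to_numeric is hypothetical. It assumes every code is one capital letter followed by two or three digits, with the third digit standing for the first decimal place.

```python
def ucod_to_numeric(code):
    """Map an ICD-10 style code such as 'I519' to a sortable number.

    Assumption: one capital letter followed by two or three digits,
    where a third digit is the first decimal place (I519 means I51.9).
    """
    letter_index = ord(code[0]) - ord("A")   # A -> 0, B -> 1, ..., I -> 8
    digits = int(code[1:])                   # 'I519' -> 519, 'I60' -> 60
    divisor = 10 if len(code) == 4 else 1    # shift the implied decimal point
    return letter_index * 100 + digits / divisor


if __name__ == "__main__":
    for code in ["I00", "I519", "I60", "I698"]:
        print(code, ucod_to_numeric(code))
```

With a multiplier of 100 per letter, the I codes land in the 800s, which is the range the recode command above expects.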
EDIT: since ascii isn't working (possibly a Stata version issue) you can try something like this to change the letter to a number.
gen ucod_letter_code = -1
forvalues i = 0/25 {
    replace ucod_letter_code = `i' if substr(ucod, 1, 1) == char(`i' + 65)
}
gen ucod_numeric = ucod_letter_code * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)
recode ucod_numeric (800/851.9=0 "CVD") (860/869.8=1 "Stroke"), generate(ucod_category)

Related

Non-associative RDom parallelization in Halide

I am trying to write a decoder for the GPU. My encoding scheme has data dependencies between lines, so when decoding columns of data each column depends on the previous one. I want to parallelize the internal computation of each column but execute the columns one by one, sequentially. I am having trouble getting this right.
Below I have modeled a toy example to show the problem:
Func f;
Var x,y;
RDom r(1,3,1,3); // (min, extent) pairs: goes from (1,1) to (3,3)
f(x,y) = 0;
f(0,y) = y;
Expr p_1 = f(r.x-1,r.y);
Expr p_2 = f(r.x-1,r.y-1);
f(r.x,r.y) = p_1 + p_2;
Buffer<int32_t> output_2D = f.realize({4,4});
A visualization of this program can be seen here: Serial Computation Visualisation
This reduction should give the following array():
int expected_output[4][4] = {{0,0,0,0},
{1,1,1,1},
{2,3,4,5},
{3,5,8,12}};
And checking using Catch2 I can see that it actually calculates it correctly
for(int j = 0; j < output_2D.height(); j++){
    for(int i = 0; i < output_2D.width(); i++){
        CAPTURE(i,j);
        REQUIRE(expected_output[j][i]==output_2D(i,j));
    }
}
My task is to speed this computation up. Since column one depends on column zero I have to calculate each column in series. I can however, calculate all the values in the column in parallel. Please see Computation Steps Parallel and Desired Pipeline to see how I want Halide to compute the pipeline.
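To pin down the evaluation order I am after, here is a plain Python sketch (not Halide, and not a Halide schedule) that runs the x loop serially and hands the y values of each column to a thread pool. The function name compute_wavefront is made up for this illustration. Python threads will not give a real speedup here, so it only demonstrates the ordering constraint.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_wavefront(width=4, height=4):
    """Serial over x, parallel over y, for f(x,y) = f(x-1,y) + f(x-1,y-1)."""
    # grid[y][x] mirrors expected_output[j][i] from the C++ check above.
    grid = [[0] * width for _ in range(height)]
    for y in range(height):
        grid[y][0] = y                      # f(0, y) = y

    def cell(x, y):
        # Reads only column x-1, which is complete before column x starts.
        return grid[y][x - 1] + grid[y - 1][x - 1]

    with ThreadPoolExecutor() as pool:
        for x in range(1, width):           # columns strictly in order
            ys = list(range(1, height))
            values = list(pool.map(lambda y: cell(x, y), ys))  # y in parallel
            for y, value in zip(ys, values):
                grid[y][x] = value
    return grid


if __name__ == "__main__":
    for row in compute_wavefront():
        print(row)
```

Because every cell of column x reads only column x-1, the work inside a column has no internal dependencies. That is the property I would like the Halide schedule to exploit.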
I tried doing this in Halide using f.update(1).allow_race_conditions().parallel(r.y); and this does almost what I want.
f(r.x,r.y) = p_1 + p_2;
f.update(1).allow_race_conditions().parallel(r.y);
f.trace_stores();
Buffer<int32_t> output_2D = f.realize({4,4});
For some reason, however, parallel(r.y) seems to execute the columns in a seemingly random order.
It yields the following store_trace:
Init Image:
Store f29.0(0, 0) = 0
Store f29.0(1, 0) = 0
....
Store f29.0(3, 3) = 0
Init first row:
Store f29.0(0, 0) = 0
Store f29.0(1, 0) = 1
Store f29.0(2, 0) = 2
Store f29.0(3, 0) = 3
Start Parallel Computation:
Store f29.0(1, 1) = 1 // First parallel column
Store f29.0(2, 1) = 1
Store f29.0(3, 1) = 1
Store f29.0(1, 3) = 5 // Second parallel column: THIS IS MY PROBLEM
Store f29.0(2, 3) = 5 // This should be column 2 not column 3.
Store f29.0(3, 3) = 5
Store f29.0(1, 2) = 3
Store f29.0(2, 2) = 4
Store f29.0(3, 2) = 5
A visualization of this pattern can be seen here in this figure: Current Pipeline.
I know that I am explicitly allowing race conditions, so I must be doing something wrong, but I don't know the right way to do this, and this is the closest I got. I could vectorize() with respect to y, which gives the correct evaluation, but I want to use parallel() to gain a greater speedup for larger matrices/images. rfactor might be a solution, as my problem should be associative in the y direction, but it might not work because the problem is non-associative in the x direction (each column depends on the previous one). Does anyone know how to be serial in x and parallel in y when using RDoms?

difference in results while using pixel value or int

While using Matlab for image processing (exactly improving img by Fuzzy Logic) I found a really strange thing. My fuzzy function is correct, I tested it on random values and they are basically simple linear functions.
function f = Udark(z)
if z < 50
f = 1;
elseif z > 125
f = 0;
elseif (z >= 50) && (z <= 125)
f = -z/75 + 125/75;
end
end
where z is the value of a pixel (in grayscale). Now there is a really strange thing going on.
The problem seems to be in the line f = -z/75 + 125/75; when z comes from an image A. The function gives very different results depending on what is used as input. If I use a variable p = 99, the output of the function is 0.3467, as it should be, but if I use A(i,j) the result is f = 2. Since this is clearly impossible, I do not know where the problem is. I thought it might be related to the type of the variable, but if I change it to uint8 it stays the same... If you know what's going on, please let me know :)

1. Changed line:
f = (125/75) - (z/75);
After editing the third condition, the resulting transformed image has no pixel values of 2. I am not sure whether you intend to work with decimals. If decimals are necessary, converting the image with the im2double() function and scaling it up by a factor of 255 might suit your needs. See heading 3 for rounding details.
2. Reading in the Image and Testing:
%Reading in the image and applying the function%
Image = imread("RGB_Image.png");
Greyscale_Image = rgb2gray(Image);
[Image_Height,Image_Width] = size(Greyscale_Image);
Transformed_Image = zeros(Image_Height,Image_Width);
for Row = 1: +1: Image_Height
for Column = 1: +1: Image_Width
Pixel_Value = Greyscale_Image(Row,Column);
[Transformed_Pixel_Value] = Udark(Pixel_Value);
Transformed_Image(Row,Column) = Transformed_Pixel_Value;
end
end
subplot(1,2,1); imshow(Greyscale_Image);
subplot(1,2,2); imshow(Transformed_Image);
%Checking that no transformed pixels falls in this impossible range%
Check = (Transformed_Image > (125/75)) & (Transformed_Image ~= 1);
Check_Flag = any(Check,'all');
%Function to transform pixel values%
function f = Udark(z)
if z < 50
f = 1;
elseif z > 125
f = 0;
elseif (z >= 50) && (z <= 125)
f = (125/75) - (z/75);
end
end
3. Evaluating the Specifics of the Third Condition
Working with integers (uint8) forces the values to be rounded to the nearest integer. Any pixel value that falls in the range [50,125] will therefore evaluate to 1 or 0.
f = -z/75 + 125/75;
If z = 50.1,
-50.1/75 + 125/75 = 74.9/75 ≈ 0.9987 → rounds to 1
Using MATLAB version: R2019b
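The answer above does not spell out why the original expression returns 2. One plausible explanation, assuming A holds uint8 pixels (the usual result of imread and rgb2gray), is that MATLAB integer arithmetic rounds to the nearest integer and saturates at the type limits, so -z on an unsigned value becomes 0. The Python sketch below only emulates that behaviour. The helpers to_uint8, udark_branch_uint8 and udark_branch_double are hypothetical names, not MATLAB functions.

```python
import math


def to_uint8(value):
    """Emulate conversion to uint8: round half away from zero, then clamp to 0..255."""
    rounded = math.floor(value + 0.5) if value >= 0 else math.ceil(value - 0.5)
    return max(0, min(255, rounded))


def udark_branch_uint8(z):
    """-z/75 + 125/75 evaluated step by step with uint8 results."""
    negated = to_uint8(-z)              # unsigned negation saturates to 0
    quotient = to_uint8(negated / 75)   # 0 / 75 is still 0
    return to_uint8(quotient + 125 / 75)  # 1.667 is stored as 2


def udark_branch_double(z):
    """The same expression in ordinary floating point."""
    return -float(z) / 75 + 125 / 75


if __name__ == "__main__":
    print(udark_branch_uint8(99))               # emulated uint8 pixel
    print(round(udark_branch_double(99), 4))    # plain double input
```

Under these assumptions, converting the pixel to double before calling Udark (for example Udark(double(A(i,j)))) would keep the fractional result.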

Caesar's cypher encryption algorithm

Caesar's cypher is the simplest encryption algorithm. It adds a fixed value to the ASCII (Unicode) value of each character of a text. In other words, it shifts the characters. Decrypting a text is simply shifting it back by the same amount, that is, it subtracts the same value from the characters.
My task is to write a function that:
accepts two arguments: the first is the character vector to be encrypted, and the second is the shift amount.
returns one output, which is the encrypted text.
needs to work with all the visible ASCII characters from space to ~ (ASCII codes of 32 through 126). If the shifted code goes outside of this range, it should wrap around. For example, if we shift ~ by 1, the result should be space. If we shift space by -1, the result should be ~.
This is my MATLAB code:
function [coded] = caesar(input_text, shift)
x = double(input_text); %converts char symbols to double format
for ii = 1:length(x) %go through each element
if (x(ii) + shift > 126) & (mod(x(ii) + shift, 127) < 32)
x(ii) = mod(x(ii) + shift, 127) + 32; %if the symbol + shift > 126, I make it 32
elseif (x(ii) + shift > 126) & (mod(x(ii) + shift, 127) >= 32)
x(ii) = mod(x(ii) + shift, 127);
elseif (x(ii) + shift < 32) & (126 + (x(ii) + shift - 32 + 1) >= 32)
x(ii) = 126 + (x(ii) + shift - 32 + 1);
elseif (x(ii) + shift < 32) & (126 + (x(ii) + shift - 32 + 1) < 32)
x(ii) = abs(x(ii) - 32 + shift - 32);
else x(ii) = x(ii) + shift;
end
end
coded = char(x); % converts double format back to char
end
I can't seem to make the wrapping conversions correctly (e.g. from 31 to 126, 30 to 125, 127 to 32, and so on). How should I change my code to do that?

Before you even start coding something like this, you should have a firm grasp of how to approach the problem.
The main obstacle you encountered is how to apply the modulus operation to your data, seeing how mod "wraps" inputs to the range of [0 modPeriod-1], while your own data is in the range [32 126]. To make mod useful in this case we perform an intermediate step of shifting of the input to the range that mod "likes", i.e. from some [minVal maxVal] to [0 modPeriod-1].
So we need to find two things: the size of the required shift, and the size of the period of the mod. The first one is easy, since this is just -minVal, which is the negative of the ASCII value of the first character, which is space (written as ' ' in MATLAB). As for the period of the mod, this is just the size of your "alphabet", which happens to be "1 larger than the maximum value, after shifting", or in other words - maxVal-minVal+1. Essentially, what we're doing is the following
input -> shift to 0-based ("mod") domain -> apply mod() -> shift back -> output
Now take a look how this can be written using MATLAB's vectorized notation:
function [coded] = caesar(input_text, shift)
FIRST_PRINTABLE = ' ';
LAST_PRINTABLE = '~';
N_PRINTABLE_CHARS = LAST_PRINTABLE - FIRST_PRINTABLE + 1;
coded = char(mod(input_text - FIRST_PRINTABLE + shift, N_PRINTABLE_CHARS) + FIRST_PRINTABLE);
Here are some tests:
>> caesar('blabla', 1)
ans =
'cmbcmb'
>> caesar('cmbcmb', -1)
ans =
'blabla'
>> caesar('blabla', 1000)
ans =
'5?45?4'
>> caesar('5?45?4', -1000)
ans =
'blabla'
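For readers who do not use MATLAB, the same shift-to-zero, apply mod, shift-back pipeline can be sketched in Python. This is only an illustration of the approach described above. The constant names mirror the MATLAB version, and the function assumes every input character is already in the printable range from space to ~.

```python
FIRST_PRINTABLE = ord(" ")   # 32
LAST_PRINTABLE = ord("~")    # 126
N_PRINTABLE_CHARS = LAST_PRINTABLE - FIRST_PRINTABLE + 1  # 95


def caesar(input_text, shift):
    """Shift each printable ASCII character, wrapping inside 32..126."""
    coded = []
    for ch in input_text:
        zero_based = ord(ch) - FIRST_PRINTABLE               # move to 0..94
        wrapped = (zero_based + shift) % N_PRINTABLE_CHARS   # wrap around
        coded.append(chr(wrapped + FIRST_PRINTABLE))         # move back
    return "".join(coded)


if __name__ == "__main__":
    print(caesar("blabla", 1))
    print(caesar("~", 1) == " ")
```

Python's % operator returns a non-negative result for a positive modulus, so negative shifts wrap correctly without extra handling.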

We can solve it using the idea of periodic functions.
A periodic function repeats itself every cycle (for example, sine repeats every 2π).
Likewise, we have a function that repeats itself every 95 values:
cycle = 126-32+1;
We add one because 32 itself is also part of the cycle.
So if the value of the character exceeds 126, we subtract 95,
i.e. if the value is 127 (bigger than 126), it is equivalent to 127-95 = 32.
And if the value is less than 32, we add 95,
i.e. if the value is 31 (less than 32), it is equivalent to 31+95 = 126.
Now we translate that into code:
function out= caesar(string,shift)
value=string+shift;
for i=1:length(value)
while value(i)<32
value(i)=value(i)+95;
end
while value(i)>126
value(i)=value(i)-95;
end
end
out=char(value);

First I converted the output (shift + text_input) to char.
function coded= caesar(text_input,shift)
coded=char(text_input+shift);
for i=1:length(coded)
while coded(i)<32
coded(i)=coded(i)+95;
end
while coded(i)>126
coded(i)=coded(i)-95;
end
end

Here is a short version:
function coded = caesar(v,n)
C = 32:126;
v = double(v);
for i = 1:length(v)
x = find(C==v(i));
C = circshift(C,-n);
v(i) = C(x);
C = 32:126;
end
coded = char(v);
end

how to iterate one value twice in for loop (Python)

I am having a problem with a really simple program. I lose the boundary values (y = 50, 100, 150, ...) because at that point the first condition is not valid, so the loop takes the else branch and never prints them. How can I repeat the loop iteration for, let's say, y = 50? (I don't want to use '=', e.g. y <= (50+increment), because this is just a dummy program.)
Thanks
increment = 0
b = 1
var = 0
for y in range(1,1000):
    if y>= increment and y< (50+increment):
        print(f'{y} in List {b}')
    else:
        var = y
        increment += 50
        b+=1
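One possible way to avoid losing the boundary value, offered only as a sketch, is to advance the bucket first and then handle the same y again, instead of treating "advance" and "handle" as two exclusive branches. The function assign_lists and its parameters are hypothetical. It collects (y, list number) pairs rather than printing, so the result is easy to inspect.

```python
def assign_lists(limit=1000, width=50):
    """Pair each y in 1..limit-1 with its list number, without skipping boundaries."""
    increment = 0
    b = 1
    assignments = []
    for y in range(1, limit):
        # Move to the next list until y fits, then process y itself.
        while not (increment <= y < increment + width):
            increment += width
            b += 1
        assignments.append((y, b))
    return assignments


if __name__ == "__main__":
    for y, b in assign_lists(limit=105):
        if y % 50 in (49, 0):
            print(f"{y} in List {b}")
```

The inner while loop is what lets one value be tested twice: once against the old range and once against the new one.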

Convert Excel Column Number to Column Name in Matlab

I am using Excel 2007, which supports up to 16,384 columns. I would like to obtain the column name corresponding to a column number.
Currently, I am using the following code. However, this code only supports up to 256 columns. Any idea how to obtain the column name if the column number is greater than 256?
function loc = xlcolumn(column)
if isnumeric(column)
if column>256
error('Excel is limited to 256 columns! Enter an integer number <256');
end
letters = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
count = 0;
if column-26<=0
loc = char(letters(column));
else
while column-26>0
count = count + 1;
column = column - 26;
end
loc = [char(letters(count)) char(letters(column))];
end
else
letters = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
if size(column,2)==1
loc =findstr(column,letters);
elseif size(column,2)==2
loc1 =findstr(column(1),letters);
loc2 =findstr(column(2),letters);
loc = (26 + 26*loc1)-(26-loc2);
end
end
Thanks

As a diversion, here is an all function handle example, with (almost) no file-based functions required. This is based on the dec2base function, since Excel column names are (almost) base 26 numbers, with the frustrating difference that there are no "0" characters.
Note: this is probably a terrible idea overall, but it works. Better solutions are probably found elsewhere in the file exchange.
First, the one file based function that I couldn't get around, to perform arbitrary depth function composition.
function result = compose( fnHandles )
%COMPOSE Compose a set of functions
% COMPOSE({fnHandles}) returns a function handle consisting of the
% composition of the cell array of input function handles.
%
% For example, if F, G, and H are function handles with one input and
% one output, then:
% FNCOMPOSED = COMPOSE({F,G,H});
% y = FNCOMPOSED(x);
% is equivalent to
% y = F(G(H(x)));
if isempty(fnHandles)
result = @(x)x;
elseif length(fnHandles)==1
result = fnHandles{1};
else
fnOuter = fnHandles{1};
fnRemainder = compose(fnHandles(2:end));
result = @(x)fnOuter(fnRemainder(x));
end
Then, the bizarre, contrived path to convert base26 values into the correct string
%Functions leading to "getNumeric", which creates a numeric, base26 array
remapUpper = @(rawBase)(rawBase + (rawBase>='A')*(-55)); %Map the letters 'A-P' to [10:25]
reMapLower = @(rawBase)(rawBase + (rawBase<'A')*(-48)); %Map characters '0123456789' to [0:9]
getRawBase = @(x)dec2base(x, 26);
getNumeric = @(x)remapUpper(reMapLower(getRawBase(x)));
%Functions leading to "correctNumeric"
% This replaces zeros with 26, and reduces the high values entry by 1.
% Similar to "borrowing" as we learned in longhand subtraction
borrowDownFrom = @(x, fromIndex) [x(1:(fromIndex-1)) (x(fromIndex)-1) (x(fromIndex+1)+26) (x((fromIndex+2):end))];
borrowToIfNeeded = @(x, toIndex) (x(toIndex)<=0)*borrowDownFrom(x,toIndex-1) + (x(toIndex)>0)*(x); %Ugly numeric switch
getAllConditionalBorrowFunctions = @(numeric)arrayfun(@(index)@(numeric)borrowToIfNeeded(numeric, index),(2:length(numeric)),'uniformoutput',false);
getComposedBorrowFunction = @(x)compose(getAllConditionalBorrowFunctions(x));
correctNumeric = @(x)feval(getComposedBorrowFunction(x),x);
%Function to replace numerics with letters, and remove leading '@' (leading
%zeros)
numeric2alpha = @(x)regexprep(char(x+'A'-1),'^@','');
%Compose complete function
num2ExcelName = @(x)arrayfun(@(x)numeric2alpha(correctNumeric(getNumeric(x))), x, 'uniformoutput',false)';
Now test using some stressing transitions:
>> num2ExcelName([1:5 23:28 700:704 727:729 1024:1026 1351:1355 16382:16384])
ans =
'A'
'B'
'C'
'D'
'E'
'W'
'X'
'Y'
'Z'
'AA'
'AB'
'ZX'
'ZY'
'ZZ'
'AAA'
'AAB'
'AAY'
'AAZ'
'ABA'
'AMJ'
'AMK'
'AML'
'AYY'
'AYZ'
'AZA'
'AZB'
'AZC'
'XFB'
'XFC'
'XFD'

This function I wrote works for any number of columns (until Excel runs out of columns). It just requires a column number input (e.g. 16368 will return a string 'XEN').
If your application of this concept differs from my function, it's important to note that the first column name made of x A's is column number 26^(x-1) + 26^(x-2) + ... + 26^2 + 26 + 1 (e.g. 'AAA' is column 26^2 + 26 + 1 = 703).
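function [col_str] = let_loc(num_loc)
    % Find how many letters the column name needs
    test = 2;
    old = 0;
    x = 0;
    while test >= 1
        old = 26^x + old;
        test = num_loc/old;
        x = x + 1;
    end
    num_letters = x - 1;
    str_array = zeros(1,num_letters);
    for i = 1:num_letters
        place = 26^(num_letters-i);
        % Reserve at least 1 for every remaining letter (there is no "zero"
        % letter); otherwise multiples of 26 such as 52 ('AZ') produce a '@'
        rest_min = (place - 1)/25;
        loc = floor((num_loc - rest_min)/place);
        num_loc = num_loc - loc*place;
        str_array(i) = 65 + (loc - 1);
    end
    col_str = char(str_array); % convert the character codes to a char row vector
end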
Hope this saves someone some time!
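Both answers work around the fact that Excel column names form a base-26 system without a zero digit. For comparison, here is a compact Python sketch of the usual "subtract one before dividing" approach. The function name excel_column_name is made up for this example and is not taken from either answer.

```python
def excel_column_name(number):
    """Convert a 1-based column number to its Excel letters (1 -> 'A', 27 -> 'AA')."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while number > 0:
        # Subtracting 1 maps the digits 1..26 onto 0..25, so 'Z' needs no borrow.
        number, remainder = divmod(number - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


if __name__ == "__main__":
    for n in [1, 26, 27, 52, 702, 703, 16368, 16384]:
        print(n, excel_column_name(n))
```

The subtraction inside divmod removes the need for the explicit borrowing step and for the per-letter reserve used in the MATLAB versions.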
