I want to recode my variable Ucod in Stata with >100000 different observations into 3-4 classified values in the form of a new variable.
The problem is that I don't want to list every value of Ucod in the recode. Instead, I want a condition such as: if a value in Ucod starts with I (e.g., I234, I345, I587), recode it to CVD.
I have tried the strpos() function with various conditions, but without success.
Attaching picture of my data and variable Ucod
You could just use gen and a series of replace commands:
gen ucod_category = 0 if ucod >= "I00" & ucod <= "I519"
replace ucod_category = 1 if ucod >= "I60" & ucod <= "I698"
Then label these categories as CVD, Stroke, etc. String comparison sorts in the expected way for your ICD-10 codes with missing decimal points (e.g. "I519" < "I60").
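To check the ordering claim, you can run the string comparisons directly. The sketch below uses Python rather than Stata, purely as an illustration. The `classify` helper and the "Other" fallback are hypothetical, and it assumes the codes are uppercase with the decimal point dropped, as in the question.

```python
def classify(ucod):
    """Map an ICD-10 code (decimal point dropped) to a coarse category."""
    # String comparison is lexicographic, character by character,
    # so "I519" sorts before "I60" because '5' < '6' at the second position.
    if "I00" <= ucod <= "I519":
        return "CVD"
    if "I60" <= ucod <= "I698":
        return "Stroke"
    return "Other"  # hypothetical catch-all, not part of the answer above

for code in ["I234", "I64", "C349"]:
    print(code, classify(code))
```

The same range logic is what the gen/replace pair expresses with `>=` and `<=` on the string variable.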
However it might be more convenient to convert ucod into a number (with first digit 0 for A, 1 for B etc.) so that you can recode it with labels in a single command:
gen ucod_numeric = (ascii(substr(ucod, 1, 1)) - 65) * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)
recode ucod_numeric (800/851.9=0 "CVD") (860/869.8=1 "Stroke"), generate(ucod_category)
Again, this should sort in the expected order: I519 (which becomes 851.9) < I60 (860).
EDIT: since ascii isn't working (possibly a Stata version issue) you can try something like this to change the letter to a number.
gen ucod_letter_code = -1
forvalues i = 0/25 {
    replace ucod_letter_code = `i' if substr(ucod, 1, 1) == char(`i' + 65)
}
gen ucod_numeric = ucod_letter_code * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)
recode ucod_numeric (800/851.9=0 "CVD") (860/869.8=1 "Stroke"), generate(ucod_category)
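The arithmetic behind the numeric encoding can be checked outside Stata. Here is a minimal Python sketch, assuming each code is one uppercase letter followed by two or three digits. The function name `ucod_to_number` is hypothetical. A letter multiplier of 100 is what reproduces the 851.9 and 860 values quoted above.

```python
def ucod_to_number(ucod):
    """Encode e.g. 'I519' as roughly 851.9 and 'I60' as 860.0."""
    letter_code = ord(ucod[0]) - ord("A")   # A -> 0, B -> 1, ..., I -> 8
    digits = float(ucod[1:])                # '519' -> 519.0, '60' -> 60.0
    # A four-character code carries one implied decimal digit.
    divisor = 10 if len(ucod) == 4 else 1
    return letter_code * 100 + digits / divisor

print(ucod_to_number("I519") < ucod_to_number("I60"))
```

Because every letter gets its own block of 100, codes from different letters can never overlap after conversion.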
I'm trying to make a random number game but the condition is always false even though I added the b = input box statement
Option Explicit
dim b,a,max,min
'To randomize variable (a)
max=3
min=1
Randomize
a = (Int((max-min+1)*Rnd+min))
b = inputbox("Guess a number from " & min & " to " & max)
If a = b Then
msgbox("you win")
Else
msgbox("you died it was " & a)
End If
I expected that when you guessed the right number it would say "you win", but it always says "you died it was #".
You are almost there, but as has been mentioned in the comments, you do not populate the variable b with a value, so the comparison will always be False.
If you are expecting b to be populated by the user, you could ask for input via the InputBox() function by adding one line:
Option Explicit
Dim b, a, max, min
'To randomize variable (a)
max = 100
min = 1
Call Randomize()
'Enter the line below to collect input from the user.
b = InputBox("Enter a number between " & min & " and " & max & ".")
'Int() already returns a whole number, so no extra rounding is needed.
a = Int((max - min + 1) * Rnd() + min)
'InputBox returns a string, so convert a to a string and compare like with like.
If (CStr(a) = Trim(b)) Then
Call MsgBox("You win")
Else
Call MsgBox("You died it was " & a)
End If
You might also consider validating the input to make sure that the user enters a value between your min and max and responding accordingly if the value is invalid.
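The validation step could look something like the following. This is a sketch in Python rather than VBScript, and the `parse_guess` helper is hypothetical. It treats cancelled, empty, non-numeric, and out-of-range input as invalid.

```python
def parse_guess(raw, lo, hi):
    """Return the guess as an int, or None if it is not a whole number in [lo, hi]."""
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):
        # ValueError: empty or non-numeric text; AttributeError: raw is None (cancelled)
        return None
    return value if lo <= value <= hi else None

print(parse_guess(" 2 ", 1, 3))
print(parse_guess("seven", 1, 3))
```

The caller can then ask the user again whenever the result is None, instead of comparing an unusable value.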
This matches 1 - 10.
Randomize
Num = Int((10 - 1 + 1) * Rnd + 1)
If CInt(Inputbox("Enter Number")) = Num Then
Msgbox "match"
Else
Msgbox "Nope it was " & Num
End If
The formula from help is Int((upperbound - lowerbound + 1) * Rnd + lowerbound). See http://download.microsoft.com/download/winscript56/Install/5.6/W982KMeXP/EN-US/scrdoc56en.exe.
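To see why the formula covers both bounds, here it is with the random draw passed in explicitly. This is a Python sketch, and the `scaled_int` name is hypothetical. It assumes the draw lies in [0, 1) and uses `math.floor` in place of Int.

```python
import math

def scaled_int(rnd, lower, upper):
    """Map rnd in [0, 1) onto an integer in [lower, upper]."""
    # (upper - lower + 1) is the count of possible integers; scaling rnd by it
    # gives a value in [0, count), and flooring after adding lower picks one.
    return math.floor((upper - lower + 1) * rnd + lower)

print(scaled_int(0.0, 1, 10), scaled_int(0.999999, 1, 10))
```

Without the "+ 1" the upper bound could never be produced, because rnd never reaches 1.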
Caesar's cypher is the simplest encryption algorithm. It adds a fixed value to the ASCII (Unicode) value of each character of a text. In other words, it shifts the characters. Decrypting a text is simply shifting it back by the same amount, that is, it subtracts the same value from the characters.
My task is to write a function that:
accepts two arguments: the first is the character vector to be encrypted, and the second is the shift amount.
returns one output, which is the encrypted text.
needs to work with all the visible ASCII characters from space to ~ (ASCII codes of 32 through 126). If the shifted code goes outside of this range, it should wrap around. For example, if we shift ~ by 1, the result should be space. If we shift space by -1, the result should be ~.
This is my MATLAB code:
function [coded] = caesar(input_text, shift)
x = double(input_text); %converts char symbols to double format
for ii = 1:length(x) %go through each element
if (x(ii) + shift > 126) & (mod(x(ii) + shift, 127) < 32)
x(ii) = mod(x(ii) + shift, 127) + 32; %if the symbol + shift > 126, I make it 32
elseif (x(ii) + shift > 126) & (mod(x(ii) + shift, 127) >= 32)
x(ii) = mod(x(ii) + shift, 127);
elseif (x(ii) + shift < 32) & (126 + (x(ii) + shift - 32 + 1) >= 32)
x(ii) = 126 + (x(ii) + shift - 32 + 1);
elseif (x(ii) + shift < 32) & (126 + (x(ii) + shift - 32 + 1) < 32)
x(ii) = abs(x(ii) - 32 + shift - 32);
else x(ii) = x(ii) + shift;
end
end
coded = char(x); % converts double format back to char
end
I can't seem to make the wrapping conversions correctly (e.g. from 31 to 126, 30 to 125, 127 to 32, and so on). How should I change my code to do that?
Before you even start coding something like this, you should have a firm grasp of how to approach the problem.
The main obstacle you encountered is how to apply the modulus operation to your data, seeing how mod "wraps" inputs to the range of [0 modPeriod-1], while your own data is in the range [32 126]. To make mod useful in this case we perform an intermediate step of shifting of the input to the range that mod "likes", i.e. from some [minVal maxVal] to [0 modPeriod-1].
So we need to find two things: the size of the required shift, and the size of the period of the mod. The first one is easy, since this is just -minVal, which is the negative of the ASCII value of the first character, which is space (written as ' ' in MATLAB). As for the period of the mod, this is just the size of your "alphabet", which happens to be "1 larger than the maximum value, after shifting", or in other words - maxVal-minVal+1. Essentially, what we're doing is the following
input -> shift to 0-based ("mod") domain -> apply mod() -> shift back -> output
Now take a look how this can be written using MATLAB's vectorized notation:
function [coded] = caesar(input_text, shift)
FIRST_PRINTABLE = ' ';
LAST_PRINTABLE = '~';
N_PRINTABLE_CHARS = LAST_PRINTABLE - FIRST_PRINTABLE + 1;
coded = char(mod(input_text - FIRST_PRINTABLE + shift, N_PRINTABLE_CHARS) + FIRST_PRINTABLE);
Here are some tests:
>> caesar('blabla', 1)
ans =
'cmbcmb'
>> caesar('cmbcmb', -1)
ans =
'blabla'
>> caesar('blabla', 1000)
ans =
'5?45?4'
>> caesar('5?45?4', -1000)
ans =
'blabla'
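For comparison, the same shift, mod, shift-back pipeline can be written in Python. This is only a sketch to cross-check the arithmetic, not a replacement for the MATLAB function. The constant names mirror the ones above, and it assumes every input character is already in the printable range.

```python
FIRST_PRINTABLE = ord(" ")   # 32
LAST_PRINTABLE = ord("~")    # 126
N_PRINTABLE = LAST_PRINTABLE - FIRST_PRINTABLE + 1  # 95

def caesar(text, shift):
    """Shift each character within the printable ASCII range, wrapping around."""
    return "".join(
        # move to a 0-based domain, apply the shift and the modulus, move back
        chr((ord(ch) - FIRST_PRINTABLE + shift) % N_PRINTABLE + FIRST_PRINTABLE)
        for ch in text
    )

print(caesar("blabla", 1))
print(caesar("~", 1) == " ")
```

Python's `%` returns a non-negative result for a positive modulus, as MATLAB's mod does, so negative shifts need no special handling.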
We can solve it using the idea of periodic functions.
A periodic function repeats itself every cycle; for sine, for example, the cycle is 2π.
Likewise, we have a function that repeats itself every 95 values:
cycle = 126 - 32 + 1
We add one because 32 itself is part of the cycle.
So if the value of a character exceeds 126, we subtract 95. For example, 127 (bigger than 126) is equivalent to 127 - 95 = 32.
And if the value is less than 32, we add 95. For example, 31 (less than 32) is equivalent to 31 + 95 = 126.
Now we will translate that into code:
function out= caesar(string,shift)
value=string+shift;
for i=1:length(value)
while value(i)<32
value(i)=value(i)+95;
end
while value(i)>126
value(i)=value(i)-95;
end
end
out=char(value);
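The repeated add/subtract approach should land on the same code as a single modulus. Here is a small Python sketch to check that; the names `wrap_loop` and `wrap_mod` are hypothetical.

```python
def wrap_loop(code):
    """Wrap a character code into [32, 126] by stepping one cycle (95) at a time."""
    while code < 32:
        code += 95
    while code > 126:
        code -= 95
    return code

def wrap_mod(code):
    """Wrap the same way with one modulus on the 0-based value."""
    return (code - 32) % 95 + 32

print(all(wrap_loop(c) == wrap_mod(c) for c in range(-500, 700)))
```

The loops take one iteration per full cycle in the shift, so the modulus form is the cheaper choice when shifts can be large.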
First I converted the output (shift + text_input) to char, then wrapped each character back into the valid range.
function coded= caesar(text_input,shift)
coded=char(text_input+shift);
for i=1:length(coded)
while coded(i)<32
coded(i)=coded(i)+95;
end
while coded(i)>126
coded(i)=coded(i)-95;
end
end
Here is a short version:
function coded = caesar(v,n)
C = 32:126;
v = double(v);
for i = 1:length(v)
x = find(C==v(i));
C = circshift(C,-n);
v(i) = C(x);
C = 32:126;
end
coded = char(v);
end
dim a(100)
a(0)=9 : a(1)=3 : a(2)=-3 : a(3)=8 : a(4)=2
How can I find the size of the used part of the array (i.e. the used size here is 5)?
You have to count the non-empty elements:
Option Explicit
Function UsedElms(a)
UsedElms = 0
Dim i
For i = 0 To UBound(a)
If Not IsEmpty(a(i)) Then UsedElms = UsedElms + 1
Next
End Function
Dim a(5)
a(2) = 2
a(4) = 4
WScript.Echo "ub:", UBound(a), "sz:", UBound(a) + 1, "us:", UsedElms(a)
output:
cscript 23027576.vbs
ub: 5 sz: 6 us: 2
Here's a hacky one-liner that I just thought of. It essentially counts the number of empty elements by converting them to spaces and then trimming them off.
intLastIndex = UBound(a) - Len(Join(a, " ")) + Len(Trim(Join(a, " ")))
Just for fun! Don't go putting it into your production code. It would certainly be more efficient as a two-liner:
s = Join(a, " ")
intLastIndex = UBound(a) - Len(s) + Len(Trim(s))
Ekkehard has the proper answer here, though. This hack only works if your array is filled contiguously.
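The two ideas answer slightly different questions: one counts the non-empty slots, the other finds the last used index. Here is a Python sketch of the distinction, with `None` standing in for VBScript's Empty. Both helper names are hypothetical.

```python
def used_count(items):
    """Count slots that hold a value, wherever they sit in the list."""
    return sum(1 for item in items if item is not None)

def last_used_index(items):
    """Index of the last non-empty slot, or -1 if every slot is empty."""
    for i in range(len(items) - 1, -1, -1):
        if items[i] is not None:
            return i
    return -1

sparse = [None, None, 2, None, 4, None]
print(used_count(sparse), last_used_index(sparse))
```

For a contiguously filled array the count is always the last used index plus one; for a sparse array like the one above the two differ, which is why the trimming hack only suits the contiguous case.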
I am working with alphanumeric data from a mainframe. Due to the nature of the access point, the GetString method is used within a webbrowser interface to pull data from the mainframe. I am refactoring my code as well as older code to make use of data structures instead of merely range objects, as range object code takes far longer with large data sets.
As a part of general optimization practice, I run all large data set macros with Application.ScreenUpdating = False and Application.Calculation = xlCalculationManual active. To time it, I use QueryPerformanceCounter with a DoEvents after using the Counter in conjunction with the statusbar, so that it provides me the time it takes to complete a particular macro. The QueryPerformanceCounter is located in a Class Module and has played no direct role in executing the domain logic / business logic of my code.
For instance, I recently refactored code that pulled 10,000 or so strings from the mainframe screen and placed them into a worksheet via a loop. When refactored into a datastructure loop, the code takes around 70 seconds when shucking the strings into an array. The code is also more portable, in that those strings could as easily be shifted/placed to a dictionary for sorting or a collection for parsing. I am therefore switching all my VBA code from range-based to datastructures, and this is the lead-in/background for my question.
I came across some older code during an analysis project that has some interesting logic for pulling content from the mainframe. In essence, the code pulls content from the server in this layout form:
And then parses the content into this form in an Excel sheet using Worksheet/Cell logic as a framework:
The code, sans the login/access logic as well as sans subroutine declarations, is as follows:
Sub AcquireData()
CurrentServerRow = 13
WhileLoopHolder = 1
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)) <> "" Then
NewWorksheetLine_Sub
End If
Do While WhileLoopHolder = 1
If CurrentSession.Screen.Getstring(CurrentServerRow, 9, 1) = "-" Then
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow + 1, 15, 1)) <> "" Then
NewWorksheetLine_Sub
End If
ElseIf Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)) = "" Then
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14)) <> "" Then
Cells(WorksheetRow, ValueSets) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14))
ValueSets = ValueSets + 1
End If
Else
If CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1) = "" Then
Cells(WorksheetRow, WorksheetColumn) = "X"
Else
Cells(WorksheetRow, WorksheetColumn) = CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1)
End If
Cells(WorksheetRow, WorksheetColumn + 1) = CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)
Cells(WorksheetRow, WorksheetColumn + 2) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 17, 39))
Cells(WorksheetRow, ValueSets) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14))
WorksheetColumn = WorksheetColumn + 3
ValueSets = ValueSets + 1
End If
CurrentServerRow = CurrentServerRow + 1
If CurrentServerRow > 41 Then
WhileLoopHolder = 0
End If
Loop
End Sub
Sub NewWorksheetLine_Sub()
WorksheetRow = WorksheetRow + 1
WorksheetColumn = 1
ValueSets = 10
End Sub
This code is nested in a loop within another program, and thereby pulls thousands of lines and organizes them neatly. It also takes hours and wastes valuable time that could be used analyzing the data acquired from the server. I managed to refactor the basic code into a data structure, and used what I learned to refactor other code as well. Unfortunately, I refactored this particular code incorrectly, as I am unable to mimic the business logic correctly. My snippet is as follows:
Sub AcquireData()
'This code refactors the data into a datastructure from a range object, but does not really capture the logic.
'Also, There is an error in attempting to insert a variant array into a collection/dictionary data structure.
CurrentServerRow = 13
ReDim SourceDataArray(10)
WhileLoopHolder = 1
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)) <> "" Then
NewWorksheetLine_Sub
End If
Do While WhileLoopHolder = 1
If CurrentSession.Screen.Getstring(CurrentServerRow, 9, 1) = "-" Then
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow + 1, 15, 1)) <> "" Then
NewWorksheetLine_Sub
End If
ElseIf Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)) = "" Then
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14)) <> "" Then
ReDim Preserve SourceDataArray(ValueSets)
SourceDataArray(ValueSets) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14))
ValueSets = ValueSets + 1
ReDim Preserve SourceDataArray(ValueSets)
End If
Else
If CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1) = "" Then
ReDim Preserve SourceDataArray(WorkSheetColumn)
SourceDataArray(WorkSheetColumn) = "X"
Else
SourceDataArray(WorkSheetColumn) = CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1)
End If
SourceDataArray(WorkSheetColumn + 1) = CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)
SourceDataArray(WorkSheetColumn + 2) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 17, 39))
SourceDataArray(ValueSets) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14))
WorkSheetColumn = WorkSheetColumn + 3
ValueSets = ValueSets + 1
ReDim Preserve SourceDataArray(ValueSets)
End If
CurrentServerRow = CurrentServerRow + 1
If CurrentServerRow > 41 Then
WhileLoopHolder = 0
End If
Loop
End Sub
Sub NewWorksheetLine_Sub()
SourceIndexAsString = SourceCollectionIndex
SourceDataCollection.Add SourceDataArray(), SourceIndexAsString
SourceCollectionIndex = SourceCollectionIndex + 1
WorkSheetColumn = 1
ValueSets = 10
End Sub
I have considered that in order to use the same type of "cell" logic, I may want to use arrays nested within an array, and then transpose that to a worksheet. However, I have been thus far unsuccessful in implementing any such solution these past few weeks. Also, there may be a superior method of refactoring the logic to a datastructure form. However, I have been unable to determine how to do so successfully.
To summarize, my questions are as follows: In what way(s) can I shift "cell"-based logic to data structure logic? What is the best data structure for doing so? In this particular case, how can I implement data structure logic with this business logic?
Some of the use of ReDim Preserve seems problematic.
If CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1) = "" Then
ReDim Preserve SourceDataArray(WorkSheetColumn)
SourceDataArray(WorkSheetColumn) = "X"
So if WorkSheetColumn had the value 1, we would have reduced SourceDataArray to two entries (indexes 0 and 1, assuming the default Option Base 0) and discarded all of the data in the higher locations in the array.
Else
SourceDataArray(WorkSheetColumn) = CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1)
End If
SourceDataArray(WorkSheetColumn + 1) = CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)
SourceDataArray(WorkSheetColumn + 2) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 17, 39))
Now we are potentially looking at entries in SourceDataArray which don't exist (i.e. when the If branch above was followed rather than the Else branch), and we should get a "Subscript out of range" error.
ReDim Preserve only retains the data for array elements which make sense with the new array size. So if we have ReDim a(10) and then later have ReDim Preserve a(5) (and assume that arrays start at element 0, i.e. no Option Base 1), then a(6) through a(10) are now inaccessible and the data they contained is lost.
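The truncation behaviour can be modelled outside VBA. Here is a minimal Python sketch, assuming a zero-based one-dimensional array, with `None` standing in for Empty. The `redim_preserve` function is a hypothetical model, not a VBA API.

```python
def redim_preserve(values, new_upper):
    """Model ReDim Preserve a(new_upper) on a zero-based 1-D array."""
    new_size = new_upper + 1                 # upper bound 5 means indexes 0..5
    kept = values[:new_size]                 # anything past the new bound is dropped
    return kept + [None] * (new_size - len(kept))  # growth pads with empty slots

a = list(range(11))          # like ReDim a(10), filled with 0..10
a = redim_preserve(a, 5)     # shrink: indexes 6..10 are gone
a = redim_preserve(a, 10)    # grow again: the old values do not come back
print(a)
```

Shrinking and then growing is therefore not a no-op, which is the trap in the question's code.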
To refactor the code that uses cell references into an array you need to use a 2 dimensional array.
Cell references are 1 based, so you should stick to that in your array too.
You can copy Ranges to and from arrays using the Range.Value property
' Range to array
Dim a as Variant
a = Range("A1:J100").Value
will result in a being a variant array of size 1 To 100, 1 To 10
' Array to Range
Dim a(1 To 100, 1 To 10) as Variant
' populate a
' ...
' Put a into a range
Range("A1:J100").Value = a
These two code snippets result in the same output, but the second runs much faster
Dim r as Long, c as Long
For r = 1 To 1000
For c = 1 To 100
Cells(r, c) = r * c
Next c, r
Dim r as Long, c as Long
Dim a() as Variant
Redim a(1 To 1000, 1 To 100)
For r = 1 To 1000
For c = 1 To 100
a(r, c) = r * c
Next c, r
Range("A1:CV1000") = a
ReDim Preserve is a relatively expensive operation, so it's faster to ReDim in chunks
Rather than this
ReDim a(1 To 10, 1 To 1)
For i = 1 To 100000
    ReDim Preserve a(1 To 10, 1 To i)
    a(1, i) = SomeValue
Next
Do this instead
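ReDim a(1 To 10, 1 To 1000)
For i = 1 To 100000
    ' Grow the last dimension by a whole chunk only when it is full
    If i > UBound(a, 2) Then
        ReDim Preserve a(1 To 10, 1 To UBound(a, 2) + 1000)
    End If
    a(1, i) = SomeValue
Next
' Trim the unused tail once the loop has finished
ReDim Preserve a(1 To 10, 1 To i - 1)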
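To get a feel for the saving, you can count how often the array would have to be resized under each strategy. The Python sketch below only simulates the bookkeeping; it does no real allocation and measures no time. The `count_resizes` name is hypothetical.

```python
def count_resizes(n_items, chunk):
    """Count the grow operations needed to store n_items, growing by chunk each time."""
    capacity = chunk          # initial allocation, like ReDim a(1 To 10, 1 To chunk)
    resizes = 0
    for i in range(1, n_items + 1):
        if i > capacity:      # same test as If i > UBound(a, 2)
            capacity += chunk
            resizes += 1
    return resizes

print(count_resizes(100000, 1))      # growing one slot at a time
print(count_resizes(100000, 1000))   # growing in chunks of 1000
```

Each resize with Preserve copies the existing data, so cutting the number of resizes by about three orders of magnitude is where the speed-up comes from.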
Redim Preserve can only change the last dimension of a multi dimensional array.
Eg This works
Redim a(1 to 100, 1 To 10)
Redim Preserve a(1 to 100, 1 To 20)
This does not work
Redim a(1 to 100, 1 To 10)
Redim Preserve a(1 to 200, 1 To 20)
Usually when working with arrays representing ranges, it's the number of rows that varies most. This presents a problem, since the Range.Value array is (1 To Rows, 1 To Columns).
A workaround is to dimension your array as (1 To Columns, 1 To Rows), ReDim the number of rows as required, then Transpose into the destination range.
Dim r As Long, c As Long
Dim a() As Variant
ReDim a(1 To 100, 1 To 200)
For r = 1 To 1000
For c = 1 To 100
If r > UBound(a, 2) Then
ReDim Preserve a(1 To UBound(a, 1), 1 To UBound(a, 2) + 200)
End If
a(c, r) = r * c
Next c, r
Range("A1:CV1000") = Application.Transpose(a)
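The columns-first layout followed by a transpose can be pictured with nested lists. This is a Python sketch with small, made-up dimensions. The variable names are hypothetical, and `zip(*...)` plays the role of Application.Transpose.

```python
n_cols = 3
columns_first = [[] for _ in range(n_cols)]   # one inner list per column

# Rows arrive one at a time; appending grows only the last (row) dimension.
for r in range(1, 5):
    for c in range(1, n_cols + 1):
        columns_first[c - 1].append(r * c)

# Transpose once at the end to get the usual rows-by-columns shape.
rows_first = [list(row) for row in zip(*columns_first)]
print(rows_first)
```

The data is stored "sideways" while it grows and is flipped once at the end, so the cost of the transpose is paid a single time.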
If you need to vary both dimensions, changing the first dimension will require creating a new array of the required size and copying the data from the old array to the new one. Again, resize in chunks to avoid too many ReDims.
One last thing: you don't seem to Dim your variables (unless you've just left this part out of your post). I would recommend you use Option Explicit and Dim all your variables. This helps to avoid data type mistakes, and also avoids using Variant for everything. Variants are fine when you need them, but when you don't, other data types are usually faster.
Once I spent a few weeks refactoring other macros from range-based logic to abstracted data structure logic, the answer hit me once I returned to this macro. If I am merely mimicking the range logic so as to more quickly complete the macro, then I need only fill the array such that it matches the range once it is transposed. This means that I do not need to trim the array or in any way manipulate its form - I only need to fill the data structure in array form, and then transpose it to the spreadsheet. I can also make alternative use of the data once the array is filled up.
Here is the solution code:
Sub AcquireData()
'The array 'MyArray' was dimensioned as a dynamic array in the declarations section at the top of the module.
'Redim the array to a big 2 dimensional array that fits the needs of the data/macro.
ReDim MyArray(1 To 20, 1 To 20000)
'From here on, simply mimic the logic of the range macro... [i]
CurrentServerRow = 13
WhileLoopHolder = 1
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)) <> "" Then
NewWorksheetLine_Sub
End If
Do While WhileLoopHolder = 1
If CurrentSession.Screen.Getstring(CurrentServerRow, 9, 1) = "-" Then
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow + 1, 15, 1)) <> "" Then
NewWorksheetLine_Sub
End If
ElseIf Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)) = "" Then
If Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14)) <> "" Then
'[i] ... except, move the values into the array in Column, Row logic form.
MyArray(ValueSets, WorksheetRow) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14))
ValueSets = ValueSets + 1
End If
Else
If CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1) = "" Then
MyArray(WorksheetColumn, WorksheetRow) = "X"
Else
MyArray(WorksheetColumn, WorksheetRow) = CurrentSession.Screen.Getstring(CurrentServerRow, 5, 1)
End If
MyArray(WorksheetColumn + 1, WorksheetRow) = CurrentSession.Screen.Getstring(CurrentServerRow, 9, 7)
MyArray(WorksheetColumn + 2, WorksheetRow) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 17, 39))
MyArray(ValueSets, WorksheetRow) = Trim(CurrentSession.Screen.Getstring(CurrentServerRow, 58, 14))
WorksheetColumn = WorksheetColumn + 3
ValueSets = ValueSets + 1
End If
CurrentServerRow = CurrentServerRow + 1
If CurrentServerRow > 41 Then
WhileLoopHolder = 0
End If
Loop
ArrayToWorkSheet_Sub
End Sub
Sub NewWorksheetLine_Sub()
WorksheetRow = WorksheetRow + 1
WorksheetColumn = 1
ValueSets = 10
End Sub
'When finished with the loop, push the array to the worksheet, and transpose it to provide the correct column to row relationship in the spreadsheet.
Sub ArrayToWorkSheet_Sub()
Dim ArrayLimit As Long
Dim LastCell As Long
Dim MyRange As Range
'This level of precision in setting the range appears unnecessary, but in theory I think it could speed up transposing the array - [ii]
'[ii]but that is just speculation. Performance improvements for the transposition appear to be minor, perhaps due to the fact that [iii]
'[iii]most - if not nearly all - of the intense computations occur earlier.
With Sheets("Sheet2")
ArrayLimit = UBound(MyArray, 2)
LastCell = ArrayLimit + 1
Set MyRange = .Range("A2:S" & LastCell)
MyRange = WorksheetFunction.Transpose(MyArray)
End With
End Sub
While both Application.ScreenUpdating = False and Application.Calculation = xlCalculationManual are invaluable in reducing macro runtime, I have had very positive experiences with combining those two lines with the use of abstracted data structures. In certain cases, data structures appear to help optimize performance, especially where the macro involves extensive line-by-line data extraction.