How to deal with missing values in an exposure variable - random

I am examining the effect of A on B, where A is the exposure variable and B is the outcome variable. There are about 16777 records in the dataset, of which 21 have a missing value in A.
A is a binary variable coded 1 and 2, and my director asked me to treat missing as a value of its own, so A now has three values: 1, 0 and missing. However, I think the missing values are scarce and random. Should I remove the observations with missing values?

Related

what is pppm_lkg, pppm_dyn in McPAT?

Is there anybody working with McPAT?
Each power component in McPAT is actually multiplied by a constant array named pppm_lkg. They did not provide any information about this. They used a constant value only.
Could anyone tell me the meaning of pppm? Where does the value come from?

SPSS - assigning multiple numeric codes to one variable

I am trying to assign multiple codes to existing variables. I am using the syntax below, but it will only assign the first code entered for that hosp.id.number.
Syntax example:
Do if (hosp.id.number=9037) or (hosp.id.number=1058) or (hosp.id.number=11256).
Compute role_EM_communication=10.
Else if (hosp.id.number=9037).
Compute role_EM_communication=11.
End if.
Execute.
hosp.id.number 9037 needs to be coded both 10 and 11, but SPSS will only code it as 10. Is there any way to rephrase this so that SPSS will accept two or more codes for a variable such as hosp.id.number?
Your role_EM_communication variable is a single variable, but from what you are saying, I think you need it to be a set (for the same record, it could take on more than just one code). So you need to create n variables named role_EM_communication_1 to role_EM_communication_n, where n is the maximum number of codes you estimate will be possible for one record.
For your example, it would translate like this:
create the 2 variables:
vector role_EM_communication_ (2, f2.0).
do the first recode:
if any(hosp.id.number,9037,1058,11256) role_EM_communication_1=10.
very important - execute the recode
exe.
check if the first variable has data, and populate the second variable if true:
if miss(role_EM_communication_1) and any(hosp.id.number,9037) role_EM_communication_1=11.
if ~miss(role_EM_communication_1) and any(hosp.id.number,9037) role_EM_communication_2=11.
exe.
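The slot-filling idea above can be sketched outside SPSS as well. This is a minimal Python illustration, not SPSS syntax: the function name `assign_codes`, the `RULES` layout and the default of two slots are assumptions made for the sketch. Each rule that matches a record writes its code into the next free slot, which is what the `role_EM_communication_1` to `role_EM_communication_n` variables do.

```python
def assign_codes(hosp_id, rules, n_slots=2):
    """Return a list of n_slots codes for one record; unused slots stay None."""
    slots = [None] * n_slots
    free = 0  # index of the next empty slot
    for ids, code in rules:
        if hosp_id in ids and free < n_slots:
            slots[free] = code
            free += 1
    return slots

# Hypothetical rules mirroring the example: (set of hospital ids, code to assign)
RULES = [
    ({9037, 1058, 11256}, 10),
    ({9037}, 11),
]

print(assign_codes(9037, RULES))  # [10, 11]
print(assign_codes(1058, RULES))  # [10, None]
```

Hospital 9037 matches both rules and so ends up with both codes, while 1058 only fills the first slot.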

Logic for parsing names

I want to solve this problem, but I am unsure how to structure the logic for it. I am given a list of user names and am told to find an extracted name for each one. So, for example, I'll see a list of user names such as this:
jason
dooley
smith
rob.smith
kristi.bailey
kristi.betty.bailey
kristi.b.bailey
robertvolk
robvolk
k.b.dula
kristidula
kristibettydula
kristibdula
kdula
kbdula
alexanderson
caesardv
joseluis.lopez
jbpritzker
jean-luc.vey
dvandewal
malami
jgarciathome
christophertroethlisberger
How can I then turn each user name into an extracted name? The only parameter I am given is that every user name is guaranteed to have at least a partial person's name.
So for example, kristi.bailey would be turned into "Kristi Bailey"
alexanderson would be turned into "Alex Anderson"
So, the pattern I see is that if there is one period, I split the user name into two strings (possibly a first and last name). If there are two periods, the three parts would be first, middle and last name. The part I am having trouble finding the logic for is when the name is just clumped together, like alexanderson or jgarciathome. How can I turn that into an extracted name? I was thinking of separating the names wherever I see 2 consonants and a vowel in a row, but I don't think that'll work.
Any ideas?
I'd use the string.StartsWith and string.EndsWith methods and determine the maximum overlap for each pair of names. As long as the overlap is more than 2 characters, call that the common name. Sort the user names into buckets based on the common name. It's a naive implementation, but that's where I'd start.
Example:
string name1 = "kristi.bailey";
string name2 = "kristi.betty.bailey";
// We've got a 6 character overlap for first name:
name2.StartsWith(name1.Substring(0,6)); // this is true
// We've got a 6 character overlap for last name:
name2.EndsWith(name1.Substring(7)); // this is true
HTH!
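To make the overlap idea concrete, here is a minimal Python sketch rather than C#. The helper names `shared_first` and `shared_last`, and the choice to cut the overlap at a period, are assumptions made for the illustration. It only finds the shared start and end of two user names; the bucketing step is not shown.

```python
import os

MIN_OVERLAP = 3  # "more than 2 characters", as suggested above

def shared_first(a, b):
    """Longest shared start of a and b, cut at the first period."""
    prefix = os.path.commonprefix([a, b]).split(".")[0]
    return prefix if len(prefix) >= MIN_OVERLAP else ""

def shared_last(a, b):
    """Longest shared end of a and b, cut at the last period."""
    suffix = os.path.commonprefix([a[::-1], b[::-1]])[::-1].split(".")[-1]
    return suffix if len(suffix) >= MIN_OVERLAP else ""

print(shared_first("kristi.bailey", "kristi.betty.bailey"))  # kristi
print(shared_last("kristi.bailey", "kristi.betty.bailey"))   # bailey
print(shared_first("robvolk", "robertvolk"))                 # rob
print(shared_last("robvolk", "robertvolk"))                  # volk
```

Comparing robvolk with robertvolk is what exposes the boundary inside the clumped name: the shared end "volk" suggests a last name even though there is no period to split on.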

Create variable where I can reference its values by their position COBOL

First, let me thank you for taking your time to take a look at my question.
I'm studying COBOL and I've been given an exercise where I have to create a variable that holds a number of cities from a country. I know how to create this with hierarchical level numbers, like:
01 COUNTRY.
02 CITY-A PIC A(5) VALUE "TOKYO".
02
02
etc, etc.
The issue here is that I somehow need these values to be able to be referenced by their position. For example, I should be able to reference the CITY-A "TOKYO" by a number. Is there any way to do this? I just can't seem to figure this out.
Any help would be greatly appreciated.
Thanks again!
You need to define your table of cities. Continuing in the style you have shown, here is an example with VALUE clauses.
01 CITY-TABLE.
05 FILLER PIC X(30) VALUE "TOKYO".
05 FILLER PIC X(30) VALUE "KYOTO".
05 FILLER PIC X(30) VALUE "AIZUWAKAMATSU".
... another 47 of these
Note that all the items must be the same length. "TOKYO" is five characters plus 25 trailing spaces. "AIZUWAKAMATSU" is more characters, and fewer trailing spaces, but still 30 bytes. If you define your cities with fields of different lengths, you will not be able to reference them "by a number".
You will then need to use REDEFINES to give a different mapping of the data, in this case to give it a name which can be used for different occurrences in the table.
01 FILLER REDEFINES CITY-TABLE.
05 CITY-NAME PIC X(30) OCCURS 50 TIMES.
With your data-structure defined, you can try using it.
MOVE "TOKYO" TO CITY-NAME
Actually, you can't do that. The compiler will not let you. There are 50 CITY-NAME elements, and the compiler requires that you tell it which one you want to use, any time you use CITY-NAME.
To access an element in a table, you need to use a subscript.
A subscript follows its data-name ("qualified" if necessary), and is enclosed in brackets/parentheses.
MOVE "TOKYO" TO CITY-NAME ( 1 )
Now that will compile. It is using a numeric integer literal as a subscript. Not useful in every case, as often we will be "looping" to use a table.
MOVE "TOKYO" TO CITY-NAME ( some-name )
Here some-name can be data defined by the programmer (as a numeric integer) or as an "index", a numeric integer which the compiler will manage.
To establish an "index", you code INDEXED BY some-name on the item containing the OCCURS clause.
If you use a data-name as a subscript, it is just a data-name, and you are largely unlimited with the COBOL Verbs you can use with it.
If you use an index-name (the INDEXED BY) you can only change the value of the index-name with SET, PERFORM and SEARCH.
PERFORM VARYING some-name FROM 1 BY 1
UNTIL input-byte ( some-name ) EQUAL TO SPACE
OR some-name GREATER THAN 10
...
END-PERFORM
That is not such a good loop, but it is the sort of thing which will get you started.
Once complete, some-name will either be 11 (space not found) or it will hold the position in the table of the first space in the data.
In the above PERFORM, some-name could either be an index-name or a data-name. The results will be the same, what the compiler generates will be different.
However, as a beginner, you should not need to be concerned with the differences between using a data-name as a subscript and using an index-name as a subscript. Your results will be the same, the generated code will be different. Leave it at that until you have some more experience.
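As a rough illustration of what that PERFORM leaves in some-name, here is a minimal Python sketch, not COBOL. The function name `first_space_position` is made up for the sketch, and it assumes the data is at least `limit` characters long. Unlike the COBOL condition as written, it checks the bound before looking at the data, so it never reads an eleventh element.

```python
def first_space_position(input_bytes, limit=10):
    """Return the 1-based position of the first space, or limit + 1 if none."""
    pos = 1  # FROM 1
    # The UNTIL test runs before each pass, as in a plain PERFORM VARYING.
    # The bound is checked first here so Python never indexes past the data.
    while not (pos > limit or input_bytes[pos - 1] == " "):
        pos += 1  # BY 1
    return pos

print(first_space_position("ABC DEFGHI"))  # 4
print(first_space_position("ABCDEFGHIJ"))  # 11
```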
You can have multi-dimensional tables. You can "offset" a subscript by a positive or negative amount. There is a hierarchy of use for subscripts where performance is concerned. There is a whole lot to understand before you can accurately make use of "one being faster than another" (you can easily code the fastest access, and then lose much more than you save by doing the loop in a dumb way). There is also an index data item.
All of those things can be for later. Walk well first.
A final confusion (there are lots of confusions with subscripting) is reference-modification. It looks a bit like using a subscript, but isn't one, although it can be used as a tortuous (from the point of view of later understanding) method of accessing a textual table.
MOVE "TOKYO" TO CITY-TABLE ( 1 : 30 )
Note that the table-name, rather than the entry-name, is being used. The colon (:) tells that it is reference-modification. Before the colon is the start-position, after the colon is the length. Both the start-position and the length can be data-names (but not index-names).
However,
MOVE CITY-TABLE ( VAR1 : 10 )
is a typical sloppy usage of reference-modification, and leaves the reader wondering what is being done with that data.
A subscripted item can be reference-modified, but again that is for later.
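The relationship between a subscript and reference-modification on a fixed-length table can be sketched in Python (not COBOL). The helper names `ref_mod` and `city_name` are assumptions for the illustration, and only two of the cities are loaded. The sketch shows why every entry must be 30 bytes: subscript n is just the 30 bytes starting at position (n - 1) * 30 + 1.

```python
ENTRY_LEN = 30  # every entry is PIC X(30)

cities = ["TOKYO", "KYOTO"]
# Pad each name with trailing spaces, as the VALUE clauses do, and concatenate.
city_table = "".join(name.ljust(ENTRY_LEN) for name in cities)

def ref_mod(data, start, length):
    """Mimic data ( start : length ) with a 1-based start position."""
    return data[start - 1:start - 1 + length]

def city_name(table, subscript):
    """Mimic CITY-NAME ( subscript ) with a 1-based subscript."""
    return ref_mod(table, (subscript - 1) * ENTRY_LEN + 1, ENTRY_LEN)

print(city_name(city_table, 2).rstrip())  # KYOTO
print(ref_mod(city_table, 1, 5))          # TOKYO
```

If the entries had different lengths, the start position of entry n could no longer be computed from n alone, which is the reason given above for padding every city to the same size.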

Numeric Filter and missing values (Weka)

I'm using SMOTE to oversample my dataset, which is affected by class imbalance. Some of my attributes have integer values and others have only two decimals, but SMOTE creates new instances with many decimals. To solve this problem I thought of using the NumericCleaner filter and setting the number of decimals I want. This seems to work, but I have a problem with missing values: each missing value is replaced with 0.0, and I need to evaluate my model with the missing values still in the dataset. So how can I use NumericCleaner (or another filter that can round values) and keep my missing values?
Very interesting question. Okay, here is the solution:
use SMOTE to oversample the minority group (this produces decimal points but the missing values remain missing values)
then select weka filter->unsupervised->attribute->NumericTransform
then click on this filter, set the attributes it should apply to (the features that have the unwanted decimals), and in methodName put "ceil" instead of "abs".
I hope that solves the problem.
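The behaviour being relied on here, transforming the present values while leaving missing ones alone, can be sketched in plain Python. This is not Weka code: the function name `transform_keep_missing` and the use of `None` for a missing value are assumptions for the illustration. Note that "ceil" rounds up to a whole number, so the second call shows rounding to two decimals as an alternative for attributes that should keep them.

```python
import math

def transform_keep_missing(values, func):
    """Apply func to every present value; None (missing) is passed through."""
    return [None if v is None else func(v) for v in values]

column = [1.25, None, 3.0, 2.7181]

print(transform_keep_missing(column, math.ceil))
# [2, None, 3, 3]
print(transform_keep_missing(column, lambda v: round(v, 2)))
# [1.25, None, 3.0, 2.72]
```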
