How to use multiple conditions to generate new variables in Stata?

I want to generate 3 new variables from these variables in my data set:
Ucod
19 variables in a series named Record_2, Record_3, ..., Record_20
All of them hold alphanumeric values, namely ICD codes, e.g. I150.
Each new variable should satisfy one of three conditions:
People dying primarily of COVID (Var1 = 1 if Ucod == U07.1)
People dying of a non-COVID condition WITH COVID (Var2 = 1 if Ucod != U07.1 & any of Record_2 to Record_20 == U07.1)
People dying of a non-COVID condition WITHOUT COVID (Var3 = 1 if Ucod != U07.1 & none of Record_2 to Record_20 == U07.1)
Can anyone suggest code that generates these 3 variables from these 3 conditions?

This may help. Note how I needed to define a toy dataset to give flavour to the problem.
* Example generated by -dataex-.
clear
input str5(Ucod Record_2) str4(Record_3 Record_4)
"U07.1" "U000" "U111" "U222"
"U999" "U07.1" "U444" "U333"
"U888" "U777" "U666" "U555"
end
gen wanted1 = Ucod == "U07.1"
gen count = 0
quietly foreach v of var Record_* {
    replace count = count + (`v' == "U07.1")
}
gen wanted2 = Ucod != "U07.1" & count > 0
gen wanted3 = Ucod != "U07.1" & count == 0
list
     +------------------------------------------------------------------------------+
     |  Ucod   Record_2   Record_3   Record_4   wanted1   count   wanted2   wanted3 |
     |------------------------------------------------------------------------------|
  1. | U07.1       U000       U111       U222         1       0         0         0 |
  2. |  U999      U07.1       U444       U333         0       1         1         0 |
  3. |  U888       U777       U666       U555         0       0         0         1 |
     +------------------------------------------------------------------------------+
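For readers who want to sanity-check the logic outside Stata, here is a rough Python sketch of the same three flags. It is not part of the Stata answer: the function name classify and the in-memory rows are made up for illustration, and the data simply mirror the toy dataset above.

```python
COVID = "U07.1"

def classify(ucod, records):
    """Return (wanted1, wanted2, wanted3) as 0/1 flags for one row.

    ucod is the underlying cause code; records stands in for the
    Record_* codes of the same row (hypothetical layout).
    """
    # Same idea as the Stata loop: count how many Record_* fields equal U07.1.
    count = sum(1 for code in records if code == COVID)
    wanted1 = int(ucod == COVID)
    wanted2 = int(ucod != COVID and count > 0)
    wanted3 = int(ucod != COVID and count == 0)
    return wanted1, wanted2, wanted3

rows = [
    ("U07.1", ["U000", "U111", "U222"]),
    ("U999", ["U07.1", "U444", "U333"]),
    ("U888", ["U777", "U666", "U555"]),
]

flags = [classify(ucod, records) for ucod, records in rows]
for row_flags in flags:
    print(row_flags)
```

Each row ends up with exactly one flag set, which matches the wanted1, wanted2 and wanted3 columns in the listing above.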

Related

Get line number where first occurrence of a value appears?

I have a CSV file like below:
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
I'm trying to find the row number where the final column has a value below a certain threshold, for example below 0.6.
Using the above CSV file, I want to return 3 because E = 3 is the first row where Mean <= 0.60. If there is no such value, I want to return 0. In effect, I am returning the value in the first column based on the final column.
I plan to initialize this number as a constant in gnuplot. How can this be done? I've tagged awk because I think it's related.
In case you want a gnuplot-only version: if you read from a file, remove the datablock and replace $Data with your filename in double quotes.
Edit: You can do it without a dummy table; it is shorter with stats (check help stats). It is even shorter than the accepted solution (well, we are not at code golf here), and additionally platform-independent because it's gnuplot-only.
Furthermore, in case E could be any number, including 0, it might be better to first assign E = NaN and then compare E to NaN (see here: gnuplot: How to compare to NaN?).
Script:
### conditional extraction into a variable
reset session
$Data <<EOD
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
EOD
E = NaN
stats $Data u ($8<=0.6 && E!=E? E=$1 : 0) nooutput
print E
### end of script
Result:
3.0
Actually, OP wants to return E=0 if the condition was not met. Then the script would be like this:
E=0
stats $Data u ($8<=0.6 && E==0? E=$1 : 0) nooutput
Another awk. You could initialize the default return value in the variable ret in BEGIN, but since it's 0 there is really no point, as an empty variable plus 0 evaluates to 0 anyway. If no value below the 0.6 threshold is found before END is reached, 0 is printed. If one is found, exit jumps to END and the stored ret is output:
$ awk '
NR>1 && $NF<0.6 {    # final column has a value below a certain range
    ret=$1           # I want to return 3 because E = 3
    exit
}
END {
    print ret+0
}' file
Output:
3
Something like this should do the trick:
awk 'NR>1 && $8<.6 {print $1;fnd=1;exit}END{if(!fnd){print 0}}' yourfile
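The awk one-liners above can also be sketched in Python. This is only an illustration under the same assumptions as the awk answers: the fields are whitespace-separated, the first line is a header, and the comparison is a strict "below 0.6". The function name first_row_below and the inline sample are made up for the example.

```python
def first_row_below(lines, threshold=0.6):
    """Return the first-column value of the first data row whose last
    column is below threshold, or 0 if no row qualifies."""
    for line_no, line in enumerate(lines):
        fields = line.split()
        if line_no == 0 or not fields:
            continue  # skip the header line and any blank lines
        if float(fields[-1]) < threshold:
            return int(fields[0])
    return 0

sample = """E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
""".splitlines()

result = first_row_below(sample)
print(result)
```

Returning as soon as the first match is found plays the same role as exit in the awk versions, and the final return 0 covers the case where no row qualifies.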

How to identify customers who didn't make or use incoming calls, outgoing calls, or internet during the churn phase?

I'm trying to solve a problem where data sets are below:
Cust_Id period Total_Incoming_Call Total_outgoing_call Net_uses
123 09/01/2018 0 0 2
234 09/02/2018 0 0 0
345 09/03/2018 1 40 1
abc1 09/04/2018 0 0 0
I'd like to get the output in below:
Cust_Id Period Total_Incoming_call Total_outgoing_call Net_uses
234 09/02/2018 0 0 0
abc1 09/04/2018 0 0 0
I know how to filter a pandas data frame on one column, but I'm not sure how to filter on multiple columns so that I can tag these rows as churn customers.
cust = pd.read_csv(....../.csv)
cust = cust[cust.net_uses == 0]
cust = cust[cust.Total_incoming_call == 0]
Should I use the line below, or is there a better method?
cust = cust[(cust.total_incoming_call == 0) & (cust.net_uses == 0)]
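cust = cust[(cust.total_incoming_call == 0) & (cust.net_uses == 0)] works just fine, provided the attribute names match your column headers exactly; pandas column names are case-sensitive, so with the header shown above they would be Total_Incoming_Call and Net_uses.
You can also use .loc for the same purpose:
cust = cust.loc[(cust.total_incoming_call == 0) & (cust.net_uses == 0), :]
In case you want to keep every row and replace the values with NaN wherever the condition is False:
cust = cust.where((cust.total_incoming_call == 0) & (cust.net_uses == 0))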
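Since the question asks about all three usage columns, here is a small sketch that checks them together. It assumes the column names exactly as they appear in the sample header, and it builds the frame inline instead of reading the asker's CSV; the names usage_cols, no_usage and churned are made up for the example.

```python
import pandas as pd

# Inline copy of the sample data from the question.
cust = pd.DataFrame({
    "Cust_Id": ["123", "234", "345", "abc1"],
    "period": ["09/01/2018", "09/02/2018", "09/03/2018", "09/04/2018"],
    "Total_Incoming_Call": [0, 0, 1, 0],
    "Total_outgoing_call": [0, 0, 40, 0],
    "Net_uses": [2, 0, 1, 0],
})

usage_cols = ["Total_Incoming_Call", "Total_outgoing_call", "Net_uses"]

# True for rows where every usage column is 0.
no_usage = (cust[usage_cols] == 0).all(axis=1)

churned = cust[no_usage]
print(churned)
```

Comparing a list of columns at once and reducing with all(axis=1) avoids chaining one & per column, which may be easier to maintain if more usage columns are added later.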

IF, invalid syntax error

if volt.isalpha() or res.isalpha() or amp.isalpha():
What did I do wrong here? I get an INVALID SYNTAX error. I am using this in a calculator program I am making. It calculates voltage, resistance, and amperage. But that's the easy part; I am just trying to make it foolproof. I have 3 variables in the code (volt, amp, res) that are entered by the user. I just want to make sure that they don't type in anything stupid, like letters, e.g. ...
try:
    float(volt) >= 0 and float(res) >= 0 and float(amp) >= 0
    print("")
    print("You put a value for everything. You don't need the calculator.")
    allowed = 0
if volt.isalpha() or res.isalpha() or amp.isalpha():
    print("You typed in characters for one of the values, this calculator doesn't use letters.")
    allowed = 0
def find_voltage(a,b): # V = I * R
    voltage = a * b
    return(voltage)
You don't have an except block after try; a try needs at least one except or finally clause. Do something like:
try:
    float(volt) >= 0 and float(res) >= 0 and float(amp) >= 0
    print("")
    print("You put a value for everything. You don't need the calculator.")
    allowed = 0
except ValueError:
    print("Oops, you messed up.")
Additionally, the line
float(volt) >= 0 and float(res) >= 0 and float(amp) >= 0
computes a True/False value and then throws it away. Its only effect is that float() raises ValueError for non-numeric input. You'll need to assign it to a variable, then check that variable: if True, do one thing; if False, do something else.
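One way to put both points together (catching ValueError and actually using the result of the comparison) is sketched below. The helper names parse_non_negative and all_valid are made up for illustration, and the inputs are hard-coded strings standing in for whatever the user typed.

```python
def parse_non_negative(text):
    """Return text as a float, or None if it is not a non-negative number."""
    try:
        value = float(text)
    except ValueError:
        return None  # not a number at all, e.g. "abc"
    return value if value >= 0 else None

def all_valid(volt, res, amp):
    """True only if all three inputs parse as non-negative numbers."""
    return all(parse_non_negative(v) is not None for v in (volt, res, amp))

# The result is stored in a variable and then checked, as suggested above.
allowed = all_valid("12", "4.7", "2.5")
if allowed:
    print("All three values are valid numbers.")
else:
    print("At least one value is not a non-negative number.")
```

Letting float() decide what counts as a number also rejects mixed input such as "12a", which isalpha() alone would let through.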

ruby multiple loop sets but with limited rows per set

Alrightie, so I'm building a CSV file this time with Ruby. The outer loop runs num_of_loops times, but each pass writes an entire set of days rather than stopping at the requested row. I want to change the first column of a CSV file to a new name for each row.
If I do this:
class_days = %w[Wednesday Thursday Friday]
num_of_loops = (num_of_loops / class_days.size).ceil
num_of_loops.times {
  ["Wednesday","Thursday","Friday"].each do |x|
    data[0] = x
    data[4] = classname()
    # Write all to file
    #
    csv << data
  end
}
Then the loop will run only 3 times for a 5 row request.
I'd like it to run the full 5 rows such that instead of stopping at Wed/Thurs/Fri it goes to Wed/Thurs/Fri/Wed/Thurs instead.
class_days = %w[Wednesday Thursday Friday]
num_of_loops.times do |i|
  data[0] = class_days[i % class_days.size]
  data[4] = classname
  csv << data
end
The interesting part is here:
class_days[i % class_days.size]
We need an index into class_days that is between 0 and class_days.size - 1. We can get that with the % (modulo) operator. That operator yields the remainder after dividing i by class_days.size. This table shows how it works:
i    i % 3
0    0
1    1
2    2
3    0
4    1
5    2
...
The other key part is that the times method yields indices starting with 0.
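The same wrap-around indexing can be tried out quickly in Python. This is only an illustration of the modulo idea, not a translation of the CSV code; num_rows is a made-up stand-in for the 5-row request from the question.

```python
class_days = ["Wednesday", "Thursday", "Friday"]
num_rows = 5  # hypothetical: the number of rows requested

# i runs 0, 1, 2, 3, 4; i % 3 runs 0, 1, 2, 0, 1
schedule = [class_days[i % len(class_days)] for i in range(num_rows)]
print(schedule)
```

The list stops after exactly num_rows entries and wraps back to Wednesday after Friday, which is the Wed/Thurs/Fri/Wed/Thurs pattern the question asks for.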

Ruby data extraction from a text file

I have a relatively big text file with blocks of data layered like this:
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
(they contain more lines and then are repeated)
I would like first to extract the numerical value after TUNE X = and output these in a text file. Then I would like to extract the numerical value of LINE FREQUENCY and AMPLITUDE as a pair of values and output to a file.
My question is the following: although I could make something more or less working using a simple regexp, I'm not convinced that it's the right way to do it, and I would like some advice or code examples showing how to do this efficiently with Ruby.
Generally, (not tested)
toggle = false
File.foreach("file") do |line|
  if line[/ANALYSIS/]
    toggle = false   # a new block starts: stop reading table rows
    next
  end
  if line[/TUNE/]
    puts line.split("=", 2)[-1].strip
    next
  end
  if line[/Line\s+Frequency/]
    toggle = true    # the rows after this header hold the data
    next
  end
  if toggle
    a = line.split
    puts "#{a[1]} #{a[2]}" if a.size >= 3
  end
end
Go through the file line by line, check for /TUNE/, then split on "=" to get the last item.
Do the same for lines containing /Line Frequency/ and set the toggle flag to true. This signifies that the following lines contain the data you want. Note that the flag has to be true/false rather than 1/0, because 0 is truthy in Ruby and "if toggle" would always succeed. Since the frequency and amplitude are fields 2 and 3, split each line and take those positions. Generally, this is the idea. As for toggling, the flag is set back to false at the start of the next block using a pattern (here ANALYSIS; SIGNAL or CASE would work as well).
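tune_x = []
frequency = []
amplitude = []
File.foreach("data.dat") do |line|
  # scan returns an array of capture groups; flatten gives the matched strings
  tune_x_scan = line.scan(/TUNE X = (\d*\.\d*)/).flatten
  data_scan = line.scan(/(-?\d*\.\d+E[-+]\d+)/).flatten
  tune_x << tune_x_scan[0] unless tune_x_scan.empty?
  next if data_scan.size < 2
  frequency << data_scan[0]   # first number in exponent notation on the row
  amplitude << data_scan[1]   # second number in exponent notation on the row
end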
There are lots of ways to do it. This is a simple first pass at it:
text = 'ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 1.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 1.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 1.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 2.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 2.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 2.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
'
require 'stringio'
pretend_file = StringIO.new(text, 'r')
That gives us a StringIO object we can pretend is a file. We can read from it by lines.
I changed the numbers a bit just to make it easier to see that they are being captured in the output.
pretend_file.each_line do |li|
  case
  when li =~ /^TUNE.+?=\s+(.+)/
    print $1.strip, "\n"
  when li =~ /^\d+\s+(\S+)\s+(\S+)/
    print $1, ' ', $2, "\n"
  end
end
For real use you'd want to change the print statements to a file handle: fileh.print
The output looks like:
# >> 0.2561890123390808
# >> 0.2561890123391E+00 0.204316425208E-01
# >> 0.2562865535359E+00 0.288712798671E-01
# >> 1.2561890123390808
# >> 1.2561890123391E+00 0.204316425208E-01
# >> 1.2562865535359E+00 0.288712798671E-01
# >> 2.2561890123390808
# >> 2.2561890123391E+00 0.204316425208E-01
# >> 2.2562865535359E+00 0.288712798671E-01
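For comparison, the same two patterns can be sketched in Python, collecting the values into lists instead of printing them. This is an illustrative port rather than part of the Ruby answer: the names TUNE_RE, ROW_RE and extract are made up, and the sample is the first block from the question.

```python
import re

# Mirrors the two Ruby patterns: a TUNE line, and a row that starts with a line number.
TUNE_RE = re.compile(r"^TUNE.+?=\s*(\S+)")
ROW_RE = re.compile(r"^\s*\d+\s+(\S+)\s+(\S+)")

def extract(lines):
    """Return (tunes, pairs): the TUNE values and (frequency, amplitude) pairs."""
    tunes, pairs = [], []
    for line in lines:
        match = TUNE_RE.match(line)
        if match:
            tunes.append(float(match.group(1)))
            continue
        match = ROW_RE.match(line)
        if match:
            pairs.append((float(match.group(1)), float(match.group(2))))
    return tunes, pairs

sample = """ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
"""

tunes, pairs = extract(sample.splitlines())
print(tunes)
print(pairs)
```

Converting with float() handles the E+00 style exponents directly; if the original text form of the numbers has to be preserved in the output file, keeping the captured strings instead would be the safer choice.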
You can also read your file line by line and cut each line by character position, for example: to extract TUNE X, take characters 10 to 27 on line 2; to extract LINE FREQUENCY, take characters 3 to 22 on line 6+n.
