Latent class model using lclogit: numeric overflow error

I am new to Stata. After running a conjoint analysis, I ran into a problem during the latent class analysis.
There are 6 conjoint properties (program, accessibility, accommodation type, facilities, privacy, price), and the levels are pro (3 levels), acc (3), stay (3), fac (3), privacy (2), price (3).
I used Stata 14.
The command syntax is like this.
lclogit choice pro pro1 acc stay stay1 fac fac1 privacy price, group(gid) id(pid) ncl(2) member(gender married household disease)
Dependent variable: choice
Independent variables: pro pro1 acc stay stay1 fac fac1 privacy price
Membership variables: gender married household disease
An error message appears like this:
3,000 (group size) take 1,000 (# positives) combinations results in numeric overflow; computations cannot proceed
r(1400);
The description of return code 1400 is:
numerical overflow;
You have attempted something that, in the midst of the
necessary calculations, has resulted in something too large
for Stata to deal with accurately. Most commonly, this is
an attempt to estimate a model (say with `regress`) with more
than 2,147,483,647 effective observations. This effective
number could be reached with far fewer observations if you
were running a frequency-weighted model.
I tried different membership variables and reran the model several times, but I got the same error.
Please advise.

Related

How can I write a complex ifelse algorithm that handles dates and times?

I have a data-management problem. I have a database where "EDSS.1", "EDSS.2", ... represent a numeric variable, scaled from 0 to 10 in 0.5 steps, where higher numbers stand for higher disability. For each EDSS variable, I have a corresponding "VISITDATE.1", "VISITDATE.2", ...
Now I am interested in assessing the CONFIRMED DISABILITY PROGRESSION (CDP), which is an increase of 1 point on the EDSS. To make things more difficult, this increase needs to be confirmed at the subsequent visit (e.g. EDSS.3), which has to be at least 6 months later (that is, VISITDATE.3 - VISITDATE.2 >= 6 months).
To do so, I am creating a nested ifelse statement, as shown below.
prova <- prova %>% mutate(
  CDP = ifelse(EDSS.2 > EDSS.1 & EDSS.3 >= EDSS.2 &
                 difftime(VISITDATE.3, VISITDATE.2, units = "weeks") > 48,
               print(ymd(VISITDATE.2)), 0))
However, I am facing two main problems:
How can I print the VISITDATE of interest instead of 1 or 0?
How can I extend my code to EDSS.2 vs EDSS.3, and so on? I am interested in finding all the confirmed disability progressions (CDPs).
Many thanks to everyone who finds the time to answer me.
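For reference, here is a minimal sketch of the visit-pair loop in Python/pandas rather than R. It assumes wide columns named EDSS.1, EDSS.2, ... and VISITDATE.1, VISITDATE.2, ... already parsed as dates, and approximates the 6-month confirmation window as 182 days (both assumptions); the same loop structure carries over to dplyr/lubridate.
# Sketch only: scan consecutive visit pairs and record the date of each
# candidate progression that is confirmed at a later, sufficiently distant visit.
import pandas as pd

def confirmed_progressions(row, n_visits, min_gap_days=182):
    dates = []
    for i in range(1, n_visits - 1):
        baseline = row[f"EDSS.{i}"]
        candidate = row[f"EDSS.{i + 1}"]
        confirmation = row[f"EDSS.{i + 2}"]
        gap = row[f"VISITDATE.{i + 2}"] - row[f"VISITDATE.{i + 1}"]
        # increase of >= 1 point, still present at a visit >= ~6 months later
        if (candidate - baseline >= 1.0
                and confirmation >= candidate
                and gap >= pd.Timedelta(days=min_gap_days)):
            dates.append(row[f"VISITDATE.{i + 1}"])  # the progression date, not 1/0
    return dates

# e.g. prova["CDP_dates"] = prova.apply(confirmed_progressions, axis=1, n_visits=5)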

Algorithm for planning the creation of an object from dependencies

I am an alchemist. I can make things out of other things according to my recipe book. For instance:
2 lead + 1 bismuth -> 1 carbon
1 oxygen + 5 hydrogen + 3 nitrogen -> 2 carbon
5 carbon + 5 titanium -> 1 gold
...etc.
My recipe book contains thousands of recipes, each of which consumes some discrete amount of one or more inputs and produces a discrete amount of one output. Being a lazy alchemist, I don't want to remember all my recipes. I want to write a computer program to solve this problem for me. The input to the program is a description of what I want, like "2 gold", and a description of what I have in stock, like "5 titanium, 6 lead, 3 bismuth, 2 carbon, 1 gold". The output should be either "cannot be made" or a sequence of instructions for creating the thing. For the example given here, the output could be:
make 2 carbon out of 4 lead + 2 bismuth
make 1 gold out of 4 carbon + 4 titanium
Then, combined with the 1 gold I already have, I have the 2 gold I wanted.
One last note: the recipes are weighted; e.g. I prefer to make carbon out of lead and bismuth if I can.
Is there an elegant way to formulate and solve this problem? A naive recursive solution looks tempting, but I can think of recipe sets that would cause it to do an exponential amount of work.
(And, as a followup, someday my research might uncover a circular set of recipes---maybe I can make 1 hydrogen out of 1 helium and 1 helium out of 1 hydrogen---and I would like to be able to handle this case as well.)
The problem is NP-hard.
Given an instance of CNF-SAT, prepare alchemical tables with reagents for
each variable
each literal
each clause (unsatisfied version)
each clause (satisfied version)
the output.
The reactions are
variable to large supply of corresponding positive literal
variable to large supply of corresponding negative literal
clause (unsatisfied version) and satisfying literal to clause (satisfied version)
all clauses (satisfied versions) to the output.
The question is whether we can make the output given one of each variable and one of each clause (unsatisfied version).
This problem is related to the problem of determining reachability of vector addition systems/Petri nets; my reduction is based in part on reductions that appeared in that literature.
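Despite the hardness result, a baseline solver is still workable for small stockpiles. Below is a rough uniform-cost (Dijkstra-style) search over inventory states in Python; the recipe encoding mirrors the question's examples, but the per-application weights and the step cap are assumptions. Because it tracks visited inventories rather than recursing on recipe definitions, circular recipes are handled naturally; in the worst case it is, of course, still exponential.
import heapq
import itertools

# (inputs consumed, output item, output quantity, weight/cost per application)
RECIPES = [
    ({"lead": 2, "bismuth": 1}, "carbon", 1, 1),                   # preferred carbon recipe
    ({"oxygen": 1, "hydrogen": 5, "nitrogen": 3}, "carbon", 2, 3),
    ({"carbon": 5, "titanium": 5}, "gold", 1, 2),
]

def plan(stock, want_item, want_qty, max_steps=25):
    """Cheapest plan found within max_steps recipe applications, or None."""
    start = tuple(sorted(stock.items()))
    tie = itertools.count()                      # avoids comparing dicts on heap ties
    frontier = [(0, next(tie), 0, start, [])]    # (cost, tie, steps, inventory, plan)
    best_cost = {start: 0}
    while frontier:
        cost, _, steps, inv, made = heapq.heappop(frontier)
        inventory = dict(inv)
        if inventory.get(want_item, 0) >= want_qty:
            return made                          # list of (inputs, output, qty) steps
        if steps >= max_steps:
            continue
        for inputs, out, qty, weight in RECIPES:
            if all(inventory.get(k, 0) >= v for k, v in inputs.items()):
                nxt = dict(inventory)
                for k, v in inputs.items():
                    nxt[k] -= v
                nxt[out] = nxt.get(out, 0) + qty
                key = tuple(sorted(nxt.items()))
                if cost + weight < best_cost.get(key, float("inf")):
                    best_cost[key] = cost + weight
                    heapq.heappush(frontier, (cost + weight, next(tie), steps + 1,
                                              key, made + [(inputs, out, qty)]))
    return None

print(plan({"titanium": 5, "lead": 6, "bismuth": 3, "carbon": 2, "gold": 1}, "gold", 2))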

Hard to understand Haskell memory allocation behaviour

I stumbled upon Haskell and FP and was stunned by the possibilities, and the old maths nerd inside me had no trouble writing naive code for actually useful purposes. However, in spite of all the reading, I still have a hard time understanding how to avoid some surprising performance bottlenecks.
So I write very short pieces of code with naive implementations and then try little changes to see how the performance responds.
And here's one example I really can't manage to understand... I wrote this function that finds a solution to the Josephus problem, on purpose with a naive list implementation.
m = 3
n = 3000
main = putStr $ "Soldier #" ++ (show $ whosLeft [1..n]) ++ " survived...\n"
whosLeft [lucky] = lucky
whosLeft soldiers = whosLeft $ take (length soldiers -1) $ drop m $ cycle soldiers
This runs in 190 ms, with a productivity of 63% according to the RTS.
Then the first thing I wanted to try was to remove the (length soldiers - 1) and replace it with a decrementing integer.
The running time leaped to 900 ms, productivity dropped to 16%, and the program used 47 times more memory than the simpler code above! So I added strict evaluation, forced the Int type, and tried things like removing the global variables, but to not much avail. I just can't understand this slowdown.
m = 3::Int
n = 3000::Int
main = putStr $ "Soldier #" ++ (show $ whosLeft n [1..n]) ++ " survived...\n"
whosLeft 1 [lucky] = lucky
whosLeft n' soldiers = n' `seq` left `seq` whosLeft (n'-1) left
    where left = take (n'-1) $ drop m $ cycle soldiers
I have sifted through performance-related articles and posts, but I just don't seem to find a hint about this. Still being a Haskell noob, I must be missing something big... How can this one added parameter (a pre-chewed computation...) reduce the speed so much?
PS: I know, if Josephus really had been with 3000 soldiers, they wouldn't have needed to commit suicide...
The first solution forces the whole spine of the soldiers list by taking its length. The second solution only forces (using `seq`) the head of the soldiers list. Replace the `left` in between the `seq`s with `length left` and you'll get your performance back.

I am a beginner programmer and want to design a Battleship program in LC3 assembly

I don't really know how or where to start with the algorithm for showing the display grid for the game.
Design the game in pseudocode first, so that you can get the general ideas worked out (data structures, algorithms, etc). Once you have a theoretical design worked out then you can start coding.
Here's a good place to start:
http://twoguysonebit.com/2010/02/16/code-battleship-game-written-in-mips-assembly/
The simplest way would be to just dump the display out as strings of text:
0 1 2 3 4 5 6 7 8 9
0 x . B B B B * x x .
1 . . . . . x . . . .
etc.
And let the map scroll.
Later, if you want to get fancy, you can (likely) output ANSI escape codes for an ANSI terminal, then you can just update the screen as you like...old school.
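Following the pseudocode-first advice, it can also help to prototype the text dump in a high-level language before porting it to LC-3, where the same nested loops become a 2-D array in memory plus OUT/PUTS trap calls. A minimal Python sketch, with the board size and cell symbols as assumptions:
# Render the board as plain text: a column header, then one labelled row per line.
EMPTY, MISS, SHIP, HIT = ".", "x", "B", "*"

def render(board):
    size = len(board)
    lines = ["  " + " ".join(str(c) for c in range(size))]   # column numbers
    for r, row in enumerate(board):
        lines.append(str(r) + " " + " ".join(row))           # row label + cells
    return "\n".join(lines)

board = [[EMPTY] * 10 for _ in range(10)]
board[0][2:6] = [SHIP] * 4   # a 4-cell ship on row 0
board[0][6] = HIT
board[1][5] = MISS
print(render(board))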

Aggregating automatically-generated feature vectors

I've got a classification system, which I will unfortunately need to be vague about for work reasons. Say we have 5 features to consider; the system is basically a set of rules:
A B C D E Result
1 2 b 5 3 X
1 2 c 5 4 X
1 2 e 5 2 X
We take a subject and get its values for A-E, then try matching the rules in sequence. If one matches we return the first result.
C is a discrete value, which could be any of a-e. The rest are just integers.
The ruleset has been automatically generated from our old system and has an extremely large number of rules (~25 million). The old rules were if statements, e.g.
result("X") if $A >= 1 && $A <= 10 && $C eq 'A';
As you can see, the old rules often do not even use some features, or accept ranges. Some are more annoying:
result("Y") if ($A == 1 && $B == 2) || ($A == 2 && $B == 4);
The ruleset needs to be much smaller as it has to be human maintained, so I'd like to shrink rule sets so that the first example would become:
A B C D E Result
1 2 bce 5 2-4 X
The upshot is that we can split the ruleset by the Result column and shrink each independently. However, I cannot think of an easy way to identify and shrink down the ruleset. I've tried clustering algorithms but they choke because some of the data is discrete, and treating it as continuous is imperfect. Another example:
A B C Result
1 2 a X
1 2 b X
(repeat a few hundred times)
2 4 a X
2 4 b X
(ditto)
In an ideal world, this would be two rules:
A B C Result
1 2 * X
2 4 * X
That is: not only would the algorithm identify the relationship between A and B, but it would also deduce that C is noise (not important for the rule).
Does anyone have an idea of how to go about this problem? Any language or library is fair game, as I expect this to be a mostly one-off process. Thanks in advance.
Check out the Weka machine learning lib for Java. The API is a little bit crufty but it's very useful. Overall, what you seem to want is an off-the-shelf machine learning algorithm, which is exactly what Weka contains. You're apparently looking for something relatively easy to interpret (you mention that you want it to deduce the relationship between A and B and to tell you that C is just noise.) You could try a decision tree, such as J48, as these are usually easy to visualize/interpret.
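If Java is not a hard requirement, the same decision-tree idea is easy to try with scikit-learn in Python (an alternative to Weka, not something the suggestion above depends on). The toy data below mimics the question's second example, with the discrete feature C one-hot encoded; the printed tree splits only on A and B, which is exactly the "C is noise" deduction:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy rows shaped like the second example: result X whenever (A, B) is (1, 2)
# or (2, 4), regardless of the discrete feature C.
rows = [(a, b, c, "X" if (a, b) in [(1, 2), (2, 4)] else "Y")
        for a in range(1, 4) for b in range(1, 6) for c in "abcde"]
df = pd.DataFrame(rows, columns=["A", "B", "C", "Result"])

X = pd.get_dummies(df[["A", "B", "C"]], columns=["C"])   # one-hot encode C
tree = DecisionTreeClassifier().fit(X, df["Result"])
print(export_text(tree, feature_names=list(X.columns)))  # splits use only A and B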
Twenty-five million rules? How many features? How many values per feature? Is it possible to iterate through all combinations in practical time? If you can, you could begin by separating the rules into groups by result.
Then, for each result, do the following. Considering each feature as a dimension, and the allowed values for a feature as the metric along that dimension, construct a huge Karnaugh map representing the entire rule set.
The map has two uses. One: research automated reduction methods such as the Quine-McCluskey algorithm. A lot of work has been done in this area. There are even a few programs available, although probably none of them will deal with a Karnaugh map of the size you're going to make.
Two: when you have created your final reduced rule set, iterate over all combinations of all values for all features again, and construct another Karnaugh map using the reduced rule set. If the maps match, your rule sets are equivalent.
-Al.
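As a toy illustration of what the Quine-McCluskey step buys you (binary-encoding 25 million rules into minterms is its own project), sympy's SOPform, a tool not mentioned above, performs this kind of two-level minimization:
from sympy import symbols
from sympy.logic import SOPform

# Three bits standing for a (hypothetical) binary encoding of a feature combination;
# each minterm is one combination covered by the rules for a given result.
b2, b1, b0 = symbols("b2 b1 b0")
covered = [[1, 0, 0], [1, 0, 1],     # two combinations differing only in b0
           [1, 1, 0], [1, 1, 1]]     # two more differing only in b0
print(SOPform([b2, b1, b0], covered))   # collapses to b2: b1 and b0 are "don't care"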
You could try a neural network approach, trained via backpropagation, assuming you have or can randomly generate (based on the old ruleset) a large set of data that hits all your classes. Using a hidden layer of appropriate size will allow you to approximate arbitrary discriminant functions in your feature space. This is more or less the same idea as clustering, but due to the training paradigm it should have no issue with your discrete inputs.
This may, however, be a little too "black box" for your case, particularly if you have zero tolerance for false positives and negatives (although, it being a one-off process, you can get an arbitrary degree of confidence by checking a gargantuan validation set).
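A minimal sketch of that setup with scikit-learn's MLPClassifier (an assumption; any backprop-trained network would do), using the same toy data as the tree example above, one-hot encoding the discrete feature C and scoring against a held-out validation split:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rows = [(a, b, c, "X" if (a, b) in [(1, 2), (2, 4)] else "Y")
        for a in range(1, 4) for b in range(1, 6) for c in "abcde"]
df = pd.DataFrame(rows, columns=["A", "B", "C", "Result"])
X = pd.get_dummies(df[["A", "B", "C"]], columns=["C"])    # the network sees only numbers

X_tr, X_val, y_tr, y_val = train_test_split(X, df["Result"], test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_val, y_val))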
