Splunk associate: specific Reference_Value of a Reference_Key is not being returned in the result - correlation

I'm trying to correlate two variables, X_1 and X_2, each of which is a binomial variable.
I'm running this command: | associate supcnt=1 improv=0 supfreq=0 on my query results, but for Reference_Key=X_1, only one of the possible two Reference_Values is being displayed. I'm pretty sure I've verified that the support count of both X_1=x_a and X_1=x_b are greater than 1, x_a and x_b being the only two possible values of X_1.
Why would it not display the result for one of the reference values?

Related

SPSS: generate 'fake' survey data using rv.uniform without losing value labels

I have a pretty straightforward survey dataset. Each row is a respondent, and each column is a question. Responses have a value that is a whole number, and each number has a label.
Now, I need to replace all of those values with fake data to use in a training. I need something that looks and feels like the original dataset, but isn't actually client data.
I started by replacing my variables with random number values:
COMPUTE Q1=RV.UNIFORM(1,2).
EXECUTE.
COMPUTE Q2=RV.UNIFORM(1,36).
EXECUTE.
COMPUTE Q3=RV.NORMAL(50, 13).
EXECUTE.
(rv.normal/rv.uniform depending on what kind of data I'm trying to fake - age versus multiple-choice question, for example).
This works, but then when I try to generate crosstabs, export the dataset with value labels, etc., the labels aren't applied to the columns with fake data. As far as I can tell, my fake numbers are in the exact same format they were in before - numeric, no decimals, width of 2, nominal. The labels still appear in the variable view, but they aren't actually being applied.
I'd really prefer not to have to manually re-label every one of these columns, because there's quite a few of them. Any ideas for how to get around this issue? Or is there a smarter way to generate fake data?
Your problem is the RV.UNIFORM and the RV.NORMAL functions do not generate integers - they generate decimal numbers. You may have your display hide the decimal numbers by having 0 decimals in the variable view, but they are still there (you can check this by adding decimals in the variable view).
So you need another step that turns your decimals into integers. For example, the following are two ways to get a random 1 or 2 (integers):
COMPUTE Q1=rnd(RV.UNIFORM(1,2)).
or
COMPUTE Q1=trunc(RV.UNIFORM(1,3)).
Once the numbers generated are integers corresponding to the value labels definition, you should be able to see the labels in the output.
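To see how the two conversions behave once a question has more than two answer options, here is a small Python sketch (not SPSS syntax; the function names are made up for illustration). It sweeps an evenly spaced grid over the uniform range instead of drawing random numbers, so the shares are reproducible. Under the assumption that RV.UNIFORM(lo, hi) is a continuous uniform draw, rounding a draw from [1, k] gives the two end categories about half the weight of the middle ones, while truncating a draw from [1, k+1) gives every category the same share. With only two categories, as in Q1, both approaches split evenly.

```python
import math
from collections import Counter


def rnd_of_uniform(u, lo, hi):
    # Analogue of rnd(RV.UNIFORM(lo, hi)); u is a position in [0, 1).
    x = lo + u * (hi - lo)
    return math.floor(x + 0.5)  # round half up, adequate for positive values


def trunc_of_uniform(u, lo, hi):
    # Analogue of trunc(RV.UNIFORM(lo, hi + 1)).
    x = lo + u * (hi + 1 - lo)
    return math.trunc(x)


def shares(mapper, lo, hi, steps=120_000):
    # Sweep the midpoints of an even grid over [0, 1) and tally the results.
    counts = Counter(mapper((i + 0.5) / steps, lo, hi) for i in range(steps))
    return {k: counts[k] / steps for k in range(lo, hi + 1)}


# Compare both conversions for a hypothetical four-option question.
rounded = shares(rnd_of_uniform, 1, 4)
truncated = shares(trunc_of_uniform, 1, 4)
print("rounded:  ", rounded)
print("truncated:", truncated)
```

For a variable such as Q2 with 36 options, the truncation form (trunc(RV.UNIFORM(1,37)) in SPSS terms) is therefore the safer choice if every option should be equally likely.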

Randomly assign a value from a list to each agent entering the process

In my AnyLogic model I have an option list "Issue". In "Issue" I put twelve values ("a", "b" and so on...). How can I randomly assign one of these values to each agent (in my model agents are customers) and be sure that the value assigned is different for each agent that enters?
If you want a random option list value (say for each agent you create from a Source block), use a Custom Distribution which returns each option list value with a given probability. (In the Custom Distribution properties interface, it talks about "Number of observations" but these can also just be probabilities; it just uses the relative values of these settings to determine how likely each outcome is.)
If you want to uniquely assign the option list values (but in a random order) you'll need to use some Java to do so:
Store the option list values (which you can get as an array via the option list's values() function) in a list (AnyLogic Collection). Use a LinkedList because removing entries from it is cheap.
Store the number of alternatives remaining in a variable. (Let's call this n for simplicity.)
Each time you want to allocate a value, sample a random index from 0 to n-1 (Java lists are zero-based, so sample from a discrete uniform distribution using uniform_discr(0, n-1)) and remove that entry from the list (the list's remove(int index) function, which returns the removed element), assigning it to the agent. Then decrement n.
Obviously you have to ensure that the number of agents created does not exceed the number of option list values (or have some scheme to 'reset' the situation in some way at that point).
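The steps above can be sketched outside AnyLogic as follows. This is plain Python rather than the Java you would write in the model, and IssuePool and ISSUES are hypothetical names standing in for the collection and the option list. The point it illustrates is the draw-and-remove loop: pick a random zero-based index among the remaining values, remove that entry, and refuse to draw once the pool is empty.

```python
import random

# Stand-in for the twelve option list values ("a", "b", ...).
ISSUES = list("abcdefghijkl")


class IssuePool:
    """Hands out each value exactly once, in random order."""

    def __init__(self, values, rng=None):
        self._remaining = list(values)
        self._rng = rng or random.Random()

    def draw(self):
        if not self._remaining:
            # More agents than values: the caller must decide how to reset.
            raise RuntimeError("no unassigned values left")
        index = self._rng.randrange(len(self._remaining))  # 0 .. n-1
        return self._remaining.pop(index)


pool = IssuePool(ISSUES, random.Random(42))
assigned = [pool.draw() for _ in range(len(ISSUES))]
print(assigned)
```

Tracking n separately is unnecessary here because the list knows its own length; the same holds for a Java collection's size().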

Outcome difference: using list & for-loop vs. single parameter input

This is my first question, so please let me know if I'm not giving enough details or asking a question that is not relevant on this platform!
I want to compute the same formula over a grid running from 0 to 4.0209, so I'm using a for-loop over an array defined with numpy.
To be certain that the for-loop is right, I've also computed a selection of values by passing specific values for the radius directly as input to the formula.
Now, the outcomes for the same radius input are slightly different. Am I interpreting my grid wrongly? Or is there an error in my script?
It probably is something pretty straightforward, but maybe some of you can find a minute to help me out.
Here I use a selection of values for my radius parameter.
Here I use a for-loop to compute over a distance
Here are the differences in the outcomes:
Outcomes computed with for-loop:
9.443,086753902220000000
1.935,510475232510000000
57,174050755727700000
1,688894026484580000
0,020682674424032700
Outcomes computed with selected radii:
9.444,748178731630000000
1.938,918526458330000000
57,476599453309800000
1,703815523775800000
0,020957378277984600

Stata: order a dataset using a custom sorting order

I have a dataset where numeric variable VARSORT takes only 3 values: 10, 20 and 30 (there are no missings).
I would like to sort observations based on VARSORT but where the custom sort order would be the following : 20 first, then 10, then 30.
Is it possible to do that?
You just need to sort on a variable with the desired order, which could be, among many other solutions,
gen varsort2 = cond(varsort == 20, -10, varsort)
sort varsort2
There is no option to specify a custom order without specifying a variable. Clearly Stata has the idea that a dataset may be sorted by one or more variables. If that's so, then keeping track of such variables is crucial to Stata noting whether a dataset has changed (which includes a change in the sort order). That mechanism could not work in the same way if a variable or variables were not used to indicate sort order.
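The idea of sorting on a derived key is not specific to Stata. As a language-neutral illustration (Python here, with made-up sample values), the key function below mirrors cond(varsort == 20, -10, varsort): it maps 20 to a value smaller than 10 and leaves everything else unchanged.

```python
def custom_key(value):
    # Mirrors cond(varsort == 20, -10, varsort): 20 sorts before 10 and 30.
    return -10 if value == 20 else value


varsort = [30, 10, 20, 20, 30, 10]  # hypothetical observations
ordered = sorted(varsort, key=custom_key)
print(ordered)  # [20, 20, 10, 10, 30, 30]
```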

Convert sequence of numbers to random-looking IDs?

I'm working on an application where I need to generate unique, non-sequential IDs. One of the constraints I have is that they must consist of 3 digits followed by 2 letters (only about 600k IDs). Given my relatively small pool of IDs I was considering simply generating all possible IDs, shuffling them and putting them into a database. Since, internally, I'll have a simple, sequential, ID to use, it'll be easy to pluck them out one at a time & be sure I don't have any repeats.
This doesn't feel like a very satisfying solution. Does anyone out there have a more interesting method of generating unique IDs from a limited pool than this 'lottery' method?
This can be done a lot of different ways, depending on what you are trying to optimize (speed, memory usage, etc.).
ID pattern = ddd c[1]c[0]
Option 1 (essentially like hashing, similar to Zak's):
1- Generate a random number between 0 and the number of possibilities (676k).
2- Convert number to combination
ddd = random / (26^2)
c[0] = random % (26)
c[1] = (random / 26) % 26
3- Query DB for existence of ID and increment until a free one is found.
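A minimal sketch of Option 1 in Python, assuming an in-memory set as a stand-in for the database lookup (to_id, allocate and used are illustrative names, not part of the answer). It converts a number to the ddd c[1]c[0] pattern with the three formulas above and probes forward until a free ID is found.

```python
import random

POOL_SIZE = 1000 * 26 * 26  # 676,000 possible IDs


def to_id(number):
    # ddd = number / 26^2, c[0] = number % 26, c[1] = (number / 26) % 26
    ddd = number // (26 * 26)
    c0 = number % 26
    c1 = (number // 26) % 26
    return f"{ddd:03d}{chr(65 + c1)}{chr(65 + c0)}"


def allocate(used, rng):
    # 'used' stands in for the table of IDs that were already handed out.
    if len(used) >= POOL_SIZE:
        raise RuntimeError("ID pool exhausted")
    number = rng.randrange(POOL_SIZE)
    while to_id(number) in used:
        number = (number + 1) % POOL_SIZE  # increment until a free one is found
    new_id = to_id(number)
    used.add(new_id)
    return new_id


used_ids = set()
rng = random.Random(7)
sample = [allocate(used_ids, rng) for _ in range(5)]
print(sample)
```

Linear probing like this tends to hand out runs of neighbouring IDs once the pool fills up, which may or may not matter for the "non-sequential" requirement.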
Option 2 (Linear feedback shift register, see wikipedia):
1- Seed with a random number in range (0,676k). (See below why you can't seed with '0')
2- Generate subsequent random numbers by applying the following to the current ID number
num = (num >> 1) ^ (-(num & 1u) & 0x90000u);
3- Skip IDs larger than range (ie 0xA50A0+)
4- Convert number into ID format (as above)
*You will need to save the last number generated that was used for an ID, but you won't need to query the DB to see if it is used. This solution will enumerate all possible IDs except [000 AA] due to the way the LFSR works.
[edit] Since your range is actually larger than you need, you can get back [000 AA] by subtracting 1 before you convert to the ID and have your valid range be (0,0xA50A0]
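Option 2 can be sketched in Python as below. The update rule is the one quoted in step 2, written with a conditional instead of the unsigned-negation trick; the helper names and the choice of seed are assumptions. The sketch applies the [edit] adjustment, so a state in the range (0, 0xA50A0] is mapped to the ID for state - 1. If the mask is maximal-length for 20 bits, as the answer implies, one full cycle yields each of the 676,000 IDs once.

```python
POOL_SIZE = 676_000  # 0xA50A0
TAPS = 0x90000       # feedback mask from step 2 (20-bit register)


def lfsr_step(num):
    # Same update as: num = (num >> 1) ^ (-(num & 1u) & 0x90000u)
    return (num >> 1) ^ (TAPS if num & 1 else 0)


def to_id(number):
    # Same conversion as Option 1: ddd c[1]c[0]
    ddd = number // (26 * 26)
    c0 = number % 26
    c1 = (number // 26) % 26
    return f"{ddd:03d}{chr(65 + c1)}{chr(65 + c0)}"


def next_id(state):
    """Advance the register, skipping out-of-range states.

    Returns (new_state, id); persist new_state for the next call.
    """
    while True:
        state = lfsr_step(state)
        if state <= POOL_SIZE:  # valid range is (0, 0xA50A0]
            return state, to_id(state - 1)


state = 12345  # any non-zero seed inside the range
for _ in range(5):
    state, new_id = next_id(state)
    print(new_id)
```

Only the last state needs to be stored between calls, which is what removes the need for an existence query.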
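Use a finite group. Basically, take a 32- or 64-bit integer, and find a large number that is coprime to the modulus of that integer type (2^32 or 2^64, so any odd number qualifies); call this number M. Then, for all integers n in range, n * M (computed with the integer's normal wraparound, i.e. modulo 2^32 or 2^64) will result in a unique number that has lots of digits.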
This has the advantage that you don't need to pre-fill the database, or run a separate select query -- you can do this all from within one insert statement, by having your n just be an auto-increment, and have a separate ID column that defaults to the n * M.
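A short Python sketch of this idea for 32-bit values. The constant M is an arbitrary odd number chosen for illustration, not something the answer prescribes. Because M is coprime to 2^32 it has a modular inverse, so the mapping can also be reversed, which is one way to convince yourself that no two values of n collide.

```python
MODULUS = 2 ** 32
M = 2_654_435_761            # assumed example constant; odd, so coprime to 2**32
M_INV = pow(M, -1, MODULUS)  # modular inverse (Python 3.8+)


def scramble(n):
    # n * M with 32-bit wraparound
    return (n * M) % MODULUS


def unscramble(s):
    # Multiplying by the inverse recovers the original counter value.
    return (s * M_INV) % MODULUS


for n in range(1, 4):
    print(n, scramble(n), unscramble(scramble(n)))
```

Note that the output is a plain integer; it would still need to be mapped onto the 3-digits-plus-2-letters format, which is what the modulo-676,000 variants in the other answers do.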
You could generate a random ID conforming to that standard, do a DB select to see if it exists already, then insert it into the DB to note it has been "used". For the first 25% of the life of that scheme (or about 150k entries), it should be relatively fast to generate new random IDs. After that, though, it will take longer and longer, and you might as well pre-fill the table to look for free IDs.
Depending on what you define as sequential, you could just pick a certain starting point on the letters, such as 'aa', and just loop through the three digits, so it would be:
001aa
002aa
003aa
Once you get to 999, increment the letter part (ab, ac, and so on up to zz).
You could use modular arithmetic to generate IDs. Pick a number that is coprime with 676,000 and use it as the seed; id is the standard auto-incrementing id of the table. Then the following Python-style pseudocode is what you need:
uidNo = (id * seed) % 676000
digits = uidNo // 676              # integer division, gives 0..999
char1 = uidNo % 26
char2 = (uidNo // 26) % 26
uidCode = "%03d" % digits + chr(char1 + 65) + chr(char2 + 65)   # zero-pad to 3 digits
If a user has more than one consecutively issued id, they could guess the algorithm and the seed and generate all the ids in order. This may mean the algorithm is not secure enough for your use case.
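To make that warning concrete, here is a hedged Python sketch. make_code follows the pseudocode above; code_to_number, recover_seed and the seed value are assumptions added for illustration. Two codes issued for consecutive table ids differ by exactly the seed modulo 676,000, so decoding both and subtracting reveals it (for a seed below 676,000).

```python
from math import gcd

POOL_SIZE = 676_000


def make_code(row_id, seed):
    uid_no = (row_id * seed) % POOL_SIZE
    digits = uid_no // 676
    char1 = uid_no % 26
    char2 = (uid_no // 26) % 26
    return f"{digits:03d}{chr(char1 + 65)}{chr(char2 + 65)}"


def code_to_number(code):
    # Inverse of make_code: rebuild uid_no from its three parts.
    digits = int(code[:3])
    char1 = ord(code[3]) - 65
    char2 = ord(code[4]) - 65
    return digits * 676 + char2 * 26 + char1


def recover_seed(code_a, code_b):
    # code_b was issued immediately after code_a.
    return (code_to_number(code_b) - code_to_number(code_a)) % POOL_SIZE


SEED = 387_421  # hypothetical seed; must share no factor with 676,000
assert gcd(SEED, POOL_SIZE) == 1

first, second = make_code(100, SEED), make_code(101, SEED)
print(first, second, recover_seed(first, second))
```

If guessability matters, a keyed permutation or a shuffled pre-generated table avoids leaking the sequence in this way.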
