I would like to take the value in the element JobNumberString, which is listed multiple times, and compare it to the previous JobNumberString to check whether it is exactly 1 higher. Is this possible?
EXAMPLE:
JobNumberString [1] = 00
JobNumberString [2] = 01
JobNumberString [3] = 03
Give an Error, since this is not 1 higher than previous value.
What do the last two lines do? As far as I understand, these lines loop through the list h_nwave and calculate the weighted quantiles, if syear2digit == 'nwave' , i.e. calculate 5 quantiles for each year. But I'm not sure if my understanding is correct. Also is this equivalent to using group() function?
h_nwave "91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15"
generate quantile_ip = .
forvalues number = 1(1)15 {
    local nwave : word `number' of `h_nwave'
    xtile quantile_ip_`nwave' = a_ip if syear2digit == `nwave' [ w = weight ], nq(5)
    replace quantile_ip = quantile_ip_`nwave' if syear2digit == `nwave'
}
I tried to convert this into R with a for loop, mutate, xtile (statar package required) and case_when. However, so far I cannot find a suitable way to get a similar result.
There is no source or context for this code.
Detail: The first command is truncated and presumably should have been
local h_nwave 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
Detail: The first list contains 25 values, presumably corresponding to years 1991 to 2015. But the second list implies 15 values, so we are only looking at 91 to 05.
Main idea: xtile assigns observations to quintile bins on variable a_ip, with weights. So the lowest 20% of observations (taking weighting into account) should be in bin 1, and so on. In practice observations with the same value must be assigned to the same bin, so 20-20-20-20-20 splits are not guaranteed, quite apart from the small print of whether sample size is a multiple of 5. So, the result is assignment to bins 1 to 5, and not quintiles themselves, or any other kind of quantiles.
This is done separately for each survey wave.
The xtile command is documented for everyone at https://www.stata.com/manuals/dpctile.pdf regardless of personal or workplace access to Stata.
In R, you may well be able to produce quintile bins for all survey years at once. I have no idea how to do that.
Otherwise put, the loop arises because xtile doesn't work on separate subsets in one command call. There are community-contributed Stata commands that allow that. This kind of topic is much discussed on Statalist.
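For anyone porting the idea to another language, here is a minimal Python sketch (not R, and not Stata's own implementation) of weighted quantile bins computed separately for each wave in one call. The function names and the (wave, value, weight) tuple layout are hypothetical. The cut-point rule is my reading of the weighted percentile definition in the pctile manual entry, so treat it as an assumption; results may differ from xtile in edge cases.

```python
from bisect import bisect_left
from collections import defaultdict


def weighted_cutpoints(values, weights, nq=5):
    """Return the nq-1 weighted cut points (assumed rule, see lead-in)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cuts = []
    for q in range(1, nq):
        target = total * q / nq
        running = 0
        for i, (x, w) in enumerate(pairs):
            previous = running
            running += w
            if running > target:
                # Average two neighbours when the target falls exactly between them.
                if i > 0 and previous == target:
                    cuts.append((pairs[i - 1][0] + x) / 2)
                else:
                    cuts.append(x)
                break
    return cuts


def quantile_bins_by_group(rows, nq=5):
    """rows: iterable of (group, value, weight). Returns bins 1..nq in input order."""
    rows = list(rows)
    by_group = defaultdict(list)
    for group, value, weight in rows:
        by_group[group].append((value, weight))
    cuts = {g: weighted_cutpoints(*zip(*vw), nq=nq) for g, vw in by_group.items()}
    # A value lands in bin j when cut[j-2] < value <= cut[j-1], so ties share a bin.
    return [bisect_left(cuts[g], v) + 1 for g, v, _ in rows]


if __name__ == "__main__":
    demo = [("91", v, 1) for v in range(1, 11)] + [("92", v, 1) for v in (10, 20, 30, 40, 50)]
    print(quantile_bins_by_group(demo))
```

Because the grouping happens inside the function, no per-wave loop and no temporary per-wave variables are needed.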
Suppose that I have the file data.dat with follow content:
Days 1 2 4 6 10 15 20 30
Group 01 37.80 30.67 62.88 86.06 26.24 98.49 65.42 61.28
Group 02 38.96 72.99 38.24 74.11 39.54 91.59 81.14 91.22
Group 03 82.34 75.25 82.58 28.22 39.21 81.30 41.30 42.48
Group 04 75.52 42.83 66.80 20.50 94.08 74.78 95.09 53.16
Group 05 89.32 56.78 30.05 68.07 59.18 94.18 39.77 67.56
Group 06 70.03 78.71 37.59 60.55 46.40 82.73 67.34 93.38
Group 07 67.83 88.73 48.01 62.19 49.40 67.68 25.97 58.98
Group 08 61.15 96.06 59.62 39.42 60.06 94.18 76.06 32.02
Group 09 65.61 72.39 54.07 92.79 56.58 39.14 81.81 39.16
Group 10 59.65 77.81 40.51 68.49 66.15 80.33 87.31 42.07
The final goal is to create a clustered histogram (histogram clustered).
Besides the graph, I need some values from data.dat, such as
size_x, size_y, min, max, and mean. To get these values I used
set datafile separator tab
stats 'data.dat' skip 1 matrix
The summed up output was:
* MATRIX: [9 X 10]
Minimum: 0.0000 [ 0 0 ]
Maximum: 98.4900 [ 6 0 ]
Mean: 56.0549
The size_x and size_y values are correct – 9 columns and 10 rows – but the min is not.
This is due to the fact that the first column is string-type.
When I include every
set datafile separator tab
stats 'data.dat' skip 1 matrix every ::1
to skip the first column, the summed up output is:
* MATRIX: [9 X 8]
Minimum: 20.5000 [ 0 3 ]
Maximum: 98.4900 [ 5 0 ]
Mean: 63.0617
This time the min and max values are right, but the size_y (shown 8, expected 9) and the index of the min (expected [ 3 3 ]) are not.
What is going on? Did I make a mistake? Am I overlooking something?
The program tries to read a value from the first field of each row, sees "Group xx" and ends up filling in 0 for that entry. You need to tell it to skip the first column.
Amended answer
I think there is a bug here, as well as confusion between documentation and the actual implementation. The matrix rows and columns as implemented by the every selector are indexed from 0 to N-1 as they would be for C language arrays. The documentation incorrectly states or at least implies that the first row and column is matrix[1][1] rather than [0][0]. So the full command needed for your case is
gnuplot> set datafile sep tab
gnuplot> stats 'data.dat' every 1:1:1:1 matrix
warning: matrix contains missing or undefined values
* FILE:
Records: 80
Out of range: 0
Invalid: 0
Header records: 0
Blank: 10
Data Blocks: 1
* MATRIX: [9 X 8]
Mean: 63.0617
Std Dev: 20.6729
Sample StdDev: 20.8033
Skewness: -0.1327
Kurtosis: 1.9515
Avg Dev: 17.4445
Sum: 5044.9400
Sum Sq.: 352332.2181
Mean Err.: 2.3113
Std Dev Err.: 1.6343
Skewness Err.: 0.2739
Kurtosis Err.: 0.5477
Minimum: 20.5000 [ 0 3 ]
Maximum: 98.4900 [ 5 0 ]
I.e. every 1:1:1:1 tells it for both rows and columns the index increment is 1 and the submatrix starts at [1][1] rather than at the origin [0][0].
The output values are all correct, but the indices shown for the size [9 x 8] and the min/max entries are wrong. I will file a bug report for both issues.
I got sidetracked trying to characterize the bug revealed by the original answer and forgot to mention a simpler alternative. For this specific case of one row of column headers and one column of rowheaders, gnuplot provides a special syntax that works without error:
set datafile separator tab
stats 'data.dat' matrix rowheaders columnheaders
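As an independent cross-check of what the numeric part of data.dat should yield, here is a small Python sketch that skips the header row and the label column and computes the same summary values. The function name and the (column, row) zero-based index convention are my own assumptions; the data is copied from the question, written with spaces instead of tabs for readability.

```python
DATA = """\
Days 1 2 4 6 10 15 20 30
Group 01 37.80 30.67 62.88 86.06 26.24 98.49 65.42 61.28
Group 02 38.96 72.99 38.24 74.11 39.54 91.59 81.14 91.22
Group 03 82.34 75.25 82.58 28.22 39.21 81.30 41.30 42.48
Group 04 75.52 42.83 66.80 20.50 94.08 74.78 95.09 53.16
Group 05 89.32 56.78 30.05 68.07 59.18 94.18 39.77 67.56
Group 06 70.03 78.71 37.59 60.55 46.40 82.73 67.34 93.38
Group 07 67.83 88.73 48.01 62.19 49.40 67.68 25.97 58.98
Group 08 61.15 96.06 59.62 39.42 60.06 94.18 76.06 32.02
Group 09 65.61 72.39 54.07 92.79 56.58 39.14 81.81 39.16
Group 10 59.65 77.81 40.51 68.49 66.15 80.33 87.31 42.07
"""


def matrix_stats(text):
    """Summarise the numeric block, ignoring the header row and the label column."""
    lines = text.strip().splitlines()[1:]  # skip the "Days ..." header row
    # Each row label is two tokens ("Group", "01"); drop both.
    rows = [[float(tok) for tok in line.split()[2:]] for line in lines]
    cells = [(v, c, r) for r, row in enumerate(rows) for c, v in enumerate(row)]
    lo, hi = min(cells), max(cells)
    return {
        "size_x": len(rows[0]),            # number of numeric columns
        "size_y": len(rows),               # number of rows
        "min": (lo[0], (lo[1], lo[2])),    # value, (column, row), zero-based
        "max": (hi[0], (hi[1], hi[2])),
        "mean": sum(v for v, _, _ in cells) / len(cells),
    }


if __name__ == "__main__":
    for key, value in matrix_stats(DATA).items():
        print(key, value)
```

This gives 8 numeric columns by 10 rows (80 values, matching the Records: 80 line), a minimum of 20.50 at column 3, row 3, a maximum of 98.49 at column 5, row 0, and a mean of about 63.0617.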
This is my first time on here. I searched and couldn't find anything relevant. I'm trying to work something out:
Where a=1, b=2, c=3 ... z=26
If you were to create a series where it goes through every possible outcome of letters and using 1 character length in numerical order, the total possible number of outcomes is 26 (26^1). You easily figure "e" would be on line 5 of the series. "y" would be line 25.
If you set the parameters to a 2 character length, the total number of combinations is 676 (26^2), "aa" would be line 1, "az" would be line 26, "ba" would be line 27, "zz" would be line 676. This is easily calculated, and can be done no matter what the character length is, you will always find what line it would be on in the series.
My question is how do you do it in reverse? Using the same parameters, 1 will obviously be "aa", 31 will be "be". How do you work out with a formula that 676 will be "zz"? 676, based on the parameters set, can only be "zz", it can't be any other set of characters. So there should be a way of calculating this, no matter how long the number is, as long as you know the parameters of the series.
If length of characters was 10, what characters would be on line 546,879,866, for example?
Is this even doable? Thanks so much in advance
It is enough to convert 546,879,866 into a base-26 number. For example, in bash:
echo 'obase=26 ; 546879866' | bc
01 20 00 19 03 23 00
And if you prefer 10 characters, pad the number with leading zeros:
00 00 00 01 20 00 19 03 23 00
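Just note that numbering starts from 0, which means a=00, b=01, … z=25. Because the series in the question starts at line 1, subtract 1 from the line number before converting: for example, line 676 becomes 675, which is 25 25 in base 26, i.e. "zz".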
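The same conversion can be sketched in Python, including the shift between the question's 1-based line numbers and the 0-based digits. The function names are hypothetical, and the alphabet a to z with fixed word length is taken from the question.

```python
import string

ALPHABET = string.ascii_lowercase  # a=0, b=1, ..., z=25


def line_to_word(line, length, alphabet=ALPHABET):
    """Return the word found on 1-based line `line` of the fixed-length series."""
    base = len(alphabet)
    if not 1 <= line <= base ** length:
        raise ValueError("line number outside the series")
    index = line - 1                     # the series starts at line 1, digits at 0
    letters = []
    for _ in range(length):
        index, digit = divmod(index, base)
        letters.append(alphabet[digit])  # least significant digit comes first
    return "".join(reversed(letters))


def word_to_line(word, alphabet=ALPHABET):
    """Inverse of line_to_word: return the 1-based line of `word`."""
    index = 0
    for ch in word:
        index = index * len(alphabet) + alphabet.index(ch)
    return index + 1


if __name__ == "__main__":
    print(line_to_word(31, 2))           # be
    print(line_to_word(676, 2))          # zz
    print(line_to_word(546879866, 10))   # aaabuatdwz
```

The repeated divmod is the usual base-conversion loop; padding to the requested length falls out of running it exactly `length` times.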
I'm looking for a hint to an algorithm or pseudo code which helps me calculate sequences.
It's kind of permutations, but not exactly as it's not fixed length.
The output sequence should look something like this:
A
B
C
D
AA
BA
CA
DA
AB
BB
CB
DB
AC
BC
CC
DC
AD
BD
CD
DD
AAA
BAA
CAA
DAA
...
Every character above represents actually an integer, which gets incremented from a minimum to a maximum.
I do not know the depth when I start, so just using multiple nested for loops won't work.
It's late here in Germany and I just can't wrap my head around this. Pretty sure that it can be done with for loops and recursion, but I have currently no clue on how to get started.
Any ideas?
EDIT: B-typo corrected.
It looks like you're taking all combinations of four distinct digits of length 1, 2, 3, etc., allowing repeats.
So start with length 1: { A, B, C, D }
To get length 2, prepend A, B, C, D in turn to every member of length 1. (16 elements)
To get length 3, prepend A, B, C, D in turn to every member of length 2. (64 elements)
To get length 4, prepend A, B, C, D in turn to every member of length 3. (256 elements)
And so on.
If you have more or fewer digits, the same method will work. It gets a little trickier if you allow, say, A to equal B, but that doesn't look like what you're doing now.
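The prepend steps above can be sketched in Python as follows. The function name is hypothetical, and the symbols are plain characters here; this version stores every level in memory.

```python
def sequences_by_prepending(symbols, max_length):
    """Build all sequences of length 1..max_length, level by level."""
    level = list(symbols)          # length 1: the symbols themselves
    result = list(level)
    for _ in range(max_length - 1):
        # Prepend each symbol in turn to every member of the previous level.
        level = [s + member for member in level for s in symbols]
        result.extend(level)
    return result


if __name__ == "__main__":
    print(sequences_by_prepending("ABCD", 2))
```

With four symbols and max_length 3 this produces 4 + 16 + 64 = 84 entries, in the order shown in the question.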
Based on the comments from the OP, here's a way to do the sequence without storing the list.
Use an odometer analogy. This only requires keeping track of indices. Each time the first member of the sequence cycles around, increment the one to the right. If this is the first time that that member of the sequence has cycled around, then add a member to the sequence.
The increments will need to be cascaded. This is the equivalent of going from 99,999 to 100,000 miles (the comma is the thousands marker).
If you have a thousand integers that you need to cycle through, then pretend you're looking at an odometer in base 1000 rather than base 10 as above.
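A minimal Python sketch of the odometer idea, written as a generator so that only the current list of indices is kept. The function name and the max_length stopping rule are my own assumptions; the question does not say when to stop.

```python
def odometer(symbols, max_length):
    """Yield sequences in odometer order, with the leftmost position spinning fastest."""
    indices = [0]                              # one wheel to start with
    last = len(symbols) - 1
    while True:
        yield "".join(symbols[i] for i in indices)
        pos = 0
        # Cascade: every wheel sitting at its maximum rolls over to the minimum.
        while pos < len(indices) and indices[pos] == last:
            indices[pos] = 0
            pos += 1
        if pos == len(indices):
            # All wheels rolled over: add a new wheel, or stop at the limit.
            if len(indices) == max_length:
                return
            indices.append(0)
        else:
            indices[pos] += 1


if __name__ == "__main__":
    for item in odometer("ABCD", 2):
        print(item)
```

For a thousand integers, pass a list of a thousand string labels as `symbols`; the logic is unchanged.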
Your sequence looks more like (A^(n-1) X A^T), where A is a matrix and A^T is its transpose.
A = [A,B,C,D]
A^T X A^(n-1) ∀ (n=0)
sequence = A,B,C,D
A^T X A^(n-1) ∀ (n=2)
sequence = AA,BA,CA,DA,AB,BB,CB,DB,AC,BC,CC,DC,AD,BD,CD,DD
You can take any matrix multiplication code and adapt it to implement this.
You have 4 elements, so you are simply looping through the numbers in reversed base-4 notation. Say A=0, B=1, C=2, D=3:
first loop from 0 to 3 on 1 digit
second loop from 00 to 33 on 2 digits
and so on
Each row below shows the counter i, i with its digits reversed, and the output using the A, B, C, D digits.
loop on 1 digit
0 0 A
1 1 B
2 2 C
3 3 D
loop on 2 digits
00 00 AA
01 10 BA
02 20 CA
03 30 DA
10 01 AB
11 11 BB
12 21 CB
13 31 DB
20 02 AC
21 12 BC
22 22 CC
...
The algorithm is pretty obvious. You could take a look at algorithm L (lexicographic t-combination generation) in fascicle 3a of D. Knuth's TAOCP.
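The reversed base-4 counting in the table can be sketched in Python like this. The function name is hypothetical, and this is not Knuth's algorithm L, just a plain counter whose digits are emitted least significant first.

```python
def reversed_base_n(symbols, length):
    """Yield all sequences of a fixed length by counting in base len(symbols)."""
    base = len(symbols)
    for counter in range(base ** length):
        digits = []
        n = counter
        for _ in range(length):
            n, digit = divmod(n, base)
            digits.append(digit)       # least significant digit first = reversed notation
        yield "".join(symbols[d] for d in digits)


if __name__ == "__main__":
    for length in (1, 2):
        print(list(reversed_base_n("ABCD", length)))
```

Calling it for length 1, then 2, then 3 and so on reproduces the full sequence from the question.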
How about:
Private Sub DoIt(minVal As Integer, maxVal As Integer, maxDepth As Integer)
    If maxVal < minVal OrElse maxDepth <= 0 Then
        Debug.WriteLine("no results!")
        Return
    End If
    Debug.WriteLine("results:")
    Dim resultList As New List(Of Integer)(maxDepth)
    ' initialize with the 1st result: this makes processing the remainder easy to write.
    resultList.Add(minVal)
    Dim depthIndex As Integer = 0
    Debug.WriteLine(CStr(minVal))
    Do
        ' find the term to be increased
        Dim indexOfTermToIncrease As Integer = 0
        While resultList(indexOfTermToIncrease) = maxVal
            resultList(indexOfTermToIncrease) = minVal
            indexOfTermToIncrease += 1
            If indexOfTermToIncrease > depthIndex Then
                depthIndex += 1
                If depthIndex = maxDepth Then
                    Return
                End If
                resultList.Add(minVal - 1)
                Exit While
            End If
        End While
        ' increase the term that was identified
        resultList(indexOfTermToIncrease) += 1
        ' output
        For d As Integer = 0 To depthIndex
            Debug.Write(CStr(resultList(d)) + " ")
        Next
        Debug.WriteLine("")
    Loop
End Sub
Would that be adequate? It doesn't take much memory and is relatively fast (apart from the writing to output...).
A code is the assignment of a unique string of characters (a
codeword) to each character in an alphabet.
A code in which the codewords contain only zeroes and ones is
called a binary code.
All ASCII codewords have the same length. This ensures that an
important property called the prefix property holds true for the
ASCII code.
The encoding of a string of characters from an alphabet (the
cleartext) is the concatenation of the codewords corresponding to
the characters of the cleartext, in order, from left to right. A code
is uniquely decodable if the encoding of every possible cleartext
using that code is unique.
Based on the above information I was trying to do some exercises:
Considering the following matrix:
Code1 Code2 Code3 Code4
A 0 0 1 1
B 100 1 01 01
C 10 00 001 001
D 11 11 0001 000
The confusions:
Are all of the above assignments considered codes, since each character has a unique string of characters?
I understand that code 1 and code 2 are prefix free since their codewords do not have equal length. Having said that, in code 4 the codewords for D and C both consist of 3 digits. Would code 4 be considered prefix free too?
Is code 3 the only uniquely decodable code?
I think you have misunderstood the prefix property - it isn't mainly about length (but enforcing the same length n on each code point will make the code prefix-free - you cannot have unique codes otherwise).
Rather, it is about being able to identify each code point uniquely, so that a decoder can greedily take the first translation that matches. In the case of fixed length, the decoder knows that it has to read n digits.
In the case of a variable-length code like Code1, you don't know upon reading 10 whether it can be translated to C or whether it is the first two digits of the three-digit B: 10 is a prefix of 100. The same holds true for Code2: 0 is a prefix of 00 and 1 is a prefix of 11.
Consider reading the sequence 100 one digit at a time:
Code1:
Read 1
; "1" does not match any code - Remember the 1 and continue.
Read 0
; "10" matches reduction "C" - or is this the beginning of a "B"? Darn!
Read 0
; Ok, this was either "CA" or "B" - but there is no way of knowing which one.
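The prefix check and the ambiguity of "100" can both be tried out with a short Python sketch. The function names are hypothetical; the four codes are copied from the table in the question.

```python
CODES = {
    "Code1": {"A": "0", "B": "100", "C": "10", "D": "11"},
    "Code2": {"A": "0", "B": "1", "C": "00", "D": "11"},
    "Code3": {"A": "1", "B": "01", "C": "001", "D": "0001"},
    "Code4": {"A": "1", "B": "01", "C": "001", "D": "000"},
}


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
        if i != j
    )


def all_decodings(bits, code):
    """Return every cleartext whose encoding is exactly `bits`."""
    if bits == "":
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            results += [symbol + rest for rest in all_decodings(bits[len(word):], code)]
    return sorted(results)


if __name__ == "__main__":
    for name, code in CODES.items():
        print(name, "prefix-free:", is_prefix_free(code))
    print(all_decodings("100", CODES["Code1"]))  # ['B', 'CA']
```

Under this check Code1 and Code2 fail while Code3 and Code4 pass, even though Code4 has two codewords of the same length. A prefix-free code is always uniquely decodable, so Code3 is not the only uniquely decodable code in the table.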
Hope this helps you forward!