Combining add cases and add variables by merging files in SPSS - syntax

I would like to merge different SPSS files. The variable PAID identifies the person. The files also contain the variable ID, which indicates the moment of measurement: ID=1 means the data are the results of measurement one, ID=2 of measurement two, and so on. However, not all data files contain the same moments of measurement.
I have already read the following post, but that has not completely answered my question:
SPSS - merging files with duplicate cases of ID variable and new cases/variables
Example data files
Data file 1:
PAID ID X1 X2 X3 X4
1 1 3 4 4 5
2 1 3 4 5 6
3 1 3 4 4 6
4 1 . . . .
Data file 2:
PAID ID X5 X6 X7
1 1 1 1 2
1 2 1 2 1
2 1 1 2 2
2 2 2 2 2
3 1 1 1 1
3 2 1 . .
4 1 1 1 1
4 2 2 2 2
I want the following result:
PAID ID X1 X2 X3 X4 X5 X6 X7
1 1 3 4 4 5 1 1 2
1 2 . . . . 1 2 1
2 1 3 4 5 6 1 2 2
2 2 . . . . 2 2 2
3 1 3 4 4 6 1 1 1
3 2 . . . . 1 . .
4 1 . . . . 1 1 1
4 2 . . . . 2 2 2
I think I have to use some combination of the functions add cases and add variables. However, is this possible within SPSS? And if so, how can I do this?
Thanks in advance!

This will do the job:
match files /file='path\DataFile1.sav' /file='path\DataFile2.sav'/by paid id.
Please note though, both files need to be sorted by paid id before running the match.
To demonstrate with your sample data:
*first preparing demonstration data.
DATA LIST list/paid id x1 to x4 (6f).
begin data.
1,1,3,4,4,5
2,1,3,4,5,6
3,1,3,4,4,6
4,1, , , ,
end data.
* instead of creating the data, you can get your original data:
* get file="path\file name 1.sav".
sort cases by paid id.
dataset name DataFile1.
DATA LIST list/paid id x5 to x7 (5f).
begin data.
1,1,1,1,2
1,2,1,2,1
2,1,1,2,2
2,2,2,2,2
3,1,1,1,1
3,2,1, ,
4,1,1,1,1
4,2,2,2,2
end data.
sort cases by paid id.
dataset name DataFile2.
match files /file=DataFile1 /file=DataFile2/by paid id.
exe.
The result looks like this (missing values shown as "."):
paid id x1 x2 x3 x4 x5 x6 x7
1 1 3 4 4 5 1 1 2
1 2 . . . . 1 2 1
2 1 3 4 5 6 1 2 2
2 2 . . . . 2 2 2
3 1 3 4 4 6 1 1 1
3 2 . . . . 1 . .
4 1 . . . . 1 1 1
4 2 . . . . 2 2 2
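To make the logic of a keyed match explicit, here is a minimal Python sketch of the same outer join on (paid, id). It is only an illustration of the idea, not SPSS's implementation; the function name match_files, the helper make_rows and the use of None for missing values are my own assumptions.

```python
def make_rows(names, data):
    """Turn tuples of values into row dictionaries (hypothetical helper)."""
    return [dict(zip(names, values)) for values in data]


def match_files(file1, file2, keys=("paid", "id")):
    """Outer-join two lists of row dicts on the key variables."""
    merged = {}
    columns = []
    for rows in (file1, file2):
        for row in rows:
            key = tuple(row[k] for k in keys)
            # Rows sharing a key are combined; a new key creates a new case.
            merged.setdefault(key, {}).update(row)
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Variables that a case never received stay missing (None).
    return [
        {name: merged[key].get(name) for name in columns}
        for key in sorted(merged)
    ]


file1 = make_rows(
    ["paid", "id", "x1", "x2", "x3", "x4"],
    [(1, 1, 3, 4, 4, 5), (2, 1, 3, 4, 5, 6), (3, 1, 3, 4, 4, 6),
     (4, 1, None, None, None, None)],
)
file2 = make_rows(
    ["paid", "id", "x5", "x6", "x7"],
    [(1, 1, 1, 1, 2), (1, 2, 1, 2, 1), (2, 1, 1, 2, 2), (2, 2, 2, 2, 2),
     (3, 1, 1, 1, 1), (3, 2, 1, None, None), (4, 1, 1, 1, 1), (4, 2, 2, 2, 2)],
)
result = match_files(file1, file2)
for row in result:
    print(list(row.values()))
```

Every (paid, id) pair that occurs in either file becomes one case, which is why the match adds cases and variables in a single step.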


Find unrelated partitions of a complete binary tree

I have a complete binary tree of height 'h'.
How do I find h unrelated partitions for it?
NOTE:
An unrelated partition means that no child can be in the same partition as its immediate parent.
There is a constraint on the number of nodes in each partition.
The difference between the maximum number of nodes in a partition and the minimum number of nodes in a partition can be either 0 or 1.
Also, the root is excluded from the partitions.
Whoever devised the problem probably had a more elegant solution in mind, but the following works.
Let's say we have h partitions numbered 1 to h, and that the nodes of partition n have value n. The root node has value 0 and does not participate in the partitions. Let's call a partition even if n is even, and odd if n is odd. Let's also number the levels of the complete binary tree, ignoring the root and starting from level 1 with 2 nodes. Level n has 2^n nodes, and the complete tree has 2^(h+1)-1 nodes, but only P=2^(h+1)-2 nodes belong to the partitions (because the root is excluded). Each partition consists of p=⌊P/h⌋ or p=⌈P/h⌉ nodes, such that ∑ᵢpᵢ=P.
If the height h of the tree is even, put all even partitions into the even levels of the left subtree and the odd levels of the right subtree, and put all odd partitions into the odd levels of the left subtree and the even levels of the right subtree.
If h is odd, distribute all partitions up to partition h-1 as in the even case, but distribute partition h evenly across the last level of the left and right subtrees.
This is the result for h up to 7 (I wrote a tiny Python library to print binary trees to the terminal in a compact way for this purpose):
0
1 1
0
1 2
2 2 1 1
0
1 2
2 2 1 1
1 1 3 3 2 2 3 3
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 4 4 4 4 4 4 4 1 3 3 3 3 3 3 3
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 4 4 1 1 1 1 1 1 3 3
3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
0
1 2
2 2 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
And this is the code that generates it:
from basicbintree import Node

for h in range(1, 7 + 1):
    root = Node(0)
    P = 2 ** (h + 1) - 2  # nodes in partitions
    p = P // h  # partition size (may be p or p + 1)
    if h & 1:  # odd height
        t = (p + 1) // 2  # subtree tail nodes from split partition
        n = (h - 1) // 2  # odd or even partitions in subtrees except tail
    else:  # even height
        t = 0  # no subtree tail nodes from split partition
        n = h // 2  # odd or even partitions in subtrees
    s = P // 2 - t  # subtree nodes excluding tail
    r = s - n * p  # partitions of size p + 1 in subtrees
    x = [p + 1] * r + [p] * (n - r)  # nodes indexed by subtree partition - 1
    odd = [1 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    even = [2 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    for g in range(1, h + 1):
        start = 2 ** (g - 1) - 1
        stop = 2 ** g - 1
        if g & 1:  # odd level
            root.set_level(odd[start:stop] + even[start:stop])
        else:  # even level
            root.set_level(even[start:stop] + odd[start:stop])
    print('```none')
    root.print_tree()
    print('```')
All trees produced up to height 27 have been programmatically confirmed to meet the specifications.
Some parts of the algorithm would need a proof, like, e.g., that it's always possible to choose an even size for the split partition in the odd height case, but this and other proofs are left as an exercise to the reader ;-)
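For readers without the basicbintree library, the construction and a check of the constraints can be sketched with plain lists. This is a minimal sketch under my own naming (partition_levels and is_valid are hypothetical helpers, not part of any library); it stores level g as a list of 2^g labels, left to right, and is not necessarily the check that was used for the heights mentioned above.

```python
from collections import Counter


def partition_levels(h):
    """Return levels 1..h (root excluded) labelled with partition numbers."""
    P = 2 ** (h + 1) - 2          # nodes that belong to partitions
    p = P // h                    # partition size is p or p + 1
    if h & 1:                     # odd height: partition h is split
        t = (p + 1) // 2          # tail nodes per subtree from partition h
        n = (h - 1) // 2
    else:                         # even height: nothing is split
        t = 0
        n = h // 2
    s = P // 2 - t                # subtree nodes excluding the tail
    r = s - n * p                 # partitions of size p + 1 per subtree
    x = [p + 1] * r + [p] * (n - r)
    odd = [1 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    even = [2 + 2 * i for i, c in enumerate(x) for _ in range(c)] + [h] * t
    levels = []
    for g in range(1, h + 1):
        start, stop = 2 ** (g - 1) - 1, 2 ** g - 1
        left, right = (odd, even) if g & 1 else (even, odd)
        levels.append(left[start:stop] + right[start:stop])
    return levels


def is_valid(levels):
    """Check partition count, size balance and the parent/child rule."""
    h = len(levels)
    sizes = Counter(label for level in levels for label in level)
    if set(sizes) != set(range(1, h + 1)):
        return False
    if max(sizes.values()) - min(sizes.values()) > 1:
        return False
    # Node i of a level has node i // 2 of the previous level as parent.
    # Level 1 hangs off the root (value 0), which is in no partition.
    for parents, children in zip(levels, levels[1:]):
        if any(child == parents[i // 2] for i, child in enumerate(children)):
            return False
    return True


print(partition_levels(3))
print(all(is_valid(partition_levels(h)) for h in range(1, 13)))
```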

Maximum k such that A[0]<A[k], A[1]<A[k+1], ..., A[k-1]<A[2*k-1], after sorting each k-sized window

I need an efficient algorithm for this problem (time complexity better than O(n^2)); please help me:
We say a[i..j] < b[i..j] if, after sorting both arrays, a[i]<b[i], a[i+1]<b[i+1], ..., a[j]<b[j].
Given an array A[1..n] (n <= 10^5, A[i] <= 1000), find the maximum k such that A[1..k] < A[k+1..2k].
For example, n=10: 2 2 1 4 3 2 5 4 2 3
the answer is 4
It is easy to see that k <= n/2, so we can use brute force (k from n/2 down to 1), but not binary search.
And I don't know what to do with a[i] <= 1000. Maybe use a map?
Use a Fenwick tree with range updates. Each index in the tree holds the count of numbers in window A (the window on the left) that are smaller than it. For the windows to be valid, each element in B (the window on the right) must have a partner in A. When we shift a number x into A, we add 1 to the range [x+1, 1000] in the tree. For an element shifted from B to A, we also add 1 at its own tree index. For each new element in B, we add -1 at its index in the tree. If any index drops below zero, the windows are invalid.
For the example, we have:
2 2 1 4 3 2 5 4 2 3
2 2
|
Tree:
add 1 to [3, 1000]
add -1 to 2
idx 1 2 3 4 5
val 0 -1 1 1 1 (invalid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4
|
Tree:
add 1 to [3, 1000]
add 1 to 2 (remove 2 from B)
add -1 to 1
add -1 to 4
idx 1 2 3 4 5
val -1 0 2 1 2 (invalid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2
|
Tree:
add 1 to [2, 1000]
add 1 to 1 (remove 1 from B)
add -1 to 3
add -1 to 2
idx 1 2 3 4 5
val 0 0 2 2 3 (valid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2 5 4
|
Tree:
add 1 to [5, 1000]
add 1 to 4 (remove 4 from B)
add -1 to 5
add -1 to 4
idx 1 2 3 4 5
val 0 0 2 2 3 (valid)
2 2 1 4 3 2 5 4 2 3
2 2 1 4 3 2 5 4 2 3
|
Tree:
add 1 to [4, 1000]
add 1 to 3 (remove 3 from B)
add -1 to 2
add -1 to 3
idx 1 2 3 4 5
val 0 -1 2 3 4 (invalid)
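The validity test itself can be sketched without a tree by keeping one counter array per window and scanning the value range for each k. This is a simplified sketch, not the Fenwick version: it costs O(n * 1000) overall, and the name max_k and the max_value parameter are my own. Note that it compares the number of elements of A below v with the cumulative number of elements of B up to v, because a per-value count alone would accept A = [1, 5], B = [2, 3], which fails after sorting (5 is not smaller than 3).

```python
def max_k(a, max_value=1000):
    """Largest k with sorted(a[:k]) < sorted(a[k:2k]) element-wise, else 0.

    Assumes every value lies in 1..max_value.
    """
    counts_a = [0] * (max_value + 1)   # occurrences per value in the left window
    counts_b = [0] * (max_value + 1)   # occurrences per value in the right window
    best = 0
    for k in range(1, len(a) // 2 + 1):
        # The right window gains two elements and hands its first one
        # over to the left window.
        counts_b[a[2 * k - 2]] += 1
        counts_b[a[2 * k - 1]] += 1
        counts_b[a[k - 1]] -= 1
        counts_a[a[k - 1]] += 1

        below = 0   # elements of A strictly smaller than v
        upto = 0    # elements of B smaller than or equal to v
        valid = True
        for v in range(1, max_value + 1):
            upto += counts_b[v]
            if upto > below:
                valid = False
                break
            below += counts_a[v]
        if valid:
            best = k
    return best


print(max_k([2, 2, 1, 4, 3, 2, 5, 4, 2, 3]))  # -> 4
```

Replacing the inner scan with a structure that supports range additions and a minimum query would bring the cost per step down to logarithmic in the value range.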

MATLAB: How to re-label a matrix based on another matrix?

Say I have matrix A:
A = [1 1 1 2 2 3 3 3;
1 1 1 2 2 3 3 3;
1 1 1 2 2 4 4 5;
2 2 2 2 2 5 5 5]
and matrix B with the same labels, just in different positions and not always with the same elements in each cluster:
B = [3 3 3 3 5 1 1 1;
3 3 3 3 5 1 1 1;
3 3 3 3 5 2 2 4;
5 5 5 5 5 4 4 4]
and I want matrix C to look like this
C = [1 1 1 1 2 3 3 3;
1 1 1 1 2 3 3 3;
1 1 1 1 2 4 4 5;
2 2 2 2 2 5 5 5]
Basically, I want the clusters in B that have a similar position to A to also have the same label as A, even if the clusters in B don't have the same exact amount of elements as the clusters in A. This is just a basic example because what I'm really working on are two images that have different labellings.
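One possible reading of "similar position" is "largest overlap": give each cluster in B the label of the cluster in A that it shares the most cells with. Below is a minimal sketch of that idea in Python, with plain nested lists standing in for the matrices; the function name relabel_by_overlap is hypothetical, and the majority rule is my assumption about what is wanted. Two clusters in B may end up with the same label if they overlap the same cluster in A the most.

```python
from collections import Counter, defaultdict


def relabel_by_overlap(a, b):
    """Relabel b so each of its clusters takes the label of the cluster
    in a that it overlaps the most (ties go to the first one seen)."""
    overlap = defaultdict(Counter)
    for row_a, row_b in zip(a, b):
        for label_a, label_b in zip(row_a, row_b):
            overlap[label_b][label_a] += 1
    mapping = {
        label_b: counts.most_common(1)[0][0]
        for label_b, counts in overlap.items()
    }
    return [[mapping[label_b] for label_b in row_b] for row_b in b]


A = [[1, 1, 1, 2, 2, 3, 3, 3],
     [1, 1, 1, 2, 2, 3, 3, 3],
     [1, 1, 1, 2, 2, 4, 4, 5],
     [2, 2, 2, 2, 2, 5, 5, 5]]
B = [[3, 3, 3, 3, 5, 1, 1, 1],
     [3, 3, 3, 3, 5, 1, 1, 1],
     [3, 3, 3, 3, 5, 2, 2, 4],
     [5, 5, 5, 5, 5, 4, 4, 4]]
C = relabel_by_overlap(A, B)
for row in C:
    print(row)
```

For the matrices in the question this reproduces the desired C.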

Algorithm - Check if there is a single typo in the given word

Given a dictionary list and an input word, return true if the input word has exactly one typo compared with some dictionary word of the same length.
dictionary = ["apple", "testing", "computer"];
singleType(dictionary, "adple") // true
singleType(dictionary, "addle") // false
singleType(dictionary, "apple") // false
singleType(dictionary, "apples") // false
I proposed a solution that runs in linear time, if we ignore the preprocessing time needed for the hash map.
O(k*26) => O(k), where k = length of the input word
My solution: convert the dictionary list into a hash map where the key is the word and the value is a boolean, then loop over every position in the input word, replace the character there with each of the 26 letters of the alphabet, and check whether the result is in the hash map.
But I was told I could do better than O(k*26). How?
You could extend the dictionary with all the variants of the word containing a single typo, but instead of the actual typo, you just put some "wildcard" character like ? or * in that place. Then, you can check whether (a) the word is not in the set of correctly spelled words, and (b) replacing any of the letters in the word with the same wildcard symbol, the word can be found in the set of words having one typo.
Example in Python:
>>> dictionary = ["apple", "testing", "computer"]
>>> wildcard = lambda w: [w[:i]+"?"+w[i+1:] for i in range(len(w))]
>>> onetypo = {x for w in dictionary for x in wildcard(w)}
>>> correct = {w for w in dictionary}
>>> word = "apxle"
>>> word not in correct and any(w in onetypo for w in wildcard(word))
True
This reduces the complexity of lookup to just O(k), i.e. still linear in the number of letters, but without the high constant factor. It does, however, greatly blow up the dictionary by a factor equal to the average number of letters in the words.
For a single lookup, I would filter the dictionary by word length, and then iterate the words, counting the errors, and bail out of each word, as soon as the error count is > 1.
val dictionary = List ("affen", "ample", "apple", "appse", "ipple", "appl", "pple", "mapple", "apples")

@annotation.tailrec
def oneError (w1: String, w2: String, err: Int) : Boolean = w1.length match {
  case 0 => err == 1
  case _ => if (err > 1) false else {
    if (w1(0) == w2(0)) oneError (w1.substring (1), w2.substring (1), err) else
      oneError (w1.substring (1), w2.substring (1), err + 1)
  }
}
scala> dictionary.filter (_.length == 5).filter (s => oneError ("appxe", s, 0))
res5: List[String] = List(apple, appse)
For processing a longer text, I would preprocess the dictionary and split it into Maps (word.length -> List (words)).
For natural language, which is highly redundant, I would build a Set of unique words from the text, to look up every word just once.
For a single word lookup, the worst case is n calls to the initial function, where n is the size of the largest same-length group in the dictionary (the maximum over dictionary.groupBy (_.length)).
Each comparison (for words longer than 1) takes at least 2 steps until failure, but most words, assuming no pathological input or dictionary, are visited for only 2 steps. Of the remaining ones, most are excluded after 3 steps, and so on.
Here is a version that shows how deep it looks:
def oneError (word: String) : Array[String] = {
  @annotation.tailrec
  def oneError (w1: String, w2: String, steps: Int, err: Int) : Boolean = w1.length match {
    case 0 => {print (s"($steps) "); err == 1}
    case _ => if (err > 1) {print (s"$steps "); false } else {
      if (w1(0) == w2(0)) oneError (w1.substring (1), w2.substring (1), steps + 1, err) else
        oneError (w1.substring (1), w2.substring (1), steps + 1, err + 1)
    }
  }
  // dict is the preprocessed map from word length to the words of that length
  val d = dict (word.length)
  println (s"Info: ${d.length} words of same length")
  d.filter (entry => oneError (word, entry, 0, 0))
}
Sample output, redacted:
scala> oneError ("fuck")
Info: 3352 words of same length
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 (4) 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 (4) (4) 3 3 3 3
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 3 3 (4) (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) 3 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 (4) (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 (4) 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 (4) 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
res53: Array[String] = Array(Buck, Huck, Puck, buck, duck, funk, luck, muck, puck, suck, tuck, yuck)
It sounds like you are looking for the edit distance of 1 of your pattern with respect to the dictionary entry. For example, an edit distance of 1 would result if the pattern is "adple" and your dictionary entry is "apple". You have an additional constraint that the pattern is the same length as the dictionary entry, but this is easy to implement.
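With the same-length constraint, an edit distance of 1 reduces to exactly one mismatching position (a Hamming distance of 1). A minimal sketch of that check in Python, using the examples from the question; the name single_typo is my own, and treating a correctly spelled word as "no typo" follows the expected outputs in the question.

```python
def single_typo(dictionary, word):
    """True if word is not in the dictionary but differs from some
    same-length dictionary entry in exactly one position."""
    if word in dictionary:
        return False
    for entry in dictionary:
        if len(entry) != len(word):
            continue
        mismatches = 0
        for x, y in zip(entry, word):
            if x != y:
                mismatches += 1
                if mismatches > 1:
                    break          # bail out early, as suggested above
        if mismatches == 1:
            return True
    return False


dictionary = ["apple", "testing", "computer"]
print(single_typo(dictionary, "adple"))   # -> True
print(single_typo(dictionary, "addle"))   # -> False
print(single_typo(dictionary, "apple"))   # -> False
print(single_typo(dictionary, "apples"))  # -> False
```

This scans the whole dictionary for each lookup, so it suits a single query; for many queries the wildcard set described earlier avoids the scan.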

Julia: sorting a matrix by 2 columns in different orders

I need to sort a four column matrix in Julia by the third column in ascending order then by the fourth column in descending order.
The easiest way to do chained lexicographic sorting on columns in an arbitrary order is to pass a transformation function via the by keyword: sortrows(A, by=x->(x[3],x[4])). But that is just lexicographic with both columns ascending. For fancier behavior, you can pass a custom comparison function to sortrows via the lt keyword:
julia> A = rand(1:3,6,4)
6x4 Array{Int64,2}:
3 1 1 2
1 1 3 1
1 1 2 1
2 1 3 3
1 3 3 1
2 3 2 3
julia> sortrows(A, lt=(x,y)->isless(x[3],y[3]) || (isequal(x[3],y[3]) && isless(y[4],x[4])))
6x4 Array{Int64,2}:
3 1 1 2
2 3 2 3
1 1 2 1
2 1 3 3
1 1 3 1
1 3 3 1
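For numeric columns, the "ascending, then descending" order can also be expressed as a sort key by negating the descending column. Here is that idea sketched in Python (not Julia) on the same matrix, purely to illustrate the ordering logic; the variable names are mine and indices are 0-based. The same trick should carry over to a by function, but I have not shown that here.

```python
rows = [
    [3, 1, 1, 2],
    [1, 1, 3, 1],
    [1, 1, 2, 1],
    [2, 1, 3, 3],
    [1, 3, 3, 1],
    [2, 3, 2, 3],
]

# Third column ascending, fourth column descending (negated key).
# sorted() is stable, so rows with equal keys keep their original order.
ordered = sorted(rows, key=lambda r: (r[2], -r[3]))
for row in ordered:
    print(row)
```

Negation only works for numbers; for other element types a custom comparison like the lt function above is the more general route.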
