spss search for value in dataset

I'd like to find any cases of a value (e.g., 0) in any cell in an SPSS database. What syntax would accomplish this?
(I came across a python script but don't have that option.)

It is still not very clear how you want to select those cases, but the syntax below will list in the output any cases that have at least one 0 in any of the variables var1, var2 or var3. I am assuming CaseID is the case identifier variable.
TEMPORARY.
SELECT IF ANY(0,var1,var2,var3).
LIST CaseID var1 var2 var3.
You can use as many variables as you want in the ANY function, and also on the LIST command.

The following syntax will create a list of the appearances of 0 within your data, in a separate file.
First, create some fake data to demonstrate on:
data list list/ID (a6) test1 to test6 (6f2).
begin data
ID_001 2 3 2 3 0 3
ID_002 3 4 0 4 3 4
ID_003 0 4 2 4 2 4
ID_004 7 0 1 2 8 3
ID_005 5 5 5 0 5 5
ID_006 4 5 4 5 4 0
end data.
dataset name origData.
Now to create the list:
dataset copy ForList.
dataset activate ForList. /* the list will be created from a copy of the data.
varstocases /make vals from test1 to test6/index testNum(vals).
select if vals=0.
You can use the list in the new file, or put it in the output window:
list ID testNum.
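To make the restructure-and-filter idea concrete, here is a small illustration in Python. It is not SPSS syntax and is not needed to run the answer above; it only mirrors the logic on the same demo data. It assumes the index variable ends up holding the source variable name (test1 to test6), and the function name find_value is made up for this sketch.

```python
# Illustration only: mirrors the VARSTOCASES + SELECT IF steps on the demo data.
rows = {
    "ID_001": [2, 3, 2, 3, 0, 3],
    "ID_002": [3, 4, 0, 4, 3, 4],
    "ID_003": [0, 4, 2, 4, 2, 4],
    "ID_004": [7, 0, 1, 2, 8, 3],
    "ID_005": [5, 5, 5, 0, 5, 5],
    "ID_006": [4, 5, 4, 5, 4, 0],
}


def find_value(rows, target=0):
    """Return (ID, variable name) pairs for every cell equal to target."""
    hits = []
    for case_id, values in rows.items():
        # One "long" row per cell, labelled with the variable it came from.
        for position, value in enumerate(values, start=1):
            if value == target:
                hits.append((case_id, f"test{position}"))
    return hits


for case_id, var_name in find_value(rows):
    print(case_id, var_name)
```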

Related

Most efficiently insert a number in a maintained large sorted variable

I need to insert a number into a large variable that is kept sorted, as efficiently as possible. Is there a better method than test1?
test1 is quite a bit faster than test2, which just appends the value and then re-sorts.
q←1000000⍴0 ⋄ q←10 9 8 7 6 5 4 3 2,q ⍝q is kept sorted
test1←{
y←⍺(⍳∘1≤)⍵ ⍝ very fast
(y↑⍺),⍵,(y↓⍺) ⍝ is there a tacit version here and without copying?
}
10↑q test1 6
10 9 8 7 6 6 5 4 3 2
cmpx 'q test1 6'
3.2E¯4
test2←{y←⍵,⍺ ⋄ y[⍒y]}
10↑q test2 6
10 9 8 7 6 6 5 4 3 2
cmpx 'q test2 6'
1.5E¯3
I tried this with a presorted variable: test1 is quicker than appending and then sorting. Could test1 perhaps be refactored into a better tacit form?

Possibly not the answer you are looking for, but in a production application, if access to these sorted keys with frequent appends was an important performance consideration in a Dyalog APL application, you might resort to something like the following class. The strategy is to have an unsorted variable data which can be appended to efficiently using a method called Append. Sorting is done on demand, if needed (there is room for further optimisation by checking whether the appended value is greater than the last element in the list, which would be worthwhile if that was a common case).
:Class Sorted
:Property Values
:Access Public
∇Set value
data←value
sorted←0
∇
∇r←Get value
:If ~sorted
sorteddata←data[⍒data]
sorted←1
:EndIf
r←sorteddata
∇
:EndProperty
∇ Make initial
:Implements Constructor
:Access Public
data←initial
sorted←0
∇
∇ r←Append values
:Access Public
data,←values
r←sorted←0
∇
:EndClass
Usage would be along the lines of:
s←⎕NEW Sorted (10 9 8 7 6 5 4 3 2,1E6⍴0)
s.Append 6
s.Append 7
≢s.Values
1000011
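For readers who do not read APL, the same append-now, sort-on-demand strategy can be sketched in Python. This is only an illustration of the idea under the assumption that reads are rarer than appends; the class name LazySorted and its members are invented for the sketch and are not part of the answer's class.

```python
class LazySorted:
    """Keeps values unsorted and sorts them (descending) only when read."""

    def __init__(self, initial):
        self._data = list(initial)
        self._cache = None  # None means the sorted view is out of date

    def append(self, *values):
        # Appending is cheap; it only invalidates the cached sorted view.
        self._data.extend(values)
        self._cache = None

    @property
    def values(self):
        # Sort at most once per batch of appends.
        if self._cache is None:
            self._cache = sorted(self._data, reverse=True)
        return self._cache


s = LazySorted([10, 9, 8, 7, 6, 5, 4, 3, 2])
s.append(6)
s.append(7)
print(s.values)  # [10, 9, 8, 7, 7, 6, 6, 5, 4, 3, 2]
```

The trade-off is the same as in the class above: each append is cheap, and the cost of sorting is paid once on the next read rather than on every insert.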

Create values of new data frame variable based on other column values

I have a question about data set preparation. In a survey, the same people were asked about a number of different variables at two points of measurement. This resulted in a dataset in long format, i.e. information from each participant is stored in two rows. Each row represents the data of this person at the respective time of measurement (see example). Individuals have individual participation codes. The same participation code thus indicates that the data is from the same person.
code  time  risk_perception
DB6M  1     6
DB6M  2     4
TH4D  1     2
TH4D  2     3
Now I would like to create a new variable "risk_perception.complete", which shows whether the information for each participant is complete. A person may have given no information at either measurement time, or at only one of the two, so values can be missing (NAs). In the new variable I would like to check and code this for each person: if the person has one or more NAs, a 0 should be coded; if the person has no NAs, a 1 should be coded (see example).
code  time  risk_perception  risk_perception.complete
DB6M  1     6                1
DB6M  2     4                1
TH4D  1     2                1
TH4D  2     3                1
SU6H  1     NA               0
SU6H  2     3                0
VG9S  1     NA               0
VG9S  2     NA               0
Can anyone tell me the best way to program this?
Here is a reproducible example:
data <- data.frame(
code = c("AH6M","AH6M","BD7M","BD7M","SH9L","SH9L"),
time = c(1,2,1,2,1,2),
risk = c(6,7,NA,3,NA,NA))
Thank you in advance and best regards!
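The rule described above (flag a participant 0 if any of their rows has an NA, otherwise 1) can be sketched as follows. This is a Python illustration of the logic rather than R code, using the values from the reproducible example with None standing in for NA. It assumes every participant has a row for each measurement time; the function name add_complete_flag is made up for the sketch.

```python
rows = [
    ("AH6M", 1, 6), ("AH6M", 2, 7),
    ("BD7M", 1, None), ("BD7M", 2, 3),
    ("SH9L", 1, None), ("SH9L", 2, None),
]


def add_complete_flag(rows):
    """Append 1 if every risk value for that code is present, else 0."""
    # Codes that have at least one missing value in any of their rows.
    incomplete = {code for code, _time, risk in rows if risk is None}
    return [
        (code, time, risk, 0 if code in incomplete else 1)
        for code, time, risk in rows
    ]


for row in add_complete_flag(rows):
    print(row)
```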

How to show top N number of results with customization in spark rdd?

val sorting = sc.parallelize(List(1,1,1,2,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,7,8,8,8,8,8))
sorting.map(x=>(x,1)).reduceByKey((a,b)=>a+b).map(x=>(x._1,"==>",x._2)).sortBy(s=>s._2,false).collect.foreach(println)
output:
(8,==>,5)
(1,==>,3)
(2,==>,4)
(3,==>,3)
(4,==>,4)
(5,==>,3)
(6,==>,2)
(7,==>,1)
I want to show only the top 3 results and remove the commas from the output.

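Use take(3) instead of collect to get the top 3 results, and format each row as a string yourself. Sort on the count before formatting: in the original code, s._2 is the "==>" string, so the rows were never ordered by count.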
sorting.map(x=>(x,1)).reduceByKey((a,b)=>a+b).sortBy(s=>s._2,false).map(x=>s"${x._1} ${x._2}").take(3).foreach(println)
8 5
2 4
4 4
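The same count, sort by count, take the first N, then format pipeline can be illustrated without Spark. The sketch below is plain Python on a local list, not the RDD API; the function name top_n is invented here. Ties keep first-seen order because Python's sort is stable, which a distributed sort may not guarantee.

```python
from collections import Counter

values = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4,
          5, 5, 5, 6, 6, 7, 8, 8, 8, 8, 8]


def top_n(values, n=3):
    """Return the n most frequent values as 'value count' strings."""
    counts = Counter(values)
    # Sort on the count (descending) before any string formatting.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{value} {count}" for value, count in ranked[:n]]


for line in top_n(values):
    print(line)
```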

Nested level logic

I have to write the logic for a problem and use it on a landing page, but I am unable to write it.
1 - IT 1
2 - IT 1
3 - IT 1
4 - IT 2
5 - IT 2
6 - IT 2
7 - IT 2
8 - IT 3
9 - IT 4
Problem Statement:
-As long as someone selects only 1, 2 or 3, only IT 1 is suggested
-When someone chooses 4, 5, 6 or 7 and anything with a value of 3 or below, IT 2 is suggested
-When someone chooses 8, or 8 plus anything below 8, IT 3 is suggested
-When someone chooses 9 and anything with a value below 9, IT 4 is suggested.
I was using if conditions, but it seems that whenever IT 2 is satisfied, IT 3 is also satisfied. How should I write the logic?

https://jsfiddle.net/bhanusingh/7fxet35h/9/
Don't nest your ifs. Just write four ifs like in your problem description and make a helper function to make it easier to read.
Below is pseudocode in which the numbers represent references to checkboxes. isAnySelected is a helper function that takes a list of checkbox references and returns true if any of those checkboxes is checked.
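if (isAnySelected([9]) AND isAnySelected([1,2,3,4,5,6,7,8]))
    return IT4
if (isAnySelected([8]) AND NOT isAnySelected([9]))
    return IT3
if (isAnySelected([4,5,6,7]) AND isAnySelected([1,2,3]))
    return IT2
if (isAnySelected([1,2,3]))
    return IT1
Note that the checks run from the highest level down, because each if returns immediately: testing IT1 first would return as soon as 1, 2 or 3 is selected and the IT2 rule could never be reached. The IT3 rule includes "not 9" so that 8 and 9 selected together produce IT4.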
Solved on Reddit: https://www.reddit.com/r/learnprogramming/comments/bpukvf/i_got_a_very_complex_problem_for_me_i_can_only/enxst62?utm_source=share&utm_medium=web2x
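A runnable version of the four flat checks might look like the sketch below. It is written in Python rather than the fiddle's JavaScript, represents the checked boxes as a set of numbers, and checks the rules from the highest level down so that an early return cannot hide a later rule. The names suggest and is_any_selected are illustrative, and the conditions follow the answer's literal reading of the problem statement (IT 2 and IT 4 need a lower option selected as well).

```python
def is_any_selected(selected, options):
    """True if at least one of the given options is in the selected set."""
    return any(option in selected for option in options)


def suggest(selected):
    """Return the suggested level for a set of checked options, or None."""
    if is_any_selected(selected, [9]) and is_any_selected(selected, range(1, 9)):
        return "IT4"
    if is_any_selected(selected, [8]) and not is_any_selected(selected, [9]):
        return "IT3"
    if is_any_selected(selected, [4, 5, 6, 7]) and is_any_selected(selected, [1, 2, 3]):
        return "IT2"
    if is_any_selected(selected, [1, 2, 3]):
        return "IT1"
    return None  # no rule matched, e.g. only 9 is checked


print(suggest({1, 2}))
print(suggest({2, 5}))
print(suggest({3, 8}))
print(suggest({8, 9}))
```

Under this reading, a selection such as only 4 or only 9 matches no rule; if those cases should also produce a suggestion, the second half of the IT2 and IT4 conditions can be dropped.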

bash: add column if row name is repeated

I have a file with several variables in rows and the values of these variables in columns. Some rows are repeated and only contain data for some of the columns (e.g. in the example below, the second time "A" appears, it only contains data in columns S1 and S2).
Example:
Variable S1 S2 S3
A 3 5 6
B 4 5 6
A some_string another_string
C 2 5 6
What I want is to add another (or several) columns that contain the data from the repeated row
Output example:
Variable S1 S2 S3 new_column1 new_column2
A 3 5 6 some_string another_string
B 4 5 6
C 2 5 6
I am thinking that something like the code below could get me there, but it is still erroneous and I'm not sure whether this is even possible in bash.
My code would only be able to create ONE new column, and I don't know how to add the data to that new column.
I found these pieces of code in another question that was similar, but not quite what I want, so I would appreciate any help!
awk 'NR==1{$5="new_column";print;next} seen[$1]++ {$5=$2}' file
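One way to think about the merge, independent of the tool used, is to keep the first occurrence of each row name and append the fields of any later occurrence as extra columns. The sketch below shows that logic in Python rather than bash or awk. It assumes whitespace-separated fields and that the first occurrence of a name has all of its columns filled; the function name merge_repeats is made up for the illustration.

```python
def merge_repeats(lines):
    """Fold repeated row names into extra columns on their first occurrence."""
    header, *body = [line.split() for line in lines if line.strip()]
    first = {}   # row name -> fields of its first occurrence
    extras = {}  # row name -> fields collected from later occurrences
    for name, *fields in body:
        if name in first:
            extras.setdefault(name, []).extend(fields)
        else:
            first[name] = fields
    # Add as many new header columns as the longest set of extra fields.
    width = max((len(v) for v in extras.values()), default=0)
    header = header + [f"new_column{i}" for i in range(1, width + 1)]
    merged = [" ".join(header)]
    for name, fields in first.items():
        merged.append(" ".join([name] + fields + extras.get(name, [])))
    return merged


lines = [
    "Variable S1 S2 S3",
    "A 3 5 6",
    "B 4 5 6",
    "A some_string another_string",
    "C 2 5 6",
]
for line in merge_repeats(lines):
    print(line)
```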
