How to sum columns in SPSS Syntax

I have a table in SPSS that contains multiple columns, like this:
+--------+-------+-------+-------+-------+
| | Col 1 | Col 2 | Col 3 | Total |
+--------+-------+-------+-------+-------+
| Data 1 | 10 | 1 | 30 | 41 |
| Data 2 | 4 | 10 | 10 | 24 |
| Data 3 | 3 | 40 | 1 | 44 |
| Data 4 | 10 | 5 | 3 | 18 |
+--------+-------+-------+-------+-------+
I want to add a row at the bottom that calculates the total of each column. In the end, it would look something like this:
+--------+-------+-------+-------+-------+
| | Col 1 | Col 2 | Col 3 | Total |
+--------+-------+-------+-------+-------+
| Data 1 | 10 | 1 | 30 | 41 |
| Data 2 | 4 | 10 | 10 | 24 |
| Data 3 | 3 | 40 | 1 | 44 |
| Data 4 | 10 | 5 | 3 | 18 |
| TOTAL | 27 | 56 | 44 | 127 |
+--------+-------+-------+-------+-------+
Does anyone know what I would have to add to my current code to achieve this?
EDIT: Here is my current code:
TEMPORARY.
SELECT IF Remove = 0.
CTABLES
/VLABELS VARIABLES=TBI1 ME1 BFCE1 CFCE1 RTWPM1 VPA1 NPS1 NPA1 PROV
DISPLAY=LABEL
/TABLE TBI1 [C] + ME1 [C] + BFCE1 [C] + CFCE1 [C] + RTWPM1 [C] + VPA1 [C] + NPS1 [C] + NPA1 [C] BY PROV [C]
[COUNT F40.0, ROWPCT.COUNT PCT40.1]
/CATEGORIES VARIABLES=TBI1 ME1 BFCE1 CFCE1 RTWPM1 VPA1 NPS1 NPA1 [1.00] EMPTY=INCLUDE
/CATEGORIES VARIABLES=PROV ORDER=A KEY=VALUE EMPTY=EXCLUDE TOTAL=YES
/TITLES TITLE='Brain Injury' CAPTION='Type of assessment and volume of each service by provider.'.

You can get output like this from the CROSSTABS procedure (Analyze > Descriptive Statistics > Crosstabs).

Here is one option you may want to try.
Create two new variables in the syntax window:
COMPUTE ALL_TAB_COL = 1.
COMPUTE ALL_TAB_ROW = 1.
EXECUTE.
VARIABLE LABELS ALL_TAB_COL 'All Cols'
/ALL_TAB_ROW 'All Rows'.
Then go to Analyze > Tables > Custom Tables in the main menu.
From the variable list:
Drag ALL_TAB_COL to the column area.
Drag ALL_TAB_ROW to the row area (if needed).
You can then drag other column variables so they appear after ALL_TAB_COL, and row variables so they appear below ALL_TAB_ROW.
If you choose Count in the Summary Statistics submenu, ALL_TAB_COL shows the column sum of all counts against ALL_TAB_ROW, with further breakdowns by the other variables.
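As a quick cross-check of the totals the finished table should show, here is a minimal Python sketch (not SPSS syntax) that recomputes the TOTAL row from the sample data in the question. The dictionary layout and the variable names are assumptions made for illustration only.

```python
# Sample data from the question: row label -> [Col 1, Col 2, Col 3]
data = {
    "Data 1": [10, 1, 30],
    "Data 2": [4, 10, 10],
    "Data 3": [3, 40, 1],
    "Data 4": [10, 5, 3],
}

# Append the per-row total as a fourth column.
table = {label: values + [sum(values)] for label, values in data.items()}

# Sum each column (including the row totals) to get the TOTAL row.
total_row = [sum(column) for column in zip(*table.values())]

print("TOTAL", total_row)
```

The last element of the TOTAL row is the grand total, which should equal the sum of the per-row totals.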


Laravel leftJoin returns null from 2nd table

I have two tables. The first is duty_sheets:
centerId | centerName | p1 | p2 | p3 | p4 | ...p22 | examiId
1 | xyz | 1 | 5 | 8 | 7 | 1 | 1
2 | abc | 9 | 1 | 6 | 6 | 1 | 1
and the second is feedback:
id | centerId | inspectorId | A | B | C | examiId
1 | 1 | 1 | 1 | 5 | 8 | 1
2 | 2 | 9 | 9 | 1 | 6 | 1
Here is my code:
$center = DutySheet::select('duty_sheets.centerId', 'duty_sheets.centerName','feedback.id')
->leftJoin('feedback', function ($leftJoin) {
$leftJoin->on('duty_sheets.examId', 'feedback.examId')
->where("duty_sheets.centerId", 'feedback.centerId')
->where("feedback.inspectorId", 1);
})
->where("duty_sheets.examId", 1)
->where("p20", 1)
->get();
dd($center);
I want to retrieve all rows from DutySheet where p20 = 1 and duty_sheets.examId = 1, along with the matching rows from feedback based on centerId, inspectorId and examId.
The problem is that the query returns feedback.id as null even though matching records exist in the feedback table.
Laravel version = 9
The problem is in the left join:
->where("duty_sheets.centerId", 'feedback.centerId')
This builds a WHERE condition that compares the column against the string value 'feedback.centerId':
duty_sheets.centerId = 'feedback.centerId'
To compare the two columns, you need to use
->on("duty_sheets.centerId", '=', 'feedback.centerId')
or
->whereColumn("duty_sheets.centerId", 'feedback.centerId')
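To see why the string comparison leaves feedback.id empty, here is a minimal sketch that reproduces both join conditions in plain SQL, using Python's built-in sqlite3 module rather than Laravel. The trimmed-down table definitions and the p20 values are assumptions for illustration; only the join logic mirrors the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE duty_sheets (centerId INTEGER, centerName TEXT, p20 INTEGER, examId INTEGER);
    CREATE TABLE feedback (id INTEGER, centerId INTEGER, inspectorId INTEGER, examId INTEGER);
    -- p20 = 1 for both rows is an assumption; the question elides that column.
    INSERT INTO duty_sheets VALUES (1, 'xyz', 1, 1), (2, 'abc', 1, 1);
    INSERT INTO feedback VALUES (1, 1, 1, 1), (2, 2, 9, 1);
""")

TEMPLATE = """
    SELECT d.centerId, d.centerName, f.id
    FROM duty_sheets AS d
    LEFT JOIN feedback AS f
           ON d.examId = f.examId
          AND d.centerId = {rhs}
          AND f.inspectorId = 1
    WHERE d.examId = 1 AND d.p20 = 1
    ORDER BY d.centerId
"""

# Like where(): the right-hand side is bound as a string VALUE, so no row can match.
literal_rows = conn.execute(TEMPLATE.format(rhs="?"), ("feedback.centerId",)).fetchall()

# Like on() / whereColumn(): the right-hand side is a COLUMN reference.
column_rows = conn.execute(TEMPLATE.format(rhs="f.centerId")).fetchall()

print(literal_rows)
print(column_rows)
```

With the bound string, every duty_sheets row comes back with a null feedback id. With the column reference, center 1 picks up its feedback row, while center 2 stays null only because its feedback row belongs to inspector 9.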

SAS Hive SQL (Hadoop) version of Proc Transpose?

I was wondering if there is a version of 'Proc Transpose' in SAS Hive SQL (Hadoop) ?
Otherwise I can see the only other (long winded) way is creating a lot of separate tables to then join back together, which I'd rather avoid.
Any assistance most welcome!
Sample table to Transpose > Intention to put Month along the top of the table so the rates are split by month:
+------+-------+----------+----------+-------+
| YEAR | MONTH | Geog | Category | Rates |
+------+-------+----------+----------+-------+
| 2018 | 1 | National | X | 32 |
| 2018 | 1 | National | Y | 43 |
| 2018 | 1 | National | Z | 47 |
| 2018 | 1 | Regional | X | 52 |
| 2018 | 1 | Regional | Y | 38 |
| 2018 | 1 | Regional | Z | 65 |
| 2018 | 2 | National | X | 63 |
| 2018 | 2 | National | Y | 14 |
| 2018 | 2 | National | Z | 34 |
| 2018 | 2 | Regional | X | 90 |
| 2018 | 2 | Regional | Y | 71 |
| 2018 | 2 | Regional | Z | 69 |
+------+-------+----------+----------+-------+
Sample output:
+------+----------+----------+----+----+
| YEAR | Geog | Category | 1 | 2 |
+------+----------+----------+----+----+
| 2018 | National | X | 32 | 63 |
| 2018 | National | Y | 43 | 14 |
| 2018 | National | Z | 47 | 34 |
| 2018 | Regional | X | 52 | 90 |
| 2018 | Regional | Y | 38 | 71 |
| 2018 | Regional | Z | 65 | 69 |
+------+----------+----------+----+----+
The typical "wallpaper" SQL technique for transposing (or pivoting) uses a sub-query of CASE expressions, one per output column, inside a grouped aggregating query that collapses the sub-query rows. Each group represents a single row of the pivoted result.
For example, your group is year, geog, category, and min is used to collapse:
proc sql;
create view want_pivot as
select year, geog, category
, min(rate_m1) as rate_m1
, min(rate_m2) as rate_m2
from
( select
year, geog, category
, case when month=1 then rates end as rate_m1
, case when month=2 then rates end as rate_m2
from have
)
group by year, geog, category
;
quit;
Here is the same concept, a little more generic: the data is repeated within the group at the detail level, and mean is used to collapse over the repeats.
data have;
input id name $ value;
datalines;
1 a 1
1 a 2
1 a 3
1 b 2
1 c 3
2 a 2
2 d 4
2 b 5
3 e 1
run;
proc sql;
create view have_pivot as
select
id
, mean(a) as a
, mean(b) as b
, mean(c) as c
, mean(d) as d
, mean(e) as e
from
(
select
id
, case when name='a' then value end as a
, case when name='b' then value end as b
, case when name='c' then value end as c
, case when name='d' then value end as d
, case when name='e' then value end as e
from have
)
group by id
;
quit;
When the column names are not known a priori, you will need to write a code generator that makes one pass over the data to determine the name values and then writes the wallpaper query, which makes a second pass over the data and returns the pivot.
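The two-pass code generator idea can be sketched roughly as follows. This is a minimal illustration in Python with the built-in sqlite3 module standing in for the real database, not SAS or Hive code; the table name have and the rate_m prefix follow the earlier example, and everything else is an assumption.

```python
import sqlite3

# Sample rows from the question: (year, month, geog, category, rates)
rows = [
    (2018, 1, "National", "X", 32), (2018, 1, "National", "Y", 43),
    (2018, 1, "National", "Z", 47), (2018, 1, "Regional", "X", 52),
    (2018, 1, "Regional", "Y", 38), (2018, 1, "Regional", "Z", 65),
    (2018, 2, "National", "X", 63), (2018, 2, "National", "Y", 14),
    (2018, 2, "National", "Z", 34), (2018, 2, "Regional", "X", 90),
    (2018, 2, "Regional", "Y", 71), (2018, 2, "Regional", "Z", 69),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE have (year INTEGER, month INTEGER, geog TEXT, category TEXT, rates INTEGER)"
)
conn.executemany("INSERT INTO have VALUES (?, ?, ?, ?, ?)", rows)

# Pass 1: discover which values will become columns.
months = [m for (m,) in conn.execute("SELECT DISTINCT month FROM have ORDER BY month")]

# Generate one CASE expression per discovered value (int() guards the interpolation).
pivot_columns = ",\n       ".join(
    f"MIN(CASE WHEN month = {int(m)} THEN rates END) AS rate_m{int(m)}" for m in months
)
query = f"""
SELECT year, geog, category,
       {pivot_columns}
FROM have
GROUP BY year, geog, category
ORDER BY year, geog, category
"""

# Pass 2: run the generated wallpaper query to get the pivot.
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

The generated query text can be printed and submitted through whichever interface you use; only the discovery step and the string building are specific to this sketch.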
Also, many contemporary databases have a PIVOT clause that can be leveraged via pass-through.
The Hadoop Mania post "TRANSPOSE/PIVOT a Table in Hive" shows the use of collect_list and map in a similar wallpapery manner:
select b.id, b.code, concat_ws('',b.p) as p, concat_ws('',b.q) as q, concat_ws('',b.r) as r, concat_ws('',b.t) as t from
(select id, code,
collect_list(a.group_map['p']) as p,
collect_list(a.group_map['q']) as q,
collect_list(a.group_map['r']) as r,
collect_list(a.group_map['t']) as t
from ( select
id, code,
map(key,value) as group_map
from test_sample
) a group by a.id, a.code) b;
If your sample dataset is representative of the real dataset, you can use a simple inner join as shown below. Provided that year, geog and category form a unique combination within each month, the code below should work.
select a.YEAR,
a.Geog,
a.Category,
a.Rates as month_1,
b.Rates as month_2
from have a
inner join
have b
on a.year = b.year
and a.Geog = b.Geog
and a.Category = b.Category
where a.month = 1
and b.month = 2;

Hive Query for Rolling total based on 2 fields

I have a table as shown below:
Date | Customer | Count | Daily_Count | ITD_Count
d1 | A | 3 | 3 |
d2 | B | 4 | 4 |
d3 | A | 7 | 16 |
d3 | B | 9 | 16 |
d4 | A | 8 | 9 |
d4 | B | 1 | 9 |
Description of Fields:
Date : date
customer : name of customer
Count : # of customers
daily_Count : # of customers on daily basis calculated as
SUM(count) OVER (partition BY date )as Daily_Count
Question :
How do I calculate the Running Total or Rolling Total in the ITD_Count ?
The output should look like
Date | Customer | Count | Daily_Count | ITD_Count
d1 | A | 3 | 3 | 3
d2 | B | 4 | 4 | 7
d3 | A | 7 | 16 | 23
d3 | B | 9 | 16 | 23
d4 | A | 8 | 9 | 32
d4 | B | 1 | 9 | 32
I have tried several variations of the window functionality, but hit a roadblock in all my attempts.
Attempt 1:
SUM(daily_COunt) OVER (partition BY date order by date rows between unbounded preceding and current row ) as ITD_account_linking
Attempt 2:
SUM(daily_COunt) OVER (partition BY date, daily_count order by date rows between unbounded preceding and current row ) as ITD_account_linking
and several more attempts following this. :(
Any possible suggestions to guide me in the right direction are welcome.
Please let me know if you need more details.
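Use Hive's windowing and analytics functions. Sum Count rather than Daily_Count (which already repeats on every row of a date), and use a RANGE frame so that all rows sharing a date get the same running total:
SELECT Date, Customer, Count, Daily_Count,
SUM(Count) OVER (ORDER BY Date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ITD_Count
FROM table;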
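To check the ITD_Count values by hand, here is a minimal Python sketch of the intended logic: total the counts per date, accumulate those daily totals in date order, and give every row of a date the same running total. The function name inception_to_date and the tuple layout are assumptions for illustration; this is not Hive code.

```python
from itertools import accumulate, groupby

# Sample rows from the question: (date, customer, count)
rows = [
    ("d1", "A", 3), ("d2", "B", 4),
    ("d3", "A", 7), ("d3", "B", 9),
    ("d4", "A", 8), ("d4", "B", 1),
]

def inception_to_date(rows):
    """Return (date, customer, count, daily_count, itd_count) for every row."""
    ordered = sorted(rows, key=lambda r: r[0])  # stable sort by date
    grouped = [(date, list(group)) for date, group in groupby(ordered, key=lambda r: r[0])]
    daily = [sum(count for _, _, count in group) for _, group in grouped]
    running = accumulate(daily)  # cumulative sum of the daily totals
    result = []
    for (date, group), day_total, itd in zip(grouped, daily, running):
        for _, customer, count in group:
            result.append((date, customer, count, day_total, itd))
    return result

result = inception_to_date(rows)
for row in result:
    print(row)
```

Both d3 rows receive 23 (3 + 4 + 16) and both d4 rows receive 32 (23 + 9), because the accumulation happens once per date rather than once per row.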

SphinxSE distinct empty result

I run this query in the Sphinx SE console:
SELECT @distinct FROM all_ips GROUP BY ip1;
I get this result:
+------+--------+
| id | weight |
+------+--------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 9 | 1 |
| 15 | 1 |
| 16 | 1 |
| 17 | 1 |
| 20 | 1 |
| 21 | 1 |
| 25 | 1 |
| 26 | 1 |
| 27 | 1 |
| 31 | 1 |
| 32 | 1 |
| 38 | 1 |
| 39 | 1 |
| 40 | 1 |
| 46 | 1 |
| 50 | 1 |
| 51 | 1 |
+------+--------+
20 rows in set (0.57 sec)
How can I get the number of unique values? Why doesn't the @distinct column show up in the results?
1) I don't think that is SphinxSE. Do you really mean SphinxQL? That looks more like SphinxQL.
2) Distinct of what column? You need to tell Sphinx which attribute you want to count the distinct values of. In SphinxQL, use COUNT(DISTINCT column_name).
A simple SQL statement with GROUP BY gives the count for each value. Something like this:
SELECT count(ip1),ip1
FROM all_ips
GROUP BY ip1;
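The difference between a per-value count and the number of unique values can be sketched with generic SQL. The example below uses Python's built-in sqlite3 module, not Sphinx, and the ip1 values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_ips (id INTEGER, ip1 INTEGER)")
# Hypothetical data: five rows, three distinct ip1 values.
conn.executemany(
    "INSERT INTO all_ips VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 192), (4, 172), (5, 192)],
)

# GROUP BY returns one row per distinct value, with that value's row count.
per_value = conn.execute(
    "SELECT ip1, COUNT(*) FROM all_ips GROUP BY ip1 ORDER BY ip1"
).fetchall()

# COUNT(DISTINCT ...) returns a single number: how many unique values exist.
(unique_count,) = conn.execute("SELECT COUNT(DISTINCT ip1) FROM all_ips").fetchone()

print(per_value)
print(unique_count)
```

The number of rows returned by the GROUP BY query equals the COUNT(DISTINCT ip1) result, so either form answers "how many unique values", but only the second returns it as a single value.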

80% Rule Estimation Value in PL/SQL

Assume a range of values is inserted into a schema table, and at the end of the month I want to apply the following algorithm to these records (e.g. 2500 rows of numeric values): sort the values in ascending order (from the smallest to the highest value) and then find the 80% value of the sorted column.
In my example, if each row increases by one starting from 1, the 80% value will be the value in row 2000 (= 2500 - 2500*20/100). This algorithm needs to be implemented in a procedure where the number of rows is not constant; for example, it can vary from 2500 to 1,000,000 per month.
Hint: You can achieve this using Oracle's cumulative aggregate functions. For example, suppose your table looks like this:
MY_TABLE
+-----+----------+
| ID | QUANTITY |
+-----+----------+
| A | 1 |
| B | 2 |
| C | 3 |
| D | 4 |
| E | 5 |
| F | 6 |
| G | 7 |
| H | 8 |
| I | 9 |
| J | 10 |
+-----+----------+
At each row, you can sum the quantities so far using this:
SELECT
id,
quantity,
SUM(quantity)
OVER (ORDER BY quantity ROWS UNBOUNDED PRECEDING)
AS cumulative_quantity_so_far
FROM
MY_TABLE
Giving you:
+-----+----------+----------------------------+
| ID | QUANTITY | CUMULATIVE_QUANTITY_SO_FAR |
+-----+----------+----------------------------+
| A | 1 | 1 |
| B | 2 | 3 |
| C | 3 | 6 |
| D | 4 | 10 |
| E | 5 | 15 |
| F | 6 | 21 |
| G | 7 | 28 |
| H | 8 | 36 |
| I | 9 | 45 |
| J | 10 | 55 |
+-----+----------+----------------------------+
Hopefully this will help in your work.
Write a query using the percentile_disc function to solve your problem. Sounds like it does what you want.
An example would be
select percentile_disc(0.8) within group (order by the_value)
from my_table
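To make the discrete-percentile rule concrete, here is a minimal Python sketch of the semantics: sort ascending and take the first value whose cumulative share of rows reaches the requested percentage. The function below is a hypothetical helper that takes the percentage as an integer to avoid floating-point rounding; it is an illustration, not Oracle's implementation.

```python
def percentile_disc(values, percent):
    """Smallest value whose cumulative share of rows is at least percent (0-100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Ceiling of percent * n / 100 in integer arithmetic, with a minimum rank of 1.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]

# The example from the question: 2500 rows holding the values 1..2500.
print(percentile_disc(range(1, 2501), 80))

# The ten-row MY_TABLE example with quantities 1..10.
print(percentile_disc([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 80))
```

Because the rank is derived from the row count at call time, the same rule applies whether the month has 2500 rows or 1,000,000.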
