Can't invert hash to get feature names in Vowpal Wabbit with matrix factorization - vowpalwabbit

--invert_hash works fine with regression models, but when I enable matrix factorization the output is the same for both --readable_model and --invert_hash. I'd like to see the latent variables for each user or item, but I can't match them to the user or item names. Rather than show you my full dataset, here is a small reproducible sample that illustrates the problem:
#bash
echo "5 |u user1 |i item1
1 |u user1 |i item2
5 |u user2 |i item2
1 |u user2 |i item1" | vw -f test.vwbin --rank 2
echo "5 |u user1 |i item1
1 |u user1 |i item2
5 |u user2 |i item2
1 |u user2 |i item1" | vw -t -i test.vwbin --invert_hash test.vwih
less test.vwih
The results look like:
Version 7.7.0
Min label:0.000000
Max label:5.000000
bits:18
0 pairs:
0 triples:
rank:2
lda:0
0 ngram:
0 skip:
options:
0 0.026660 0.029663 0.066095 0.001638 0.024027
1 0.004046 0.004133 0.001141 0.035247 0.077151
2 0.071812 0.048789 0.009294 0.078689 0.055306
... (and so on until line 262143)
None of these lines contain the strings 'user1', 'user2', 'item1', or 'item2'. Am I missing something?
One more question I couldn't find answered in the documentation: why are there 5 latent values per line when rank=2?
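To inspect a dump like the one above programmatically, here is a small Python sketch that splits it into header fields and numeric weight rows and counts the values stored per row. It assumes the text layout shown in the output above (header lines such as `bits:18`, then rows of an integer slot index followed by floats); `parse_vw_dump` is a hypothetical helper, not part of Vowpal Wabbit. It cannot recover feature names, because the dump shown contains only numeric slot indices.

```python
from typing import Dict, List, Tuple

# Hypothetical helper for illustration; not part of Vowpal Wabbit.
def parse_vw_dump(text: str) -> Tuple[Dict[str, str], Dict[int, List[float]]]:
    """Split a text model dump into header fields and numeric weight rows."""
    header: Dict[str, str] = {}
    weights: Dict[int, List[float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if ":" in line:
            # Header entries such as "bits:18" or "0 pairs:".
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
        elif parts[0].isdigit() and len(parts) > 1:
            # Weight row: integer slot index followed by its stored floats.
            weights[int(parts[0])] = [float(p) for p in parts[1:]]
        # Anything else (e.g. the "Version 7.7.0" line) is ignored.
    return header, weights

SAMPLE = """Version 7.7.0
Min label:0.000000
Max label:5.000000
bits:18
rank:2
options:
0 0.026660 0.029663 0.066095 0.001638 0.024027
1 0.004046 0.004133 0.001141 0.035247 0.077151
"""

header, weights = parse_vw_dump(SAMPLE)
print(header["rank"], sorted({len(v) for v in weights.values()}))  # prints: 2 [5]
```

Running it on the full test.vwih file instead of SAMPLE shows whether every slot stores the same number of values for a given rank.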

Related

Shell script to read two columns in CSV and count how many unique values in column 2 per each unique value in column 1

I have a sheet that looks like this:
1 a
2 a
3 b
3 c
2 a
2 f
2 a
1 d
The output I need is this:
1 2
2 3
3 2
cut -f 1,2 | sort | uniq -c | sort
I tried the above, but I am doing something wrong. I'm new to shell scripting.
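The counting logic the question asks for (for each column-1 value, the number of distinct column-2 values) can be sketched in Python as a map from key to a set of seen values. This assumes whitespace-separated columns as in the sample; for real comma-separated input, `csv.reader` would replace `split()`. `distinct_per_key` is a hypothetical name. Note that with the sample exactly as shown, key 2 only has the distinct values a and f, so this sketch reports 2 for it rather than the 3 listed in the expected output.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def distinct_per_key(lines: Iterable[str]) -> Dict[str, int]:
    """Map each column-1 value to the number of distinct column-2 values."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split()  # assumes whitespace-separated columns
        if len(fields) < 2:
            continue  # skip blank or short lines
        seen[fields[0]].add(fields[1])
    # Sort keys (as strings) so the output order is stable.
    return {key: len(values) for key, values in sorted(seen.items())}

SHEET = ["1 a", "2 a", "3 b", "3 c", "2 a", "2 f", "2 a", "1 d"]
for key, count in distinct_per_key(SHEET).items():
    print(key, count)
```

The set per key is what removes the repeated "2 a" rows; counting lines instead (as `uniq -c` does) would count duplicates.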

MySQL - Top 5 best-selling plans or courses

I sell subscriptions to my online courses, as well as individual courses at retail.
I want to list the "top 5" best-selling plans/courses. For this, I have a table called "subscriptionPlan", which stores the purchased plan ID (or, for a course, the course ID) and the amount spent on the transaction. Example:
table subscriptionPlan
sbpId | subId | plaId | couId | sbpAmount
1 | 1 | 1 | 1 | 499.99
2 | 2 | 1 | 2 | 499.99
3 | 3 | 2 | 0 | 899.99
4 | 4 | 1 | 1 | 499.99
For context, plaId = 1 is a plan called "Single Sale" that I created to maintain the integrity of the DB. When couId isn't empty, the customer bought a separate course, not a plan that gives access to any course.
What I need: list the top 5 sales. If it is a plan, display the plan name (table plan, column plaTitle). If it is a course, display its name (table course, column couTitle). This is the logic I can't write. I was able to rank the top 5 PLANS, but the courses get grouped together, since the GROUP BY is on the plan ID. I believe the catch is here, maybe an IF/ELSE in the GROUP BY, but I don't know how to do it.
The query I wrote to rank my top 5 plans is:
SELECT sp.plaId, sp.couId, p.plaTitle, p.plaPermanent, c.couTitle, SUM(sbpAmount) AS sbpTotalAmount
FROM subscriptionPlan sp
LEFT JOIN plan p ON sp.plaId = p.plaId
LEFT JOIN course c ON sp.couId = c.couId
GROUP BY sp.plaId
ORDER BY sbpTotalAmount DESC
LIMIT 5
The result I expected was:
plaId | couId | plaTitle | couTitle | plaPermanent | sbpTotalAmount
1 | 1 | Venda avulsa | Curso 01 | true | 999.98
2 | 0 | Acesso total | null | false | 899.99
3 | 2 | Venda avulsa | Curso 02 | true | 499.99
How can I write this query?
When grouping, you can use:
Simple columns, or
Any [complex] expression.
In your case, it seems you need to group by an expression, such as:
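GROUP BY CASE WHEN sp.plaId = 1 THEN sp.couId ELSE -sp.plaId END
Here single-sale rows (plaId = 1) are grouped per course by their couId, while plan purchases are grouped per plan by the negated plaId, so a plan key can never match a (positive) couId. Any other mapping works as long as plan keys and course keys can't collide.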
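A minimal way to check a grouping expression like this against the sample data is to run it in an in-memory database. The sketch below uses Python's built-in sqlite3 instead of MySQL, and assumes `plan` and `course` tables whose titles are taken from the expected result in the question; the table layouts beyond the columns shown and the helper names `build_sample_db`/`top_sellers` are hypothetical. The grouping key sends single-sale rows (plaId = 1) to their couId and plan rows to the negated plaId.

```python
import sqlite3

def build_sample_db() -> sqlite3.Connection:
    """In-memory copy of the sample rows; plan/course layouts are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE plan (plaId INTEGER PRIMARY KEY, plaTitle TEXT);
        CREATE TABLE course (couId INTEGER PRIMARY KEY, couTitle TEXT);
        CREATE TABLE subscriptionPlan (
            sbpId INTEGER PRIMARY KEY, subId INTEGER, plaId INTEGER,
            couId INTEGER, sbpAmount REAL);
        INSERT INTO plan VALUES (1, 'Venda avulsa'), (2, 'Acesso total');
        INSERT INTO course VALUES (1, 'Curso 01'), (2, 'Curso 02');
        INSERT INTO subscriptionPlan VALUES
            (1, 1, 1, 1, 499.99), (2, 2, 1, 2, 499.99),
            (3, 3, 2, 0, 899.99), (4, 4, 1, 1, 499.99);
    """)
    return conn

def top_sellers(conn: sqlite3.Connection, limit: int = 5):
    """Top sellers: single-sale rows grouped per course, others per plan."""
    query = """
        SELECT sp.plaId, sp.couId, p.plaTitle, c.couTitle,
               ROUND(SUM(sp.sbpAmount), 2) AS sbpTotalAmount
        FROM subscriptionPlan sp
        LEFT JOIN plan p ON sp.plaId = p.plaId
        LEFT JOIN course c ON sp.couId = c.couId
        -- plaId = 1 rows group by course; negated plaId keeps plan keys apart.
        GROUP BY CASE WHEN sp.plaId = 1 THEN sp.couId ELSE -sp.plaId END
        ORDER BY sbpTotalAmount DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()

for row in top_sellers(build_sample_db()):
    print(row)
```

SQLite accepts the non-aggregated columns in the SELECT list here. In MySQL 5.7+ with ONLY_FULL_GROUP_BY enabled, those columns may need to be wrapped in ANY_VALUE() (or added to the grouping) before the same query is accepted.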

Oracle: Validate input data with given data structure rules

I have a table called data rules, shown below with some explanation.
Data|GroupNum|GroupType|GroupMinOcc|GroupMaxOcc|DataStatus|DataMinOccWithinGroup|DataMaxOccurenceWithinGroup|IDX
ABC |GroupA |Mandatory| 1 | 1 | Mandatory| 1 | 1 |1
DEF |GroupB |Mandatory| 1 | 1 |Mandatory | 1 | 1 |2
GHI |GroupC |Mandatory| 1 | 1 |Mandatory | 1 | 1 |3
JKL |GroupD |Optional | 0 | 1 |Optional | 0 | 1 |4
FFF |Group1 |Optional | 0 | 1 |Mandatory | 1 | 1 |5
RRR |Group1 |Optional | 0 | 1 |Optional | 0 | 2 |6
MMM |Group2 |Optional | 0 | 2 |Mandatory | 1 | 1 |7
PPP |Group2 |Optional | 0 | 2 |Optional | 0 | 1 |8
CCC |Group3 |Optional | 0 | 2 |Optional | 0 | 2 |9
SSS |Group4 |Mandatory| 1 | 2 |Mandatory | 1 | 1 |10
TTT |Group4 |Mandatory| 1 | 2 |Mandatory | 0 | 2 |11
Let me explain these data rules first.
1) A group can have multiple data records.
For example, GroupA contains only ABC, while Group1 contains FFF and RRR.
2) A group can be mandatory or optional. A mandatory group must appear. Within a group, each data record also has its own mandatory or optional status.
For example, consider Group4:
This group is mandatory, and its first data record, SSS, is also mandatory. That means the group must occur, and whenever it occurs, SSS must occur too. The second data record in this group, TTT, is optional: even though the group is mandatory, TTT is optional inside it, so it can occur 0 to 2 times.
Say this group occurs two times. It would look like this:
Valid Group4 example:
SSS
TTT
TTT
SSS
TTT
Invalid Group4 example:
SSS
SSS
TTT
TTT
TTT
It is invalid because in the second occurrence of the group, TTT occurs 3 times, but it must not occur more than 2 times.
3) If a group is optional, it may or may not appear.
Since GroupD, Group1, Group2 and Group3 are optional, Group4 data can come directly after GroupC in the input data, like this:
ABC
DEF
GHI
SSS
TTT
Whenever the input data doesn't follow the rules in the data rules table, I want to return the exact IDX number from that table, taken from the respective group only.
1st input data example:
ABC
DEF
GHI
JKL
JKL
SSS
As you can see, JKL is optional data in an optional group, but if this optional group occurs, JKL should appear only once. Here it appears twice, so I want to return IDX number 4.
2nd input data example:
ABC
DEF
GHI
TTT
Here it should return IDX number 10, because the mandatory data SSS from the mandatory Group4 is missing, and its IDX in the data rules is 10.
3rd input data example:
ABC
DEF
GHI
SSS
SSS
TTT
Here too, the IDX returned for SSS should be 10, because it occurred twice. As you can see in the data rules, the whole of Group4 can repeat only once, and whenever it occurs, SSS must appear only once. So that's an error.
Multiple errors can also occur at the same time, so the IDX numbers must be returned from the respective group's rows in the data rules table.
The input data contains a single column holding only the data records.
Note: group data appears only in the order given in the data rules table, from top to bottom, and each group may or may not appear depending on its definition in that table.
Any suggestions on how I can achieve this?

Oracle hierarchical query join to selection of all descendants

I am trying to retrieve, for each item, a list of all its descendants.
I am not sure I am making sense, so I will try to explain.
Example Data:
ID | PID
--------
1 | 0
2 | 1
3 | 1
4 | 1
5 | 2
6 | 2
7 | 5
8 | 3
etc...
The desired results are:
ID | Descendant
--------------
1 | 1
1 | 2
1 | 3
1 | 4
...
2 | 2
2 | 5
2 | 6
2 | 7
3 | 3
3 | 8
etc...
This is currently being achieved by using a cursor to move through the data and inserting each descendant into a table and then selecting from them.
I was wondering if there is a better way to do this; there must be a way to write a query that returns the desired results.
If anyone has ideas, or has figured this out before, it would be much appreciated. Ordering is not important, nor are the self-references (1 - 1, 2 - 2). It would be nice to have them, but they are not crucial.
select connect_by_root(id) as ID, id as Descendant
from table1
connect by prior id = pid
order by 1, 2
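What the query above returns can be reproduced outside the database to check the expected pairs: without START WITH, every row acts as a root, and CONNECT BY PRIOR id = pid walks down to everything below it. The Python sketch below builds a parent-to-children map and walks it from every node; it assumes the ID/PID data has no cycles, and `descendant_pairs` is a hypothetical name, not an Oracle API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def descendant_pairs(rows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return sorted (ancestor, descendant) pairs, each node paired with itself too.

    Every row acts as a root, like CONNECT BY PRIOR id = pid with no START WITH.
    Assumes the ID/PID graph has no cycles.
    """
    children: Dict[int, List[int]] = defaultdict(list)
    for node_id, parent_id in rows:
        children[parent_id].append(node_id)

    pairs: List[Tuple[int, int]] = []
    for root, _ in rows:
        stack = [root]  # iterative depth-first walk below this root
        while stack:
            node = stack.pop()
            pairs.append((root, node))
            stack.extend(children[node])
    return sorted(pairs)

ROWS = [(1, 0), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 5), (8, 3)]
for ancestor, descendant in descendant_pairs(ROWS):
    print(ancestor, descendant)
```

The self-pairs (1 - 1, 2 - 2) come for free because each walk records its own root first; dropping them is a matter of filtering `ancestor != descendant`.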
Here is my attempt; I'm not sure I understood you correctly.
select pid ,connect_By_root(id) as descendant from process
connect by id = prior pid
union all
select distinct pid,pid from
process
order by pid,descendant

Group Unique ID

In Stata, if I have a list of groups:
XYZ
ABC
ABC
BCH
JSA
BCH
XYZ
How do I give each group a unique ID in a second column after sorting? For example:
ABC 1
BCH 2
JSA 3
XYZ 4
You need sort, then the group() function, which is part of egen:
sysuse auto,clear
sort make
egen make_gp = group(make)
This yields:
. list make make_gp in 1/5
+-------------------------+
| make make_gp |
|-------------------------|
1. | AMC Concord 1 |
2. | AMC Pacer 2 |
3. | AMC Spirit 3 |
4. | Buick Century 7 |
5. | Buick Electra 8 |
+-------------------------+
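For readers outside Stata, the idea behind group() (number the distinct values 1, 2, ... in sorted order) can be sketched in a few lines of Python. This is an illustration of the numbering rule, not Stata code; `group_ids` is a hypothetical name, and it assumes plain byte-order sorting of ASCII strings, as in the sample.

```python
from typing import Dict, List

def group_ids(values: List[str]) -> Dict[str, int]:
    """Number the distinct values 1..k in sorted order (the egen group() idea)."""
    return {value: i for i, value in enumerate(sorted(set(values)), start=1)}

GROUPS = ["XYZ", "ABC", "ABC", "BCH", "JSA", "BCH", "XYZ"]
ids = group_ids(GROUPS)
for value in sorted(GROUPS):
    print(value, ids[value])  # one line per observation, as after `sort`
```

Repeated values share one ID because the numbering is built from the set of distinct values, which matches the ABC 1, BCH 2, JSA 3, XYZ 4 mapping asked for in the question.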
