Is MNL the right model to use when the choice options vary across observations? - multinomial

In a survey of 100 people, I am asking each person to choose between product A and product B. I ask each person this question 3 times, but each time I present a different set of products. Say, first time, Person 1 is asked to choose between 'Phone 1' and 'Phone 2', given certain attributes of each phone. The second time the choice is again 'Phone 1' vs. 'Phone 2', but a different set of attributes for each phone.
A person is presented three attributes associated with the two phone alternatives every time the question is asked. So, each time between Phone 1 and Phone 2, the attributes of the phone such as cost, memory and camera pixels are presented so that user can choose which set of attributes is most attractive, Phone 1's or Phone 2's.
Overall, 3*100 = 300 responses; 3 responses per person. Each time the attributes cost, memory and camera pixels presented and user asked to choose the feature set they prefer.
My goal is to analyze how users value features of a phone vs. cost of the phone.
In this scenario, can I use a MNL - even though each time I asked the person a question, I only presented two choices ? My understanding is that MNL is sued when (a) there are multiple choices and (b) the choice options do not change across observations, i.e. each person is asked to choose between multiple products, say A, B, C and A, B, C do not change across observations.
In the scenario described above, the two choices varied across the three times the same person was asked the question ? If not MNL, should I rather create a binary logit model given that user only had to choose between two options when the question was asked (even though he was asked the question three times)? If I can use binary logit, should I be concerned that the choice set of products change across observations ? or should I let the attributes defined in each of the rows address the differences in product choices across observations.
I have setup the data as follows (thinking I can do MNL but may be I should set it up differently and use another modeling approach?):

I am working on designing and analyzing similar survey but mine is related to transportation. I am at the beginning level and I am still new to the whole concept, however I will give you an advice and reference maybe it is helpful.
First point: I have come cross 3 models as following from a useful video on YouTube:
MNL refers to Multinomial Logit Model. MNL is used with
alternative-invariant regressors (for example salary of participant
in the survey, or his/her gender …).
Conditional logit model is used with alternative-invariant (gender,
salary, education level …) and alternative-variant regressors (cost
of the product, memory, camera pixel …)
Mixed logit model which uses random parameters. It is also used with
alternative-invariant (gender, salary, education level …) and
alternative-variant regressors (cost of the product, memory, camera
pixel …)
Note regarding alternative-invariant and alternative-variant regressors:
The gender of person participating in the survey will NOT vary between Product A or Product P, so it is alternative-invariant regressor. While price of product could vary between Product A and Product B so it is called alternative-variant regressors.
Based on above I assume you need to use conditional logit model or mixed logit model.
For me I couldn’t find a special function in R for the conditional logit model or mixed logit model. The same mlogit function is used, refer to the examples below for the help of mlogit package:
a pure "multinomial model"
summary(mlogit(mode ~ 0 | income, data = Fish))
a pure "conditional" model
summary(mlogit(mode ~ price + catch, data = Fish))
a "mixed" model
m <- mlogit(mode ~ price+ catch | income, data = Fish)
summary(m)
same model with charter as the reference level
m <- mlogit(mode ~ price+ catch | income, data = Fish, reflevel = "charter")
From the examples above, I think (but NOT sure) that in the Manual of mlogit package, they refer to mixed logit when you used both alternative-invariant and alternative-variant regressors. While conditional model when you have only alternative-variant regressors. On the other hand, multinomial model when you have only multinomial alternative-invariant regressors.
Second point: There is something called “panel data” when you are asking the same person to choose one product for each choice-set. Same person here means that in your model you are taking into consideration, the gender, the salary, the education level … which they will stay the same for the same person. Check this: https://en.wikipedia.org/wiki/Panel_data
To use panel techniques please refer to help in mlogit package in R. I am quoting from it the following:
“panel only relevant if rpar is not NULL and if the data are repeated observations of the same unit ; if TRUE, the mixed-logit model is estimated using panel techniques”
So in my understanding, if you want to use the panel techniques you have to use random draws because panel will be true and rpar will not be NULL.
Moreover, for example about using the panel data, please refer to the below example from “Estimation of multinomial logit models in R : The mlogit Packages” by Yves Croissant
data("Train", package = "mlogit")
Tr <- mlogit.data(Train, shape = "wide", varying = 4:11, choice = "choice", sep = "_", opposite = c("price", "time", "change", "comfort"), alt.levels=c("A", "B"), id.var ="id")
Train.ml <- mlogit(choice ~ price + time + change + comfort, Tr)
Train.mxlc <- mlogit(choice ~ price + time + change + comfort, Tr, panel = TRUE, rpar = c(time = "cn", change = "n", comfort = "ln"), correlation = TRUE, R = 100, halton = NA)
Train.mxlu <- update(Train.mxlc, correlation = FALSE)
I hope that is useful to you.

Related

DAX Count measure values between given range

I am trying to create a measure that would count the instances when another measure is between given values.
The first is a measure of forecast accuracy, which is calculated over products and customers with a target value of 1. Then I would like to make a monthly report which shows for how many products the forecast accuracy is less than .85, between 0.85 and 1.15 and over 1.15.
The measure I tried for the middle category, which does not give the desired result:
var tab = SUMMARIZE(data, data[ComponentNumber], "Accuracy", [Forecast accuracy])
return SUMX(tab, IF([Accuracy] > 0.85 && [Accuracy] < 1.15, 1, 0))
The data table has also a customer number, which is why I tried first evaluating the measure [Forecast accuracy] only over components, disregarding the customers.
One source of the problem may lie in the fact that the measure [Forecast accuracy] is calculated as a division of two measures [Ordered Quantity] and [Forecast Quantity], of which the former is in another table. Does this affect the evaluation of my attempted measure?

Need to filter result set in DAX Power BI

I have following simple relationship:
I have created following visuals in Power BI:
I want to show Store Name, Orders (by Salesman selected in slicer) and Total Orders in that Store (ignoring Salesman selected in slicer). I have created two very simple measure (can be seen in above visual) and used in matrix visuals. Visual is showing All stores while I want to show only those stores where Salesman X (selected salesman in slicer) have orders i.e. I don't want Store B row.
while solving, I suspected that it is due to fact that visual is not cross filtering. I used crossfilter but it made no difference. data can be seen in below image:
Please guide. Thanks in advance.
Try to change [Total Orders] to this measure, but keep [Total Orders].
IF( ISBLANK([Orders Count]), BLANK(), [Total Orders])
By Adding VALUES('Order'[Store ID]) in measure solved the problem. complete measure definition is as follows:
Total Orders = CALCULATE(
count('Order'[Order ID]),
REMOVEFILTERS(Salesman[Salesman Name]),
VALUES('Order'[Store ID]))
This issues the problem but I could not understand how? Because VALUES bring only those stores where salesman has Order. But when salesman removed from the filter context by REMOVEFILTERS, then how come VALUES bring only stores where salesman have orders?
a) You intend to utilize Store.salesmanName from Store in a slicer, meaning whatever is selected from there, you intend that selection to be applied on Order to give you the Order.StoreName. So when X is selected only A and C are returned.
b) Once that selection happens, you intend DAX to return the total count of each Order.StoreName whether it has a corresponding Store.salesmanID in Order.salesmanID or not. In other words, in this layer of the analysis, you want the previous selection to remain applied in the outer loop but to be ignored in the inner loop.
To be able to do that, you can do this,
totalCount =
VAR _store =
MAX ( 'Order'[storeID] ) //what is the max store ID
VAR _count =
CALCULATE (
COUNT ( 'Order'[SalesmanId] ),
FILTER ( ALL ( 'Order' ), 'Order'[storeID] = _store ) //remove any filters and apply the value from above explicitly in the filter
)
RETURN
_count

Complicated Cube Query

I'm working on a fairly complicated view, which calculates the total cost of a guest's stayed based on data pulled from four different tables. The output however is not exactly what I want. My code is
CREATE OR REPLACE VIEW Price AS
SELECT UNIQUE
Booking.Booking_ID AS "Booking",
Booking.GuestID AS "Guest ID",
Room.Room_Price*(Booking.CheckOutDate-Booking.CheckInDate) AS "Room Price",
Add_Ons.Price AS "Add ons Price",
Room.Room_Price*(Booking.CheckOutDate-Booking.CheckInDate) + (Add_Ons.Price) AS "Total Price"
FROM Booking JOIN Room ON Room.Room_Num = Booking.Room_Num
JOIN Booking_Add_Ons ON Booking.Booking_ID = Booking_Add_Ons.Booking_ID
JOIN Add_ons ON Booking_Add_Ons.Add_On_ID = Add_Ons.Add_On_ID
ORDER BY Booking.Booking_ID;
Now, I'm trying to get this to return the total cost of all Addons, plus the cost of the hotel rooms as the total price, however it is returning the cost of the rooms + each of the addons on separate lines. As follows:
My question is, is it possible to use something like CUBE, or SUM to add up the rows, so that there is only one entry for each of the Bookings with the total price of all add-ons accounted for?

SQL to Relational Algebra

How do I go about writing the relational algebra for this SQL query?
Select patient.name,
patient.ward,
medicine.name,
prescription.quantity,
prescription.frequency
From patient, medicine, prescription
Where prescription.frequency = "3perday"
AND prescription.end-date="08-06-2010"
AND canceled = "Y"
Relations...
prescription
prescription-ref
patient-ref
medicine-ref
quantity
frequency
end-date
cancelled (Y/N))
medicine
medicine-ref
name
patient
Patient-ref
name
ward
I will just point you out the operators you should use
Projection (π)
π(a1,...,an): The result is defined as the set that is obtained when all tuples in R are restricted to the set {a1,...,an}.
For example π(name) on your patient table would be the same as SELECT name FROM patient
Selection (σ)
σ(condition): Selects all those tuples in R for which condition holds.
For example σ(frequency = "1perweek") on your prescription table would be the same as SELECT * FROM prescription WHERE frequency = "1perweek"
Cross product(X)
R X S: The result is the cross product between R and S.
For example patient X prescription would be SELECT * FROM patient,prescription
You can combine these operands to solve your exercise. Try posting your attempt if you have any issues.
Note: I did not include the natural join as there are no joins. The cross product should be enough for this exercise.
An example would be something like the following. This is only if you accidentally left out the joins between patient, medicine, and prescription. If not, you will be looking for cross product (which seems like a bad idea in this case...) as mentioned by Lombo. I gave example joins that may fit your tables marked as "???". If you could include the layout of your tables that would be helpful.
I also assume that canceled comes from prescription since it is not prefixed.
Edit: If you need it in standard RA form, it's pretty easy to get from a diagram.
alt text http://img532.imageshack.us/img532/8589/diagram1b.jpg

Algorithm for scoring user activity

I have an application where users can:
Write reviews about products
Add comments to products
Up / Down vote reviews
Up / Down vote comments
Every Up/Down vote is recorded in a db table.
What i want to do now is to create a ranking of the most active users in the last 4 weeks.
Of course good reviews should be weighted more than good comments. But also e.g. 10 good comments should be weighted more than just one good review.
Example:
// reviews created in recent 4 weeks
//format: [ upVoteCount, downVoteCount ]
var reviews = [ [120,23], [32,12], [12,0], [23,45] ];
// comments created in recent 4 weeks
// format: [ upVoteCount, downVoteCount ]
var comments = [ [1,2], [322,1], [0,0], [0,45] ];
// create weight vector
// format: [ reviewWeight, commentsWeight ]
var weight = [0.60, 0.40];
// signature: activties..., activityWeight
var userActivityScore = score(reviews, comments, weight);
... update user table ...
List<Users> users = "from users u order by u.userActivityScore desc";
How would a fair scoring function look like?
How could an implementation of the score() function look like? How to add a weight g to the function so that reviews are weighted heavier? How would such a function look like if, for example, votes for pictures would be added?
These kinds of algorithms can be quite complex. You might want to take a look into these books:
Collective Intelligence in Action, Satnam Alag, Manning, 2008, ISBN 1933988312
Algorithms of the Intelligent Web, Haralambos Marmanis, Manning 2009, ISBN 1933988665
Programming Collective Intelligence: Building Smart Web 2.0 Applications, Toby Segaran, O'Reilly Media 2007, ISBN 0596529325

Resources