Showing a default Distinct ID in Mixpanel instead of Mixpanel's own arbitrary choice

In Mixpanel, users can have multiple Distinct IDs that are merged under one identity. The user is identified with a default Distinct ID, which we’ve noticed is the first one created, not the most recent one or one that we would explicitly prefer as the default.
Is it possible to set a default/primary Distinct ID?
Also related, once a user signs up, we are using email address as the Distinct ID. If the user changes their email address, would we need to create an alias (we have Identity Merge enabled, FYI), or update the Distinct ID with their new email?
I did read in the support article titled "Moving to Identity Merge" that:
You can no longer control the canonical id for users in Mixpanel
The ID merge system will now determine which distinct_id is used as the canonical id for a user in Mixpanel. Any merged id can be used to query for information about a user with our APIs, but the results of the query may return a different canonical distinct_id value than the one used in the query.

I had a similar issue. We use a server-side implementation, but this should apply client-side as well. My flow was:
send some anonymous track() events, using a UUIDv4 as both the anon_id and the distinct_id. Let's say the anon_id is: abc-123
authenticate user (now we have access to user_id)
call identify() with user_id and anon_id
send some track() events with our internal user_id as the distinct_id. For example the user_id is: 1001
We noticed that the default mixpanel canonical distinct_id would always be abc-123, not 1001 like we wanted.
After playing around with the Mixpanel API, I discovered that if you create the user profile before identifying, the problem seems to be fixed.
mixpanel.people.set(distinct_id, props); // distinct_id = user_id, so the profile exists before identify
mixpanel.identify(user_id, anon_id); // our own wrapper's signature: identify(user_id, anon_id)
Going through the flow again, the canonical distinct_id is now 1001 instead of abc-123. I believe this fixes it because creating a profile for the user sets the canonical distinct_id (no source on this though).
To be clear, the flow afterwards is:
send some anonymous track() events, using a UUIDv4 as both the anon_id and the distinct_id: abc-123
authenticate user
call mixpanel.people.set(distinct_id, props) // distinct_id = user_id
call identify() with user_id and anon_id
send some track() events with our internal user_id: 1001
Then 1001 should be the default canonical distinct_id instead of abc-123.
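The ordering in that flow can be sketched in Python. This is only an illustration of the call order: RecordingClient is a hypothetical stand-in that logs calls, not the Mixpanel SDK, and its method names mirror the wrapper described above rather than any official signature. Whether this ordering really fixes the canonical distinct_id remains the unverified observation stated above.

```python
class RecordingClient:
    """Hypothetical stand-in for an analytics wrapper; it only records calls."""

    def __init__(self):
        self.calls = []

    def track(self, distinct_id, event):
        self.calls.append(("track", distinct_id, event))

    def people_set(self, distinct_id, props):
        self.calls.append(("people_set", distinct_id, props))

    def identify(self, user_id, anon_id):
        self.calls.append(("identify", user_id, anon_id))


def on_login(client, user_id, anon_id, props):
    # Create the profile first, then link the anonymous ID to the user ID.
    client.people_set(user_id, props)
    client.identify(user_id, anon_id)


client = RecordingClient()
client.track("abc-123", "page_view")  # anonymous event
on_login(client, "1001", "abc-123", {"plan": "free"})  # props are made up
client.track("1001", "purchase")  # identified event
call_order = [name for name, *_ in client.calls]
```

The point of wrapping both calls in on_login is that the profile creation cannot accidentally end up after identify.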

Related

What is the "customer's user ID"?

The Preventing duplicate Items article mentions that you can use a specific combination of fields to determine whether there are duplicate Items. Specifically for OAuth institutions, it says the combination of fields is the customer's user ID and institution_id. I'm confused about what the customer's user ID is. I'm not familiar with this identifier. Can somebody explain?
The customer's user ID would be a value in your own application's business logic, not part of the Plaid API. In most Plaid use cases, alongside an Item, you would typically store some kind of user id that associates it with a specific user in your system. The logic here is saying that if the same end user in your system has multiple Items with the same institution, they are probably duplicate Items.
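As a rough sketch of that check, assuming you store each Item alongside your own user id: the field names user_id, institution_id and item_id below are placeholders for whatever your application stores, and nothing here calls the Plaid API.

```python
from collections import defaultdict


def find_duplicate_items(items):
    """Group stored Items by (user_id, institution_id); keep groups with more than one Item."""
    groups = defaultdict(list)
    for item in items:
        groups[(item["user_id"], item["institution_id"])].append(item["item_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up records, as they might look in your own database.
stored_items = [
    {"item_id": "item-a", "user_id": "u1", "institution_id": "ins_1"},
    {"item_id": "item-b", "user_id": "u1", "institution_id": "ins_1"},
    {"item_id": "item-c", "user_id": "u2", "institution_id": "ins_1"},
]
duplicates = find_duplicate_items(stored_items)
```

Here the same end user (u1) has two Items at the same institution, so those two are flagged as probable duplicates, while u2's Item at the same institution is not.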

Re-Ranking Algorithm for Anonymous Users

I have a website:
10,000 pages, each page represents a category, for example: "Laptops".
On each page I am showing 20 recommended products
99% of the users are anonymous
For each user I have a context (device, user-agent and category)
For each product I have the price and the seller name
I have 2 events: outbound & purchase
I would like to re-rank (re-order, sort) the results for each new anonymous user based on the user context. I would like to re-rank based on performance (outbound & purchase).
Do you have a recommendation for a specific algorithm, tool, or service to do that? I found AWS Personalize very nice, but the problem is that all of my users are anonymous, so I don't believe it can be effective in my use case.
Amazon Personalize can still be used effectively when most/all users are anonymous. If you track users as visitors using a cookie or local storage, then a visitor's session ID can be considered the userId in Personalize. You will lose the continuity of stitching together the same logical user's activity across multiple sessions but you can still get in-session personalization. This requires calling PutEvents with the visitor's session ID in the sessionId field and excluding the userId field. Then when calling the GetRecommendations or GetPersonalizedRanking APIs, use the visitor's session ID as the userId field. Personalize will consider the event activity for the visitor's session when providing recommendations or reranking items.
If the visitor is a known user or later becomes known (i.e. signs in or creates an account), then pass their user ID in the userId field for PutEvents and GetRecommendations/GetPersonalizedRanking. At the next training, Personalize will associate any prior anonymous events (i.e. those with a sessionId but not a userId) to the user. The key is using a consistent sessionId across the anonymous and known events for the user for the session.
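A minimal sketch of the two request shapes described above, built as plain dicts so it runs without AWS credentials. The helper names and sample values are made up; the keys follow the PutEvents and GetPersonalizedRanking parameters. With boto3, the dicts would be passed as boto3.client("personalize-events").put_events(**request) and boto3.client("personalize-runtime").get_personalized_ranking(**request).

```python
from datetime import datetime, timezone


def build_put_events_request(tracking_id, session_id, event_type, item_id, user_id=None):
    """Build PutEvents arguments; userId is left out for anonymous visitors."""
    request = {
        "trackingId": tracking_id,
        "sessionId": session_id,
        "eventList": [
            {
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    }
    if user_id is not None:
        request["userId"] = user_id
    return request


def build_ranking_request(campaign_arn, item_ids, session_id, user_id=None):
    """Build GetPersonalizedRanking arguments; anonymous visitors use the session ID as userId."""
    return {
        "campaignArn": campaign_arn,
        "inputList": list(item_ids),
        "userId": user_id if user_id is not None else session_id,
    }


# Placeholder identifiers, not real AWS resources.
anonymous_event = build_put_events_request("tracking-id", "sess-1", "outbound", "product-42")
known_event = build_put_events_request("tracking-id", "sess-1", "purchase", "product-42", user_id="user-7")
ranking = build_ranking_request("campaign-arn-placeholder", ["product-1", "product-42"], "sess-1")
```

Note that both events reuse the same sessionId, which is what lets the anonymous and known activity be associated later.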

How to store sessions of a telegram bot user in my db

I want to write a Telegram bot. When I receive a message from a user, I need to know the last message they sent me and which step they are currently at. So I should store the user's sessions (I understood this when I searched), but I don't know exactly what I should do.
I know I need a table in a db that stores UserId and ChatId, but I don't know these:
How do I define the sequence of steps and store it in the db (I mean, how do I know where the user is located now)?
What are other columns that I need to store as a session?
How many messages should I store in the database? And do I need one row for each message?
If you just have to store sessions in your database, you don't need to store messages. You might want to store messages as well, but that is not necessarily related.
Let's assume you have a "preferences" menu in your bot where the user can write his input. You ask for the name, age, gender etc.
How do you know whether the user's input is about the name, the gender, etc.?
You save sessions in your db. When the bot receives the message you check in what session the user is in to run the right function.
An easy solution could be a sql database.
The primary key column is the Telegram user ID (you can additionally add a chat ID column if the bot is intended to work in both private and group chats), plus a "session" TEXT column where you log the user's current step. The session column can be NULL by default. If the bot expects the gender (because the user issued the /gender command), you update the "session" column with the word "gender". When the next message arrives, you know how to handle it by checking the session column for that user ID, and as soon as you have run the right function, you set the "session" column back to NULL.
You can create a db with these columns:
UserID, ChatID, State, Name, Age, Gender ...
On each incoming update, you check whether the user exists in your db, then check the user's State, respond appropriately, and update the State at the end.
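Both answers describe the same idea, which can be sketched with an in-memory SQLite table. This is a sketch under assumptions: the table layout, the reply texts, and handle_message are made up for illustration, and no Telegram library is involved; a real bot would call something like handle_message from its update handler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, chat_id INTEGER, "
    "session TEXT, name TEXT, gender TEXT)"
)


def get_session(user_id):
    row = conn.execute("SELECT session FROM users WHERE user_id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def set_session(user_id, chat_id, session):
    # Create the row on first contact, then overwrite the current step.
    conn.execute("INSERT OR IGNORE INTO users (user_id, chat_id) VALUES (?, ?)", (user_id, chat_id))
    conn.execute("UPDATE users SET session = ? WHERE user_id = ?", (session, user_id))


def handle_message(user_id, chat_id, text):
    session = get_session(user_id)
    if text == "/gender":
        set_session(user_id, chat_id, "gender")  # the next message is the answer
        return "What is your gender?"
    if session == "gender":
        conn.execute("UPDATE users SET gender = ? WHERE user_id = ?", (text, user_id))
        set_session(user_id, chat_id, None)  # step finished, back to NULL
        return "Saved."
    return "Send /gender to set your gender."


first_reply = handle_message(42, 42, "/gender")
second_reply = handle_message(42, 42, "female")
```

Only one row per user is needed; the messages themselves are never stored.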

Apps activities - uniquely identify user

Is there any way to uniquely identify the user who caused an event? I want to extract all events from the Appsactivity service that belong to a specific user.
The problem is that service.activities().list() also returns activities of other users of a shared file, even if the request sets userId, which indicates the user to return activity for. It returns all activities visible to the given user, and therefore it contains activities of other users.
I tried to filter the list, but it seems to be impossible: events contain a simple User object, which does not have a userId or userEmail.
One way is to compare the user's photo URL, which is available in the Appsactivity User object. Note that this can be done only if the URL is not null; otherwise it won't uniquely identify the user.
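A sketch of that filter over already-fetched results, written against plain dicts. The nesting used here (combinedEvent, then user, then photo, then url) is an assumption about the response shape, and the sample data is made up, so check it against the actual response before relying on it.

```python
def events_by_photo_url(activities, photo_url):
    """Return the combined events whose user photo URL equals photo_url."""
    if not photo_url:
        raise ValueError("a photo URL is required; a missing URL cannot identify a user")
    matches = []
    for activity in activities:
        event = activity.get("combinedEvent") or {}
        user = event.get("user") or {}
        photo = user.get("photo") or {}
        if photo.get("url") == photo_url:
            matches.append(event)
    return matches


# Made-up sample in the assumed response shape.
activities = [
    {"combinedEvent": {"primaryEventType": "edit", "user": {"name": "A", "photo": {"url": "https://example.com/a.png"}}}},
    {"combinedEvent": {"primaryEventType": "rename", "user": {"name": "B", "photo": {"url": "https://example.com/b.png"}}}},
    {"combinedEvent": {"primaryEventType": "move", "user": {"name": "C"}}},  # no photo: never matches
]
mine = events_by_photo_url(activities, "https://example.com/a.png")
```

Events whose user has no photo are skipped rather than guessed at, in line with the caveat above.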

Why can't I trust a client-generated GUID? Does treating the PK as a composite of client-GUID and a server-GUID solve anything?

I'm building off of a previous discussion I had with Jon Skeet.
The gist of my scenario is as follows:
Client application has the ability to create new 'PlaylistItem' objects which need to be persisted in a database.
Use case requires the PlaylistItem to be created in such a way that the client does not have to wait on a response from the server before displaying the PlaylistItem.
Client generates a UUID for PlaylistItem, shows the PlaylistItem in the client and then issues a save command to the server.
At this point, I understand that it would be bad practice to use the UUID generated by the client as the object's PK in my database. The reason for this is that a malicious user could modify the generated UUID and force PK collisions on my DB.
To mitigate any damages which would be incurred from forcing a PK collision on PlaylistItem, I chose to define the PK as a composite of two IDs - the client-generated UUID and a server-generated GUID. The server-generated GUID is the PlaylistItem's Playlist's ID.
Now, I have been using this solution for a while, but I don't understand why (or really believe that) my solution is any better than simply trusting the client ID. If the user is able to force a PK collision with another user's PlaylistItem objects, then I think I should assume they could also provide that user's PlaylistId. They could still force collisions.
So... yeah. What's the proper way of doing something like this? Allow the client to create a UUID, and have the server give a thumbs up/down once the save is attempted? If a collision is found, revert the client changes and notify the user that a collision was detected?
You can trust a client generated UUID or similar global unique identifier on the server. Just do it sensibly.
Most of your tables/collections will also hold a userId or be able to associate themselves with a userId through a FK.
If you're doing an insert and a malicious user uses an existing key then the insert will fail because the record/document already exists.
If you're doing an update then you should validate that the logged in user owns that record or is authorized (e.g. admin user) to update it. If pure ownership is being enforced (i.e. no admin user scenario) then your where clause in locating the record/document would include both the Id and the userId. Now technically the userId is redundant in the where clause because the Id will uniquely find one record/document. However adding the userId makes sure the record belongs to the user that's doing the update and not the malicious user.
I'm assuming that there's an encrypted token or session of some sort that the server is decrypting to ascertain the userId and that this is not supplied by the client otherwise that's obviously not safe.
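The insert and update rules above can be sketched with SQLite. The table and function names are made up for illustration, and user_id stands for the value the server derived from the session or token, never one sent by the client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist_items (id TEXT PRIMARY KEY, user_id TEXT NOT NULL, title TEXT)"
)


def insert_item(item_id, user_id, title):
    """Insert with a client-generated ID; return False if that ID already exists."""
    try:
        conn.execute("INSERT INTO playlist_items VALUES (?, ?, ?)", (item_id, user_id, title))
        return True
    except sqlite3.IntegrityError:
        return False


def update_title(item_id, user_id, title):
    """Update only if the row belongs to user_id; return True if a row changed."""
    cursor = conn.execute(
        "UPDATE playlist_items SET title = ? WHERE id = ? AND user_id = ?",
        (title, item_id, user_id),
    )
    return cursor.rowcount == 1


created = insert_item("uuid-1", "alice", "Song A")
collision = insert_item("uuid-1", "mallory", "Hijack")  # reused ID: insert fails
hijack = update_title("uuid-1", "mallory", "Hijack")    # wrong owner: no row matches
own_update = update_title("uuid-1", "alice", "Song B")  # owner: succeeds
```

A malicious client that reuses an existing ID can at worst make its own request fail; it cannot overwrite another user's row.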
A nice solution would be the following: To quote Sam Newman's "Building Microservices":
The calling system would POST a BatchRequest, perhaps passing in a
location where a file can be placed with all the data. The Customer
service would return a HTTP 202 response code, indicating that the
request was accepted, but has not yet been processed. The calling
system could then poll the resource waiting until it retrieves a 201
Created indicating that the request has been fulfilled
So in your case, you could POST to server but immediately get a response like "I will save the PlaylistItem and I promise its Id will be this one". Client (and user) can then continue while the server (maybe not even the API, but some background processor that got a message from the API) takes its time to process, validate and do other, possibly heavy logic until it saves the entity. As previously stated, API can provide a GET endpoint for the status of that request, and the client can poll it and act accordingly in case of an error.
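An in-memory sketch of that accept-now, process-later pattern. Everything here is hypothetical: the class and method names are made up, process() stands in for the background processor, and returning 409 for a failed save is my own assumption rather than something the quoted passage specifies.

```python
class PlaylistItemApi:
    """Toy model of an API that accepts a save request and finishes it later."""

    STATUS_CODES = {"pending": 202, "created": 201, "failed": 409}

    def __init__(self):
        self.status = {}  # item_id -> "pending" | "created" | "failed"

    def post(self, item_id):
        # Accept immediately; the client can show the item right away.
        self.status[item_id] = "pending"
        return 202

    def process(self, item_id, ok=True):
        # Background step: validation and saving would happen here.
        self.status[item_id] = "created" if ok else "failed"

    def poll(self, item_id):
        # Status endpoint the client polls until it stops seeing 202.
        return self.STATUS_CODES[self.status[item_id]]


api = PlaylistItemApi()
accepted = api.post("uuid-1")
before = api.poll("uuid-1")
api.process("uuid-1")
after = api.poll("uuid-1")
```

If polling ever returns the failure code, the client reverts the optimistic change and notifies the user.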
