Calculate percent of total taking duplication into account - amazon-quicksight

I have the following table containing users and the devices that they use
+--------+--------+
| UserId | Device |
+--------+--------+
| user1 | PC |
| user1 | TV |
| user2 | TV |
| user2 | Phone |
| user2 | Phone |
| user3 | Phone |
| user4 | PC |
| user5 | Phone |
+--------+--------+
I want to find the percentage of users using a given device. If I use percentOfTotal(count(UserId), [Device]), the result is as follows:
+--------+----------------+
| Device | Usage rate |
+--------+----------------+
| PC | 25% |
| TV | 25% |
| Phone | 50% |
+--------+----------------+
However, this result is not what I want, since a user can use more than one device. In my opinion, the usage rate should be calculated as (count of distinct users using the same device) / (count of all distinct users), i.e. the result should look like this:
+--------+----------------+
| Device | Usage rate |
+--------+----------------+
| PC | 40% |
| TV | 40% |
| Phone | 60% |
+--------+----------------+
I wonder whether I can calculate that using Amazon QuickSight.

At the moment you can define a measure that returns the number of distinct users for each device, but not the total number of distinct users. Once we add the ability to get the total number of distinct users, you should be able to do everything in QuickSight. We are hoping to add this soon. The current workaround is to make the change in data prep, or to use custom SQL to provide the number of distinct users in the dataset.
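As a rough illustration of the data-prep workaround, the requested ratio can be computed outside QuickSight before loading the dataset. This is only a sketch in plain Python over the sample rows from the question; the function name usage_rate is hypothetical and nothing here is a QuickSight API.

```python
from collections import defaultdict

# Sample rows from the question: (UserId, Device). Note the duplicate
# (user2, Phone) row, which must not be counted twice.
rows = [
    ("user1", "PC"), ("user1", "TV"),
    ("user2", "TV"), ("user2", "Phone"), ("user2", "Phone"),
    ("user3", "Phone"), ("user4", "PC"), ("user5", "Phone"),
]

def usage_rate(rows):
    """Return {device: distinct users of device / distinct users overall}."""
    users_by_device = defaultdict(set)
    all_users = set()
    for user, device in rows:
        users_by_device[device].add(user)  # sets drop duplicate users
        all_users.add(user)
    total = len(all_users)
    return {device: len(users) / total for device, users in users_by_device.items()}

for device, rate in usage_rate(rows).items():
    print(f"{device}: {rate:.0%}")
```

The per-device rates intentionally do not sum to 100%, because one user can appear under several devices.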


How accurate is this picture of how transactions are processed on the NEAR platform?

After reading more about how transactions are processed by NEAR I came up with this picture of how a few key parts are related.
I am seeking some pointers on how to correct this.
First, a few key points I'm currently aware of, only some of which are illustrated below:
an Action must be one of 8 supported operations on the network
CreateAccount to make a new account (for a person, company, contract, car, refrigerator, etc)
DeployContract to deploy a new contract (with its own account)
FunctionCall to invoke a method on a contract (with budget for compute and storage)
Transfer to transfer tokens from one account to another
Stake to express interest in becoming a proof-of-stake validator at the next available opportunity
AddKey to add a key to an existing account (either FullAccess or FunctionCall access)
DeleteKey to delete an existing key from an account
DeleteAccount to delete an account (and transfer balance to a beneficiary account)
a Transaction is a collection of Actions augmented with critical information about their
origin (ie. cryptographically signed by signer)
destination or intention (ie. sent or applied to receiver)
recency (ie. block_hash distance from most recent block is within acceptable limits)
uniqueness (ie. nonce must be unique for a given signer)
a SignedTransaction is a Transaction cryptographically signed by the signer account mentioned above
Receipts are basically what NEAR calls Actions after they pass from outside (untrusted) to inside (trusted) the "boundary of trust" of our network. Having been cryptographically verified as valid, recent and unique, a Receipt is an Action ready for processing on the blockchain.
since, by design, each Account lives on one and only one shard in the system, Receipts are either applied to the shard on which they first appear or are routed across the network to the proper "home shard" for their respective sender and receiver accounts. DeleteKey is an Action that would never need to be routed to more than 1 shard while Transfer would always be routed to more than 1 shard unless both signer and receiver happen to have the same "home shard"
a "finality gadget" is a collection of rules that balances the urgency of maximizing blockchain "liveness" (ie. responsiveness / performance) with the safety needed to minimize the risk of accepting invalid transactions onto the blockchain. One of these rules includes "waiting for a while" before finalizing (or sometimes reversing) transactions -- this amounts to waiting a few minutes for 120 blocks to be processed before confirming that a transaction has been "finalized".
---.
o--------o | o------------------------o o-------------------o
| Action | | | Transaction | | SignedTransaction |
o--------o | | | | |
| | o--------o | | o-------------o |
o--------o | | | Action | signer | | | Transaction | |
| Action | | --> | o--------o receiver | --> | | | | ---.
o--------o | | | Action | block_hash | | | | | |
| | o--------o nonce | | | | | |
o--------o | | | Action | | | | | | |
| Action | | | o--------o | | o-------------o | |
o--------o | o------------------------o o-------------------o |
---' |
|
sent to network |
.---------------------------------------------------------------------------'
| <----------
|
| ---.
| XXX o--------o o---------o |
| XX | Action | --> | Receipt | |
| o--------------------------------o o--------o o---------o |
| | | |
| | 1. Validation (block_hash) | o--------o o---------o |
'--> | 2. Verification (signer keys) | | Action | --> | Receipt | | --.
| 3. Routing (receiver) | o--------o o---------o | |
| | | |
o--------------------------------o o--------o o---------o | |
transaction arrives XX | Action | --> | Receipt | | |
XXX o--------o o---------o | |
---' |
|
applied locally OR propagated to other shards |
.---------------------------------------------------------------------------'
| <----------
|
|
| --. .-------. .--. .--. .--. o-----------o
| o---------o | | | | | | | | | | |
'--> | Receipt | | Shard | | | | | | | | | |
o---------o | A | | | | | | | | | |
| --' | | | | | | | | | |
| | | | | | | | | | |
| --. | | | | | | | | | Block |
| o---------o | | Block | | | | | o o o | | | (i) |
'--> | Receipt | | | (i) | | | | | | | | finalized |
o---------o | | | | | | | | | | |
| | Shard | | | | | | | | | |
| o---------o | B | | | | | | | | | |
'--> | Receipt | | | | | | | | | | | |
o---------o | | | | | | | | | | |
--' '-------' '--' '--' '--' o-----------o
| |
'------------------------------------------------'
about 3 blocks to finality
It's unclear to me what you mean by "routed to more than one shard". A receipt can only be routed to one shard. Also I don't understand your description of finality gadget, and I don't know where you get "120 blocks" from. Normally you just need to wait for 3 blocks for a block to be finalized.
Great explanation! Core protocol devs should complete that picture and include in the low-level documentation!
There are some corrections. A Transaction with all its actions gets converted to a single Receipt. Receipts can have several actions too. Every receipt goes to a single specific shard/receiver account. In the case of a "Transfer" action inside a Transaction/Receipt, it can generate new receipts to complete the transfer:
e.g. Alice sends 100N to Bob
Receipt 1, action Transfer: acting on Alice's account. Alice's account gets 100N deducted. If that succeeds a 2nd Receipt is created:
Receipt 2- single action: act on Bob's account to "increase balance by 100N". This second receipt gets "published" to be routed to Bob's shard.
if the 2nd receipt fails (no Bob account) a 3rd Receipt is created to refund 100N to Alice. This 3rd Receipt is again published to be routed back to Alice's shard.
So every receipt can have more than one action, but it is directed to a single specific account and therefore to a single shard.
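The Alice-to-Bob example above can be sketched as a toy model. This is not nearcore code: the function transfer, the balances dictionary and the receipt tuples are all hypothetical, and sharding is ignored. It only shows how one transfer can produce two or three receipts, each aimed at a single account.

```python
def transfer(balances, sender, receiver, amount):
    """Toy model of the receipts generated by one Transfer.

    Returns a list of (receipt_name, target_account, effect) tuples and
    mutates `balances` in place. Purely illustrative.
    """
    receipts = [("Receipt 1", sender, f"deduct {amount}")]
    if balances.get(sender, 0) < amount:
        return receipts  # deduction failed, so no further receipts are created
    balances[sender] -= amount

    # Receipt 2 targets the receiver's account (possibly on another shard).
    receipts.append(("Receipt 2", receiver, f"add {amount}"))
    if receiver in balances:
        balances[receiver] += amount
    else:
        # Receiver does not exist: Receipt 3 goes back to the sender as a refund.
        receipts.append(("Receipt 3", sender, f"refund {amount}"))
        balances[sender] += amount
    return receipts

balances = {"alice": 150, "bob": 10}
for receipt in transfer(balances, "alice", "bob", 100):
    print(receipt)
print(balances)
```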
.- At least this is what I understand 'til now -.
I'm reading the code Sherif, more details:
Even if a Transaction has more than one action, each transaction is converted to a single receipt. A Receipt can have more than one action, but a single ´receiver´.
All Receipts are validated. When routed to other shards (if the ´receiver´ account is not in the current shard) the receiving node will re-validate the receipt before processing. So there's no trusted/untrusted boundary. Everything gets re-validated in the nodes before processing.
All local receipts are processed first, then delayed receipts are checked (waiting for data), and then receipts received from other nodes are processed.
Some Receipts can be "Data Receipts", containing chunks of data required to execute other receipts. It's like sending input data for actions in chunks to other nodes. When all the data chunks are received, the related "Action Receipt" is executed.
When an "Action Receipt" has all its data, every action inside the receipt is executed: code and code
There's a loop for every action in the receipt, and the action is applied to the receiver account.
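To make the "wait for data, then loop over the actions" idea concrete, here is a small toy model. It is not taken from the NEAR code base: ActionReceipt, ToyRuntime and their fields are hypothetical names, and validation and shard routing are left out.

```python
from dataclasses import dataclass, field

@dataclass
class ActionReceipt:
    receiver: str                      # a receipt has exactly one receiver
    actions: list                      # e.g. [("Transfer", 5)]
    input_data_ids: set = field(default_factory=set)

class ToyRuntime:
    def __init__(self):
        self.received_data = {}        # data_id -> payload from data receipts
        self.delayed = []              # action receipts still waiting for data
        self.log = []                  # (receiver, action) in execution order

    def on_action_receipt(self, receipt):
        if receipt.input_data_ids.issubset(self.received_data):
            self._execute(receipt)
        else:
            self.delayed.append(receipt)   # park it until all data arrives

    def on_data_receipt(self, data_id, payload):
        self.received_data[data_id] = payload
        still_waiting = []
        for receipt in self.delayed:
            if receipt.input_data_ids.issubset(self.received_data):
                self._execute(receipt)
            else:
                still_waiting.append(receipt)
        self.delayed = still_waiting

    def _execute(self, receipt):
        # Loop over every action and apply it to the single receiver account.
        for action in receipt.actions:
            self.log.append((receipt.receiver, action))

rt = ToyRuntime()
rt.on_action_receipt(ActionReceipt("bob", [("FunctionCall", "f"), ("Transfer", 5)], {"d1", "d2"}))
rt.on_data_receipt("d1", b"x")   # still waiting for d2
rt.on_data_receipt("d2", b"y")   # now the receipt executes
print(rt.log)
```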
.-to be continued-.
"Receipts are either applied to the shard on which they first appear or are routed across the network to the proper "home shard" for their respective sender and receiver accounts."
So here is my understanding: an AccountID sends a transaction to the shard it is on, i.e. the shard it is assigned to for the given epoch, since accounts are reshuffled across shards every epoch. The shard (a set of validator AccountIDs etc.) verifies the transaction. If the receiver is on another shard, a receipt is created and routed to that shard.
While the transaction from the sender can be included in the next block, it takes up to three blocks to validate it and finalize the routing to the receiver's shard.

Split a single row into multiple rows with grouping data check - Hive

I'm currently using the query below in Hive to split a row into multiple rows, but I also want to group the "Product" column based on the "Category" column. Each product group matches the category at the same position; ";" separates the groups and "," separates the items within a group.
SELECT id, customer, product_split
FROM orders lateral view explode(split(product,';')) products AS product_split
Here is what my data looks like now:
| id | Customer| Category | Product |
+----+----------+---------------------------+-----------------------------------+
| 1 | John | Furniture; Technology | Bookcases, Chairs; Phones, Laptop |
| 2 | Bob | Office supplies; Furniture| Paper, Blinders; Tables |
| 3 | Dylan | Furniture | Tables, Chairs, Bookcases |
My desired result looks like this:
| id | Customer| Category | Product |
+----+----------+----------------+-----------+
| 1 | John | Furniture | Bookcases |
| 1 | John | Furniture | Chairs |
| 1 | John | Technology | Phones |
| 1 | John | Technology | Laptop |
| 2 | Bob | Office supplies| Paper |
| 2 | Bob | Office supplies| Blinders |
| 2 | Bob | Furniture | Tables |
| 3 | Dylan | Furniture | Tables |
| 3 | Dylan | Furniture | Chairs |
| 3 | Dylan | Furniture | Bookcases |
I have tried this one and it works well; all credit goes to this question: Hive - Split delimited columns over multiple rows, select based on position
-- trim() removes the spaces that follow ';' and ',' in the sample data
select id, customer, trim(category) as category, trim(products) as product
from
(
-- pair each category with the product group at the same position
SELECT id, customer, category_split AS category, product_split
FROM table_name
lateral VIEW posexplode(split(category,';')) c AS pos_category, category_split
lateral VIEW posexplode(split(product,';')) p AS pos_product, product_split
WHERE pos_category = pos_product) a
lateral view explode(split(product_split,',')) ps AS products
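The positional matching that the query relies on can be checked outside Hive with a short emulation. This is only a sketch in plain Python over the sample rows from the question; split_rows is a hypothetical helper, and zip plays the role of the pos_category = pos_product condition.

```python
orders = [
    (1, "John", "Furniture; Technology", "Bookcases, Chairs; Phones, Laptop"),
    (2, "Bob", "Office supplies; Furniture", "Paper, Blinders; Tables"),
    (3, "Dylan", "Furniture", "Tables, Chairs, Bookcases"),
]

def split_rows(rows):
    """Emit one (id, customer, category, product) row per product item."""
    out = []
    for id_, customer, category, product in rows:
        categories = category.split(";")
        groups = product.split(";")
        # zip pairs the i-th category with the i-th product group
        for cat, group in zip(categories, groups):
            for item in group.split(","):
                out.append((id_, customer, cat.strip(), item.strip()))
    return out

for row in split_rows(orders):
    print(row)
```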

Efficient way to join by levenshtein in Hive or Impala

I have two tables: one contains about 17K records (NLIST) and the other about 57K (FNAMES).
I would like to join them by comparing the records using the Levenshtein distance.
Here is the example for the content of tables:
Table NLIST:
+------+-------------+
| ID | S_NAME |
+------+-------------+
| 1 | Avi |
| 2 | Moshe |
| 3 | David |
....
Table FNAMES:
+------+-------------+
| ID | NICKNAMES |
+------+-------------+
| 1 | Avile |
| 2 | Dudi |
| 3 | Moshiko |
| 4 | Avi |
| 5 | DAVE |
....
The above tables are just examples. In the real case the names column can include more than one word.
The required result should be:
+------+-------------+--------+
| ID | NICKNAMES | S_NAME |
+------+-------------+--------+
| 1 | Avile | Avi |
| 2 | Dudi | David |
| 3 | Moshiko | Moshe |
| 4 | Avi | Avi |
| 5 | DAVE | David |
...
Here is the code I use:
select FNAMES.NICKNAMES, NLIST.S_NAME
from FNAMES
LEFT OUTER JOIN NLIST
ON(true)
WHERE levenshtein (FNAMES.NICKNAMES, NLIST.S_NAME) <=4
The above code ran for a very long time, so I stopped it.
How can I make it run in a reasonable time?
In addition, I think the levenshtein distance depends on the length of the words. How can I find the optimal value for the distance (in this case I chose 4 arbitrarily)?
Hive query performance depends on several factors:
Query engine
File format
Vectorization: set hive.vectorized.execution.enabled = true; set hive.vectorized.execution.reduce.enabled = true;
If you have a good server, you can try Impala, which is typically faster than Hive.
You can also fine-tune Impala to run this query faster; see "Tuning Impala for Performance".
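Independently of engine tuning, two ideas that were not part of the answer above may help with the question as asked: skipping pairs whose length difference already exceeds the allowed distance (the Levenshtein distance is never smaller than the length difference), and scaling the threshold with the word length instead of using a fixed 4. The sketch below shows both in plain Python; best_matches and max_ratio are hypothetical names, and 0.5 is an arbitrary starting value, not a recommendation.

```python
def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def best_matches(nicknames, names, max_ratio=0.5):
    """Map each nickname to its closest name within a length-relative budget."""
    result = {}
    for nick in nicknames:
        n = nick.lower()
        best = None  # (distance, name)
        for name in names:
            s = name.lower()
            budget = int(max_ratio * max(len(n), len(s)))
            if abs(len(n) - len(s)) > budget:
                continue  # cheap pre-filter: distance >= length difference
            d = levenshtein(n, s)
            if d <= budget and (best is None or d < best[0]):
                best = (d, name)
        if best is not None:
            result[nick] = best[1]
    return result

print(best_matches(["Avile", "Moshiko", "Avi"], ["Avi", "Moshe", "David"]))
```

In Hive the same pre-filter could be written as an extra predicate such as abs(length(FNAMES.NICKNAMES) - length(NLIST.S_NAME)) <= 4 placed before the levenshtein call. The join still considers every pair, but far fewer pairs reach the expensive distance computation.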

Display record count in listbox using multiple tables and fields

I need help with a query that I can't get to work correctly. What I'm trying to achieve is a select box displaying the number of records associated with each theme. For some themes it works well, but for others it displays (0) when in fact there are 2 records. Could someone help me with this? Your help would be greatly appreciated. Please see my current query and table structure below:
SELECT theme.id_theme, theme.theme, calender.start_date,
calender.id_theme1,calender.id_theme2, calender.id_theme3, COUNT(*) AS total
FROM theme, calender
WHERE (YEAR(calender.start_date) = YEAR(CURDATE())
AND MONTH(calender.start_date) > MONTH(CURDATE()) )
AND (theme.id_theme=calender.id_theme1)
OR (theme.id_theme=calender.id_theme2)
OR (theme.id_theme=calender.id_theme3)
GROUP BY theme.id_theme
ORDER BY theme.theme ASC
THEME table
|---------------------|
| id_theme | theme |
|----------|----------|
| 1 | Yoga |
| 2 | Music |
| 3 | Taichi |
| 4 | Dance |
| 5 | Coaching |
|---------------------|
CALENDAR table
|---------------------------------------------------------------------------|
| id_calender | id_theme1 | id_theme2 | id_theme3 | start_date | end_date |
|-------------|-----------|-----------|-----------|------------|------------|
| 1 | 2 | 4 | | 2015-07-24 | 2015-08-02 |
| 2 | 4 | 1 | 5 | 2015-08-06 | 2015-08-22 |
| 3 | 1 | 3 | 2 | 2014-10-11 | 2015-10-28 |
|---------------------------------------------------------------------------|
LISTBOX
|----------------|
| |
| Yoga (1) |
| Music (1) |
| Taichi (0) |
| Dance (2) |
| Coaching (1) |
|----------------|
Thanking you in advance
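I think the theme conditions should be grouped in brackets. AND binds more tightly than OR, so without the brackets the date filter applies only to the id_theme1 comparison:
((theme.id_theme=calender.id_theme1)
OR (theme.id_theme=calender.id_theme2)
OR (theme.id_theme=calender.id_theme3))
Hope this helps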
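The effect of the brackets can be checked on the sample data with an in-memory SQLite database. This is only a sketch under several assumptions: SQLite's strftime stands in for MySQL's YEAR() and MONTH(), "today" is fixed at 2015-06-15 so the result is reproducible, and the LEFT JOIN (not part of the answer above) is added so that themes with no matching record show 0 instead of disappearing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE theme (id_theme INTEGER, theme TEXT);
CREATE TABLE calender (id_calender INTEGER, id_theme1 INTEGER, id_theme2 INTEGER,
                       id_theme3 INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO theme VALUES (1,'Yoga'),(2,'Music'),(3,'Taichi'),(4,'Dance'),(5,'Coaching');
INSERT INTO calender VALUES
  (1, 2, 4, NULL, '2015-07-24', '2015-08-02'),
  (2, 4, 1, 5,    '2015-08-06', '2015-08-22'),
  (3, 1, 3, 2,    '2014-10-11', '2015-10-28');
""")

QUERY = """
SELECT theme.theme, COUNT(calender.id_calender) AS total
FROM theme
LEFT JOIN calender
  ON strftime('%Y', calender.start_date) = strftime('%Y', :today)
 AND CAST(strftime('%m', calender.start_date) AS INTEGER)
     > CAST(strftime('%m', :today) AS INTEGER)
 -- the three theme comparisons are grouped so the date filter applies to all
 AND (theme.id_theme = calender.id_theme1
      OR theme.id_theme = calender.id_theme2
      OR theme.id_theme = calender.id_theme3)
GROUP BY theme.id_theme, theme.theme
ORDER BY theme.id_theme
"""

counts = conn.execute(QUERY, {"today": "2015-06-15"}).fetchall()
for theme, total in counts:
    print(f"{theme} ({total})")
```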

Communication between two applications using Environment Variables

Question
How can I communicate with another program (for instance, a Windows service) through environment variables (not system or user ones)?
What do we have
Well, I have the following scheme for a data logger:
------------------------- --------------------------------
| the things to measure | | the things that do something |
------------------------- --------------------------------
| ^
| sensors | switches
V |
-------------------------------------------------------------------
| dedicated hardware |
-------------------------------------------------------------------
| ^
| | serial communication
V |
--------------- -------------
| Windows | ------------------------------------> | user |
| service | <------------------------------------ | interface |
--------------- udp communication -------------
|^ keyboard
V| and screen
--------
| user |
--------
On current development:
The Windows service is always running while Windows is running
The user can open and close the user interface (of course :p)
The Windows service acquires data from the sensors
The user interface automatically requests data from the Windows service every 100 ms over UDP, using a protocol we implemented (we call it the GetData() command and its response), and shows it to the user
The user can send other commands to change which data is acquired, through the same protocol (we call it the SetSensors() command and its response)
Both the user interface and the Windows service are developed in Borland C++ Builder 6 and use the NMUDP component, from the FastNet tab, for UDP communication.
What we are thinking to do
Because of some buffer issues, and to free the UDP channel for sending only the SetSensors() command and its response, we are considering the following instead of using GetData():
The Windows service would get data from the sensors and put it in environment variables
The user interface would read them to show to the user
Scheme after doing what we are thinking
------------------------- --------------------------------
| the things to measure | | the things that do something |
------------------------- --------------------------------
| ^
| sensors | switches
V |
-------------------------------------------------------------------
| dedicated hardware |
-------------------------------------------------------------------
| ^
| | serial communication
V |
--------------- -------------
| | ------------------------------------> | |
| | environment variables | |
| | (get data from sensors) | |
| Windows | | user |
| service | | interface |
| | | |
| | ------------------------------------> | |
| | <------------------------------------ | |
--------------- udp communication -------------
(send commands to service) |^ keyboard
V| and screen
--------
| user |
--------
Any way to do that?
We would not use system or user environment variables, because they are written to the Windows Registry, i.e. saved to the hard drive, which is slower...
As @HansPassant said, I cannot do that directly. Although I saw some ways to do it via a memory-mapped file, it is much easier simply to add one more UDP communication channel on another port. So:
------------------------- --------------------------------
| the things to measure | | the things that do something |
------------------------- --------------------------------
| ^
| sensors | switches
V |
-------------------------------------------------------------------
| dedicated hardware |
-------------------------------------------------------------------
| ^
| | serial communication
V |
--------------- -------------
| | ------------------------------------> | |
| | udp communication (port 3) | |
| | (get data from sensors) | |
| Windows | | user |
| service | | interface |
| | (port 1) | |
| | ------------------------------------> | |
| | <------------------------------------ | |
--------------- udp communication (port 2) -------------
(send commands to service) |^ keyboard
V| and screen
--------
| user |
--------
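The extra data channel in the scheme above can be sketched as a one-way UDP push from the service to the user interface. The original project uses Borland C++ Builder with the NMUDP component; the sketch below is Python only to keep it short and runnable. The packet layout (sequence number, value count, 32-bit floats) and the names make_receiver, push_sample and read_sample are assumptions, not the project's actual protocol.

```python
import socket
import struct

def make_receiver(host="127.0.0.1"):
    """UI side: bind a dedicated UDP socket for sensor data (port chosen by the OS)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))
    sock.settimeout(2.0)  # avoid blocking forever if a datagram is lost
    return sock

def push_sample(sock, addr, seq, values):
    """Service side: send one datagram = 4-byte seq, 2-byte count, N floats."""
    payload = struct.pack("!IH", seq, len(values))
    payload += struct.pack(f"!{len(values)}f", *values)
    sock.sendto(payload, addr)

def read_sample(sock):
    """UI side: decode one datagram back into (seq, [values])."""
    data, _ = sock.recvfrom(2048)
    seq, count = struct.unpack_from("!IH", data, 0)
    values = struct.unpack_from(f"!{count}f", data, 6)
    return seq, list(values)

ui = make_receiver()
service = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
push_sample(service, ui.getsockname(), 7, [1.5, 20.25])
sample = read_sample(ui)
print(sample)
service.close()
ui.close()
```

Because the service pushes data without waiting for a request, the command channel carries only SetSensors() traffic; the sequence number lets the user interface notice dropped or reordered datagrams.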
If someone provides a better solution, I'll mark it as the accepted solution in the future.
