View count rows as columns in query result - hadoop

First things first: I am already able to get this data one way. My goal is to improve the readability of my query result, and I am asking whether that is possible.
I have a table that is fed by devices. I want to get the number of records sent in each hour, grouped by two identifier columns; grouping on those two columns together is what identifies a single device type.
The table structure looks like this:
| identifier-1 | identifier-2 | day        | hour | data_name | data_value |
|--------------|--------------|------------|------|-----------|------------|
| type_1       | subType_4    | 2016-08-25 | 0    | Key-30    | 4342       |
| type_3       | subType_2    | 2016-08-25 | 0    | Key-50    | 96         |
| type_6       | subType_2    | 2016-08-25 | 1    | Key-44    | 324        |
| type_2       | subType_1    | 2016-08-25 | 1    | Key-26    | 225        |
I'm going to use one specific data_name that is sent by all devices, and counting its occurrences gives me the amount of data sent in each hour. It is possible to get the numbers as 24 rows by grouping by identifier-1, identifier-2, day and hour; however, those rows repeat for each device type.
| identifier-1 | identifier-2 | day        | hour | count |
|--------------|--------------|------------|------|-------|
| type_6       | subType_2    | 2016-08-25 | 0    | 340   |
| type_6       | subType_2    | 2016-08-25 | 1    | 340   |
| ...          | ...          | ...        | ...  | ...   |
| type_1       | subType_4    | 2016-08-25 | 0    | 32    |
| type_1       | subType_4    | 2016-08-25 | 1    | 30    |
| ...          | ...          | ...        | ...  | ...   |
I want to view the result like this:
| identifier-1 | identifier-2 | day        | count_of_0 | count_of_1 |
|--------------|--------------|------------|------------|------------|
| type_6       | subType_2    | 2016-08-25 | 340        | 340        |
| type_1       | subType_4    | 2016-08-25 | 32         | 30         |
| ...          | ...          | ...        | ...        | ...        |
In SQL it is possible to use subqueries as columns in the result, but that is not possible in Hive. I believe these are called correlated subqueries.
Hive column as a subquery select
The answer to that question did not work for me.
Do you have any ideas or suggestions?

You can do this using conditional aggregation:
select identifier1, identifier2, day,
       sum(case when hour = 0 then data_value else 0 end) as cnt_0,
       sum(case when hour = 1 then data_value else 0 end) as cnt_1
from t
where data_name = ??
group by identifier1, identifier2, day
order by identifier1, identifier2, day;
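Note that the query above sums data_value per hour. If what you actually want is a row count per hour, as in the count_of_0 / count_of_1 columns of the desired output, a minimal sketch of the same conditional-aggregation pattern (keeping the table name t and the ?? placeholder from the query above) is:
select identifier1, identifier2, day,
       count(case when hour = 0 then 1 end) as count_of_0,
       count(case when hour = 1 then 1 end) as count_of_1
       -- repeat count(case when hour = N then 1 end) for the remaining hours up to 23
from t
where data_name = ??
group by identifier1, identifier2, day
order by identifier1, identifier2, day;
count() skips the NULLs produced by a case expression without an else branch, so each column counts only the rows for its hour.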

Related

DAX Query with multiple filters in powerbi

I have two tables, 'locations' and 'markets', with a many-to-many relationship between them on the column 'market_id'. A report-level filter has been applied on the column 'entity' from the 'locations' table. Now I need to distinctly count 'location_id' from the 'markets' table where 'active = TRUE'. How can I write a DAX query so that the distinct count of location_id changes dynamically with the selection made in the report-level filter?
Below is an example of the tables:
locations:
| location_id | market_id | entity | active |
|-------------|-----------|--------|--------|
| 1 | 10 | nyc | true |
| 2 | 20 | alaska | true |
| 2 | 20 | alaska | true |
| 2 | 30 | miami | false |
| 3 | 40 | dallas | true |
markets:
| location_id | market_id | active |
|-------------|-----------|--------|
| 2 | 20 | true |
| 2 | 20 | true |
| 5 | 20 | true |
| 6 | 20 | false |
I'm fairly new to Power BI. Any help will be appreciated.
Here you go:
DistinctLocations = CALCULATE(DISTINCTCOUNT(markets[location_id]), markets[active] = TRUE())
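If the report-level filter on locations[entity] does not reach the markets table (this depends on the cross-filter direction of the many-to-many relationship in your model), one possible variant, assuming the relationship between the two tables is the one on market_id, is to force bidirectional filtering inside the measure:
DistinctLocations =
CALCULATE (
    DISTINCTCOUNT ( markets[location_id] ),
    markets[active] = TRUE (),
    CROSSFILTER ( locations[market_id], markets[market_id], BOTH )
)
This is only a sketch; if the filter already propagates from locations to markets, the shorter measure above is enough.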

JOIN of 3 tables and SUM ORACLE

I have 3 tables:
Product:
+------------+--------------+--------+
| ID_product | name_product | Amount |
+------------+--------------+--------+
| 0          | Door         | 450    |
| 1          | Fence        | 1500   |
+------------+--------------+--------+
Operation:
+--------------+----------------+------+
| ID_operation | name_operation | cost |
+--------------+----------------+------+
| 0            | Repair         | 250  |
| 1            | Build          | 320  |
+--------------+----------------+------+
Process:
+------------+--------------+
| ID_product | ID_operation |
+------------+--------------+
| 0          | 0            |
| 0          | 1            |
| 1          | 0            |
| 1          | 1            |
+------------+--------------+
And I need to calculate the sum of costs for each product, like this:
Result table:
+--------------+---------------+
| name_product | TOTAL_COSTS   |
+--------------+---------------+
| Door         | 570 (250+320) |
| Fence        | 570           |
+--------------+---------------+
But I don't have any idea how. I think I need some JOINs like the ones below, but I don't know how to handle the sum.
SELECT name_product, operation.cost
FROM product
JOIN process ON product.ID_product = process.ID_product
JOIN operation ON operation.ID_operation = process.ID_operation
ORDER BY product.ID_product;
Try the query below:
SELECT P.NAME_PRODUCT, SUM(O.COST) AS COST
FROM PROCESS PR, PRODUCT P, OPERATION O
WHERE PR.ID_PRODUCT = P.ID_PRODUCT
  AND PR.ID_OPERATION = O.ID_OPERATION
GROUP BY P.NAME_PRODUCT;
You are almost there. Your JOINs are OK, you just need to add a GROUP BY clause with aggregate function SUM.
SELECT product.name_product, SUM(operation.cost) total_costs
FROM product
JOIN process ON product.ID_product = process.ID_product
JOIN operation ON operation.ID_operation = process.ID_operation
GROUP BY product.ID_product, product.name_product
ORDER BY product.ID_product;

Compute value based on next row in BIRT

I am creating a BIRT report where each row is a receipt matched with a purchase order. There is usually more than one receipt per purchase order. My client wants the qty_remaining on the purchase order to show only on the last receipt for each purchase order. I am not able to alter the data before BIRT gets it. I see two possible solutions, but I am unable to find out how to implement either. This question deals with the first possible solution.
If I can compare the purchase order number (po_number) with the next row, then I can set the current row's qty_remaining to 0 if the po_numbers match, and otherwise show the actual qty_remaining. Is it possible to access the next row?
Edit
The desired look is similar to this:
| date | receipt_number | po_number | qty_remaining | qty_received |
|------|----------------|-----------|---------------|--------------|
| 4/9 | 723 | 6026 | 0 | 985 |
| 4/9 | 758 | 6026 | 2 | 1 |
| 4/20 | 790 | 7070 | 58 | 0 |
| 4/21 | 801 | 833 | 600 | 0 |
But I'm currently getting this:
| date | receipt_number | po_number | qty_remaining | qty_received |
|------|----------------|-----------|---------------|--------------|
| 4/9 | 723 | 6026 | 2 | 985 |
| 4/9 | 758 | 6026 | 2 | 1 |
| 4/20 | 790 | 7070 | 58 | 0 |
| 4/21 | 801 | 833 | 600 | 0 |
I think you're looking at this the wrong way. If you want behavior that resembles loops, you should use grouping and aggregate functions. You can build quite complex things by using (or not using) the group headers and footers.
In your case I would try to group the receipts on po_number. Order them by receipt_number, then add an aggregate function like MAX or LAST on receipt_number and name it 'last_receipt'. It should aggregate on the group, not the whole table. This 'total' is then available on every row within the group.
Then you can use the visibility setting to only show qty_remaining when row['receipt_number'] == row['last_receipt'].
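As a concrete sketch, assuming the group aggregation is indeed named last_receipt: BIRT's visibility property hides an element when its expression evaluates to true, so the hide expression on the qty_remaining cell would be the negation:
// Visibility ("hide element") expression on the qty_remaining cell
row["receipt_number"] != row["last_receipt"]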

Birt-Crosstab with empty columns

So I'm a BIRT beginner, and I just tried to build a really simple report from one of the tables of my Postgres DB.
So I defined a flat table as the data source, which looks like:
+----------------+--------+----------+-------+--------+
| date | store | product | value | color |
+----------------+--------+----------+-------+--------+
| 20160101000000 | store1 | productA | 5231 | red |
| 20160101000000 | store1 | productB | 3213 | green |
| 20160101000000 | store2 | productX | 4231 | red |
| 20160101000000 | store3 | productY | 3213 | green |
| 20160101000000 | store4 | productZ | 1223 | green |
| 20160101000000 | store4 | productK | 3113 | yellow |
| 20160101000000 | store4 | productE | 213 | green |
| .... | | | | |
| 20160109000000 | store1 | productA | 512 | green |
+----------------+--------+----------+-------+--------+
So I would like to add a table/crosstab to my BIRT report which creates a table (and after that a page break) for EVERY store, looking like this:
**Store 1**
+----------------+----------+----------+----------+-----+
| | productA | productB | productC | ... |
+----------------+----------+----------+----------+-----+
| 20160101000000 | 3120 | 1231 | 6433 | ... |
| 20160102000000 | 6120 | 1341 | 2121 | ... |
| 20160103000000 | 1120 | 5331 | 1231 | ... |
+----------------+----------+----------+----------+-----+
--- PAGE BREAK ---
....
So what I tried first was getting the standard BIRT crosstab tutorial template to work.
I defined the data source and created a data cube with dimension groups for 'store' and 'product', with 'value' as the SUM/detail data, and for this example I just selected ONE day.
But the result looks like this:
+--------+----------+----------+----------+----------+-----+----------+
| | productA | productC | productD | productE | ... | productZ |
+--------+----------+----------+----------+----------+-----+----------+
| Store1 | 213 | | 3234 | 897 | ... | 6767 |
| Store2 | 513 | 2213 | 1233 | | ... | 845 |
| Store3 | 21 | | | 32 | ... | |
| Store4 | 123 | 222 | 142 | | ... | |
+--------+----------+----------+----------+----------+-----+----------+
It's because not every product is sold in every store, but the crosstab creates the columns from ALL available products.
So I just have no idea how to dynamically generate different tables with a different (and also dynamic) number of columns.
The second step would then be to get the dates (days) to work.
But thanks in advance for every hint or tutorial link for question one ;-)
You can just add a table bound to the complete data source. Select the table and add a group, grouping by StoreID. You can set the page-break options for each grouping: set the 'after' property to "Always excluding last".
BIRT will add a group header. You can add multiple group header rows to get the layout you're after.
For crosstabs it works in a similar way. After you have added the crosstab to your page, set up the groups on rows and columns, and added summaries, you can view the data. Select the crosstab, open the Row Area properties, go to the page-break settings and add a new page break. You can select which group you want to break on: choose your StoreID group and set 'after' to "Always excluding last".

SphinxSE distinct empty result

I run this query in the Sphinx SE console:
SELECT #distinct FROM all_ips GROUP BY ip1;
I get this result:
+------+--------+
| id | weight |
+------+--------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 9 | 1 |
| 15 | 1 |
| 16 | 1 |
| 17 | 1 |
| 20 | 1 |
| 21 | 1 |
| 25 | 1 |
| 26 | 1 |
| 27 | 1 |
| 31 | 1 |
| 32 | 1 |
| 38 | 1 |
| 39 | 1 |
| 40 | 1 |
| 46 | 1 |
| 50 | 1 |
| 51 | 1 |
+------+--------+
20 rows in set (0.57 sec)
How can I get the number of unique values? Why doesn't the #distinct column show up in the results?
1) I don't think that is SphinxSE; do you really mean SphinxQL? That looks more like SphinxQL.
2) Distinct of what column? You need to tell Sphinx which attribute you want to count distinct values of. In SphinxQL, use COUNT(DISTINCT column_name).
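For example, a SphinxQL sketch, where ip2 is only a stand-in for whichever attribute you actually want to count distinct values of within each ip1 group:
SELECT ip1, COUNT(DISTINCT ip2) AS uniques
FROM all_ips
GROUP BY ip1;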
You will need a simple SQL statement to get the count, something like this:
SELECT count(ip1),ip1
FROM all_ips
GROUP BY ip1;
