Compute value based on next row in BIRT

I am creating a BIRT report where each row is a receipt matched with a purchase order. There is usually more than one receipt per purchase order. My client wants the qty_remaining on the purchase order to show only on the last receipt for each purchase order. I am not able to alter the data before BIRT gets it. I see two possible solutions, but I am unable to find how to implement either. This question deals with the first possible solution.
If I can compare the purchase order number (po_number) with the next row, then I can set the current row's qty_remaining to 0 if the po_numbers match, and otherwise show the actual qty_remaining. Is it possible to access the next row?
Edit
The desired look is similar to this:
| date | receipt_number | po_number | qty_remaining | qty_received |
|------|----------------|-----------|---------------|--------------|
| 4/9 | 723 | 6026 | 0 | 985 |
| 4/9 | 758 | 6026 | 2 | 1 |
| 4/20 | 790 | 7070 | 58 | 0 |
| 4/21 | 801 | 833 | 600 | 0 |
But I'm currently getting this:
| date | receipt_number | po_number | qty_remaining | qty_received |
|------|----------------|-----------|---------------|--------------|
| 4/9 | 723 | 6026 | 2 | 985 |
| 4/9 | 758 | 6026 | 2 | 1 |
| 4/20 | 790 | 7070 | 58 | 0 |
| 4/21 | 801 | 833 | 600 | 0 |

I think you are looking at this the wrong way. If you want behavior that resembles for loops, you should use grouping and aggregate functions. You can build quite complex stuff by using (or not using) the group headers and footers.
In your case I would try to group the receipts on po_number. Order them by receipt_number, then add an aggregate function like MAX or LAST on receipt_number and name it 'last_receipt'. It should aggregate on the group, not the whole table. This 'total' is available on every row within the group.
Then you can use the visibility setting to show qty_remaining only when row['receipt_number'] == row['last_receipt']. Because BIRT's visibility property is a hide expression, the condition entered there is the inverse: row['receipt_number'] != row['last_receipt'].
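The grouping idea can be checked outside BIRT with a minimal Python sketch. It is not BIRT code: the function name `mark_last_receipt` is hypothetical, the rows are the sample data from the question, and the value is set to 0 (as in the desired table) instead of being hidden.

```python
# Sample rows from the question, one dict per receipt.
rows = [
    {"date": "4/9", "receipt_number": 723, "po_number": 6026, "qty_remaining": 2, "qty_received": 985},
    {"date": "4/9", "receipt_number": 758, "po_number": 6026, "qty_remaining": 2, "qty_received": 1},
    {"date": "4/20", "receipt_number": 790, "po_number": 7070, "qty_remaining": 58, "qty_received": 0},
    {"date": "4/21", "receipt_number": 801, "po_number": 833, "qty_remaining": 600, "qty_received": 0},
]


def mark_last_receipt(rows):
    """Keep qty_remaining only on the last receipt of each po_number group."""
    # Group-level aggregate: MAX(receipt_number) per po_number ("last_receipt").
    last_receipt = {}
    for r in rows:
        po = r["po_number"]
        last_receipt[po] = max(last_receipt.get(po, r["receipt_number"]), r["receipt_number"])

    # Per-row rule: show the real value on the last receipt, 0 on all others.
    result = []
    for r in rows:
        is_last = r["receipt_number"] == last_receipt[r["po_number"]]
        result.append({**r, "qty_remaining": r["qty_remaining"] if is_last else 0})
    return result


for r in mark_last_receipt(rows):
    print(r["receipt_number"], r["po_number"], r["qty_remaining"])
```

The aggregate is computed per group first and then compared on every row, which mirrors an aggregation bound to the po_number group in the report; no access to the "next row" is needed.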

Related

Limit rows examined in Oracle

My table has millions of records. In this query below, can I make Oracle 12c examine the first X rows only instead of doing a full table scan?
The value of X, I imagine, should be OFFSET + FETCH NEXT, so in this case 15.
SELECT * FROM table OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
Thanks in advance
Edit 1
These are the tables involved and this is the actual query
Orders - This table has 113k records in my test DB ( and over 8 million in prod db like my original question mentioned)
--------------------------
| Id | SKUField1|SKUField2|
--------------------------
| 1 | Value1 | Value2 |
| 2 | Value2 | Value2 |
| 3 | Value1 | Value3 |
--------------------------
Products - This table has 2 million records in my test DB ( prod db is similar)
---------------
| PId| SKU_NUM|
---------------
| 1 | Value1 |
| 2 | Value2 |
| 3 | Value3 |
---------------
Note that values of Orders.SKUField1 and Orders.SKUField2 come from the Products.SKU_NUM values
Actual Query:
SELECT /*+ gather_plan_statistics */ Id, PId, SKUField1, SKUField2, SKU_NUM
FROM Orders
LEFT JOIN (
-- this inner query reduces size of Products from 2 million rows down to 1462 rows
select * from Products where SKU_NUM in (
select SKUField1 from Orders
)
) p1 ON SKUField1 = p1.SKU_NUM
LEFT JOIN (
-- this inner query reduces size of Products from 2 million rows down to 459 rows
select * from Products where SKU_NUM in (
select SKUField2 from Orders
)
) p4 ON SKUField2 = p4.SKU_NUM
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY
Execution Plan:
--------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.06 | 8013 | | | |
|* 1 | VIEW | | 1 | 00:00:01 | 10 |00:00:00.06 | 8013 | | | |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 00:00:01 | 15 |00:00:00.06 | 8013 | 27M| 1904K| |
|* 3 | HASH JOIN RIGHT OUTER | | 1 | 00:00:01 | 15 |00:00:00.06 | 8013 | 1162K| 1162K| 1344K (0)|
| 4 | VIEW | | 1 | 00:00:01 | 1462 |00:00:00.04 | 6795 | | | |
| 5 | NESTED LOOPS | | 1 | 00:00:01 | 1462 |00:00:00.04 | 6795 | | | |
| 6 | NESTED LOOPS | | 1 | 00:00:01 | 1462 |00:00:00.04 | 5333 | | | |
| 7 | SORT UNIQUE | | 1 | 00:00:01 | 1469 |00:00:00.04 | 3010 | 80896 | 80896 |71680 (0)|
| 8 | TABLE ACCESS FULL | Orders | 1 | 00:00:01 | 113K|00:00:00.02 | 3010 | | | |
|* 9 | INDEX UNIQUE SCAN | UIX_Product_SKU_NUM | 1469 | 00:00:01 | 1462 |00:00:00.01 | 2323 | | | |
| 10 | TABLE ACCESS BY INDEX ROWID | Products | 1462 | 00:00:01 | 1462 |00:00:00.01 | 1462 | | | |
|* 11 | HASH JOIN RIGHT OUTER | | 1 | 00:00:01 | 15 |00:00:00.02 | 1218 | 1142K| 1142K| 1335K (0)|
| 12 | VIEW | | 1 | 00:00:01 | 459 |00:00:00.02 | 1213 | | | |
| 13 | NESTED LOOPS | | 1 | 00:00:01 | 459 |00:00:00.02 | 1213 | | | |
| 14 | NESTED LOOPS | | 1 | 00:00:01 | 459 |00:00:00.02 | 754 | | | |
| 15 | SORT UNIQUE | | 1 | 00:00:01 | 462 |00:00:00.02 | 377 | 24576 | 24576 |22528 (0)|
| 16 | INDEX FAST FULL SCAN | Orders_SKUField2_IDX6 | 1 | 00:00:01 | 113K|00:00:00.01 | 377 | | | |
|* 17 | INDEX UNIQUE SCAN | UIX_Product_SKU_NUM | 462 | 00:00:01 | 459 |00:00:00.01 | 377 | | | |
| 18 | TABLE ACCESS BY INDEX ROWID| Products | 459 | 00:00:01 | 459 |00:00:00.01 | 459 | | | |
| 19 | TABLE ACCESS FULL | Orders | 1 | 00:00:01 | 15 |00:00:00.01 | 5 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------
Hence, based on the "A-Rows" column values for row Ids 8 and 16 in the execution plan, it seems like there are full table scans on the Orders table (though row Id 16 at least seems to be using an index). So my question is: is it true that there is a full table scan on the Orders table even though I am using OFFSET/FETCH NEXT?
Although your FETCH clause may use a full table scan, Oracle will still only fetch the first X rows from the table.
In the following example, the "TABLE ACCESS FULL" operation does start to read the entire table, but it gets cut off part of the way through by the "WINDOW NOSORT STOPKEY" operation. Not all full table scans actually scan the full table. You would see similar behavior if your code ended with WHERE ROWNUM <= 50.
CREATE TABLE some_table AS SELECT * FROM all_objects;
EXPLAIN PLAN FOR SELECT * FROM some_table OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
SELECT * FROM TABLE(dbms_xplan.display);
Plan hash value: 2559837639
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 7410 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 15 | 7410 | 2 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 15 | 2010 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | SOME_TABLE | 15 | 2010 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=15 AND
"from$_subquery$_002"."rowlimit_$$_rownumber">5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=15)
The performance implications get more complicated if you also want to order the results. If that is the case, you may want to post the full query and execution plan.
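As a rough analogy only (this is Python, not Oracle internals), the stop-key behavior resembles a lazy scan that is abandoned once enough rows have been pulled, while sorting first forces every row to be read. The names `full_table_scan` and `offset_fetch` are hypothetical.

```python
from itertools import islice


def full_table_scan(table, counter):
    """Yield rows one at a time, counting how many were actually read."""
    for row in table:
        counter["rows_read"] += 1
        yield row


def offset_fetch(rows, offset, fetch):
    """Emulate OFFSET ... FETCH NEXT ...: stop pulling rows after offset + fetch."""
    return list(islice(rows, offset, offset + fetch))


table = range(100_000)  # stand-in for a large table

# No ORDER BY: the scan is cut off after 15 rows.
unsorted_counter = {"rows_read": 0}
page = offset_fetch(full_table_scan(table, unsorted_counter), offset=5, fetch=10)

# With a sort in front, every row must be read before the first one is returned.
sorted_counter = {"rows_read": 0}
sorted_page = offset_fetch(sorted(full_table_scan(table, sorted_counter)), offset=5, fetch=10)

print(unsorted_counter["rows_read"], sorted_counter["rows_read"])
```

The first counter stays at 15 even though the scan is nominally "full", while the second reaches the size of the whole table.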
(EDIT: 2022-09-25)
Yes, there is a full table scan on the ORDERS table happening on line 8 of the execution plan. As you mentioned, you can look at the "A-Rows" column to tell what's really happening.
But the third full table scan of ORDERS, on line 19, is not a "full" full table scan. The operation "WINDOW NOSORT STOPKEY" stops that full table scan as soon as the 15 necessary rows are read. So the FETCH syntax is helping at least a little.
Applying a FETCH to a query does not mean that every single table will be limited. Although, in your query, it does seem like there ought to be a way to reduce the full table scans. Perhaps an index on SKUField1 would help?
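As far as I know, Oracle has no LIMIT or TOP keyword, so you can build the equivalent yourself with a subquery like the following.
What is happening here: the inner query selects the rows with CustomerID up to 10, ordered by CustomerID, and the outer query returns them. This yields the first 10 records only if the CustomerID values run from 1 to 10 without gaps. You can still use WHERE, ORDER BY, or any other clauses.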
SELECT * FROM (
SELECT * FROM Customers WHERE CustomerID <= 10 ORDER BY CustomerID
)
The full article on this topic can be found at Oracle-Fetch.
I am using Online Oracle, so you can try it from your end. Please let me know if you still have a problem.

How to pivot data in Hive?

First, I've checked other topics on the subject, like "How to transpose/pivot data in hive?", but that doesn't match what I want.
So this is the table I have
| ID | Day | Status |
| 1 | 1 | A |
| 2 | 10 | B |
| 3 | 101 | A |
| 3 | 322 | B |
| 3 | 102 | C |
| 3 | 354 | D |
I'd like to concatenate the different Status values for each ID, ordered by Day, in order to get this:
| ID | Status |
| 1 | A |
| 2 | B |
| 3 | A,C,B,D |
The problem is that I don't know how many statuses there can be, so I can't create one column per day or status in advance. That is why I don't know how to adapt the answers from other topics, which use group_map or similar approaches, to my problem.
Thanks for helping me ^^
Use collect_set (for distinct values) or collect_list to aggregate into an array, and concatenate it using concat_ws:
select ID, concat_ws(',',collect_list(Status)) as Status
from table
group by ID;
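The question asks for the statuses ordered by Day. To my knowledge, collect_list on its own makes no promise about that order, so the rows need to be sorted before they are collected (in Hive this is commonly done in a subquery with DISTRIBUTE BY / SORT BY, which is an assumption worth verifying on your version). The Python sketch below only illustrates the intended result on the question's data; `concat_status_by_day` is a hypothetical name.

```python
from collections import defaultdict

# (ID, Day, Status) rows from the question.
rows = [
    (1, 1, "A"),
    (2, 10, "B"),
    (3, 101, "A"),
    (3, 322, "B"),
    (3, 102, "C"),
    (3, 354, "D"),
]


def concat_status_by_day(rows):
    """Group by ID and join the statuses in ascending Day order."""
    grouped = defaultdict(list)
    # Sorting by (ID, Day) first guarantees the order inside each group.
    for id_, _day, status in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped[id_].append(status)
    return {id_: ",".join(statuses) for id_, statuses in grouped.items()}


print(concat_status_by_day(rows))
```

For ID 3 the days 101, 102, 322, 354 give A, C, B, D, which matches the expected table in the question.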

View count rows as columns in query result

First things first: I am able to get the data one way. My purpose is to increase the readability of my query result, and I would like to know whether that is possible.
I have a table that is fed by devices. I want to get the number of data records sent in each hour, grouped by two identifier columns. Grouping by these two columns is needed to determine one device type.
Table structure is like:
| identifier-1 | identifier-2 | day | hour | data_name | data_value |
|--------------|--------------|------------|------|-----------|------------|
| type_1 | subType_4 | 2016-08-25 | 0 | Key-30 | 4342 |
|--------------|--------------|------------|------|-----------|------------|
| type_3 | subType_2 | 2016-08-25 | 0 | Key-50 | 96 |
|--------------|--------------|------------|------|-----------|------------|
| type_6 | subType_2 | 2016-08-25 | 1 | Key-44 | 324 |
|--------------|--------------|------------|------|-----------|------------|
| type_2 | subType_1 | 2016-08-25 | 1 | Key-26 | 225 |
|--------------|--------------|------------|------|-----------|------------|
I'm going to use one specific data_name that is sent by all devices; counting this data_name gives me the data sent in each hour. It is possible to get the numbers as 24 rows by grouping by identifier-1, identifier-2, day and hour. However, those rows repeat for each device type.
| identifier-1 | identifier-2 | day | hour | count |
|--------------|--------------|------------|------|-------|
| type_6 | subType_2 | 2016-08-25 | 0 | 340 |
|--------------|--------------|------------|------|-------|
| type_6 | subType_2 | 2016-08-25 | 1 | 340 |
|--------------|--------------|------------|------|-------|
|--------------|--------------|------------|------|-------|
| type_1 | subType_4 | 2016-08-25 | 0 | 32 |
|--------------|--------------|------------|------|-------|
| type_1 | subType_4 | 2016-08-25 | 1 | 30 |
|--------------|--------------|------------|------|-------|
|--------------|--------------|------------|------|-------|
|--------------|--------------|------------|------|-------|
I want to view the result like this:
| identifier-1 | identifier-2 | day | count_of_0 | count_of_1 |
|--------------|--------------|------------|------------|------------|
| type_6 | subType_2 | 2016-08-25 | 340 | 340 |
|--------------|--------------|------------|------------|------------|
| type_1 | subType_4 | 2016-08-25 | 32 | 30 |
|--------------|--------------|------------|------------|------------|
|--------------|--------------|------------|------------|------------|
In SQL, it is possible to use subqueries as columns in the result, but it is not possible in Hive. I guess these are called correlated subqueries.
Hive column as a subquery select
The answer to this question did not work for me.
Do you have any idea or suggestion?
You can do this using conditional aggregation:
select identifier1, identifier2, day,
sum(case when hour = 0 then 1 else 0 end) as cnt_0,
sum(case when hour = 1 then 1 else 0 end) as cnt_1
from t
where data_name = ??
group by identifier1, identifier2, day
order by identifier1, identifier2, day
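Conditional aggregation can be pictured as one counter per (group, hour) pair. The Python sketch below counts rows per hour, which is what the question asks for; the sample rows and the name `pivot_hour_counts` are hypothetical, and only two hours are shown where a real report would list all 24.

```python
from collections import defaultdict

# Hypothetical sample: (identifier1, identifier2, day, hour, data_name).
rows = [
    ("type_6", "subType_2", "2016-08-25", 0, "Key-44"),
    ("type_6", "subType_2", "2016-08-25", 0, "Key-44"),
    ("type_6", "subType_2", "2016-08-25", 1, "Key-44"),
    ("type_1", "subType_4", "2016-08-25", 1, "Key-44"),
    ("type_1", "subType_4", "2016-08-25", 0, "Key-30"),  # other data_name, ignored
]


def pivot_hour_counts(rows, data_name, hours=(0, 1)):
    """Return {(id1, id2, day): {hour: count}} with one entry per requested hour."""
    counts = defaultdict(lambda: {h: 0 for h in hours})
    for id1, id2, day, hour, name in rows:
        # Same role as "where data_name = ..." plus "case when hour = h".
        if name == data_name and hour in hours:
            counts[(id1, id2, day)][hour] += 1
    return dict(counts)


for key, per_hour in pivot_hour_counts(rows, "Key-44").items():
    print(key, per_hour)
```

Each inner dictionary corresponds to one output row, and each hour key corresponds to one `sum(case when ...)` column.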

Get duplicate rows based on one column using BIRT

I have one table in BIRT Report :
| Name | Amount |
| A | 200 |
| B | 100 |
| A | 150 |
| C | 80 |
| C | 100 |
I need to summarize this table into another table: if the name is the same, add the corresponding values.
Summarized table would be :
| A | 350 |
| B | 100 |
| C | 180 |
Here A = 200 + 150 , B = 100 , C = 80 + 100
How can I summarize a table from another table present in a BIRT report?
That is quite easy. Just add another table to your report and select the same datasource as the first table (on the Binding tab).
Go to the Groups tab and add a group on your 'Name' column.
You'll see the table change. It added a group header row and a group footer row. The header will also have an element for the column on which you grouped (in this case Name).
Now right-click next to Name, in the Amount column. Select Insert->Aggregation.
Select function SUM; the expression should be amount, and Aggregate On should be your newly created group.
Now you can see the results but it will be something like:
| A | 350 |
| A | 200 |
| A | 150 |
| B | 100 |
| B | 100 |
| C | 180 |
| C | 100 |
| C | 80 |
If you delete the detail row from the table, you'll have the result you're after.
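To double-check the numbers the grouped table should show, the same group-and-sum can be sketched in a few lines of Python. This is not BIRT code; `sum_by_name` is a hypothetical name and the rows are the ones from the question.

```python
from collections import defaultdict

# (Name, Amount) rows from the question.
rows = [("A", 200), ("B", 100), ("A", 150), ("C", 80), ("C", 100)]


def sum_by_name(rows):
    """Emulate a group on Name with a SUM aggregation on Amount."""
    totals = defaultdict(int)
    for name, amount in rows:
        totals[name] += amount
    # Sort by name so the output order matches the grouped table.
    return dict(sorted(totals.items()))


print(sum_by_name(rows))
```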
For your information:
Have a play with this, it's good exercise. Move the new aggregation to the group footer, add a top border to that cell, put a label 'total' in front of it and you'll have something like this:
| A | |
| A | 200 |
| A | 150 |
----------
| total | 350 |
| B | |
| B | 100 |
----------
| total | 100 |
| C | |
| C | 100 |
| C | 80 |
----------
| total | 180 |
Also, you don't have to select the datasource as the binding; you can also select your first table for the bindings:
select the table, open the Binding tab, select Report Item and pick your first table from the dropdown.
This can create very complex situations, therefore I usually try to work from the original dataset.

Birt-Crosstab with empty columns

I'm a BIRT beginner, and I just tried to get a really simple report from one of the tables of my Postgres DB.
So I defined a flat table as the datasource, which looks like:
+----------------+--------+----------+-------+--------+
| date | store | product | value | color |
+----------------+--------+----------+-------+--------+
| 20160101000000 | store1 | productA | 5231 | red |
| 20160101000000 | store1 | productB | 3213 | green |
| 20160101000000 | store2 | productX | 4231 | red |
| 20160101000000 | store3 | productY | 3213 | green |
| 20160101000000 | store4 | productZ | 1223 | green |
| 20160101000000 | store4 | productK | 3113 | yellow |
| 20160101000000 | store4 | productE | 213 | green |
| .... | | | | |
| 20160109000000 | store1 | productA | 512 | green |
+----------------+--------+----------+-------+--------+
So I would like to add a table / crosstab to my BIRT report which creates a table (followed by a page break) for EVERY store, which looks like:
**Store 1**
+----------------+----------+----------+----------+-----+
| | productA | productB | productC | ... |
+----------------+----------+----------+----------+-----+
| 20160101000000 | 3120 | 1231 | 6433 | ... |
| 20160102000000 | 6120 | 1341 | 2121 | ... |
| 20160103000000 | 1120 | 5331 | 1231 | ... |
+----------------+----------+----------+----------+-----+
--- PAGE BREAK ---
....
So what I tried first was getting the standard crosstab tutorial template of BIRT to work.
I defined the DataSource and created a data cube with a dimension group of 'store' and 'product', with the 'value' as the SUM / detail data, and for this example I just selected ONE day.
But the result looks like this:
+--------+----------+----------+----------+----------+-----+----------+
| | productA | productC | productD | productE | ... | productZ |
+--------+----------+----------+----------+----------+-----+----------+
| Store1 | 213 | | 3234 | 897 | ... | 6767 |
| Store2 | 513 | 2213 | 1233 | | ... | 845 |
| Store3 | 21 | | | 32 | ... | |
| Store4 | 123 | 222 | 142 | | ... | |
+--------+----------+----------+----------+----------+-----+----------+
That's because not every product is sold in every store, but the crosstab creates the columns by selecting ALL products available.
So I just have no idea how to dynamically generate different tables with a different (but also dynamic) number of columns.
The second step would then be to get the dates (days) to work.
But thanks in advance for every hint or tutorial link for question one ;-)
You can just add a table with the complete datasource. Select the table and add a group. Group by StoreID. You can set the page break options for each grouping. Set the property for after to "always excluding last".
BIRT will add a group header. You can add multiple group header rows to get the layout you're after.
For crosstabs it works in a similar way. After you have added the crosstab to your page, set the info for the groups on rows and columns, and added summaries, you can view the data. Select the crosstab and view the Row Area properties, select the page break settings and add a new page break. You can select on which group you want to break; choose your StoreID group and select after: "always excluding last".
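The layout the question asks for (one date-by-product table per store, with only that store's products as columns) can be sketched in Python to make the target structure concrete. This is not BIRT code; `crosstab_per_store` is a hypothetical name and the rows are a subset of the question's sample data.

```python
from collections import defaultdict

# (date, store, product, value) rows, a subset of the question's sample.
rows = [
    ("20160101000000", "store1", "productA", 5231),
    ("20160101000000", "store1", "productB", 3213),
    ("20160101000000", "store2", "productX", 4231),
    ("20160109000000", "store1", "productA", 512),
]


def crosstab_per_store(rows):
    """Build one date x product table per store, using only that store's products."""
    tables = {}
    for date, store, product, value in rows:
        table = tables.setdefault(store, {"products": set(), "cells": defaultdict(int)})
        table["products"].add(product)            # column set is per store
        table["cells"][(date, product)] += value  # SUM of value per cell
    # Convert to plain, sorted structures so they are easy to print.
    return {
        store: {"products": sorted(t["products"]), "cells": dict(t["cells"])}
        for store, t in tables.items()
    }


for store, table in crosstab_per_store(rows).items():
    print(store, table["products"])
```

Because the column set is collected per store, a store never gets empty columns for products it does not sell, which is the effect that grouping or breaking on the store level is meant to achieve in the report.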
