How can I retrieve the distinct values from an internal table?
I am using SORT and DELETE ADJACENT DUPLICATES to get what I need, but I would like to improve this kind of selection.
The point is: imagine you have an internal table with information on two purchase orders, each of which has two items. How can I get the distinct purchase order numbers?
For instance: I've selected the following information from EKPO:
ebeln | ebelp
---------- | -----
1234567890 | 00010
1234567890 | 00020
1234567891 | 00010
1234567891 | 00020
To get distinct ebeln values:
ebeln
----------
1234567890
1234567891
For that, I need to sort the table and apply the DELETE ADJACENT DUPLICATES. I would like to know if there is any trick to replace these commands.
COLLECT also yields distinct values:
DATA: lt_collect like table of lt_source-some_field.
LOOP AT lt_source INTO ls_source.
COLLECT ls_source-some_field INTO lt_collect.
ENDLOOP.
* lt_collect has distinct values of lt_source-some_field
To get distinct EBELN what you need to do is simply
SELECT DISTINCT ebeln
FROM ekpo
INTO TABLE lt_distinct_ebeln
WHERE (your_where_condition).
That's all it takes.
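As a quick illustration of what DISTINCT does with the sample data from the question, here is a minimal sketch that runs the same kind of query against an in-memory SQLite table. The table is only a two-column stand-in for EKPO, and SQLite is used purely so the query can be executed outside an SAP system.

```python
import sqlite3

# In-memory stand-in for EKPO with only the two columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ekpo (ebeln TEXT, ebelp TEXT)")
conn.executemany(
    "INSERT INTO ekpo VALUES (?, ?)",
    [
        ("1234567890", "00010"),
        ("1234567890", "00020"),
        ("1234567891", "00010"),
        ("1234567891", "00020"),
    ],
)

# DISTINCT collapses the four item rows into one row per order number.
distinct_ebeln = [
    row[0]
    for row in conn.execute("SELECT DISTINCT ebeln FROM ekpo ORDER BY ebeln")
]
print(distinct_ebeln)
```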
One option is to loop over the table and act only when the value changes. As you mention, this only works if the table is sorted by the field you are looking for.
loop at GT_TABLE into WA_TABLE.
on change of WA_TABLE-FIELD.
*Operation
endon.
endloop.
Another option is the same loop with AT NEW. Keep in mind that AT NEW also fires when any field to the left of the named field in the row structure changes, so those fields must stay the same within a group and the table must be sorted accordingly.
loop at GT_TABLE into WA_TABLE.
at new FIELD.
*Operation
endat.
endloop.
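The control-break idea behind both loops can be sketched outside ABAP as well. The following is a language-neutral illustration in Python, not ABAP semantics in detail: the rows are sorted by the field of interest and the "operation" runs only when the value differs from the previous row. The row layout and variable names are assumptions made for the example.

```python
# Rows as they might come from the item table, in arbitrary order.
rows = [
    {"ebeln": "1234567891", "ebelp": "00010"},
    {"ebeln": "1234567890", "ebelp": "00020"},
    {"ebeln": "1234567890", "ebelp": "00010"},
    {"ebeln": "1234567891", "ebelp": "00020"},
]

# The control break only works on data sorted by the compared field.
rows.sort(key=lambda r: r["ebeln"])

distinct_orders = []
previous = None  # no previous row yet
for row in rows:
    if row["ebeln"] != previous:
        # "Operation": runs once per new order number.
        distinct_orders.append(row["ebeln"])
        previous = row["ebeln"]

print(distinct_orders)
```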
I have one table where each row has three columns. The first two columns are a prefix and a value. The third column is what I'm trying to get a distinct count for columns one/two.
Basically I'm trying to get to this.
Account         | Totals
--------------- | ------
prefix & value1 | 101
prefix & value2 | 102
prefix & value3 | 103
I've tried a lot of different versions but I'm basically noodling around this.
select prefix||value as Account, count(distinct thirdcolumn) as Totals from Transactions
It sounds like you want
SELECT
prefix||value Account,
count(distinct thirdcolumn) Totals
FROM Transactions
GROUP BY prefix, value
The count(distinct thirdcolumn) says you want a count of the distinct values in the third column. The GROUP BY prefix, value says you want a row returned for each unique prefix/value combination and that the count applies to that combination.
Note that "thirdcolumn" is a placeholder for the name of your third column, not a magic keyword, since I didn't see the actual name in the post.
If you want the number of rows for each prefix/value pair then you can use:
SELECT prefix || value AS account,
COUNT(*) AS totals
FROM Transactions
GROUP BY prefix, value
You do not want to count the DISTINCT prefix/value pairs: once you GROUP BY those columns, every distinct pair lands in its own group, so a COUNT of DISTINCT prefix/value pairs would always be one.
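To see how COUNT(DISTINCT thirdcolumn) and COUNT(*) differ under the same GROUP BY, here is a small sketch using an in-memory SQLite table. The sample rows are invented for illustration, and thirdcolumn remains a placeholder name as in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (prefix TEXT, value TEXT, thirdcolumn TEXT)"
)
# Invented sample data: account A1 has three rows but only two distinct
# third-column values; account A2 has a single row.
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [("A", "1", "x"), ("A", "1", "x"), ("A", "1", "y"), ("A", "2", "z")],
)

totals = conn.execute(
    """
    SELECT prefix || value             AS Account,
           COUNT(DISTINCT thirdcolumn) AS distinct_totals,
           COUNT(*)                    AS row_totals
    FROM Transactions
    GROUP BY prefix, value
    ORDER BY prefix, value
    """
).fetchall()
print(totals)
```

For account A1 the two counts differ because the value x appears twice; which one you want depends on whether repeated third-column values should count once or every time.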
Currently the code looks something like this:
LOOP AT lt_orders ASSIGNING <fs_order>.
SELECT COUNT(*) AS cnt
FROM order_items
INTO <fs_order>-cnt
WHERE order_id = <fs_order>-order_id.
ENDLOOP.
It is the slowest part of the report. I want to speed it up.
How can I use FOR ALL ENTRIES with GROUP BY?
Check the documentation. You can't use GROUP BY. Maybe in this case, you could try selecting your items with FAE outside of the loop, then count them using a parallel cursor:
REPORT.
TYPES: BEGIN OF ty_result,
vbeln TYPE vbeln,
cnt TYPE i.
TYPES: END OF ty_result.
DATA: lt_headers TYPE SORTED TABLE OF ty_result WITH UNIQUE KEY vbeln,
lv_tabix TYPE sy-tabix VALUE 1.
"get the headers
SELECT vbeln FROM vbak UP TO 100 ROWS INTO CORRESPONDING FIELDS OF TABLE lt_headers.
"get corresponding items
SELECT vbeln, posnr FROM vbap FOR ALL ENTRIES IN @lt_headers
WHERE vbeln EQ @lt_headers-vbeln
ORDER BY vbeln, posnr
INTO TABLE @DATA(lt_items).
LOOP AT lt_headers ASSIGNING FIELD-SYMBOL(<h>).
LOOP AT lt_items FROM lv_tabix ASSIGNING FIELD-SYMBOL(<i>).
IF <i>-vbeln NE <h>-vbeln.
lv_tabix = sy-tabix.
EXIT.
ELSE.
<h>-cnt = <h>-cnt + 1.
ENDIF.
ENDLOOP.
ENDLOOP.
BREAK-POINT.
Or join header/item with a distinct count on the item id (whichever column that would be in your table).
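The parallel-cursor counting above can be sketched in a language-neutral way as follows. This is only an illustration of the technique in Python, with invented document numbers; both lists must be sorted by the document number, and the item position is never reset, so each item is visited once.

```python
# Header keys and item rows, both sorted by document number (invented data).
headers = ["0001", "0002", "0003"]
items = [("0001", 10), ("0001", 20), ("0003", 10)]  # (vbeln, posnr)

counts = {vbeln: 0 for vbeln in headers}
pos = 0  # the "parallel cursor": current position in the item list

for vbeln in headers:
    # Skip items that belong to no header (cannot happen after a
    # FOR ALL ENTRIES select, kept here only for robustness).
    while pos < len(items) and items[pos][0] < vbeln:
        pos += 1
    # Count the items of the current header, then stop at the first
    # item of the next document.
    while pos < len(items) and items[pos][0] == vbeln:
        counts[vbeln] += 1
        pos += 1

print(counts)
```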
You should be able to do something like
SELECT order_id, COUNT( order_item_id ) AS cnt
FROM order_items
GROUP BY order_id
INTO CORRESPONDING FIELDS OF TABLE @lt_count.
This assumes that order_item_id is a key in the order_items table, and that lt_count has two fields: cnt of type int8 and order_id of the same type as your other order_id fields.
PS: You can then loop over lt_count and move the counts to lt_orders, or the other way around. To speed up the loop, sort one of the tables and use READ ... BINARY SEARCH.
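The merge step described in the PS can be sketched like this. It is a Python illustration of the idea (sort the count table once, then look up each order by binary search), not ABAP code; the names mirror lt_count and lt_orders, and the data is invented.

```python
from bisect import bisect_left

# Result of the grouped select: (order_id, cnt), in arbitrary order.
lt_count = [("A", 2), ("C", 5), ("B", 1)]
# Orders to enrich; "D" has no items and therefore no row in lt_count.
lt_orders = [
    {"order_id": "B", "cnt": 0},
    {"order_id": "D", "cnt": 0},
    {"order_id": "A", "cnt": 0},
]

# Sort once, then use a binary search per order instead of a linear scan.
lt_count.sort()
keys = [order_id for order_id, _ in lt_count]

for order in lt_orders:
    idx = bisect_left(keys, order["order_id"])
    if idx < len(keys) and keys[idx] == order["order_id"]:
        order["cnt"] = lt_count[idx][1]

print(lt_orders)
```

Orders without any items simply keep their initial count of zero, since the grouped select returns no row for them.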
I tried this with table KNB1 (customer master per company code), where some customers are created in several company codes.
Please note, because of FOR ALL ENTRIES you have to SELECT the full key.
TYPES: BEGIN OF ty_knb1,
kunnr TYPE knb1-kunnr,
count TYPE i,
END OF ty_knb1.
TYPES: BEGIN OF ty_knb1_fae,
kunnr TYPE knb1-kunnr,
END OF ty_knb1_fae.
DATA: lt_knb1_fae TYPE STANDARD TABLE OF ty_knb1_fae.
DATA: lt_knb1 TYPE HASHED TABLE OF ty_knb1
WITH UNIQUE KEY kunnr.
DATA: ls_knb1 TYPE ty_knb1.
DATA: ls_knb1_db TYPE knb1.
START-OF-SELECTION.
lt_knb1_fae = VALUE #( ( kunnr = ... ) ). "add at least one customer which is created in several company codes
ls_knb1-count = 1.
SELECT kunnr bukrs
INTO CORRESPONDING FIELDS OF ls_knb1_db
FROM knb1
FOR ALL ENTRIES IN lt_knb1_fae
WHERE kunnr EQ lt_knb1_fae-kunnr.
ls_knb1-kunnr = ls_knb1_db-kunnr.
COLLECT ls_knb1 INTO lt_knb1.
ENDSELECT.
Create a range table for your lt_orders, for example lt_orders_range.
Then select order_id and count( * ) with WHERE order_id IN lt_orders_range, grouped by order_id.
Even if creating a range table seems like too much effort, running a single select for all orders instead of one select per order id will improve performance considerably.
Not directly, only through a CDS view
While all of the answers provide a faster solution than the one in the question, the fastest way is not mentioned.
If you have at least Netweaver 7.4, EHP 5 (and you should, it was released in 2014), you can use CDS views, even if you are not on HANA.
It still cannot be done directly, as OpenSQL does not allow FOR ALL ENTRIES with GROUP BY, and CDS views cannot handle FOR ALL ENTRIES. However, you can create one of each.
CDS:
@AbapCatalog.sqlViewName: 'zorder_i_fae'
DEFINE VIEW zorder_items_fae AS SELECT FROM order_items {
order_id,
count( * ) AS cnt
}
GROUP BY order_id
OpenSQL:
SELECT *
FROM zorder_items_fae
INTO TABLE @DATA(lt_order_cnt)
FOR ALL ENTRIES IN @lt_orders
WHERE order_id = @lt_orders-order_id.
Speed
If lt_orders contains more than about 30% of all possible order_id values from table ORDER_ITEMS, the answer from iPirat is faster. (While using more memory, obviously)
However, if you need only a couple of hundred order_id values out of millions, this solution is about 10 times faster than any other answer, and 100 times faster than the original.
I have a scenario to implement in Informatica where I need to remove duplicate records from a table based on the PK. I need to keep the first occurrence of each PK value and remove the others (in case of a duplicate PK).
For example, if my source has 1,1,1,2,3,3,4,5,4, I want my target data to be 1,2,3,4,5. I have to read from and load into the same table; no new table can be introduced. Please help me with your inputs.
Thanks in Advance!
I suppose you want the first occurrence because there are other (data) columns in addition to the key you entered. Therefore you want
1,b
1,c
1,a
2,d
3,c
3,d
4,e
5,f
4,b
Turned into
1,b
2,d
3,c
4,e
5,f
??
In that case try this mapping layout:
SRC -> SQ -> SRT -> AGG -> TGT
      SEQ /
Here the sorter is set to sort on KEY, sequence_port (desc),
and the aggregator is set to group by KEY; the sequence_port should not be passed on from the sorter.
Hope you can follow me :)
There are multiple ways to do this; the simplest would be to do it in the SQL override.
Assuming the example quoted above, the SQL would look like this. The general idea is to assign a row number within each primary key (so if you have 3 rows with the same pk, they get row numbers 1, 2, 3 before the numbering restarts for the next pk) and keep only row number 1.
SQL:
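select primary_key, column2 from (
select primary_key, column2, row_number() over (partition by primary_key order by primary_key) as distinct_key
from source_table -- source_table is a placeholder for the actual source table
) t where distinct_key = 1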
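Before:
1,b
1,c
1,a
2,d
3,c
3,d
Intermediate query (row numbers within a key are assigned in arbitrary order, since the ORDER BY is on the key itself; this is one possible result):
1,b,1
1,c,2
1,a,3
2,d,1
3,c,1
3,d,2
Output:
1,b
2,d
3,c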
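The row-number approach can be tried out with the sample data from the first answer. This sketch uses an in-memory SQLite table (window functions need SQLite 3.25 or later) and adds a hypothetical seq column that records the arrival order, so that "first occurrence" is well defined; the table and column names are assumptions for the example, not part of the Informatica mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column that records the original row order.
conn.execute("CREATE TABLE src (seq INTEGER, pk INTEGER, col2 TEXT)")
sample = [(1, "b"), (1, "c"), (1, "a"), (2, "d"), (3, "c"),
          (3, "d"), (4, "e"), (5, "f"), (4, "b")]
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(seq, pk, col2) for seq, (pk, col2) in enumerate(sample, start=1)],
)

# Number the rows within each key by arrival order and keep only the first.
first_rows = conn.execute(
    """
    SELECT pk, col2
    FROM (
        SELECT pk, col2,
               ROW_NUMBER() OVER (PARTITION BY pk ORDER BY seq) AS rn
        FROM src
    )
    WHERE rn = 1
    ORDER BY pk
    """
).fetchall()
print(first_rows)
```

Ordering the window by a column that reflects arrival order, rather than by the key itself, is what makes the choice of surviving row deterministic.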
I am able to achieve this by following the below steps.
1. Passing Sorted data(keys are EMP_ID, MOBILE, DEPTID) to an expression.
2. Creating the following variable ports in the expression and getting the counts.
V_CURR_EMP_ID = EMP_ID
V_CURR_MOBILE = MOBILE
V_CURR_DEPTID = DEPTID
V_COUNT =
IIF(V_CURR_EMP_ID=V_PREV_EMP_ID AND V_CURR_MOBILE=V_PREV_MOBILE AND V_CURR_DEPTID=V_PREV_DEPTID ,V_COUNT+1,1)
V_PREV_EMP_ID = EMP_ID
V_PREV_MOBILE = MOBILE
V_PREV_DEPTID = DEPTID
O_COUNT =V_COUNT
3. In the next transformation which is filter, I am taking only the records which have count more than 1 and deleting them using update strategy(DD_DELETE).
Here is the mapping flow.
SQ->SRTR->EXP->FIL->UPD->TGT
Also, when I tried to delete them using aggregator , it is deleting only the first occurrence of duplicates but not all.
Thanks again for your inputs!
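For readers who do not use Informatica, the variable-port logic above can be sketched in Python. The counter restarts at 1 whenever the key combination changes and increases for each repeat, so rows with a count above 1 are the duplicates to delete. The sample rows are invented, and the variable names only loosely mirror the ports.

```python
# Rows as (EMP_ID, MOBILE, DEPTID); invented sample data.
rows = [
    (1, "555-0100", 10),
    (1, "555-0100", 10),
    (2, "555-0101", 20),
    (1, "555-0100", 10),
    (3, "555-0102", 30),
    (3, "555-0102", 30),
]

# Step 1: the comparison with the previous row only works on sorted data.
rows.sort()

keep, delete = [], []
v_prev = None  # key of the previous row
v_count = 0
for row in rows:
    # Step 2: count up while the key repeats, restart at 1 on a new key.
    v_count = v_count + 1 if row == v_prev else 1
    v_prev = row
    # Step 3: everything after the first occurrence is flagged for deletion.
    (delete if v_count > 1 else keep).append(row)

print(keep)
print(len(delete))
```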
I know there is IN as alternative to multiple ORs:
select loginid from customer where code IN ('TEST1','TEST2','TEST3','TEST4')
This will return all loginids with a code that matches any of the four TEST elements.
Is there something similar for AND? I will need to find out all loginids that have code: TEST10,TESTA,TEST1,TESTB,AIFK,AICK....(there are 20 codes)
You cannot compare the two. With ORs or IN you look for records that match one of the values. With AND you would look for a record where the column matches all of those values, but a column can of course only hold one value per row, so you would never find any record.
So you are obviously looking for something else entirely. You probably want to aggregate your data to find the IDs for which a record exists for each of the values. This would be:
select loginid
from customer
where code IN ('TEST1','TEST2','TEST3','TEST4')
group by loginid
having count(distinct code) = 4;
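A small sketch with invented data shows how this query behaves. It uses an in-memory SQLite table named customer with just the two columns from the question: u1 has all four codes (one of them twice), u2 is missing TEST4 but has an unrelated code, and u3 has exactly the four codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (loginid TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("u1", "TEST1"), ("u1", "TEST2"), ("u1", "TEST3"), ("u1", "TEST4"),
        ("u1", "TEST1"),  # duplicate row, counted once by DISTINCT
        ("u2", "TEST1"), ("u2", "TEST2"), ("u2", "TEST3"), ("u2", "OTHER"),
        ("u3", "TEST1"), ("u3", "TEST2"), ("u3", "TEST3"), ("u3", "TEST4"),
    ],
)

matching = [
    row[0]
    for row in conn.execute(
        """
        SELECT loginid
        FROM customer
        WHERE code IN ('TEST1', 'TEST2', 'TEST3', 'TEST4')
        GROUP BY loginid
        HAVING COUNT(DISTINCT code) = 4
        ORDER BY loginid
        """
    )
]
print(matching)
```

The WHERE clause removes unrelated codes before counting, and DISTINCT keeps duplicate rows from inflating the count, so only logins that have every listed code remain.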
You could do the following:
SELECT loginid
FROM customer
WHERE code IN ('TEST1', ... , 'TEST20')
GROUP BY loginid
HAVING COUNT(DISTINCT code) = 20;
The difference between Jon's answer and this one is that if you use other codes in the table, my query will return all loginids for which there are rows for these 20 codes, and Jon's answer will return all loginids for which there are 20 distinct codes.
No
A short answer, I know, but sometimes it's the only real answer.
However
Depending on what exactly you're trying to achieve, you may be able to use a count of a query grouping the items, to check that all match.
This assumes that your CustomerCodes are kept in a separate relational table.
SELECT loginID
FROM Customer
WHERE loginID IN (
SELECT loginID
FROM CustomerCodes
GROUP BY loginID
HAVING count(*) = 20
)
It doesn't work if you have Code1, Code2 Code3 etc fields... you'd have to split the data out into a separate table, for example:
loginID | code
---------------
1 | code1
1 | code2
I am selecting a table that has multiple of the same records (same REQUEST_ID) with different VERSION_NO. So I want to sort it descending so I can take the highest number (latest record).
This is what I have...
IF it_temp2[] IS NOT INITIAL.
SELECT request_id
version_no
status
item_list_id
mod_timestamp
FROM ptreq_header INTO TABLE it_abs3
FOR ALL ENTRIES IN it_temp2
WHERE item_list_id EQ it_temp2-itemid.
ENDIF.
So version_no is one of the selected fields, but I want to sort by that field (descending) and take only the first row.
I was doing some research and read that SORT * BY * won't work with FOR ALL ENTRIES. But that's just my understanding from reading up.
Please let me know how I can make this work. Thanks
You can simply sort the itab after the select and then delete all adjacent duplicates, if wanted:
SORT it_abs3 BY request_id [ASCENDING] version_no DESCENDING.
DELETE ADJACENT DUPLICATES FROM it_abs3 COMPARE request_id.
Depending on how many superfluous lines (lines that would be deleted again) you expect in the itab, an SQL approach may be better. See Used_By_Already's answer.
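The effect of the two statements can be sketched in Python: sort by request_id ascending and version_no descending, then keep only the first row of each run of equal request_id values. The rows are invented and reduced to the two relevant fields.

```python
# Invented rows with only the fields relevant to the sort.
it_abs3 = [
    {"request_id": "R2", "version_no": 1},
    {"request_id": "R1", "version_no": 1},
    {"request_id": "R1", "version_no": 3},
    {"request_id": "R1", "version_no": 2},
]

# SORT ... BY request_id ASCENDING version_no DESCENDING
it_abs3.sort(key=lambda r: (r["request_id"], -r["version_no"]))

# DELETE ADJACENT DUPLICATES ... COMPARING request_id:
# the first row of each run survives, i.e. the highest version.
latest = []
for row in it_abs3:
    if not latest or latest[-1]["request_id"] != row["request_id"]:
        latest.append(row)

print(latest)
```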
If you are using the term "latest" to indicate "the most recent entry", then the field mod_timestamp appears to be relevant and you could use it this way to choose only the most recent records for each request_id.
SELECT
h.request_id
, h.version_no
, h.status
, h.item_list_id
, h.mod_timestamp
FROM ptreq_header h
INNER JOIN (
SELECT
request_id
, MAX(mod_timestamp) AS latest
FROM ptreq_header
GROUP BY
request_id
) l
ON h.request_id = l.request_id
AND h.mod_timestamp = l.latest
If you want the largest version_no, then instead of MAX(mod_timestamp) use MAX(version_no)
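To check the join against something executable, here is a minimal sketch with an in-memory SQLite table that carries the five columns from the question. The rows and timestamp values are invented, and the outer columns are qualified with the alias h so that the column names are unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ptreq_header (
           request_id TEXT, version_no INTEGER, status TEXT,
           item_list_id TEXT, mod_timestamp INTEGER)"""
)
conn.executemany(
    "INSERT INTO ptreq_header VALUES (?, ?, ?, ?, ?)",
    [
        ("R1", 1, "SENT", "I1", 20240101090000),
        ("R1", 2, "APPROVED", "I1", 20240102090000),
        ("R2", 1, "SENT", "I2", 20240103090000),
    ],
)

# Join each request to its most recent timestamp; only that row survives.
latest = conn.execute(
    """
    SELECT h.request_id, h.version_no, h.status
    FROM ptreq_header h
    INNER JOIN (
        SELECT request_id, MAX(mod_timestamp) AS latest
        FROM ptreq_header
        GROUP BY request_id
    ) l
        ON h.request_id = l.request_id
       AND h.mod_timestamp = l.latest
    ORDER BY h.request_id
    """
).fetchall()
print(latest)
```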
Just declare the it_abs3 as a sorted table with key that would consist of the columns you want to sort by.
You can also sort the table after the query.
SORT it_abs3 BY ...