In Pig getting error as 'Error compiling operator POLocalRearrange' - hadoop

I am practicing on the Cloudera YARN VM running in VMware Player (non-commercial use).
My Pig script is:
a1 = load '/user/training/my_hdfs/id' using PigStorage('\t') as(id:int,name:chararray,desig:chararray);
a2 = load '/user/training/my_hdfs/trips' using PigStorage('\t') as(id:int,place:chararray,no_trips:int);
a3 = join a1 by id,a2 by id;
a4 = group a3 by a1::id;
illustrate a4;
After running illustrate, it shows this message:
2017-08-21 07:52:11,926 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception : Error compiling operator POLocalRearrange
The dataset is:
Table id
101 aaa executive
102 bbb manager
104 hhh manager
106 ccc trainee
109 hhh trainee
Table trips
101 pune 1
101 hyd 2
102 pune 2
102 hyd 3
102 bang 4

When I tried running your program with the provided data, I also got an error, because the delimiter in your files is not consistent: in some places it is a space and in others a tab (maybe because of copy-pasting). I made the delimiter uniform (tab everywhere) and everything works fine.
Try dump a1 or dump a2 and check whether the data shows up in the correct columns.
For me it worked after making the delimiter uniform, and illustrate a4 gives the output below:
------------------------------------------------------------------
| a1 | id:int | name:chararray | desig:chararray |
------------------------------------------------------------------
| | 101 | aaa | executive |
| | 101 | aaa | executive |
------------------------------------------------------------------
----------------------------------------------------------------
| a2 | id:int | place:chararray | no_trips:int |
----------------------------------------------------------------
| | 101 | pune | 1 |
| | 101 | hyd | 2 |
----------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------
| a3 | a1::id:int | a1::name:chararray | a1::desig:chararray | a2::id:int | a2::place:chararray | a2::no_trips:int |
------------------------------------------------------------------------------------------------------------------------------------------------
| | 101 | aaa | executive | 101 | pune | 1 |
| | 101 | aaa | executive | 101 | hyd | 2 |
| | 101 | aaa | executive | 101 | pune | 1 |
| | 101 | aaa | executive | 101 | hyd | 2 |
------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| a4 | group:int | a3:bag{:tuple(a1::id:int,a1::name:chararray,a1::desig:chararray,a2::id:int,a2::place:chararray,a2::no_trips:int)} |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | 101 | {(101, ..., 1), ..., (101, ..., 2)} |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
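A quick way to confirm the mixed-delimiter problem before involving Pig is to count the tab-separated fields on each line. The sketch below is a minimal Python check, not part of the original answer. The function name find_bad_lines and the in-memory sample are made up for illustration; for a real check you would pass in the lines of a local copy of the input file.

```python
def find_bad_lines(lines, expected_fields, delimiter="\t"):
    """Return (line_number, line) pairs whose field count differs from expected_fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # ignore blank lines
        if len(stripped.split(delimiter)) != expected_fields:
            bad.append((number, stripped))
    return bad


# Hypothetical sample: the second line uses spaces instead of tabs.
sample = ["101\taaa\texecutive\n", "102 bbb manager\n", "104\thhh\tmanager\n"]
print(find_bad_lines(sample, expected_fields=3))  # [(2, '102 bbb manager')]
```

Any line reported here would be loaded by PigStorage('\t') with the wrong number of fields, which is worth ruling out before debugging the join itself.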

Related

Join two tables in Oracle concatenating results by id

I want to join two tables in Oracle and list the results concatenated if they have the same parent id in this way:
Table All
|id | other fields|service_id|
|------|-------------|----------|
| 827 |xxxxxxxx |null |
| 828 |xxxxxxxx |327 |
| 829 |xxxxxxxx |328 |
| 860 |xxxxxxxx |null |
| 861 |xxxxxxxx |326 |
Table Services
| id | parent_id |
| ---- | -------|
| 326 | 860 |
| 327 | 827 |
| 328 | 827 |
I want a query that returns this
|id | sub_id |
|------|---------|
| 827 | 828,829 |
| 828 | null |
| 829 | null |
| 860 | 861 |
| 861 | null |
Thanks a lot!
By joining the ALL table to SERVICES and then back to ALL under a second alias, you can get each ID together with its sub IDs, one per row. Those rows then just need to be combined into one column per ID.
This should get you the "raw data" that then needs to be aggregated:
TESTED: Rextester working example
NOTE: Since you have different "levels" of depth for your sub_ID, I'm not really sure what you want, so 860 and 123 aren't included, because that's a completely different field than the source of 827.
SELECT A.ID as ID, A2.ID as Sub_ID
FROM ALL A
LEFT JOIN SERVICES S
on S.Parent_ID = A.ID
LEFT JOIN All A2
on A2.Service_ID = S.ID
Now, assuming you have a version of Oracle that supports LISTAGG:
SELECT A.ID as ID, ListAgg(A2.ID,',') within group (Order by A2.ID) as Sub_ID
FROM ALL A
LEFT JOIN SERVICES S
on S.Parent_ID = A.ID
LEFT JOIN All A2
on A2.Service_ID = S.ID
GROUP BY A.ID
Giving Us:
+----+-----+---------+
| | ID | SUB_ID |
+----+-----+---------+
| 1 | 827 | 828,829 | -- These are all.id's...
| 2 | 828 | NULL |
| 3 | 829 | NULL |
| 4 | 860 | NULL | --> Now why is 123 present as it's a service.id
| 5 | 861 | NULL |
+----+-----+---------+
**Note: ALL is a reserved word and needs to be escaped; if your table name really isn't ALL, adjust accordingly.
LISTAGG Docs
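For anyone without an Oracle instance at hand, the same join-then-aggregate shape can be tried in SQLite from Python. This is only a sketch under assumptions: GROUP_CONCAT stands in for LISTAGG, the table is named all_rows (a made-up name, to sidestep the reserved word), the "other fields" column is left out, and the rows are the ones listed in the question.

```python
import sqlite3


def sub_ids_by_parent():
    """Join all_rows -> services -> all_rows and collect the sub ids per id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE all_rows (id INTEGER, service_id INTEGER);
        CREATE TABLE services (id INTEGER, parent_id INTEGER);
        INSERT INTO all_rows VALUES
            (827, NULL), (828, 327), (829, 328), (860, NULL), (861, 326);
        INSERT INTO services VALUES (326, 860), (327, 827), (328, 827);
    """)
    rows = conn.execute("""
        SELECT a.id, GROUP_CONCAT(a2.id, ',') AS sub_id
        FROM all_rows a
        LEFT JOIN services s ON s.parent_id = a.id
        LEFT JOIN all_rows a2 ON a2.service_id = s.id
        GROUP BY a.id
        ORDER BY a.id
    """).fetchall()
    conn.close()
    # GROUP_CONCAT does not promise an order, so sort the ids in Python.
    return {
        row_id: sorted(int(x) for x in sub.split(",")) if sub else None
        for row_id, sub in rows
    }


print(sub_ids_by_parent())
```

With the rows exactly as listed in the question, 827 collects 828 and 829, and 860 collects 861, which matches the result the question asks for.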

Converting Column headings into Row data

I have a table in an Access database whose columns I would like to convert into row data.
I found some code here:
converting database columns into associated row data
I am new to VBA and I just don't know how to use this code.
I have attached some sample data.
As currently set up, the table is 14 columns wide.
+-------+--------+-------------+-------------+-------------+----------------+
| ID | Name | 2019-10-31 | 2019-11-30 | 2019-12-31 | ... etc ... |
+-------+--------+-------------+-------------+-------------+----------------+
| 555 | Fred | 1 | 4 | 12 | |
| 556 | Barney| 5 | 33 | 24 | |
| 557 | Betty | 4 | 11 | 76 | |
+-------+--------+-------------+-------------+-------------+----------------+
I would like the output to be
+-------+------------+-------------+
| ID | Date | HOLB |
+-------+------------+-------------+
| 555 | 2019-10-31| 1 |
| 555 | 2019-11-30| 4 |
| 555 | 2019-12-31| 12 |
| 556 | 2019-10-31| 5 |
| 556 | 2019-11-30| 33 |
| 556 | 2019-12-31| 24 |
+-------+------------+-------------+
How can I turn this code into a module and call the module from a query?
Or any other ideas you may have.
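The reshaping asked for here is usually called an unpivot: every date column becomes its own row. The sketch below shows only that logic, in Python rather than VBA, so it is not a drop-in Access module. The function name unpivot and the dictionary layout are assumptions for illustration, and the sample rows are the first two date columns from the table above.

```python
def unpivot(rows, id_column="ID", skip_columns=("Name",)):
    """Turn one wide row per ID into one (ID, Date, HOLB) row per date column."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or column in skip_columns:
                continue  # only the date columns become rows
            long_rows.append({"ID": row[id_column], "Date": column, "HOLB": value})
    return long_rows


wide = [
    {"ID": 555, "Name": "Fred", "2019-10-31": 1, "2019-11-30": 4},
    {"ID": 556, "Name": "Barney", "2019-10-31": 5, "2019-11-30": 33},
]
for record in unpivot(wide):
    print(record)
```

In a database the same result is typically built as one SELECT per date column combined with UNION ALL, which is worth considering if the set of columns is fixed.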

Update Statement using Join Condition

Below is an example. TABLE 1 is created manually, and its first three columns
are loaded from an external file. The fourth column (SHOWROOM_ID) will be taken from TABLE 2,
and the rest of the columns in TABLE 1 will be updated based on criteria.
TABLE 1
NAME |OLD_CPR_NO |OLD_COS_NO |SHOWROOM_ID|NM_CPR_COS_MAT|NM_CPR_MAT|COS_CPR_MAT|
------------------------------------------------------------------------------------
FORD | 45 | 487 | | | |
TOYOTA | 78 | 562 | | | |
BENZ | 55 | 789 | | | |
JEEP | 66 | 124 | | | |
HONDA | 34 | 142 | | | |
KIA | 12 | 962 | | | |
GM | 89 | 7787 | | | |
CHRYSLER | 45 | 236 | | | |
AUDI | 67 | 4789 | | | |
TABLE 2
PK|NAME |OLD_CPR_NO |OLD_COS_NO |SHOWROOM_ID
---------------------------------------------
1 |FORD | 45 | 487 | 1
2 |TOYOTA | 78 | 562 | 2
3 |CIAT | 55 | 789 | 3
4 |JEEP | 66 | 124 | 5
5 |HONDA | 34 | 456 | 6
6 |MUSTANG | 12 | 962 | 7
7 |GM | 89 | 56 | 8
8 |CHRYSLER | 45 | 236 | 9
9 |AUDI | 67 | 4789 | 10
STEP 1: Update the NM_CPR_COS_MAT column in TABLE 1. This is an indicator field:
where NAME, OLD_CPR_NO and OLD_COS_NO match between TABLE 1 and TABLE 2, assign the indicator 'Y'.
I was able to get the results with the query below:
UPDATE TABLE_1 TAB1
SET NM_CPR_COS_MAT = (SELECT 'Y'
FROM
TABLE_2 TAB2
WHERE
TRIM(TAB1.NAME) = TRIM(TAB2.NAME)
AND TRIM(TAB1.OLD_CPR_NO) = TRIM(TAB2.OLD_CPR_NO)
AND TRIM(TAB1.OLD_COS_NO) = TRIM(TAB2.OLD_COS_NO)
);
COMMIT;
UPDATE TABLE_1 TAB1
SET SHOWROOM_ID= (SELECT TAB2.SHOWROOM_ID
FROM
TABLE_2 TAB2
WHERE
TRIM(TAB1.NAME) = TRIM(TAB2.NAME)
AND TRIM(TAB1.OLD_CPR_NO) = TRIM(TAB2.OLD_CPR_NO)
AND TRIM(TAB1.OLD_COS_NO) = TRIM(TAB2.OLD_COS_NO)
AND TRIM(TAB1.NM_CPR_COS_MAT) = 'Y'
);
COMMIT;
RESULT:
TABLE 1
NAME |OLD_CPR_NO |OLD_COS_NO |SHOWROOM_ID|NM_CPR_COS_MAT|NM_CPR_MAT|COS_CPR_MAT|
------------------------------------------------------------------------------------
FORD | 45 | 487 | 1 | Y | |
TOYOTA | 78 | 562 | 2 | Y | |
BENZ | 55 | 789 | | | |
JEEP | 66 | 124 | 5 | Y | |
HONDA | 34 | 142 | | | |
KIA | 12 | 962 | | | |
GM | 89 | 7787 | | | |
CHRYSLER | 45 | 236 | 9 | Y | |
AUDI | 67 | 4789 | 10 | Y | |
But I get an error if I try to use a JOIN in the UPDATE statement:
UPDATE TABLE_1 TAB1
SET NM_CPR_COS_MAT = 'Y'
FROM
TABLE_2 TAB2 JOIN
TABLE_1 TAB1 ON
TRIM(TAB1.OLD_CPR_NO) = TRIM(TAB2.OLD_CPR_NO)
WHERE
TRIM(TAB1.NAME) = TRIM(TAB2.NAME)
AND TRIM(TAB1.OLD_COS_NO) = TRIM(TAB2.OLD_COS_NO)
;
COMMIT;
ORA-00933: SQL command not properly ended.
Starting from the resulting table below, I then have to update the SHOWROOM_ID and NM_CPR_MAT columns:
TABLE 1
NAME |OLD_CPR_NO |OLD_COS_NO |SHOWROOM_ID|NM_CPR_COS_MAT|NM_CPR_MAT|COS_CPR_MAT|
------------------------------------------------------------------------------------
FORD | 45 | 487 | 1 | Y | |
TOYOTA | 78 | 562 | 2 | Y | |
BENZ | 55 | 789 | | | |
JEEP | 66 | 124 | 5 | Y | |
HONDA | 34 | 142 | | | |
KIA | 12 | 962 | | | |
GM | 89 | 7787 | | | |
CHRYSLER | 45 | 236 | 9 | Y | |
AUDI | 67 | 4789 | 10 | Y | |
STEP 2:
UPDATE TABLE_1 TAB1
SET NM_CPR_MAT = (SELECT 'Y'
FROM
TABLE_2 TAB2
WHERE
TRIM(TAB1.NAME) = TRIM(TAB2.NAME)
AND TRIM(TAB1.OLD_CPR_NO) = TRIM(TAB2.OLD_CPR_NO)
AND TRIM(NM_CPR_COS_MAT) IS NULL
);
COMMIT;
UPDATE TABLE_1 TAB1
SET SHOWROOM_ID= (SELECT TAB2.SHOWROOM_ID
FROM
TABLE_2 TAB2
WHERE
TRIM(TAB1.NAME) = TRIM(TAB2.NAME)
AND TRIM(TAB1.OLD_CPR_NO) = TRIM(TAB2.OLD_CPR_NO)
AND TRIM(NM_CPR_COS_MAT) IS NULL
AND TRIM(NM_CPR_MAT) = 'Y'
);
COMMIT;
I am getting the results below. I get the correct 'Y' in the NM_CPR_MAT column
and the correct numbers in SHOWROOM_ID for the new UPDATE statement, but the numbers
that were set by the earlier UPDATE statement are gone.
TABLE 1
NAME |OLD_CPR_NO |OLD_COS_NO |SHOWROOM_ID|NM_CPR_COS_MAT|NM_CPR_MAT|COS_CPR_MAT|
------------------------------------------------------------------------------------
FORD | 45 | 487 | | Y | |
TOYOTA | 78 | 562 | | Y | |
BENZ | 55 | 789 | | | |
JEEP | 66 | 124 | | Y | |
HONDA | 34 | 142 | 6 | | Y |
KIA | 12 | 962 | | | |
GM | 89 | 7787 | 8 | | Y |
CHRYSLER | 45 | 236 | | Y | |
AUDI | 67 | 4789 | | Y | |
Incorrect syntax: Oracle's UPDATE has no FROM clause, so the value has to come from a subquery. You are missing (SELECT before 'Y', and don't forget the matching closing bracket at the end.
UPDATE TABLE_1 TAB1
SET NM_CPR_COS_MAT = (SELECT 'Y'
FROM
TABLE_2 TAB2 JOIN
TABLE_1 TAB1 ON
TRIM(TAB1.OLD_CPR_NO) = TRIM(TAB2.OLD_CPR_NO)
WHERE
TRIM(TAB1.NAME) = TRIM(TAB2.NAME)
AND TRIM(TAB1.OLD_COS_NO) = TRIM(TAB2.OLD_COS_NO));
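On the other symptom in the question (SHOWROOM_ID values from the first step disappearing after the second step): an UPDATE with no WHERE clause of its own touches every row, and a scalar subquery that finds no match yields NULL, so rows that were already filled in get overwritten with NULL. The sketch below reproduces this with SQLite from Python as a stand-in for Oracle. It is a cut-down illustration, not the asker's schema: the two-row tables and the function name showroom_ids are assumptions, and the guarded variant is one possible fix rather than the only one.

```python
import sqlite3


def showroom_ids(guarded):
    """Run the step-2 style update, with or without a WHERE guard on the UPDATE."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (name TEXT, showroom_id INTEGER, nm_cpr_cos_mat TEXT);
        CREATE TABLE table_2 (name TEXT, showroom_id INTEGER);
        -- FORD was already matched in step 1; HONDA still needs a value.
        INSERT INTO table_1 VALUES ('FORD', 1, 'Y'), ('HONDA', NULL, NULL);
        INSERT INTO table_2 VALUES ('FORD', 1), ('HONDA', 6);
    """)
    update = """
        UPDATE table_1
        SET showroom_id = (SELECT t2.showroom_id FROM table_2 t2
                           WHERE t2.name = table_1.name
                             AND table_1.nm_cpr_cos_mat IS NULL)
    """
    if guarded:
        # Restrict the UPDATE itself to rows that still need a value and have a match.
        update += """
        WHERE nm_cpr_cos_mat IS NULL
          AND EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.name = table_1.name)
        """
    conn.execute(update)
    result = dict(conn.execute("SELECT name, showroom_id FROM table_1"))
    conn.close()
    return result


print(showroom_ids(guarded=False))  # {'FORD': None, 'HONDA': 6}
print(showroom_ids(guarded=True))   # {'FORD': 1, 'HONDA': 6}
```

The unguarded run wipes FORD's value exactly as described in the question; the guarded run leaves it alone. The same idea should carry over to the Oracle statements by adding a WHERE ... EXISTS clause to the outer UPDATE.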

How to setup a header in a pivot (CrossTab) report (MS Report Designer)

I have the following table which I'd like to turn into a report:
ClientGroup | Product | Client | Quantity
-----------------------------------------
Gr1 | P1 | C1 | 10
Gr1 | P1 | C2 | 20
Gr1 | P1 | C3 | 30
Gr1 | P2 | C1 | 40
Gr1 | P2 | C2 | 50
Gr1 | P2 | C3 | 60
Gr2 | P1 | C4 | 70
Gr2 | P1 | C5 | 80
Gr2 | P1 | C6 | 90
Gr2 | P2 | C4 | 100
Gr2 | P2 | C5 | 110
Gr2 | P2 | C6 | 120
The report would have the following layout:
--------------------
| Gr1 |
--------------------
Client | P1 | P2 |
--------------------
C1 | 10 | 40 |
C2 | 20 | 50 |
C3 | 30 | 60 |
--------------------
Total | 60 |150 |
--------------------
| Gr2 |
--------------------
Client | P1 | P2 |
--------------------
C4 | 70 | 100 |
C5 | 80 | 110 |
C6 | 90 | 120 |
--------------------
Total | 240 | 330 |
--------------------
What I'm doing is creating a Matrix, adding a row group on ClientGroup, a child row group on Client, and a column group on Product, with Quantity as the detail. In the designer it looks somewhat like this:
---------------------------------------------
| ClientGroup | Client | [Product] |
---------------------------------------------
| [ClientGroup] | [Client] | Sum([Quantity])|
---------------------------------------------
I then hide the ClientGroup column and it seems I'm almost there. What I can't figure out is how to have a header over the columns Client and [Product] displaying the current ClientGroup.
Is it possible? Any ideas?
You can get pretty close:
Set the Headings row to be hidden.
Right-click the [Client] cell and select Insert Row > Outside Group - Above, twice.
Copy [ClientGroup] into the left-hand cell on the first new row, and set the BorderStyle-Right of the cell to be None.
Select the right-hand cell on the first new row, and set the BorderStyle-Left and -Right of the cell to be None.
Copy the heading Client into the left-hand cell on the second new row.
Copy [Product] into the right-hand cell on the second new row.
Your report should look something like this in the designer:
--------------------------------------------------
| ClientGroup | Client | [Product] |
--------------------------------------------------
| [ClientGroup] | [ClientGroup] | |
| |---------------------------------
| | Client | [Product] |
| |---------------------------------
| | [Client] | Sum([Quantity])|
--------------------------------------------------
If you preview it, the results should be pretty close to the desired layout.

LINQ query to get result

I have a Student list with the data below:
/*-----------------------
|Student |
-------------------------
| ID | Name | Dept |
-------------------------
| 101 | Peter | IT |
| 102 | John | IT |
| 103 | Ronald | Mech |
| 104 | Sam | Comp |
-----------------------*/
Another list, say Extra, has the data below:
/*----------------------
| StudentId | Dept |
------------------------
| 101 | Civil |
| 103 | Chemical |
----------------------*/
Now I want following result
/*-------------------------
|Student |
---------------------------
| ID | Name | Dept |
---------------------------
| 101 | Peter | Civil |
| 102 | John | IT |
| 103 | Ronald | Chemical |
| 104 | Sam | Comp |
-------------------------*/
Currently I have written the logic below:
foreach (var item in Extra)
{
// Search for item in the Student list
// Update it
}
I need a more efficient way (I don't want to use iteration) using LINQ.
Try something like this. Didn't test it though.
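var query = from s in student
join e in extra on s.ID equals e.StudentId into matches // group join keeps unmatched students
from e in matches.DefaultIfEmpty() // e is null when there is no override
select new { s.ID, s.Name, Dept = e != null ? e.Dept : s.Dept };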
LINQ is using iteration internally, too, so there is no performance benefit in using LINQ. Because LINQ has very general implementations, it might even be slower, because it can't make assumptions about your data, which you can make in your custom loop.
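If the real concern is the nested search (scanning the Student list once per Extra item), the usual remedy is a lookup table: build a dictionary keyed by student id once, then make a single pass over the students. The sketch below shows the idea in Python rather than C#, using the data from the question; the function name apply_overrides and the dictionary-based records are assumptions for illustration. In C# the same shape would use a Dictionary or ToDictionary.

```python
def apply_overrides(students, extra):
    """Return students with Dept replaced wherever extra has an entry for that ID."""
    # One pass over extra to build the lookup ...
    dept_by_id = {e["StudentId"]: e["Dept"] for e in extra}
    # ... and one pass over students, each lookup taking constant time on average.
    return [{**s, "Dept": dept_by_id.get(s["ID"], s["Dept"])} for s in students]


students = [
    {"ID": 101, "Name": "Peter", "Dept": "IT"},
    {"ID": 102, "Name": "John", "Dept": "IT"},
    {"ID": 103, "Name": "Ronald", "Dept": "Mech"},
    {"ID": 104, "Name": "Sam", "Dept": "Comp"},
]
extra = [
    {"StudentId": 101, "Dept": "Civil"},
    {"StudentId": 103, "Dept": "Chemical"},
]
for student in apply_overrides(students, extra):
    print(student)
```

This still iterates, as the answer above points out any approach must, but it avoids searching the whole Student list for every Extra item.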
