I am creating a "temporary" (transition) table : "Abis" which must be the copy (identical structure) of a table "A" with the addition of data and the update of several fields via other tables more recent (B, C, D, E).
I have a primary key based on 2 fields in "A" (A.a and A.b) that is present in "Abis" (Abis.a and Abis.b) as well as in "B" (B.a and B.b).
I made a full join between A and B: A.a = B.a and A.b = B.b.
What mapping I have to put to feed my "Abis" table on Abis.a and Abis.b, recovering all key combinations of A (A.a + A.b) as well as all key combinations of B (B.a + B.b) that aren't present in A.
I tested with
"Case When A.a Not In B.a Than A.a Else B.a End"
But the query turns indefinitely.
To sum up:
Target Datastore: Abis
Diagram: A, B, C, D, E
Join: A.a = B.a and A.b = B.b (Full join)
Number of row: Table A ~ 6000, Table B ~ 40000
Software: ODI 10.1.3.5 (Oracle Data Integrator)
Thanks :)
Ok I almost solved my problem with the DECODE function
I tried the NVL function but it did not give exactly what I wanted.
The similar function to NVL is DECODE.
https://www.techonthenet.com/oracle/functions/decode.php
Mapping on Abis.a > Decode (A.a, 0, B.a, A.a)
Mapping on Abis.b > Decode (A.b, 0, B.b, A.b)
I want to Union/Merge two files using pig. But, this is a different union than a usual union. Following are my files (h* are header of files) :
F1 :
h1,h2,h3,h4
a01,a02,a03,a04
a11,a12,a13,a14
F2 :
h3,h4,h5,h6
a23,a24,b01,b02
a33,a34,b11,b12
The resulting output must be a Union of these files like this :
FR :
h1,h2,h3,h4,h5,h6
a01,a02,a03,a04,,
a11,a12,a13,a14,,
,,a23,a24,b01,b02
,,a33,a34,b11,b12
One more difficulty is I want to make it generic so that it works for dynamic number of common columns. Currently there are two common columns, it could have 3 or 1 common column or even no common column at all. For example :
F1 :
h1,h2,h3,h4
a1,a2,a3,a4
F2
h5,h6,h7,h8
b1,b2,b3,b4
FR
a1,a2,a3,a4
,,,,b1,b2,b3,b4
Any hint/help is appreciable.
Here is how you can do it statically:
F1full = FOREACH F1 GENERATE h1,h2,h3,h4, NULL as h5, NULL as h6;
F2full = FOREACH F2 GENERATE NULL as h1,NULL as h2,h3,h4, h5, h6;
FR = F1full UNION F2full;
Pig is not very flexible, so I don't think it is possible to generate this dynamically/for the generic case.
If you would want a solution for the generic case, you could use a language like python to build the required command based on metadata of stored tables/files.
I tried to solve the problem using following approach :
1) Load both of the files.
2) Add counter to generate a unique field (ID).
3) Start the counter for file B where counter for A ended.
4) Cogroup both files with common columns, including counteer.
5) Take all group columns in a different schema.
6) Generate uncommon columns from both files, along with the counter.
7) First join uncommon columns from file A with group columns on counter.
8) Join the result of step 7 with uncommon columns from file B on counter.
Following is the pig script to do the same. As this script is generic, I have mentioned what all parameters will be required before running the script.
-- Parameters required : $file1_path, $file2_path, $file1_schema, $file2_schema, $COUNT_A (number of rows in file A), $CMN_COLUMN_A (common columns in A), $CMN_COLUMN_B, $UNCMN_COLUMN_A(Unique columns in file A), $UNCMN_COLUMN_B.
A = LOAD '$file1_path' USING org.apache.pig.piggybank.storage.CSVExcelStorage('~', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as ($file1_schema);
B = LOAD '$file2_path' USING org.apache.pig.piggybank.storage.CSVExcelStorage('~', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as ($file2_schema);
RANK_A = RANK A;
RANK_B = RANK B;
COUNT_RANK_B = FOREACH RANK_B GENERATE ($0+(long)'$COUNT_A') as rank_B, $1 ..;
COGRP_RANK_AB = COGROUP RANK_A BY($CMN_COLUMN_A), COUNT_RANK_B BY ($CMN_COLUMN_B);
CMN_COGRP_RANK_AB = FOREACH COGRP_RANK_AB GENERATE FLATTEN(group) AS ($CMN_COLUMN_A);
UNCMN_RB = FOREACH COUNT_RANK_B GENERATE $UNCMN_COLUMN_B;
JOIN_CMN_UNCMN_A = JOIN CMN_COGRP_RANK_AB BY(rank_A) LEFT OUTER, UNCMN_RA by rank_A;
JOIN_CMN_UNCMN_B = JOIN JOIN_CMN_UNCMN_A BY(CMN_COGRP_RANK_AB::rank_A) LEFT OUTER, UNCMN_RB by rank_B;
STORE FINAL_DATA INTO '$store_path' USING org.apache.pig.piggybank.storage.CSVExcelStorage('~', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
I have a table which contain sample CDR data in that column A and column B having calling person and called person mobile number
I need to find whose having maximum number of calls made(column A)
and also need to find to which number(column B) called most
the table structure is like below
calling called
889578226 77382596
889582256 77382596
889582256 7736368296
7785978214 782987522
in the above table 889578226 have most number of outgoing calls and 77382596 is most called number in such a way need to get the output
in hive i run like below
SELECT calling_a,called_b, COUNT(called_b) FROM cdr_data GROUP BY calling_a,called_b;
what might be the equalent code for the above query in pig?
Anas, Could you please let me know this is what you are expecting or something different?
input.txt
a,100
a,101
a,101
a,101
a,103
b,200
b,201
b,201
c,300
c,300
c,301
d,400
PigScript:
A = LOAD 'input.txt' USINg PigStorage(',') AS (name:chararray,phone:long);
B = GROUP A BY (name,phone);
C = FOREACH B GENERATE FLATTEN(group),COUNT(A) AS cnt;
D = GROUP C BY $0;
E = FOREACH D {
SortedList = ORDER C BY cnt DESC;
top = LIMIT SortedList 1;
GENERATE FLATTEN(top);
}
DUMP E;
Output:
(a,101,3)
(b,201,2)
(c,300,2)
(d,400,1)
Scenario:
I have database table that stores the hierarchy of another table's many-to-many relationship. An item can have multiple children and can also have more than one parent.
Items
------
ItemID (key)
Hierarchy
---------
MemberID (key)
ParentItemID (fk)
ChildItemID (fk)
Sample hierarchy:
Level1 Level2 Level3
X A A1
A2
B B1
X1
Y C
I would like to group all of the child nodes by each parent node in the hierarchy.
Parent Child
X A1
A2
B1
X1
A A1
A2
B B1
X1
Y C
Notice how there are no leaf nodes in the Parent column, and how the Child column only contains leaf nodes.
Ideally, I would like the results to be in the form of IEnumerable<IGrouping<Item, Item>> where the key is a Parent and the group items are all Children.
Ideally, I would like a solution that the entity provider can translate in to T-SQL, but if that is not possible then I need to keep round trips to a minimum.
I intend to Sum values that exist in another table joined on the leaf nodes.
Since you are always going to be returning ALL of the items in the table, why not just make a recursive method that gets all children for a parent and then use that on the in-memory Items:
partial class Items
{
public IEnumerable<Item> GetAllChildren()
{
//recursively or otherwise get all the children (using the Hierarchy navigation property?)
}
}
then:
var items =
from item in Items.ToList()
group new
{
item.itemID,
item.GetAllChildren()
} by item.itemID;
Sorry for any syntax errors...
Well, if the hierarchy is strictly 2 levels you can always union them and let LINQ sort out the SQL (it ends up being a single trip though it needs to be seen how fast it will run on your volume of data):
var hlist = from h in Hierarchies
select new {h.Parent, h.Child};
var slist = from h in Hierarchies
join h2 in hlist on h.Parent equals h2.Child
select new {h2.Parent, h.Child};
hlist = hlist.Union(slist);
This gives you an flat IEnumerable<{Item, Item}> list so if you want to group them you just follow on:
var glist = from pc in hlist.AsEnumerable()
group pc.Child by pc.Parent into g
select new { Parent = g.Key, Children = g };
I used AsEnumerable() here as we reached the capability of LINQ SQL provider with attempting to group a Union. If you try it against IQueryable it will run a basic Union for eligable parents then do a round-trip for every parent (which is what you want to avoid). Whether or not its ok for you to use regular LINQ for the grouping is up to you, same volume of data would have to come through the pipe either way.
EDIT: Alternatively you could build a view linking parent to all its children and use that view as a basis for tying Items. In theory this should allow you/L2S to group over it with a single trip.
I don't see a lot of examples on how to persist with linq/flinq- I may ultimately write a proc to dowhat I need it to, however the 1->* relationship between tableA and tableC makes that tricky. Can you persist with flinq? Is there a example published somewhere I could follow? Below is what I have tried (or rather the most logical variant of what I have tried).
Thank you in advance.
TableA (1) -> (1) TableB
TableA (1) -> (*) TableC
// add the report
let b = TableB()
b.Name <- getName()
// add the authors
let authorSet = Data.Linq.EntitySet<TableC>()
getAuthorIds document.Authors |> Seq.iter
(fun id ->
let c = TableC()
c.Id <- id
authorSet.Add c)
// add the tagged report w/ associated reoprt
let a = TableA()
a.field1 <- "Something"
a.tableB = b
a.TableC <- authorSet
let docSet = Data.Linq.EntitySet<TableA>()
docSet.Add doc
db.TableA.InsertAllOnSubmit([doc])
let cf = db.ChangeConflicts
let cm = db.GetChangeSet
I am not sure what exactly you are trying to achieve and what fails, but at least one piece that is missing from your code is this call:
db.SubmitChanges()
In general, types TableA, TableB etc represent one row in the database, and type Table<TableA>, Table<TableB> etc represent the whole database. So if you want to add a row to a database TableA, you do something along the lines of
let rowA = TableA(Name = "Foo", Salary = 42)
db.TableA.InsertOnSubmit(rowA)
db.SubmitChanges()
You can of course have more changes in object model before you do SubmitChanges (i.e. you can and must add all relevant foreign-keyed entities with 1:1 relationship).
There is not much F#-specific here - MSDN is a great source for information on this, see e.g. http://msdn.microsoft.com/en-us/library/bb386941.aspx.