How to use the bincond operator in a group function in Pig

I need to group the data below on fname and lname.
(fname,lname,id)
abc,xyz,I
abc,xyz,N
ppp,xxx,I
ppp,XXX,I
The id field only ever holds two values, N or I. If a (fname, lname) combination has both N and I, the result should use N as the id; otherwise it should keep the id value given in the group.
I am expecting the results below:
abc,xyz,N
ppp,xxx,I
I have tried the code below and it works fine:
in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);
grp = group in by (fname,lname);
z = foreach grp generate FLATTEN(group) AS (fname,lname),(COUNT(in.id) >1 ? ('N') :BagToTuple(in.id))as id;
However, I now need to check the values of the id field instead of the count:
z = foreach grp generate FLATTEN(group) AS (fname,lname),((in.id == 'N' or in.id == 'I') ? ('N') :BagToTuple(in.id))as id;
However, it gives the error below:
(Name: Equal Type: null Uid: null)incompatible types in Equal Operator left hand side:bag :tuple(id:chararray) right hand side:chararray
I also get this error:
Two inputs of BinCond must have compatible schemas. left hand side: #31:tuple(#32:chararray) right hand side: org.apache.pig.builtin.bagtotuple_3#35:tuple(id#36:int)
Please guide

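Are you loading a field that contains characters (N, I) into an int column? The BinCond error reports id as int, so declare the id column as chararray in the load statement. Note also that in.id is a bag, so it cannot be compared with a chararray directly, and Pig spells the boolean operator AND rather than &&. One way around both is to filter for 'N' rows inside a nested foreach:
in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);
grp = group in by (fname,lname);
z = foreach grp {
    n = filter in by id == 'N';
    generate FLATTEN(group) AS (fname,lname), (IsEmpty(n) ? 'I' : 'N') as id;
};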
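
To make the expected behaviour concrete, the rule from the question (use N if a name has any N row, otherwise I) can be sketched outside Pig. This is only an illustration in Python, not Pig code; the function name `resolve_ids` is hypothetical, and the sample rows assume the last row's lname is lowercase xxx, as the expected output suggests.

```python
from collections import defaultdict

# Sample rows (fname, lname, id); the lowercase 'xxx' in the last row is an assumption.
rows = [
    ("abc", "xyz", "I"),
    ("abc", "xyz", "N"),
    ("ppp", "xxx", "I"),
    ("ppp", "xxx", "I"),
]

def resolve_ids(rows):
    """Return one id per (fname, lname): 'N' if any row has 'N', else 'I'."""
    ids_by_name = defaultdict(set)
    for fname, lname, id_ in rows:
        ids_by_name[(fname, lname)].add(id_)
    return {name: ("N" if "N" in ids else "I") for name, ids in ids_by_name.items()}

result = resolve_ids(rows)
for (fname, lname), id_ in sorted(result.items()):
    print(f"{fname},{lname},{id_}")
```

Collecting the ids per group into a set mirrors what the grouped bag holds in Pig; the decision is then a membership test rather than a comparison of the whole bag against a single value.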

Related

Why isn't it using the index?

Hello kind people of the internet.
I am wrecking my head trying to figure out why the optimiser isn't using my index for my query on Amazon Aurora. The query is created dynamically from a report that users build through an application's UI, so I can't change the query itself.
The query uses these qualifiers
WHERE
table_in_question.deleted = 0
ORDER BY
table_in_question.date_modified DESC,
table_in_question.id DESC
I have an index, "my_index", which indexes these specific fields (table_in_question fields deleted, date_modified, ID) but MySQL doesn't use it.
The query takes approx 1200 ms to run. If I add FORCE INDEX (my_index) it takes about 120ms. Arguably about 10x faster - but unless I use force index, it doesn't use it.
Around 1 million rows are returned according to EXPLAIN, so I don't think the index is being skipped because too few rows are returned.
The full query is
SELECT
case when some_table.id IS NOT NULL then some_table.id else "" end my_favorite,
table_in_question.date_entered,
table_in_question.name,
table_in_question.description,
table_in_question.pr_is_read,
table_in_question.pr_is_approved,
table_in_question.parent_type,
table_in_question.parent_id,
table_in_question.id,
table_in_question.date_modified,
table_in_question.assigned_user_id,
table_in_question.created_by
FROM
table_in_question
INNER JOIN (
SELECT
tst.team_user_is_member_of
FROM
team_sets_teams tst
INNER JOIN team_memberships team_membershipstable_in_question ON (
team_membershipstable_in_question.team_id = tst.team_id
)
AND (team_membershipstable_in_question.user_id = 'UUID')
AND (team_membershipstable_in_question.deleted = 0)
GROUP BY
tst.team_user_is_member_of
) table_in_question_tf ON table_in_question_tf.team_user_is_member_of = table_in_question.team_user_is_member_of
LEFT JOIN systemfavourites sf_table_in_question ON (sf_table_in_question.module = 'table_in_question')
AND (sf_table_in_question.record_id = table_in_question.id)
AND (sf_table_in_question.assigned_user_id = 'UUID')
AND (sf_table_in_question.deleted = '0')
INNER JOIN opportunities jt1_table_in_question ON (table_in_question.opportunity_id = jt1_table_in_question.id)
AND (jt1_table_in_question.deleted = 0)
LEFT JOIN another_table jt1_table_in_question_cstm ON jt1_table_in_question_cstm.id_c = jt1_table_in_question.id
LEFT JOIN systemfavourites table_in_question_favorite ON (table_in_question.id = table_in_question_favorite.record_id)
AND (table_in_question_favorite.deleted = '0')
AND (table_in_question_favorite.module = 'table_in_question')
AND (table_in_question_favorite.created_by = 'UUID')
LEFT JOIN users some_table ON (
some_table.id = table_in_question_favorite.modified_user_id
)
AND (some_table.deleted = 0)
WHERE
table_in_question.deleted = 0
ORDER BY
table_in_question.date_modified DESC,
table_in_question.id DESC
;
EXPLAIN shows this:
id: 1
select_type: PRIMARY
table: table_in_question
partitions:
type: ALL
possible_keys: idx_table_in_question_tmst_id
key:
key_len:
ref:
rows: 968234
filtered: 10.0
Extra: Using where; Using temporary; Using filesort
Can anyone help explain how I can create an index that the optimiser will actually use by default?
Thanks.

Filter records in Pig

Below is the data
col1,col2,col3,col4,col5
------------------------
10,20,30,40,dollar
20,30,40,50,dollar
20,30,10,50,dollar
61,62,63,64,dollar
61,62,63,64,pound
col1, col2 and col3 together form the unique key. The use case is to filter the data based on col5.
For each key combination, we need to filter out the record whose col5 value is "dollar", but only if the same combination also has a record with the value "pound".
The expected output is
col1,col2,col3,col4,col5
------------------------
10,20,30,40,dollar
20,30,40,50,dollar
20,30,10,50,dollar
61,62,63,64,pound
How should I proceed, given that Pig has no special operators for this like Hive does?
A = load 'test1.csv' using PigStorage(',') as (col1:int,col2:int,col3:int,col4:int,col5:chararray);
B = FILTER A BY col5 == 'pound';

Get all the records with 'pound', then get all the records whose key combination does not match any 'pound' record. Finally, combine the two sets with UNION.
B = FILTER A BY col5 == 'pound';
C = JOIN A BY (col1,col2,col3) LEFT OUTER,B BY (col1,col2,col3);
D = FILTER C BY (B::col1 is null);
E = FOREACH D GENERATE A::col1,A::col2,A::col3,A::col4,A::col5;
F = UNION B,E;
DUMP F;
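
The same steps (filter, left outer join with a null check, union) can be sketched in Python to check the logic against the sample data. This is an illustrative translation rather than Pig code, and the function name `prefer_pound` is hypothetical.

```python
# Rows from the question: (col1, col2, col3, col4, col5)
rows = [
    (10, 20, 30, 40, "dollar"),
    (20, 30, 40, 50, "dollar"),
    (20, 30, 10, 50, "dollar"),
    (61, 62, 63, 64, "dollar"),
    (61, 62, 63, 64, "pound"),
]

def prefer_pound(rows):
    # B: every 'pound' row
    pound_rows = [r for r in rows if r[4] == "pound"]
    # Keys (col1, col2, col3) that have a 'pound' row
    pound_keys = {r[:3] for r in pound_rows}
    # D/E: rows whose key has no 'pound' row (the left outer join + "is null" filter)
    unmatched = [r for r in rows if r[:3] not in pound_keys]
    # F: union of the two sets
    return pound_rows + unmatched

result = prefer_pound(rows)
for r in sorted(result):
    print(",".join(str(v) for v in r))
```

The set of pound keys plays the role of the join: a row survives the second step only if its key is absent from that set, which is what the `B::col1 is null` filter expresses.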

ORA-00918 returns from stored procedure but it works executing a query in SQL Page

I'm trying to return a list from the database, but I get the error "ORA-00918: column ambiguously defined".
When I execute this query in a new SQL page, it returns the correct list. However, when I put it in a package as a stored procedure, it raises ORA-00918 and the package becomes invalid.
What is the reason for this difference?
select distinct c.customer_no, m.title, c.group_id, g.name, c.pricelist_id, p.name from db.customer c
join db.pricelist p on c.pricelist_id = p.pricelist_id
join db.master m on c.customer_no = m.customer_no
join db.group g on c.group_id = g.id
where (c.customer_no = pn_customer_no or pn_customer_no=-1)
and (c.group_id = pn_group_no or pn_group_no=-1)
and (c.pricelist_id = pn_pricelist_no or pn_pricelist_no=-1)
and (c.kom_type = ps_kom_tip)
order by c.customer_no asc

You are selecting the columns:
select distinct
c.customer_no,
m.title,
c.group_id,
g.name, -- NAME column
c.pricelist_id,
p.name -- NAME column
When you run the query in SQL*Plus or SQL Developer (or another IDE) it will output the columns:
CUSTOMER_NO TITLE GROUP_ID NAME PRICELIST_ID NAME1
and will rename the second NAME column to NAME1.
In the PL/SQL scope, it will not do this and will try to handle the two columns with the names you have given (i.e. the same names), fail and return ORA-00918.
You need to give one (or both) column an alias so they have distinct names.

A new SQL page assigns new temporary names to your duplicate columns.
A stored procedure, however, matches the returned values to their column names.
So when two columns have the same name, it cannot tell which column a name refers to.
As in a bundle (a key/value map), the column name is the key used to look up the value.
You should alias p.name, g.name, or both:
select distinct c.customer_no, m.title, c.group_id, g.name as groupName, c.pricelist_id, p.name as tarifeName from db.customer c
join db.pricelist p on c.pricelist_id = p.pricelist_id
join db.master m on c.customer_no = m.customer_no
join db.group g on c.group_id = g.id
where (c.customer_no = pn_customer_no or pn_customer_no=-1)
and (c.group_id = pn_group_no or pn_group_no=-1)
and (c.pricelist_id = pn_pricelist_no or pn_pricelist_no=-1)
and (c.kom_type = ps_kom_tip)
order by c.customer_no asc
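
The key/value analogy can be made concrete with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in: SQLite does not raise ORA-00918, and the tables `grp` and `pricelist`, their contents, and the helper `fetch_as_dict` are all hypothetical. It only shows why two result columns with the same name cannot both be looked up by name, and how aliases resolve that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER, name TEXT);
    CREATE TABLE pricelist (pricelist_id INTEGER, name TEXT);
    INSERT INTO grp VALUES (1, 'retail');
    INSERT INTO pricelist VALUES (1, 'standard');
""")

def fetch_as_dict(sql):
    """Run a query and map the first row's values by column name."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    row = cur.fetchone()
    return names, dict(zip(names, row))

# Both columns are called "name": the mapping keeps only one of the two values.
dup_names, dup_row = fetch_as_dict(
    "SELECT g.name, p.name FROM grp g JOIN pricelist p ON p.pricelist_id = g.id"
)

# With aliases, each value has its own key.
ok_names, ok_row = fetch_as_dict(
    "SELECT g.name AS group_name, p.name AS pricelist_name "
    "FROM grp g JOIN pricelist p ON p.pricelist_id = g.id"
)

print(dup_names, dup_row)
print(ok_names, ok_row)
```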

Hibernate HQL GroupBy in Oracle

I created this query using HQL with Hibernate and Oracle
select c from Cat c
left join c.kittens k
where (c.location= 1 OR c.location = 2)
and (i.activo = 1)
group
by c.id,
c.name,
c.fulldescription,
c.kittens
order by count(e) desc
The problem is that in HQL you need to list every field of Cat in order to perform a GROUP BY, but fulldescription is a CLOB, and you cannot group by a CLOB (I get a "Not a Group By Expression" error). I've seen a few solutions for plain SQL statements but none for HQL.

GROUP BY in HQL has a serious quirk: an entity named in GROUP BY behaves differently from the same entity in the SELECT list. In GROUP BY only its id field is considered, whereas in the SELECT list all of its fields are.
So you can use a subquery with GROUP BY that returns only the ids of your objects and feed that result into the main query, as in the query I wrote for you below.
Note that some aliases (i and e) are not defined, so this query will not work as written, but you will know how to fix them.
Try this:
select c2 from Cat c2
where c2.id in (
select c.id from Cat c
left join c.kittens k
where (c.location= 1 OR c.location = 2)
and (i.activo = 1) <-- who is i alias??
group by c.id)
order by count(e) desc <-- who is e alias???

How do I write a LINQ query to combine multiple rows into one row?

I have one table, 'a', with id and timestamp. Another table, 'b', has N rows referring to that id, and each row has a 'type' and "some other data".
I want a LINQ query to produce a single row with id, timestamp, and "some other data" x N. Like this:
1 | 4671 | 46.5 | 56.5
where 46.5 is from one row of 'b', and 56.5 is from another row; both with the same id.
I have a working query in SQLite, but I am new to LINQ. I don't know where to start, and I don't think this is a JOIN at all.
SELECT
a.id as id,
a.seconds,
COALESCE(
(SELECT b.some_data FROM
b WHERE
b.id=a.id AND b.type=1), '') AS 'data_one',
COALESCE(
(SELECT b.some_data FROM
b WHERE
b.id=a.id AND b.type=2), '') AS 'data_two'
FROM a first
WHERE first.id=1
GROUP BY first.ID

You didn't mention whether you are using LINQ to SQL or LINQ to Entities. However, the following query should get you there:
(from x in a
join y in b on x.id equals y.id
select new{x.id, x.seconds, y.some_data, y.type}).GroupBy(x=>new{x.id,x.seconds}).
Select(x=>new{
id = x.Key.id,
seconds = x.Key.seconds,
data_one = x.Where(z=>z.type == 1).Select(g=>g.some_data).FirstOrDefault(),
data_two = x.Where(z=>z.type == 2).Select(g=>g.some_data).FirstOrDefault()
});
Obviously, you have to prefix your table names with the DataContext or ObjectContext, depending on the underlying provider.

What you want to do is similar to pivoting; see "Is it possible to Pivot data using LINQ?". The difference here is that you don't really need to aggregate (as in a standard pivot), so you'll need to use Max or a similar method that can stand in for selecting a single varchar field.
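
The pivot idea can be sketched independently of LINQ. The Python below is only an illustration with hypothetical sample data taken from the question's example row; the function name `pivot` is an assumption. It follows the SQLite query in keeping rows of 'a' that have no matching 'b' row, which an inner join would drop.

```python
# Hypothetical sample data matching the example row "1 | 4671 | 46.5 | 56.5".
a = [{"id": 1, "seconds": 4671}]
b = [
    {"id": 1, "type": 1, "some_data": 46.5},
    {"id": 1, "type": 2, "some_data": 56.5},
]

def pivot(a_rows, b_rows):
    """Return one row per 'a' row, with the type 1 and type 2 values as columns."""
    by_key = {}
    for r in b_rows:
        # Keep the first value seen per (id, type), similar to FirstOrDefault.
        by_key.setdefault((r["id"], r["type"]), r["some_data"])
    return [
        {
            "id": x["id"],
            "seconds": x["seconds"],
            "data_one": by_key.get((x["id"], 1)),  # None when no type 1 row exists
            "data_two": by_key.get((x["id"], 2)),  # None when no type 2 row exists
        }
        for x in a_rows
    ]

result = pivot(a, b)
print(result)
```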
