PIG: How to remove '::' in the column name

I have a pig relation like below:
FINAL = {input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long, input_md5::age: chararray, test_1::type: chararray, test_2::name: chararray}
I am trying to store all the columns of the input_md5 relation into a Hive table,
i.e. all of input_md5::type, input_md5::name, input_md5::id, and input_md5::age, but not test_1::type or test_2::name.
Is there any command in Pig that selects only the columns of input_md5? Something like below:
STORE= FOREACH FINAL GENERATE all input_md5::type .
I know that Pig has the FOREACH FINAL GENERATE input_md5::type AS type syntax, but I have many columns, so I cannot write AS for every column in my code.
Because when I try:
STORE= FOREACH FINAL GENERATE input_md5::type .. bus_input_md5::name;
Pig throws an error:
org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <type> at position 1, found column <input_md5::type>
Thanks in advance,

Resolved this issue; below is the fix.
Create a relation with some filter condition, as below:
DUMMY_RELATION = FILTER SOURCE_TABLE BY type == ''; (I filtered on a column named type; any column in the table will do, since all that matters is getting its schema)
FINAL_DATASET = UNION DUMMY_RELATION, SCHEMA_1, SCHEMA_2;
(this new DUMMY_RELATION should be placed first in the UNION)
Now you no longer have the :: operator, and your column names will match the Hive table's column names, provided your source table (behind DUMMY_RELATION) and the target table have the same column order.
Thanks to myself :)

I implemented Neethu's example this way. It may have typos, but it shows how to implement the idea.
tableA = LOAD 'default.tableA' USING org.apache.hive.hcatalog.pig.HCatLoader();
tableB = LOAD 'default.tableB' USING org.apache.hive.hcatalog.pig.HCatLoader();
--load empty table
finalTable = LOAD 'default.finalTable' USING org.apache.hive.hcatalog.pig.HCatLoader();
--example operations that end up with '::' in column names
g = GROUP tableB BY (id);
j = JOIN tableA BY id LEFT, g BY group;
result = FOREACH j GENERATE tableA::id, tableA::col2, g::tableB;
--union empty finalTable and result
result2 = union finalTable, result;
--bob's your uncle
STORE result2 INTO 'finalTable' USING org.apache.hive.hcatalog.pig.HCatStorer();
Thanks to Neethu!

Related

Updating rows with values from another table in ClickHouse

I have two tables, one with data about counties and another with data about states. Different states can sometimes have counties with the same exact name, so I am trying to populate a unique_name column in my counties table that is the concatenation of a county name and the abbreviation of the state where that county is located (e.g.: Honolulu County, HI).
I have come up with the following query:
ALTER TABLE counties
UPDATE unique_name =
(
SELECT concat(counties.name, ', ', states.name_abbr)
FROM counties
INNER JOIN states
ON counties.statefp = states.statefp
) WHERE unique_name = ''
However, I keep getting the following error:
DB::Exception: Unknown identifier: states.statefp, context: required_names: 'states.statefp' source_tables: table_aliases: private_aliases: column_aliases: public_columns: masked_columns: array_join_columns: source_columns: .
The inner query works perfectly fine on its own, but I don't know why this error comes up when I try to do the update. Any ideas?
ClickHouse does not support dependent joins for ALTER TABLE UPDATE. Fortunately, there is a workaround. You have to create a special Join engine table for the update. Something like this:
CREATE TABLE states_join as states Engine = Join(ANY, LEFT, statefp);
INSERT INTO states_join SELECT * from states;
ALTER TABLE counties
UPDATE unique_name = concat(name, ', ', joinGet('states_join', 'name_abbr', statefp))
WHERE unique_name = '';
DROP TABLE states_join;
Note that it only works in 19.x versions.

How to compare the column names in one table to the values in another in Impala

The first is the main table and the second is the lookup table.
I need to compare the column names of the first table to the values in the second table, and if a certain column name is found in any row of the second table, then fetch some fields out of the second table.
Is it possible to do this in Impala?
Table 1
source |location |origin
----------+----------+-------
s1 |india |xxx
Table 2
extractedfrom|lct |lkp_value|map_value
-------------+----------+---------+---------
s1 |location |india |india_x
s1 |origin |xxx |yyyyyy
I need to have something like the final view below:
source    |location  |origin    |location_lkp|origin_lkp
----------+----------+----------+------------+----------
s1        |india     |xxx       |india_x     |yyyyyy
You should edit your post to be more specific about what you are trying to do and how you wish to join the tables.
The following query should work for you given the example you provided.
SELECT t1.source,
       t1.location,
       t1.origin,
       t2_loc.map_value AS location_lkp,
       t2_ori.map_value AS origin_lkp
FROM Table1 t1
JOIN Table2 t2_loc ON t1.source = t2_loc.extractedfrom
                  AND t1.location = t2_loc.lkp_value
JOIN Table2 t2_ori ON t1.source = t2_ori.extractedfrom
                  AND t1.origin = t2_ori.lkp_value
WHERE t2_loc.lct = 'location'
  AND t2_ori.lct = 'origin'
The trick is that you join to Table2 multiple times, once for each column you wish to match on.

How to join three tables in Oracle SQL Developer

Table 1: TOY_STORE
Column names: Toy_store_id, Toy_store_name, city
Table 2: TOY_DTLS
Column names: Toy_Id, Toy_name, toy_price, toy_rating
Table 3: Toy_rel
Column names: toy_id, toy_store_id, qty
How do I display all toy store names and toy names?
You can try something like this:
SELECT store.Toy_Store_name
, toy.Toy_name
, rel.qty
FROM toy_rel rel
JOIN toy_dtls toy ON rel.toy_id = toy.toy_id
JOIN toy_store store ON rel.toy_store_id = store.toy_store_id
;

Assignment in Hive query

I have the query below, in which I need to assign one table's column value to another table's column.
Query:
SELECT A.aval,B.bval,B.bval1 FROM A JOIN B ON (A.aval = B.bval)
How do I assign one table's column value to another table's column in Hive?
I have tried:
SELECT A.aval, B.bval, B.bval1, A.aval = B.bval1 FROM A JOIN B ON (A.aval = B.bval)
In the results, A.aval = B.bval1 returns false, since it is evaluated as a comparison rather than an assignment to A.aval.
I guess you want to write into a table?
You have to create a table (for example C) that contains all the fields you need.
And then you do:
INSERT INTO TABLE C
SELECT A.aval, B.bval, B.bval1, A.aval
FROM A
JOIN B ON (A.aval = B.bval)
The result of the SELECT will be inserted into table C (use INSERT OVERWRITE TABLE C instead of INSERT INTO to replace the existing contents).
INSERT OVERWRITE TABLE c SELECT A.aval, B.bval, B.bval1 FROM A JOIN B ON (A.aval = B.bval)

QueryDSL: How to insert or update?

I'm trying to implement https://stackoverflow.com/a/16392399/14731 for a table called "Modules" using QueryDSL. Here is my query:
String newName = "MyModule";
QModules modules = QModules.modules;
BooleanExpression moduleNotExists = session.subQuery().
        from(modules).where(modules.name.eq(newName)).notExists();
SimpleSubQuery<String> setModuleName = session.subQuery().
        where(moduleNotExists).unique(Expressions.constant(newName));
long moduleId = session.insert(modules).set(modules.name, setModuleName).
        executeWithKey(modules.id);
I am expecting this to translate into:
insert into modules(name)
select 'MyModule'
where not exists
(select 1 from modules where modules.name = 'MyModule')
Instead, I am getting:
NULL not allowed for column "NAME"; SQL statement:
insert into MODULES (NAME)
values ((select ?
from dual
where not exists (select 1
from MODULES MODULES
where MODULES.NAME = ?)))
where ? is equal to MyModule.
Why does QueryDSL insert from dual? I am expecting it to omit the FROM clause altogether.
How do I fix this query?
For the INSERT INTO ... SELECT form, use
columns(...).select(...)
Your error suggests that the INSERT clause is valid, but semantically not what you want: using InsertClause.set(...) you don't get the conditional insertion you are aiming for.
In other words, with
columns(...).select(...)
you map the full result set into an INSERT template, and no rows are inserted for an empty result set, but with
set(...)
you map the query result to a single column of the INSERT template, and a NULL value is used when the result is empty.
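As a rough, untested sketch, the original query could be rewritten with columns(...).select(...) like this; it assumes session.insert(...) returns QueryDSL's SQLInsertClause (which exposes columns(...) and select(...)) and reuses the session, QModules and Expressions names from the question:
// Sketch only: the subquery is mapped into the INSERT as a whole,
// so an empty sub-select inserts no rows instead of a NULL name.
String newName = "MyModule";
QModules modules = QModules.modules;
BooleanExpression moduleNotExists = session.subQuery().
        from(modules).where(modules.name.eq(newName)).notExists();
long rowsInserted = session.insert(modules).
        columns(modules.name).
        select(session.subQuery().
                where(moduleNotExists).
                unique(Expressions.constant(newName))).
        execute();
This should produce an INSERT ... SELECT of the desired form rather than INSERT ... VALUES; execute() returns the number of inserted rows (0 or 1), which also tells you whether the NOT EXISTS guard suppressed the insert.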
