Hbase put command not working for column names - hadoop

I need to run the two puts below against my HBase table:
put 'TABLE', 'ABC::ABC::NLOC','data:document','myvalue'
put 'TABLE', 'ABC::ABC::NLOC','data:meta:test','values'
But after executing these commands, I cannot see the second command creating a column data:meta:test.
hbase(main):003:0> get 'TABLE', 'ABC::ABC::NLOC'
COLUMN CELL
data:document timestamp=1528398479692, value=profile data - POST!
data:meta timestamp=1528398532570, value=values
2 row(s) in 0.0220 seconds
How can I get the column to appear as data:meta:test? Should I use the HBase put in a different way? Any help, please.

Related

How to import data from a hbase table to hive table?

I've created a Hbase table like this,
create 'student','personal'
and I've put some data into it like this.
ROW COLUMN+CELL
1 column=personal:age, timestamp=1456224023454, value=20
1 column=personal:name, timestamp=1456224008188, value=pesronA
2 column=personal:age, timestamp=1456224891317, value=13
2 column=personal:name, timestamp=1456224868967, value=pesronB
3 column=personal:age, timestamp=1456224935178, value=21
3 column=personal:name, timestamp=1456224921246, value=personC
4 column=personal:age, timestamp=1456224951789, value=20
4 column=personal:name, timestamp=1456224961845, value=personD
5 column=personal:age, timestamp=1456224983240, value=20
5 column=personal:name, timestamp=1456224972816, value=personE
I want to import this data into a Hive table. I wrote the following Hive query for that:
CREATE TABLE hbaseStudent(key INT,name STRING,age INT) STORED BY'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal:age,personal:name") TBLPROPERTIES("hbase.table.name" = "student")
But when I execute the query, I get this error:
Driver returned: 1. Errors: OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hbase/HBaseConfiguration
What should I do?
I tried this and it worked: replace all the double quotes (") with single quotes ('). Also add the terminating ; at the end of the last line.

number of rows affected in HiveQL

Is there a way to get the affected row count after running a CTAS in Hive?
I am running a statement like
create table t1 as select * from t2 where ... ;
Basically, I would like to print the number of rows in the new table for logging purposes.
Thanks!
Hive does report the number of rows as part of a CTAS; see num_rows in this example:
Table default.errors2 stats: [num_partitions: 0, num_files: 1, num_rows: 860, total_size: 17752, raw_data_size: 16892]
More details of the output:
hive> create table errors2 as select * from errors;
..
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/tmp/hive-steve/hive_2014-12-13_06-00-40_553_7396982929134959624/-ext-10001
Moving data to: hdfs://localhost:9000/user/hive/warehouse/errors2
Table default.errors2 stats: [num_partitions: 0, num_files: 1, num_rows: 860, total_size: 17752, raw_data_size: 16892]
OK
dayandhour dowandhour cnt
Time taken: 7.348 seconds
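UPDATE: the OP asked about saving the row count in a variable. There is no built-in Hive command for that, AFAIK. You could, however, run the statement from the command line and filter its output. Note that a grep pattern of "[num_partitions]" would be read as a character class and match almost every line, so match on plain text instead (Hive typically writes these status messages to stderr, hence the 2>&1):
hive -e "<hivesql>" 2>&1 | grep "num_rows" | <regex command to isolate the num_rows>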
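The last stage of that pipeline, isolating num_rows, could look roughly like the following. This is only a sketch: the function name extract_num_rows is made up for illustration, and it assumes the stats line has exactly the format shown in the sample output above, which may differ between Hive versions.

```python
import re


def extract_num_rows(hive_output):
    """Return num_rows from the first 'stats: [...]' line, or None if absent.

    Assumes the line format shown in the sample output above.
    """
    match = re.search(r"stats: \[.*?\bnum_rows: (\d+)", hive_output)
    return int(match.group(1)) if match else None


# Sample text copied from the CTAS output quoted in the answer.
sample = (
    "Moving data to: hdfs://localhost:9000/user/hive/warehouse/errors2\n"
    "Table default.errors2 stats: [num_partitions: 0, num_files: 1, "
    "num_rows: 860, total_size: 17752, raw_data_size: 16892]\n"
    "OK\n"
)

print(extract_num_rows(sample))  # 860
```

In practice the text would come from the captured output of the hive command rather than from a hard-coded string.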

Complex Data Type issue in Hive

I am trying to create a table in Hive using complex data types.
One of my columns is an array of strings and the other is an array of maps.
After I have loaded the data into the table, when I try to query the data, I don't get the desired result in the third column which is an array of maps.
The following is my Hive query:
Step 1:
create table transactiondb2(order_id int,billtype array<string>,paymenttype array<map<string,int>>)ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '#';
Step 2:
load data local inpath '/home/xyz/data.txt' overwrite into table transactiondb2;
Step 3:
select * from transactiondb2;
And my output is as follows:
OK
1 ["A","B"] [{"credit":null,"10":null},{"cash":null,"25":null},{"emi":null,"30":null}]
2 ["C","D"] [{"credit":null,"157":null},{"cash":null,"45":null},{"emi":null,"35":null}]
3 ["X","Y"] [{"credit":null,"25":null},{"cash":null,"38":null},{"emi":null,"50":null}]
4 ["E","F"] [{"credit":null,"89":null},{"cash":null,"105":null},{"emi":null,"85":null}]
5 ["Z","A"] [{"credit":null,"7":null},{"cash":null,"79":null},{"emi":null,"105":null}]
6 ["D","Y"] [{"credit":null,"30":null},{"cash":null,"100":null},{"emi":null,"101":null}]
7 ["A","Z"] [{"credit":null,"50":null},{"cash":null,"9":null},{"emi":null,"85":null}]
8 ["B","Z"] [{"credit":null,"70":null},{"cash":null,"38":null},{"emi":null,"90":null}]
And my input file data is as follows:
1 A|B credit#10|cash#25|emi#30
2 C|D credit#157|cash#45|emi#35
3 X|Y credit#25|cash#38|emi#50
4 E|F credit#89|cash#105|emi#85
5 Z|A credit#7|cash#79|emi#105
6 D|Y credit#30|cash#100|emi#101
7 A|Z credit#50|cash#9|emi#85
8 B|Z credit#70|cash#38|emi#90
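I solved it myself.
There is no need to declare an array of maps explicitly. With these delimiters, a single map<string,int> column takes its entries one after the other: '|' separates the entries and '#' separates each key from its value. In array<map<string,int>>, the delimiters apply one nesting level higher: '|' separates the array elements and '#' separates the entries inside each map, which is why credit and 10 both showed up as keys with null values.
Create the table as shown below and load the data, and you will get the desired output.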
create table complex(id int,bill array<string>,paytype map<string,int>)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '#';
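To make the delimiter roles concrete, here is a small sketch that splits one input line the way the table definition above is meant to read it. It is plain Python, not Hive's actual SerDe, and parse_row is a hypothetical helper; the sample line assumes the fields in data.txt are separated by real tab characters.

```python
def parse_row(line):
    """Split one line into (id, bill, paytype) using the table's delimiters."""
    # FIELDS TERMINATED BY '\t': three top-level columns.
    order_id, bills, payments = line.rstrip("\n").split("\t")

    # COLLECTION ITEMS TERMINATED BY '|': elements of the array<string>.
    bill = bills.split("|")

    # For a map column, '|' separates the entries and
    # MAP KEYS TERMINATED BY '#' separates each key from its value.
    paytype = {}
    for entry in payments.split("|"):
        key, value = entry.split("#", 1)
        paytype[key] = int(value)

    return int(order_id), bill, paytype


print(parse_row("1\tA|B\tcredit#10|cash#25|emi#30"))
# (1, ['A', 'B'], {'credit': 10, 'cash': 25, 'emi': 30})
```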

Update One table Column with Values from Another table Having Similar

Hi guys, I have two tables (MIGADM.CORPMISCELLANEOUSINFO and CRMUSER.PREFERENCES), and each has fields called PREFERENCE_ID and ORGKEY. I want to update PREFERENCE_ID in MIGADM.CORPMISCELLANEOUSINFO with the PREFERENCE_ID from CRMUSER.PREFERENCES for each corresponding ORGKEY, so I wrote this query:
update migadm.CORPMISCELLANEOUSINFO s set s.PREFERENCE_ID = (
select e.PREFERENCE_ID from crmuser.preferences e where s.ORGKEY = e.ORGKEY)
But I get:
ORA-01427: single-row subquery returns more than one row
What should I do?
It means the columns you have selected are not unique enough to identify one row in your source table. Your first step would be to identify those columns.
To see the set of rows that have this problem, run this query.
select e.orgkey,
count(*)
from crmuser.preferences e
group by e.orgkey
having count(*) > 1
E.g., for an ORGKEY of 2, let's say there are two rows in the preferences table.
ORGKEY PREFERENCE_ID
2 202
2 201
Oracle cannot tell which of these should be used to update the PREFERENCE_ID column in CORPMISCELLANEOUSINFO.
Identify the rows for which the subquery returns more than one row (you could use an error-logging clause with REJECT LIMIT, for instance), or add the condition 'rownum = 1' to the subquery, which makes it return a single, arbitrarily chosen row.
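The duplicate check can be tried out on a small in-memory database. The sketch below uses Python's sqlite3 module as a stand-in for Oracle, with made-up sample rows and simplified table names (preferences, corp). SQLite does not raise ORA-01427, so this only illustrates the GROUP BY ... HAVING query, followed by one possible way to force a single value per key. Picking MAX(preference_id) is an arbitrary rule chosen for illustration, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preferences (orgkey INTEGER, preference_id INTEGER)")
conn.execute("CREATE TABLE corp (orgkey INTEGER, preference_id INTEGER)")

# Hypothetical data: orgkey 2 appears twice in preferences.
conn.executemany(
    "INSERT INTO preferences VALUES (?, ?)",
    [(1, 101), (2, 202), (2, 201), (3, 301)],
)
conn.executemany("INSERT INTO corp VALUES (?, NULL)", [(1,), (2,), (3,)])

# Keys that would make the correlated subquery return more than one row.
duplicates = conn.execute(
    "SELECT orgkey, COUNT(*) FROM preferences "
    "GROUP BY orgkey HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [(2, 2)]

# An aggregate guarantees exactly one value per orgkey (assumption: the
# largest preference_id is the one that should win).
conn.execute(
    "UPDATE corp SET preference_id = ("
    "SELECT MAX(e.preference_id) FROM preferences e "
    "WHERE e.orgkey = corp.orgkey)"
)
updated = conn.execute(
    "SELECT orgkey, preference_id FROM corp ORDER BY orgkey"
).fetchall()
print(updated)  # [(1, 101), (2, 202), (3, 301)]
```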

facing problems while updating rows in hbase

I've run the samples SampleUploader, PerformanceEvaluation and rowcount as given in
the Hadoop wiki: http://wiki.apache.org/hadoop/Hbase/MapReduce
The problem I'm facing is this: table1 is my table, with a column family named column.
>create 'table1','column'
>put 'table1','row1','column:address','SanFrancisco'
hbase(main):020:0> scan 'table1'
ROW COLUMN+CELL
row1 column=column:address, timestamp=1276351974560, value=SanFrancisco
>put 'table1','row1','column:name','Hannah'
hbase(main):020:0> scan 'table1'
ROW COLUMN+CELL
row1 column=column:address,timestamp=1276351974560,value=SanFrancisco
row1 column=column:name, timestamp=1276351899573, value=Hannah
I want both columns to appear in the same row, as different versions.
Similarly, if I change the name column to sarah, it shows the updated row, but I want both the old value and the changed value to appear as two different versions so that I can analyze the data.
What mistake am I making?
Thanks a lot,
sammy
To see multiple versions of the same row, you need to specify a VERSIONS option:
get 'my_table', 'my_row_key', {VERSIONS => 4}
When the hbase shell prints out
row1 column=column:address,timestamp=1276351974560,value=SanFrancisco
row1 column=column:name, timestamp=1276351899573, value=Hannah
That's a single row with multiple columns. The text representation just happens to use multiple lines of text, one per column.
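The distinction between rows, columns and versions can be illustrated with a toy model. This is not the HBase client API, just a dictionary-based sketch: ToyTable and its methods are invented for illustration, and the timestamps are made-up small integers. A real table also returns only as many versions as its column family is configured to keep.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: each (row key, column) pair holds a history of values."""

    def __init__(self):
        # (row key, column) -> list of (timestamp, value)
        self.cells = defaultdict(list)

    def put(self, row, column, value, timestamp):
        # A put never overwrites; it adds a new version of the cell.
        self.cells[(row, column)].append((timestamp, value))

    def get(self, row, versions=1):
        # Return up to `versions` newest values for every column of the row.
        result = {}
        for (row_key, column), history in self.cells.items():
            if row_key == row:
                result[column] = sorted(history, reverse=True)[:versions]
        return result


table = ToyTable()
table.put("row1", "column:address", "SanFrancisco", 1)
table.put("row1", "column:name", "Hannah", 2)
table.put("row1", "column:name", "sarah", 3)

# One row, two columns; by default only the newest version of each cell.
print(table.get("row1"))
# Asking for more versions also returns the older name.
print(table.get("row1", versions=4))
```

The first call returns only sarah for column:name, while the second returns both sarah and Hannah, which mirrors what the VERSIONS option does in the shell.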

Resources