HBase getting all timestamped values for a cell - hadoop

I have the following scenario in my HBase instance:
hbase(main):002:0> create 'test', 'cf'
0 row(s) in 1.4690 seconds
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1480 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0070 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0120 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value4'
0 row(s) in 0.0070 seconds
As you can see, the last two inserts use the same column family, the same column and the same row key. If I understand HBase properly, cf:c + row3 identifies a cell that holds all timestamped versions of the inserted values.
But a simple scan returns only the most recent value:
hbase(main):010:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1317945279379, value=value1
row2 column=cf:b, timestamp=1317945285731, value=value2
row3 column=cf:c, timestamp=1317945301466, value=value4
3 row(s) in 0.0250 seconds
How do I get all timestamped values for a cell, or how do I perform a time-range-based query?

In order to see versions of a column you need to give the version count.
scan 'test', {VERSIONS => 3}
This returns up to 3 versions of each column, if that many are available. You can use it with get as well:
get 'test', 'row3', {COLUMN => 'cf:c', VERSIONS => 3}
To get the value at a specific time, you can use TIMESTAMP as well:
get 'test', 'row3', {COLUMN => 'cf:c', TIMESTAMP => 1317945301466}
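If you need the values between two timestamps, use TIMERANGE, e.g. scan 'test', {TIMERANGE => [ts1, ts2], VERSIONS => 3}; a TimestampsFilter instead matches an explicit list of timestamps.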
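To make the semantics of VERSIONS, TIMESTAMP and a time range concrete, here is a toy in-memory model of a single cell in Python. It is only a sketch of the behaviour described above, not the HBase client API: the class name VersionedCell and its methods are made up for illustration, the timestamp of value3 is assumed because the question's output does not show it, and the model ignores the column family's own maximum-versions setting.

```python
class VersionedCell:
    """Toy model of one HBase cell: a set of (timestamp, value) versions."""

    def __init__(self):
        self._versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        # A put with an already-used timestamp replaces that version.
        self._versions[timestamp] = value

    def get(self, versions=1, timestamp=None, timerange=None):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        items = sorted(self._versions.items(), reverse=True)
        if timestamp is not None:
            items = [(ts, v) for ts, v in items if ts == timestamp]
        if timerange is not None:
            start, end = timerange  # start inclusive, end exclusive
            items = [(ts, v) for ts, v in items if start <= ts < end]
        return items[:versions]


cell = VersionedCell()
cell.put("value3", 1317945290000)  # assumed timestamp, not shown in the question
cell.put("value4", 1317945301466)

latest = cell.get()                               # like a plain scan/get
all_versions = cell.get(versions=3)               # like VERSIONS => 3
exact = cell.get(timestamp=1317945301466)         # like TIMESTAMP => ...
in_range = cell.get(versions=3, timerange=(1317945280000, 1317945300000))
print(latest, all_versions, exact, in_range)
```

With only two versions stored, asking for three returns both of them, newest first, which is why the plain scan in the question shows only value4.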

To change the number of versions allowed in a column family use the following command:
alter 'test', NAME=>'cf', VERSIONS=>2
then add another entry:
put 'test', 'row1', 'cf:a2', 'value1e'
then see the different versions:
get 'test', 'row1', {COLUMN => 'cf:a2', VERSIONS => 2}
would return something like:
COLUMN CELL
cf:a2 timestamp=1457947804214, value=value1e
cf:a2 timestamp=1457947217039, value=value1d
2 row(s) in 0.0090 seconds
Here is a link with more details:
https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/

The row key 'row3' of cf:c for value4 should be unique; otherwise the existing value gets overwritten:
hbase(main):052:0> scan 'mytable' , {COLUMN => 'cf1:1', VERSION => 3}
ROW COLUMN+CELL
1234 column=cf1:1, timestamp=1405796300388, value=hello
1 row(s) in 0.0160 seconds
hbase(main):053:0> put 'mytable', 1234, 'cf1:1', 'wow!'
0 row(s) in 0.1020 seconds
The value 'hello' in column 1 of cf1 is overwritten by the second put, which uses the same row key 1234 and the value 'wow!':
hbase(main):054:0> scan 'mytable', {COLUMN => 'cf1:1', VERSION => 3}
ROW COLUMN+CELL
1234 column=cf1:1, timestamp=1405831703617, value=wow!
2 row(s) in 0.0310 seconds
The next insert contained a new value 'hey' for column 1 of cf1, and the scan query for the last 3 versions now shows 'wow!' and 'hey'. Please note that the versions are displayed in descending order.
hbase(main):055:0> put 'mytable', 123, 'cf1:1', 'hey'
hbase(main):004:0> scan 'mytable', {COLUMN => 'cf1:1', VERSION => 3}
ROW COLUMN+CELL
123 column=cf1:1, timestamp=1405831295769, value=hey
1234 column=cf1:1, timestamp=1405831703617, value=wow!

Related

How to SUM a column in an oracle apex collection

So I am trying to output the total of a column from a collection.
This is the first query I tried
select sum(C007),sum(C007) A FROM APEX_COLLECTIONS WHERE COLLECTION_NAME='PURCHASE'
This is the second query I tried
select sum(C005*C007),sum(C005*C007) A FROM APEX_COLLECTIONS WHERE COLLECTION_NAME='PURCHASE'
Both produce the same result, which lists all the values in the column instead of summing them.
Expected Results:
10
Actual Results:
2
2
2
2
2
Please help
Looks like you did something wrong.
I created a sample page; it contains a button (which just submits the page) and an item that displays the total (the sum of the collection's values). The item is populated by a process which contains everything (for simplicity):
if not apex_collection.collection_exists('PURCHASE') then
  apex_collection.create_collection('PURCHASE');
end if;

apex_collection.add_member(
  p_collection_name => 'PURCHASE',
  p_c001            => 'Little',
  p_c007            => 100);   --> 100 ...

apex_collection.add_member(
  p_collection_name => 'PURCHASE',
  p_c001            => 'Foot',
  p_c007            => 200);   --> ... + 200 = 300

select sum(c007)
  into :P7_TOTAL
  from apex_collections
 where collection_name = 'PURCHASE';
When run (and after the button is pressed), the item's value is, as expected, 300.

Oracle: Selecting Records from Hierarchical Data where Child ID=n

I have a location table with a unary relationship, and am attempting to set up a view for use as a join in other tables referencing the location table to get the full path up to the root location value. However, after setting up a view with a hierarchical query, I am getting a "When using SYS_CONNECT_BY_PATH function, cannot have separator as part of the column" error message whenever I try to join on the primary key or use that column in the WHERE clause.
Grossly oversimplified example of the table;
LOCATION_TBL
LOCATION_ID NAME TYPE PARENT_ID [etc. . .]
----------- --------------- --------- ---------
1 'United States' 'Country' NULL
2 'France' 'Country' NULL
3 'Washington' 'Region' 1
4 'Normandie' 'Region' 2
5 'Seattle' 'City' 3
6 'Rouen' 'City' 4
The create view statement;
CREATE VIEW v_locationPath AS (
SELECT location_id,
SYS_CONNECT_BY_PATH(name,'/') AS path
FROM location_tbl
START WITH parent_id IS NULL
CONNECT BY PRIOR location_id=parent_id
);
Selecting directly from the view with no WHERE clause returns the expected result;
SELECT location_id,path FROM v_locationPath;
LOCATION_ID PATH
----------- -----------------------------------
1 '/United States'
2 '/France'
3 '/United States/Washington'
4 '/France/Normandie'
5 '/United States/Washington/Seattle'
6 '/France/Normandie/Rouen'
However if I try to select a single record from the view, limited by a location id value
SELECT location_id,path FROM v_locationPath WHERE location_id=3;
I receive the error. I have double checked, and none of the name values contain the separator being used ('/' in this example).
The following queries have also returned the same error;
SELECT a.location_id,b.path
FROM location_tbl a
JOIN v_locationPath b ON a.location_id=b.location_id
WHERE a.location_id=3;
------------------------------------------------
WITH limitedLocations AS (
SELECT location_id
FROM location_tbl
WHERE location_id=3
)
SELECT a.location_id,b.path
FROM limitedLocations a
JOIN v_locationPath b ON a.location_id=b.location_id;
I have also tried encapsulating the hierarchical query of the view as a subquery in the view itself;
CREATE VIEW v_locationPath AS (
SELECT location_id,path
FROM (
SELECT location_id,
SYS_CONNECT_BY_PATH(name,'/') AS path
FROM location_tbl
START WITH parent_id IS NULL
CONNECT BY PRIOR location_id=parent_id
)
);
Attempting the same select statements all return the same error message. Fiddling around, I was able to get a result to be returned, with the value I would expect, however, a record was returned for each row in the location table;
WITH limitedLocations AS (
SELECT 3 AS location_id
FROM location_tbl
)
SELECT a.location_id,b.path
FROM limitedLocations a
JOIN v_locationPath b ON a.location_id=b.location_id;
-Returned-
LOCATION_ID PATH
----------- ---------------------------
3 '/United States/Washington'
3 '/United States/Washington'
3 '/United States/Washington'
3 '/United States/Washington'
3 '/United States/Washington'
3 '/United States/Washington'
I'm a bit stymied, the error message itself doesn't seem to make any sense, since the location_id isn't in the SYS_CONNECT_BY_PATH column, nor do any of the name values contain the separator value.
--Edit--
Found the problem: not an issue with the query at all (at least not the structure). Turns out there was a record that contained the separator value in the name column.
The data would more accurately look like this:
LOCATION_TBL
LOCATION_ID NAME TYPE PARENT_ID [etc. . .]
----------- --------------- --------- ---------
1 'United States' 'Country' NULL
2 'France' 'Country' NULL
3 'Washington' 'Region' 1
4 'Normandie' 'Region' 2
5 'Seattle' 'City' 3
6 'Rouen' 'City' 4
... ... ... ...
4500 'Blighter/' 'City' 3
When I was testing the Select statement for the view in Oracle SQL Developer, the program was executing the query, while only returning the first 50 or so rows. Since it didn't throw any error, I erroneously assumed that all of the records were fine.
Once I actually ran a query to check for the existence of the separator in the name column:
SELECT * FROM location_tbl WHERE name LIKE '%/%';
I found the errant record. After swapping the separator argument for one not found in the location table, the queries worked fine.
As for SQL Developer not throwing the error: when I went back and ran the original select statement again, I did eventually get an error by scrolling down the results table until it tried to return the record containing the separator value. The WHERE location_id=n must have just been forcing it to look through the entire result set before returning any records.
As I explained in my edit, the problem was a name value containing the separator value '/'. I didn't catch it because Oracle SQL Developer only returns a subset of a query's results unless you scroll through them.
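The effect described above, where a bad row only raises an error once the result set is read far enough, can be mimicked with a lazy generator. The following Python sketch is an analogy only, not a description of how Oracle executes the query; the function name connect_by_path and the '|' replacement separator are made up for illustration.

```python
from itertools import islice


def connect_by_path(rows, sep="/"):
    """Lazily yield (location_id, path) pairs, walking down from the roots.

    rows: list of (location_id, name, parent_id) tuples.
    Raises ValueError when a name contains the separator.
    """
    children = {}
    for loc_id, name, parent_id in rows:
        children.setdefault(parent_id, []).append((loc_id, name))

    def walk(parent_id, prefix):
        for loc_id, name in children.get(parent_id, []):
            if sep in name:
                raise ValueError(f"separator {sep!r} found in name {name!r}")
            path = prefix + sep + name
            yield loc_id, path
            yield from walk(loc_id, path)

    return walk(None, "")


rows = [
    (1, "United States", None),
    (2, "France", None),
    (3, "Washington", 1),
    (4, "Normandie", 2),
    (5, "Seattle", 3),
    (6, "Rouen", 4),
    (4500, "Blighter/", 3),  # the errant record
]

# Fetching only the first few rows never reaches the bad record.
first_page = list(islice(connect_by_path(rows), 3))

# Filtering has to read every row, so the error surfaces.
try:
    filtered = [p for loc_id, p in connect_by_path(rows) if loc_id == 3]
    error = None
except ValueError as exc:
    error = str(exc)

# A separator that appears in no name avoids the problem.
fixed = [p for loc_id, p in connect_by_path(rows, sep="|") if loc_id == 3]
print(first_page, error, fixed)
```

Reading a partial page succeeds while the filtered query fails, which matches the behaviour seen in SQL Developer.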

Hbase put command not working for column names

I have to put the 2 rows below into my HBase table:
put 'TABLE', 'ABC::ABC::NLOC','data:document','myvalue'
put 'TABLE', 'ABC::ABC::NLOC','data:meta:test','values'
But after executing these commands, I cannot see that the 2nd command created a column data:meta:test.
hbase(main):003:0> get 'TABLE', 'ABC::ABC::NLOC'
COLUMN CELL
data:document timestamp=1528398479692, value=profile data - POST!
data:meta timestamp=1528398532570, value=values
2 row(s) in 0.0220 seconds
How can I see the column as data:meta:test? Should I use the HBase put in a different way? Any help, please.

How to get unique rows by column value using linq

I have values in table format
IEnumerable<IEnumerable<Data>> tableResult=new IEnumerable<IEnumerable<Data>>();
//below line doesn't work
tableResult = tableResult.Distinct(row => row("ID"));
tableResult contains values in a table-like layout, as in the dummy example below. What I actually have is an IEnumerable<IEnumerable<Data>>, not a table, so don't expect column names to be available.
ID Name Designation
1 Sam Engg
3 Mos Doc
3 Peter Driver
4 Bob Builder
Expected result (unique rows by ID). How can I get this using LINQ?
ID Name Designation
1 Sam Engg
3 Mos Doc
4 Bob Builder
You can group by id and select the first item in each ID group.
Assuming your ID column is the first element in the inner IEnumerable:
tableResult.GroupBy(x => x.First()).Select(x => x.First());
If you want a more specific rule for which row to keep when there are duplicate IDs, replace .Select(x => x.First()) with .Select(x => x.OrderBy(y => y.Skip(n).First()).First()), where n is the column number to sort on.
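The same "group by the first column and keep the first row of each group" idea is sketched below in Python, purely as a language-neutral illustration. It is not LINQ, and the helper name distinct_by is made up; rows are assumed to be plain tuples with the ID in position 0.

```python
def distinct_by(rows, key):
    """Keep the first row seen for each key, preserving input order."""
    seen = {}
    for row in rows:
        # setdefault stores the row only if this key has not been seen yet.
        seen.setdefault(key(row), row)
    return list(seen.values())


table = [
    (1, "Sam", "Engg"),
    (3, "Mos", "Doc"),
    (3, "Peter", "Driver"),
    (4, "Bob", "Builder"),
]

unique_rows = distinct_by(table, key=lambda row: row[0])
print(unique_rows)
```

As in the GroupBy version, the first row encountered for a duplicate ID wins, so the row for Mos is kept and the one for Peter is dropped.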

HBase get returns old values even with max versions = 1

I want to find the columns that have not been updated for more than a specific time period.
So I want to do a scan against the columns with a timerange.
The normal behaviour of HBase is that you then get the latest value in that time range (which is not what I want).
As far as I understand the way HBase should work is that if you set the maximum number of versions for the values in a column family to '1' it should retain only the last value that was put into the cell.
What I found is different.
If I do the following commands into the hbase shell
create 't1', {NAME => 'c1', VERSIONS => 1}
put 't1', 'r1', 'c1', 'One', 1000
put 't1', 'r1', 'c1', 'Two', 2000
put 't1', 'r1', 'c1', 'Three', 3000
get 't1', 'r1'
get 't1', 'r1' , {TIMERANGE => [0,1500]}
the result is this:
get 't1', 'r1'
COLUMN CELL
c1: timestamp=3000, value=Three
1 row(s) in 0.0780 seconds
get 't1', 'r1' , {TIMERANGE => [0,1500]}
COLUMN CELL
c1: timestamp=1000, value=One
1 row(s) in 0.1390 seconds
Why does the second query return a value even though I've set the max versions to only 1?
The HBase version I currently have installed here is HBase 0.94.6-cdh4.4.0
It turns out to be a bug in HBase:
https://issues.apache.org/jira/browse/HBASE-10102
