My rowkeys in HBase look like this:
a1s1
a1s2
a1s3
a2s1
a3s1
a3s2
...
I want to get only this data:
a1s1
a2s1
a3s1
But when I run this query: scan 't1', {STARTROW=>'a1s1', ENDROW=>'a4s1'}
It gives me:
a1s1
a1s2
a1s3
a2s1
a3s1
But I don't want to get a1s2 and a1s3. How can I do this?
You should combine STARTROW/ENDROW with a RowFilter that uses a RegexStringComparator. If you use only the start/end row range, HBase compares the rowkeys lexicographically, character by character, because the rowkey is not numeric, so every key between a1s1 and a4s1 (including a1s2 and a1s3) falls inside the range. In the HBase shell you can try this:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.RegexStringComparator
scan 't1', {STARTROW => 'a1s1', ENDROW => 'a4s1', FILTER => org.apache.hadoop.hbase.filter.RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new("s1$"))}
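If you need the same thing from the Java client, here is a minimal sketch using the pre-2.x HBase Java API (the connection handling and class name are my assumptions; the filter is the same RowFilter/RegexStringComparator combination as in the shell command above):
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeySuffixScan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("t1"))) {
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("a1s1"));  // the range limits how much is scanned...
            scan.setStopRow(Bytes.toBytes("a4s1"));
            // ...and the RowFilter keeps only rowkeys ending in "s1"
            scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
                    new RegexStringComparator("s1$")));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow())); // a1s1, a2s1, a3s1
                }
            }
        }
    }
}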
I assume you want to get the row keys starting with "a" and ending with "s1".
So you can use either of the below:
scan 't1', { ENDROW=>'s1'}
Or
scan 't1', {STARTROW=>'a', ENDROW=>'s1'}
Another option is using regexString:
scan 't1', {FILTER => "RowFilter(=, 'regexstring:.*s1$')"}
Related
Hi, I have a SQL table where I am storing values like this:
Column Name: Registration_ID
180,1801,1803,18011,220
180,1801,
180,1801,1803
Now I want to match the exact Registration_ID and get records based on it. I have tried Contains, but it does not match exact values.
Here is my query:
var Result=db.Entity_StudentRepository.Get(x =>
x.Registration_ID.Contains(Used_For_Id.ToString())).Select(x => x.Registration_ID).ToArray();
Could you please try the following query and let me know if it works? Note that AsEnumerable() switches to in-memory (LINQ to Objects) evaluation, since Split cannot be translated to SQL:
db.Entity_StudentRepository.AsEnumerable().Where(t => t.Registration_ID.Split(',').Select(int.Parse).Contains(Used_For_Id));
I was doing a scan using startRowKey and stopRowKey in the HBase shell, but the output I am receiving is outside the range passed. Please refer to the HBase query:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 'TableName',{ LIMIT => 2 , STARTROW => '000|9223370554721275807', STOPROW => '101|9223370554727575807', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('col_family'), Bytes.toBytes('col_qualifier'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('Some Value')), COLUMNS => 'col_family:col_qualifier', REVERSED => false}
But the output received is outside this range:
016|9223370554960173487
021|9223370555154148992
Please let me know if my search query is correct, or what the root cause for this could be. Any help will be really appreciated.
Thanks
If you put the four rowkeys mentioned in your question in a file and sort them, the result will be:
000|9223370554721275807
016|9223370554960173487
021|9223370555154148992
101|9223370554727575807
HBase compares rowkeys lexicographically, byte by byte, so both 016|9223370554960173487 and 021|9223370555154148992 sort between your STARTROW and STOPROW. Thus the values you received are not outside the range of your scan.
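If you want to double-check this yourself, here is a small, hypothetical snippet that compares the keys the way HBase does, lexicographically on the raw bytes, using Bytes.compareTo (STARTROW inclusive, STOPROW exclusive):
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyRangeCheck {
    public static void main(String[] args) {
        byte[] start = Bytes.toBytes("000|9223370554721275807");
        byte[] stop = Bytes.toBytes("101|9223370554727575807");
        String[] returned = {"016|9223370554960173487", "021|9223370555154148992"};
        for (String key : returned) {
            byte[] k = Bytes.toBytes(key);
            // byte-by-byte comparison, the same way scan ranges are evaluated
            boolean inRange = Bytes.compareTo(start, k) <= 0 && Bytes.compareTo(k, stop) < 0;
            System.out.println(key + " in range: " + inRange); // prints true for both keys
        }
    }
}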
Actually I want to get the full table, but it should be based on a distinct Doc_Type.
That means it should only pick the records from the table that have a unique Doc_Type. I have tried the following, but it returns a single column into ToList(), whereas I want the full table. How can I do it?
var data = DB.tblDocumentTypes.Select(m => m.Doc_Type).Distinct().ToList();
You can use GroupBy:
var data = DB.tblDocumentTypes.GroupBy(m => m.Doc_Type).Select(x => x.First());
Let's say a comments table has the following structure:
id | author | timestamp | body
I want to use an index to efficiently execute the following query:
r.table('comments').getAll("me", {index: "author"}).orderBy('timestamp').run(conn, callback)
Is there another efficient method I can use?
It looks like an index is currently not supported for a filtered result of a table. When creating an index for timestamp and adding it as a hint in orderBy('timestamp', {index: timestamp}), I'm getting the following error:
RqlRuntimeError: Indexed order_by can only be performed on a TABLE. in:
This can be accomplished with a compound index on the "author" and "timestamp" fields. You can create such an index like so:
r.table("comments").index_create("author_timestamp", lambda x: [x["author"], x["timestamp"]])
Then you can use it to perform the query like so:
r.table("comments")
.between(["me", r.minval], ["me", r.maxval]
.order_by(index="author_timestamp)
The between works like the get_all did in your original query because it gets only documents that have the author "me" and any timestamp. Then we do an order_by on the same index, which orders by the timestamp (since all of the keys have the same author). The key here is that you can only use one index per table access, so we need to cram all of this information into the same index.
It's currently not possible to chain a getAll with an orderBy using indexes twice.
Ordering with an index can be done only on a table right now.
NB: The command to orderBy with an index is orderBy({index: 'timestamp'}) (no need to repeat the key)
The answer by Joe Doliner was selected, but it seems wrong to me.
First, in the between command no index was specified, therefore between will use the primary index.
Second, between returns a selection:
table.between(lowerKey, upperKey[, {index: 'id', leftBound: 'closed', rightBound: 'open'}]) → selection
and orderBy cannot run on a selection with an index; only a table can use an index:
table.orderBy([key1...], {index: index_name}) → selection<stream>
selection.orderBy(key1, [key2...]) → selection<array>
sequence.orderBy(key1, [key2...]) → array
You want to create what's called a "compound index." After that, you can query it efficiently.
//create compound index
r.table('comments')
.indexCreate(
'author__timestamp', [r.row("author"), r.row("timestamp")]
)
//the query
r.table('comments')
.between(
['me', r.minval],
['me', r.maxval],
{index: 'author__timestamp'}
)
.orderBy({index: r.desc('author__timestamp')}) //or "r.asc"
.skip(0) //pagi
.limit(10) //nation!
I like using two underscores for compound indexes. It's just stylistic; it doesn't matter how you choose to name your compound index.
Reference: How to use getall with orderby in RethinkDB
I'm trying to figure out how to do the equivalent of Oracle's LEAD and LAG in HBase, or some other pattern that will solve my problem. I could write a MapReduce program that does this quite easily, but I'd love to be able to exploit the fact that the data is already sorted in the way I need it to be.
My problem is as follows: I have a rowkey and a value that looks like:
(employee name + timestamp) => data:salary
So, some example data might be:
miller, bob;2010-01-14 => data:salary=90000
miller, bob;2010-11-04 => data:salary=102000
miller, bob;2011-12-03 => data:salary=107000
monty, fred;2010-04-10 => data:salary=19000
monty, fred;2011-09-09 => data:salary=24000
What I want to do is calculate the changes of salary, record by record. I want to transform the above data into differences between records:
miller, bob;2010-01-14 => data:salarydiff=90000
miller, bob;2010-11-04 => data:salarydiff=12000
miller, bob;2011-12-03 => data:salarydiff=5000
monty, fred;2010-04-10 => data:salarydiff=19000
monty, fred;2011-09-09 => data:salarydiff=5000
I'm up for changing the rowkey strategy if necessary.
What I'd do is change the key so that the timestamp is descending (newer salary first):
miller, bob;2011-12-03 => data:salary=107000
miller, bob;2010-11-04 => data:salary=102000
miller, bob;2010-01-14 => data:salary=90000
Now you can run a simple map-only job that scans the table. In the map you create a new Scan starting at the current key, call next() on the scanner to get the previous (older) salary, calculate the diff, and store it in a new column on the current row key.
Basically, in your mapper class (the one that extends TableMapper) you override the setup method and get the configuration:
@Override
protected void setup(Mapper.Context context) throws IOException, InterruptedException {
    Configuration config = context.getConfiguration();
    // keep an HTable handle so the map() method can issue extra scans and puts
    table = new HTable(config, <Table Name>);
}
Then inside the map you extract the row key from the row parameter, create the new Scan, and continue as explained above; a sketch of what that could look like is below.
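This is only a rough sketch of such a map method, using the same old-style HTable API as the setup above. It assumes the salary is stored as an 8-byte long, that the employee name and timestamp are separated by ';' in the rowkey, and that the oldest record for an employee keeps its full salary as the diff (as in the example output in the question):
// extra imports needed in the mapper class:
//   org.apache.hadoop.hbase.client.Put, Result, ResultScanner, Scan
//   org.apache.hadoop.hbase.io.ImmutableBytesWritable
//   org.apache.hadoop.hbase.util.Bytes
@Override
protected void map(ImmutableBytesWritable rowKey, Result current, Context context)
        throws IOException, InterruptedException {
    byte[] key = rowKey.get();
    byte[] family = Bytes.toBytes("data");
    long currentSalary = Bytes.toLong(current.getValue(family, Bytes.toBytes("salary")));

    // Scan forward from the current key: with descending timestamps the next row
    // for the same employee holds the previous (older) salary.
    Scan scan = new Scan(key);
    scan.setCaching(2);
    ResultScanner scanner = table.getScanner(scan);
    scanner.next();                    // the current row itself
    Result previous = scanner.next();  // the row right after it
    scanner.close();

    long diff = currentSalary;         // oldest record for an employee keeps the full salary
    if (previous != null) {
        String currentEmployee = Bytes.toString(key).split(";")[0];
        String previousEmployee = Bytes.toString(previous.getRow()).split(";")[0];
        if (currentEmployee.equals(previousEmployee)) {
            long previousSalary = Bytes.toLong(previous.getValue(family, Bytes.toBytes("salary")));
            diff = currentSalary - previousSalary;
        }
    }

    // write the diff into a new column on the current row
    Put put = new Put(key);
    put.add(family, Bytes.toBytes("salarydiff"), Bytes.toBytes(diff));
    table.put(put);
}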
In most cases the next record will be in the same region; occasionally it might be served by another region server.