How to set start and end row key in HBase

If I have row keys like
a_c
b_c
j_f
f_d
d_c
I should get all the rows matching _c. How do I set the start and stop row keys here? I am trying to get the scan result from start and stop row keys alone, not with RowFilter or other filter types.

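You can write your own filter if you don't want to use RowFilter. If you would rather not write one, I suggest PrefixFilter. Note that PrefixFilter matches row keys that begin with the given bytes, so it fits keys such as _c1 or _c2; keys that end in _c, like a_c, need a suffix match instead (for example a RowFilter with a regular-expression comparator).
Example for Java:
byte[] prefixF = Bytes.toBytes("_c");
// Start the scan at the prefix, then keep only the keys that begin with it.
Scan scan = new Scan(prefixF);
PrefixFilter prefixFilter = new PrefixFilter(prefixF);
scan.setFilter(prefixFilter);
ResultScanner resultScanner = table.getScanner(scan);
The code above is equivalent to the shell command:
hbase> scan 'YourTablename', { FILTER => "PrefixFilter('_c')"}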
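Since the question asks for start and stop rows rather than filters, it may help to see how a prefix match maps onto such a range: the start row is the prefix itself and the stop row is the prefix with its last byte incremented. The sketch below is plain Java with no HBase dependency; PrefixRange and stopRowForPrefix are hypothetical names chosen for illustration, not HBase API, and the "a_" prefix is an assumed example. It only covers prefixes; keys that share a suffix do not form a single range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the HBase API: derives the exclusive stop
// row that turns a row-key prefix into a [start, stop) scan range.
class PrefixRange {

    // Returns the smallest key that sorts after every key starting with prefix.
    // An empty array means "no upper bound" (empty prefix or only 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xFF bytes cannot be incremented, so skip past them.
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // bump the last byte that still has room
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "a_".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(start, StandardCharsets.UTF_8) + " .. "
                + new String(stop, StandardCharsets.UTF_8));
    }
}
```

The two byte arrays would then be passed as the start and stop row of a scan, in place of the PrefixFilter.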

You can use the STARTROW and ENDROW options of the HBase shell scan command. The scan returns the rows whose keys fall in that range; the ENDROW key itself is excluded.
scan 'table_name', {STARTROW=>"<start_row_key>", ENDROW=>"<end_row_key>"}
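To see what a start/stop range can and cannot select, here is a small model in plain Java, where a TreeSet stands in for the sorted row keys of the table. It is a sketch under assumptions: RowRangeDemo and its methods are hypothetical names, the key c_x is an invented extra row, and String ordering is used as a stand-in for HBase's byte ordering (the two agree for ASCII keys like these).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of a row-key range scan; it does not use the HBase client.
class RowRangeDemo {

    // Keys in [startRow, stopRow), in sorted order, like STARTROW/ENDROW.
    static List<String> rangeScan(TreeSet<String> keys, String startRow, String stopRow) {
        return new ArrayList<>(keys.subSet(startRow, true, stopRow, false));
    }

    // A suffix match has to inspect every key, because matching keys
    // are not guaranteed to sit next to each other in sort order.
    static List<String> suffixScan(TreeSet<String> keys, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String key : keys) {
            if (key.endsWith(suffix)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Keys from the question plus c_x, a hypothetical extra row.
        TreeSet<String> keys = new TreeSet<>(
                List.of("a_c", "b_c", "j_f", "f_d", "d_c", "c_x"));
        System.out.println(rangeScan(keys, "a", "c"));     // one contiguous slice
        System.out.println(rangeScan(keys, "a_c", "d_d")); // also picks up c_x
        System.out.println(suffixScan(keys, "_c"));        // only the _c keys
    }
}
```

A range from a_c to d_d covers every _c key but also returns c_x, which is why a suffix such as _c generally needs a filter or a different row-key design (for example, storing the c part first).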


Should the query contain the partition key if we have already set the partition key in FeedOptions?

I am using Document DB with partition key = "deviceId".
Is there any difference between the two queries below?
var fo = new FeedOptions { PartitionKey = new PartitionKey("A1234") };
var partitionKeyInQuery = dbClient.CreateDocumentQuery(d => d.deviceId == "A1234" && d.type == 1, fo);
var noPartitionKeyInQuery = dbClient.CreateDocumentQuery(d => d.type == 1, fo);
When PartitionKey is set in FeedOptions, should I also add deviceId to the WHERE clause?
I believe there is no difference in performance. The RequestCharge is the same, and a WHERE clause on deviceId also makes the query partition-specific, i.e. it eliminates a cross-partition query.
From the documentation:
Querying partitioned containers
When you query data in partitioned containers, Cosmos DB automatically routes the query to the partitions corresponding to the partition key values specified in the filter (if there are any). For example, this query is routed to just the partition containing the partition key "XMS-0001".
// Query using partition key
IQueryable<DeviceReading> query = client.CreateDocumentQuery<DeviceReading>(
    UriFactory.CreateDocumentCollectionUri("db", "coll"))
    .Where(m => m.MetricType == "Temperature" && m.DeviceId == "XMS-0001");
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-partition-data

Filter inner bag in Pig

The data looks like this:
22678, {(112),(110),(2)}
656565, {(110), (109)}
6676, {(2),(112)}
This is the data structure:
(id:chararray, event_list:{innertuple:(innerfield:chararray)})
I want to filter those rows where event_list contains 2. I thought initially to flatten the data and then filter those rows that have 2. Somehow flatten doesn't work on this dataset.
Can anyone please help?
There might be a simpler way of doing this, such as a bag lookup. Otherwise, with basic Pig, one way of achieving this is:
data = load 'data.txt' AS (id:chararray, event_list:bag{});
-- flatten bag, in order to transpose each element to a separate row.
flattened = foreach data generate id, flatten(event_list);
-- keep only those rows where the value is 2.
filtered = filter flattened by (int) $1 == 2;
-- keep only distinct ids.
dist = distinct (foreach filtered generate $0 as (id:chararray));
-- join distinct ids to the original relation (named data above)
jnd = join data by id, dist by id;
-- remove extra fields, keep original fields.
result = foreach jnd generate data::id, data::event_list;
dump result;
(22678,{(112),(110),(2)})
(6676,{(2),(112)})
You can filter the bag inside a nested FOREACH and project a boolean that says whether 2 is present in the bag. Then filter the rows on that boolean.
So:
-- input and output are reserved words in Pig, so the aliases use other names.
-- The bag schema names its field val_0 so the nested FILTER can refer to it.
events = LOAD 'data.txt' AS (id:chararray, event_list:bag{t:tuple(val_0:chararray)});
events_filt = FOREACH events {
    bag_filter = FILTER event_list BY (val_0 matches '2');
    GENERATE
        id,
        event_list,
        (IsEmpty(bag_filter) ? false : true) AS is_2_present:boolean;
};
result = FILTER events_filt BY is_2_present;

HBase Get values where rowkey in

How do I get all the values in HBase given Rowkey values?
val tableName = "myTable"
val hConf = HBaseConfiguration.create()
val hTable = new HTable(hConf, tableName)
val theget= new Get(Bytes.toBytes("1001-A")) // rowkey values (1001-A, 1002-A, 2010-A, ...)
val result = hTable.get(theget)
val values = result.listCells()
The code above only works for one rowkey.
You can use batch operations; see the Javadoc for batch operations on HTable.
Another approach is to scan with a start row key and an end row key (the first and last row keys of the set of keys sorted in ascending order). This makes more sense if there are many keys to fetch.
There is an htable.get method that takes a list of Gets:
List<Get> gets = ....
Result[] results = htable.get(gets);
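The trade-off between the two suggestions (a list of Gets versus one scan from the first to the last key) can be modelled in plain Java, with a TreeMap standing in for the table. This is only a sketch: MultiKeyLookup and its methods are hypothetical names, the row 1500-A and the values v1 to v4 are invented for illustration, and String ordering stands in for HBase's byte ordering. One detail it makes visible is that a stop row is exclusive, so the last wanted key has to be stepped past to be included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of the two lookup strategies; a TreeMap stands in for the table.
class MultiKeyLookup {

    // One point lookup per requested key, like passing a list of Gets.
    static List<String> multiGet(TreeMap<String, String> table, List<String> rowKeys) {
        List<String> values = new ArrayList<>();
        for (String key : rowKeys) {
            String value = table.get(key);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    // One range read from the smallest to the largest requested key,
    // discarding rows that were not asked for. Assumes rowKeys is non-empty.
    static List<String> scanAndPick(TreeMap<String, String> table, List<String> rowKeys) {
        TreeSet<String> wanted = new TreeSet<>(rowKeys);
        String startRow = wanted.first();
        // The stop row is exclusive, so step just past the last wanted key.
        String stopRow = wanted.last() + "\0";
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            if (wanted.contains(row.getKey())) {
                values.add(row.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("1001-A", "v1");
        table.put("1002-A", "v2");
        table.put("1500-A", "v3"); // hypothetical row that nobody asked for
        table.put("2010-A", "v4");
        List<String> keys = List.of("1001-A", "1002-A", "2010-A");
        System.out.println(multiGet(table, keys));
        System.out.println(scanAndPick(table, keys));
    }
}
```

Both calls return the same values here, but the range read also touches 1500-A before discarding it, so it tends to pay off only when the wanted keys are dense within the range.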

HBase - get column names for row by column name prefix

I have an HBase table with the following description.
For a row key, my columns are of the form a_1, a_2, a_3, b_1, c_1, C_2 and so on, a compound key format.
Suppose one of my rows is of the form
row key - row1
column family - c1
columns - a_1, a_2,a_3,b_1,b_2,c_1,C_2,d_9,d_99
Can I, by any operation, retrieve a, b, c, d as the columns corresponding to row1? I don't care what the suffixes of a, b, c... are.
I can get all column names for a given row, split each name to take its first part, add those parts to a set, and emit the set. I wonder whether there is a better way of doing this with filters or some other HBase mechanism; please comment.
You can use ColumnPrefixFilter for that, as in the following code:
Configuration hadoopConf = new Configuration();
hadoopConf.set("hbase.zookeeper.quorum", "localhost");
hadoopConf.set("hbase.zookeeper.property.clientPort", "2181");
// HTable, KeyValue and Result.list() belong to the older client API used in this answer.
HTable hTable = new HTable(hadoopConf, "KunderaExamples");
Scan scan = new Scan();
// Return only the cells whose column qualifier starts with "A".
scan.setFilter(new ColumnPrefixFilter(Bytes.toBytes("A")));
ResultScanner scanner = hTable.getScanner(scan);
Iterator<Result> resultsIter = scanner.iterator();
while (resultsIter.hasNext())
{
    Result result = resultsIter.next();
    List<KeyValue> values = result.list();
    for (KeyValue value : values)
    {
        // The getters return byte[], so convert before printing.
        System.out.println(Bytes.toStringBinary(value.getKey()));
        System.out.println(Bytes.toString(value.getQualifier()));
        System.out.println(Bytes.toString(value.getValue()));
    }
}
scanner.close();
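ColumnPrefixFilter narrows a scan to one prefix that is already known, so listing the distinct prefixes of a row still needs a client-side pass like the one the question describes. A minimal sketch of that pass in plain Java follows; ColumnPrefixes and distinctPrefixes are hypothetical names, and the qualifiers are passed in as strings on the assumption that they have already been read from the row's cells.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java sketch of the client-side grouping; qualifiers arrive as strings.
class ColumnPrefixes {

    // Collects the part of each qualifier before the first '_' (or the whole
    // qualifier when it has no '_'), without duplicates.
    static Set<String> distinctPrefixes(List<String> qualifiers) {
        Set<String> prefixes = new TreeSet<>();
        for (String qualifier : qualifiers) {
            int sep = qualifier.indexOf('_');
            prefixes.add(sep < 0 ? qualifier : qualifier.substring(0, sep));
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Column names of row1 as listed in the question.
        List<String> row1 = List.of(
                "a_1", "a_2", "a_3", "b_1", "b_2", "c_1", "C_2", "d_9", "d_99");
        System.out.println(distinctPrefixes(row1));
    }
}
```

The comparison is case-sensitive, so C_2 contributes a separate prefix C next to c; lower-casing the prefix before adding it would merge the two if that is the intent.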

How to apply several QualifierFilter to a row in HBase

We would like to filter a scan on an HBase table with two QualifierFilters.
That is, we only want the rows of the table that have a certain column 'col_A' AND (!) a certain other column 'col_B'.
Our current approach looks like this:
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
Filter filter1 = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("col_A".getBytes()));
filterList.addFilter(filter1);
Filter filter2 = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator("col_B".getBytes()));
filterList.addFilter(filter2);
Scan scan = new Scan();
scan.setFilter(filterList);
...
The ResultScanner does not return any results from this scan, although there are several rows in the HBase table that have both columns 'col_A' and 'col_B'.
If we apply only filter1 to the scan, everything works fine and we get all the rows that have 'col_A'.
If we apply only filter2 to the scan, it is the same: we get all rows that have 'col_B'.
Only when we combine the two filters do we get no results.
What would be the right way to get only the rows from the table that have col_A AND col_B?
You can achieve this by defining the following filters:
List<Filter> filters = new ArrayList<Filter>(2);
byte[] colfam = Bytes.toBytes("c");
byte[] fakeValue = Bytes.toBytes("DOESNOTEXIST");
byte[] colA = Bytes.toBytes("col_A");
byte[] colB = Bytes.toBytes("col_B");
SingleColumnValueFilter filter1 =
    new SingleColumnValueFilter(colfam, colA, CompareOp.NOT_EQUAL, fakeValue);
filter1.setFilterIfMissing(true);
filters.add(filter1);
SingleColumnValueFilter filter2 =
    new SingleColumnValueFilter(colfam, colB, CompareOp.NOT_EQUAL, fakeValue);
filter2.setFilterIfMissing(true);
filters.add(filter2);
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);
Scan scan = new Scan();
scan.setFilter(filterList);
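The idea here is to define one SingleColumnValueFilter per column you are looking for, each with a fake value and the CompareOp.NOT_EQUAL operator. With setFilterIfMissing(true), a row that lacks the column is dropped, while any row that has the column passes because its value differs from the fake one. Combined with MUST_PASS_ALL, only rows that have both columns remain.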
Source: http://mapredit.blogspot.com/2012/05/using-filters-in-hbase-to-match-two.html
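I think this line is the issue:
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
You want it to be:
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
With MUST_PASS_ALL, the filter list looks for a single column that carries both qualifiers, and no such column exists. Note that MUST_PASS_ONE keeps the col_A and col_B cells of any row that has either column, so rows with only one of the two still come back.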
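The behaviour discussed in this thread comes from qualifier filters being evaluated cell by cell. The sketch below models that in plain Java, without any HBase classes: QualifierLogic, keepCells and rowsWithAll are hypothetical names, and the rows and columns in main are invented sample data. It contrasts AND and OR applied per cell with a row-level check that all required columns are present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

// Plain-Java model of qualifier filtering; no HBase classes are involved.
class QualifierLogic {

    // Cell-level filtering: each qualifier is tested on its own.
    static List<String> keepCells(List<String> qualifiers, Predicate<String> cellFilter) {
        List<String> kept = new ArrayList<>();
        for (String qualifier : qualifiers) {
            if (cellFilter.test(qualifier)) {
                kept.add(qualifier);
            }
        }
        return kept;
    }

    // Row-level check: sorted row keys whose qualifier set holds every required name.
    static List<String> rowsWithAll(Map<String, Set<String>> table, Set<String> required) {
        TreeMap<String, Set<String>> sorted = new TreeMap<>(table);
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : sorted.entrySet()) {
            if (row.getValue().containsAll(required)) {
                rows.add(row.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Predicate<String> isA = "col_A"::equals;
        Predicate<String> isB = "col_B"::equals;
        List<String> row = List.of("col_A", "col_B", "col_C");

        System.out.println(keepCells(row, isA.and(isB))); // no cell has both names
        System.out.println(keepCells(row, isA.or(isB)));  // the two wanted cells

        Map<String, Set<String>> table = Map.of(
                "row1", Set.of("col_A", "col_B"),
                "row2", Set.of("col_A"));
        System.out.println(rowsWithAll(table, Set.of("col_A", "col_B")));
    }
}
```

In this model the per-cell AND never matches, the per-cell OR also accepts row2's single col_A cell, and only the row-level check isolates row1, which is the effect the SingleColumnValueFilter approach aims for on the server side.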
