Is it possible to set where condition to hbase row-keys? - hadoop

Is it possible to apply a where condition to HBase row keys? Suppose I have row keys 1, 2, 3, 4, 5...
I need to run a query like "where row-key < 4".

I think you want an InclusiveStopFilter:
Scan s = new Scan(Bytes.toBytes("startRow"));
s.setFilter(new InclusiveStopFilter(Bytes.toBytes("stopRow")));
http://svn.apache.org/repos/asf/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/filter/InclusiveStopFilter.java

You can easily write your own FilterBase implementation with whatever semantics you want.
http://svn.apache.org/repos/asf/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
Override the filterRowKey method, as the InclusiveStopFilter source does.

You can have a scan start and stop row:
Scan s = new Scan();
s.setStartRow(Bytes.toBytes("startRow"));
s.setStopRow(Bytes.toBytes("endRow"));
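Whichever variant is used, keep in mind that HBase compares row keys as raw bytes, not as numbers, and that a scan's stop row is exclusive (which is why InclusiveStopFilter exists). The sketch below simulates that ordering in plain Java with no HBase classes; the class name, the scan helper, and the zero-padding helper are illustrative assumptions, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Plain-Java simulation of a row-key range scan; no HBase classes are used.
final class RowKeyRangeSketch {

    // Lexicographic comparison on unsigned bytes, the ordering HBase applies to row keys.
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Returns the keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> scan(List<String> keys, String startRow, String stopRow) {
        TreeSet<byte[]> sorted = new TreeSet<>(UNSIGNED_LEX);
        for (String k : keys) {
            sorted.add(k.getBytes(StandardCharsets.UTF_8));
        }
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        List<String> hits = new ArrayList<>();
        for (byte[] k : sorted.subSet(start, true, stop, false)) {
            hits.add(new String(k, StandardCharsets.UTF_8));
        }
        return hits;
    }

    // Hypothetical helper: zero-pads a non-negative id so byte order matches numeric order.
    static String pad(int id) {
        return String.format("%04d", id);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1", "2", "3", "4", "5", "10");
        System.out.println(scan(raw, "", "4"));        // [1, 10, 2, 3]

        List<String> padded = new ArrayList<>();
        for (int id : new int[] {1, 2, 3, 4, 5, 10}) {
            padded.add(pad(id));
        }
        System.out.println(scan(padded, "", pad(4)));  // [0001, 0002, 0003]
    }
}
```

With unpadded string keys, "where row-key < 4" also returns "10"; a fixed-width encoding of the key avoids that.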

Related

How to skip update when parameter is null/0 in spring batch with JdbcBatchItemWriter

We have a scenario in which, while doing a batch update on a table with JdbcBatchItemWriter, we cannot find a way to skip updating a column when the attribute is null. We don't want too many queries and ItemPreparedStatementSetter implementations, so we have a single query that updates all fields in the table. Different batch jobs update different attributes of the table.
List<Report> summaryList = getSummaryList();
JdbcBatchItemWriter<Report> writer1 = new JdbcBatchItemWriter<>();
String sql_update = GenericConstants.UPDATE_QUERY;
writer1.setDataSource(dataSource);
ItemPreparedStatementSetter<Report> updatePreparedStatementSetter = new ItemMergeUpdatePreparedItemSetter();
writer1.setItemPreparedStatementSetter(updatePreparedStatementSetter);
writer1.setSql(sql_update);
writer1.afterPropertiesSet();
writer1.write(summaryList);
I tried the following after seeing a few examples of conditional updates in the query itself, but it hasn't helped so far.
Below is the query. Any help on this will be very much appreciated.
UPDATE_QUERY = "update [dbo].[test_tbl]
SET test_col1 = CASE When ?!=0 then ?
else test_col1 end ,
test_col2 = CASE When ?!=0 then ?
else test_col2 WHERE market=? and country = ?"
I don't want to construct the SQL query based on the parameters, as I would lose the bulk-writing feature of JdbcBatchItemWriter. Can someone please suggest the right approach to solve this problem, and possibly correct the SQL query I'm writing?
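One thing that stands out in the query as posted: neither CASE expression is closed with END, and every ? is a separate positional parameter, so each value has to be bound twice (once for the test, once for the assignment). A minimal sketch of that idea follows, not tested against SQL Server. The Report fields and the class name are hypothetical, since the real item class is not shown; setValues mirrors the shape of ItemPreparedStatementSetter.setValues(item, ps) without depending on Spring.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Arrays;

final class ConditionalUpdateSketch {

    // Hypothetical item type; the question does not show the real Report class.
    static final class Report {
        final Integer col1;
        final Integer col2;
        final String market;
        final String country;

        Report(Integer col1, Integer col2, String market, String country) {
            this.col1 = col1;
            this.col2 = col2;
            this.market = market;
            this.country = country;
        }
    }

    // Same statement text for every item, so the writer can still batch it.
    // A NULL parameter makes "? != 0" evaluate to unknown, which falls through to ELSE.
    static final String UPDATE_QUERY =
        "UPDATE [dbo].[test_tbl] "
        + "SET test_col1 = CASE WHEN ? != 0 THEN ? ELSE test_col1 END, "
        + "test_col2 = CASE WHEN ? != 0 THEN ? ELSE test_col2 END "
        + "WHERE market = ? AND country = ?";

    // Values in placeholder order: each column value appears twice.
    static Object[] bindValues(Report r) {
        return new Object[] {r.col1, r.col1, r.col2, r.col2, r.market, r.country};
    }

    // Same shape as ItemPreparedStatementSetter.setValues(item, ps).
    static void setValues(Report r, PreparedStatement ps) throws SQLException {
        Object[] values = bindValues(r);
        for (int i = 0; i < 4; i++) {
            // Typed setObject so that a null value is bound as an INTEGER null.
            ps.setObject(i + 1, values[i], Types.INTEGER);
        }
        ps.setString(5, r.market);
        ps.setString(6, r.country);
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        System.out.println(Arrays.toString(bindValues(new Report(null, 5, "US", "NA"))));
        // [null, null, 5, 5, US, NA]
    }
}
```

Depending on the driver, the type of a bare ? inside CASE may not be inferred, in which case an explicit CAST on the parameter could be needed; that part is an assumption to verify against the actual database.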

Using Spring MongoTemplate to update nested arrays in MongoDB

Can anyone help with a MongoTemplate question?
I have a record structure with nested arrays, and I want to update a specific entry in a second-level array. I can find the appropriate entry easily enough, but the set path needs the indexes of both array entries, and the '$' only refers to the leaf item. For example, if I had an array of teams, each containing an array of players, I would need to generate an update path like:
val query = Query(Criteria.where( "teams.players.playerId").`is`(playerId))
val update = Update()
with(update) {
set("teams.$.players.$.name", player.name)
}
This fails because the '$' can only be used once, to refer to the index in the players array. I need a way to generate the equivalent of '$' for the index in the teams array.
I am thinking that I need a separate aggregate query using something like this, but I can't get it to work.
project().and(ArrayOperators.arrayOf( "markets").indexOf("")).`as`("index")
Any ideas for this Mongo newbie?
For others who are facing a similar issue, one option is to use arrayFilters in UpdateOptions. It looks like MongoTemplate in Spring does not yet support using UpdateOptions directly, so what can be done is:
Sample for a document that contains an array of arrayObj objects, each of which contains another array of arrayObj:
Bson filter = eq("arrayObj.arrayObj.id", "12345");
UpdateResult result = mongoTemplate.getDb().getCollection(collectionName)
.updateOne(filter,
new Document("$set", new Document("arrayObj.$[].arrayObj.$[x].someField", "someValueToUpdate")),
new UpdateOptions().arrayFilters(
Arrays.asList(Filters.eq("x.id", "12345"))
));

Triple composite key in Hbase

I have a use case where I want a three-level composite key.
For example:
Rollnumber:class:friendsRollNumber
I want to query "get all friends for a particular roll number and class".
I could not find sufficient examples online that combine composite keys with range scans.
Currently, I am doing the following:
byte[] rowkey = Bytes.add(Bytes.toBytes("myrollnumber"), Bytes.toBytes("myClass"), Bytes.toBytes("myFriendsRollNumber"));
This is how I form the row key.
Will it select the region server based on myRollNumber and myClass? If not, how can I do that?
Also, what is the correct way to do a range scan? I am doing it the following way. I am still writing the code, so I have not tested it.
Scan s = new Scan();
Filter f = new PrefixFilter(Bytes.toBytes("myrollnumber"), Bytes.toBytes("class"));
s.setFilter(f);
Is this the correct way to scan for my requirement?
Also, how do I get the individual parts of the row key from the scanner?
Try this:
byte[] prefix=Bytes.toBytes("rollnumber" + "class");
Scan scan = new Scan(prefix);
PrefixFilter prefixFilter = new PrefixFilter(prefix);
scan.setFilter(prefixFilter);
ResultScanner resultScanner = table.getScanner(scan);
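On getting the individual parts back out of a row key: a plain concatenation such as "rollnumber" + "class" carries no field boundaries, so the parts can only be recovered if the fields are fixed-width or separated by a delimiter. Below is a minimal plain-Java sketch that assumes the ':' delimiter from the question's Rollnumber:class:friendsRollNumber layout and assumes no field ever contains ':'. The class and method names are illustrative, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CompositeKeySketch {

    static final String SEP = ":";

    // Builds rollNumber:class:friendRollNumber as UTF-8 bytes.
    static byte[] rowKey(String rollNumber, String clazz, String friendRollNumber) {
        return String.join(SEP, rollNumber, clazz, friendRollNumber)
            .getBytes(StandardCharsets.UTF_8);
    }

    // Prefix for "all friends of this roll number and class".
    // The trailing separator keeps class "7" from also matching class "70".
    static byte[] scanPrefix(String rollNumber, String clazz) {
        return (rollNumber + SEP + clazz + SEP).getBytes(StandardCharsets.UTF_8);
    }

    // Splits a row key back into its three parts.
    static String[] parts(byte[] rowKey) {
        return new String(rowKey, StandardCharsets.UTF_8).split(SEP, -1);
    }

    // Byte-wise prefix test, the check a prefix scan applies to each row key.
    static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] key = rowKey("42", "7", "99");
        System.out.println(Arrays.toString(parts(key)));                               // [42, 7, 99]
        System.out.println(startsWith(key, scanPrefix("42", "7")));                    // true
        System.out.println(startsWith(rowKey("42", "70", "99"), scanPrefix("42", "7"))); // false
    }
}
```

In a real scan, the bytes passed to parts would come from the row of each Result, and the bytes from scanPrefix would be what is handed to the Scan and PrefixFilter.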
For your requirement, you can use the start/stop row feature of Scan. You do not need a filter.
byte[] startRow = Bytes.add(Bytes.toBytes("search_rollnumber"), Bytes.toBytes("myClass"));
byte[] stopRow = Bytes.add(Bytes.toBytes("search_rollnumber"), Bytes.toBytes("myClass"));
stopRow[stopRow.length - 1]++;
Scan s = new Scan(startRow, stopRow);
Using this scan you will get all rows starting with search_rollnumbermyClass.
I am not sure whether you use ':' in your row key, but I think you should if both rollnumber and class are represented as integers.
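One edge case in the stop-row trick: incrementing the last byte wraps around to 0x00 when that byte is 0xFF, which would put the stop row before the start row. A hedged sketch of a more defensive version follows, in plain Java; the helper name is an assumption. It relies on an empty stop row meaning "scan to the end of the table". Newer HBase clients also appear to offer a prefix-scan helper on Scan (setRowPrefixFilter in the versions I am aware of), so check your version's API before hand-rolling this.

```java
import java.util.Arrays;

final class PrefixStopRowSketch {

    // Smallest row key that is greater than every key starting with the prefix.
    // Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int last = prefix.length - 1;
        while (last >= 0 && prefix[last] == (byte) 0xFF) {
            last--;
        }
        if (last < 0) {
            // Prefix is empty or all 0xFF: no upper bound, so use an empty stop row.
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, last + 1);
        stop[last]++;
        return stop;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stopRowForPrefix(new byte[] {'a', 'b'})));       // [97, 99]
        System.out.println(Arrays.toString(stopRowForPrefix(new byte[] {1, (byte) 0xFF}))); // [2]
        System.out.println(Arrays.toString(stopRowForPrefix(new byte[] {(byte) 0xFF})));    // []
    }
}
```

The prefix itself serves as the start row and the returned array as the stop row.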

Hadoop: Map Reduce: read from HBase, but filter rows by content of one column

I am really new to Hadoop and I am not able to find an answer to my question. I want to write a MapReduce job that reads from HBase and then writes to a simple text file.
In HBase, I've got a column representing an id. I don't want to work on all the rows in my HBase table, only on those between a minId and a maxId.
I found out that I could possibly use filters (scan.setFilter) to filter out rows that don't match my request.
This is my first MapReduce job, so please be patient :-)
I've got a starter class, where I configure the job and the Scan object and then start the job.
Now, my first try looks like this:
private Scan getScan()
{
    final Scan scan = new Scan();
    // ** FILTER **
    List<Filter> filters = new ArrayList<Filter>();
    Filter filter1 = new ValueFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes(Integer.parseInt(minId))));
    filters.add(filter1);
    Filter filter2 = new ValueFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes(Integer.parseInt(maxId))));
    filters.add(filter2);
    FilterList filterList = new FilterList(filters);
    scan.setFilter(filterList);
    scan.setCaching(500);
    scan.setCacheBlocks(false);
    // id
    scan.addColumn("columnfamily".getBytes(), "id".getBytes());
    return scan;
}
Well, I'm not sure if this is the right way to do it. I also read that I could maybe pass my minId and maxId to the map job with the Configuration object, but I'm not sure how.
Besides, what do I have to do afterwards? I would normally just initiate the job with initTableMapperJob and pass the Scan object to it. I've read something about ResultScanner; do I need it? I thought the MapReduce framework would automatically pass the correct rows to my map job. Is that correct?
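One thing worth checking with the filters above: BinaryComparator compares raw bytes, so the range test only matches numeric order if the id column was written as a 4-byte big-endian int (what Bytes.toBytes(int) produces) and the ids are non-negative. The plain-Java sketch below mimics that byte-wise comparison without any HBase classes; the class and helper names are assumptions, and how the ids are actually stored in the table is not stated in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class IdRangeFilterSketch {

    // 4-byte big-endian encoding, the same layout Bytes.toBytes(int) produces.
    static byte[] toBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Lexicographic comparison on unsigned bytes, as a binary comparator would do.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Mimics GREATER_OR_EQUAL minId combined with LESS_OR_EQUAL maxId on a stored value.
    static boolean passes(byte[] storedValue, int minId, int maxId) {
        return compareUnsigned(storedValue, toBytes(minId)) >= 0
            && compareUnsigned(storedValue, toBytes(maxId)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(passes(toBytes(150), 100, 200)); // true
        System.out.println(passes(toBytes(250), 100, 200)); // false
        // Negative bounds break the ordering: -10 encodes as FF FF FF F6.
        System.out.println(passes(toBytes(5), -10, 10));    // false
        // An id stored as text does not compare correctly against int bytes.
        System.out.println(passes("150".getBytes(StandardCharsets.UTF_8), 100, 200)); // false
    }
}
```

If the ids turn out to be stored as strings, the comparator bytes would have to be built from strings as well, ideally zero-padded to a fixed width.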

EF4.1 LINQ, selecting all results

I am new to LINQ queries and to EF too. I usually work with MySQL, and I can't figure out how to write really simple queries.
I'd like to select all results from a table, so I tried this:
ZXContainer db = new ZXContainer();
ViewBag.ZXproperties = db.ZXproperties.All();
But I see that I have to write something inside All(---).
Could someone guide me on how to do that? If someone has a good link for reference too, I would be very thankful.
All() is a boolean evaluation performed on the elements of a collection (it returns false as soon as it reaches an element for which the evaluation is false). For example, say you want to make sure that all of the ZXproperties have a certain field set to true:
bool isTrue = db.ZXproperties.All(z => z.SomeFieldName == true);
This makes isTrue either true or false. LINQ typically uses deferred execution, so if you reference db.ZXproperties directly you have access to all of the objects as is, but nothing is fetched until the query is enumerated, which may not be quite what you're looking for. You can either load all of the objects at the variable assignment with .ToList():
ViewBag.ZXproperties = db.ZXproperties.ToList();
or you can use the below expression:
ViewBag.ZXproperties = from s in db.ZXproperties
select s;
Which is really no different than saying:
ViewBag.ZXproperties = db.ZXproperties;
The advantage of .ToList() is that if you want to make multiple calls on ViewBag.ZXproperties, only the initial database call made when assigning the variable is required. Otherwise, any queryable action on the data, such as .Where(), triggers another query, which is less than ideal if you already have the data to work with.
To select everything, just skip the .All(...), as ZXproperties is already a collection.
ZXContainer db = new ZXContainer();
ViewBag.ZXproperties = db.ZXproperties;
You might want (or sometimes even need) to call .ToList() on this collection before use...
You don't use All. Just type
ViewBag.ZXproperties = db.ZXproperties;
or
ViewBag.ZXproperties = db.ZXproperties.ToList();
The All method is used to determine if all items of collection match some condition.
If you just want all of the items, you can just use it directly:
ViewBag.ZXproperties = db.ZXproperties;
If you want this evaluated immediately, you can convert it to a list:
ViewBag.ZXproperties = db.ZXproperties.ToList();
This will force it to be pulled across the wire immediately.
You can use this:
var result = db.ZXproperties.ToList();
For more information on LINQ, see 101 LINQ Samples.
All performs a check on every item; the argument passed to it is called a lambda expression.
