HBase "between" Filters - hadoop

I'm trying to retrieve rows within a range using a FilterList, but I'm not successful.
Below is my code snippet.
I want to retrieve data between 1000 and 2000.
HTable table = new HTable(conf, "TRAN_DATA");

List<Filter> filters = new ArrayList<Filter>();

SingleColumnValueFilter filter1 = new SingleColumnValueFilter(Bytes.toBytes("TRAN"),
        Bytes.toBytes("TRAN_ID"),
        CompareFilter.CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("1000")));
filter1.setFilterIfMissing(true);
filters.add(filter1);

SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("TRAN"),
        Bytes.toBytes("TRAN_ID"),
        CompareFilter.CompareOp.LESS, new BinaryComparator(Bytes.toBytes("2000")));
filters.add(filter2);

FilterList filterList = new FilterList(filters);

Scan scan = new Scan();
scan.setFilter(filterList);

ResultScanner scanner1 = table.getScanner(scan);
System.out.println("Results of scan #1 - MUST_PASS_ALL:");
int n = 0;
for (Result result : scanner1) {
    for (KeyValue kv : result.raw()) {
        System.out.println("KV: " + kv + ", Value: "
                + Bytes.toString(kv.getValue()));
        n++;
    }
}
scanner1.close();
I tried all the alternatives I could think of:

1. SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("TRANSACTIONS"),
        Bytes.toBytes("TRANS_ID"),
        CompareFilter.CompareOp.LESS, new SubstringComparator("5000"));

2. SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("TRANSACTIONS"),
        Bytes.toBytes("TRANS_ID"),
        CompareFilter.CompareOp.LESS, Bytes.toBytes("5000"));

None of the above approaches work :(

One thing which is certainly off here is that when creating the FilterList you also have to specify a FilterList.Operator; otherwise it is not clear how the FilterList will combine multiple filters. In your case it should be something like:
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);
See if this helps.

Looks OK. Check whether you persisted the values as the strings "1000" and "2000", not as the byte[] encodings of the ints 1000 and 2000.
The rest looks fine to me.
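If you're not sure which encoding was used at write time, a quick check makes the difference visible. Here is a minimal sketch using HBase's Bytes utility (the class name and values are just for illustration):
import org.apache.hadoop.hbase.util.Bytes;

public class EncodingCheck {
    public static void main(String[] args) {
        byte[] asString = Bytes.toBytes("1000"); // 4 bytes: the characters '1','0','0','0'
        byte[] asInt = Bytes.toBytes(1000);      // 4 bytes: big-endian int 0x000003E8
        byte[] asLong = Bytes.toBytes(1000L);    // 8 bytes: big-endian long

        // toStringBinary prints printable characters as-is and everything else as \xNN
        System.out.println(Bytes.toStringBinary(asString)); // 1000
        System.out.println(Bytes.toStringBinary(asInt));    // \x00\x00\x03\xE8
        System.out.println(Bytes.toStringBinary(asLong));   // \x00\x00\x00\x00\x00\x00\x03\xE8
    }
}
Comparators built with one encoding will not match values stored with the other, so the scan must use the same form as the put.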

1. The FilterList default operator is Operator.MUST_PASS_ALL, so your code is fine on that point.
2. If you put the values into HBase as strings, your comparison values are wrong, because the bytes compare lexicographically: "1000" < "2" < "5000".
Put put = new Put(rowKey_ForTest);
put.add(ColumnFamilyName, QName1, Bytes.toBytes("2"));
table.put(put);

List<Filter> filters = new ArrayList<Filter>();

SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
        ColumnFamilyName, QName1, CompareOp.GREATER,
        new BinaryComparator(Bytes.toBytes("1000")));
filters.add(filter1);

SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
        ColumnFamilyName, QName1, CompareOp.LESS,
        new BinaryComparator(Bytes.toBytes("5000")));
filters.add(filter2);

FilterList filterList = new FilterList(filters);

Scan scan = new Scan();
scan.setFilter(filterList);

List<String> resultRowKeys = new ArrayList<String>();
ResultScanner resultScanner = table.getScanner(scan);
for (Result result = resultScanner.next(); result != null; result = resultScanner.next()) {
    resultRowKeys.add(Bytes.toString(result.getRow()));
}
Util.close(resultScanner);

Assert.assertEquals(1, resultRowKeys.size());
3. If you put the ints as bytes, your comparator values are wrong:
you should use Bytes.toBytes(int), not Bytes.toBytes(String).
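For example, if the values were written with Bytes.toBytes(int), the range filters from the question would be built like this (a sketch reusing the TRAN family and TRAN_ID qualifier from the question):
SingleColumnValueFilter lower = new SingleColumnValueFilter(
        Bytes.toBytes("TRAN"), Bytes.toBytes("TRAN_ID"),
        CompareFilter.CompareOp.GREATER,
        new BinaryComparator(Bytes.toBytes(1000))); // int bytes, not the string "1000"
lower.setFilterIfMissing(true);

SingleColumnValueFilter upper = new SingleColumnValueFilter(
        Bytes.toBytes("TRAN"), Bytes.toBytes("TRAN_ID"),
        CompareFilter.CompareOp.LESS,
        new BinaryComparator(Bytes.toBytes(2000))); // int bytes, not the string "2000"

FilterList range = new FilterList(FilterList.Operator.MUST_PASS_ALL,
        Arrays.<Filter>asList(lower, upper));
BinaryComparator compares raw bytes lexicographically, which orders big-endian ints of equal width correctly as long as the values are non-negative.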
My test code is at https://github.com/zhang-xzhi/simplehbase; it contains a lot of tests for HBase.
Alternatively, you can post your put code so we can check how you saved your data to HBase, or debug it yourself and check your value's format.
See if this helps.

Related

How to Insert data in Elastic Search

In my .NET Core project I need to insert data into Elasticsearch. I am using the code below for the insert.
List<Employee> employeeData = null;

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://localhost:5001/api/Employee/GetAll");
request.Method = "GET";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    Stream dataStream = response.GetResponseStream();
    StreamReader reader = new StreamReader(dataStream);
    var data = reader.ReadToEnd();
    reader.Close();
    dataStream.Close();
    employeeData = JsonConvert.DeserializeObject<List<Employee>>(data);
}

var lst = employeeData;
int count = 0;
foreach (var obj in lst)
{
    count++;
    this.client.Index(obj, i => i
        .Index("employee")
        .Type("myEmployee")
        .Id(count)
        // .Refresh()
    );
}
After executing the above code I am using the URL below to check the inserted data:
localhost:9200/emp
I am getting the following output:
{"emp":{"aliases":{},"mappings":{"myEmpl":{"properties":{"firstName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"gEmailId":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"gMobile":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"wmployeeID":{"type":"long"},"registrationNo":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}},"settings":{"index":{"creation_date":"1547020635852","number_of_shards":"5","number_of_replicas":"1","uuid":"6kle4jzMQDSICnPsmATbDw","version":{"created":"6050499"},"provided_name":"emp"}}}}
I am not able to see any of my data. What is the problem here?
localhost:9200/emp returns the settings and mappings for index 'emp'. For the content, try localhost:9200/emp/_search.

HBase Aggregation

I'm having some trouble doing aggregation on a particular column in HBase.
This is the snippet of code I tried:
Configuration config = HBaseConfiguration.create();
AggregationClient aggregationClient = new AggregationClient(config);
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("drs"), Bytes.toBytes("count"));
ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
Long sum = aggregationClient.sum(Bytes.toBytes("DEMO_CALCULATIONS"), ci , scan);
System.out.println(sum);
sum returns a value of null.
The aggregationClient API works fine if I do a rowcount.
I was trying to follow the directions in http://michaelmorello.blogspot.in/2012/01/row-count-hbase-aggregation-example.html
Could there be a problem with using a LongColumnInterpreter when the 'count' field is an int? What am I missing here?
With the default setting you can only sum long (8-byte) values, because in AggregateImplementation's getSum method every returned KeyValue is handled as a long:
List<KeyValue> results = new ArrayList<KeyValue>();
try {
    boolean hasMoreRows = false;
    do {
        hasMoreRows = scanner.next(results);
        for (KeyValue kv : results) {
            temp = ci.getValue(colFamily, qualifier, kv);
            if (temp != null)
                sumVal = ci.add(sumVal, ci.castToReturnType(temp));
        }
        results.clear();
    } while (hasMoreRows);
} finally {
    scanner.close();
}
and in LongColumnInterpreter, getValue returns null for any cell whose value is not exactly 8 bytes:
public Long getValue(byte[] colFamily, byte[] colQualifier, KeyValue kv)
        throws IOException {
    if (kv == null || kv.getValueLength() != Bytes.SIZEOF_LONG)
        return null;
    return Bytes.toLong(kv.getBuffer(), kv.getValueOffset());
}
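An int written with Bytes.toBytes(int) is only 4 bytes, so it fails the length check above, getValue returns null for every cell, and the sum comes back null. For the sum to work, the count cells have to be written as 8-byte longs in the first place. A minimal sketch of a conforming write (assuming the DEMO_CALCULATIONS table and drs:count column from the question):
HTable table = new HTable(config, "DEMO_CALCULATIONS");
Put put = new Put(Bytes.toBytes("row1")); // "row1" is a placeholder row key
// Bytes.toBytes(42L) produces exactly 8 bytes, so getValueLength() == Bytes.SIZEOF_LONG
put.add(Bytes.toBytes("drs"), Bytes.toBytes("count"), Bytes.toBytes(42L));
table.put(put);
table.close();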

Export C# List to Csv file

I was trying to export my C# list to a CSV file. Everything is set up, but the field separator is not working properly: the output shows my string with a " at the end (e.g. 0000324df"). Here is my controller code.
IEnumerable stockexpo = stockexp; // Assign value
MemoryStream output = new MemoryStream();
StreamWriter writer = new StreamWriter(output, Encoding.UTF8);
writer.Write("ItemNo,");
writer.Write("Repeat Count");
writer.WriteLine();
foreach (StockResult order in stockexpo)
{
    writer.Write(String.Format("{0:d}", order.ItemNumber));
    writer.Write("\"");
    writer.Write(",");
    writer.Write("\"");
    writer.Write(order.Count);
    writer.Write("\"");
    writer.Write(",");
    writer.WriteLine();
}
writer.Flush();
output.Position = 0;
return File(output, "text/comma-separated-values", "stockexp.csv");
I need to know how I can separate the field values appropriately. Can anyone help me with this?
writer.Write("\"");
This line of code outputs a " every time, and the quotes are never balanced around the item number, which is why a stray " ends up at the end of the field (e.g. 0000324df"). Why have it at all?
Also, I wouldn't write a comma before the WriteLine, since there is no need for a delimiter at the end of each line.
IEnumerable stockexpo = stockexp; // Assign value
MemoryStream output = new MemoryStream();
StreamWriter writer = new StreamWriter(output, Encoding.UTF8);
writer.Write("ItemNo,");
writer.Write("Repeat Count");
writer.WriteLine();
foreach (StockResult order in stockexpo)
{
    writer.Write(order.ItemNumber);
    writer.Write(",");
    writer.Write(order.Count);
    writer.WriteLine();
}
writer.Flush();
output.Position = 0;
return File(output, "text/comma-separated-values", "stockexp.csv");

When creating and loading HFile programmatically to HBase new entries are unavailable

I'm trying to create HFiles programmatically and load them into a running HBase instance. I found a lot of info in HFileOutputFormat and LoadIncrementalHFiles.
I managed to create the new HFile and send it to the cluster. In the cluster web interface the new store file appears, but the new key range is unavailable.
InputStream stream = ProgrammaticHFileGeneration.class.getResourceAsStream("ga-hourly.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
String line = null;
Map<byte[], String> rowValues = new HashMap<byte[], String>();

while ((line = reader.readLine()) != null) {
    String[] vals = line.split(",");
    String row = new StringBuilder(vals[0]).append(".").append(vals[1]).append(".")
            .append(vals[2]).append(".").append(vals[3]).toString();
    rowValues.put(row.getBytes(), line);
}

List<byte[]> keys = new ArrayList<byte[]>(rowValues.keySet());
Collections.sort(keys, byteArrComparator);

HBaseTestingUtility testingUtility = new HBaseTestingUtility();
testingUtility.startMiniCluster();
testingUtility.createTable("table".getBytes(), "data".getBytes());

Writer writer = new HFile.Writer(testingUtility.getTestFileSystem(),
        new Path("/tmp/hfiles/data/hfile"),
        HFile.DEFAULT_BLOCKSIZE, Compression.Algorithm.NONE, KeyValue.KEY_COMPARATOR);

for (byte[] key : keys) {
    writer.append(new KeyValue(key, "data".getBytes(), "d".getBytes(), rowValues.get(key).getBytes()));
}

writer.appendFileInfo(StoreFile.BULKLOAD_TIME_KEY, Bytes.toBytes(System.currentTimeMillis()));
writer.appendFileInfo(StoreFile.MAJOR_COMPACTION_KEY, Bytes.toBytes(true));
writer.close();

Configuration conf = testingUtility.getConfiguration();
LoadIncrementalHFiles loadTool = new LoadIncrementalHFiles(conf);
HTable hTable = new HTable(conf, "table".getBytes());
loadTool.doBulkLoad(new Path("/tmp/hfiles"), hTable);

ResultScanner scanner = hTable.getScanner("data".getBytes());
Result next = null;
System.out.println("Scanning");
while ((next = scanner.next()) != null) {
    System.out.format("%s %s\n", new String(next.getRow()),
            new String(next.getValue("data".getBytes(), "d".getBytes())));
}
Did anyone actually get this to work? I have a compilable / testable version up on my GitHub.
Take a look at the LoadIncrementalHFiles test in the HBase source code: https://github.com/apache/hbase/blob/7c46646994b7a9d6f947cf12796579ef48d0b0bd/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java

Update using LINQ to SQL

How can I update a record against a specific ID in LINQ to SQL?
LINQ is a query tool (the Q is for Query), so there is no magic LINQ way to update just a single row except through the (object-oriented) data context (in the case of LINQ-to-SQL). To update data, you need to fetch it out, update the record, and submit the changes:
using (var ctx = new FooContext()) {
    var obj = ctx.Bars.Single(x => x.Id == id);
    obj.SomeProp = 123;
    ctx.SubmitChanges();
}
Or write an SP that does the same in TSQL, and expose the SP through the data-context:
using (var ctx = new FooContext()) {
    ctx.UpdateBar(id, 123);
}
In the absence of more detailed info:
using (var dbContext = new dbDataContext())
{
    var data = dbContext.SomeTable.SingleOrDefault(row => row.id == requiredId);
    if (data != null)
    {
        data.SomeField = newValue;
    }
    dbContext.SubmitChanges();
}
AdventureWorksDataContext db = new AdventureWorksDataContext();
db.Log = Console.Out;
// Get the customer record with ID 5
Customer c = (from cust in db.Customers
              where cust.CustomerID == 5
              select cust).Single();
Console.WriteLine(c.CustomerType);
c.CustomerType = 'I';
db.SubmitChanges(); // Save the changes away
DataClassesDataContext dc = new DataClassesDataContext();
FamilyDetail fd = dc.FamilyDetails.Single(p => p.UserId == 1);
fd.FatherName = txtFatherName.Text;
fd.FatherMobile = txtMobile.Text;
fd.FatherOccupation = txtFatherOccu.Text;
fd.MotherName = txtMotherName.Text;
fd.MotherOccupation = txtMotherOccu.Text;
fd.Phone = txtPhoneNo.Text;
fd.Address = txtAddress.Text;
fd.GuardianName = txtGardianName.Text;
dc.SubmitChanges();
I found a workaround a week ago: you can issue direct commands with ExecuteCommand:
MDataContext dc = new MDataContext();
var flag = (from f in dc.Flags
            where f.Code == Code
            select f).First();
_refresh = Convert.ToBoolean(flagRefresh.Value);
if (_refresh)
{
    dc.ExecuteCommand("update Flags set value = 0 where code = {0}", Code);
}
In the ExecuteCommand statement you can send the query directly, with the value for the specific record you want to update:
value = 0 --> 0 is the new value for the field;
code = {0} --> code is the column to filter on;
Code --> the variable holding the value that identifies the record to update;
I hope this reference helps.
