I am trying to get all row keys that match a particular pattern from an HBase table. The row keys have the format abc:def:ghi.
I am trying to write Java code that returns all row keys starting with "abc". If anyone can give me an idea of how to do this, with a filter or any other method, I would be very grateful.
Thanks
Ashit
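One common way to approach this is a prefix scan: HBase keeps row keys sorted by bytes, so every key starting with "abc" lies in the range from "abc" (inclusive) to "abd" (exclusive). The HBase client provides PrefixFilter for this; alternatively, the scan can be given explicit start and stop rows. The sketch below does not call HBase at all. It only shows how the stop row can be derived from a prefix, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRange {

    /**
     * Returns the exclusive stop row for a scan over all keys that start with
     * the given prefix: trailing 0xFF bytes are dropped and the last remaining
     * byte is incremented. An empty array is returned when no upper bound
     * exists (the prefix is empty or consists only of 0xFF bytes).
     */
    public static byte[] stopRowFor(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowFor(start);
        // The scan range would be [start, stop).
        System.out.println(new String(start, StandardCharsets.UTF_8)
                + " .. " + new String(stop, StandardCharsets.UTF_8));
    }
}
```

Note that the prefix "abc" also matches keys such as abcd:x:y. If only keys whose first segment is exactly abc are wanted, using "abc:" as the prefix is probably closer to the intent.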
Related
I have a CSV file of more than 1M records written in English plus another language. I have to build a UI that takes a keyword, searches the document, and returns the records where that keyword appears. I look for the keyword in two columns only.
Here is how I implemented it:
First, I created a Postgres database for the data stored in the CSV file. Then I built a classic website where the user can enter a keyword. This is the SQL query that I use (in Spring Boot):
SELECT * FROM table WHERE col1 LIKE %:keyword% OR col2 LIKE %:keyword%;
Right now it works perfectly fine, but I was wondering how to make the search faster. Was using SQL a better choice than a classic document search?
If the document is searched only once and then thrown away, loading it into a database is unnecessary overhead. Instead, you can search the file directly with a parallel stream over Files.lines, which uses multiple threads to search the file concurrently:
List<Record> result = Files.lines(Paths.get("some/path"))
.parallel()
.unordered()
.map(l -> lineToRecord(l))
.filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
.collect(Collectors.toList());
NOTE: you need to provide the lineToRecord() method and the Record class.
If the document is going to be searched over and over again, consider indexing it. This means pre-processing the document to suit the search requirements; in this case, the keywords of col1 and col2. An index is like a map in Java, e.g.:
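For completeness, here is a minimal sketch of what the missing pieces could look like. It assumes that col1 and col2 are the first two comma-separated fields of each line and that fields never contain quoted commas; a real CSV parser would be needed otherwise. The class name CsvSearch and the split between search() and searchFile() are illustrative choices, not part of the original snippet.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvSearch {

    /** Holds the two searchable columns of one line. */
    public static final class Record {
        private final String col1;
        private final String col2;

        public Record(String col1, String col2) {
            this.col1 = col1;
            this.col2 = col2;
        }

        public String getCol1() { return col1; }
        public String getCol2() { return col2; }
    }

    /** Naive parser (assumption): plain comma split, no quoting rules. */
    public static Record lineToRecord(String line) {
        String[] parts = line.split(",", -1);
        String col1 = parts.length > 0 ? parts[0] : "";
        String col2 = parts.length > 1 ? parts[1] : "";
        return new Record(col1, col2);
    }

    /** Filters a stream of lines in parallel; result order is not guaranteed. */
    public static List<Record> search(Stream<String> lines, String keyword) {
        return lines.parallel()
                .unordered()
                .map(CsvSearch::lineToRecord)
                .filter(r -> r.getCol1().contains(keyword) || r.getCol2().contains(keyword))
                .collect(Collectors.toList());
    }

    /** Opens the file, searches it, and closes it again. */
    public static List<Record> searchFile(Path file, String keyword) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return search(lines, keyword);
        }
    }
}
```

The try-with-resources block matters here because the stream returned by Files.lines holds an open file handle until it is closed.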
Map<String, Record> col1Index
But since you have "LIKE" semantics, this is not so easy: splitting the string on whitespace is not enough, because the keyword could match a substring. In this case it is probably best to use a tool built for the job, typically something like Solr/Lucene.
Databases can also provide similar functionality, e.g.: https://www.postgresql.org/docs/current/pgtrgm.html
For LIKE queries, look at a GIN index with the gin_trgm_ops operator class, which comes from the pg_trgm extension. You shouldn't need to change the query at all; just build an index on each column, or maybe one multi-column index.
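To give a feel for why a trigram index helps with substring (LIKE '%keyword%') searches, here is a simplified in-memory sketch of the idea in Java. It is not how pg_trgm is implemented (pg_trgm also normalizes and pads words, for instance); the class TrigramIndex and its methods are hypothetical. Each row is indexed under every 3-character substring it contains, a query intersects the row sets of the keyword's trigrams, and the remaining candidates are rechecked with contains().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TrigramIndex {
    private final List<String> rows = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** All distinct 3-character substrings of s (empty if s is shorter than 3). */
    public static Set<String> trigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    /** Stores the row and registers its id under each of its trigrams. */
    public void add(String text) {
        int id = rows.size();
        rows.add(text);
        for (String t : trigrams(text)) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    /** Returns rows containing keyword as a substring, in insertion order. */
    public List<String> search(String keyword) {
        Set<Integer> candidates = new TreeSet<>();
        Set<String> grams = trigrams(keyword);
        if (grams.isEmpty()) {
            // Keyword too short for the index: fall back to checking every row.
            for (int id = 0; id < rows.size(); id++) {
                candidates.add(id);
            }
        } else {
            boolean first = true;
            for (String g : grams) {
                Set<Integer> ids = postings.getOrDefault(g, Collections.emptySet());
                if (first) {
                    candidates.addAll(ids);
                    first = false;
                } else {
                    candidates.retainAll(ids);
                }
            }
        }
        // Recheck: sharing all trigrams does not guarantee a real substring match.
        List<String> hits = new ArrayList<>();
        for (int id : candidates) {
            if (rows.get(id).contains(keyword)) {
                hits.add(rows.get(id));
            }
        }
        return hits;
    }
}
```

The short-keyword fallback mirrors a general limitation of the approach: a keyword with fewer than three characters yields no trigrams, so the index cannot narrow the candidates.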
I just started working with Graylog and I have some issues.
Can I write a query that returns only logs with a unique identifier?
For example, I have logs with op_id and loan_amt, and I want the sum of loan_amt over all logs. Here is the problem: some logs share the same op_id, so my sum will be wrong because the loan_amt of logs with the same op_id is added multiple times.
Can you help me, please?
If I understand correctly, you need to narrow down your search criteria further to filter out duplicate log entries.
You can use the Graylog search query language to do this.
Try to find fields in which the duplicate logs differ from each other, then create a filter that excludes one of them from your results.
For example, something like this:
source:hostname.that.logs.loans_amt AND LoggerName:your.logger.that.logs.loan_amt
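If no field cleanly separates the duplicates, another option is to deduplicate outside the query. The following Java sketch assumes the matching log entries have already been fetched or exported into a list, and that loan_amt is the same for all entries sharing an op_id, so keeping the first one seen is enough. The LogEntry class and the method name are hypothetical, and amounts are modeled as whole numbers for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueLoanSum {

    /** Minimal stand-in for one log message with the two relevant fields. */
    public static final class LogEntry {
        final String opId;
        final long loanAmt;

        public LogEntry(String opId, long loanAmt) {
            this.opId = opId;
            this.loanAmt = loanAmt;
        }
    }

    /** Sums loan_amt, counting each op_id only once (first occurrence wins). */
    public static long sumPerOperation(List<LogEntry> logs) {
        Map<String, Long> firstAmountByOp = new LinkedHashMap<>();
        for (LogEntry e : logs) {
            firstAmountByOp.putIfAbsent(e.opId, e.loanAmt);
        }
        long total = 0;
        for (long amount : firstAmountByOp.values()) {
            total += amount;
        }
        return total;
    }
}
```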
I am saving a list of data as below in HBase, with a unique id along with the column family name:
I can query the address column family with a specific id, but I want to query on a JSON value, like
where homenumber = 4
Can we do that? Any example would be helpful.
Thanks
You can use an HBase filter for this. See these possible duplicate questions:
Scan with filter using HBase shell
Scan HTable rows for specific column value using HBase shell
To get started with HBase filters, refer to:
http://hbase.apache.org/0.94/book/client.filter.html
http://www.hadooptpoint.org/filters-in-hbase-shell/
I am using the JDBC river to pull data into Elasticsearch from an Oracle database.
As mentioned in the following link, a left join can be used to collect multiple values of one column for the same primary id into a single JSON array. But if there is only one record after the left join, the river doesn't create an array; it puts the value directly in the JSON field.
This makes it hard for NEST to determine the type of the object.
https://github.com/jprante/elasticsearch-jdbc#structured-objects
So, is there any way to force some fields to be arrays even if they have just one value?
There is a way to do this using the bracket notation, as described here: JDBC river Bracket Notation
So basically, if your SQL query has
select tag as "tag.name" from tags
you need to change it to
select tag as "tag[name]" from tags
Hope this helps.
I am using two different indices, named cdr_mobile and cdr_volte, that have lots of common fields. To show the data in Kibana, I created an alias over both of them, cdr_alias, and I now use cdr_alias to retrieve data in Kibana.
The problem is that I cannot show common fields in the same column. Instead I get, for example, cdr_volte.startOfCharge and startOfCharge (the other thing I don't understand is that it uses cdr_mobile as the default).
Do you have any idea how I can put the common fields in the same column?
Thank you
I had made an error in the mapping. To put the same fields in the same column, the field must have exactly the same mapping in both indices.