Search Multiple Indexes with condition - elasticsearch

Here is requirement I am working on
There are multiple indexes with name content_ssc, content_teal, content_mmy.
These indexes can have common data (co_code is one of the field in the documents of these indexes)
a. content_ssc can have documents with co_code = teal/ssc/mmy
b. content_mmy can have documents with co_code = ssc/mmy
I need to get the data using below condition (this is one of the approach to get the unique data from these indexes)
a. (Index = content_ssc and site_code = ssc) OR (Index = content_mmy and site_code = mmy)
Basically I am getting a duplicate data from these indexes currently so I need any solution which should fetch unique data from these indexes using the above condition.
I have tried using boolean query with multiple indices from this link but it didn't produce unique result.
Please suggest.

You can use distinct query , and you will get unique result

Related

NRediSearch - Getting total documents matched count

Is there a way to get a total results count when calling Aggregate function?
Note that I'm not using Aggregate function to aggregate results, but as an advanced search query, because Search function does not allow to sort by multiple fields.
RediSearch returns total documents matched count, but I can't find a way to get this number using NRediSearch library.
With NRediSearch
Using NRediSearch, you would need to build and execute aggregation that will run a GROUPBY 0 and the COUNT reducer, say you have a person-idx index and you want to count all the Person documents in Redis:
var client = new Client("person-idx", muxer.GetDatabase());
var result = await client.AggregateAsync(new AggregationBuilder().GroupBy(new List<string>(), new List<Reducer>{Reducers.Count()}));
Console.WriteLine(result.GetResults().First().Values.First());
Will get the count you are looking for.
With Redis.OM
There's a newer library Redis.OM which you can also use to make these aggregations a bit simpler, the same operation would be done with the following:
var peopleAggregations = provider.AggregationSet<Person>();
Console.WriteLine(peopleAggregations.Count());

how to change hbase table scan results order

I am trying to copy specific data from one hbase table to another which requires scanning the table for only rowkeys and parsing a specific value from there. It works fine but I noticed the results seem to be returned in ascending sort order & in this case alphabetically. Is there a way to specify a reverse order or perhaps by insert timestamp?
Scan scan = new Scan();
scan.setMaxResultSize(1000);
scan.setFilter(new FirstKeyOnlyFilter());
ResultScanner scanner = TestHbaseTable.getScanner(scan);
for(Result r : scanner){
System.out.println(Bytes.toString(r.getRow()));
String rowKey = Bytes.toString(r.getRow());
if(rowKey.startsWith("dm.") || rowKey.startsWith("bk.") || rowKey.startsWith("rt.")) {
continue;
} else if(rowKey.startsWith("yt")) {
List<String> ytresult = Arrays.asList(rowKey.split("\\s*.\\s*"));
.....
This table is huge so I would prefer to skip to the rows I actually need. Appreciate any help here.
Have you tried the .setReversed() property of the Scan? Keep in mind that in this case your start row would have to be the logical END of your rowKey range, and from there it would scan 'upwards'.

Elasticsearch 5 - Field distinct values without aggregation

I'm working with a time-based index storing syslog events.
All the data is coming from different sources (PCs).
Suppose I have this kind of events:
timestamp = 0
source = PC-1
event = event_type_1
timestamp = 1
source = PC-1
event = event_type_1
timestamp = 1
source = PC-2
event = event_type_1
I want to make a query that will retrieve all the distinct value of "source" field for documents where match event = event_type_1
I am expecting to have all exact values (no approximations).
To achieve it I have written a cardinality query with an aggregation specifying the correct size, because I have no prior knowledge of the number of distinct sources. I think this is a expensive work to do as it consumes a lot of memory.
Is there any other alternative to get this done?

Searching without duplication - aggregations and tophit

I am beginning with ElasticSearch and really like it, hovewer I am stuck with quite simple scenario.
I am indexing such structure of a Worker:
NAME SURENAME ID AGE SEX NAME_SURENAME BIRTH_DATE
NAME_SURENAME - not analyzed - this field is indexed for grouping purposes
NAME, SURENAME - analyzed
The task is simple - search 5 unique workers sorted by birth_date (unique means the same name and surename, even if they are in different age and are different people)
I read about aggregation queries and as I understand, I can get only aggregations without documents. Unfortunatelly I aggregate by name and surename so I won't have other fields in results in buckets, like for example document ID field at least. But I also read about TopHit aggregation, that it returns document, and i tried it - the second idea below.
I have two ideas
1) Not use aggregations, just search 5 workers, filter duplicates in java and again search workers and filter duplicates in Java till I reach 5 unique results
2) Use aggregations. I event tried it like below, it even works on test data but since it is my first time, please advice, whether it works accidentially or it is done correctly? So generally I thought I could get 5 buckets with one TopHit document. I have no idea how TopHit document is chosen but it seems to work. Below is the code
String searchString = "test";
BoolQueryBuilder query = boolQuery().minimumNumberShouldMatch(1).should(matchQuery("name", searchString).should(matchQuery("surename", searchString));
TermsBuilder terms = AggregationBuilders.terms("namesAgg").size(5);
terms.field("name_surename");
terms.order(Terms.Order.aggregation("birthAgg", false)).subAggregation(AggregationBuilders.max("birthAgg")
.field("birth_date")
.subAggregation(AggregationBuilders.topHits("topHit").setSize(1).addSort("birth_date", SortOrder.DESC));
SearchRequestBuilder searchRequestBuilder = client.prepareSearch("workers")
.addAggregation(terms).setQuery(query).setSize(1).addSort(SortBuilders.fieldSort("birth_date")
.order(SortOrder.DESC));
Terms aggregations = searchRequestBuilder.execute().actionGet().getAggregations().get("namesAgg");
List<Worker> results = new ArrayList<>();
for (Terms.Bucket bucket : aggregations.getBuckets()) {
Optional<Aggregation> first = bucket.getAggregations().asList().stream().filter(aggregation -> aggregation instanceof TopHits).findFirst();
SearchHit searchHitFields = ((TopHits) first.get()).getHits().getHits()[0];
Transformer<SearchHit, Worker> transformer = transformers.get(Worker.class);
Worker transform = transformer.transform(searchHitFields);
results.add(transform);
}
return results;//

Propel, selecting just one column

I am trying to run a query in propel that runs an aggregate function (SUM).
My Code
$itemQuery = SomeEntity::Create();
$itemQuery->withColumn('SUM(SomeColumn)', someColumn)
->groupBy(SomeForeignKey);
Problem
It should theoretically return the sum of every group of items but the problem is propel tries to fetch all columns, and also appends a bunch of other columns to the group by clause. This results in an unexpected categorisation and therefore the sum is incorrect.
Is there anyway to make propel fetch just the column I am running the aggregation function on so that the group by statement works as well?
You need to add a select statement for the column and the foreign key:
$itemQuery = SomeEntity::Create();
$itemQuery->select(array(SomeColumn, SomeForeignKey));
$itemQuery->withColumn('SUM(SomeColumn)', someColumn);
$itemQuery->groupBy(SomeForeignKey);

Resources