How to get unique records from Elasticsearch based on a field - elasticsearch

I have an Elasticsearch index that stores the list of restaurants in an area. I'm using Spring Data Elasticsearch to query restaurants within 10 miles of a given geo-location (lat/long). I have a requirement to show each restaurant chain only once: my search results contain multiple records for a chain because its locations have the same name but different addresses. I only need to show the nearest location of each chain, along with the other unique restaurants. Is there a single query that can do that? Below is my code [removed some stuff for brevity!]
// assumes static imports of QueryBuilders.boolQuery/termsQuery and AggregationBuilders.terms
public SearchHits<Results> search(List<String> items){
    final NativeSearchQueryBuilder searchQuery = new NativeSearchQueryBuilder();
    BoolQueryBuilder boolQuery = boolQuery();
    BoolQueryBuilder itemsQuery = boolQuery();
    // `entry` comes from the code removed for brevity
    itemsQuery.should(termsQuery(entry.getKey(), items));
    boolQuery.must(itemsQuery);
    // ...I do additional logic here
    searchQuery.withQuery(boolQuery);
    // apply the terms aggregation
    searchQuery.addAggregation(terms(CATEGORIES_KEY).field(CATEGORY).size(BUCKET_SIZE));
    Query query = searchQuery.build();
    SearchHits<Results> searchHits = elasticsearchTemplate.search(query, Results.class);
    return searchHits;
}

I was going through the Elasticsearch documentation, and it turns out there is a simple fix for that :) I can use field collapsing (the collapse option), which removes duplicate hits based on the value of a field. So I only needed to add this line:
searchQuery.withCollapseField("restaurant_name");
// restaurant_name is the field I want unique values on
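Two details of field collapsing matter for the "nearest location" requirement. Collapse keeps the top hit of each group according to the request's sort, so the request should be sorted by geo distance (a _geo_distance sort on the location field) for the kept hit to be the nearest one. The collapse field must also be a keyword or numeric field with doc values, so if restaurant_name is mapped as text you would collapse on its keyword sub-field instead (for example restaurant_name.keyword, assuming the mapping defines one). The plain-Java sketch below does not call Elasticsearch; it only reproduces in memory what sort-then-collapse returns, using a hypothetical Restaurant class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of "sort by distance, then collapse on the name".
// Restaurant is a hypothetical stand-in for the indexed document.
class CollapseDemo {
    static final class Restaurant {
        final String name;
        final double distanceMiles; // assumed: distance from the search point

        Restaurant(String name, double distanceMiles) {
            this.name = name;
            this.distanceMiles = distanceMiles;
        }
    }

    // Keeps the first hit per name after sorting by ascending distance, i.e. the
    // nearest location of each chain plus every restaurant that appears only once.
    static List<Restaurant> nearestPerName(List<Restaurant> hits) {
        List<Restaurant> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(r -> r.distanceMiles));
        Map<String, Restaurant> firstByName = new LinkedHashMap<>();
        for (Restaurant r : sorted) {
            firstByName.putIfAbsent(r.name, r); // farther duplicates are dropped
        }
        return new ArrayList<>(firstByName.values());
    }
}
```

Without the distance sort, the hit kept per chain would simply be the highest-scoring one, which is not necessarily the nearest.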

Related

MongoDB: retrieve records based on only day and month

I am new to writing aggregate queries in MongoDB + Spring.
Scenario: we store birthDate (java.util.Date) in MongoDB, where it is stored as an ISO date. Now we are trying to find the records that match on dayOfMonth and month only, so that we can get the corresponding objects from the list.
I went through a few solutions, and here is what I am trying, but it returns a null set of records.
Aggregation agg = Aggregation.newAggregation(
Aggregation.project().andExpression("dayOfMonth(birthDate)").as("day").andExpression("month(birthDate)")
.as("month"),
Aggregation.group("day", "month"));
AggregationResults<Employee> groupResults = mongoTemplate.aggregate(agg, Employee.class, Employee.class);
I also tried applying a query with the help of Criteria, but this also gives me an Employee object with all-null content.
Aggregation agg = Aggregation.newAggregation(Aggregation.match(Criteria.where("birthDate").lte(new Date())), Aggregation.project().andExpression("dayOfMonth(birthDate)").as("day").andExpression("month(birthDate)")
.as("month"),
Aggregation.group("day", "month"));
AggregationResults<Employee> groupResults = mongoTemplate.aggregate(agg, Employee.class, Employee.class);
I must be missing something important that is giving me this null data.
Additional info: the Employee object has only birthDate (Date) and email (String) in it.
Please try to specify the fields to be included in the $project stage.
project("birthDate", "...").andExpression("...
The _id field is, by default, included in the output documents. To include any other fields from the input documents in the output documents, you must explicitly specify the inclusion in $project.
see: MongoDB Reference - $project (aggregation)
I've created DATAMONGO-2200 to add an option to project directly onto the fields of a given domain type via something like project(Employee.class).
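The nulls have a concrete cause: after $group("day", "month") each output document holds only an _id made of the computed day and month, so mapping it back onto Employee leaves birthDate and email empty. It can also help to pin down what the day/month match should select. The plain-Java sketch below filters java.util.Date values by month and day of month while ignoring the year; it does not touch MongoDB, and the class and method names are hypothetical. It converts in UTC because MongoDB's $month and $dayOfMonth operators use UTC unless a timezone is supplied, so a birthDate stored near local midnight can land on a different day than expected.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration of the day/month match; no MongoDB involved.
class BirthdayFilter {
    // Keeps the dates whose month and day of month equal the given values, ignoring the year.
    // Conversion is done in UTC, matching MongoDB's default for $month and $dayOfMonth.
    static List<Date> matchingDayAndMonth(List<Date> birthDates, int month, int dayOfMonth) {
        return birthDates.stream()
                .filter(d -> {
                    LocalDate utcDate = d.toInstant().atZone(ZoneOffset.UTC).toLocalDate();
                    return utcDate.getMonthValue() == month && utcDate.getDayOfMonth() == dayOfMonth;
                })
                .collect(Collectors.toList());
    }
}
```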

Elasticsearch QueryBuilder: not all fields always present

I'm trying to use a QueryBuilder, but some fields are not always needed.
.setQuery(QueryBuilders.boolQuery()
.must(termQuery("country", countryName))
.must(termQuery("Region", regionName))
.must(termQuery("City", city))
.must(rangeQuery("persons").from(persons)))
.get();
In the example above, city might not always be needed, but if I leave it empty, it searches for an empty city. This is just for city, but I expect 10+ fields later on.
Can I somehow add clauses to the builder conditionally, or is there another smart way?
You can build your query first and then pass it to the search request. While building it, you can conditionally add clauses to the query. It will look like this:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.must(termQuery("country", countryName))
.must(termQuery("Region", regionName))
.must(rangeQuery("persons").from(persons));
if (city != null && !city.trim().isEmpty()) {
    // only filter on City when a non-blank value was supplied
    queryBuilder.must(termQuery("City", city));
}
.setQuery(queryBuilder); // add the query to your search request
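With 10+ optional fields, the null/blank check can be written once instead of once per field. The sketch below uses a hypothetical helper, OptionalFilters.nonBlank, that keeps only the field/value pairs that were actually supplied; it is plain Java and does not call Elasticsearch itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: filters the optional inputs down to the ones that have a value.
class OptionalFilters {
    static Map<String, String> nonBlank(Map<String, String> candidates) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps insertion order
        for (Map.Entry<String, String> e : candidates.entrySet()) {
            String value = e.getValue();
            // skip null, empty and whitespace-only values
            if (value != null && !value.trim().isEmpty()) {
                result.put(e.getKey(), value);
            }
        }
        return result;
    }
}
```

Each surviving entry would then become one clause, for example OptionalFilters.nonBlank(filters).forEach((field, value) -> queryBuilder.must(termQuery(field, value))); where filters is a map you fill with the optional inputs (a LinkedHashMap rather than Map.of, since Map.of rejects null values).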

Spring Data Elasticsearch - Query - Full-text search

I am trying to use Elasticsearch for full-text search and Spring Data for integrating Elasticsearch with my application.
For example,
there are 6 fields to be indexed:
1) firstName
2) lastName
3) title
4) location
5) industry
6) email
http://localhost:9200/test/_mapping/
I can see these fields in the mapping.
Now, I would like to make a search against these fields with a search input.
For example, When I search "mike 123", it has to search against all these 6 fields.
In the Spring Data repository,
the method below works for searching only in firstName.
Collection<Object> findByFirstNameLike(String searchInput)
But, I would like to search against all the fields.
I tried,
Collection<Object> findByFirstNameLikeOrLastNameLikeOrTitleLikeOrLocationLikeOrIndustryLikeOrEmailLike(String searchInput, String searchInput1, String searchInput2, String searchInput3, String searchInput4, String searchInput5)
Here, even though the input string is the same, I need to pass it as 6 parameters. Also, the method name gets long with multiple fields.
Is there any way to make it simpler with @Query or something similar?
Like,
Collection<Object> findByInput(String inputString)
Also, one of the fields should be boosted.
For example,
when I search for "mike mat", a match in firstName should come first in the results, even if there are exact matches in the other fields.
Thanks
Let's suppose your search term is in the variable query; you can then use the search method of ElasticsearchRepository:
repo.search(queryStringQuery(query))
To use queryStringQuery, add the following import:
import static org.elasticsearch.index.query.QueryBuilders.queryStringQuery;
I found a way to achieve this and am posting it here. Hope this helps.
QueryBuilder queryBuilder = boolQuery().should(
queryString("Mike Mat").analyzeWildcard(true)
.field("firstName", 2.0f).field("lastName").field("title")
.field("location").field("industry").field("email"));
Thanks
I'm not a Spring Data Elasticsearch expert, but I see two directions you can go. The first would be to use the @Query option, so you can write your own query. The second would be to use the example in the Filter Builder section:
http://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.misc.filter
Within Elasticsearch, you would want to use the multi_match query:
http://www.elastic.co/guide/en/elasticsearch/reference/1.5/query-dsl-multi-match-query.html
In java such a query would look like this:
QueryBuilder qb = multiMatchQuery(
"kimchy elasticsearch",
"user", "message"
);
Example coming from: http://www.elastic.co/guide/en/elasticsearch/client/java-api/current/query-dsl-queries.html#multimatch
We can write our own custom query as below.
We can specify the index and a routing value (the routing value is used if an alias is used).
SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices(INDEX)
.withRoute(yourQueryBuilderHelper.getRouteValue())
.withQuery(yourQueryBuilderHelper.buildQuery(yourSearchFilterRequestObject))
.withFilter(yourQueryBuilderHelper.buildFilter(yourSearchFilterRequestObject)).withTypes(TYPE)
.withSort(yourQueryBuilderHelper.buildSortCriteria(yourSearchFilterRequestObject))
.withPageable(yourQueryBuilderHelper.buildPaginationCriteria(yourSearchFilterRequestObject)).build();
FacetedPage<YourDocumentEntity> searchResults = elasticsearchTemplate.queryForPage(searchQuery, YourDocumentEntity.class);
It's good to use your own query-builder helper, which separates the query-building responsibility from your elasticSearchService.
Hope this helps.
Thanks
The QueryBuilder class is helpful for querying Elasticsearch from a Spring DAO:
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryBuilder;
QueryBuilder qb = QueryBuilders.boolQuery()
    .must(QueryBuilders.termQuery("state", "KA"))
    .must(QueryBuilders.termQuery("content", "test4"))
    .mustNot(QueryBuilders.termQuery("content", "test2"))
    .should(QueryBuilders.termQuery("content", "test3"));
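One behavior of the bool query above is worth spelling out: because it has must clauses, the should clause does not restrict the results; it only raises the score of documents that also contain test3. Without any must (or filter) clauses, at least one should clause has to match. The plain-Java sketch below models just that matching rule over a set of "field:value" terms (scoring and filter clauses are left out); it is an illustration, not Elasticsearch's implementation.

```java
import java.util.List;
import java.util.Set;

// In-memory model of bool-query matching over "field:value" terms.
// Only must, must_not and should are modeled; scoring and filter clauses are left out.
class BoolMatch {
    static boolean matches(Set<String> docTerms, List<String> must,
                           List<String> mustNot, List<String> should) {
        // every must clause has to match
        for (String term : must) {
            if (!docTerms.contains(term)) return false;
        }
        // no must_not clause may match
        for (String term : mustNot) {
            if (docTerms.contains(term)) return false;
        }
        // without must clauses, at least one should clause is required;
        // with must clauses, should clauses are optional (they only affect scoring)
        if (must.isEmpty() && !should.isEmpty()) {
            return should.stream().anyMatch(docTerms::contains);
        }
        return true;
    }
}
```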
Try something like this; you can even set the importance (boost) of each field:
QueryBuilder queryBuilder = QueryBuilders.multiMatchQuery(query)
.field("name", 2.0f)
.field("email")
.field("title")
.field("jobDescription", 3.0f)
.type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX);
Another way is to use a query_string query:
Query searchQuery = new StringQuery(
    "{\"query_string\":{\"query\":\"" + yourQuery + "\"}}"); // yourQuery holds the search text; it is not escaped here
SearchHits<Product> products = elasticsearchOperations.search(
searchQuery,
Product.class,
IndexCoordinates.of(PRODUCT_INDEX_NAME));
This will search all fields of the documents in the specified index.
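Building the JSON by plain string concatenation breaks as soon as the search text contains a double quote or backslash. The sketch below builds a multi_match query body with minimal JSON escaping and uses Elasticsearch's field^boost syntax (for example firstName^2) to weight matches in one field more heavily, which addresses the boosting requirement in the question. The class name and field list are illustrative assumptions; the resulting string is the kind of query body one would pass to new StringQuery(...).

```java
import java.util.List;
import java.util.stream.Collectors;

// Builds a multi_match query body as a JSON string; "name^2" is the per-field boost syntax.
class MultiMatchJson {
    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    static String build(String userInput, List<String> fields) {
        String fieldList = fields.stream()
                .map(f -> "\"" + escape(f) + "\"")
                .collect(Collectors.joining(","));
        return "{\"multi_match\":{\"query\":\"" + escape(userInput)
                + "\",\"fields\":[" + fieldList + "]}}";
    }
}
```

A boost raises the score of firstName matches relative to the other fields; it does not strictly guarantee that they rank first.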

In Solr, how can I get a list of one field ( document id ) for all documents?

I am working with a Solr instance that is populated from an Oracle database. As records are added to and deleted from the Oracle database, they are supposed to also be added to and removed from Solr.
The schema.xml has this setup, which we use to store the ID that is also the primary key in Oracle:
<uniqueKey>id</uniqueKey>
<field name="id" type="string" indexed="true" stored="true"/>
Furthermore, the IDs are not in sequential order. The Solr admin interface has not been much help: I can only see the IDs along with the rest of each record, a few at a time, paginated.
There are about a million documents in this Solr core.
I can easily get the IDs of the records from the Oracle database, so I would like to also get a list of the document IDs from the Solr index for comparison.
I haven't been able to find any information on how to do this but I may be searching
If you really need to get the IDs of all your documents, use the fl parameter. Something like this:
SolrQuery q = new SolrQuery("*:*");
q.setFields("id"); // fl=id: return only the `id` field
q.setRows(10000000); // insanely high number: retrieve _all_ rows
// see: http://wiki.apache.org/solr/CommonQueryParameters#rows-1
return server.query(q).getResults();
(untested)
For a simple comparison between the content in Oracle and in Solr, you might just want to count documents:
SolrQuery q = new SolrQuery("*:*");
q.setRows(0); // don't retrieve _any_ row
return server.query(q).getResults().getNumFound(); // just get the number of matching documents
(untested)
Since Solr 4.10, you can use the /export handler to export a large number of records.
However, if you really just want one field, you can make a request for that one field and export it as CSV (wt=csv). That minimizes the formatting overhead.
For Solr 7, the syntax has changed a bit. This is what worked for me (in Java):
CloudSolrClient solrClient = ...;
solrClient.setDefaultCollection("collection1");
SolrQuery q = new SolrQuery("*:*");
q.set("fl", "id");
q.setRows(10000000);
Set<String> uniqueIds = solrClient.query(q).getResults()
.stream().map(x -> (String) x.get("id"))
.collect(Collectors.toSet());
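Once both ID lists are in hand (from Oracle and from one of the Solr queries above), the comparison itself is two set differences. The sketch below is plain Java; the class and method names are hypothetical, and loading the two sets is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Compares primary keys from Oracle with the ids returned by Solr.
class IdDiff {
    // Ids in Oracle but not in Solr: documents that were never indexed.
    static Set<String> missingFromSolr(Set<String> oracleIds, Set<String> solrIds) {
        Set<String> result = new TreeSet<>(oracleIds); // sorted for readable output
        result.removeAll(solrIds);
        return result;
    }

    // Ids in Solr but no longer in Oracle: documents that should have been deleted.
    static Set<String> staleInSolr(Set<String> oracleIds, Set<String> solrIds) {
        Set<String> result = new TreeSet<>(solrIds);
        result.removeAll(oracleIds);
        return result;
    }
}
```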

MongoTemplate method or query for finding the maximum value of a field

I am using MongoTemplate for my DB operations. Now I want to fetch the maximum field values from the selected result. Can someone guide me on how to write the query so that, when I pass it to the find method, it returns the document with the maximum field value? Thanks in advance.
Regards
You can find "the object with the maximum field value" in spring-data-mongodb. MongoDB will optimize sort/limit combinations IF the sort field is indexed (or is the @Id field). Otherwise it is still pretty good, because it will use a top-k algorithm and avoid a global sort (see the MongoDB sort documentation). This is based on Mkyong's example, but I do the sort first and then set the limit to one.
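Query query = new Query();
query.with(Sort.by(Sort.Direction.DESC, "idField")); // sort descending on the field to maximize (Spring Data 2.x+; older versions used new Sort(...))
query.limit(1); // keep only the first (largest) document
MyObject maxObject = mongoTemplate.findOne(query, MyObject.class);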
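The "top-k" remark can be illustrated in plain Java: to return the k largest values, it is enough to keep k candidates in a min-heap while scanning, rather than sorting the whole input, and limit(1) is the k = 1 case. This sketch shows the general idea only; it is not how MongoDB implements its sort stage, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopK {
    // Returns the k largest values in descending order using a size-k min-heap,
    // so memory stays O(k) instead of holding a fully sorted copy of the input.
    static List<Integer> topK(List<Integer> values, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // head = smallest of the current top k
        for (int v : values) {
            if (heap.size() < k) {
                heap.add(v);
            } else if (k > 0 && v > heap.peek()) {
                heap.poll(); // evict the smallest candidate
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```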
