Get distance field from $geoNear aggregation in mongodb - spring

I have the following aggregation command in my Spring project:
NearQuery query = NearQuery.near(longitude,latitude).maxDistance(distance).spherical(true);
agg = newAggregation(
geoNear(query, "distance"),
unwind("rate"),
group("id")
.first("name").as("name")
.sum("$rate.general_rate").as("rate")
.count().as("num_rates")
);
But when I map the result to my class, the distance field (from geoNear) is missing. How can I pass the distance through the pipeline so that it appears alongside the other grouped fields?

I faced the same :)
The answer is in your question..
geoNear(query, "distance"),
The line above looks for a property named distance (of type Double in the case of a Point) in the class you are mapping the aggregation result to.
E.g., output:
"distance" : 420.82602810248557 (in meters)

Related

how to get Unique records from elastic search engine based on a field

I have an Elasticsearch index that stores the list of restaurants in an area. I'm using Spring Data Elasticsearch to query restaurants within 10 miles of a given geo-location (lat/long). I have a requirement to show each restaurant chain only once, but I'm seeing multiple records in my search results for restaurant chains because their branches have the same name but different addresses. I only need to show the nearest restaurant of each chain, along with the other unique restaurants. Is there a single query that can do that? Below is my code [removed some stuff for brevity!]
public SearchHits<Results> search(List<String> items){
final NativeSearchQueryBuilder searchQuery = new NativeSearchQueryBuilder();
BoolQueryBuilder termsQuery = boolQuery();
termsQuery.should(termsQuery(entry.getKey(), items));
boolQuery.must(termsQuery);
// ...I do additional logic here
searchQuery.withQuery(boolQuery);
// apply the terms aggregation
searchQuery.addAggregation(terms(CATEGORIES_KEY).field(CATEGORY).size(BUCKET_SIZE));
Query query = searchQuery.build();
SearchHits<Results> searchHits = elasticsearchTemplate.search(query, Results.class);
return searchHits;
}
I was going through the Elasticsearch documentation, and it turns out there is a simple fix for that :) I can use collapse. The collapse feature removes duplicate results based on a field. So I only needed to add this line:
searchQuery.withCollapseField("restaurant_name");
// restaurant_name is what I want unique values on
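To illustrate what collapsing does to a result list, here is a minimal plain-Java sketch that emulates it outside Elasticsearch. It assumes the hits are already sorted nearest-first and keeps only the first hit per restaurant name. The Hit class and the collapseByName method are hypothetical names for illustration, not part of any Elasticsearch or Spring API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {

    /** Hypothetical stand-in for a search hit: a restaurant name and its distance in miles. */
    static final class Hit {
        final String restaurantName;
        final double distanceMiles;

        Hit(String restaurantName, double distanceMiles) {
            this.restaurantName = restaurantName;
            this.distanceMiles = distanceMiles;
        }
    }

    /**
     * Keeps the first hit for each restaurant name.
     * Assumption: the input is already sorted nearest-first, so the first hit
     * per name is also the nearest one.
     */
    static List<Hit> collapseByName(List<Hit> sortedHits) {
        // LinkedHashMap preserves the order in which names are first seen.
        Map<String, Hit> firstPerName = new LinkedHashMap<>();
        for (Hit hit : sortedHits) {
            firstPerName.putIfAbsent(hit.restaurantName, hit);
        }
        return new ArrayList<>(firstPerName.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("Pizza Chain", 1.2));
        hits.add(new Hit("Corner Cafe", 2.5));
        hits.add(new Hit("Pizza Chain", 4.0)); // same chain, farther away
        for (Hit hit : collapseByName(hits)) {
            System.out.println(hit.restaurantName + " at " + hit.distanceMiles + " miles");
        }
    }
}
```

Note that this only yields the nearest branch of each chain when the results are sorted by distance, which the sketch takes as given.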

Lucene scoring: get cosine similarity as scores

I'm trying to solve nearest neighbor search problem.
Here is my code:
// Indexing
val analyzer = new StandardAnalyzer()
val directory = new RAMDirectory()
val config = new IndexWriterConfig(analyzer)
val iwriter = new IndexWriter(directory, config)
val queryField = "fieldname"
stringData.foreach { str =>
val doc = new Document()
doc.add(new TextField(queryField, str, Field.Store.YES))
iwriter.addDocument(doc)
}
iwriter.close()
// Searching
val ireader = DirectoryReader.open(directory)
val isearcher = new IndexSearcher(ireader)
val parser = new QueryParser(queryField, analyzer)
val query = parser.parse("Some text for testing")
val hits = isearcher.search(query, 10).scoreDocs
When I look at the value of hits, I see scores greater than 1.
As far as I know, Lucene's scoring formula is:
score(q,d) = coord-factor(q,d) · query-boost(q) · cosSim(q,d) · doc-len-norm(d) · doc-boost(d)
But I want only the cosine similarity between query and document, in the range [0,1], without the coord-factor, doc-len-norm and so on.
What is a possible way to achieve it?
If you have gone through the official documentation, you will have seen that the other terms in the score expression are important and make the scoring process more logical and coherent.
Still, if you want a scoring process based only on cosine similarity, you can write a custom similarity class. I have used different similarity methods for document retrieval in a class assignment. In short, you can write your own similarity method and assign it to Lucene's index searcher. Here is an example that you can modify to accomplish what you want.
Write your custom class (you only need to override one method).
import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.SimilarityBase;
public class MySimilarity extends SimilarityBase {
@Override
protected float score(BasicStats stats, float termFreq, float docLength) {
double tf = 1 + (Math.log(termFreq) / Math.log(2));
// Use 1.0 so the ratio is computed in floating point; both stats values are longs.
double idf = Math.log((stats.getNumberOfDocuments() + 1.0) / stats.getDocFreq()) / Math.log(2);
float dotProduct = (float) (tf * idf);
return dotProduct;
}
}
Then assign your similarity implementation to the index searcher for relevance calculation, as below.
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexPath)));
IndexSearcher indexSearcher = new IndexSearcher(reader);
indexSearcher.setSimilarity(new MySimilarity());
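Here, I am using a tf-idf dot product to compute the similarity between query and documents. The formula, as implemented in the score() method above, is:
score(t,d) = (1 + log2 tf(t,d)) · log2((N + 1) / df(t))
where N = stats.getNumberOfDocuments() and df(t) = stats.getDocFreq().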
Two things to note here:
stats.getNumberOfDocuments() returns the total number of documents in the index.
stats.getDocFreq() returns the document frequency of a term that appears in both the query and the document.
Lucene will now call the score() method you implemented to compute the relevance score for each matched term, i.e. each term that appears in both the query and a document.
I know this is not a straightforward answer to your question, but you can adapt the approach above however you want. I implemented 6 different scoring techniques in my homework assignment. I hope it helps you too.
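For comparison, plain cosine similarity over raw term-frequency vectors can be sketched in a few lines of Java without Lucene. This is only an illustration of the quantity the question asks for, not how Lucene computes its scores. The class name, the method names and the naive tokenizer (lowercasing and splitting on non-word characters) are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class CosineSketch {

    /** Builds a term-frequency map using a naive tokenizer (an assumption for this sketch). */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Euclidean length of a term-frequency vector. */
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine similarity; with non-negative term counts the result lies in [0, 1]. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // An empty vector has no direction; treat its similarity as 0.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termFrequencies("Some text for testing");
        Map<String, Integer> doc = termFrequencies("Some other text");
        System.out.println(cosine(query, doc));
    }
}
```

Because term counts are never negative, the value stays between 0 and 1, which is the property the question is after.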

Spring data elastic search - Query - Full text search

I am trying to use Elasticsearch for full-text search, and Spring Data to integrate Elasticsearch with my application.
For example,
There are 6 fields to be indexed.
1)firstName
2)lastName
3)title
4)location
5)industry
6)email
http://localhost:9200/test/_mapping/
I can see these fields in the mapping.
Now, I would like to search against these fields with a single search input.
For example, when I search for "mike 123", it has to search against all 6 fields.
In the Spring Data repository,
the method below works, but it searches only in firstName.
Collection<Object> findByFirstNameLike(String searchInput)
But, I would like to search against all the fields.
I tried,
Collection<Object> findByFirstNameLikeOrLastNameLikeOrTitleLikeOrLocationLikeOrIndustryLikeOrEmailLike(String searchInput, String searchInput1, String searchInput2, String searchInput3, String searchInput4, String searchInput5)
Here, even though the input string is the same, I need to pass it as 6 separate params. The method name also gets very long with multiple fields.
Is there any way to make this simpler with @Query or something similar?
Like,
Collection<Object> findByInput(String inputString)
Also, one of the fields should be boosted.
For example,
when I search for "mike mat", any match in firstName should come first in the results, even if there are exact matches in the other fields.
Thanks
Let's suppose your search term is in the variable query; you can use the search method of ElasticsearchRepository.
repo.search(queryStringQuery(query))
To use queryStringQuery, add the following import:
import static org.elasticsearch.index.query.QueryBuilders.queryStringQuery;
I found a way to achieve this and am posting it here. Hope this helps.
QueryBuilder queryBuilder = boolQuery().should(
queryString("Mike Mat").analyzeWildcard(true)
.field("firstName", 2.0f).field("lastName").field("title")
.field("location").field("industry").field("email"));
Thanks
I'm not a Spring Data Elasticsearch expert, but I see two directions you can go. The first would be to use the @Query option, which lets you create your own query. The second would be to use the example in the Filter builder section:
http://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.misc.filter
Within Elasticsearch you would want to use the multi_match query:
http://www.elastic.co/guide/en/elasticsearch/reference/1.5/query-dsl-multi-match-query.html
In Java such a query would look like this:
QueryBuilder qb = multiMatchQuery(
"kimchy elasticsearch",
"user", "message"
);
Example coming from: http://www.elastic.co/guide/en/elasticsearch/client/java-api/current/query-dsl-queries.html#multimatch
We can write our own custom query as below.
We can specify the index and the routing value (the latter is used when an alias is used).
SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices(INDEX)
.withRoute(yourQueryBuilderHelper.getRouteValue())
.withQuery(yourQueryBuilderHelper.buildQuery(yourSearchFilterRequestObject))
.withFilter(yourQueryBuilderHelper.buildFilter(yourSearchFilterRequestObject)).withTypes(TYPE)
.withSort(yourQueryBuilderHelper.buildSortCriteria(yourSearchFilterRequestObject))
.withPageable(yourQueryBuilderHelper.buildPaginationCriteria(yourSearchFilterRequestObject)).build();
FacetedPage<YourDocumentEntity> searchResults = elasticsearchTemplate.queryForPage(searchQuery, YourDocumentEntity.class);
It's good to use your own query-builder helper, which separates your Elasticsearch service from the query-building responsibility.
Hope this helps
Thanks
The QueryBuilder class is helpful for querying Elasticsearch from a Spring DAO:
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryBuilder;
QueryBuilder qb = QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery("state", "KA"))
.must(QueryBuilders.termQuery("content", "test4"))
.mustNot(QueryBuilders.termQuery("content", "test2"))
.should(QueryBuilders.termQuery("content", "test3"));
Try it like this; you can even set the importance of each field:
QueryBuilder queryBuilder = QueryBuilders.multiMatchQuery(query)
.field("name", 2.0f)
.field("email")
.field("title")
.field("jobDescription", 3.0f)
.type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX);
Another way is to use a query string query:
// yourQueryHere is a String variable holding the user's search text
Query searchQuery = new StringQuery(
"{\"query\":{\"query_string\":{\"query\":\"" + yourQueryHere + "\"}}}");
SearchHits<Product> products = elasticsearchOperations.search(
searchQuery,
Product.class,
IndexCoordinates.of(PRODUCT_INDEX_NAME));
This will search all the fields of the documents in the specified index.

Is $maxDistanceInKilometers broken?

I'm using the Parse REST API to query based on location, with the following location query:
latitude = "-33.90483";
longitude = "151.2243";
When I use that, I get response objects. But when I try the following location query, I get zero objects.
latitude = "-33.89";
longitude = "151.14";
I have this option set as well: "$maxDistanceInKilometers" = 20;
I used http://andrew.hedges.name/experiments/haversine/ to find the distance between the two coordinates and found that they are only about 7 km apart, so I should get at least some response objects from the second query.
Please let me know if I'm doing something wrong.
Thanks
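As a sanity check on the distance between the two coordinates, here is a minimal haversine sketch in Java. It assumes a spherical Earth with a mean radius of 6371 km; the class and method names are illustrative and have nothing to do with the Parse API.

```java
final class HaversineSketch {

    // Assumption: mean Earth radius in kilometers (spherical model).
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometers between two points given in degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinHalfLat = Math.sin(dLat / 2);
        double sinHalfLon = Math.sin(dLon / 2);
        double a = sinHalfLat * sinHalfLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinHalfLon * sinHalfLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // The two query locations from the question.
        double km = distanceKm(-33.90483, 151.2243, -33.89, 151.14);
        System.out.println(km + " km");
    }
}
```

With the two locations above this comes out at roughly 8 km, in the same range as the online calculator and well inside the 20 km limit.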

Getting all zip codes within an n mile radius

What's the best way to get a function like the following to work:
def getNearest(zipCode, miles):
That is, given a zipcode (07024) and a radius, return all zipcodes which are within that radius?
There is a project on SourceForge that could assist with this:
http://sourceforge.net/projects/zips/
It gives you a database with zip codes and their latitude / longitude, as well as coding examples of how to calculate the distance between two sets of coordinates. There is probably a better way to do it, but you could have your function retrieve the zipcode and its coordinates, and then step through each zipcode in the list and add the zipcode to a list if it falls within the number of miles specified.
If you want this to be accurate, you must start with polygon data that includes the location and shape of every zipcode. I have a database like this (used to be published by the US census, but they no longer do that) and have built similar things atop it, but not that exact request.
If you don't care about being exact (which I'm guessing you don't), you can get a table of center points of zipcodes and query points ordered by great circle distance. PostGIS provides great tools for doing this, although you may construct a query against other databases that will perform similar tasks.
An alternate approach I've used is to construct a box that encompasses the circle you want, query with a between clause on lon/lat, and then do the great-circle calculation in app code.
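The box-then-circle approach can be sketched in plain Java as follows. This is a minimal illustration under several assumptions: zip codes are represented by center points, the Earth is a sphere of radius 3958.8 miles, and the search area is not near the poles or the antimeridian. ZipCenter, zipsWithin and the placeholder zip codes are hypothetical; in a real system the box test would be the between clause in the database query.

```java
import java.util.ArrayList;
import java.util.List;

final class ZipRadiusSketch {

    // Assumption: mean Earth radius in miles (spherical model).
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** Hypothetical record of a zip code and its center point in degrees. */
    static final class ZipCenter {
        final String zip;
        final double lat;
        final double lon;

        ZipCenter(String zip, double lat, double lon) {
            this.zip = zip;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Haversine great-circle distance in miles. */
    static double greatCircleMiles(double lat1, double lon1, double lat2, double lon2) {
        double sinHalfLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfLat * sinHalfLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinHalfLon * sinHalfLon;
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    /** Returns the zip codes whose centers lie within the given radius of the origin. */
    static List<String> zipsWithin(ZipCenter origin, double miles, List<ZipCenter> candidates) {
        // Bounding box that encloses the circle (not valid near the poles or the antimeridian).
        double angularRadius = miles / EARTH_RADIUS_MILES;
        double latDelta = Math.toDegrees(angularRadius);
        double lonDelta = Math.toDegrees(
                Math.asin(Math.sin(angularRadius) / Math.cos(Math.toRadians(origin.lat))));

        List<String> result = new ArrayList<>();
        for (ZipCenter candidate : candidates) {
            // Cheap pre-filter: the equivalent of the between clause on lat/lon.
            boolean inBox = Math.abs(candidate.lat - origin.lat) <= latDelta
                    && Math.abs(candidate.lon - origin.lon) <= lonDelta;
            // Exact check only for candidates that passed the box test.
            if (inBox && greatCircleMiles(origin.lat, origin.lon, candidate.lat, candidate.lon) <= miles) {
                result.add(candidate.zip);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder zip codes and coordinates, for illustration only.
        ZipCenter origin = new ZipCenter("00001", 40.0, -74.0);
        List<ZipCenter> candidates = new ArrayList<>();
        candidates.add(new ZipCenter("00002", 40.05, -74.0));  // a few miles away
        candidates.add(new ZipCenter("00003", 41.0, -74.0));   // far outside the box
        candidates.add(new ZipCenter("00004", 40.13, -74.17)); // inside the box corner, outside the circle
        System.out.println(zipsWithin(origin, 10.0, candidates));
    }
}
```

The box test is cheap and can use ordinary indexes on the lat and lon columns, so the more expensive great-circle calculation only runs on the few candidates that survive it.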
Maybe this can help. The project is configured in kilometers, though; you can change the units in CityDAO.java:
public List<City> findCityInRange(GeoPoint geoPoint, double distance) {
List<City> cities = new ArrayList<City>();
QueryBuilder queryBuilder = geoDistanceQuery("geoPoint")
.point(geoPoint.getLat(), geoPoint.getLon())
//.distance(distance, DistanceUnit.KILOMETERS) original
.distance(distance, DistanceUnit.MILES)
.optimizeBbox("memory")
.geoDistance(GeoDistance.ARC);
SearchRequestBuilder builder = esClient.getClient()
.prepareSearch(INDEX)
.setTypes("city")
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setScroll(new TimeValue(60000))
.setSize(100).setExplain(true)
.setPostFilter(queryBuilder)
.addSort(SortBuilders.geoDistanceSort("geoPoint")
.order(SortOrder.ASC)
.point(geoPoint.getLat(), geoPoint.getLon())
//.unit(DistanceUnit.KILOMETERS)); Original
.unit(DistanceUnit.MILES));
SearchResponse response = builder
.execute()
.actionGet();
SearchHit[] hits = response.getHits().getHits();
scroll:
while (true) {
for (SearchHit hit : hits) {
Map<String, Object> result = hit.getSource();
cities.add(mapper.convertValue(result, City.class));
}
response = esClient.getClient().prepareSearchScroll(response.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
if (response.getHits().getHits().length == 0) {
break scroll;
}
}
return cities;
}
The "LocationFinder\src\main\resources\json\cities.json" file contains all cities in Belgium. You can delete or create entries if you want to. As long as you don't change the names and/or structure, no code changes are required.
Make sure to read the README https://github.com/GlennVanSchil/LocationFinder

Resources