Elasticsearch - Sort By Distance Not Working? - elasticsearch

I have an index where the records are stored in the following format:
"_source": {
"name": "ACME Pallets",
"about": null,
"slug": "acme-pallets",
"serviceAreas": [
{
"admin1": "usa",
"admin2": null,
"admin3": null,
"admin4": null,
"countryCode": "US",
"googlePlaceId": null,
"locality": null,
"selectedLevel": "admin1"
}
],
"id": "fadsflsjdfkk3234234",
"addresses": [
{
"address1": "4342 Dietrich Rd",
"address2": null,
"city": "San Antonio",
"countryCode": "US",
"latitude": 29.44122,
"longitude": -98.34404,
"primary": true,
"name": "office",
"postal": "78219",
"province": "TX",
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
]
}
I am trying to return results from this index where the records are sorted by distance to the search point I pass in. My sort config being passed in looks like this:
_geo_distance: {
'addresses.location': { lat: 31.75917, lon: -106.48749 },
order: 'asc',
unit: 'mi',
mode: 'min'
}
The results I receive back are not sorted according to distance. If I manually plot out the individual locations on a map and the search pin passed in, I can see that the sorting is out of order.
If I pass in a sorting config to my search to sort by alphabetically order or to sort by relevance (aka _score), the sorting returned is correct.
Does anyone know why ES might be returning my results incorrectly when sorting by distance?

addresses is an array in my index. Each object inside of addresses has a property called location of type geo_point.
From all the documentation that I've read, passing 'addresses.location': { lat: 31.75917, lon: -106.48749 } into the search should work, but it doesn't. ES should be smart enough to find the location geo point in each object and use that as the reference when calculating the distance. If there are more than one object inside of the addresses array, then ES by default should get the center point of all the objects inside of addresses and use that to calculate the distance from the search point.
In my case, I don't have any data where addresses has more than one object. I ended up creating a location geo_point property outside of the addresses property during index build and then passing in location: { lat: 31.75917, lon: -106.48749 } for the search. This made ES sort results based on distance correctly.
What my new index looks like with the added location property:
"_source": {
"name": "ACME Pallets",
"about": null,
"slug": "acme-pallets",
"serviceAreas": [
{
"admin1": "usa",
"admin2": null,
"admin3": null,
"admin4": null,
"countryCode": "US",
"googlePlaceId": null,
"locality": null,
"selectedLevel": "admin1"
}
],
"id": "fadsflsjdfkk3234234",
"addresses": [
{
"address1": "4342 Dietrich Rd",
"address2": null,
"city": "San Antonio",
"countryCode": "US",
"latitude": 29.44122,
"longitude": -98.34404,
"primary": true,
"name": "office",
"postal": "78219",
"province": "TX",
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
]
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}

Related

Turn a JSON array of key/value pairs into object properties

I'm trying to use JSONata to convert arrays of "key/value" objects into properties of the parent object. My input looks like this:
[
{
"city": "Ottawa",
"properties": [
{
"name": "population",
"value": 37
},
{
"name": "postalCode",
"value": 10001
},
{
"name": "founded",
"value": 1826
}
]
},
{
"city": "Toronto",
"properties": [
{
"name": "population",
"value": 54
},
{
"name": "postalCode",
"value": 10002
}
]
}
]
I'm struggling to generate the output I need, I've seen examples that reference explicit elements, like in this answer, but I need the properties to be converted "dynamically" since I don't know them in advance. I think I need something like this, but I'm missing some particular function:
$[].{
"city": city,
properties.name: properties.value
}
This is the output I need to generate:
[
{
"city": "Ottawa",
"population": 37,
"postalCode": 10001,
"founded": 1826
},
{
"city": "Toronto",
"population": 54,
"postalCode": 10002
}
]
The properties arrays don't always contain the same keys, but the city attributes are always present.
You can use the reduce operator, as described in the Grouping docs here:
$[].(
$city := city;
properties{ "city": $city, name: value }
)
You can play with it live: https://stedi.link/uUANwtE
Please try this expression.
$[].{
"city": $.city,
$.properties[0].name: $.properties[0].value,
$.properties[1].name: $.properties[1].value,
$.properties[2].name: $.properties[2].value,
$.properties[3].name: $.properties[3].value
}
https://try.jsonata.org/s1Ea4kUvo

Validation of each field in JSON response using karate API

I want to validate particular fields in the response whether it is integer or float(ex: fullbathrooms field). I tried below code but getting match failed error. Could you please help here ?.....Thanks
Given path '/property-client'
And request {"address": <address>,"city": <city>,"state": <state>,"zipCode": <zipCode>}
When method post
Then status 200
And print response
And match response == {fullbathrooms:'#number'}
Examples:
|read('testFile1.csv')|
Error : match failed: EQUALS
Actual response:
{
"success": true,
"message": {
"version": "1.0",
"response": {
"id": "94568859",
"type": "express",
"responseheader": null,
"reportdata": {
"property": {
"source": null,
"type": null,
"dom": null,
"propertytype": "Single Family Residence",
"standardtype": null,
"address": {
"documentid": null,
"number": "150",
"directional": null,
"street": "BRIDGE",
"suffix": "RD",
"postdirectional": null,
"unit": "",
"city": "HILLSBOROUGH",
"state": "CA",
"zip": "94010",
"zipplus4": "6908",
"fulladdress": "150 BRIDGE RD, HILLSBOROUGH, CA 94010"
},
"info": {
"type": null,
"fips": "6081",
"county": "San Mateo",
"bedrooms": "5",
"bathrooms": "6.50",
"fullbathrooms": "6.50",
"totalrooms": "0",
"livingarea": "7750",
"totallivingarea": "7750",
"landarea": "41382",
"landareatype": null,
"pool": "true",
"landvalue": "6904800",
"improvementvalue": "3284414",
"assessedvalue": "10189214",
"assessedyear": "2021",
"taxvalue": "11746898",
"taxyear": "2021",
"deliquentyear": null,
"yearbuilt": "2011",
"propertytax": null,
"approxage": "11",
"parcelnumber": "032-400-110",
"titlecompany": null,
"geocode": {
"latitude": "37.563272",
"longitude": "-122.334442",
"geoqualitycode": ""
}
}
Please take some time to read the documentation: https://github.com/karatelabs/karate#match-contains
I'm not going to refer to your response dump (which by the way is not valid JSON), but give you a simple example. Please pay attention to the structure of your JSON. And note that the 6.50 is a string not a number in your response.
* def response = { "foo": { "bar": { "fullbathrooms": "6.50" } } }
* match response.foo.bar == { fullbathrooms: '#string' }
If you want to validate numbers within strings, please refer other answers, for example: https://stackoverflow.com/search?q=%5Bkarate%5D+number+regex

Querying geoshape results from different indexes

I have the following scenario: I have an index with all possible ads of my platform. These ads contain an object with coordinates (latitude/longitude).
{
"id": "123",
"slug": "my-ad-slug",
"location": {
"coordinates": {
"latitude": 1.123456,
"longitude": 1.987654
}
}
I also have another index with some locations and their polygons and shapes for geolocation searches.
{
"id": "456",
"name": "my location",
"geo_location": {
"type": "Polygon",
"coordinates": [...]
}
}
My question is: how can I query all ads that are within a certain polygon, since I have two different indexes in this case? Do you see an easy way on doing so?
Thanks y'all!

couchbase cbimport key generation with key in nest json

I have the following json document to be uploaded to a full text search index
{
"bank_id": {
"country_code": "AT",
"bank_code": "ASPKAT2LXXX",
"bank_code_type": "SWIFT_CODE"
},
"institution": {
"name": "Allgemeine Sparkasse Oberoesterreich Bankaktiengesellschaft"
},
"address": {
"address_line_1": "Promenade 11-13",
"address_line_2": "Linz",
"country_code": "AT"
},
"bank_capabilities": ["INSTANT_CREDIT"],
"payment_network_details": [{
"network": "REAL_TIME_PAYMENTS",
"capabilities": ["INSTANT_CREDIT"]
}]
}
And I want to generate key with country_code and bank_code. How can i accomplish this?
tried with -g %bank_id.country_code%::%bank_id.bank_code% and it is not working.

Best approch of Elastic Search time based feeds module?

I am new with elastic search and looking for the best solution with which i can create a feed module which have time based feeds along with there group and comment.
I learned little and come up with following.
PUT /group
{
"mappings": {
"groupDetail": {},
"content": {
"_parent": {
"type": "groupDetail"
}
},
"comment": {
"_parent": {
"type": "content"
}
}
}
}
so that will be placed separately as per index.
but than after i found one post where i found that parent child is costly operation for search than nested objects.
something like following is two group(feed) having details with content and comments as nested element.
{
"_index": "group",
"_type": "groupDetail",
"_id": 6829,
"_score": 1,
"_source": {
"groupid": 6829,
"name": "Jignesh Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": true,
"tags": [
"spotrs",
"surat"
],
"content": [
{
"contentid": 1,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 1"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 2,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
{
"_index": "group",
"_type": "groupDetail",
"_id": 6849,
"_score": 1,
"_source": {
"groupid": 6849,
"name": "Xyz Group Public",
"insdate": "2016-10-01T04:09:33.916Z",
"upddate": "2017-04-19T05:19:40.281Z",
"isVerified": false,
"tags": [
"spotrs",
"food"
],
"content": [
{
"contentid": 3,
"type": "1",
"byUser": 5858,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 1,
"v": "lorem ipsum long text 3"
},
{
"t": 2,
"v": "http://www.imageurl.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
},
{
"contentid": 4,
"type": "2",
"byUser": 5859,
"insdate": "2016-10-01 11:20",
"info": [
{
"t": 4,
"v": "http://www.videoURL.com/1"
}
],
"comments": [
{
"byuser": 5859,
"comment": "Comment 1",
"upddate": "2016-10-01T04:09:33.916Z"
},
{
"byuser": 5860,
"comment": "Comment 2",
"upddate": "2016-10-01T04:09:33.916Z"
}
]
}
]
}
}
now if i try to think with nested object than i confused if user add comment very frequently than reindexing factor will effect?
So main think i want to ask is which is the best approach with which i can add comment frequently and my content searching result is also faster.
Performance
Parent/child stores relevant data in same shards, as separately doc, which avoid the network;
Parent/child needs a joining process when retrieving data;
Nested object store the inner and outer object together, as a single doc;
So, we can infer:
Update nested object will re-index whole index, which can very expensive if your document is large;
Update parent or child alone will not affect the other one;
Searching nested object is a little fast, which save the process of joining;
Suggestions
As far as I understand your problem, you should use parent/child.
When your group's comments become more and more, adding a new comment will still re-index whole content, which can be very time-consuming;
On the other hand, search a comment with parent/child just need one more look up after finding the child, which is relative acceptable.
Furthermore, you should also take the rate of searching a comment comparing to adding a comment into account:
If you need searching a lot but a little new comments, maybe you can choose nested object;
Otherwise, choose parent/child;
By the way, you may combine both of them:
When this feed is active, use parent/child to store them;
When it is closed, i.e., no more comments can be added, move them to a new index with nested object;
If you do not specify more detailed info other than very frequently it is going to be hard to come up with a recommendation. Also you have not mentioned how your data looks like. A comment in a blog post might be happening rare, even in heated discussions. A comment/reply in a forum post (that will result in a huge document) might be sth very different. I'd personally start with nested and see how it goes, but I also do not know all the requirements, so this might be a very wrong answer.

Resources