Querying geoshape results from different indexes - elasticsearch

I have the following scenario: I have an index with all the ads on my platform. Each ad contains an object with coordinates (latitude/longitude):
{
"id": "123",
"slug": "my-ad-slug",
"location": {
"coordinates": {
"latitude": 1.123456,
"longitude": 1.987654
}
}
}
I also have another index with locations and their polygons/shapes for geolocation searches:
{
"id": "456",
"name": "my location",
"geo_location": {
"type": "Polygon",
"coordinates": [...]
}
}
My question is: how can I query all ads that are within a certain polygon, given that the ads and the polygons live in two different indexes? Is there an easy way to do this?
Thanks y'all!
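One direction to try, sketched in Python under assumptions: Elasticsearch's geo_shape query can take an `indexed_shape` (index name, document id, and the path to the shape field) instead of an inline shape, so Elasticsearch looks up the polygon from the locations index itself and no client-side join is needed. This assumes the ads carry a field mapped as geo_point (newer Elasticsearch releases accept geo_point fields in geo_shape queries; on older ones the field has to be mapped as geo_shape). The `latitude`/`longitude` object above is not a geo_point format, so the sketch also converts it to `{"lat": ..., "lon": ...}`. The index name `locations`, the field `location.point`, and the helper names are hypothetical.

```python
import json

# Hypothetical names: adjust to your own indexes and mappings.
SHAPES_INDEX = "locations"      # index holding the polygons (doc id "456" above)
SHAPE_PATH = "geo_location"     # field in that index containing the shape
POINT_FIELD = "location.point"  # assumed geo_point field on the ads index


def to_geo_point(coordinates: dict) -> dict:
    """Convert {"latitude": .., "longitude": ..} into the {"lat": .., "lon": ..}
    object form accepted by a geo_point field."""
    return {"lat": coordinates["latitude"], "lon": coordinates["longitude"]}


def ads_in_location_query(location_id: str) -> dict:
    """Search body that filters ads by a polygon stored in another index.

    `indexed_shape` makes Elasticsearch fetch the shape itself from
    SHAPES_INDEX, so the polygon never has to be copied into the request.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        POINT_FIELD: {
                            "indexed_shape": {
                                "index": SHAPES_INDEX,
                                "id": location_id,
                                "path": SHAPE_PATH,
                            }
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    ad = {"id": "123", "location": {"coordinates": {"latitude": 1.123456, "longitude": 1.987654}}}
    # At index time, store a geo_point next to the raw coordinates.
    ad["location"]["point"] = to_geo_point(ad["location"]["coordinates"])
    print(json.dumps(ad, indent=2))
    # Body for a normal search against the ads index (POST /<ads index>/_search).
    print(json.dumps(ads_in_location_query("456"), indent=2))
```

Since a point either lies inside the polygon or not, the default `intersects` relation should be enough here.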

Related

Elasticsearch - Sort By Distance Not Working?

I have an index where the records are stored in the following format:
"_source": {
"name": "ACME Pallets",
"about": null,
"slug": "acme-pallets",
"serviceAreas": [
{
"admin1": "usa",
"admin2": null,
"admin3": null,
"admin4": null,
"countryCode": "US",
"googlePlaceId": null,
"locality": null,
"selectedLevel": "admin1"
}
],
"id": "fadsflsjdfkk3234234",
"addresses": [
{
"address1": "4342 Dietrich Rd",
"address2": null,
"city": "San Antonio",
"countryCode": "US",
"latitude": 29.44122,
"longitude": -98.34404,
"primary": true,
"name": "office",
"postal": "78219",
"province": "TX",
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
]
}
I am trying to return results from this index where the records are sorted by distance to the search point I pass in. My sort config being passed in looks like this:
_geo_distance: {
'addresses.location': { lat: 31.75917, lon: -106.48749 },
order: 'asc',
unit: 'mi',
mode: 'min'
}
The results I receive back are not sorted according to distance. If I manually plot out the individual locations on a map and the search pin passed in, I can see that the sorting is out of order.
If I pass a sort config that sorts alphabetically or by relevance (aka _score), the results come back in the correct order.
Does anyone know why ES might be returning my results incorrectly when sorting by distance?
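To check the ordering without plotting the points by hand, the distances can be recomputed client-side. This Python sketch uses the haversine formula with a mean Earth radius of 3958.8 miles and mirrors `mode: 'min'` by taking the closest address per document; `hits` is assumed to be the `hits.hits` list from the search response, and the function names are hypothetical. Elasticsearch also returns the value it sorted on in each hit's `sort` array, which can be compared against these numbers. Haversine is an approximation and Elasticsearch's own calculation may differ slightly, so tiny discrepancies between nearly equidistant documents are not by themselves a sign of a broken sort.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def min_address_distance(source: dict, origin: tuple) -> float:
    """Like mode 'min': distance to the closest addresses.location point.
    Documents without addresses are treated as infinitely far (a sketch choice)."""
    lat0, lon0 = origin
    return min(
        (haversine_mi(lat0, lon0, a["location"]["lat"], a["location"]["lon"])
         for a in source.get("addresses") or []),
        default=math.inf,
    )


def is_sorted_by_distance(hits: list, origin: tuple) -> bool:
    """True if the hits are in ascending order of distance from `origin`."""
    d = [min_address_distance(h["_source"], origin) for h in hits]
    return all(x <= y for x, y in zip(d, d[1:]))


if __name__ == "__main__":
    origin = (31.75917, -106.48749)  # the search point from the question
    hit = {"_source": {"addresses": [{"location": {"lat": 29.44156, "lon": -98.37704}}]}}
    print(round(min_address_distance(hit["_source"], origin), 1))
```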
addresses is an array in my index. Each object inside of addresses has a property called location of type geo_point.
From all the documentation I've read, passing 'addresses.location': { lat: 31.75917, lon: -106.48749 } into the search should work, but it doesn't. ES should be smart enough to find the location geo point in each object and use it as the reference when calculating the distance. If there is more than one object in the addresses array, then ES by default should get the center point of all the objects in addresses and use that to calculate the distance from the search point.
In my case, I don't have any data where addresses has more than one object. I ended up creating a top-level location geo_point property (outside of addresses) during the index build, and then passing location: { lat: 31.75917, lon: -106.48749 } in the sort. With that, ES sorted the results by distance correctly.
This is what my documents look like with the added location property:
"_source": {
"name": "ACME Pallets",
"about": null,
"slug": "acme-pallets",
"serviceAreas": [
{
"admin1": "usa",
"admin2": null,
"admin3": null,
"admin4": null,
"countryCode": "US",
"googlePlaceId": null,
"locality": null,
"selectedLevel": "admin1"
}
],
"id": "fadsflsjdfkk3234234",
"addresses": [
{
"address1": "4342 Dietrich Rd",
"address2": null,
"city": "San Antonio",
"countryCode": "US",
"latitude": 29.44122,
"longitude": -98.34404,
"primary": true,
"name": "office",
"postal": "78219",
"province": "TX",
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
],
"location": {
"lat": 29.44156,
"lon": -98.37704
}
}
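The workaround of copying the point to a top-level field during the index build can be sketched in Python as follows. Assumptions: it prefers the address flagged `primary` and falls back to the first one, the mapping uses the typeless format of Elasticsearch 7 and later, and the function names are hypothetical.

```python
import copy

# The new top-level field must be mapped as geo_point for _geo_distance sorting.
MAPPING = {"mappings": {"properties": {"location": {"type": "geo_point"}}}}


def with_top_level_location(source: dict) -> dict:
    """Return a copy of the document with a top-level `location` geo_point.

    Uses the primary address if one is flagged, else the first address.
    Documents without addresses are returned unchanged.
    """
    doc = copy.deepcopy(source)
    addresses = doc.get("addresses") or []
    if not addresses:
        return doc
    chosen = next((a for a in addresses if a.get("primary")), addresses[0])
    doc["location"] = dict(chosen["location"])
    return doc


def distance_sort(lat: float, lon: float) -> list:
    """The question's sort clause, pointed at the new top-level field."""
    return [
        {
            "_geo_distance": {
                "location": {"lat": lat, "lon": lon},
                "order": "asc",
                "unit": "mi",
                "mode": "min",
            }
        }
    ]
```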

Indexing In ElasticSearch For Auditing

There is a microservice-based architecture wherein each service has a different type of entity. For example:
Service-1:
{
"entity_type": "SKU",
"sku": "123",
"ext_sku": "201",
"store": "1",
"product": "abc",
"timestamp": 1564484862000
}
Service-2:
{
"entity_type": "PRODUCT",
"product": "abc",
"parent": "xyz",
"description": "curd",
"unit_of_measure": "gm",
"quantity": "200",
"timestamp": 1564484863000
}
Service-3:
{
"entity_type": "PRICE",
"meta": {
"store": "1",
"sku": "123"
},
"price": "200",
"currency": "INR",
"timestamp": 1564484962000
}
Service-4:
{
"entity_type": "INVENTORY",
"meta": {
"store": "1",
"sku": "123"
},
"in_stock": true,
"inventory": 10,
"timestamp": 1564484864000
}
I want to write an Audit Service backed by Elasticsearch that will ingest all these entities and index them by entity_type, store, sku, and timestamp.
Is Elasticsearch a good choice here? Also, how would the indexing work? For example, if I search for store=1, it should return all the different entities whose store is 1. Secondly, will I be able to get all the entities between two timestamps?
Will ES and Kibana (to visualize) be good choices here?
Yes. Your use case is pretty much exactly what is described in the docs under filter context:
In filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data, e.g.
Does this timestamp fall into the range 2015 to 2016?
Is the status field set to published?
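A Python sketch of such a filter-context query, under a few assumptions: `store`, `sku` and `entity_type` are mapped as `keyword` so `term` matches exact values, and `timestamp` is mapped as `date` (whose default format accepts epoch milliseconds). Because PRICE and INVENTORY keep `store` and `sku` under `meta`, the sketch flattens `meta` at ingest time so one field name works for every entity type; that normalization is a design choice, not something Elasticsearch requires. PRODUCT entities have no store, so a store=1 search will not return them. Function names are hypothetical.

```python
def normalize(entity: dict) -> dict:
    """Lift meta.store / meta.sku to the top level so every entity type
    exposes the same `store` and `sku` fields."""
    doc = {k: v for k, v in entity.items() if k != "meta"}
    for key, value in (entity.get("meta") or {}).items():
        doc.setdefault(key, value)
    return doc


def audit_query(store=None, sku=None, entity_type=None, start_ms=None, end_ms=None) -> dict:
    """Filter-context query: each clause is a yes/no match, nothing is scored."""
    filters = []
    for field, value in (("store", store), ("sku", sku), ("entity_type", entity_type)):
        if value is not None:
            filters.append({"term": {field: value}})
    if start_ms is not None or end_ms is not None:
        bounds = {}
        if start_ms is not None:
            bounds["gte"] = start_ms
        if end_ms is not None:
            bounds["lte"] = end_ms
        filters.append({"range": {"timestamp": bounds}})
    return {"query": {"bool": {"filter": filters}}}
```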

How can I manage sitemaps in Kentico Cloud?

There used to be a sitemap feature, but it was deprecated. Taxonomies are suggested as a replacement, but when I request items from the API, the taxonomy elements lack the hierarchical structure. How do I search for an item that represents a parent page in the website structure? Thank you.
You can still do that with the Delivery API. First, you need to create and organize your taxonomy group the same way you would organize your sitemap. Consider the following sitemap as an example:
Home
About
  Our team
    Management
    Contact us
  Mission&Values
This is what the taxonomies would look like in Kentico Cloud:
Content models for your items need to include a taxonomy element that serves as a sitemap location selector. When you retrieve this element from an item, it gives you a list of the terms the item is associated with. If you tick two terms for the item (Contact us, Our team), this is what the element looks like in the API:
{
"item": {
"system": {
"id": "8a9e7010-c79b-41c5-a0bc-4f20c9c233b8",
"name": "Example item - contact form",
"codename": "example_item___contact_form",
"language": "default",
"type": "example_content_model",
"sitemap_locations": [],
"last_modified": "2019-05-13T08:20:50.3173519Z"
},
"elements": {
"sitemap": {
"type": "taxonomy",
"name": "Sitemap",
"taxonomy_group": "sitemap",
"value": [
{
"name": "Contact Us",
"codename": "contact_us"
},
{
"name": "Our team",
"codename": "our_team"
}
]
}
}
},
"modular_content": {}
}
As you can see, you get the taxonomy group codename and a flat list of name/codename pairs, one for each ticked term. To get the hierarchical structure, you need to make a second call to retrieve the taxonomy group, which yields the following:
{
"system": {
"id": "0b4e3da2-8699-4b4d-961c-1fe912c91570",
"name": "Sitemap",
"codename": "sitemap",
"last_modified": "2019-05-13T08:01:34.6109452Z"
},
"terms": [
{
"name": "Home",
"codename": "home",
"terms": []
},
{
"name": "About",
"codename": "about",
"terms": [
{
"name": "Our team",
"codename": "our_team",
"terms": [
{
"name": "Management",
"codename": "management",
"terms": []
},
{
"name": "Contact Us",
"codename": "contact_us",
"terms": []
}
]
},
{
"name": "Mission & Values",
"codename": "mission___values",
"terms": []
}
]
}
]
}
This reflects the needed hierarchy. You can look up the codenames you got from your item in the group's term tree; to get the parent taxonomy term, just take the parent JSON node. If you need the parent item itself, you can call the Delivery API again and use one of the array filters to get all the items marked with the parent sitemap location.
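The parent lookup described above can be done with a small recursive walk over the taxonomy group's `terms`. This is a Python sketch: `SITEMAP_TERMS` is copied from the taxonomy response above, the element codename `sitemap` comes from the item example, and the function names are hypothetical.

```python
from typing import List, Optional

# The "terms" array of the taxonomy group response above.
SITEMAP_TERMS = [
    {"name": "Home", "codename": "home", "terms": []},
    {"name": "About", "codename": "about", "terms": [
        {"name": "Our team", "codename": "our_team", "terms": [
            {"name": "Management", "codename": "management", "terms": []},
            {"name": "Contact Us", "codename": "contact_us", "terms": []},
        ]},
        {"name": "Mission & Values", "codename": "mission___values", "terms": []},
    ]},
]


def term_path(terms: list, codename: str) -> Optional[List[str]]:
    """Codenames from the top level down to `codename`, or None if absent."""
    for term in terms:
        if term["codename"] == codename:
            return [term["codename"]]
        sub = term_path(term.get("terms", []), codename)
        if sub is not None:
            return [term["codename"]] + sub
    return None


def parent_codename(terms: list, codename: str) -> Optional[str]:
    """Parent term's codename, or None for a top-level term."""
    path = term_path(terms, codename)
    if path is None:
        raise KeyError(codename)
    return path[-2] if len(path) > 1 else None


def item_parents(item_response: dict, terms: list) -> dict:
    """Map each term ticked on an item (the flat list) to its parent codename."""
    ticked = item_response["item"]["elements"]["sitemap"]["value"]
    return {t["codename"]: parent_codename(terms, t["codename"]) for t in ticked}
```

The parent codename returned here is what you would then use in the array filter of the follow-up Delivery API call.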

How to insert 100 million entries into a Firebase database without painting myself into a corner

I am learning and trying to understand how to optimize my Firebase database for my use case.
Looking at the data below, let's say that 150 million entries (the number of street addresses in the US) get pushed to US/STREET_ADDRESS, and I want to sort on the "path" key, since it's a unique identifier for every address. What would the performance be in terms of response time? If this setup is not advisable, how should I change US/STREET_ADDRESS?
I have read about fan-out but don't think I need it, since there is no multi-chat or other multi-anything. In my case I simply want to create a database that holds all the street addresses in the entire world, for whatever reason :).
Should I denormalize US/STREET_ADDRESS even more, e.g. by splitting the 10 million entries into US/A/STREET_ADDRESS, US/B/STREET_ADDRESS and US/C/STREET_ADDRESS? This could be done because the "path" value (e.g. "US/California/Orange County/3138 E Maple Ave" in the sample below) is unique, and it could even be used as the key instead of the auto-generated key from postsRef.push() that this sample uses. But there will still be 100 million entries.
{
"AE": {
"name": "United Arab Emirates"
},
"GB": {
"name": "United Kingdom"
},
"US": {
"name": "United States",
"STREET_ADDRESS": {
"-KUxrqYlItjme2v1I_W5": {
"id": "hjg86-tg33-8hu4-yh5u",
"path": "US/California/Orange County/3138 E Maple Ave"
},
"-KUJHjj7HG5gGHNNJ_T5": {
"id": "ds86-tg12-7eu4-juw3",
"path": "US/Florida/Tampa/104 Biscayne Ave"
}
}
},
"CH": {
"name": "Switzerland"
},
"SY": {
"name": "Syrian Arab Republic"
},
"TW": {
"name": "Taiwan"
},
"TJ": {
"name": "Tajikistan"
},
"TZ": {
"name": "Tanzania, United Republic of"
},
"TH": {
"name": "Thailand"
}
}
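Two points that may help, with a sketch. First, ordering by a child value such as `path` (`orderByChild("path")`) should be backed by an `.indexOn` rule, e.g. `".indexOn": ["path"]` under `US/STREET_ADDRESS` in the database rules; without it, the client SDK logs a warning and the data is downloaded and ordered on the client. Second, a Realtime Database key cannot contain `.`, `$`, `#`, `[`, `]`, `/` or ASCII control characters, and is limited to 768 bytes, so the `path` value cannot be used as a key verbatim: each `/` would create nested children instead. The Python sketch below percent-encodes those characters (plus `%` itself, so the encoding is reversible); the scheme and the function names are a hypothetical choice, not a Firebase API.

```python
from urllib.parse import unquote

# Characters Realtime Database keys may not contain, plus "%" so that the
# percent-encoding below can be reversed unambiguously.
FORBIDDEN = set(".$#[]/%") | {chr(c) for c in range(32)} | {chr(127)}
MAX_KEY_BYTES = 768


def encode_key(path: str) -> str:
    """Percent-encode forbidden characters so `path` fits in a single key."""
    key = "".join("%{:02X}".format(ord(ch)) if ch in FORBIDDEN else ch for ch in path)
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("encoded key exceeds 768 bytes")
    return key


def decode_key(key: str) -> str:
    """Inverse of encode_key: turn %XX sequences back into characters."""
    return unquote(key)
```

With keys built this way, entries can be read in key order with `orderByKey()`, which needs no extra index rule.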

ElasticSearch - Querying only for particular array elements that are not empty

I'm relatively new to ES and am having difficulty finding really good references or tutorials on the query dsl.
We have a document type like the example below. The query I want to run is: "Return all the email_package records that have at least one entities record (one record in the 'entities' array)." And yes, I want the complete 'email' record.
Could anyone help? Also, if you could point me to a reference, tutorial, or cookbook that addresses questions like this, that would be greatly appreciated.
"email_package": {
"email": {
"date": "2007-02-13T18:24:22-04:00",
"subject": "this is the subject",
"body": "this is the body"
},
"entities": [
{
"Louisville": {
"City": "South"
}
},
{
"Memphis": {
"City": "South"
}
}
]
}
// more 'email_package' records follow...
Your document is a bit problematic, since you seem to be nesting objects and giving them different names. If you are not bound to the current structure, I would change the mapping into something more manageable, so that queries become straightforward, e.g.:
"email_package": {
"email": {
"body": "this is the body1",
"date": "2007-02-13T18:24:22-04:00",
"subject": "this is the subject"
},
"entities": [
{
"name": "Louisville",
"City": "South"
},
{
"name": "Memphis",
"City": "South"
}
]
}
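If the data can be reindexed, the restructuring can be done with a small transform before indexing. A Python sketch, assuming each entity object has exactly the one-key shape shown in the question; the function names are hypothetical.

```python
def restructure_entities(entities: list) -> list:
    """Turn [{"Louisville": {"City": "South"}}, ...] into
    [{"name": "Louisville", "City": "South"}, ...]."""
    out = []
    for entity in entities:
        for name, attrs in entity.items():
            out.append({"name": name, **attrs})
    return out


def restructure_package(pkg: dict) -> dict:
    """Copy of an email_package with its entities in the flat name/attribute form."""
    new = dict(pkg)
    new["entities"] = restructure_entities(pkg.get("entities") or [])
    return new
```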
Query:
{
"query": {
"bool": {
"filter": {
"exists": {
"field": "email_package.entities.name"
}
}
}
}
}
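The same exists check in the bool/filter form used by current Elasticsearch versions, plus a rough local mirror of its semantics, as a Python sketch (function names hypothetical). `exists` treats a null value or an empty array as missing, so packages with `"entities": []` are not returned.

```python
def entities_exist_query(field: str = "email_package.entities.name") -> dict:
    """bool/filter form of the exists check (filter context, no scoring)."""
    return {"query": {"bool": {"filter": {"exists": {"field": field}}}}}


def would_match(source: dict) -> bool:
    """Rough local mirror of that query for the restructured documents:
    true when at least one entity has a non-null name. Missing fields,
    nulls and empty arrays all count as "does not exist"."""
    entities = (source.get("email_package") or {}).get("entities") or []
    return any(e.get("name") is not None for e in entities)
```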

Resources