Elasticsearch & X-Pack: how to get vertices/connections from nested documents

I just started using X-Pack for Elasticsearch and want to connect vertices from a nested document type. However, looking for documentation on this hasn't got me anywhere.
What I have is an index of documents that contain person names/IDs as nested documents (one document can have many persons, and one person can be related to many documents). The desired result is graph data with connections between persons.
Does anyone have a clue or can tell me if this is even possible?
Part of my mappings:
mappings: {
  legend: {
    properties: {
      persons: {
        type: 'nested',
        properties: {
          id: {
            type: 'string',
            index: 'not_analyzed'
          },
          name: {
            type: 'string',
            index: 'not_analyzed'
          }
        }
      }
    }
  }
}
And my Graph API query, which of course doesn't work because I don't know how to handle the "name" field of the nested "persons" field.
POST sagenkarta_v3/_xpack/_graph/_explore
{
  "controls": {
    "use_significance": true,
    "sample_size": 20000,
    "timeout": 2000
  },
  "vertices": [
    {
      "field": "persons.name"
    }
  ],
  "connections": {
    "vertices": [
      {
        "field": "persons.name"
      }
    ]
  }
}
Thanks in advance!

The following question was discussed here:
https://discuss.elastic.co/t/elasticsearch-x-pack-how-to-get-vertices-connections-from-nested-documents/88709
quote from Mark_Harwood - Elastic Team Member:
Unfortunately Graph does not support nested documents but you can use
copy_to in your mappings to put the person data in an indexed field in
the containing root document.
I can see that you have the classic problem of
"computers-want-IDs-but-people-want-labels" and have both these
values. In Graph (and arguably the rest of Kibana too) I suggest you
use tokens that combine IDs for uniqueness' sake and names for
readability by humans.
The copy_to and IDs-and-labels tips are part of the modelling
suggestions in my elasticon talk this year:
https://www.elastic.co/elasticon/conf/2017/sf/getting-your-data-graph-ready
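A minimal sketch of the IDs-and-labels tip, assuming the combined tokens are built client-side before indexing rather than with copy_to. The root-level field name `person_tokens`, the `|` separator, and both helper names are hypothetical; the explore body simply mirrors the request from the question, pointed at the root-level field. The field would presumably need a non-analyzed mapping so that each token stays whole.

```python
def add_person_tokens(doc, sep="|"):
    """Return a copy of doc with a root-level list of 'id|name' tokens.

    'person_tokens' is a hypothetical field name, not part of the original mapping.
    """
    tokens = [f"{p['id']}{sep}{p['name']}" for p in doc.get("persons", [])]
    return {**doc, "person_tokens": tokens}


def build_explore_body(field="person_tokens"):
    """Graph explore body that uses the same root-level field for both hops."""
    return {
        "controls": {"use_significance": True, "sample_size": 20000, "timeout": 2000},
        "vertices": [{"field": field}],
        "connections": {"vertices": [{"field": field}]},
    }


doc = {
    "title": "A legend",
    "persons": [{"id": "12", "name": "Anna"}, {"id": "7", "name": "Olof"}],
}
indexed = add_person_tokens(doc)
print(indexed["person_tokens"])  # ['12|Anna', '7|Olof']
```

Because each token carries the ID, two persons with the same name stay distinct vertices, while the name part keeps the graph readable.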

Related

min_score excluding documents with higher scores

I have a trove of several million documents which I'm querying like this:
const query = {
  min_score: 1,
  query: {
    bool: {
      should: [
        {
          multi_match: {
            query: "David",
            fields: ["displayTitle^2", "synopsisList.text"],
            type: "phrase",
            slop: 2
          }
        },
        {
          nested: {
            path: "contributors",
            query: {
              multi_match: {
                query: "David",
                fields: [
                  "contributors.characterName",
                  "contributors.contributionBy.displayTitle"
                ],
                type: "phrase",
                slop: 2
              }
            },
            score_mode: "sum"
          }
        }
      ]
    }
  }
};
This query is giving sane looking results for a wide range of terms. However, it has a problem with "David" - and presumably others.
"David" crops up fairly regularly in the text. With the min_score option this query always returns 0 documents. When I remove min_score I get thousands of documents the best of which has a score of 22.749.
Does anyone know what I'm doing wrong? I guess min_score doesn't work the way I think it does.
Thanks
The problem I was trying to solve was that when I added some filter clauses to the above query, Elasticsearch would return all the documents that satisfied the filter, even those with a score of zero. That's how should works: once a bool query has a filter (or must) clause, its should clauses become optional. I didn't realise that I can nest the should inside a must, which achieves the desired effect.
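A rough sketch of that nesting, written as a Python dict builder. The function name and the example filter are hypothetical; the text clauses are the ones from the question. The inner bool contains only should clauses, so at least one of them has to match before a document is returned, and the outer filter narrows the result without affecting the score.

```python
def build_query(term, filters, min_score=None):
    """Wrap the text 'should' clauses in a 'must' so filters alone cannot match."""
    text_clauses = [
        {
            "multi_match": {
                "query": term,
                "fields": ["displayTitle^2", "synopsisList.text"],
                "type": "phrase",
                "slop": 2,
            }
        },
        {
            "nested": {
                "path": "contributors",
                "query": {
                    "multi_match": {
                        "query": term,
                        "fields": [
                            "contributors.characterName",
                            "contributors.contributionBy.displayTitle",
                        ],
                        "type": "phrase",
                        "slop": 2,
                    }
                },
                "score_mode": "sum",
            }
        },
    ]
    body = {
        "query": {
            "bool": {
                # The nested bool has only 'should' clauses, so one must match.
                "must": [{"bool": {"should": text_clauses}}],
                "filter": filters,
            }
        }
    }
    if min_score is not None:
        body["min_score"] = min_score
    return body


# The 'type' filter below is a made-up example, not a field from the question.
query = build_query("David", [{"term": {"type": "film"}}])
```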

Elasticsearch multiple suggestions with more advanced cases like matching prefix in the middle of a sentence

My use case: I have a search bar where the user can type a query. I want to show multiple types of search suggestions to the user in addition to a regular query suggestion. For example, the search bar shows separate suggestions for company sectors, companies, and schools.
This is currently implemented using completion suggesters and the following query (this is code from our Ruby implementation, but it should be easy to follow):
{
  _source: '',
  suggest: {
    text: query_from_the_user, # User query like "sec" to find "security" related matches
    'school_names': {
      completion: {
        field: 'school_names_suggest',
      },
    },
    'companies': {
      completion: {
        field: 'company_name.suggest',
      },
    },
    'sectors': {
      completion: {
        field: sector_field_based_on_current_language(I18n.locale),
        # uses 'company_sector.french.suggest' when the user browses in french
      },
    },
  },
}
Here are my mappings (written in Ruby, but it shouldn't be too hard to mentally convert them to Elasticsearch JSON config):
indexes :company_name, type: 'text' do
  indexes :suggest, type: 'completion'
end
indexes :company_sector, type: 'object' do
  indexes :french, type: 'text' do
    indexes :suggest, type: 'completion'
  end
  indexes :english, type: 'text' do
    indexes :suggest, type: 'completion'
  end
end
indexes :school_names_suggest, type: 'completion'
# sample Indexed JSON
{
  company_name: "Christian Dior Couture",
  company_sector: {
    english: 'Milk sector',
    french: 'Secteur laitier'
  },
  school_names_suggest: ['Télécom ParisTech', 'Ecole Centrale Paris']
}
The problem is the suggestion is not powerful enough and cannot autocomplete based on the middle of a sentence and provide additional results even after a perfect match. Here are some scenarios that I need to capture with my ES implementation
CASE 1 - Matching by prefix in the middle of a sentence
# documents
[{ company_name: "Christian Dior Couture" }]
# => A search term "Dior" should return this document because it matches by prefix on the second word
CASE 2 - Provide results even after a perfect match
# documents
[
{ company_name: "Crédit Agricole" },
{ company_name: "Crédit Agricole Pyrénées Gascogne" },
]
# => A search term "Crédit Agricole" should return both documents (using the current implementation it only returns "Crédit Agricole")
Can I implement this using suggesters in Elasticsearch? Or do I need to fall back to multiple searches that take advantage of the new search-as-you-type data type, using a query as mentioned in the docs?
I am using elasticsearch 7.1 on AWS and the Ruby driver (gem elasticsearch-7.3.0)
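For reference, a hedged sketch of what the search-as-you-type alternative could look like for company names, written as Python dict builders. It assumes a cluster version that ships the `search_as_you_type` field type and that `company_name` is remapped to it (which differs from the mapping above); the helper names are hypothetical. A `bool_prefix` multi_match treats the last term as a prefix and matches terms at any position, which is the behaviour both cases ask for.

```python
def company_mapping():
    """Assumed replacement mapping: company_name as a search_as_you_type field."""
    return {"properties": {"company_name": {"type": "search_as_you_type"}}}


def company_suggest_query(text, size=5):
    """bool_prefix query over the field and its auto-generated shingle subfields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": [
                    "company_name",
                    "company_name._2gram",
                    "company_name._3gram",
                ],
            }
        },
    }


# One request body per suggestion type could then be sent, e.g. via a multi search.
dior_query = company_suggest_query("Dior")
```

Schools and sectors would each need their own field and query of the same shape, so this trades the single suggest request for several searches.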

How to get unseen, nearby documents from a Amplify - AppSync - ElasticSearch - DynamoDB Stack?

Problem:
I use Amplify.js from AWS.
The app is similar to Tinder.
Here you can find jobs close by.
Each job may only be shown to a user once.
We should save what the user likes and dislikes.
What I've already managed:
I have the scheme:
type Query {
  nearbyJobs(location: LocationInput!, km: Int): ModelJobConnection
}
type User @model {
  id: ID!
  name: String
  interacts: [Jobinteract] @connection(name: "interactsuser")
  createdAt: String
  updatedAt: String
}
type Job @model @searchable {
  id: ID!
  name: String
  location: Location
  is_swiped_by: AWSJSON
  interacts: [Jobinteract] @connection(name: "interactjob")
  createdAt: String
  updatedAt: String
}
With @searchable I have established the connection to ElasticSearch, since this seems to be the only way to search for nearby jobs.
Now it becomes tricky.
At the moment I save the IDs of all users who have already seen a job in that job's is_swiped_by field. With about 1000 users so far, that was OK.
This was my es query:
"body": {
  "size": 30,
  "sort": [
    {
      "createdAt": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "createdAt": {
              "gte": "now-30d/d"
            }
          }
        }
      ],
      "must_not": {
        "match_phrase": {
          "is_swiped_by.user": "$ctx.identity.sub"
        }
      },
      "filter": {
        "geo_distance": {
          "distance" : "${distance}km",
          "location" : $util.toJson($ctx.args.location)
        }
      }
    }
  }
So I looked into the is_swiped_by.user array to see whether the user was there; if so, the job is skipped.
But now the number of users may grow much larger, which means I can no longer store them all in one field.
A new table will probably be needed:
type Jobinteract @model {
  id: ID!
  user: User! @connection(name: "interactsuser")
  job: Job! @connection(name: "interactjob")
  decision: Int
  createdAt: String
  updatedAt: String
}
The question now is: if I have the Jobinteract table, should I make it @searchable too?
Then I would also have that data in ElasticSearch. But how can I bring the two together?
The data would then live in different indexes.
I read about has_child in ES, but I don't understand exactly how it should work, or whether it's the right way.
I'm also currently testing whether I can get access to ES via a Lambda, so that I'd just fetch all the nearby jobs and compare them myself.
But that's probably not the best option:
Get 100 nearby jobs from Elasticsearch and compare them to the Jobinteract table. If 50 are left, send them to the frontend; if not, get another 100.
The more jobs the user has already swiped, the longer this call would take.
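The fetch-and-compare loop described above could be sketched as follows. This is only an illustration of the idea, not Amplify or Lambda code: `fetch_page` stands in for whatever call returns a page of nearby jobs, and all names and the in-memory data are hypothetical.

```python
from typing import Callable, Dict, List, Set


def collect_unseen(
    fetch_page: Callable[[int, int], List[Dict]],
    seen_ids: Set[str],
    wanted: int = 50,
    page_size: int = 100,
) -> List[Dict]:
    """Page through nearby jobs until 'wanted' unseen ones are collected."""
    unseen: List[Dict] = []
    offset = 0
    while len(unseen) < wanted:
        page = fetch_page(offset, page_size)
        if not page:
            break  # no more nearby jobs at all
        unseen.extend(job for job in page if job["id"] not in seen_ids)
        offset += len(page)
    return unseen[:wanted]


# Stand-in data: 250 nearby jobs, of which the user has already swiped the first 120.
all_jobs = [{"id": f"job-{i}"} for i in range(250)]
seen = {f"job-{i}" for i in range(120)}


def fake_fetch(offset: int, size: int) -> List[Dict]:
    return all_jobs[offset:offset + size]


result = collect_unseen(fake_fetch, seen)
print(len(result), result[0]["id"])  # 50 job-120
```

In this run the first page is discarded entirely, which shows the cost the question worries about: the number of round trips grows with the number of jobs the user has already swiped.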
The @searchable directive does not currently support custom ElasticSearch mappings out of the box, so you will need to perform some custom setup for your ElasticSearch cluster. You should be able to use joining queries such as has_child to find all locations where there is no associated child record in the same index indicating that the user has interacted with the job before.
As of writing, the @searchable directive stores different @model types in separate indexes, so you will need to write a custom resolver that puts the "Interaction" child record (which records when a user has interacted with a job) in the same index as the jobs. You will then need to update the ES index mapping so that it uses a join data type, which lets you use the has_child query. See https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html for more information.
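A minimal sketch of that setup as Python dict builders, assuming a typeless (7.x style) mapping. The join field name `relation`, the relation names `job` and `interaction`, and the `userId` field are all hypothetical; child documents would also have to be indexed with routing set to the parent job's ID, as the parent-join documentation describes.

```python
def job_index_mapping():
    """Assumed mapping: jobs are parents, interactions are their children."""
    return {
        "mappings": {
            "properties": {
                "location": {"type": "geo_point"},
                "userId": {"type": "keyword"},
                "relation": {"type": "join", "relations": {"job": "interaction"}},
            }
        }
    }


def unseen_nearby_jobs_query(user_id, location, km, size=30):
    """Nearby jobs that have no 'interaction' child written by this user."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"geo_distance": {"distance": f"{km}km", "location": location}}
                ],
                "must_not": [
                    {
                        "has_child": {
                            "type": "interaction",
                            "query": {"term": {"userId": user_id}},
                        }
                    }
                ],
            }
        },
    }


query = unseen_nearby_jobs_query("user-1", {"lat": 52.5, "lon": 13.4}, 10)
```

Interaction documents carry no location in this sketch, so the geo_distance filter keeps them out of the result even though they have no children themselves.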

How to add "context" to Elastic Search suggestions

I'm building an enterprise social network.
I want to suggest people to add as friends, based on their title.
For example, the value can be: developer, blogger, singer, barber, bartender ...
My users are saved into ElasticSearch, their titles are saved in the field 'title'.
The current mapping is:
title: {
  type: 'text',
  analyzer: 'autocomplete_analyzer',
  search_analyzer: 'autocomplete_analyzer_search'
}
and the query is:
should: [
  {
    match: {
      title: {
        query: user.title,
        minimum_should_match: '90%',
        boost: 2
      }
    }
  }
]
and the analyzers definitions are:
indexConfig: {
  settings: {
    analysis: {
      analyzer: {
        autocomplete_analyzer: {
          tokenizer: 'autocomplete_tokenizer',
          filter: ['lowercase', 'asciifolding']
        },
        autocomplete_analyzer_search: {
          tokenizer: 'lowercase',
          filter: ['asciifolding']
        },
        phrase_analyzer: {
          tokenizer: 'standard',
          filter: ['lowercase', 'asciifolding', 'fr_stop', 'fr_stemmer', 'en_stop', 'en_stemmer']
        },
        derivative_analyzer: {
          tokenizer: 'standard',
          filter: ['lowercase', 'asciifolding', 'derivative_filter', 'fr_stop', 'fr_stemmer', 'en_stop', 'en_stemmer']
        }
      },
      tokenizer: {
        autocomplete_tokenizer: {
          type: 'edge_ngram',
          min_gram: 2,
          max_gram: 20,
          token_chars: ['letter', 'digit']
        }
      },
      filter: {
        derivative_filter: {
          type: 'word_delimiter',
          generate_word_parts: true,
          catenate_words: true,
          catenate_numbers: true,
          catenate_all: true,
          split_on_case_change: true,
          preserve_original: true,
          split_on_numerics: true,
          stem_english_possessive: true
        },
        en_stop: {
          type: 'stop',
          stopwords: '_english_'
        },
        en_stemmer: {
          type: 'stemmer',
          language: 'light_english'
        },
        fr_stop: {
          type: 'stop',
          stopwords: '_french_'
        },
        fr_stemmer: {
          type: 'stemmer',
          language: 'light_french'
        }
      }
    }
  }
}
I tested it and the relevance is very good, but not enough users are matched because of the '90%' criterion.
A quick and dirty solution would of course be to lower this criterion to 50%.
However, if I do that, I suppose that Elasticsearch will match titles based on the letters they have in common rather than on how closely related the titles are.
For example, if my user is a 'barber', Elasticsearch might suggest 'bartender', because they have b, a, r, e, r in common.
Hence, I have two questions:
1 - is my assumption correct ?
2 - what can I do to add more relevance on my titles search ?
The problem with your search is the following: it uses autocomplete_analyzer, which basically creates a huge index with a lot of n-grams.
For bartender, for example, the n-grams would be something like ba, bar, bart, etc.
As you can see, barber produces some of the same n-grams, which can lead to a match.
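To make the overlap concrete, here is a small Python imitation of what an edge_ngram tokenizer with min_gram 2 and max_gram 20 emits for a single word. It is a simplified stand-in for illustration, not Elasticsearch's implementation, and the function name is made up.

```python
def edge_ngrams(word, min_gram=2, max_gram=20):
    """Prefixes of a single lowercased word, from min_gram up to max_gram characters."""
    word = word.lower()
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


barber = edge_ngrams("barber")
bartender = edge_ngrams("bartender")
shared = sorted(set(barber) & set(bartender))

print(barber)  # ['ba', 'bar', 'barb', 'barbe', 'barber']
print(shared)  # ['ba', 'bar']
```

These shared index-time grams only produce a hit when the query side also emits them, for example when the query text is itself "bar" or is split into n-grams by the search analyzer.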
Regarding your questions: if you lower minimum_should_match you will get more results, but only because the matching procedure described above will then accept partial matches.
To increase the relevancy, I would recommend using another analyzer, since this n-gram analyzer is usually suitable only for autosuggest functionality, which isn't the case here. There are several choices, from keeping it simple with the keyword analyzer to using the whitespace one.
What matters more is constructing the query properly. For example, if the user searches for a partial title, e.g. bar, you may use a prefix query. However, if you're searching only by full match (e.g. developer or bartender), it is more important to normalize the title field properly, e.g. by using a lowercasing analyzer with some stemming.
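One possible reading of that advice, sketched as Python dict builders. The analyzer name `title_analyzer` and both function names are hypothetical; the stemmer filter reuses the `light_english` definition from the question, and this is an assumption about a reasonable setup rather than a tested configuration.

```python
def title_analysis_settings():
    """Assumed analysis settings: lowercase, fold accents, light English stemming."""
    return {
        "analysis": {
            "filter": {
                "en_stemmer": {"type": "stemmer", "language": "light_english"}
            },
            "analyzer": {
                "title_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "en_stemmer"],
                }
            },
        }
    }


def build_title_query(text, partial=False):
    """Prefix query for partial input, plain match query for a full title."""
    text = text.strip()
    if partial:
        # Prefix queries are not analyzed, so lowercase the input by hand.
        return {"query": {"prefix": {"title": text.lower()}}}
    return {"query": {"match": {"title": {"query": text}}}}


partial_query = build_title_query("Bar", partial=True)
full_query = build_title_query("bartender")
```

With whole-word tokens in the index, "barber" and "bartender" no longer share any terms, so lowering minimum_should_match would not pull in unrelated titles.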

Elasticsearch Autocomplete on Specific field in Specific Document

I have Documents that contain many fields which are lists of values.
I would like to be able to autocomplete from one specific such field at a time in one specific document, without the data duplication that Completion Suggesters require.
For example, I would like to be able to autocomplete after 3 characters from the values in the category field of the document with id: '7'.
I tried to implement something based on this but this doesn't seem to work on a list of values.
For filtering the suggestions by a field, you can add the fields to filter on in context.
"category": {
  type: "completion",
  payloads: false,
  context: {
    id: {
      type: "category",
      path: "id"
    }
  }
}
You can index the document as :
POST /myindex/myitem/1
{
  id: 123,
  category: {
    input: "my category",
    context: {
      id: 123
    }
  }
}
The minimum length check has to be applied on the client side. ES suggesters do not provide anything like that.
Now, you can suggest on category field with a filter on id field.
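A hedged sketch of the request side in Python, combining the client-side length check with the context filter. It assumes the older context suggester syntax that matches the mapping above (newer Elasticsearch versions use a different `contexts` form); the suggestion name `category_suggest` and the function name are made up.

```python
def build_suggest_body(prefix, doc_id, min_chars=3):
    """Return a suggest body filtered by document id, or None if the prefix is too short."""
    if len(prefix) < min_chars:
        return None  # minimum length is enforced on the client, not by ES
    return {
        "category_suggest": {
            "text": prefix,
            "completion": {
                "field": "category",
                "context": {"id": doc_id},
            },
        }
    }


too_short = build_suggest_body("my", 123)
body = build_suggest_body("my c", 123)
```

Returning None for short input lets the caller skip the request entirely instead of sending a suggestion that would be discarded.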
