Amazon Neptune full text search query not working as expected

I am trying to implement full-text search for Neptune DB manually using Elasticsearch, but I am getting this error:
{"requestId":"bcb16f6b-7e60-4e71-b0d8-a6a4a9b38b00","code":"MalformedQueryException","detailedMessage":"Failed to interpret Gremlin query: null"}
Here is my document:
{
"entity_id": "f8b9726f-74f9-a0e0-5fbd-b609bbb14f89",
"entity_type": [
"suggestions"
],
"document_type": "vertex",
"predicates": {
"title": {
"value": "samsung mobile"
}
}
}
query:
g.withSideEffect('Neptune#fts.endpoint','elasticsearch cluster end point').withSideEffect('Neptune#fts.queryType', 'match').V().has('title','Neptune#fts samsung').local(values('title').fold()).limit(5).valueMap().toList()
The error occurs only when I search for a word that exists in the index, e.g. samsung. If I search for a word that does not exist, the query works fine and no error is thrown.
I'm not sure what is wrong here. Can anyone help me with this?

The local step you showed creates, for each vertex, a list of that vertex's 'title' values. Without the local step, values('title').fold() would wrap all values found into a single list.
Note, however, that you cannot add a valueMap step after that local step, and this is probably why your query was failing: you would be applying valueMap not to vertices but to the lists of strings coming out of the local step. That would also explain why a search with no matches does not fail, since nothing reaches the valueMap step.
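To make the difference concrete, here is a toy model in plain Python. It is not real Gremlin, the vertex data is made up, and the two helper functions are hypothetical names that only mimic what the steps produce:

```python
# Toy model of the traversal, NOT real Gremlin: each vertex is a dict of
# property name -> list of values. The data below is hypothetical.
vertices = [
    {"title": ["samsung mobile"]},
    {"title": ["samsung tv", "samsung television"]},
]


def local_values_fold(vertices, key):
    """Mimics local(values(key).fold()): one list per vertex."""
    return [list(v.get(key, [])) for v in vertices]


def values_fold(vertices, key):
    """Mimics values(key).fold(): a single list for the whole traversal."""
    return [[value for v in vertices for value in v.get(key, [])]]


per_vertex = local_values_fold(vertices, "title")
combined = values_fold(vertices, "title")
```

In both cases what flows on to the next step is a list of strings rather than a vertex, which is why a following valueMap has nothing sensible to work on. If property maps are what you want, applying valueMap('title') to the vertices directly, without the local step, is one option to try.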

Related

Painless (Elasticsearch) can't use keyword - script error

I'm trying to create a scripted field in Kibana, which checks whether the field "Direction" is "I" or not.
if (doc['Direction'].value != "I") {return 1;} else {return 0;}
But for some reason, it won't work. It works that way for all other fields that aren't explicitly mentioned in the index mapping, but I had to add Direction to the mapping because I also have an alias pointing to it. For Direction I put the following in the mapping file:
"Direction": {
"type": "keyword"
}
And there is also an alias pointing to Direction:
"ISDN_Direction": {
"path": "Direction",
"type": "alias"
}
But neither field can be used in the Painless script. I don't get an error, but the result preview for the first 10 results is just empty.
Can someone help me with this issue?

I found the problem!
I changed the data type mapping, but I still had indices in my ES DB that had the old mapping of "text". Kibana didn't show me a mapping conflict, since both text and keyword are strings.
I deleted the old indices which mapped the field to "text" and now the Painless calculation works without any problem.
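If you want to spot this kind of leftover mapping before deleting anything, a small check over the output of GET _mapping can help. This is only a sketch: it assumes the typeless response shape of newer Elasticsearch versions (index -> "mappings" -> "properties"), and the index names and the find_type_conflicts helper are hypothetical:

```python
def find_type_conflicts(mappings, field):
    """Return {type: [index, ...]} if `field` is mapped to more than one type."""
    by_type = {}
    for index, body in mappings.items():
        # Assumed shape: {"<index>": {"mappings": {"properties": {...}}}}
        props = body.get("mappings", {}).get("properties", {})
        field_type = props.get(field, {}).get("type")
        if field_type is not None:
            by_type.setdefault(field_type, []).append(index)
    # A single type across all indices means there is no conflict to report.
    return by_type if len(by_type) > 1 else {}


# Hypothetical mapping response with one old and one new index.
mappings = {
    "calls-old": {"mappings": {"properties": {"Direction": {"type": "text"}}}},
    "calls-new": {"mappings": {"properties": {"Direction": {"type": "keyword"}}}},
}
conflicts = find_type_conflicts(mappings, "Direction")
```

Any index listed under "text" here would be a candidate for reindexing or deletion.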

Kibana scripted field which loops through an array

I am trying to use the metricbeat http module to monitor F5 pools.
I make a request to the F5 API and bring back JSON, which is saved to Kibana. But the JSON contains an array of pool members, and I want to count the number that are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array. For example,
doc['http.f5pools.items.monitor'].value.length()
returns in the preview results with the same 'Additional Field' added for comparison:
[
{
"_id": "rT7wdGsBXQSGm_pQoH6Y",
"http": {
"f5pools": {
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
},
"pool.MemberCount": [
7
]
},
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? I.e. is my code, or the way I'm indexing the data, wrong?
If not, is there an alternative approach within Metricbeat? I don't want to have to make a whole new API to do the calculation and add a separate field.
-- update.
Weirdly, it seems that the numeric values in the array do return the expected results, i.e.
doc['http.f5pools.items.ratio']
returns
{
"_id": "BT6WdWsBXQSGm_pQBbCa",
"pool.MemberCount": [
1,
1
]
},
-- update 2
OK, so if the strings in the field have different values then you get all the values. If they are the same, you just get one. wtf?

I'm adding another answer instead of deleting my previous one, which does not address the actual question but may still be helpful for someone else in the future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Googling this further, I found this Doc Value Intro, which says that doc values are essentially an "uninverted index" useful for operations like sorting. My hypothesis is that while sorting you essentially don't want the same values repeated, and hence the data structure they use removes those duplicates. That still did not explain why it works differently for strings than for numbers: numbers are preserved, but strings are filtered down to unique values.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-diving into doc values revealed that they use a compression technique which de-duplicates the values for efficient, memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you don't want this behavior, you can disable doc values.

OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So, as I discovered, arrays are prefiltered to only return distinct values (except in the case of ints, apparently?).
The solution is to use params._source instead of doc[]
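Here is a rough Python model of what I was seeing. It is not Elasticsearch code: doc_values_view is a hypothetical helper that only imitates the behaviour described in this thread (strings de-duplicated and sorted, numbers kept with duplicates), and the ratio values are assumed from the preview output above:

```python
def doc_values_view(values):
    """Imitate what doc['field'] exposes for a multi-valued field (a model only)."""
    if all(isinstance(v, str) for v in values):
        # String (keyword-style) values: de-duplicated and sorted.
        return sorted(set(values))
    # Numeric values: sorted, but duplicates are preserved.
    return sorted(values)


# The _source of one document, shaped like the preview in the question.
source = {
    "http": {
        "f5pools": {
            "items": [
                {"monitor": "default", "ratio": 1},
                {"monitor": "default", "ratio": 1},
            ]
        }
    }
}
items = source["http"]["f5pools"]["items"]

monitors_from_doc = doc_values_view([item["monitor"] for item in items])
ratios_from_doc = doc_values_view([item["ratio"] for item in items])
# Reading from _source keeps the original array, duplicates included.
monitors_from_source = [item["monitor"] for item in items]
```

So counting pool members from the doc view of monitor gives 1, while counting from the _source view gives 2, which is the number I actually wanted.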

The answer for why doc doesn't work:
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo-points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, important to add a null check as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field') to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
Responding to your comment with an example:
The keyword here is: "It cannot return JSON objects". The field doc['http.f5pools.items'] is a JSON object.
Try running the request below and see the mapping it creates:
PUT t5/doc/2
{
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
GET t5/_mapping
{
"t5" : {
"mappings" : {
"doc" : {
"properties" : {
"items" : {
"properties" : {
"monitor" : { <-- monitor is a property of items property(Object)
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}

Unable to loop through array field ES 6.1

I'm facing a problem in Elasticsearch 6.1 that I cannot solve, and I don't know why. I have read the docs several times and maybe I'm missing something.
I have a scripted query that needs to do some calculation before deciding whether a record is available or not.
Here is the script:
https://gist.github.com/dunice/a3a8a431140ec004fdc6969f77356fdf
What I'm doing is trying to loop through an array field with the following source:
"unavailability": [
{
"starts_at": "2018-11-27T18:00:00+00:00",
"local_ends_at": "2018-11-27T15:04:00",
"local_starts_at": "2018-11-27T13:00:00",
"ends_at": "2018-11-27T20:04:00+00:00"
},
{
"starts_at": "2018-12-04T18:00:00+00:00",
"local_ends_at": "2018-12-04T15:04:00",
"local_starts_at": "2018-12-04T13:00:00",
"ends_at": "2018-12-04T20:04:00+00:00"
}
]
When the script is executed it throws the error: No field found for [unavailability] in mapping with types [aircraft]
Is there any clue to make it work?
Thanks
UPDATE
Query:
https://gist.github.com/dunice/3ccd7d83ca6ddaa63c11013b84e659aa
UPDATE 2
Mapping:
https://gist.github.com/dunice/f8caee114bbd917115a21b8b9175a439
Data example:
https://gist.github.com/dunice/8ad0602bc282b4ca19bce8ae849117ad

You cannot access an array present in the source document via doc_values (i.e. doc). You need to directly access the source document via the _source variable instead, like this:
for(int i = 0; i < params._source['unavailability'].length; i++) {
Note that depending on your ES version, you might want to try ctx._source or just _source instead of params._source

I solved my use case with a different approach.
Instead of having a field that is an array of objects, as unavailability was, I decided to create two fields, each an array of datetimes:
unavailable_from
unavailable_to
My script walks through the first field and checks the second at the same position.
UPDATE
The direct access to _source is disabled by default:
https://github.com/elastic/elasticsearch/issues/17558
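The logic of the two-array approach can be sketched in plain Python. This is only an illustration of the pairing-by-position idea, not the actual Painless script; is_available is a hypothetical name, and the timestamps are the starts_at/ends_at values from the question:

```python
from datetime import datetime


def is_available(unavailable_from, unavailable_to, start, end):
    """Return False if [start, end) overlaps any unavailability window."""
    # Windows are paired by position: unavailable_from[i] with unavailable_to[i].
    for block_start, block_end in zip(unavailable_from, unavailable_to):
        if start < block_end and block_start < end:
            return False
    return True


unavailable_from = [
    datetime.fromisoformat("2018-11-27T18:00:00+00:00"),
    datetime.fromisoformat("2018-12-04T18:00:00+00:00"),
]
unavailable_to = [
    datetime.fromisoformat("2018-11-27T20:04:00+00:00"),
    datetime.fromisoformat("2018-12-04T20:04:00+00:00"),
]

busy = is_available(
    unavailable_from,
    unavailable_to,
    datetime.fromisoformat("2018-11-27T19:00:00+00:00"),
    datetime.fromisoformat("2018-11-27T21:00:00+00:00"),
)
free = is_available(
    unavailable_from,
    unavailable_to,
    datetime.fromisoformat("2018-11-28T10:00:00+00:00"),
    datetime.fromisoformat("2018-11-28T12:00:00+00:00"),
)
```

One caveat: pairing by position relies on both arrays coming back in matching order. Doc values return each field's values sorted, so this should hold as long as the unavailability windows do not overlap each other.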

Kibana visualization showing wrong results when compared to discover

I am using Kibana for visualization on Elasticsearch. I am trying to find the most frequently occurring terms in cleaned_keyword_phrases, which is an array of keywords. Basically, cleaned_keyword_phrases is an array of skills, e.g. ["java","spring","ms word"].
When I search for a query (primary_class:"job" and jobPost:"java developer"), the results are correct in the Discover tab, but in the Visualize tab they are wrong.
E.g. when I search for java developer, these are the results displayed (these seem right) in the quick count in Discover:
discover result:
Whereas when I try to visualize, the results change (these seem wrong) and are displayed as:
visualize results:
In fact, on changing the query from "java developer" to developer, the results in the quick count in Discover change, but the results in the Visualize tab remain the same. This makes me feel that the query is not being run in the Visualize tab.
I tried running the query using the Sense plugin, but there too the results are wrong.
Query:
{
"size": 0,
"query": {
"query_string": {
"query": "primary_class:\"job\" and jobPost:\"java developer\"",
"analyze_wildcard": true
}
},
"aggs": {
"3": {
"terms": {
"field": "cleaned_keyword_phrases",
"size": 20,
"order": {
"_count": "desc"
}
}
}
}
}
kibana Version 4.0.2
Build 6004
Commit SHA b286116
Edit: Good results are results which are more related to the query, i.e. java developer in this context. Thus the results coming up in the quick count on the Discover tab are "good", and the ones showing up in the Visualize tab seem bad as they are not related (they do not change when the query is changed in Kibana).

I had a problem with my hostnames, similar to yours.
The visualization split a name like vm-xx-yy into vm, xx and yy and showed the results for those.
After changing the field from index:analyzed to index:not_analyzed, it works correctly.
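To illustrate what a terms aggregation ends up counting in each case, here is a small Python model. It is not Elasticsearch code: the regex split is only a crude stand-in for the analyzer, and terms_counts and the hostnames are made up for the example:

```python
import re
from collections import Counter


def terms_counts(values, analyzed):
    """Count documents per term, roughly like a terms aggregation would."""
    counts = Counter()
    for value in values:
        if analyzed:
            # Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
            tokens = [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]
            counts.update(set(tokens))  # each document counts once per term
        else:
            # not_analyzed: the whole value is a single term.
            counts[value] += 1
    return counts


hostnames = ["vm-xx-yy", "vm-xx-zz", "vm-aa-yy"]  # one value per document
analyzed = terms_counts(hostnames, analyzed=True)
not_analyzed = terms_counts(hostnames, analyzed=False)
```

With the analyzed field the top term is vm, which matches every document, while the not_analyzed field keeps each hostname as its own bucket. The same thing would happen to a skill like "ms word" in the question.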

Have you checked the visualization when it is attached to a dashboard, with the same query string in the search bar? If the query string is applied on the dashboard, the difference may be because the Visualize tab only creates the visualization.

ES keeps returning every document

I recently inherited an ES instance and ensured I read an entire book on ES cover-to-cover before posting this, however I'm afraid I'm unable to get even simple examples to work.
I have an index on our staging environment which exhibits behavior where every document is returned no matter what - I have a similar index on our QA environment which works like I would expect it to. For example I am running the following query against http://staging:9200/people_alias/_search?explain:
{ "query" :
{ "filtered" :
{ "query" : { "match_all" : {} },
"filter" : { "term" : { "_id" : "34414405382" } } } } }
What I noticed on this staging environment is the score of every document is 1 and it is returning EVERY document in my index no matter what value I specify ...using ?explain I see the following:
_explanation: {
value: 1
description: ConstantScore(*:*), product of:
details: [
{
value: 1, description: boost
}, { value: 1, description: queryNorm } ] }
On my QA environment, which correctly returns only one record I observe for ?explain:
_explanation: {
value: 1
description: ConstantScore(cache(_uid:person#34414405382)), product of:
details: [ {
value: 1,
description: boost
}, {
value: 1,
description: queryNorm
}
]
}
The mappings are almost identical on both indices - the only difference is I removed the manual field-level boost values on some fields as I read field-level boosting is not recommended in favor of query-time boosting, however this should not affect the behavior of filtering on the document ID (right?)
Is there any clue I can glean from the differences in the explain output or should I post the index mappings? Are there any server-level settings I should consider checking? It doesn't matter what query I use on Staging, I can use match queries and exact match lookups on other fields and Staging just keeps returning every result with Score 1.0
I feel like I'm doing something very glaringly and obviously wrong on my Staging environment. Could someone please explain the presence of ConstantScore, boost and queryNorm? I thought from looking at examples in other literature I would see things like term frequency etc.
EDIT: I am issuing the query from the Elasticsearch Head plugin.

In your HEAD plugin, you need to use POST in order to send the query in the payload, otherwise the _search endpoint is hit without any constraints.
In your browser, if you open the developer tools and look at the networking tab, you'll see that nothing is sent in the payload when using GET.
It's a common mistake. Some HTTP clients (like curl) do send a payload with GET, but others (like head) don't. Sense will warn you if you use GET instead of POST when sending a payload and will automatically force POST instead of GET.
To sum up, it's best to always use POST whenever you wish to send a payload to your servers, so you don't have to care about the behavior of the HTTP client you're using.
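As a minimal sketch of sending the query explicitly as a POST body, here is how the request could be built with Python's standard library. The host and index are taken from the question, build_search_request is a hypothetical helper, and the actual send is left commented out so the snippet runs without a cluster:

```python
import json
import urllib.request


def build_search_request(base_url, index, query):
    """Build a POST request that carries the query in the body."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # explicit, so the payload is never silently dropped
    )


# The query from the question, unchanged.
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"_id": "34414405382"}},
        }
    }
}
request = build_search_request("http://staging:9200", "people_alias", query)

# To actually send it (requires a reachable cluster):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

If the explain output still shows ConstantScore(*:*) after this, the body is reaching the server and the problem lies elsewhere.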
