Amazon Neptune Full Text Search - specify fields - elasticsearch

The SPARQL documentation contains an example of how to specify multiple fields to search:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX neptune-fts: <http://aws.amazon.com/neptune/vocab/v01/services/fts#>
SELECT * WHERE {
  SERVICE neptune-fts:search {
    neptune-fts:config neptune-fts:endpoint 'http://your-es-endpoint.com' .
    neptune-fts:config neptune-fts:queryType 'query_string' .
    neptune-fts:config neptune-fts:query 'mikael~ OR rondelli' .
    neptune-fts:config neptune-fts:field foaf:name .
    neptune-fts:config neptune-fts:field foaf:surname .
    neptune-fts:config neptune-fts:return ?res .
  }
}
I'm trying to do the same thing, but in Gremlin:
g.withSideEffect('Neptune#fts.endpoint', '...')
.V().has(['name', 'company'], 'Neptune#fts term*')
This obviously doesn't work. I could use a wildcard instead:
g.withSideEffect('Neptune#fts.endpoint', '...')
.V().has('*', 'Neptune#fts term*')
But then I'm matching every field, and the query fails because our index has too many fields (I think the limit is 1,024).
Any idea how to specify a list of fields to search through in a Gremlin query?

In the meantime I found a workaround that works, although it is not very clean.
You can set the query to use the query_string format like this:
.withSideEffect("Neptune#fts.queryType", "query_string")
It is less forgiving about syntax, but it lets you name the fields to search inside the query itself:
field1:foo AND field2:bar
With Neptune it's not quite that simple, because the field names aren't just field1 and field2; the documents are structured like this:
"predicates": {
  "field1": {
    "value": "..."
  },
  "field2": {
    "value": "..."
  }
}
That's fine, you just need to modify the query:
predicates.field1.value:foo AND predicates.field2.value:bar
And here's how I express "at least one of these fields matches the term":
predicates.field1.value:<term> OR predicates.field2.value:<term>
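Since the workaround boils down to string building, here is a minimal Python sketch of it. The helper names (fts_field, all_fields_match, any_field_matches) are hypothetical; the predicates.<field>.value layout is the one described above. Terms are inserted verbatim, so any query_string special characters (spaces, quotes, colons and so on) must be escaped by the caller.

```python
def fts_field(name):
    """Neptune FTS document path for a property, e.g. name -> predicates.name.value."""
    return f"predicates.{name}.value"


def all_fields_match(criteria):
    """query_string expression requiring every field to match its own value,
    e.g. {"field1": "foo", "field2": "bar"}."""
    if not criteria:
        raise ValueError("at least one field is required")
    return " AND ".join(f"{fts_field(f)}:{v}" for f, v in criteria.items())


def any_field_matches(fields, term):
    """query_string expression that matches if at least one field matches term."""
    if not fields:
        raise ValueError("at least one field is required")
    # Terms are not escaped: the caller is responsible for query_string syntax.
    return " OR ".join(f"{fts_field(f)}:{term}" for f in fields)


print(any_field_matches(["name", "company"], "term*"))
# predicates.name.value:term* OR predicates.company.value:term*
```

Presumably the resulting string is what goes after Neptune#fts in the Gremlin has() step, with Neptune#fts.queryType set to query_string.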

Related

Accessing other documents inside update and reindex script

Is there any way to run a query and fetch the matching documents inside a script?
For example I want to add all of the documents matching the query x to the field f of my document d like this:
d:
{
  other fields...
  "f": [ {doc 1 matching the query}, {doc 2 matching the query}, ... ]
}
In SQL you can use a query inside another query. Does Elasticsearch have anything similar?
I tried writing all kinds of script code to fetch data from the indices in my database, but none of it worked.
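As far as I know, Elasticsearch update and reindex scripts cannot run a search themselves, so a common workaround (not from the original thread) is to do the "subquery" on the client: run the search for query x, then write the hits into field f of document d with a partial update (POST /<index>/_update/<id> with a "doc" body). This sketch covers only the second half, turning a search response into the update body; the function name and the sample response are hypothetical.

```python
import json


def build_partial_update(search_response, target_field="f"):
    """Turn the hits of an Elasticsearch search response into a partial-update
    body ({"doc": {...}}) for POST /<index>/_update/<id>.
    Assumption: the client has already run the search for query x."""
    hits = search_response.get("hits", {}).get("hits", [])
    return {"doc": {target_field: [hit["_source"] for hit in hits]}}


# Hand-written, hypothetical search response with two matching documents.
response = {"hits": {"hits": [
    {"_id": "1", "_source": {"title": "doc 1"}},
    {"_id": "2", "_source": {"title": "doc 2"}},
]}}
print(json.dumps(build_partial_update(response)))
```

Note that f then holds a snapshot: it will not update itself when the matching documents change.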

Elastic search wildcard search space issue

Consider an indexed field "ProductName" with the value "dove 3.75oz". When a user searches for "dove 3.75oz", the bool query below retrieves the document as expected:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75oz"}}}]}}
If the user searches for "dove 3.75 oz" (with a space between "3.75" and "oz"), the bool query fails to retrieve the same document:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75 oz"}}}]}}
Question: how can I design a wildcard query that works both with and without the space? Please share an example.
Text field values are split into tokens by default before they are stored. So a value like "hello man" is stored as two separate tokens, hello and man, because of the space between them. That is exactly why this will not work with a wildcard query:
{"wildcard":{"ProductName":{"value":"3.75 oz"}}}
A wildcard query only matches single tokens. For wildcard searches like this you can use a special field type called wildcard.
If you do not want to reindex your data, try a phrase search instead:
"match_phrase": {
  "ProductName": {
    "query": "3.75 oz"
  }
}
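The tokenization point above can be illustrated with a toy model in Python. This is a deliberate simplification, not Elasticsearch's real analyzer: the hypothetical tokenize helper just lowercases and splits on whitespace, and fnmatchcase stands in for a wildcard query that is matched against one token at a time.

```python
from fnmatch import fnmatchcase


def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace (a simplification of
    Elasticsearch's standard analyzer, used only for illustration)."""
    return text.lower().split()


def wildcard_matches(field_value, pattern):
    """Mimic a wildcard query on a text field: the pattern must match
    at least one *single* token, never a span of several tokens."""
    return any(fnmatchcase(token, pattern.lower()) for token in tokenize(field_value))


print(wildcard_matches("dove 3.75oz", "3.75oz"))    # True
print(wildcard_matches("dove 3.75oz", "3.75 oz"))   # False: no token contains a space
print(wildcard_matches("dove 3.75 oz", "3.75 oz"))  # False: "3.75" and "oz" are separate tokens
```

Because a single token never contains a space, a pattern with a space can never match, whatever the stored value looks like.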

I'm having a problem with elasticsearch, how do I query for these conditions

How do I write Elasticsearch queries for each of these conditions?
beginsWith
endsWith
contains
You can use a wildcard query for this. A wildcard operator is a placeholder for characters: the * operator matches zero or more characters, and ? matches exactly one. You can combine wildcard operators with other characters to create a wildcard pattern.
In your case, you can use patterns like the following to check whether a string begins with, ends with, or contains 'od':
beginsWith : od*
endsWith: *od
contains: *od*
REST API example that matches all terms containing "od":
GET /_search
{
  "query": {
    "wildcard": {
      "text": {
        "value": "*od*"
      }
    }
  }
}
For more information, see the official Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html
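The three patterns map mechanically onto the wildcard query, so a small Python sketch can build the request body for any of them. The function names are hypothetical, "text" is just the field name from the example above, and the sketch rejects terms containing * or ? instead of escaping them. The Elasticsearch docs also warn that patterns starting with * or ? (as in endsWith and contains) can slow searches down.

```python
import json


def wildcard_pattern(condition, term):
    """Map a condition name from the question to a wildcard pattern."""
    if "*" in term or "?" in term:
        # Keep the sketch simple: wildcard characters in the term are not escaped.
        raise ValueError("term must not contain * or ?")
    patterns = {
        "beginsWith": f"{term}*",
        "endsWith": f"*{term}",
        "contains": f"*{term}*",
    }
    if condition not in patterns:
        raise ValueError(f"unknown condition: {condition}")
    return patterns[condition]


def wildcard_query(field, condition, term):
    """Request body for GET /_search, shaped like the example above."""
    return {"query": {"wildcard": {field: {"value": wildcard_pattern(condition, term)}}}}


print(json.dumps(wildcard_query("text", "contains", "od"), indent=2))
```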

How do boolean predicates work in Elasticsearch query string syntax

I have a question about the ES query string syntax. I am searching Logstash log entries containing XML documents, and I'd like to find documents containing certain XML attributes with certain values. When searching for:
id: foobar AND attrName=SomeValue
In my data set this query finds, let's say, 100 documents.
When searching for:
id: foobar AND attrName SomeValue
I get fewer documents. Why is that, when according to the query_string docs the default operator is OR?
When I escape the " characters and query like this, I get the correct results:
id: foobar AND attrName=\"SomeValue\"
I'm running the query using the following JSON:
{
  "sort": [
    "@timestamp"
  ],
  "query": {
    "query_string": {
      "query": "mySearchText"
    }
  },
  "fields": [
    "_id"
  ],
  "size": 100
}
Any tips on how to search in XML documents that contain only elements and attributes but no text nodes?
Edit #1: I just stumbled upon another thing I don't understand. Why are these queries different?
a AND b OR c
is different than:
a AND (b OR c)
Any tips on how these queries are evaluated?
Edit #2: Okay, I think I have pinned down the behaviour that confuses me.
When my query string looks like this:
id: foo AND attrName=\"SomeValue\" AND field2:bar
I get all documents where:
- id=foo
- field2=bar
- contain the text attrName AND the text SomeValue
When I change my query to (added parentheses):
id: foo AND (attrName=\"SomeValue\") AND field2:bar
I get all documents where:
- id=foo
- field2=bar
- contain the text attrName OR the text SomeValue
Why is (attrName=\"SomeValue\") evaluated as attrName OR SomeValue, whereas without parentheses it is attrName AND SomeValue?

Project the sum of all fields in a document that match a regular expression, in elasticsearch

In Elasticsearch, I know I can specify the fields I want to return from documents that match my query using {"fields":["fieldA", "fieldB", ..]}.
But how do I return the sum of all fields that match a particular regular expression (as a new field)?
For example, if my documents look like this:
{
  "documentid": 1,
  "documentStats": {
    "foo_1_1": 1,
    "foo_2_1": 5,
    "boo_1_1": 3
  }
}
and I want the sum of all stats whose names match _1_, per document?
You can define an artificial field with script_fields that contains a small Groovy script, which will do the job for you.
So after your query, you can add a script_fields section like this:
{
  "query": {
    ...
  },
  "script_fields": {
    "sum": {
      "script": "_source.documentStats.findAll{ it.key =~ '_1_'}.collect{it.value}.sum()"
    }
  }
}
The script simply retrieves all the fields in documentStats whose names match _1_ and sums their values; in this example you get 4 (foo_1_1 + boo_1_1 = 1 + 3).
Make sure to enable dynamic scripting in elasticsearch.yml and restart your ES node before trying this out.
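To sanity-check the arithmetic, here is the same logic as the Groovy one-liner written in plain Python. It is a client-side illustration, not something Elasticsearch runs. Groovy's =~ performs a partial (find) match, which corresponds to re.search; the function name is hypothetical.

```python
import re


def sum_matching_stats(document, pattern="_1_"):
    """Sum every value in document["documentStats"] whose key contains a
    match for `pattern` (re.search, i.e. a partial match like Groovy's =~)."""
    stats = document.get("documentStats", {})
    regex = re.compile(pattern)
    return sum(value for key, value in stats.items() if regex.search(key))


doc = {"documentid": 1, "documentStats": {"foo_1_1": 1, "foo_2_1": 5, "boo_1_1": 3}}
print(sum_matching_stats(doc))  # 4
```

foo_2_1 is excluded because it contains _2_ and ends in _1, but never contains _1_.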
