How to make Neptune Search more lenient - elasticsearch

I have some entries inside the graph that I am searching (e.g. hello_world, foo_bar_baz) and I want to be able to search "hello" and get hello_world back.
Currently, I only get a result if I search for the entire string (i.e. searching hello_world or foo_bar_baz).
This seems to be due to Elasticsearch's standard analyzer behaviour, but I don't know how to work around it with Neptune.
with neptune_graph() as g:
    my_query = " OR ".join(
        f"predicates.{field}.value:({query})" for field in ['names', 'spaces']
    )
    search_results = (
        g.withSideEffect(
            "Neptune#fts.endpoint", f"https://{neptuneSearchURL}"
        )
        .withSideEffect("Neptune#fts.queryType", "query_string")
        .withSideEffect("Neptune#fts.sortOrder", "DESC")
        .V()
        .hasLabel("doc")
        .has(
            "*",
            f"Neptune#fts entity_type:table AND ({my_query})",
        )
    )

One way is to use a wildcard.
Given:
g.addV('search-test').property('name','Hello_World')
v[0ebedfda-a9bd-e320-041a-6e98da9b1379]
Assuming the search integration is all in place, after the search index has been updated, the following will find the vertex:
g.withSideEffect("Neptune#fts.endpoint",
"https://vpc-neptune-xxx-abc123.us-east-1.es.amazonaws.com").
withSideEffect('Neptune#fts.queryType', 'query_string').
V().
has('name','Neptune#fts hello*').
elementMap().
unfold()
Which yields
{<T.id: 1>: '0ebedfda-a9bd-e320-041a-6e98da9b1379'}
{<T.label: 4>: 'search-test'}
{'name': 'Hello_World'}
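Carried over to the gremlin-python query in the question, the same idea just means appending * to the user's term before building the Lucene query string. A sketch, reusing the question's neptune_graph helper, neptuneSearchURL, query variable, and field names (all assumptions taken from the question, not verified here):

with neptune_graph() as g:
    # Hypothetical: turn the raw term into a prefix wildcard,
    # e.g. "hello" -> "hello*", so hello_world is matched.
    wildcard_query = f"{query}*"
    my_query = " OR ".join(
        f"predicates.{field}.value:({wildcard_query})" for field in ['names', 'spaces']
    )
    search_results = (
        g.withSideEffect("Neptune#fts.endpoint", f"https://{neptuneSearchURL}")
        .withSideEffect("Neptune#fts.queryType", "query_string")
        .V()
        .hasLabel("doc")
        .has("*", f"Neptune#fts entity_type:table AND ({my_query})")
        .toList()
    )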

The problem I was having was indeed the analyzer, except I didn't understand how to fix it until now.
When creating the Elasticsearch index in the first place, you need to specify the settings you want.
The solution was to create the index with custom analyser settings:
with neptune_search() as es:
    # Create the index with the custom analyser settings *before* indexing
    # anything (note: the keys and values all need quotes).
    es.indices.create(
        index="my_index",
        body={
            "settings": {
                "analysis": {
                    "analyzer": {
                        "default": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                        }
                    }
                }
            }
        },
    )
    es.index(index="my_index", ...)  # ... other stuff, indexed as before
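As a quick sanity check, the _analyze API shows how a value will be tokenized before any data is reloaded. A sketch, still inside the same with neptune_search() as es: block and assuming the hypothetical my_index from above:

    # With the "lowercase" tokenizer, "Hello_World" should come back as the
    # terms ["hello", "world"], so a search for "hello" will now match.
    tokens = es.indices.analyze(index="my_index", body={"text": "Hello_World"})
    print([t["token"] for t in tokens["tokens"]])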

Related

Kibana visualize use wild card in search bar

Is it possible to use a wildcard in the Kibana visualize search bar?
I tried to use it like below, but it did not work.
operation: "Revers" NOT file:"*Test.Revers"
This returns 2 because there are two "Revers" terms ("Revers", "/test/count/Test.Revers"), even though there is only one data entry in the stats data.
The following also returns the same value as 2.
operation: "Revers"
Stat data sample is as below.
"_source": {
"status": 0,
"trstime": 1819,
"username": "test",
"operation": "Revers",
"file": "/test/count/Test.Revers"
}
I have tested it in ES 7.10, as you did not mention the ES version.
The answer to your question is YES, you can use a wildcard in the Kibana visualize search bar, but the value should be without double quotes. If you put the value in double quotes, it is treated as literal text and searched for as-is.
You can try the query below and it will give you your expected output:
operation: Revers AND NOT file.keyword: *Test.Revers
The query below (also without double quotes) likewise returns 1:
operation: Revers AND NOT file: *Test.Revers
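For comparison, here is the same filter sent to Elasticsearch directly as a query_string query. This is only a sketch: the index name stats and the local endpoint are assumptions, not from the original post.

import requests

# In ES 7.x, file.keyword is the unanalyzed sub-field, so the leading-wildcard
# pattern is matched against the whole path rather than individual terms.
resp = requests.get(
    "http://127.0.0.1:9200/stats/_search",
    json={
        "query": {
            "query_string": {
                "query": "operation: Revers AND NOT file.keyword: *Test.Revers"
            }
        }
    },
)
print(resp.json()["hits"]["total"])  # expect a single matching document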

django-haystack with Elasticsearch: how can I get fuzzy (word similarity) search to work?

I'm using elasticsearch==2.4.1 and django-haystack==3.0 with Django==2.2 using an Elasticsearch instance version 2.3 on AWS.
I'm trying to implement a "Did you mean...?" using a similarity search.
I have this model:
class EquipmentBrand(SafeDeleteModel):
    name = models.CharField(
        max_length=128,
        null=False,
        blank=False,
        unique=True,
    )
The following index:
class EquipmentBrandIndex(SearchIndex, Indexable):
    text = fields.EdgeNgramField(document=True, model_attr="name")

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

    def get_model(self):
        return EquipmentBrand
And I'm searching like this:
results = SearchQuerySet().models(EquipmentBrand).filter(content=AutoQuery(q))
When name is "Example brand", these are my actual results:
q='Example brand' -> Found
q='bra' -> Found
q='xam' -> Found
q='Exmple' -> *NOT FOUND*
I'm trying to get the last example to work, i.e. finding the item if the word is similar.
My goal is to suggest items from the database in case of typos.
What am I missing to make this work?
Thanks!
I don't think you want to be using EdgeNgramField. "Edge" n-grams, from the Elasticsearch Docs:
emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word.
It's intended for autocomplete. It only matches strings that are prefixes of the target. So, when the target document includes "example", searches that work would be "e", "ex", "exa", "exam", ...
"Exmple" is not one of those strings. Try using plain NgramField.
Also, please consider upgrading. So much has been fixed and improved since ES 2.4.1.
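A minimal sketch of that change, reusing the index from the question (the import path for EquipmentBrand is hypothetical, and the index needs to be rebuilt afterwards, e.g. with ./manage.py rebuild_index):

from haystack import fields
from haystack.indexes import Indexable, SearchIndex

from myapp.models import EquipmentBrand  # hypothetical app path


class EquipmentBrandIndex(SearchIndex, Indexable):
    # Plain n-grams are taken from every position in the word, not just the
    # start, so a typo like "Exmple" still shares grams with "Example".
    text = fields.NgramField(document=True, model_attr="name")

    def get_model(self):
        return EquipmentBrand

    def index_queryset(self, using=None):
        return self.get_model().objects.all()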

Custom data search within an array

Is it possible to search an account's custom data to find a value contained in an array?
Something like:
?customData.[arrayName].{key}=value
The Stormpath docs don't mention array searching.
Yes, with Stormpath it is totally possible to search for custom data even if the values are stored as an array!
Please note that the field names are simple names, and it is the values that can be different data types (array, map, string, etc.), so the query is not as complex as one would think :-)
For example, if I want to store custom data called favoriteColors, which is an array like
"favoriteColors": [ "red", "black", "blue", "white" ]
Notice the field name is just like any other field name. The value is the array.
To search for accounts which have a value red in the favoriteColors array, you just need the normal query syntax:
?customData.favoriteColors=red
The full request (if searching a Directory of accounts), might look like this:
https://api.stormpath.com/v1/directories/<directory_uid>/accounts?customData.favoriteColors=red
You could also do the same search on the Tenant resource to search tenant-wide (across all accounts):
https://api.stormpath.com/v1/tenants/<tenant_uid>/accounts?customData.favoriteColors=red
This query would match an account that contains red in the favoriteColors array. If I changed the query to ?customData.favoriteColors=yellow it would not match unless yellow was also added to the array.
Searching for custom data in an array can definitely be done. The syntax is: customData.{fieldName}\[{index}\]=value where {index} can be the specific index you are looking for, or * if you want to find it anywhere in the array. (Note that the [] characters are escaped with a backslash, or the query interpreter confuses it with a range query.)
If you leave off the index entirely, then \[*\] is implied. More precisely, Stormpath will check for either the value in the fieldName or the value as an element in an array of fieldName. However, this syntactic sugar only works if the array field is the last element in your search. Since you can put literally any JSON object into your custom data, Stormpath cannot check every single possibility. Imagine something like customData.foo.bar.baz.qux=bingo. Stormpath would not try to guess whether foo is an array, or whether bar or baz is an array; it only considers that qux may or may not be an array. So, if you want to search an array of objects, you cannot leave out the \[*\].
Here is an example. I have an account with the custom data:
{
    "favoriteThings": [
        {
            "thing": "raindrops",
            "location": "on roses"
        },
        {
            "thing": "whiskers",
            "location": "on kittens"
        },
        {
            "thing": "snowflakes",
            "location": "on my nose and eye lashes"
        }
    ],
    "favoriteColors": [
        "blue",
        "grey"
    ]
}
The following queries will yield the following results:
customData.favoriteColors=blue will include this account.
customData.favoriteColors\[1\]=blue will not include this account because blue is not at index 1.
customData.favoriteThings\[*\].thing=whiskers will include this account.
customData.favoriteThings\[*\].thing=ponies will not include this account because it does not list ponies as one of his favorite things, but may include other accounts with custom data in the same structure.
customData.favoriteThings.thing=whiskers would not include this account or any other accounts with the same custom data structure because in that case, Stormpath would be looking for a single nested JSON favoriteThings object, not an array.
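If you compose these URLs from code, the backslashes are easy to lose. A small sketch in Python; the directory id stays a placeholder, and the query strings are exactly the ones from the examples above:

# Raw strings keep the backslash escapes required by the query interpreter.
base = "https://api.stormpath.com/v1/directories/<directory_uid>/accounts"

specific_index = base + r"?customData.favoriteColors\[1\]=blue"
any_index = base + r"?customData.favoriteThings\[*\].thing=whiskers"

print(specific_index)
print(any_index)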

Elasticsearch Query String with Dot/Point at the end, i.e. +foo.*

I have an index containing lots of streets. The index looks like this:
Mainstreet 42
Some other street 15
Foostr. 9
The default search query looks like this:
+QUERY_STRING*
So querying for foo (sent as +foo*) or foostr (sent as +foostr*) results in Foostr. 9, which is correct. BUT querying for foostr. (which gets sent to Elasticsearch as +foostr.*) gives no results, but why?
I use the standard analyzer and a query string with no special options. (This also returns 0 results when using http://127.0.0.1:9200/test/streets?q=+foostr.*).
Btw. this: http://127.0.0.1:9200/test/streets?q=+foostr. (same as above without the asterisk) finds the right results
Questions:
Why is this happening?
How to avoid this behavior?
One thing I didn't think about was:
Elasticsearch will not analyze wildcard queries by default!
This means. By default it will act like this:
input query | the query that ES will use
------------|---------------------------
foo         | foo
foo.        | foo
foo*        | foo*
foo.*       | foo.*
As you can see, if the input query contains a wildcard, ES will not remove any characters. When there is no wildcard, ES takes the query and runs it through an analyzer, which (when using the default analyzer) removes the dots.
To "fix" this, you can either
remove all dots manually from the query string, or
use analyze_wildcard=true (i.e. http://127.0.0.1:9200/test/streets?q=+foostr.*&analyze_wildcard=true). Here's an explanation of what happens: https://github.com/elastic/elasticsearch/issues/787
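The same option can also be set inside a query_string request body instead of as a URL parameter. A sketch; the index name test comes from the URLs above, and a local ES endpoint is assumed:

import requests

# analyze_wildcard=true makes ES run the analyzer on "+foostr.*" first,
# so the trailing dot is stripped and the query effectively becomes "+foostr*".
resp = requests.get(
    "http://127.0.0.1:9200/test/_search",
    json={
        "query": {
            "query_string": {
                "query": "+foostr.*",
                "analyze_wildcard": True,
            }
        }
    },
)
print(resp.json()["hits"]["hits"])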
1) This is because the standard analyser does not index special characters. For example, if you index the string Yoo! My name is Karthik., Elasticsearch breaks it down to (yoo, my, name, is, karthik), without special characters (which actually makes sense in many simple cases) and in lowercase. So, when you search for foostr., there were no results, as it was indexed as foostr (without the ".").
2) You can use different types of analysers for different fields depending on your requirements while indexing (or you can leave a field not_analyzed as well).
Example:-
$ curl -XPUT 'http://localhost:9200/bookstore/book/_mapping' -d '
{
    "book" : {
        "properties" : {
            "title" : {"type" : "string", "analyzer" : "simple"},
            "description" : {"type" : "string", "index" : "not_analyzed"}
        }
    }
}
'
You can refer to this and this for more information.
HTH!

mongoengine filter query not working

I've defined a Flavor Document much like the rest of my models, and recently added the is_archived field:
class Flavor(BaseDocument):
    is_archived = BooleanField(default=False)
In a python shell I can verify that my Documents do indeed have the field and are set to a Boolean:
for f in Flavor.objects.all():
    print f.is_archived, type(f.is_archived)
>> False <type 'bool'>
>> False <type 'bool'>
>> ...
But when I filter the query it returns only the documents I have created since adding the field.
Flavor.objects(is_archived=False)
Flavor.objects.filter(is_archived=False)
>> [<Flavor: newFlavor>]
>> [<Flavor: newFlavor>]
How can I update my old Documents to be collected by the filtered query?
Just figured it out. Perfect example of how carefully framing the question leads naturally to the answer:
for f in Flavor.objects.all():
    f.update(set__is_archived=f.is_archived)
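As a side note, the backfill can also be done in a single bulk query instead of a per-document loop. A sketch using standard mongoengine queryset operators, assuming the Flavor model above and that the old documents simply lack the field in MongoDB:

# Give every document that predates the field an explicit value, in one update.
Flavor.objects(is_archived__exists=False).update(set__is_archived=False)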
