mongoengine filter query not working - flask-mongoengine

I've defined a Flavor Document much like the rest of my models, and recently added the is_archived field:
class Flavor(BaseDocument):
    is_archived = BooleanField(default=False)
In a python shell I can verify that my Documents do indeed have the field and are set to a Boolean:
for f in Flavor.objects.all():
    print f.is_archived, type(f.is_archived)
>> False <type 'bool'>
>> False <type 'bool'>
>> ...
But when I filter the query it returns only the documents I have created since adding the field.
Flavor.objects(is_archived=False)
Flavor.objects.filter(is_archived=False)
>> [<Flavor: newFlavor>]
>> [<Flavor: newFlavor>]
How can I update my old Documents to be collected by the filtered query?

Just figured it out. Perfect example of how carefully framing the question leads naturally to the answer:
for f in Flavor.objects.all():
    f.update(set__is_archived=f.is_archived)
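Why the loop above helps can be modelled in plain Python, without MongoDB or mongoengine. This is only a sketch under two assumptions: the older documents were stored without an is_archived key at all, and the default is filled in when a document is loaded rather than when it is queried. The helper names matches and load_is_archived are made up for illustration.

```python
def matches(document, query):
    """Simplified equality match: a document that lacks the field
    does not match a concrete value such as False."""
    return all(
        key in document and document[key] == value
        for key, value in query.items()
    )


def load_is_archived(document, default=False):
    """Mimic a field default being filled in when a stored document is loaded."""
    return document.get("is_archived", default)


# Hypothetical stored data: one document predates the new field.
stored = [
    {"name": "oldFlavor"},                        # saved before the field existed
    {"name": "newFlavor", "is_archived": False},  # saved after the field was added
]

# Loading shows False for both documents, because the default is applied.
loaded_values = [load_is_archived(d) for d in stored]
print(loaded_values)

# Filtering looks at what is actually stored, so the old document is missed.
before = [d["name"] for d in stored if matches(d, {"is_archived": False})]
print(before)

# Writing the loaded value back stores the field explicitly.
for d in stored:
    d["is_archived"] = load_is_archived(d)

after = [d["name"] for d in stored if matches(d, {"is_archived": False})]
print(after)
```

Once every document carries the field explicitly, the equality filter sees all of them.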

Related

How to make Neptune Search more lenient

I have some entries inside the graph that I am searching (e.g. hello_world, foo_bar_baz) and I want to be able to search "hello" and get hello_world back.
Currently, I will only get a result if I search the entire string (i.e. searching hello_world or foo_bar_baz)
This seems to be due to the behaviour of Elasticsearch's standard analyzer, but I don't know how to deal with it in Neptune.
with neptune_graph() as g:
    my_query = " OR ".join(
        f"predicates.{field}.value:({query})" for field in ['names', 'spaces']
    )
    search_results = (
        g.withSideEffect(
            "Neptune#fts.endpoint", f"https://{neptuneSearchURL}"
        )
        .withSideEffect("Neptune#fts.queryType", "query_string")
        .withSideEffect("Neptune#fts.sortOrder", "DESC")
        .V()
        .hasLabel("doc")
        .has(
            "*",
            f"Neptune#fts entity_type:table AND ({my_query})",
        )
    )
One way is to use a wildcard.
Given:
g.addV('search-test').property('name','Hello_World')
v[0ebedfda-a9bd-e320-041a-6e98da9b1379]
Assuming the search integration is all in place, after the search index has been updated, the following will find the vertex:
g.withSideEffect("Neptune#fts.endpoint",
                 "https://vpc-neptune-xxx-abc123.us-east-1.es.amazonaws.com").
  withSideEffect('Neptune#fts.queryType', 'query_string').
  V().
  has('name','Neptune#fts hello*').
  elementMap().
  unfold()
Which yields
{<T.id: 1>: '0ebedfda-a9bd-e320-041a-6e98da9b1379'}
{<T.label: 4>: 'search-test'}
{'name': 'Hello_World'}
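Applied to the Python code in the question, the wildcard could be appended while the query string is built. The following is a minimal sketch: build_prefix_query is a hypothetical helper, the field paths are copied from the question, and the term is assumed to contain no characters that are special in query_string syntax (those would need escaping).

```python
def build_prefix_query(term, fields=("names", "spaces")):
    """Build the full-text search string with a trailing wildcard on `term`."""
    clauses = " OR ".join(
        f"predicates.{field}.value:({term}*)" for field in fields
    )
    return f"Neptune#fts entity_type:table AND ({clauses})"


query_string = build_prefix_query("hello")
print(query_string)
```

The returned string would take the place of the second argument to .has("*", ...) in the question's traversal.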
The problem was indeed the analyzer; I just didn't understand how to fix it until now.
The analysis settings have to be specified when the Elasticsearch index is first created.
The solution was to create the index like this:
with neptune_search() as es:
    # Define the custom analyzer at index creation time.
    # Keys and values must be quoted strings.
    es.indices.create(
        index="my_index",
        body={
            "settings": {"analysis": {"analyzer": {"default": {
                "type": "custom",
                "tokenizer": "lowercase",
            }}}}
        },
    )
    es.index(index="my_index", ... other stuff)
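The effect of switching tokenizers can be approximated in plain Python. This is a rough sketch and not Elasticsearch's actual implementation: it assumes the standard analyzer keeps hello_world as a single token because the underscore does not act as a word boundary, while the lowercase tokenizer splits on every non-letter character. Both function names are made up for illustration.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: underscores stay inside a word."""
    return re.findall(r"\w+", text.lower())


def lowercase_tokenizer_tokens(text):
    """Rough stand-in for the lowercase tokenizer: split on any non-letter."""
    return re.findall(r"[^\W\d_]+", text.lower())


for value in ["hello_world", "foo_bar_baz"]:
    print(value, standard_like_tokens(value), lowercase_tokenizer_tokens(value))
```

With the second behaviour, a search for hello can match the token hello produced from hello_world.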

Fetch value from XML using dynamic tag in ESQL

I have an xml
<family>
<child_one>ROY</child_one>
<child_two>VIC</child_two>
</family>
I want to fetch a value from the XML based on a dynamic tag in ESQL. I have tried this:
SET dynamicTag = 'child_'||num;
SET value = InputRoot.XMLNSC.parent.(XML.Element)dynamicTag;
Here num is a value received from the input; it can be one or two. The result should be value = ROY if num is one and value = VIC if num is two.
The chapter ESQL field reference overview describes this use case:
Because the names of the fields appear in the ESQL program, they must be known when the program is written. This limitation can be avoided by using the alternative syntax that uses braces ( { ... } ).
So you can change your code like this:
SET value = InputRoot.XMLNSC.parent.(XMLNSC.Element){dynamicTag};
Notice the change of the element type as well; see the comment by @kimbert.

django-haystack with Elasticsearch: how can I get fuzzy (word similarity) search to work?

I'm using elasticsearch==2.4.1 and django-haystack==3.0 with Django==2.2 using an Elasticsearch instance version 2.3 on AWS.
I'm trying to implement a "Did you mean...?" using a similarity search.
I have this model:
class EquipmentBrand(SafeDeleteModel):
    name = models.CharField(
        max_length=128,
        null=False,
        blank=False,
        unique=True,
    )
The following index:
class EquipmentBrandIndex(SearchIndex, Indexable):
    text = fields.EdgeNgramField(document=True, model_attr="name")
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
    def get_model(self):
        return EquipmentBrand
And I'm searching like this:
results = SearchQuerySet().models(EquipmentBrand).filter(content=AutoQuery(q))
When name is "Example brand", these are my actual results:
q='Example brand' -> Found
q='bra' -> Found
q='xam' -> Found
q='Exmple' -> *NOT FOUND*
I'm trying to get the last example to work, i.e. finding the item if the word is similar.
My goal is to suggest items from the database in case of typos.
What am I missing to make this work?
Thanks!
I don't think you want to be using EdgeNgramField. "Edge" n-grams, from the Elasticsearch Docs:
emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word.
It's intended for autocomplete. It only matches strings that are prefixes of the target. So, when the target document includes "example", searches that work would be "e", "ex", "exa", "exam", ...
"Exmple" is not one of those strings. Try using plain NgramField.
Also, please consider upgrading. So much has been fixed and improved since ES 2.4.1
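The difference between the two field types can be illustrated with a small plain-Python sketch. It only models which grams are generated for a single word; the min_gram and max_gram values are assumptions rather than the settings of any particular backend, and whether a search for the typo actually matches also depends on how the query text is analyzed, which is not modelled here.

```python
def edge_ngrams(word, min_gram=1, max_gram=15):
    """Grams anchored at the start of the word, i.e. its prefixes."""
    upper = min(len(word), max_gram)
    return {word[:n] for n in range(min_gram, upper + 1)}


def ngrams(word, min_gram=3, max_gram=15):
    """Grams taken from every position in the word."""
    upper = min(len(word), max_gram)
    return {
        word[i:i + n]
        for n in range(min_gram, upper + 1)
        for i in range(len(word) - n + 1)
    }


target = "example"
typo = "exmple"

# The typo is not a prefix of the target, so no edge n-gram equals it.
typo_is_prefix = typo in edge_ngrams(target)
print(typo_is_prefix)

# With plain n-grams the two words still share some grams.
shared = sorted(ngrams(target) & ngrams(typo))
print(shared)
```

The shared grams are what give a plain n-gram field a chance to relate the misspelled word to the indexed one.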

How do I create a compound multi-index in rethinkdb?

I am using Rethinkdb 1.10.1 with the official python driver. I have a table of tagged things which are associated to one user:
{
    "id": "PK",
    "user_id": "USER_PK",
    "tags": ["list", "of", "strings"],
    // Other fields...
}
I want to query by user_id and tag (say, to find all the things by user "tawmas" with tag "tag"). Starting with Rethinkdb 1.10 I can create a multi-index like this:
r.table('things').index_create('tags', multi=True).run(conn)
My query would then be:
res = (r.table('things')
       .get_all('TAG', index='tags')
       .filter(r.row['user_id'] == 'USER_PK').run(conn))
However, this query still needs to scan all the documents with the given tag, so I would like to create a compound index based on the user_id and tags fields. Such an index would allow me to query with:
res = r.table('things').get_all(['USER_PK', 'TAG'], index='user_tags').run(conn)
There is nothing in the documentation about compound multi-indexes. However, I
tried to use a custom index function combining the requirements for compound
indexes and multi-indexes by returning a list of ["USER_PK", "tag"] pairs.
My first attempt was in python:
r.table('things').index_create(
    'user_tags',
    lambda each: [[each['user_id'], tag] for tag in each['tags']],
    multi=True).run(conn)
This makes the python driver choke with a MemoryError trying to parse the index function (I guess list comprehensions aren't really supported by the driver).
So, I turned to my (admittedly, rusty) javascript and came up with this:
r.table('things').index_create(
    'user_tags',
    r.js(
        """(function (each) {
            var result = [];
            var user_id = each["user_id"];
            var tags = each["tags"];
            for (var i = 0; i < tags.length; i++) {
                result.push([user_id, tags[i]]);
            }
            return result;
        })
        """),
    multi=True).run(conn)
This is rejected by the server with a curious exception: rethinkdb.errors.RqlRuntimeError: Could not prove function deterministic. Index functions must be deterministic.
So, what is the correct way to define a compound multi-index? Or is it something
which is not supported at this time?
Short answer:
List comprehensions don't work in ReQL functions. You need to use map instead like so:
r.table('things').index_create(
    'user_tags',
    lambda each: each["tags"].map(lambda tag: [each['user_id'], tag]),
    multi=True).run(conn)
Long answer:
This is actually a somewhat subtle aspect of how RethinkDB drivers work. The reason this doesn't work is that your Python code never sees real copies of the each document. So in the expression:
lambda each: [[each['user_id'], tag] for tag in each['tags']]
each isn't ever bound to an actual document from your database; it's bound to a special Python variable which represents the document. I'd actually try running the following just to demonstrate it:
q = r.table('things').index_create(
    'user_tags',
    lambda each: print(each)) #only works in python 3
And it will print out something like:
<RqlQuery instance: var_1 >
The driver only knows that this is a variable from the function; in particular, it has no idea whether each["tags"] is an array or not (it's actually just another very similar abstract object). So Python doesn't know how to iterate over that field. Essentially the same problem exists in JavaScript.
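The idea of a placeholder that records operations instead of holding data can be sketched in a few lines of plain Python. Term and render below are hypothetical stand-ins, not the driver's real classes. The sketch also shows one plausible explanation, not confirmed by the source, for the MemoryError in the question: an object that defines __getitem__ but not __iter__ is iterated by asking for index 0, 1, 2, ... until an IndexError is raised, and a placeholder that accepts any key never raises one.

```python
from itertools import islice


class Term:
    """Hypothetical placeholder: records what was asked of it, holds no data."""

    def __init__(self, description):
        self.description = description

    def __getitem__(self, key):
        # Any key is accepted, because the real value is unknown on the client.
        return Term(f"{self.description}[{key!r}]")

    def map(self, func):
        # Call the function once with a fresh placeholder to capture its body.
        body = func(Term("var_2"))
        return Term(f"{self.description}.map(var_2 -> {render(body)})")

    def __repr__(self):
        return f"<Term {self.description}>"


def render(value):
    """Turn a captured expression into a readable string."""
    if isinstance(value, Term):
        return value.description
    if isinstance(value, list):
        return "[" + ", ".join(render(v) for v in value) + "]"
    return repr(value)


each = Term("var_1")

# Iterating the placeholder never ends, so only the first three items are taken.
first_three = [t.description for t in islice(each["tags"], 3)]
print(first_three)

# map() hands the whole operation over as a description instead of looping.
mapped = each["tags"].map(lambda tag: [each["user_id"], tag])
print(mapped)
```

In this model a list comprehension over each["tags"] would keep producing placeholders indefinitely, while map describes the loop so that it can be carried out where the actual array lives.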

Boolean value scripting issue with MVEL and Elasticsearch

I have a field mapping defined as
{"top_seller":{"type":"boolean"}}
In my query, I'm trying to do a custom score query based on the boolean value. I'm pulling my hair out. Every time I run a script such as this:
return if(doc['top_seller'].value==true) {10} else {0}
Every single document gets the true 10 boost. Only 1% of my documents are set as TRUE. I've tried without ==true, with =='true'. I've tried the ternary. doc['top_seller'].value==true?10:0. I've tried 1/0 instead of true/false.
I even did an experiment where I created a new index and type with a single true and a single false document. In a match_all query, they both get the boost as though they have the true value.
Wow, on a whim, I was looking at the core type settings for boolean.
The boolean type Maps to the JSON boolean type. It ends up storing within the index either T or F, with automatic translation to true and false respectively.
The answer is:
doc['top_seller'].value == 'T' ? 10 : 0
Edit: As of 5.2.x, I am finally able to use doc['top_seller'] ? 10 : 0. https://www.elastic.co/guide/en/elasticsearch/reference/current/boolean.html
