How to get word2index from gensim

According to the docs, we can read a word2vec model with gensim like this:
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)
This provides an index-to-word mapping, e.g. model.index2word[2]. How can I derive the inverted mapping (word-to-index) from it?

The word-to-index mapping is in the KeyedVectors vocab property: a dictionary whose values are objects with an index property.
For example:
word = "whatever" # for any word in model
i = model.vocab[word].index
model.index2word[i] == word # will be true

An even simpler solution is to enumerate index2word:
word2index = {token: token_index for token_index, token in enumerate(model.index2word)}
word2index['hi'] == 30308 # True
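A minimal, self-contained sketch of the enumerate approach, using a made-up list in place of model.index2word so it runs without gensim or a model file. Note that in gensim 4.x, index2word was renamed index_to_key and the vocab dict was removed; there, KeyedVectors.key_to_index already is this word-to-index dict.

```python
# A plain list stands in for model.index2word (gensim 3.x) or
# model.index_to_key (gensim 4.x); this vocabulary is made up.
index2word = ["the", "of", "and", "hi", "model"]

# Invert the list: each word maps to the position it occupies.
word2index = {word: i for i, word in enumerate(index2word)}

# Round trip: looking a word up and indexing back yields the same word.
for word in index2word:
    assert index2word[word2index[word]] == word

print(word2index["hi"])  # prints 3
```

The indices in the real mapping depend on the loaded model, so a lookup like word2index['hi'] will give a different number for a different vocabulary.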

Related

django-haystack with Elasticsearch: how can I get fuzzy (word similarity) search to work?

I'm using elasticsearch==2.4.1 and django-haystack==3.0 with Django==2.2 using an Elasticsearch instance version 2.3 on AWS.
I'm trying to implement a "Did you mean...?" using a similarity search.
I have this model:
class EquipmentBrand(SafeDeleteModel):
    name = models.CharField(
        max_length=128,
        null=False,
        blank=False,
        unique=True,
    )
The following index:
class EquipmentBrandIndex(SearchIndex, Indexable):
    text = fields.EdgeNgramField(document=True, model_attr="name")
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
    def get_model(self):
        return EquipmentBrand
And I'm searching like this:
results = SearchQuerySet().models(EquipmentBrand).filter(content=AutoQuery(q))
When name is "Example brand", these are my actual results:
q='Example brand' -> Found
q='bra' -> Found
q='xam' -> Found
q='Exmple' -> *NOT FOUND*
I'm trying to get the last example to work, i.e. finding the item if the word is similar.
My goal is to suggest items from the database in case of typos.
What am I missing to make this work?
Thanks!

I don't think you want to be using EdgeNgramField. From the Elasticsearch docs, the edge n-gram tokenizer:
emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word.
It's intended for autocomplete: it only matches strings that are prefixes of words in the target. So when the target document includes "example", the searches that work are "e", "ex", "exa", "exam", ...
"Exmple" is not one of those strings. Try using a plain NgramField instead.
Also, please consider upgrading. A great deal has been fixed and improved since ES 2.4.1.
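To see why the typo misses, here is a simplified sketch of the two kinds of grams, written as plain Python rather than Elasticsearch's actual tokenizers. The gram sizes are arbitrary choices for illustration, not django-haystack's defaults, and whether a partial overlap produces a hit also depends on how the query side is analyzed.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Prefixes of `word` with lengths min_gram..max_gram (front edge n-grams)."""
    return {word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)}


def ngrams(word, min_gram, max_gram):
    """All substrings of `word` with lengths min_gram..max_gram."""
    return {
        word[i:i + n]
        for n in range(min_gram, min(max_gram, len(word)) + 1)
        for i in range(len(word) - n + 1)
    }


indexed = "example"
query = "exmple"

# Edge n-grams: only prefixes are indexed, so a typo after the first
# couple of letters leaves nothing that equals the query term.
print(query in edge_ngrams(indexed, 1, 15))  # False
print(sorted(edge_ngrams(indexed, 1, 4)))    # ['e', 'ex', 'exa', 'exam']

# Plain n-grams: some of the query's 3-grams still overlap the indexed ones.
print(sorted(ngrams(indexed, 3, 3) & ngrams(query, 3, 3)))  # ['mpl', 'ple']
```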

conditional FilterSets in DRF 3.7 autogen docs: can I add a queryparam filter for a route (but only for certain HTTP verbs)

(DRF v3.7, django-filters v1.1.0)
Hi! I have a working FilterSet that lets me filter my results via a query parameter, e.g. http://localhost:9000/mymodel?name=FooOnly
This is working just fine.
class MyNameFilter(FilterSet):
    name = CharFilter(field_name='name', help_text='Filter by name')
    class Meta:
        model = MyModel
        fields = ('name',)
class MyModel(...):
    ...
    filter_backends = (DjangoFilterBackend,)
    filter_class = MyNameFilter
But when I render the built-in auto-generated docs for my API, I am seeing this query parameter documented for all methods in my route, e.g. GET, PUT, PATCH, etc.
I only intend to filter via this query parameter for some of these HTTP verbs, as it doesn't make sense for others, e.g. PUT.
Is there a good way to make my FilterSet conditional in this manner, i.e. conditional on the route's HTTP method?
I tried applying this logic at the Router level (a misguided idea) and also at the ViewSet level, but there is no get_filter_class override method in the way there is, e.g., get_serializer_class.
Thanks for the help.

You'll find get_filter_class on DjangoFilterBackend. You need to create a new filter backend that overrides the filter_queryset method:
from django_filters.rest_framework import DjangoFilterBackend
class GETFilterBackend(DjangoFilterBackend):
    def filter_queryset(self, request, queryset, view):
        # Only filter GET requests; every other method gets the queryset unchanged.
        if request.method == 'GET':
            return super().filter_queryset(request, queryset, view)
        return queryset
class MyModel(...):
    ...
    filter_backends = (GETFilterBackend,)
    filter_class = MyNameFilter
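The same idea generalizes to a configurable set of methods. This is a sketch: BaseFilterBackend and the SimpleNamespace requests are hypothetical stand-ins so it runs without Django; in a project you would subclass DjangoFilterBackend as above. Like GETFilterBackend, it only changes what is filtered at request time; it does not change which query parameters the auto-generated docs list, since those come from the schema generation step.

```python
from types import SimpleNamespace


class BaseFilterBackend:
    """Hypothetical stand-in for DjangoFilterBackend: keeps items equal to ?name=."""

    def filter_queryset(self, request, queryset, view):
        name = request.query_params.get("name")
        return [item for item in queryset if name is None or item == name]


class MethodGatedFilterBackend(BaseFilterBackend):
    """Apply the parent's filtering only for the listed HTTP methods."""

    allowed_methods = frozenset({"GET"})

    def filter_queryset(self, request, queryset, view):
        if request.method in self.allowed_methods:
            return super().filter_queryset(request, queryset, view)
        return queryset  # other verbs see the queryset unfiltered


items = ["FooOnly", "Bar"]
get_req = SimpleNamespace(method="GET", query_params={"name": "FooOnly"})
put_req = SimpleNamespace(method="PUT", query_params={"name": "FooOnly"})
backend = MethodGatedFilterBackend()
print(backend.filter_queryset(get_req, items, view=None))  # ['FooOnly']
print(backend.filter_queryset(put_req, items, view=None))  # ['FooOnly', 'Bar']
```

Subclasses can then widen the set, e.g. allowed_methods = frozenset({"GET", "HEAD"}), without rewriting filter_queryset.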

Figured this out, with help from Carlton G. on the django-filters Google Groups forum (thank you, Carlton).
My solution was to go up a level and intercept the CoreAPI schema that came out of the AutoSchema inspection, but before it made its way into the auto-generated docs.
At this point of interception, I override _allows_filters so that filtering applies only to my HTTP verbs of interest. (Despite being prefixed with an underscore, and thus nominally a private method not meant for overriding, the method's comments explicitly encourage this: "Introduced in v3.7: Initially 'private' (i.e. with leading underscore) to allow changes based on user experience.")
My code below:
from rest_framework.schemas import AutoSchema
# see https://www.django-rest-framework.org/api-guide/schemas/#autoschema
# and https://www.django-rest-framework.org/api-guide/filtering/
class LimitedFilteringViewSchema(AutoSchema):
    # Initially copied from lib/python2.7/site-packages/rest_framework/schemas/inspectors.py:352,
    # then modified to restrict our filtering by query-parameters to only certain view
    # actions or HTTP verbs
    def _allows_filters(self, path, method):
        if getattr(self.view, 'filter_backends', None) is None:
            return False
        if hasattr(self.view, 'action'):
            return self.view.action in ["list"]  # original code: ["list", "retrieve", "update", "partial_update", "destroy"]
        return method.lower() in ["get"]  # original code: ["get", "put", "patch", "delete"]
And then, at my APIView level:
from coreapi import Field
from django_filters.rest_framework import CharFilter, DjangoFilterBackend, FilterSet
class MyViewSchema(LimitedFilteringViewSchema):
    # note to StackOverflow: this was some additional schema repair work I
    # needed to do, again adding logic conditional on the HTTP verb.
    # Not related to the original question posted here, but hopefully relevant
    # all the same.
    def get_serializer_fields(self, path, method):
        fields = super(MyViewSchema, self).get_serializer_fields(path, method)
        # The 'name' parameter is set in MyModelListItemSerializer as not being required.
        # However, when creating an access-code-pool, it must be required, and in DRF v3.7 there's
        # no clean way of encoding this conditional logic, short of what you see here:
        #
        # We override the AutoSchema inspection class, so we can intercept the CoreAPI Fields it generated,
        # on their way out but before they make their way into the auto-generated api docs.
        #
        # CoreAPI Fields are named tuples, hence the poor man's copy constructor below.
        if path == u'/v1/domains/{domain_name}/access-code-pools' and method == 'POST':
            # find the index of our 'name' field in our fields list (-1 if absent)
            i = next((i for i, f in enumerate(fields) if f.name == 'name'), -1)
            if i >= 0:
                name_field = fields[i]
                fields[i] = Field(name=name_field.name, location=name_field.location,
                                  schema=name_field.schema, description=name_field.description,
                                  type=name_field.type, example=name_field.example,
                                  required=True)  # all this inspection, just to set this here boolean.
        return fields
class MyNameFilter(FilterSet):
    name = CharFilter(field_name='name', help_text='Filter returned access code pools by name')
    class Meta:
        model = MyModel
        fields = ('name',)
class MyAPIView(...):
    schema = MyViewSchema()
    filter_backends = (DjangoFilterBackend,)
    filter_class = MyNameFilter
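Because CoreAPI fields are named tuples (as the code comment notes), the manual copy constructor can be shortened with namedtuple's _replace, which returns a copy with only the named fields changed. This sketch uses a stand-in namedtuple with the same field names as the keyword arguments passed to Field(...) above; that coreapi.Field in your installed version behaves as a plain namedtuple is an assumption worth checking. require_field is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for coreapi.Field, built from the field names used in Field(...) above.
Field = namedtuple(
    "Field", ["name", "required", "location", "schema", "description", "type", "example"]
)


def require_field(fields, field_name):
    """Return a new list in which every field called `field_name` has required=True."""
    return [f._replace(required=True) if f.name == field_name else f for f in fields]


fields = [
    Field("name", False, "form", None, "Pool name", "string", None),
    Field("size", False, "form", None, "Pool size", "integer", None),
]
updated = require_field(fields, "name")
print(updated[0].required, updated[1].required)  # True False
```

Unlike the in-place version, this returns a new list and leaves the original fields untouched; inside get_serializer_fields it would be used as fields = require_field(fields, 'name') under the same path and method check.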

Dynamic domain apply while other field's value changed - odoo

class procurement(models.Model):
    _name = "procurement"
    procurement_line_ids = fields.One2many(comodel_name='procurement.line', inverse_name='procurement_id', string='Procurement Lines')
    global_procurement = fields.Boolean("Global Procurement", default=True)
class procurement_line(models.Model):
    _name = "procurement.line"
    procurement_id = fields.Many2one(comodel_name='procurement', string='Procurement')
    warehouse_id = fields.Many2one(comodel_name='stock.warehouse', string='Warehouse')
class stock_warehouse(models.Model):
    _name = "stock.warehouse"
    is_default_warehouse = fields.Boolean(string="Is Default Warehouse?", default=False)
If global_procurement is True, I want to load only default warehouses in the procurement lines; otherwise I want to load all warehouses. How can I do this?

You could try the following approach.
Pass the value in the context. For example:
<field name="warehouse_id"
context="{'global_procurement': parent.global_procurement}"/>
Check the context value in name_search() of the stock.warehouse model. For example:
@api.model
def name_search(self, name, args=None, operator='ilike', limit=100):
    if self._context and self._context.get('global_procurement'):
        default_list = [1, 2, 3]  # set your logic to search list of default warehouse
        return self.browse(default_list).name_get()
    return super(Warehouse, self).name_search(name=name, args=args, operator=operator, limit=limit)
I wrote this answer without testing it.
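The branching in that name_search can be modelled in plain Python to check the intended behaviour: default warehouses only when the flag is set, all warehouses otherwise. This is not Odoo's API; WarehouseRecord and the case-insensitive substring match (a rough stand-in for operator='ilike') are assumptions made for illustration. Unlike the snippet above, it also keeps applying the typed search text when the flag is set.

```python
from dataclasses import dataclass


@dataclass
class WarehouseRecord:
    """Hypothetical plain-Python stand-in for a stock.warehouse record."""
    id: int
    name: str
    is_default_warehouse: bool


def name_search(warehouses, name="", context=None, limit=100):
    """Return (id, name) pairs, keeping only default warehouses when the
    context carries a truthy 'global_procurement' flag."""
    context = context or {}
    candidates = warehouses
    if context.get("global_procurement"):
        candidates = [w for w in candidates if w.is_default_warehouse]
    # Case-insensitive substring match, roughly what operator='ilike' does.
    matches = [w for w in candidates if name.lower() in w.name.lower()]
    return [(w.id, w.name) for w in matches[:limit]]


warehouses = [
    WarehouseRecord(1, "Main", True),
    WarehouseRecord(2, "Overflow", False),
]
print(name_search(warehouses, context={"global_procurement": True}))   # [(1, 'Main')]
print(name_search(warehouses, context={"global_procurement": False}))  # [(1, 'Main'), (2, 'Overflow')]
```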

I did it by just defining a domain on the field (the idea is taken from @Odedra's answer):
<field name="warehouse_id" required="1" domain="[('field_name','=',parent.global_procurement)]" options="{'no_create': True, 'no_quick_create':True, 'no_create_edit':True}" />

Create an object if one is not found

How do I create an object if one is not found? This is the query I was running:
@event_object = @event_entry.event_objects.find_all_by_plantype('dog')
and I was trying this:
@event_object = EventObject.new unless @event_entry.event_objects.find_all_by_plantype('dog')
but that doesn't seem to work. I know I'm missing something very simple, as usual :( Thanks for any help! :)

find_all-style methods return an array of matching records, which is an empty array if no matching records are found. And in Ruby an empty array is truthy, which means:
arr = []
if arr
  puts 'arr is considered truthy!' # this line will execute
end
Also, most dynamic finder methods (like find_all_by_whatever) are officially deprecated, so you shouldn't be using them.
You probably want something more like:
@event_object = @event_entry.event_objects.where(plantype: 'dog').first || EventObject.new
But you can also configure the event object better, since you obviously want it to belong to @event_entry.
@event_object = @event_entry.event_objects.where(plantype: 'dog').first
@event_object ||= @event_entry.event_objects.build(plantype: 'dog')
In this last example, we try to find an existing object by getting the matching records and asking for the first one. If there are none, @event_object will be nil.
Then we use the ||= operator, which means "assign the value on the right if this is currently set to a falsy value". And nil is falsy. So if it's nil, we build the object from the association it should belong to, and we can preset its attributes while we are at it.

Why not use built-in query methods like find_or_create_by or find_or_initialize_by?
@event_object = @event_entry.event_objects.find_or_create_by(plantype: 'dog')
This will find an event object in @event_entry.event_objects with plantype = 'dog'; if none exists, it creates one instead.
find_or_initialize_by is probably closer to what you want, as it leaves @event_object in an unsaved state with just the association and plantype set:
@event_object = @event_entry.event_objects.find_or_initialize_by(plantype: 'dog')
This assumes you are looking for a single event_object, as it returns the first one it finds with plantype = 'dog'. If more than one event_object can have plantype = 'dog' within the @event_entry scope, this might not be the best solution, but it seems to fit your description.

More concise way of writing this array inclusion / default fallback code?

I find that I've been doing this a fair number of times in my Rails controllers, so I'm interested in finding a better way of writing it (if possible). Essentially, I'm validating the input against a few options and falling back on a default value if the input doesn't match any of them.
valid_options = %w(most_active most_recent most_popular)
@my_param = valid_options.include?(params[:my_param]) ? params[:my_param] : 'most_recent'

If you use a hash instead of an array, it will be faster and cleaner. And since your default is "most_recent", having "most_recent" in valid_options is redundant; you'd better remove it.
filter_options =
  Hash.new("most_recent")
  .merge("most_popular" => "most_popular", "most_active" => "most_active")
@my_param = filter_options[params[:my_param]]

I too would go the Hash route.
Something like this is conceivable:
Hash[valid_options.zip valid_options].fetch(params[:my_param], "most_recent")

A bit far-fetched:
valid_options = %w(most_active most_recent most_popular)
(valid_options & [params[:my_param]]).first || 'most_recent'

How about the following:
valid_options = %w(most_active most_recent most_popular)
valid_options.detect(proc{'default_value'}){|i| i == params[:my_param] }
Another one:
valid_options = %w(most_active most_recent most_popular)
valid_options.dup.delete(params[:my_param]) { "default" }
