Exclude multi-word keywords from Twitter Search API - ruby

I have a list of keywords to be excluded from search here
KEYWORDS = %w[
covid corona subway railway travel plane brazil ]
exclude = Twitter::KEYWORDS.split(",").join(" -")
and this is what my search query looks like:
json_response = @client.search("(javascript) -#{exclude}", lang: "en", result_type: "recent", tweet_mode: "extended", count: 100)
How can I pass multi-word keywords here to be excluded, for example keywords like "off the hand" or "game plan"?
Adding them along with the other keywords doesn't work as expected.

In case anyone comes back looking for the same problem, this is how I solved it:
@client.search("(javascript) -#{exclude} -\"off the hand\" -\"game plan\"", lang: "en", result_type: "recent", tweet_mode: "extended", count: 100)
So, basically, escaping the double quotes inside the query string lets me pass each multi-word keyword as an exact phrase.
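The quoting step can also be done programmatically instead of hard-coding each phrase. Below is a minimal, language-neutral sketch written in Python for illustration (the question itself is Ruby). The helper name `build_exclusions` and the sample keyword list are assumptions, not part of the original code; it only builds the query string and does not call the Twitter API.

```python
def build_exclusions(keywords):
    """Prefix each keyword with '-', wrapping multi-word keywords in double quotes."""
    parts = []
    for keyword in keywords:
        # A keyword containing a space must be quoted to be treated as one phrase.
        term = f'"{keyword}"' if " " in keyword else keyword
        parts.append(f"-{term}")
    return " ".join(parts)


# Hypothetical keyword list mixing single words and phrases.
keywords = ["covid", "corona", "off the hand", "game plan"]
query = f"(javascript) {build_exclusions(keywords)}"
print(query)
```

The same loop translates directly to Ruby with `map` and `join`.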

Related

How to make Neptune Search more lenient

I have some entries inside the graph that I am searching (e.g. hello_world, foo_bar_baz) and I want to be able to search "hello" and get hello_world back.
Currently, I only get a result if I search for the entire string (i.e. hello_world or foo_bar_baz).
This seems to be due to Elasticsearch's standard analyzer behaviour, but I don't know how to deal with it in Neptune.
with neptune_graph() as g:
    my_query = " OR ".join(
        f"predicates.{field}.value:({query})" for field in ['names', 'spaces']
    )
    search_results = (
        g.withSideEffect(
            "Neptune#fts.endpoint", f"https://{neptuneSearchURL}"
        )
        .withSideEffect("Neptune#fts.queryType", "query_string")
        .withSideEffect("Neptune#fts.sortOrder", "DESC")
        .V()
        .hasLabel("doc")
        .has(
            "*",
            f"Neptune#fts entity_type:table AND ({my_query})",
        )
    )
One way is to use a wildcard.
Given:
g.addV('search-test').property('name','Hello_World')
v[0ebedfda-a9bd-e320-041a-6e98da9b1379]
Assuming the search integration is all in place, after the search index has been updated, the following will find the vertex:
g.withSideEffect("Neptune#fts.endpoint",
"https://vpc-neptune-xxx-abc123.us-east-1.es.amazonaws.com").
withSideEffect('Neptune#fts.queryType', 'query_string').
V().
has('name','Neptune#fts hello*').
elementMap().
unfold()
Which yields
{<T.id: 1>: '0ebedfda-a9bd-e320-041a-6e98da9b1379'}
{<T.label: 4>: 'search-test'}
{'name': 'Hello_World'}
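Applied to the query builder from the question, the wildcard suggestion might look like the sketch below. It only assembles the query_string expression and does not talk to Neptune. The function name `build_fts_query` is hypothetical, and the sketch assumes the search term contains no whitespace or reserved query_string characters.

```python
def build_fts_query(term, fields, entity_type="table"):
    """Build a query_string expression that prefix-matches `term` in each field."""
    per_field = " OR ".join(
        # The trailing * turns the term into a prefix (wildcard) match.
        f"predicates.{field}.value:({term}*)" for field in fields
    )
    return f"Neptune#fts entity_type:{entity_type} AND ({per_field})"


fts = build_fts_query("hello", ["names", "spaces"])
print(fts)
```

The resulting string would then be passed as the second argument of `.has("*", ...)` in the traversal from the question.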
The problem I was having was indeed the analyzer, except I didn't understand how to fix it until now.
When you first create the Elasticsearch index, you need to specify the settings you want.
The solution was to create the index with a custom analyzer:
with neptune_search() as es:
    # Set the custom analyzer in the index settings when the index is created.
    # Keys and values need to be quoted.
    es.indices.create(
        index="my_index",
        body={
            "settings": {"analysis": {"analyzer": {"default": {
                "type": "custom",
                "tokenizer": "lowercase",
            }}}}
        },
    )
    # Then index documents as before (other arguments omitted).
    es.index(index="my_index", ...)
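To see why changing the tokenizer helps, here is a rough pure-Python approximation of what the `lowercase` tokenizer does: it splits on any non-letter character and lowercases the pieces, so hello_world is indexed as two terms instead of one. This is only an illustration of the behaviour, not Elasticsearch's implementation, and the function names are made up for the sketch.

```python
import re


def lowercase_tokenize(text):
    """Approximate the `lowercase` tokenizer: split on non-letters, lowercase the pieces."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def matches_term(query, text):
    """True if the query equals one of the indexed terms."""
    return query.lower() in lowercase_tokenize(text)


print(lowercase_tokenize("Hello_World"))
print(lowercase_tokenize("foo_bar_baz"))
print(matches_term("hello", "hello_world"))
```

Note that this makes whole words inside the identifier searchable; a partial word such as "hell" would still need a wildcard or an n-gram analyzer.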

django-haystack with Elasticsearch: how can I get fuzzy (word similarity) search to work?

I'm using elasticsearch==2.4.1 and django-haystack==3.0 with Django==2.2 using an Elasticsearch instance version 2.3 on AWS.
I'm trying to implement a "Did you mean...?" using a similarity search.
I have this model:
class EquipmentBrand(SafeDeleteModel):
    name = models.CharField(
        max_length=128,
        null=False,
        blank=False,
        unique=True,
    )
The following index:
class EquipmentBrandIndex(SearchIndex, Indexable):
    text = fields.EdgeNgramField(document=True, model_attr="name")

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

    def get_model(self):
        return EquipmentBrand
And I'm searching like this:
results = SearchQuerySet().models(EquipmentBrand).filter(content=AutoQuery(q))
When name is "Example brand", these are my actual results:
q='Example brand' -> Found
q='bra' -> Found
q='xam' -> Found
q='Exmple' -> *NOT FOUND*
I'm trying to get the last example to work, i.e. finding the item if the word is similar.
My goal is to suggest items from the database in case of typos.
What am I missing to make this work?
Thanks!
I don't think you want to be using EdgeNgramField. "Edge" n-grams, from the Elasticsearch Docs:
emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word.
It's intended for autocomplete. It only matches strings that are prefixes of the target. So, when the target document includes "example", searches that work would be "e", "ex", "exa", "exam", ...
"Exmple" is not one of those strings. Try using a plain NgramField.
Also, please consider upgrading. So much has been fixed and improved since ES 2.4.1.
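The difference between the two field types can be illustrated by generating the grams by hand. The sketch below is plain Python, not Haystack or Elasticsearch code; the gram sizes (3 to 15) are an assumption chosen for the example, and how a real query is analyzed and scored depends on the backend configuration.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Prefixes of `word` whose length lies in [min_gram, max_gram]."""
    top = min(max_gram, len(word))
    return {word[:n] for n in range(min_gram, top + 1)}


def ngrams(word, min_gram, max_gram):
    """Every substring of `word` whose length lies in [min_gram, max_gram]."""
    top = min(max_gram, len(word))
    return {
        word[i:i + n]
        for n in range(min_gram, top + 1)
        for i in range(len(word) - n + 1)
    }


indexed_edge = edge_ngrams("example", 3, 15)
indexed_full = ngrams("example", 3, 15)
typo = "exmple"

# The misspelled word is not a prefix of "example", so no edge gram equals it.
print(typo in indexed_edge)
# With plain n-grams, the typo and the indexed word still share some grams.
shared = ngrams(typo, 3, 15) & indexed_full
print(sorted(shared))
```

The shared grams are what give a plain n-gram field a chance to surface "Example brand" for a misspelled query.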

How to get elements matching a partial text

I'm using NEST to create services that search a field (label).
Is there a way to get results from a partial string?
For example, if I have three labels: "John Doe", "Dadido" and "Unicorn", then typing "Do" should return the first two.
For now, I have this :
elasticClient.Search<ESbase>(s => s.Query(q=>q.Regexp(c =>
c.Name("label_query")
.Field(p =>p.Label).Value('*'+label+'*'))));
And when I try it, it returns nothing.
match: { text: '.*label.*' } should work.
If you want to use regex: Value(".*label.*")
I assume you used the default mapping and that your label string contains no special characters.
Edit: a wildcard query works too: .Wildcard("*label*")
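The reason `*label*` fails in a regexp query is that `*` is a quantifier in regex syntax, not a "match anything" placeholder. The sketch below uses Python's `re` and `fnmatch` modules as stand-ins to show the difference between the two pattern styles. They are not identical to Lucene's regexp and wildcard syntax, and the sketch simplifies by matching each whole label case-insensitively, whereas an analyzed text field is matched term by term.

```python
import fnmatch
import re

labels = ["John Doe", "Dadido", "Unicorn"]


def is_valid_regex(pattern):
    """True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def regexp_hits(pattern, candidates):
    """Whole-string regex match, mirroring the fact that regexp queries are anchored."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [c for c in candidates if rx.fullmatch(c)]


def wildcard_hits(pattern, candidates):
    """Wildcard match where * stands for any run of characters."""
    return [c for c in candidates if fnmatch.fnmatchcase(c.lower(), pattern.lower())]


print(is_valid_regex("*Do*"))          # a leading * has nothing to repeat
print(regexp_hits(".*Do.*", labels))   # regex style: .* means "anything"
print(wildcard_hits("*Do*", labels))   # wildcard style: * means "anything"
```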

How can I know if geocoder result was just a match in city name?

I'm using Geocoder gem with Google lookup. I'm having a hard time filtering matches.
My code is:
geocoded_by :complete_address, params: {
region: 'UY',
components: { country: 'UY' }
}
I only want results in a certain country (Uruguay). If I look up an address on its own (for example, Hernani 1570), it finds no results. So I append the city name and search for "Hernani 1570, Montevideo", and only then do I get a match.
The problem is that if I search for "this is not a real address, Montevideo", I still get a match (a partial one), which is the center of the city. I don't want to discard all partial matches, so is there a way to tell whether the match was only on the city name or actually on the street address?
Thank you very much

Custom data search within an array

Is it possible to search an account's custom data to find a value contained in an array?
Something like:
?customData.[arrayName].{key}=value
The Stormpath docs don't mention array searching.
Yes, with Stormpath it is totally possible to search for custom data even if the values are stored as an array!
Please note that the field names are simple names; it is the values that have different data types (array, map, string, etc.), so the query is not as complex as one might think :-)
For example, if I want to store custom data called favoriteColors, which is an array like
"favoriteColors": [ "red", "black", "blue", "white" ]
Notice the field name is just like any other field name. The value is the array.
To search for accounts which have a value red in the favoriteColors array, you just need the normal query syntax:
?customData.favoriteColors=red
The full request (if searching a Directory of accounts), might look like this:
https://api.stormpath.com/v1/directories/<directory_uid>/accounts?customData.favoriteColors=red
You could also do the same search on the Tenant resource to search tenant-wide (across all accounts):
https://api.stormpath.com/v1/tenants/<tenant_uid>/accounts?customData.favoriteColors=red
This query would match an account that contains red in the favoriteColors array. If I changed the query to ?customData.favoriteColors=yellow it would not match unless yellow was also added to the array.
Searching for custom data in an array can definitely be done. The syntax is: customData.{fieldName}\[{index}\]=value where {index} can be the specific index you are looking for, or * if you want to find it anywhere in the array. (Note that the [] characters are escaped with a backslash or the query interpreter gets it confused with a range query.)
If you leave off the index entirely, then \[*\] is implied. More precisely, Stormpath will check for either the value in the fieldName or the value as an element in an array of fieldName. However, this syntactic sugar only works if the array field is the last element in your search. Since you can put literally any JSON object into your custom data, Stormpath cannot check every single possibility. Imagine something like customData.foo.bar.baz.qux=bingo. Stormpath would not try to guess whether foo, bar, or baz might be arrays; it only considers that qux might be an array. So, if you want to search an array of objects, you cannot leave out the \[*\].
Here is an example. I have an account with the custom data:
{
  "favoriteThings": [
    {
      "thing": "raindrops",
      "location": "on roses"
    },
    {
      "thing": "whiskers",
      "location": "on kittens"
    },
    {
      "thing": "snowflakes",
      "location": "on my nose and eye lashes"
    }
  ],
  "favoriteColors": [
    "blue",
    "grey"
  ]
}
The following queries will yield the following results:
customData.favoriteColors=blue will include this account.
customData.favoriteColors\[1\]=blue will not include this account because blue is not at index 1.
customData.favoriteThings\[*\].thing=whiskers will include this account.
customData.favoriteThings\[*\].thing=ponies will not include this account because it does not list ponies as one of its favorite things, but may include other accounts with custom data in the same structure.
customData.favoriteThings.thing=whiskers would not include this account or any other accounts with the same custom data structure because in that case, Stormpath would be looking for a single nested JSON favoriteThings object, not an array.
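The matching rules described above can be modelled in a few lines of Python. This is only a sketch of the rules as stated in this answer, not Stormpath's actual implementation; the function name `matches` and the regular expression are made up for the example, and paths are written without the leading customData. prefix.

```python
import re

# A path segment is a field name, optionally followed by \[index\] or \[*\].
SEGMENT = re.compile(r"^(\w+)(?:\\\[(\*|\d+)\\\])?$")

custom_data = {
    "favoriteThings": [
        {"thing": "raindrops", "location": "on roses"},
        {"thing": "whiskers", "location": "on kittens"},
        {"thing": "snowflakes", "location": "on my nose and eye lashes"},
    ],
    "favoriteColors": ["blue", "grey"],
}


def matches(data, path, value):
    """Return True if `value` is found at `path` under the rules described above."""
    segments = path.split(".")
    nodes = [data]
    for position, raw in enumerate(segments):
        parsed = SEGMENT.match(raw)
        if parsed is None:
            raise ValueError(f"unsupported path segment: {raw}")
        name, index = parsed.group(1), parsed.group(2)
        is_last = position == len(segments) - 1
        next_nodes = []
        for node in nodes:
            if not isinstance(node, dict) or name not in node:
                continue
            child = node[name]
            if index is None:
                if is_last and isinstance(child, list):
                    next_nodes.extend(child)  # implied \[*\], final field only
                else:
                    next_nodes.append(child)
            elif isinstance(child, list):
                if index == "*":
                    next_nodes.extend(child)
                elif int(index) < len(child):
                    next_nodes.append(child[int(index)])
        nodes = next_nodes
    return value in nodes


print(matches(custom_data, r"favoriteColors", "blue"))
print(matches(custom_data, r"favoriteColors\[1\]", "blue"))
print(matches(custom_data, r"favoriteThings\[*\].thing", "whiskers"))
print(matches(custom_data, r"favoriteThings.thing", "whiskers"))
```

Running it against the five example queries reproduces the results listed above: the array is searched implicitly only when it is the last field in the path.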

Resources