FOSElastica: remove common words (or, and, etc.) from search query - elasticsearch

Hello, I'm trying to get query results using FOSElasticaBundle with the query below. I can't find a working example for filtering out common words such as "and" and "or". If possible, it would also be good if these words were not highlighted. My attempt so far:
$searchForm = $this->createForm(SearchFormType::class, null);
$searchForm->handleRequest($request);
$matchQuery = new \Elastica\Query\Match();
$matchQuery->setField('_all', $queryString);
$searchQuery = new \Elastica\Query();
$searchQuery->setQuery($matchQuery);
$searchQuery->setHighlight(array(
    "fields" => array(
        "title" => new \stdClass(),
        "content" => new \stdClass()
    ),
    'pre_tags' => ['<strong>'],
    'post_tags' => ['</strong>'],
    // number_of_fragments takes an integer; 0 highlights the whole field
    'number_of_fragments' => 0
));
Thanks in advance ;)

Do you want "and" and "or" to be ignored, so that they carry no weight in your search?
If so, you may want to configure stop words on your Elasticsearch index.
Here's a reference:
https://www.elastic.co/guide/en/elasticsearch/guide/current/using-stopwords.html
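As a rough sketch of what the stop word suggestion could look like, the Python snippet below builds index settings with a custom analyzer that includes a `stop` token filter. The names `my_stop`, `text_without_stopwords`, and `build_stopword_settings` are hypothetical and not taken from the question. Terms removed by the analyzer never reach the query, so they should not be highlighted either, assuming the highlighted fields use this analyzer.

```python
import json


def build_stopword_settings(stopwords="_english_"):
    """Build index settings with an analyzer that drops stop words.

    `stopwords` is either a predefined list name such as "_english_"
    or an explicit list of words, e.g. ["and", "or"].
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # hypothetical filter name
                    "my_stop": {"type": "stop", "stopwords": stopwords}
                },
                "analyzer": {
                    # hypothetical analyzer name; assign it to the
                    # searchable fields in the index mapping
                    "text_without_stopwords": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_stop"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_stopword_settings(["and", "or"]), indent=2))
```

The index has to be created (or reindexed) with these settings before the analyzer takes effect on existing documents.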

Related

Running Query versus a Query Search Query via Nest client?

I am new to Elasticsearch and am writing basic search queries. I want to be able to search the full text field for a keyword. I understand that this can be done using a query search query, but I am unclear on how this is done using the NEST client.
var searchResponse = client.Search<mdl.Event>(s => s
    .Query(q => q
        .Match(m => m
            .Query(search.Text))
        && q
        .DateRange(r => r
            .Field(f => f.CreatedTimeStamp)
            .GreaterThanOrEquals(search.From)
            .LessThanOrEquals(search.To))));
This is the code I have. Basically, I am trying to search for some text within a date range, but I believe the query above is not searching the body of the document for the text. Is there a way I can easily change this query so that it searches the whole body? Or is it already doing that and I'm unaware?
I am searching for events in a cluster. An example of an event might look like:
{
    "text": "string",
    "includeExecution": true,
    "processIds": "string",
    "statuses": [
        "string"
    ],
    "space": "string",
    "from": "2021-09-17T01:40:03.796Z",
    "to": "2021-09-17T01:40:03.796Z",
    "take": 0,
    "skip": 0,
    "orderBy": "string",
    "orderByDescending": true
}
In my case, I want to be able to search for the word "string" and have this result come up (because "string" exists on space).
Try using the QueryString query like this. It will search for search.Text in all fields of your documents.
var searchResponse = client.Search<mdl.Event>(s => s
    .Query(q => q
        .QueryString(qs => qs
            .Query(search.Text))
        && q
        .DateRange(r => r
            .Field(f => f.CreatedTimeStamp)
            .GreaterThanOrEquals(search.From)
            .LessThanOrEquals(search.To))));
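For readers who want to see roughly what goes over the wire, here is a hedged Python sketch of the request body that a query string clause combined with a date range would correspond to. The `&&` operator in NEST combines both clauses into a `bool` query with `must`. The field name `createdTimeStamp` is an assumption (NEST camel-cases property names by default), and `build_text_and_date_query` is a hypothetical helper.

```python
import json


def build_text_and_date_query(text, date_from, date_to,
                              date_field="createdTimeStamp"):
    """Combine a query_string clause with a date range in a bool query."""
    return {
        "query": {
            "bool": {
                "must": [
                    # no "fields" given: the index default applies
                    {"query_string": {"query": text}},
                    {"range": {date_field: {"gte": date_from,
                                            "lte": date_to}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_text_and_date_query(
        "string", "2021-09-01T00:00:00Z", "2021-09-30T23:59:59Z")
    print(json.dumps(body, indent=2))
```

Which fields `query_string` searches when none are listed depends on the index default field setting, so it is worth checking that against the Elasticsearch version in use.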

Elasticsearch seems to prioritize results with an isolated search term during a full text search

I am having problems with Elasticsearch. It seems the search term is being isolated in search results.
We have a large subtitle database that was indexed using Elasticsearch.
It seems, however, that our searches prioritize search results where the search term is isolated.
For example, the search for "Eat" produces:
Oh, skydiving. // Skydiving. // Oh, I got that one. // Eating crazy. // Eating, eating. // Just pass, just pass. // You guys suck at that. // What was that? // Synchronized swimming
AND
it's my last night so we're gonna live // life like there's no tomorrow. // - I think I'd just wanna, // - Eat. // - Bring all the food, // whether it's Mcdonald's, whether it's, // - Ice cream.
Instead, we need to prioritize search results where the search term is found within a sentence, rather than just on its own.
I need help determining what needs to be fixed: the mapping, the filters, the tokenizers, etc.
Here are my settings:
static public function getSettings(){
    return [
        'number_of_shards' => 1,
        'number_of_replicas' => 1,
        'analysis' => [
            'filter' => [
                'filter_stemmer' => [
                    'type' => 'stemmer',
                    'language' => 'english'
                ]
            ],
            'analyzer' => [
                'text_analyzer' => [
                    'type' => 'custom',
                    "stopwords" => [],
                    'filter' => ['lowercase', 'filter_stemmer','stemmer'],
                    'tokenizer' => 'standard'
                ],
            ]
        ]
    ];
}
and here is my mapping:
https://gist.github.com/firecentaur/d0e1e196f7fddbb4d02935bec5592009
And here is my search:
https://gist.github.com/firecentaur/5ac97bbd8eb02c406d6eecf867afc13c
What am I doing wrong?
This behavior is most likely caused by the TF/IDF scoring algorithm.
When a query matches a field, the match counts for more if there are few words in the field.
If you want to adapt this to your use case, you can use a function_score query.
This post should help you find a solution:
How can I boost the field length norm in elasticsearch function score?
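To make the effect of field length concrete, here is a small Python sketch of the field-length norm used by the classic Lucene TF/IDF similarity, which is 1 divided by the square root of the number of terms in the field. It ignores the lossy encoding Lucene applies when storing norms, and newer Elasticsearch versions default to BM25, which normalizes length differently, so treat the numbers as illustrative only. `field_length_norm` is a hypothetical helper name.

```python
import math


def field_length_norm(num_terms):
    """Classic TF/IDF field-length norm: 1 / sqrt(number of terms)."""
    if num_terms < 1:
        raise ValueError("num_terms must be at least 1")
    return 1.0 / math.sqrt(num_terms)


if __name__ == "__main__":
    # A subtitle line that is just "Eat." has a single term, while a
    # full sentence containing "eat" might have sixteen.
    print(field_length_norm(1))
    print(field_length_norm(16))
```

With this formula a one-word field gets a norm four times larger than a sixteen-word field, which is consistent with isolated matches ranking first.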

Dynamic field list for MultiMatch - Nest

We have a requirement to have a search for a document type with a variable/dynamic number of fields being queried against. For one search/type it might be Name and Status. For another, the Description field. The fields to be searched against will be chosen by the user at run time.
Doing this statically appears easy. Something like this searches in the Name and Description fields (assume that rootQuery is a valid search descriptor ready for the query):
rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(ff => ff.Name).Field(ff => ff.Description))));
However, we don't want to have a library of static queries to handle the potential permutations if possible. We'd rather do something dynamic like:
foreach (var field in fieldsFromUser) // string list of fields chosen by the user
{
    rootQuery.Query(q => q.MultiMatch(mm => mm.Query(filter.Value.ToString()).Fields(f => f.Field(field))));
}
Is this possible? If so, how?
You can pass the string list of fields directly to .Fields(...)
var searchResponse = client.Search<Document>(s => s
    .Query(q => q
        .MultiMatch(mm => mm
            .Query("query")
            .Fields(new string[] { "field1", "field2", "field3" })
        )
    )
);
which yields
{
    "query": {
        "multi_match": {
            "fields": ["field1", "field2", "field3"],
            "query": "query"
        }
    }
}
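The same idea, sketched in Python as a plain request body: the list of fields chosen at run time goes straight into a single `multi_match` clause, so no loop over queries is needed. `build_multi_match` is a hypothetical helper, and the field names in the usage example are only illustrative.

```python
import json


def build_multi_match(query_text, fields):
    """Build one multi_match query over a run-time list of field names."""
    fields = list(fields)
    if not fields:
        raise ValueError("at least one field is required")
    return {
        "query": {
            "multi_match": {
                "fields": fields,
                "query": query_text,
            }
        }
    }


if __name__ == "__main__":
    user_fields = ["name", "status"]  # chosen by the user at run time
    print(json.dumps(build_multi_match("query", user_fields), indent=2))
```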

NEST 2.0 with Elasticsearch for GeoDistance always returns all records

I have the below code using C# .NET 4.5 and NEST 2.0 via NuGet. With this distance search code, the query always returns every document of my type 'trackpointes'. I have 2,790 documents and the returned count is exactly that. Even with 1 centimeter as the distance, it returns all 2,790 documents. My 'trackpointes' type has a location field of type geo_point, with geohash true and geohash_precision of 9.
I am just trying to filter results based on distance without any other search terms and for my 2,790 records it returns them all regardless of the unit of measurement. So I have to be missing something (hopefully small). Any help is appreciated. The NEST examples I can find are a year or two old and that syntax does not seem to work any more.
double distance = 4.0;
var geoResult = client.Search<TrackPointES>(s => s.From(0).Size(10000).Type("trackpointes")
    .Query(query => query
        .Bool( b => b.Filter(filter => filter
            .GeoDistance(geo => geo
                .Distance(distance, Nest.DistanceUnit.Kilometers).Location(35, -82)))
        )
    )
);
If I use Postman to connect to my instance of ES and POST a search with the below JSON, I get a return of 143 total documents out of 2,790. So I know the data is right, as that is a realistic return.
{
    "query" : {
        "filtered" : {
            "filter" : {
                "geo_distance" : {
                    "distance" : "4km",
                    "location" : {
                        "top_left": {
                            "lat" : 35,
                            "lon" : -82
                        }
                    }
                }
            }
        }
    }
}
Looks like you didn't specify the field in your query. Try this one:
var geoResult = client.Search<Document>(s => s.From(0).Size(10000)
    .Query(query => query
        .Bool(b => b.Filter(filter => filter
            .GeoDistance(geo => geo
                .Field(f => f.Location) //<- this
                .Distance(distance, Nest.DistanceUnit.Kilometers).Location(35, -82)))
        )
    )
);
I forgot to specify the field to search for the location. :( But I am posting here just in case someone else has the same issue and to shame myself into trying harder...
.Field(p => p.location) was the difference in the query.
var geoResult = client.Search<TrackPointES>(s => s.From(0).Size(10000).Type("trackpointes")
    .Query(query => query
        .Bool( b => b.Filter(filter => filter
            .GeoDistance(geo => geo.Field(p => p.location).DistanceType(Nest.GeoDistanceType.SloppyArc)
                .Distance(distance, Nest.DistanceUnit.Kilometers).Location(35, -82)))
        )
    )
);
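To show why the field matters, here is a hedged Python sketch of a `geo_distance` filter as a request body. The geo_point field name is the key that holds the origin point, so a filter built without it has nothing to measure against. `build_geo_distance_filter` is a hypothetical helper; the field name `location` comes from the question.

```python
import json


def build_geo_distance_filter(field, lat, lon, distance_km):
    """Build a bool/filter query restricting hits to a radius around a point."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_distance": {
                            # ":g" drops a trailing ".0", so 4.0 becomes "4km"
                            "distance": f"{distance_km:g}km",
                            # the geo_point field name keys the origin point
                            field: {"lat": lat, "lon": lon},
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_geo_distance_filter("location", 35, -82, 4.0),
                     indent=2))
```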

Selectively turn off stop words in Elastic Search

I would like to turn off stop word filtering on the username, title, and tags fields but not the description field.
As you can imagine, I do not want to filter out a result called "the best", but I do want to stop "the" from affecting the score if it is in the description field (search "the" on GitHub if you want an example).
Now @Javanna says (Is there a way to "escape" ElasticSearch stop words?):
In your case I would disable stopwords for that specific field rather than modifying the stopword list, but you could do the latter too if you wish to.
That answer does not include an example, so I searched around and tried the common terms query (http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/), which didn't work for me either.
So I searched specifically for how to stop the filtering of stop words. However, the closest I have come is disabling it index-wide (Can I customize Elastic Search to use my own Stop Word list?) by changing the analyzer directly; failing that, the documentation hints at making my own analyzer :/.
What is the best way to selectively disable stop words on certain fields?
I think you already know what to do, which is to customize your analyzers for certain fields. From what I understand, you did not manage to come up with valid syntax for that. This is what we used in a project; I hope this example points you in the right direction:
{
  :settings => {
    :analysis => {
      :analyzer => {
        :analyzer_umlauts => {
          :tokenizer => "standard",
          :char_filter => ["filter_umlaut_mapping"],
          :filter => ["standard", "lowercase"],
        }
      },
      :char_filter => {
        :filter_umlaut_mapping => {
          :type => 'mapping',
          :mappings_path => es_config_file("char_mapping")
        }
      }
    }
  },
  :mappings => {
    :company => {
      :properties => {
        [...]
        :postal_city => { :type => "string", :analyzer => "analyzer_umlauts", :omit_norms => true, :omit_term_freq_and_positions => true, :include_in_all => false },
      }
    }
  }
}
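Applied to the question's fields, the per-field analyzer idea could be sketched in Python as below. This is only an illustration under assumptions: the analyzer names `keep_stopwords` and `drop_stopwords` and the helper `build_selective_stopword_index` are hypothetical, and the mapping uses the newer typeless layout with `text` fields rather than the `string` type shown in the example above.

```python
import json


def build_selective_stopword_index(keep_fields, drop_fields):
    """Map some fields to an analyzer without stop word filtering and
    others to an analyzer that removes English stop words."""
    analyzers = {
        # no stop filter: "the best" stays searchable as written
        "keep_stopwords": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"],
        },
        # built-in "stop" filter (English stop words by default)
        "drop_stopwords": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "stop"],
        },
    }
    properties = {}
    for name in keep_fields:
        properties[name] = {"type": "text", "analyzer": "keep_stopwords"}
    for name in drop_fields:
        properties[name] = {"type": "text", "analyzer": "drop_stopwords"}
    return {
        "settings": {"analysis": {"analyzer": analyzers}},
        "mappings": {"properties": properties},
    }


if __name__ == "__main__":
    body = build_selective_stopword_index(
        ["username", "title", "tags"], ["description"])
    print(json.dumps(body, indent=2))
```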
