Complex search within elasticsearch using NEST (.net) - elasticsearch

I am working with elasticsearch 2.3.4 (can update to 5 but its still 1 week since release and waiting for reviews on how its working)
I am trying to create asearch within my .net class
ISearchResponse<ClassModel> apiResponse = client1.Search<ClassModel>(a =>
a.Query(q =>
q.Term(p => p.param1, Param1) &&
q.Term(p => p.const1, "const1") &&
q.Term(p => p.param2, param2)));
For some reason the const1 return no values (even if i run it alone without the other params) but with HD extension i get results, maybe i shouldnt use Term ? something else?
Thank you in advance

It sounds as though you might not have the correct mapping on the "const1" field.
Edit as per comment below: You can use a term query on an analyzed field but it's unlikely to work how you might expect. If your field "const1" contains multiple words, then a term query with search text equal to the string you indexed will not match.
"const1": {
"type": "string",
"index": "not_analyzed"
}

Related

Simple query without a specified field searching in whole ElasticSearch index

Say we have an ElasticSearch instance and one index. I now want to search the whole index for documents that contain a specific value. It's relevant to the search for this query over multiple fields, so I don't want to specify every field to search in.
My attempt so far (using NEST) is the following:
var res2 = client.Search<ElasticCompanyModelDTO>(s => s.Index("cvr-permanent").AllTypes().
Query(q => q
.Bool(bo => bo
.Must( sh => sh
.Term(c=>c.Value(query))
)
)
));
However, the query above results in an empty query:
I get the following output, ### ES REQEUST ### {} , after applying the following debug on my connectionstring:
.DisableDirectStreaming()
.OnRequestCompleted(details =>
{
Debug.WriteLine("### ES REQEUST ###");
if (details.RequestBodyInBytes != null) Debug.WriteLine(Encoding.UTF8.GetString(details.RequestBodyInBytes));
})
.PrettyJson();
How do I do this? Why is my query wrong?
Your problem is that you must specify a single field to search as part of a TermQuery. In fact, all ElasticSearch queries require a field or fields to be specified as part of the query. If you want to search every field in your document, you can use the built-in "_all" field (unless you've disabled it in your mapping.)
You should be sure you really want a TermQuery, too, since that will only match exact strings in the text. This type of query is typically used when querying short, unanalyzed string fields (for example, a field containing an enumeration of known values like US state abbreviations.)
If you'd like to query longer full-text fields, consider the MultiMatchQuery (it lets you specify multiple fields, too.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
Try this
var res2 = client.Search<ElasticCompanyModelDTO>(s =>
s.Index("cvr-permanent").AllTypes()
.Query(qry => qry
.Bool(b => b
.Must(m => m
.QueryString(qs => qs
.DefaultField("_all")
.Query(query))))));
The existing answers rely on the presence of _all. In case anyone comes across this question at a later date, it is worth knowing that _all was removed in ElasticSearch 6.0
There's a really good video explaining the reasons behind this and the way the replacements work from ElasticOn starting at around 07:30 in.
In short, the _all query can be replaced by a simple_query_string and it will work with same way. The form for the _search API would be;
GET <index>/_search
{
"query": {
"simple_query_string" : {
"query": "<queryTerm>"
}
}
}
The NEST pages on Elastic's documentation for this query are here;

How to treat certain field values as null in `Elasticsearch`

I'm parsing log files which for simplicity's sake let's say will have the following format :
{"message": "hello world", "size": 100, "forward-to": 127.0.0.1}
I'm indexing these lines into an Elasticsearch index, where I've defined a custom mapping such that message, size, and forward-to are of type text, integer, and ip respectively. However, some log lines will look like this :
{"message": "hello world", "size": "-", "forward-to": ""}
This leads to parsing errors when Elasticsearch tries to index these documents. For technical reasons, it's very much untrivial for me to pre-process these documents and change "-" and "" to null. Is there anyway to define which values my mapping should treat as null ? Is there perhaps an analyzer I can write which works on any field type whatsoever that I can add to all entries in my mapping ?
Basically I'm looking for somewhat of the opposite of the null_value option. Instead of telling Elasticsearch what to turn a null_value into, I'd like to tell it what it should turn into a null_value. Also acceptable would be a way to tell Elasticsearch to simply ignore fields that look a certain way but still parse the other fields in the document.
So this one's easy apparently. Add the following to your mapping settings :
{
"settings": {
"index": {
"mapping": {
"ignore_malformed": "true"
}
}
}
}
This will still index the field (contrary to what I've understood from the documentation...) but it will be ignored during aggregations (so if you have 3 entries in an integer field that are "1", 3, and "hello world", an averaging aggregation will yield 2).
Keep in mind that because of the way the option was implemented (and I would say this is a bug) this still fails for and object that is entered as a concrete value and vice versa. If you'd like to get around that you can set the field's enabled value to false like this :
{
"mappings": {
"my_mapping_name": {
"properties": {
"my_unpredictable_field": {
"enabled": false
}
}
}
}
}
This comes at a price though, since this means the field won't be indexed, but the values entered will be still be stored so you can still accessing them by searching for that document through another field. This usually shouldn't be an issue as you likely won't be filtering documents based on the value of such an unpredictable field, but that depends on your specific case use. See here for the official discussion of this issue.

Elasticsearch and NEST: Filtering with Lookups

I'm using Elasticsearch 1.7.x (and NEST 1.7.2) and trying to take advantage of filtering with lookups as documented here: Terms Filter Lookup. I'm able to craft the JSON by hand for the request I desire and execute it using Sense. Works great, awesome feature! However, in the NEST library, I don't see a way to create such a terms clause. For example, from the link referenced above, I can do something like:
"terms" : {
"proteins" : {
"index" : "microarrays",
"type" : "experiment",
"id" : "experiment1234",
"path" : "upregulated_proteins"
},
"_cache_key" : "experiment_1234"
}
Is there a way to build this query using NEST? If not, is there a way I can inject some JSON into a NEST query as I'm building it? I don't know if NEST 2.x+ supports this but upgrading to ES 2.x is a longer term plan for us and I'd like to leverage functionality that is already available in ES 1.7.
Awesome, I've already received an answer from Greg Marzouka of Elastic! He says:
It's mapped as TermsLookup() or TermsLookupFilter in 1.x. Check out the unit tests for some examples.
client.Search<Paper>(s => s
.Query(q => q
.Filtered(fq => qf
.Filter(f => f
.CacheKey("experiment_1234")
.TermsLookup(t => t
.Lookup<Protein>(p => p.UnregulatedProteins, "experiment1234", "microarrays", "experiment")
)
)
)
));
In 2.x it's a little more aligned with the ES query DSL.

How to use wildcards with ngrams in ElasticSearch

Is it possible to combine wildcard matches and ngrams in ElasticSearch? I'm already using ngrams of length 3-11.
As a very small example, I have records C1239123 and C1230123. The user wants to return both of these. This is the only info they know: C123?12
The above case won't work on my full match analyzer because the query is missing the 3 on the end. I was under the impression wildcard matches would work out of the box, but if I perform a search similar to the above I get gibberish.
Query:
.Search<ElasticSearchProject>(a => a
.Size(100)
.Query(q => q
.SimpleQueryString(query => query
.OnFieldsWithBoost(b => b
.Add(f => f.Summary, 2.1)
.Add(f => f.Summary.Suffix("ngram"), 2.0)
.Query(searchQuery))));
Analyzer:
var projectPartialMatch = new CustomAnalyzer
{
Filter = new List<string> { "lowercase", "asciifolding" },
Tokenizer = "ngramtokenizer"
};
Tokenizer:
.Tokenizers(t=>t
.Add("ngramtokenizer", new NGramTokenizer
{
TokenChars = new[] {"letter","digit","punctuation"},
MaxGram = 11,
MinGram = 3
}))
EDIT:
The main purpose is to allow the user to tell the search engine exactly where the unknown characters are. This preserves the match order. I do not ngram the query, only the indexed fields.
EDIT 2 with more test results:
I had simplified my prior example a bit too much. The gibberish was being caused by punctuation filters. With a proper example there's no gibberish, but results aren't returned in a relevant order. Seeing below, I'm unsure why the first 2 results match at all. Ngram is not applied to the query.
Searching for c.a123?.7?0 gives results in this order:
C.A1234.560
C.A1234.800
C.A1234.700 <--Shouldn't this be first?
C.A1234.950
To anyone looking for a resolution to this, wildcards are used on ngrammed tokens by default. My problem was due to my queries having punctuation in them and using a standard analyzer on my query (which breaks on punctuation).
Duc.Duong's suggestion to use the Inquisitor plugin helped show exactly how data would be analyzed.

Django-Haystack elasticsearch queries

Haystack generates elasticsearch queries to get results from elasticsearch. The queries get prepended with a filter containing the following query:
"query": {
"query_string": {
"query": "django_ct:(customers.customer)"
}
}
What is the meaning of the django_ct(..) query? Is this a function that haystack installs in elasticsearch? Is it some caching magic? Can I get rid of this part altogether?
The reason why I'm asking is that I have to build a custom query to use an elasticsearch multi_field. In order to change the queries I want to understand first how haystack generates its own queries.
Haystack uses Django's content types to determine which model attributes to search against in Elasticsearch. This is not really best practice, but it's how it's done in HS.
Basically, the code in HS looks something like this:
app_name, model_name = django_ct.split('.')
ct = ContentType.objects.get_by_natural_key(app_name, model_name)
model = ct.model_class()
# do stuff with model
So, you really don't want to ignore it when using haystack, if you are indexing more than one model in your index.
I have a couple other answers based on elasticsearch here: index analyzer vs query analyzer in haystack - elasticsearch? and here: Django Haystack Distinct Value for Field
EDIT regarding multi-fields:
I've used Haystack and multifields in the past, so I'm not sure you need to write you own backend. The key is understanding how haystack creates searches. As I said in one of the other posts, everything goes into query_string and from there it creates a lucene based search string. Again, not really best practice.
So let's say you have a multi-field that looks like this:
"some_field": {
"type": "multi_field",
"fields": {
"some_field_edgengram": {
"type": "string",
"index": "analyzed",
"index_analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"some_field": {
"type": "string",
"index": "not_analyzed"
}
}
},
In haystack, you can just search against some_field and some_field_edgengram directly.
For example SearchQuerySet().filter(some_field="cat") and SearchQuerySet().filter(some_field_edgengram="cat") will both work, but the first will only match tokens that have cat exactly and the second will match cat, cats, catlin, catch, etc, at least using my edgengram analyzers.
However, just because you use haystack for indexing and search doesn't mean you have to use it for 100% of your search solutions. In the past, I've used PYES in some areas of the app and haystack in others, because haystack lacked the support for more advanced features and the query_string parsing was losing some of the finer grained accuracy we were looking for.
In your case, you could get results from the search engine via elasticutils or python-elasticseach directly for some more advanced searches and use haystack for the other more routine searches.

Resources