How to do facet search with mpdreamz Nest - elasticsearch

Does anybody know how to do a facet search with NEST?
My index is https://gist.github.com/3606852
I would like to search for a keyword in 'NumberEvent' and display the result if the keyword exists. Please help!

This assumes that the MyPoco class exists and maps to your Elasticsearch document. If it doesn't, you can use dynamic, but you'll have to swap the lambda-based field selectors for strings.
var result = client.Search<MyPoco>(s => s
    .From(0)
    .Size(10)
    .Filter(ff => ff
        // "event" is a reserved word in C#, so the property is escaped as @event
        .Term(f => f.Categories.Types.Events.First().NumberEvent.@event, "keyword")
    )
    .FacetTerm(q => q.OnField(f => f.Categories.Types.Facets.First().Person.First().entity))
);
result.Documents now holds your documents
result.Facet<TermFacet>(f => f.Categories.Types.Facets.First().Person.First().entity); now holds your facets
Your document seems a bit strange though in the sense that it already has Facets with counts in them.
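For readers on newer clusters: facets were deprecated in Elasticsearch 1.0 and removed in 2.0 in favour of aggregations. The following is only a rough sketch of what a comparable request body could look like, written as a plain Python dict so it runs without a cluster. The field paths are guesses based on the selectors in the answer above, not taken from the actual mapping, and the term filter here sits in the query, so unlike a top-level filter it also narrows the aggregation.

```python
import json

def build_filtered_terms_body(filter_field, keyword, agg_field, size=10):
    """Build a search body with a term filter and a terms aggregation."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                # Filter context: documents must contain the exact term; no scoring.
                "filter": [{"term": {filter_field: keyword}}]
            }
        },
        # The terms aggregation plays the role the term facet used to play.
        "aggs": {"entities": {"terms": {"field": agg_field}}},
    }

# Hypothetical field paths, guessed from the lambda selectors in the answer.
facet_body = build_filtered_terms_body(
    "categories.types.events.numberEvent.event",
    "keyword",
    "categories.types.facets.person.entity",
)
print(json.dumps(facet_body, indent=2))
```

The hits in the response would correspond to result.Documents, and the "entities" buckets to the facet counts.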

Related

Simple query without a specified field searching in whole ElasticSearch index

Say we have an ElasticSearch instance and one index. I now want to search the whole index for documents that contain a specific value. The search has to cover multiple fields, so I don't want to specify every field to search in.
My attempt so far (using NEST) is the following:
var res2 = client.Search<ElasticCompanyModelDTO>(s => s
    .Index("cvr-permanent")
    .AllTypes()
    .Query(q => q
        .Bool(bo => bo
            .Must(sh => sh
                .Term(c => c.Value(query))
            )
        )
    )
);
However, the query above results in an empty request body.
I get the output ### ES REQEUST ### {} after adding the following debug hooks to my connection settings:
.DisableDirectStreaming()
.OnRequestCompleted(details =>
{
    Debug.WriteLine("### ES REQEUST ###");
    if (details.RequestBodyInBytes != null)
        Debug.WriteLine(Encoding.UTF8.GetString(details.RequestBodyInBytes));
})
.PrettyJson();
How do I do this? Why is my query wrong?
Your problem is that a TermQuery must specify the single field to search. Your query only sets a value, so NEST treats it as conditionless and leaves it out of the request, which is why the body is empty. Most Elasticsearch queries need a field or fields to be specified as part of the query. If you want to search every field in your document, you can use the built-in "_all" field (unless you've disabled it in your mapping).
You should be sure you really want a TermQuery, too, since that will only match exact strings in the text. This type of query is typically used when querying short, unanalyzed string fields (for example, a field containing an enumeration of known values like US state abbreviations.)
If you'd like to query longer full-text fields, consider the MultiMatchQuery (it lets you specify multiple fields, too.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
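To make the difference concrete, here is a small sketch of the two request bodies as plain Python dicts (nothing is sent to a cluster). The field names "state", "name" and "description" are hypothetical and only serve as illustration.

```python
import json

def term_query(field, value):
    """Exact-match query: the value is compared to indexed terms without analysis."""
    return {"query": {"term": {field: value}}}

def multi_match_query(text, fields):
    """Full-text query: the text is analyzed and matched against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}

# Hypothetical field names, for illustration only.
exact = term_query("state", "CA")
full_text = multi_match_query("acme consulting", ["name", "description"])

print(json.dumps(exact))
print(json.dumps(full_text))
```

Note that the term query always names exactly one field, which is the piece missing from the attempt in the question.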
Try this
var res2 = client.Search<ElasticCompanyModelDTO>(s => s
    .Index("cvr-permanent")
    .AllTypes()
    .Query(qry => qry
        .Bool(b => b
            .Must(m => m
                .QueryString(qs => qs
                    .DefaultField("_all")
                    .Query(query))))));
The existing answers rely on the presence of _all. In case anyone comes across this question at a later date, it is worth knowing that _all was deprecated in Elasticsearch 6.0 (it can no longer be enabled on indices created in 6.0 or later) and removed in 7.0.
There's a really good video from ElasticOn explaining the reasons behind this and how the replacements work, starting at around 07:30.
In short, the _all query can be replaced by a simple_query_string query, which works the same way. The form for the _search API would be:
GET <index>/_search
{
  "query": {
    "simple_query_string": {
      "query": "<queryTerm>"
    }
  }
}
The NEST pages on Elastic's documentation for this query are here;
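As a rough illustration, the simple_query_string request above can also be assembled programmatically. This sketch builds the body as a plain Python dict; the optional "fields" and "default_operator" settings are standard parameters of the query, while the field names and search text used below are made up.

```python
import json

def simple_query_string_body(text, fields=None, default_operator=None):
    """Build a simple_query_string search body; with no fields given,
    Elasticsearch falls back to its default fields."""
    sqs = {"query": text}
    if fields:
        sqs["fields"] = list(fields)
    if default_operator:
        sqs["default_operator"] = default_operator
    return {"query": {"simple_query_string": sqs}}

# Searches all default fields, like the JSON request above.
all_fields_body = simple_query_string_body("acme")

# Hypothetical field names; restricts the search and requires every term.
restricted_body = simple_query_string_body(
    "acme consulting", fields=["name", "description"], default_operator="and"
)
print(json.dumps(all_fields_body))
print(json.dumps(restricted_body))
```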

ElasticSearch - Filter stop words from Top words

I have a list of documents I am indexing like this:
ElasticIndex.CreateIndex(IndexName, _ => _
.Mappings(__ => __
.Map<AlbumMetadata>(
M => M.AutoMap()
.Properties(P => P.Text(T => T.Name(N => N.Keywords)
.Analyzer("stop")
.Fields(F => F.Keyword(K => K.Name("keywords"))))))));
In my class AlbumMetadata, the field Keywords is a list:
[Keyword]
public List<string> Keywords { get; set; }
When I want to retrieve the top terms, I do the following query (you can ignore Category and Type, they're not relevant to the problem):
var Match = Driver.Search<AlbumMetadata>(_ => _
.Query(Q => Q
.Term(P => P.Category, (int)Category) && Q
.Term(P => P.Type, (int)Type))
.Source(F => F.Includes(S => S.Fields(L => L.Keywords)))
.Aggregations(A => A
.Terms("Tags", T => T
.Field(E => E.Keywords)
.Size(Limit)
)
));
var Tags = Match.Aggs.Terms("Tags").Buckets.ToDictionary(K => K.Key, V => V.DocCount);
The problem is that in the output, I get some stop words as well as some symbols, like / - & |
What am I doing wrong?
Edit:
In order to clarify the question, here is what I am trying to achieve:
I have documents that have titles (full English sentences) and tags (list of single words, sometimes a tag is a two word tag).
I need to be able to perform a search that will find documents based on the title and tags (and ideally using word stems, ignoring plurals, etc).
I also need to extract the list of top words. The Keywords list is a concatenation of all words from the title and all the entries from the tags list.
Is the way I create the index appropriate in this context? Also, is the way I do the aggregation the right way?
There are a few things:
When you create the index, .AutoMap() on the mapping will infer Elasticsearch field datatypes from the POCO property types and the attributes applied to them. Then, .Properties() overrides any of these inferred mappings. So, the end result of your mapping for Keywords is a text datatype field with the stop analyzer applied, and a multi-field sub field of "keywords" (queryable via "keywords.keywords"), set as a keyword datatype.
The aggregation is running on the "keywords" text field with the stop analyzer applied. The stop analyzer uses English stop words by default, but you can configure the stop analyzer with other stop words by defining a custom stop analyzer in the index. The stop analyzer will not remove symbols like /, -, & and |.
With a terms aggregation, you generally want to get back buckets for the verbatim terms of a field, which your mapping provides through the "keywords.keywords" field. This works well because a keyword field uses doc_values, an on-disk columnar data structure suited to well-performing, large-scale aggregations. If the values need adjusting first, you can apply a normalizer to a keyword field; it is similar to an analyzer, except that it produces only one token.
You can run the aggregation on a text field too as you're doing, but you also need to enable fielddata and be aware of how it works. text fields can't use doc_values.
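A minimal sketch of what the aggregation request could look like when it targets the keyword sub-field, shown as a plain Python dict. The "exclude" list is an optional parameter of the terms aggregation that drops exact values from the buckets; the particular values excluded here, and the limit of 20, are only examples.

```python
import json

def top_terms_body(field, limit, exclude=()):
    """Build a terms aggregation body on a keyword field, optionally
    excluding exact values (such as stray symbols) from the buckets."""
    terms = {"field": field, "size": limit}
    if exclude:
        terms["exclude"] = list(exclude)
    # "size": 0 skips the hits, since only the buckets are of interest.
    return {"size": 0, "aggs": {"Tags": {"terms": terms}}}

# "keywords.keywords" is the multi-field described in the answer above.
tags_body = top_terms_body("keywords.keywords", 20, exclude=["/", "-", "&", "|"])
print(json.dumps(tags_body, indent=2))
```

Because the keyword sub-field is not analyzed, stop words are not removed from it automatically, so they would have to be excluded the same way or filtered out before indexing.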

ElasticSearch get only document ids, _id field, using search query on index

For a given query I want to get only the list of _id values without getting any other information (without _source, _index, _type, ...).
I noticed that by using _source and requesting non-existing fields it will return only minimal data but can I get even less data in return ?
Some answers suggest to use the hits part of the response, but I do not want the other info.
Better to use scroll and scan to get the result list so elasticsearch doesn't have to rank and sort the results.
With the elasticsearch-dsl python lib this can be accomplished by:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
s = Search(using=es, index=ES_INDEX, doc_type=DOC_TYPE)
s = s.fields([]) # only get ids, otherwise `fields` takes a list of field names
ids = [h.meta.id for h in s.scan()]
I suggest using elasticsearch_dsl for Python. It has a nice API.
from elasticsearch_dsl import Document
# don't return any fields, just the metadata
s = s.source(False)
results = list(s)
Afterwards you can get the id with:
from typing import Union
first_result: Document = results[0]
id: Union[str, int] = first_result.meta.id
Here is the official documentation to get some extra information: https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#extra-properties-and-parameters
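For anyone not using elasticsearch_dsl, the same idea can be sketched against the raw search API. The body below disables _source, and the helper pulls the ids out of a response dict. The sample response is hand-written for illustration and is not real output; against a live cluster, the filter_path=hits.hits._id URL parameter should trim the response further.

```python
def ids_only_body(query=None):
    """Build a search body that returns hit metadata but no _source."""
    return {"query": query or {"match_all": {}}, "_source": False}

def extract_ids(response):
    """Collect the _id of every hit in a raw search response dict."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

# Hand-written sample response, shaped like a real one but not real output.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "my-index", "_id": "1", "_score": 1.0},
            {"_index": "my-index", "_id": "2", "_score": 1.0},
        ],
    }
}

id_body = ids_only_body()
doc_ids = extract_ids(sample_response)
print(doc_ids)
```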

How can I find the true score from Elasticsearch query string with a wildcard?

My ElasticSearch 2.x NEST query string search contains a wildcard:
Using NEST in C#:
var results = _client.Search<IEntity>(s => s
.Index(Indices.AllIndices)
.AllTypes()
.Query(qs => qs
.QueryString(qsq => qsq.Query("Micro*")))
.From(pageNumber)
.Size(pageSize));
Comes up with something like this:
$ curl -XGET 'http://localhost:9200/_all/_search?q=Micro*'
This code was derived from the ElasticSearch page on using Co-variants. The results are co-variant; they are of mixed type coming from multiple indices. The problem I am having is that all of the hits come back with a score of 1.
This is regardless of type or boosting. Can I boost by type or, alternatively, is there a way to reveal or "explain" the search result so I can order by score?
Multi-term queries such as the wildcard query are given a constant score equal to the query boost by default. You can change this behaviour using .Rewrite().
var results = client.Search<IEntity>(s => s
.Index(Indices.AllIndices)
.AllTypes()
.Query(qs => qs
.QueryString(qsq => qsq
.Query("Micro*")
.Rewrite(RewriteMultiTerm.ScoringBoolean)
)
)
.From(pageNumber)
.Size(pageSize)
);
With RewriteMultiTerm.ScoringBoolean, the rewrite method first translates each term into a should clause in a bool query and keeps the scores as computed by the query.
Note that this can be CPU intensive, and the default limit of 1024 bool query clauses is easily hit on a large document corpus; running your query on the complete StackOverflow data set (questions, answers and users), for example, hits the clause limit for questions. As an alternative, you may want to analyze the text at index time with an analyzer that uses an edge_ngram token filter.
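For reference, a sketch of the request body that the rewrite setting corresponds to, written as a plain Python dict. "rewrite" is a parameter of the query_string query and "scoring_boolean" is one of its documented values; the paging defaults below are assumptions.

```python
import json

def wildcard_query_string_body(pattern, rewrite="scoring_boolean", offset=0, size=10):
    """Build a query_string search body with an explicit rewrite method,
    so matching wildcard terms contribute real scores."""
    return {
        "from": offset,
        "size": size,
        "query": {"query_string": {"query": pattern, "rewrite": rewrite}},
    }

rewrite_body = wildcard_query_string_body("Micro*")
print(json.dumps(rewrite_body, indent=2))
```

Adding "explain": true at the top level of the body asks Elasticsearch to include a score explanation with each hit, which can help when checking whether the rewrite had the intended effect.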
By default, wildcard searches return a constant score of 1.
You can boost by a particular type. See this:
How to boost index type in elasticsearch?

Group By Elasticsearch

I have documents A, B and C of the same document type. Each has an is_type property, with the values 'Normal', 'Normal' and 'AbNormal' respectively. I want to get the search response in one single query and then use the Search Response API to get the lists of documents whose type is normal and abnormal. I know an aggregation on its own will not return the documents, as it only aggregates. Any help would be appreciated.
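One possible approach, offered only as a sketch: run a single ordinary search and group the returned hits by is_type on the client side. The helper below works on a raw response dict; the sample response is hand-written to mirror documents A, B and C from the question. A terms aggregation with a top_hits sub-aggregation is another option if the grouping should happen inside Elasticsearch.

```python
from collections import defaultdict

def group_hits_by(response, field):
    """Group the hits of a raw search response by the value of a _source field."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        groups[hit["_source"].get(field)].append(hit)
    return dict(groups)

# Hand-written sample response mirroring documents A, B and C.
sample_hits = {
    "hits": {
        "hits": [
            {"_id": "A", "_source": {"is_type": "Normal"}},
            {"_id": "B", "_source": {"is_type": "Normal"}},
            {"_id": "C", "_source": {"is_type": "AbNormal"}},
        ]
    }
}

grouped = group_hits_by(sample_hits, "is_type")
normal_ids = [hit["_id"] for hit in grouped["Normal"]]
abnormal_ids = [hit["_id"] for hit in grouped["AbNormal"]]
print(normal_ids, abnormal_ids)
```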
