How to find which index a field belongs to in Elasticsearch?

I am new to elasticsearch. I have to write a query using a given field but I don't know how to find the appropriate index. How would I find this information?

Edit:
Here's an easier and better way, using the mapping API:
GET _mapping/field/<fieldname>
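If you want to process the result in code, here is a minimal sketch. It assumes the Elasticsearch 7.x response shape, where every index gets an entry and indices that lack the field come back with an empty mappings object. The helper name and the sample response are made up for illustration.

```python
def indices_with_field(mapping_response, field_name):
    """Return the sorted names of indices whose field-mapping entry contains field_name."""
    return sorted(
        index
        for index, body in mapping_response.items()
        if field_name in body.get("mappings", {})
    )


# Hypothetical response of GET _mapping/field/clientip (shape assumed from ES 7.x).
sample_response = {
    "nginx-logs": {
        "mappings": {
            "clientip": {
                "full_name": "clientip",
                "mapping": {"clientip": {"type": "ip"}},
            }
        }
    },
    "projects": {"mappings": {}},  # this index does not map the field
}

print(indices_with_field(sample_response, "clientip"))
```

Here only the first index maps the field, so it is the only one returned.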
One way to find it is to search for documents in which the field exists.
Replace <fieldName> with your field's name. /_search searches across all indices and returns every document that has the field. Set _source to false, since you don't care about the document contents, only the index name.
GET /_search
{
  "_source": false,
  "query": {
    "exists": {
      "field": "<fieldName>"
    }
  }
}
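If many documents match, paging through hits just to read their index names gets tedious. A possible variation (my own sketch, not part of the answer above) is to add a terms aggregation on the _index metadata field so that each index appears once with a document count. The function names, the aggregation name "indices", and the sample response are assumptions for illustration.

```python
def field_index_request(field_name, max_indices=1000):
    """Build a search body that counts, per index, the documents containing field_name."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"exists": {"field": field_name}},
        "aggs": {
            "indices": {"terms": {"field": "_index", "size": max_indices}}
        },
    }


def index_counts(search_response):
    """Map index name -> number of documents that contain the field."""
    buckets = search_response["aggregations"]["indices"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical, trimmed response to the request built above.
sample_response = {
    "aggregations": {
        "indices": {"buckets": [{"key": "nginx-logs", "doc_count": 42}]}
    }
}

print(field_index_request("clientip"))
print(index_counts(sample_response))
```

The body returned by field_index_request would be sent to GET /_search, the same endpoint as above.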

Another, more visual way to do this is through the Kibana Index Management UI (assuming you have the privileges to access it).
There you can click on an index and open the Mappings tab to see all fields of that index. Then just search for the desired field.
Summary:
Polynomial Proton's answer is the way to go 90% of the time. I just wanted to show you another way to solve your issue. It requires more manual steps than Polynomial Proton's answer. Also, if you have a large number of indices, this approach is not practical.

Related

Reverse searching with Elastic Search

We have a site and want to give users the opportunity to save a search query and be notified once an object has been added that would have been a hit for it.
We have an index that contains the search queries that users have saved. Every time a new object is added to the object index, we want to do a reverse search to find the saved queries that would have resulted in a hit for that object. This avoids running one search per saved query every time an object is added.
The problem is that the object contains all the data, while the search queries contain only the properties the user is interested in. So we are getting zero hits for most queries.
Example:
Search query:
{
  "make": "foo",
  "model": "bar"
}
Newly added object:
{
  "make": "foo",
  "model": "bar",
  "type": "jazz"
}
As you can see, the user is interested in any object with make "foo" and model "bar". We want the reverse search to produce a hit even though the saved query in the index has no "type" property. What we get is zero hits.
We use the NEST client version 7.13.0 in a .NET 6 application with Elasticsearch version 7.13.4.
Would it be possible to reverse search so that a null in the index is considered a hit for any search query?
Thank you
You can achieve this with the percolate query in Elasticsearch.
I recently wrote a blog post on the percolate query, where I explain it with an example.
You can store each user's query in an index with a percolator field. Then, whenever you index a new document, you can call the search API with a percolate query and check whether any stored query matches the document. Since you are using the NEST client, this should be easy to implement.
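To make the idea concrete, here is a rough sketch of the three request bodies involved, written as plain Python dicts rather than NEST calls. The field names make, model and type come from the question. The percolator field name "query", the keyword types, and the function names are my assumptions.

```python
def percolator_mapping():
    """Mapping for the saved-search index: a percolator field plus the fields queries refer to."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                "make": {"type": "keyword"},
                "model": {"type": "keyword"},
                "type": {"type": "keyword"},
            }
        }
    }


def saved_search_document(criteria):
    """Turn {"make": "foo", "model": "bar"} into a stored bool query of term filters."""
    filters = [{"term": {field: value}} for field, value in criteria.items()]
    return {"query": {"bool": {"filter": filters}}}


def percolate_request(new_object):
    """Search body that asks which stored queries match new_object."""
    return {"query": {"percolate": {"field": "query", "document": new_object}}}


print(saved_search_document({"make": "foo", "model": "bar"}))
print(percolate_request({"make": "foo", "model": "bar", "type": "jazz"}))
```

Because the stored query only contains filters for the properties the user cares about, the extra "type" property on the new object does not prevent a match.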

Elasticsearch Join

I have two indices. One index, "indications", holds a set of values.
The other is "projects". In this index, I add an indication value such as "indication = oncology".
Now I want to show all indications, which I can do using a terms aggregation. My issue is that I also want to show the number of projects in which each indication is used.
So for that, I need to write a join query.
Can anyone help me resolve this issue?
Expected result example:
[{name:"oncology",projectCount:"12"}]
You cannot do joins in Elasticsearch. What you can do is store the indication name in the project index and then apply a terms aggregation on the project index. That gives you the distinct indications across all project documents, along with the count for each indication.
Something of the sort:
GET /project/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "indication": {
      "terms": {
        "field": "indication_name"
      }
    }
  }
}
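To get from the aggregation response to the shape asked for in the question, something along these lines might work. It is only a sketch: the aggregation name is passed in as a parameter (defaulting to "indication"), and the sample response is invented.

```python
def indication_counts(search_response, agg_name="indication"):
    """Convert terms-aggregation buckets into [{"name": ..., "projectCount": ...}]."""
    buckets = search_response["aggregations"][agg_name]["buckets"]
    return [
        {"name": bucket["key"], "projectCount": bucket["doc_count"]}
        for bucket in buckets
    ]


# Hypothetical, trimmed search response.
sample_response = {
    "aggregations": {
        "indication": {
            "buckets": [
                {"key": "oncology", "doc_count": 12},
                {"key": "cardiology", "doc_count": 3},
            ]
        }
    }
}

print(indication_counts(sample_response))
```

Note that indications which no project uses do not show up in the buckets. If you need them with a count of zero, the application would have to merge them in from the indications index.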
Elasticsearch does not support joins. That's the whole point of NoSQL: you keep the data as denormalised as you can and make the documents as self-sufficient as possible.
There are some ways to add a form of relationship between your data. This is a nice blog on it.

difference between a field and the field.keyword

If I add a document with several fields to an Elasticsearch index, then when I view it in Kibana, each field shows up twice. One of them will be called
some_field
and the other one will be called
some_field.keyword
Where does this behaviour come from and what is the difference between both of them?
PS: one of them is aggregatable (not sure what that means) and the other (without keyword) is not.
Update: The short answer is that type: text is analyzed, meaning it is broken up into distinct words when stored, which allows free-text searches on one or more words in the field. The .keyword field takes the same input and keeps it as a single string, meaning it can be aggregated on and you can use wildcard searches on it. "Aggregatable" means you can use it in aggregations in Elasticsearch, which resemble a SQL GROUP BY if you are familiar with that. In Kibana you would probably use the .keyword field with aggregations to count distinct values, etc.
Please take a look at this article about text vs. keyword.
Briefly: since Elasticsearch 5.0, the string type has been replaced by the text and keyword types. Since then, when you do not specify an explicit mapping, for a simple document with a string:
{
  "some_field": "string value"
}
the following dynamic mapping will be created:
{
  "some_field": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
As a consequence, it will both be possible to perform full-text search on some_field, and keyword search and aggregations using the some_field.keyword field.
I hope this answers your question.
Look at this issue. It contains some explanation for your question. Roughly speaking, some_field is analyzed and can be used for full-text search. On the other hand, some_field.keyword is not analyzed and can be used in term queries or in aggregations.
I will try to answer your questions one by one.
Where does this behavior come from?
It was introduced in Elasticsearch 5.0.
What is the difference between the two?
some_field is used for full-text search and some_field.keyword is used for keyword search.
Full-text search is used when we want the individual tokens of a field's value to be searchable. For instance, if you are searching for all hotel names that have "farm" in them, such as "hay farm house", "Windy harbour farm house", etc.
Keyword search is used when we want to match on the whole value of the field and not on individual tokens from the value. For example, suppose you are indexing documents with a city field. Aggregating on the analyzed text field would give separate counts for "new" and "york", whereas aggregating on the keyword field counts "new york" as a single value, which is usually the expected behavior.
From Elasticsearch 5.0 onwards, strings are mapped as both text and keyword by default.
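The difference is easy to see with a toy simulation. The sketch below does not talk to Elasticsearch at all. It approximates the standard analyzer by lowercasing and splitting on whitespace (a simplification, the real analyzer does more), and treats a keyword field as one untouched term. The function names and sample cities are made up.

```python
from collections import Counter


def analyze_as_text(value):
    """Crude stand-in for the standard analyzer: lowercase and split on whitespace."""
    return value.lower().split()


def index_as_keyword(value):
    """A keyword field keeps the whole value as a single term."""
    return [value]


cities = ["New York", "New York", "New Delhi"]

# Term counts as a terms aggregation would see them on each kind of field.
text_terms = Counter(term for city in cities for term in analyze_as_text(city))
keyword_terms = Counter(term for city in cities for term in index_as_keyword(city))

print(text_terms)
print(keyword_terms)
```

On the text side the counts are per token ("new", "york", "delhi"), while on the keyword side each full city name is counted as one value.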

elasticsearch: allow discovery of document, without exposing source?

I'm trying to set up elasticsearch so that it allows users to discover the existence of documents, without having access to the document itself. For example, imagine a site that aggregates academic articles: they allow full-text search over the body, but only present the abstract.
I am trying to set up a system where different user groups have access to different documents, but everyone has access to the entire index.
What is the path of least resistance for me to set up restricted content search on elasticsearch? Is it a setting? A plugin? Write my own plugin? Fork?
To answer the first part of your question:
First way: You can disable returning the _source field for a particular query like this.
{
  "_source": false,
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
Second way: If you never want to see the _source field, you can disable storing it in the mapping.
{
  "tweet": {
    "_source": {
      "enabled": false
    }
  }
}
As for the second part:
I didn't fully understand your requirements, but Shield can be useful if you want simple authentication and role-based access control, for example so that some users can't modify documents.
If you have your own user-facing system, you can achieve this simply by adding an access-permission field to each document and mapping the permissions to users. You can then use filters when searching for documents. This is an option if you don't want to get into the details of Shield.
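As a rough illustration of the permission-field approach, the sketch below builds a search body that matches on the full text, filters on the groups the current user belongs to, and returns only a couple of fields. All field names (body, title, abstract, allowed_groups) and the function name are hypothetical.

```python
def restricted_search(text, user_groups):
    """Full-text search limited to documents the user's groups may see."""
    return {
        # Return only the public fields, never the full document body.
        "_source": ["title", "abstract"],
        "query": {
            "bool": {
                "must": [{"match": {"body": text}}],
                # Filter clauses restrict the result set without affecting scoring.
                "filter": [{"terms": {"allowed_groups": user_groups}}],
            }
        },
    }


print(restricted_search("protein folding", ["staff", "subscribers"]))
```

The filter has to be applied by your own application layer on every request, so this only helps if users cannot query Elasticsearch directly.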

How to retrieve unique count of a field using Kibana + Elastic Search

Is it possible to query for a distinct/unique count of a field using Kibana? I am using Elasticsearch as my backend for Kibana.
If so, what is the syntax of the query? Here's a link to the Kibana interface where I would like to make my query: http://demo.kibana.org/#/dashboard
I am parsing nginx access logs with Logstash and storing the data in Elasticsearch. Then I use Kibana to run queries and visualize my data in charts. Specifically, I want to know the count of unique IP addresses for a specific time frame using Kibana.
For Kibana 4 go to this answer
This is easy to do with a terms panel:
If you want the count of distinct IPs in your logs, set the field to clientip, set the length to a large enough number (otherwise, different IPs will be lumped together in the same group), and set the style to table. After adding the panel, you will have a table with each IP and the count for that IP.
Kibana 4 now allows you to use aggregations. Apart from building a panel like the one explained in this answer for Kibana 3, we can now see the number of unique IPs in different periods, which was (IMO) what the OP wanted in the first place.
To build a dashboard like this, go to Visualize -> select your index -> select a Vertical Bar chart. Then, in the visualize panel:
On the Y axis we want the unique count of IPs (select the field where you stored the IP), and on the X axis we want a date histogram on our time field.
After pressing the Apply button, we should have a graph that shows the unique count of IPs over time. We can change the time interval on the X axis to see the unique IPs hourly, daily, and so on.
Just take into account that the unique counts are approximate. For more information check also this answer.
Be aware that Unique Count uses the 'cardinality' metric, which does not always guarantee an exact unique count. :-)
the cardinality metric is an approximate algorithm. It is based on the
HyperLogLog++ (HLL) algorithm. HLL works by hashing your input and
using the bits from the hash to make probabilistic estimations on the
cardinality.
Depending on the amount of data, I can see differences of 700+ missing entries in a 300k dataset via Unique Count in Elasticsearch, even though those entries really are unique.
Read more here: https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html
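For reference, the Kibana chart described above corresponds roughly to a date histogram with a cardinality sub-aggregation. Here is a sketch of such a request body, assuming Elasticsearch 7.2 or later (older versions use interval instead of calendar_interval). The field names clientip and @timestamp, the aggregation names, and the function name are assumptions. precision_threshold is the cardinality option that trades memory for accuracy.

```python
def unique_ips_over_time(ip_field="clientip", time_field="@timestamp",
                         interval="1h", precision_threshold=3000):
    """Build a search body that estimates unique IPs per time bucket."""
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": time_field,
                    "calendar_interval": interval,
                },
                "aggs": {
                    "unique_ips": {
                        "cardinality": {
                            "field": ip_field,
                            # Higher values give better accuracy at the cost of memory.
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


print(unique_ips_over_time(interval="1d"))
```

Raising precision_threshold can reduce the kind of discrepancy mentioned above, but the result is still an estimate.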
Create "topN" query on "clientip" and then histogram with count on "clientip" and set "topN" query as source. Then you will see count of different ips per time.
Unique counts of field values are achieved by using facets. See ES documentation for the full story, but the gist is that you will create a query and then ask ES to prepare facets on the results for counting values found in fields. It's up to you to customize the fields used and even describe how you want the values returned. The most basic of facet types is just to group by terms, which would be like an IP address above. You can get pretty complex with these, even requiring a query within your facet!
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "ip_addresses": {
      "terms": {
        "field": "ip_address"
      }
    }
  }
}
Using aggregations, you can easily do that.
Here is the query:
GET index/_search
{
  "size": 0,
  "aggs": {
    "source": {
      "terms": {
        "field": "field",
        "size": 100000
      }
    }
  }
}
This returns the distinct values of the field with their document counts.
For Kibana 7.x, Unique Count is available in most visualizations: in Lens, in aggregation-based visualizations, and even in TSVB (which supports normal fields as well as runtime fields; scripted fields are not supported).
