Search all content from a specific _source field in Elasticsearch - elasticsearch

I want to know if it is possible to search all content from a specific `_source` field in Elasticsearch.
For example, I have this:
{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_source" : {
          "user" : "kimchy",
          "postDate" : "2009-11-15T14:12:12",
          "message" : "trying out Elastic Search"
        }
      }
    ]
  }
}
I want to query all users from `_source` without specifying a name.
For example, something similar in SQL would be:
SELECT user FROM twitter
which would return all users.
Thanks, and sorry for my bad English.
Edit:
I want to search only within `_source`.
To give an example: I have a source that stores random words; sometimes it stores something, sometimes not. I want to search this source only when it has new words.
The plan is to check whether my specific source has had anything new in the last 10 minutes; if not, I don't care.

You can just:
$ curl -XGET 'http://localhost:9200/twitter/_search'
By default that returns 10 documents. You can specify a size:
$ curl -XGET 'http://localhost:9200/twitter/_search?size=BIGNUM'
Or you can use scroll: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
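If the goal is just the user values rather than whole documents, source filtering keeps the response small. A sketch (the field name `user` is taken from the example above) of a body you could POST to /twitter/_search:

```json
{
  "size": 100,
  "_source": ["user"],
  "query": { "match_all": {} }
}
```

Each hit's `_source` then contains only `user`; for large result sets the same size/scroll advice above still applies.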

Related

How to Curl Kibana Dashboard with GET Response to Find All Dashboard IDs and/or Dashboard Content

So I am trying to automate the scraping of our internal Kibana dashboards from within our environments for information-gathering purposes. I have looked through the following link, but Elasticsearch doesn't seem to provide good examples of what I am trying to accomplish here. I have several constraints: 1. the commands must be in Bash; 2. I cannot use an interpreter such as Python with the Requests and/or BeautifulSoup modules to grab the information and parse it.
Here is my Dilemma:
I log in to the Kibana Dashboard:
Something like: http://<IP_ADDRESS>:5601/app/kibana#/dashboards?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-1h,mode:quick,to:now))
It will look like this if I am properly tunneled into the environment.
There are three dashboards that I want to collect:
API RESPONSES
Logs
Notifications
The example curl commands I am using to scrape the dashboards are as follows:
curl -s http://<IP_ADDRESS>:5601/app/kibana#/dashboard/API\ RESPONSES
curl -s http://<IP_ADDRESS>:5601/app/kibana#/dashboard/logs
curl -s http://<IP_ADDRESS>:5601/app/kibana#/dashboard/notifications
Now the Elasticsearch documentation mentions something about a dashboard ID, which I cannot see unless I open a webpage and use the inspect tool on the particular element I am sending the GET request to. I am trying to find it by curling the main dashboard page:
curl -s http://<IP_ADDRESS>:5601/app/kibana#/dashboard/_search?pretty
My output returns HTML, but it doesn't seem to change, and I cannot properly acquire the dashboards without knowing the dashboard ID. Furthermore, I want to see which dashboards are available and scrape all of them, depending on what a person has set up within the environment, so it's important that this process is dynamic. My ultimate goals are to:
Get the available dashboard IDs
Scrape the dashboards by ID
Basically, I want to curl this output to get the returned JSON.
Any thoughts would be greatly appreciated.
So apparently I was curling the wrong location.
I needed to curl the VIP on port 9200 and the .kibana index to pull in the available dashboards.
rbarrett#cfg01:~$ curl -s http://<IP_ADDRESS>:9200/.kibana/dashboard/_search?pretty
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".kibana",
        "_type" : "dashboard",
        "_id" : "logs",
        "_score" : 1.0,
        "_source" : {
          "description" : "",
          "hits" : 0,
          "kibanaSavedObjectMeta" : {
            "searchSourceJSON" : "{\"filter\":[{\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"query\":\"*\"}}}]}"
          },
          "optionsJSON" : "{\"darkTheme\":true}",
          "panelsJSON" : "[{\"col\":1,\"columns\":[\"Hostname\",\"Logger\",\"programname\",\"severity_label\",\"Payload\",\"environment_label\"],\"id\":\"search-logs\",\"panelIndex\":5,\"row\":13,\"size_x\":12,\"size_y\":12,\"sort\":[\"Timestamp\",\"desc\"],\"type\":\"search\"},{\"col\":1,\"id\":\"NUMBER-OF-LOG-MESSAGES-PER-SEVERITY\",\"panelIndex\":7,\"row\":9,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":7,\"id\":\"TOP-10-PROGRAMS\",\"panelIndex\":9,\"row\":5,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":1,\"id\":\"LOG-MESSAGES-OVER-TIME-PER-SOURCE\",\"panelIndex\":10,\"row\":1,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":7,\"id\":\"TOP-10-HOSTS\",\"panelIndex\":11,\"row\":9,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":1,\"id\":\"TOP-10-SOURCES\",\"panelIndex\":14,\"row\":5,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":7,\"id\":\"LOG-MESSAGES-OVER-TIME-PER-SEVERITY\",\"panelIndex\":16,\"row\":1,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"}]",
          "timeFrom" : "now-1h",
          "timeRestore" : true,
          "timeTo" : "now",
          "title" : "Logs",
          "uiStateJSON" : "{\"P-10\":{\"vis\":{\"legendOpen\":true}},\"P-11\":{\"vis\":{\"colors\":{\"Count\":\"#629E51\"},\"legendOpen\":true}},\"P-12\":{\"spy\":{\"mode\":{\"fill\":false,\"name\":null}},\"vis\":{\"colors\":{\"Count\":\"#2F575E\"},\"legendOpen\":false}},\"P-14\":{\"vis\":{\"legendOpen\":true}},\"P-7\":{\"vis\":{\"legendOpen\":false}},\"P-9\":{\"vis\":{\"colors\":{\"Count\":\"#99440A\"},\"legendOpen\":true}}}",
          "version" : 1
        }
      },
      {
        "_index" : ".kibana",
        "_type" : "dashboard",
        "_id" : "notifications",
        "_score" : 1.0,
        "_source" : {
          "description" : "",
          "hits" : 0,
          "kibanaSavedObjectMeta" : {
            "searchSourceJSON" : "{\"filter\":[{\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"query\":\"*\"}}}]}"
          },
          "optionsJSON" : "{\"darkTheme\":true}",
          "panelsJSON" : "[{\"col\":1,\"columns\":[\"Logger\",\"publisher\",\"severity_label\",\"event_type\",\"old_state\",\"old_task_state\",\"state\",\"new_task_state\",\"environment_label\",\"display_name\"],\"id\":\"search-notifications\",\"panelIndex\":1,\"row\":14,\"size_x\":12,\"size_y\":11,\"sort\":[\"Timestamp\",\"desc\"],\"type\":\"search\"},{\"col\":1,\"id\":\"NOTIFICATIONS-OVER-TIME-PER-SOURCE\",\"panelIndex\":2,\"row\":1,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":7,\"id\":\"NOTIFICATIONS-OVER-TIME-PER-SEVERITY\",\"panelIndex\":3,\"row\":1,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":7,\"id\":\"EVENT-TYPE-BREAKDOWN\",\"panelIndex\":4,\"row\":5,\"size_x\":6,\"size_y\":5,\"type\":\"visualization\"},{\"col\":1,\"id\":\"SOURCE-BREAKDOWN\",\"panelIndex\":5,\"row\":5,\"size_x\":6,\"size_y\":5,\"type\":\"visualization\"},{\"col\":1,\"id\":\"HOST-BREAKDOWN\",\"panelIndex\":6,\"row\":10,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"},{\"col\":7,\"id\":\"NOTIFICATIONS-PER-SEVERITY\",\"panelIndex\":7,\"row\":10,\"size_x\":6,\"size_y\":4,\"type\":\"visualization\"}]",
          "timeFrom" : "now-1h",
          "timeRestore" : true,
          "timeTo" : "now",
          "title" : "Notifications",
          "uiStateJSON" : "{\"P-4\":{\"vis\":{\"legendOpen\":true}},\"P-7\":{\"vis\":{\"legendOpen\":false}}}",
          "version" : 1
        }
      }
    ]
  }
}
After which I was able to pull out the existing IDs with jq:
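The jq command itself isn't shown above; a sketch of how it might look, assuming the response shape shown (the `<IP_ADDRESS>` placeholder is the same VIP used in the curl above):

```shell
# Extract each hit's _id from the .kibana dashboard search response.
curl -s "http://<IP_ADDRESS>:9200/.kibana/dashboard/_search" \
  | jq -r '.hits.hits[]._id'
```

With the response shown above, this prints one dashboard ID per line (`logs`, `notifications`), which can then be fed back into per-dashboard curl calls.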

Using field instead of "_id" for more-like-this query

I have a slug field that I want to use to identify the object to use as a reference, instead of the "_id" field. But instead of being used as a reference, the doc seems to be used as a query to compare against. Since slug is a unique field with a simple analyzer, it just returns exactly one result, like the following. As far as I know, there is no way to use a custom field as the _id field:
https://github.com/elastic/elasticsearch/issues/6730
So is a double lookup (finding Elasticsearch's _id first, then doing more_like_this) the only way to achieve what I am looking for? Someone seems to have asked a similar question three years ago, but it has no answer.
ArticleDocument.search().query(
    "bool",
    should=Q(
        "more_like_this",
        fields=["slug", "text"],
        like={
            "doc": {"slug": "OEXxySDEPWaUfgTT54QvBg"},
            "_index": "article",
            "_type": "doc",
        },
        min_doc_freq=1,
        min_term_freq=1,
    ),
).to_queryset()
Returns:
<ArticleQuerySet [<Article: OEXxySDEPWaUfgTT54QvBg)>]>
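The double lookup described in the question can be sketched as two request bodies: first resolve the slug to a document `_id` with an exact term query, then run `more_like_this` referencing that `_id`. A minimal sketch (index and field names are taken from the question; these are plain query bodies, not a specific client's API):

```python
def slug_to_id_query(slug):
    """Step 1: resolve the slug to a document _id with an exact term query."""
    return {"size": 1, "_source": False, "query": {"term": {"slug": slug}}}


def more_like_this_query(doc_id):
    """Step 2: reference the resolved document by _id, so its content drives
    the similarity and the document itself is excluded from the results."""
    return {
        "query": {
            "more_like_this": {
                "fields": ["text"],
                "like": [{"_index": "article", "_id": doc_id}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }
```

The cost is one extra round trip, but referencing by `_id` in `like` makes Elasticsearch treat the document as the thing to be similar to, rather than comparing against the literal slug text.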
You can make one of your document's fields the "default" _id while ingesting data.
Logstash
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_name"
    document_id => "%{some_field_id}"
  }
}
Spark (Scala)
DF.saveToEs("index_name" + "/some_type", Map("es.mapping.id" -> "some_field_id"))
Index API
PUT twitter/_doc/1
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}
{
  "_shards" : {
    "total" : 2,
    "failed" : 0,
    "successful" : 2
  },
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "result" : "created"
}

Elasticsearch catch-all field slowness after upgrade

We upgraded a 2.4 cluster to a 6.2 cluster using the reindex-from-remote approach. In 2.4, we were using the catch-all _all field to perform searches and were seeing response times under 500 ms for all our queries.
In 6.2, the _all field is no longer available for new indices, so we ended up creating a new text-type field called all, as in "all": {"type": "text"}, and setting copy_to on all our other fields (about 2000 of them). But now, searches on this new catch-all field all are taking 2 to 10 times longer than searches on the 2.4 _all field. (We flushed the caches on both clusters before performing the queries.)
Both clusters are single data center, single node 8GB memory on the same AWS zone, hosted through elastic cloud. Both indices have the same number of documents (about 6M) and have about 150 Lucene segment files.
Any clues as to why?
UPDATE: Both indices return documents without the catch-all field i.e. they do not store the catch-all field.
Here is an example query and response:
$ curl --user "$user:$password" \
> -H 'Content-Type: application/json' \
> -XGET "$es/$index/$mapping/_search?pretty" -d'
> {
> "size": 1,
> "query" : {
> "match" : { "all": "sherlock" }
> }
> }
> '
{
  "took" : 42,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 28133,
    "max_score" : 2.290815,
    "hits" : [ {
      "_index" : "sherlock",
      "_type" : "doc",
      "_id" : "513763",
      "_score" : 2.290815,
      "_source" : {
        "docid" : 513763,
        "age" : 115,
        "essay" : "Has Mr. Sherlock Holmes?",
        "name" : {
          "last" : "Pezzetti",
          "first" : "Lilli"
        },
        "ssn" : 834632279
      }
    } ]
  }
}
UPDATE 2: Another point I forgot to mention is that the 2.4 cluster is currently being used by a staging app, which sends a few queries to it every few minutes. Could this bring other factors like OS caching into play?
Did you store the _all field and return it in your original setup? Do you return it now? If you didn't before and you do now, then that's response overhead you are seeing, not search overhead. Basically, you should omit that field from your response (from your _source) if you don't need it (and any other field, for that matter).
Check _source filtering for more.
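A sketch of that suggestion, assuming the catch-all field is named all as in the question: exclude it (and anything else you don't need) from the returned source while still querying against it:

```json
{
  "size": 1,
  "_source": { "excludes": ["all"] },
  "query": { "match": { "all": "sherlock" } }
}
```

Note that fields populated only via copy_to are not part of `_source` unless explicitly stored, so if timings don't change, the slowdown is likely elsewhere (mapping, segment count, or cache warm-up).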

Not able to discover term in elasticsearch through kibana but curl request is working fine

I had some data already residing in Elasticsearch. I added a term {"oncall" : "true"} to some documents, based on some conditions, via Python POST requests. When I go to Kibana and try to search for it on the Discover page, I get no results. But when I make the following curl request, I do get results.
curl -XPOST "http://localhost:9200/logstash*/logs/_search?pretty" -d '
{
  "query" : {
    "term" : {"oncall" : "true"}
  }
}'
results
"hits" : {
"total" : 47,
"max_score" : 12.706658,
"hits" : [ {
"_index" : "logstash-2015.10.20",
"_type" : "logs",
.......
.......
I want to ask why I am not able to see the results in Kibana, and what setting I need to change.
The query I am writing in the Discover page text box is:
oncall:true #this is giving me no results

Elasticsearch index last update time

Is there a way to retrieve from ElasticSearch information on when a specific index was last updated?
My goal is to be able to tell when it was the last time that any documents were inserted/updated/deleted in the index. If this is not possible, is there something I can add in my index modification requests that will provide this information later on?
You can get the modification time from the _timestamp field.
To make it easier to return the timestamp, you can set up Elasticsearch to store it:
curl -XPUT "http://localhost:9200/myindex/mytype/_mapping" -d'
{
  "mytype": {
    "_timestamp": {
      "enabled": "true",
      "store": "yes"
    }
  }
}'
If I insert a document and then query it, I get the timestamp:
curl -XGET 'http://localhost:9200/myindex/mytype/_search?pretty' -d '{
>   "fields" : ["_timestamp"],
>   "query": {
>     "query_string": { "query": "*" }
>   }
> }'
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "1",
      "_score" : 1.0,
      "fields" : {
        "_timestamp" : 1417599223918
      }
    } ]
  }
}
updating the existing document:
curl -XPOST "http://localhost:9200/myindex/mytype/1/_update" -d'
{
  "doc" : {
    "field1": "data",
    "field2": "more data"
  },
  "doc_as_upsert" : true
}'
Re-running the previous query shows me an updated timestamp:
"fields" : {
"_timestamp" : 1417599620167
}
I don't know whether anyone is still looking for an equivalent, but here is a workaround using shard stats, for Elasticsearch 5+ users:
curl -XGET http://localhost:9200/_stats?level=shards
As you'll see, you get some information per index, including commits and/or flushes, which you can use to see whether the index has changed (or not).
I hope it helps someone.
I just looked into a solution for this problem. Recent Elasticsearch versions have an <index>/_recovery API.
It returns a list of shards with a field called stop_time_in_millis, which looks like a timestamp for the last write to that shard.
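A sketch of turning that _recovery response into a single last-write estimate, assuming the stop_time_in_millis field described above (the helper name is hypothetical):

```python
import json


def last_write_millis(recovery_response):
    """Return the max stop_time_in_millis across every shard of every
    index in an <index>/_recovery response (given as JSON text)."""
    data = json.loads(recovery_response)
    times = [
        shard.get("stop_time_in_millis", 0)
        for index_info in data.values()
        for shard in index_info.get("shards", [])
    ]
    return max(times) if times else None
```

The result is only an approximation of "last updated": recovery stop times reflect shard-level activity such as relocations, not just document writes, so treat it as an upper bound heuristic.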