GET query based on timestamp - elasticsearch

I’m looking for help building a query that retrieves the number of documents added within a given time frame, for example the last 30 minutes.
The documents are syslogs like:
{
"_index": "logstash-2017.01.16",
"_type": "syslog",
"_id": "AVmnIUFGd2leAWt2KJSr",
"_score": 1,
"_source": {
"@timestamp": "2017-01-16T11:54:48.318Z",
"syslog_severity_code": 5,
"syslog_facility": "user-level",
"@version": "1",
"host": "10.0.0.1",
"syslog_facility_code": 1,
"message": "Test Syslog Message",
"type": "syslog",
"syslog_severity": "notice",
"tags": [
"_grokparsefailure"
]
}
}
My idea is to build this query into another script that will check for new items being added to ES.

Use Range Query:
GET index/type/_count
{
"query": {
"range": {
"@timestamp": {
"gte": "now-30m",
"lte": "now"
}
}
}
}
This will give output like :
{
"count": 2,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
}
}
where count is the number of documents matched.
See the Elasticsearch Range Query documentation for details.
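Since the goal is to call this from another script, here is a minimal Python sketch of how that could look. It assumes the timestamp field is named @timestamp (as in standard Logstash output); the function names, the base URL and the index pattern are illustrative and not part of the original answer.

```python
import json
import urllib.request


def build_count_body(minutes, field="@timestamp"):
    """Build a _count request body matching documents from the last `minutes` minutes."""
    return {
        "query": {
            "range": {
                field: {"gte": f"now-{minutes}m", "lte": "now"}
            }
        }
    }


def count_recent(base_url, index, minutes):
    """POST the body to <base_url>/<index>/_count and return the "count" value."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_count",
        data=json.dumps(build_count_body(minutes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["count"]
```

A call such as count_recent("http://localhost:9200", "logstash-*", 30) would then return the count for the last 30 minutes, assuming a node is reachable at that address. A result greater than zero means new items were added.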

Related

Elasticsearch query with fuzziness AUTO not working as expected

From the Elasticsearch documentation regarding fuzziness:
AUTO
Generates an edit distance based on the length of the term. Low and high distance arguments may be optionally provided AUTO:[low],[high]. If not specified, the default values are 3 and 6, equivalent to AUTO:3,6 that make for lengths:
0..2: Must match exactly
3..5: One edit allowed
>5: Two edits allowed
However, when I am trying to specify low and high distance arguments in the search query the result is not what I am expecting.
I am using Elasticsearch 6.6.0 with the following index mapping:
{
"fuzzy_test": {
"mappings": {
"_doc": {
"properties": {
"description": {
"type": "text"
},
"id": {
"type": "keyword"
}
}
}
}
}
}
Inserting a simple document:
{
"id": "1",
"description": "hello world"
}
And the following search query:
{
"size": 10,
"timeout": "30s",
"query": {
"match": {
"description": {
"query": "helqo",
"fuzziness": "AUTO:7,10"
}
}
}
}
I assumed that "fuzziness": "AUTO:7,10" would mean that for an input term of length <= 6, only documents with an exact match would be returned. However, here is the result of my query:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.23014566,
"hits": [
{
"_index": "fuzzy_test",
"_type": "_doc",
"_id": "OQtUu2oBABnEwrgM3Ejr",
"_score": 0.23014566,
"_source": {
"id": "1",
"description": "hello world"
}
}
]
}
}
This is strange, but it seems the bug exists only in Elasticsearch 6.6.0. I tried 6.4.2 and 6.6.2, and both work as expected.
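For reference, the rule quoted from the documentation can be written as a small Python function. This is only a sketch of the documented behaviour, not Elasticsearch's actual implementation, and the function name is made up for illustration.

```python
def auto_fuzziness_edits(term_length, low=3, high=6):
    """Return the edit distance AUTO:[low],[high] allows for a term of this length.

    Terms shorter than `low` must match exactly, terms shorter than `high`
    get one edit, and everything else gets two edits.
    """
    if term_length < low:
        return 0
    if term_length < high:
        return 1
    return 2
```

By this rule, auto_fuzziness_edits(len("helqo"), 7, 10) is 0, so "helqo" should not match "hello" under AUTO:7,10. That agrees with the behaviour seen on 6.4.2 and 6.6.2.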

Understand Elasticsearch Multivalue Fields

I am trying to understand position_increment_gap as explained in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/_multivalue_fields_2.html
I created the same index as in the example and inserted a single document
PUT /my_index/groups/1
{
"names": [ "John Abraham", "Lincoln Smith", "Justin Trudeau"]
}
Then I try a phrase query for Abraham Lincoln and it matches, as expected
GET /my_index/groups/_search
{
"query": {
"match_phrase": {
"names": "Abraham Lincoln"
}
}
}
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "names",
"_type": "doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"names": [
"john abraham",
"lincoln smith",
"justin trudeau"
]
}
}
]
}
}
The documentation explains that the match occurs because ES produces the tokens john abraham lincoln smith justin trudeau and it recommends inserting a position_increment_gap of 100 to avoid matching abraham lincoln unless I have a slop of 100.
I changed the index to have a position_increment_gap of 1 as shown below:
PUT names
{
"mappings": {
"doc": {
"properties": {
"names": {
"type":"text",
"position_increment_gap": 1
}
}
}
}
}
If I'm understanding the documentation, using a gap of 1 should allow me to match "abraham smith". But it doesn't match. Nor does "abraham lincoln", "abraham justin", or "abraham trudeau". "lincoln smith", "john abraham" and "justin trudeau" all continue to match.
I must be misunderstanding the documentation.
Thanks for any suggestions.
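One way to reason about this is to work out the token positions by hand. The Python sketch below assumes one position per whitespace-separated token and that the gap is added between consecutive values of the array, which is how the documentation describes it. The function names are illustrative, and this models the positions only, not the actual Lucene phrase matching.

```python
def token_positions(values, gap):
    """Assign a position to each token of a multi-value text field.

    Tokens within one value get consecutive positions; `gap` extra
    positions are skipped between one value and the next.
    """
    positions = {}
    position = -1
    for index, value in enumerate(values):
        if index > 0:
            position += gap
        for token in value.lower().split():
            position += 1
            positions[token] = position
    return positions


def slop_needed(positions, first, second):
    """Slop an in-order two-term phrase needs: distance between the terms minus one."""
    return positions[second] - positions[first] - 1
```

Under these assumptions, a gap of 1 puts abraham at position 1 and lincoln at position 3, so "abraham lincoln" needs a slop of 1 and "abraham smith" a slop of 2, while "lincoln smith" still matches with no slop. In other words, the gap pushes values further apart; it does not let a phrase skip over tokens. With a gap of 100 the same calculation gives a slop of 100 for "abraham lincoln", which matches the documentation.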

Elasticsearch result with group by filters

I'm implementing Elasticsearch on my system, but I have a question:
I am creating a job portal and currently use the request below to list all available positions:
GET /companies/job/_search
{
"sort" : [
{ "post_date" : {"order" : "asc"}},
"_score"
]
}
With that, I get all available positions (2357) as the result, for example:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2537,
"max_score": 1.9790175,
"hits": [
{
"_index": "companies",
"_type": "job",
"_id": "2",
"_score": 1.9790175,
"_source": {
"name": "HTML Developer (1 - 2 Yrs Exp.)",
"category": "Graphic Designer",
"location": "Nolda",
"skills": "Javascript"
}
},
{
"_index": "companies",
"_type": "job",
"_id": "114",
"_score": 0.30432263,
"_source": {
"name": "PHP Developer (2 Yrs Exp.)",
"category": "Engineering Job",
"location": "Pune",
"skills": "PHP"
}
}
]
}
}
But I wanted to display filters in a sidebar based on this list that was returned. Similar to the attached prototype.
Example:
2357 vacancies were returned.
The sidebar next to the list would show that, of the total vacancies grouped by category, 214 are for graphic designers, 514 are for engineering, etc.
Grouping by Location, we have: 1254 for Nolda, 221 for Pune, etc ...
I would like to know whether the same request that returns all available jobs can also return the groupings,
or whether I have to make two requests: one for the jobs and another for the groupings (with the counters for each item in a grouping).
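Try this:
GET companies/job/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "category.keyword",
"size": 15
}
}
}
}
where field is the field you want to group by (category, location, skills), and the size inside terms is the maximum number of groups (buckets) to return.
The top-level "size": 0 suppresses the job hits themselves; set it to your page size to get the jobs and the groupings in the same request.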
Hope it will help :)
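To build the sidebar, a single request can carry one terms aggregation per filter, and the counts can then be read from the aggregations section of the response. Here is a hedged Python sketch: the helper names are made up, and the .keyword sub-fields are assumed to exist in the mapping, as the answer assumes for category.

```python
def build_faceted_search(page_size, facet_fields, bucket_count=15):
    """Build one search body returning a page of jobs plus one terms aggregation per facet.

    `facet_fields` maps a facet name to the (assumed) keyword field to group by.
    """
    return {
        "size": page_size,
        "sort": [{"post_date": {"order": "asc"}}, "_score"],
        "aggs": {
            f"group_by_{name}": {"terms": {"field": field, "size": bucket_count}}
            for name, field in facet_fields.items()
        },
    }


def facet_counts(response):
    """Turn the "aggregations" part of a search response into {facet: {value: count}}."""
    return {
        agg_name: {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
        for agg_name, agg in response.get("aggregations", {}).items()
    }
```

For example, build_faceted_search(10, {"category": "category.keyword", "location": "location.keyword"}) produces a body that returns ten jobs together with both groupings, so a second request should not be needed.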

How to filter out elements from an array that doesn’t match the query?

Stack Overflow won't let me post that much example code, so I put it on gist.
So I have this index
with this mapping
here is a sample document I insert into newly created mapping
this is my query
GET products/paramSuggestions/_search
{
"size": 10,
"query": {
"filtered": {
"query": {
"match": {
"paramName": {
"query": "col",
"operator": "and"
}
}
}
}
}
}
This is the unwanted result I get from the previous query:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.33217794,
"hits": [
{
"_index": "products",
"_type": "paramSuggestions",
"_id": "1",
"_score": 0.33217794,
"_source": {
"productName": "iphone 6",
"params": [
{
"paramName": "color",
"value": "white"
},
{
"paramName": "capacity",
"value": "32GB"
}
]
}
}
]
}
}
And finally the wanted result, that is, how I want the query result to look:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.33217794,
"hits": [
{
"_index": "products",
"_type": "paramSuggestions",
"_id": "1",
"_score": 0.33217794,
"_source": {
"productName": "iphone 6",
"params": [
{
"paramName": "color",
"value": "white"
}
]
}
}
]
}
}
What should the query look like to return only the array items that match the query? In other words, non-matching array items should not appear in the final result.
The final result is the _source document that you indexed. There is no feature that lets you mask field elements of your document out of the Elasticsearch response.
That said, depending on your goal, you can look into how Highlighters and Suggesters identify result terms matching the query, or possibly, roll-your-own client-side masking using info returned from setting "explain": true in your query.
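A roll-your-own client-side mask could look roughly like the Python sketch below. It filters the params array of each hit after the response arrives. The case-insensitive prefix test is only an assumption about how "col" matches "color" (the real analyzer lives in the mapping on the gist), and the function name is made up.

```python
import copy


def mask_params(hit, text, array_field="params", key="paramName"):
    """Return a copy of `hit` whose array field keeps only items matching `text`.

    An item is kept when its `key` value starts with `text`, ignoring case.
    The original hit is left untouched.
    """
    masked = copy.deepcopy(hit)
    source = masked["_source"]
    needle = text.lower()
    source[array_field] = [
        item
        for item in source.get(array_field, [])
        if str(item.get(key, "")).lower().startswith(needle)
    ]
    return masked
```

Applied to every entry of response["hits"]["hits"] with the text "col", this keeps the color item and drops capacity, which is the shape of the wanted result.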

Get specific fields from index in elasticsearch

I have an index in elastic-search.
Sample structure :
{
"Article": "Article7645674712",
"Genre": "Genre92231455",
"relationDesc": [
"Article",
"Genre"
],
"org": "user",
"dateCreated": {
"date": "08/05/2015",
"time": "16:22 IST"
},
"dateModified": "08/05/2015"
}
From this index I want to retrieve selected fields: org and dateModified.
I want a result like this:
{
"took": 265,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 28,
"max_score": 1,
"hits": [
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "3",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "08/05/2015"
}
}
},
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "4",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "10/05/2015"
}
}
}
]
}
}
How do I query Elasticsearch to get only specific fields?
You can retrieve only a specific set of fields in the result hits using the _source parameter like this:
curl -XGET 'localhost:9200/couchrecords/couchbaseDocument/_search?_source=org,dateModified'
Or in this format:
curl -XPOST localhost:9200/couchrecords/couchbaseDocument/_search -d '{
"_source": ["doc.org", "doc.dateModified"],
"query": {
"match_all": {}
}
}'
Here the _source line is the only addition you need; match_all stands in for whatever query you already have.
That's easy. Considering any query of this format :
{
"query": {
...
}
}
You just need to add the fields parameter to your query, which in your case results in the following:
{
"query": {
...
},
"fields" : ["org","dateModified"]
}
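On Elasticsearch 5.0 and later, the fields parameter no longer works this way; use source filtering with _source instead:
{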
"_source" : ["org","dateModified"],
"query": {
...
}
}
See the Elasticsearch source filtering documentation.
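On the client side, the filtered hits can then be reduced to just the requested values. The Python sketch below assumes the fields sit under doc, as in the wanted result above; both helper names are illustrative.

```python
def get_path(source, dotted_path):
    """Follow a dotted path such as "doc.org" through nested dicts; None if missing."""
    value = source
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def selected_fields(response, paths):
    """Return one {path: value} dict per hit in a search response."""
    return [
        {path: get_path(hit["_source"], path) for path in paths}
        for hit in response["hits"]["hits"]
    ]
```

Calling selected_fields(response, ["doc.org", "doc.dateModified"]) gives one small dict per hit, with None for any path a document does not contain.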
