Where does logstash's elasticsearch index date come from?

From the docs:
Default value is "logstash-%{+YYYY.MM.dd}"
I'm wondering where logstash gets the information for YYYY.MM.dd. Is it the @timestamp field? And if so, can it be told to use a different field (@mydate, for example)?

The YYYY.MM.dd in the index name is formatted from the @timestamp field.
You can refer to elasticsearch.rb to see how event.sprintf renders the index name:
index = event.sprintf(@index)
and then you can study event.rb to see what sprintf does:
t = @data["@timestamp"]
formatter = org.joda.time.format.DateTimeFormat.forPattern(key[1 .. -1])\
    .withZone(org.joda.time.DateTimeZone::UTC)
#next org.joda.time.Instant.new(t.tv_sec * 1000 + t.tv_usec / 1000).toDateTime.toString(formatter)
# Invoke a specific Instant constructor to avoid this warning in JRuby
# > ambiguous Java methods found, using org.joda.time.Instant(long)
org.joda.time.Instant.java_class.constructor(Java::long).new_instance(
  t.tv_sec * 1000 + t.tv_usec / 1000
).to_java.toDateTime.toString(formatter)
So, if you want the index to follow your own field, you have to modify event.rb to use your own field instead of @timestamp, or change the @timestamp value to your own field's time, as sketched below.
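For the second option, a minimal sketch of a Logstash date filter that overwrites @timestamp with the value of a custom field (assuming, for illustration, an event field named mydate holding an ISO8601 string):

filter {
  date {
    # parse mydate and write the result into @timestamp,
    # so the elasticsearch output's index pattern follows it
    match  => [ "mydate", "ISO8601" ]
    target => "@timestamp"
  }
}

target defaults to @timestamp anyway; it is spelled out here only to make the intent explicit.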

Related

Rails compare same object's 1 field with another + addition of string in Active Record

I have two string fields which contain dates as strings, like field_1 = "2003.11.14", and I use them in the ORM and they work just fine. Now I want to compare one field's value with another field's value minus 18.months. Here is an example of the kind of thing I mean:
User.where("users.field_1 > '#{Date.today - 18.months}' AND users.field_2 > (users.field_1 - 18.months)")
Can anyone help me? Thanks in advance.
Most databases support date calculations in SQL. Something like this should work (the interval arithmetic here is PostgreSQL syntax):
query = User.where("users.field_1 > ?", 18.months.ago)
query = query.where("users.field_2 > users.field_1 - INTERVAL '18 months'")
Edit: just saw that the values are stored as strings, so you cannot do the comparison directly in SQL.
I can not do that because the table has millions of records.
I don't really understand why the size of the table prevents you from using the correct data type.
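If converting the columns is an option after all, a one-off migration can change them in place even with millions of rows; a minimal sketch, assuming PostgreSQL, the column names field_1/field_2, and that every row holds a parseable "YYYY.MM.DD" string:

class ConvertUserStringDatesToDate < ActiveRecord::Migration
  def up
    # PostgreSQL parses the dotted format in place via to_date
    execute "ALTER TABLE users ALTER COLUMN field_1 TYPE date USING to_date(field_1, 'YYYY.MM.DD')"
    execute "ALTER TABLE users ALTER COLUMN field_2 TYPE date USING to_date(field_2, 'YYYY.MM.DD')"
  end
end

After that, the interval comparison above works as ordinary date arithmetic.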

Elasticsearch 5 - Field distinct values without aggregation

I'm working with a time-based index storing syslog events.
All the data is coming from different sources (PCs).
Suppose I have events like these:
timestamp = 0, source = PC-1, event = event_type_1
timestamp = 1, source = PC-1, event = event_type_1
timestamp = 1, source = PC-2, event = event_type_1
I want to make a query that retrieves all the distinct values of the "source" field for documents matching event = event_type_1.
I expect all exact values (no approximations).
To achieve it I have written a cardinality query together with an aggregation specifying the corresponding size, because I have no prior knowledge of the number of distinct sources. I think this is expensive work, as it consumes a lot of memory.
Is there any other alternative to get this done?
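For reference, a sketch of the exact-values approach described above in query DSL form (the index pattern syslog-* and the keyword mappings for event and source are assumptions; the terms size is just a generous upper bound):

GET /syslog-*/_search
{
  "size": 0,
  "query": {
    "term": { "event": "event_type_1" }
  },
  "aggs": {
    "distinct_sources": {
      "terms": { "field": "source", "size": 10000 }
    }
  }
}

The terms aggregation returns the exact values, but it has to hold every distinct source in memory, which is exactly the cost the question is about.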

NEST (2.x) Date Histogram Aggregation with fractional interval values

I am using the NEST (2.3.3) object initializer syntax for creating a Date Histogram Aggregation. How can I specify fractional values for the interval?
DateHistogramAggregation dateHistogram =
    new DateHistogramAggregation("dateHistogram")
    {
        Field = "TimestampFieldName",
        Interval = DateInterval.Hour
    };
In the above date histogram aggregation I want to specify, for example, 1.5 hours. Is there a way I can do that?
Interval is a Union<DateInterval, Time>, which means it can take either a DateInterval enum value or a Time instance. Additionally, a string has an implicit conversion to an instance of Time. Putting these together, setting an interval of 1.5 hours would be
DateHistogramAggregation dateHistogram =
    new DateHistogramAggregation("dateHistogram")
    {
        Field = "TimestampFieldName",
        Interval = new Time("1.5h")
    };
Here we can't take advantage of the implicit conversion from string to Time (and then Time to Union<DateInterval, Time>) because there is no implicit conversion from string to Union<DateInterval, Time>. So we just use the Time constructor, pass it a string value for 1.5 hours, and assign that Time instance to the interval.
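For comparison, the same interval set via NEST's fluent syntax would presumably look like the following (MyDocument is a placeholder document type):

var response = client.Search<MyDocument>(s => s
    .Aggregations(a => a
        .DateHistogram("dateHistogram", dh => dh
            .Field("TimestampFieldName")
            // same Union<DateInterval, Time> parameter, so an explicit Time is needed
            .Interval(new Time("1.5h"))
        )
    )
);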

Searching without duplication - aggregations and tophit

I am beginning with ElasticSearch and really like it; however, I am stuck with a quite simple scenario.
I am indexing Worker documents with this structure:
NAME SURENAME ID AGE SEX NAME_SURENAME BIRTH_DATE
NAME_SURENAME - not analyzed; this field is indexed for grouping purposes
NAME, SURENAME - analyzed
The task is simple: find 5 unique workers sorted by birth_date ("unique" meaning the same name and surename count as one, even if the people are of different ages and are actually different people).
I read about aggregation queries and, as I understand it, I can only get aggregations back, without documents. Unfortunately I aggregate by name and surename, so I won't have the other fields in the bucket results, not even the document ID field. But I also read that the top hits aggregation returns documents, and I tried it - the second idea below.
I have two ideas:
1) Not use aggregations: just search for 5 workers, filter duplicates in Java, then search again and filter duplicates in Java until I reach 5 unique results.
2) Use aggregations. I even tried it, as below, and it even works on test data, but since it is my first time, please advise whether it works accidentally or is done correctly. Generally I thought I could get 5 buckets, each with one top-hits document. I have no idea how the top-hits document is chosen, but it seems to work. Below is the code:
String searchString = "test";
BoolQueryBuilder query = boolQuery()
    .minimumNumberShouldMatch(1)
    .should(matchQuery("name", searchString))
    .should(matchQuery("surename", searchString));
// Top 5 name_surename buckets, ordered by the max birth_date within each bucket
TermsBuilder terms = AggregationBuilders.terms("namesAgg").size(5);
terms.field("name_surename");
terms.order(Terms.Order.aggregation("birthAgg", false))
    .subAggregation(AggregationBuilders.max("birthAgg").field("birth_date"))
    // one representative document per bucket, youngest birth_date first
    .subAggregation(AggregationBuilders.topHits("topHit").setSize(1).addSort("birth_date", SortOrder.DESC));
SearchRequestBuilder searchRequestBuilder = client.prepareSearch("workers")
    .addAggregation(terms)
    .setQuery(query)
    .setSize(0); // only the aggregation results are used below
Terms aggregations = searchRequestBuilder.execute().actionGet().getAggregations().get("namesAgg");
List<Worker> results = new ArrayList<>();
for (Terms.Bucket bucket : aggregations.getBuckets()) {
    Optional<Aggregation> first = bucket.getAggregations().asList().stream()
        .filter(aggregation -> aggregation instanceof TopHits)
        .findFirst();
    SearchHit searchHitFields = ((TopHits) first.get()).getHits().getHits()[0];
    Transformer<SearchHit, Worker> transformer = transformers.get(Worker.class);
    Worker transform = transformer.transform(searchHitFields);
    results.add(transform);
}
return results;

Retrieving documents in order they were inserted

I am wanting to know how to create an index in RethinkDB that will return rows in the order they were added, to use the table as a kind of log.
You will want to set a datetime field of some sort in your documents and index it, like so:
# Shorthand for the table (each query below still needs .run(conn) on an open connection)
test = r.db("test").table("test")
# Create a secondary index on the datetime field
test.index_create("datetime")
# Insert a document with a datetime field (r.now() is evaluated server-side)
test.insert({"datetime": r.now()})
# To get all documents in sorted order
test.order_by(index="datetime")
# To get documents after a certain point
test.between(<some sort of datetime object>, r.maxval, index="datetime")
https://www.rethinkdb.com/api/python/order_by/
https://rethinkdb.com/api/python/between/
