How to read all data from druid datasource - hadoop

I am using the JSON below to read all data from a Druid datasource, but the threshold field is mandatory in the request, and the query returns only the number of rows specified by the threshold.
{
  "queryType" : "select",
  "dataSource" : "wikiticker",
  "granularity" : "day",
  "intervals" : [ "1000/3000" ],
  "filter" : null,
  "dimensions" : [ ],
  "metrics" : [ ],
  "descending" : "false",
  "pagingSpec" : {
    "threshold" : 10000,
    "pagingIdentifiers" : null
  },
  "aggregations" : [ ]
}
Is there any way to retrieve all the data by setting the threshold to a value that returns everything in the datasource? For example, setting the intervals field to [ "1000/3000" ] fetches data from all intervals; is there an equivalent setting for threshold?

The distributed nature of the system makes it hard to have an exact count of rows per interval of time, therefore the answer is no. Also keep in mind that a select query materializes all the rows in memory, so you should avoid pulling all the data at once and instead page through them with the paging spec.
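For reference, paging works by feeding the pagingIdentifiers map from each response back into the next request. A minimal sketch (the segment identifier below is hypothetical, and the fromNext flag assumes a Druid version that supports it; without fromNext you must increment the returned offsets by one yourself):

{
  "queryType" : "select",
  "dataSource" : "wikiticker",
  "granularity" : "day",
  "intervals" : [ "1000/3000" ],
  "pagingSpec" : {
    "threshold" : 10000,
    "pagingIdentifiers" : { "wikiticker_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_v1" : 9999 },
    "fromNext" : true
  }
}

Repeat, carrying the identifiers forward, until a query returns no more rows.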

Related

Conditional indexing in metricbeat using Ingest node pipeline creates a datastream

I am trying to achieve conditional indexing for namespaces in Elasticsearch using ingest node pipelines. I used the pipeline below, but the index that gets created when I add the pipeline in metricbeat.yml is in the form of a data stream.
PUT _ingest/pipeline/sample-pipeline
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "copy_from": "metricbeat-dev",
        "if": "ctx.kubernetes?.namespace==\"dev\"",
        "ignore_failure": true
      }
    }
  ]
}
The expected index name is metricbeat-dev, but the value I get in _index is .ds-metricbeat-dev.
This works fine when I test with one document, but when I configure it in the yml file I get the index name starting with .ds-. Why is this happening?
Update, here is the template:
{
  "metricbeat" : {
    "order" : 1,
    "index_patterns" : [
      "metricbeat-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "metricbeat",
          "rollover_alias" : "metricbeat-metrics"
        },
If data streams are enabled in the index template, it has the potential to create a data stream. This depends on how you configure the priority: if no priority is mentioned, a legacy index is created, but if the template has a priority higher than 100, a data stream is created (a legacy index has priority 100, so use a priority value above 100 if you want the index in the form of a data stream).
If it creates a data stream and that is not expected, check whether there is a template pointing to the index you are writing to that has data streams enabled. That was the reason in my case.
I have been working with this for a few months, and this is what I have observed.
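A quick way to check which template is matching, assuming you can run requests against the cluster (the patterns here just reuse the names from the question):

GET _index_template/metricbeat*
GET _template/metricbeat*

The first call lists composable templates; a "data_stream" section in a matching one makes writes create a data stream. The second lists legacy templates, which never create data streams. If a composable template with a "data_stream" section matches and wins on priority, documents end up in .ds- backing indices.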

How to store search data in redis db for caching to gain maximum performance improvement

We are implementing an online book store where the user can search books by filter, price, author, sorting, etc. Since we have millions of records in the database (MySQL), the search query becomes slow. To improve performance we are planning to implement a caching mechanism using Redis, but I am struggling to determine how to store the search data in the Redis db.
For example, if the search criteria is as below:
{
  "searchFilter" : [
    {
      "field" : "title",
      "operator" : "LIKE",
      "value" : "JAVA"
    }
  ]
}
Then we fetch data from the MySQL db using the above JSON and store the result in the Redis server.
In the Redis server we use the above JSON as the "key" and all the books as the value, i.e.:
{searchFilter:[{"field":"title","operator" : "LIKE","value" : "JAVA"}]} : booksList
Now say some other user fires a search request as below:
{
  "searchFilter" : [
    {
      "field" : "title",
      "operator" : "LIKE",
      "value" : "JAVA"
    },
    {
      "field" : "price",
      "operator" : "GREATER_THAN",
      "value" : "500"
    },
    {
      "field" : "price",
      "operator" : "GREATER_THAN",
      "value" : "500"
    }
  ]
}
Then again we fetch data from the MySQL db using the above JSON and store the result in the Redis server, so that the next request with the same search criteria gets its data from the Redis cache.
In the Redis server we again use the above JSON as the "key" and all the books as the value, i.e.:
{searchFilter:[{"field":"title","operator" : "LIKE","value" : "JAVA"},{"field":"price","operator" : "GREATER_THAN","value" : "500",},{"field":"price","operator" : "GREATER_THAN","value" : "500"}]} : booksList
So my question is: is using the JSON search criteria as the key in Redis a good idea?
If yes, then storing data for every search request whose criteria differs even slightly will make the caching server consume a lot of memory.
So what is the ideal approach for designing our Redis cache server to gain maximum performance?
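One common pattern is to canonicalize the criteria before using them as a key, so that ordering and whitespace differences map to the same cache entry, and to give every entry a TTL so rarely repeated searches are evicted instead of accumulating. A minimal sketch, assuming the Jedis client and Jackson (class names, the "search:" key prefix, and the 600-second TTL are illustrative choices, not part of the question):

import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import com.fasterxml.jackson.databind.ObjectMapper;
import redis.clients.jedis.Jedis;

public class SearchCache {
    private final Jedis jedis = new Jedis("localhost", 6379);
    private final ObjectMapper mapper = new ObjectMapper();

    // Canonicalize: sort the filters so logically equal criteria sent in a
    // different order produce the same key, then hash so keys stay short.
    String keyFor(List<Map<String, String>> filters) throws Exception {
        List<Map<String, String>> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparing(
                f -> f.get("field") + "|" + f.get("operator") + "|" + f.get("value")));
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(mapper.writeValueAsBytes(sorted));
        return "search:" + Base64.getEncoder().encodeToString(digest);
    }

    Optional<String> lookup(List<Map<String, String>> filters) throws Exception {
        return Optional.ofNullable(jedis.get(keyFor(filters)));
    }

    void store(List<Map<String, String>> filters, String booksListJson) throws Exception {
        // The TTL bounds memory: entries for one-off searches expire instead
        // of piling up, which addresses the memory concern above.
        jedis.setex(keyFor(filters), 600, booksListJson);
    }
}

Hashing keeps the keys short and uniform regardless of how large the criteria JSON grows, while the sort step means the two requests above that differ only in filter order would share one entry.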

Cosmos DB Collection not using _id index when querying by _id?

I have a CosmosDb - MongoDb collection that I'm using purely as a key/value store for arbitrary data where the _id is the key for my collection.
When I run the query below:
globaldb:PRIMARY> db.FieldData.find({_id : new BinData(3, "xIAPpVWVkEaspHxRbLjaRA==")}).explain(true)
I get this result:
{
  "_t" : "ExplainResponse",
  "ok" : 1,
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "data.FieldData",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$and" : [ ]
    },
    "winningPlan" : {
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 106,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 3571,
    "executionStages" : {
    },
    "allPlansExecution" : [ ]
  },
  "serverInfo" : #REMOVED#
}
Notice that totalKeysExamined is 0, totalDocsExamined is 3571, and the query took 106 ms. If I run it without .explain(), it does find the document.
I would have expected this query to be lightning quick given that the _id field is automatically indexed as a unique primary key on the collection. As this collection grows in size, I only expect this problem to get worse.
I'm definitely not understanding something about the index and how it works here. Any help would be most appreciated.
Thanks!
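For reference, a quick way to confirm which indexes exist on the collection (a standard Mongo shell call; the assumption is that it behaves the same against the Cosmos DB Mongo API):

globaldb:PRIMARY> db.FieldData.getIndexes()

A unique index on _id should be listed. If it is and the explain output still shows totalKeysExamined as 0, the server is scanning documents rather than using that index for the lookup.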

How to create a multi-value tag metric gauge?

I already read this, but with no luck.
All the examples I have found only show how to create a single-value tag, like this:
{
  "name" : "jvm.gc.memory.allocated",
  "measurements" : [ {
    "statistic" : "COUNT",
    "value" : 1.98180864E8
  } ],
  "availableTags" : [ {
    "tag" : "stack",
    "values" : [ "prod" ]
  }, {
    "tag" : "region",
    "values" : [ "us-east-1" ]
  } ]
}
But I need to create a multi-value tag like this:
availableTags: [
  {
    tag: "method",
    values: [
      "POST",
      "GET"
    ]
  },
My code so far:
List<Tag> tags = new ArrayList<Tag>();
tags.add( Tag.of("test", "John") );
tags.add( Tag.of("test", "Doo") );
tags.add( Tag.of("test", "Foo Bar") );
Metrics.gauge("my.metric", tags, new AtomicLong(3) );
As you can see, I thought I could just repeat the key, but that is not the case, and the second parameter of Tag.of is a String, not a String array.
I don't think providing a multi-value tag for a metric was the real intent of the authors of these metering libraries.
The whole point of metric tags is to provide a "discriminator": something that can be used later to retrieve metrics whose tag has a specific, single value.
Usually this value is used in metrics storage systems like Prometheus, DataDog, InfluxDB and so on, and on top of this Grafana can incorporate a single tag value in its queries.
The only possible use case for such a request that I see is making the metric values visible in the actuator in a somewhat more convenient way, but again, that is not the main point of the capability here, so, bottom line, I doubt that it's possible at all.
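What does produce the multi-value availableTags output shown in the question is registering several meters under the same name with different values for the same tag key; the actuator endpoint then lists the union of those values. A small sketch (the metric name, tag values, and the plain SimpleMeterRegistry are illustrative):

import java.util.concurrent.atomic.AtomicLong;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class MultiValueTagExample {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        // Two meters, same name, same tag key, different tag values.
        // Keep strong references: gauges hold their state object weakly.
        AtomicLong gets  = registry.gauge("http.server.calls", Tags.of("method", "GET"),  new AtomicLong(5));
        AtomicLong posts = registry.gauge("http.server.calls", Tags.of("method", "POST"), new AtomicLong(2));
        // /actuator/metrics/http.server.calls would then report
        // availableTags: method -> [GET, POST].
    }
}

So rather than one gauge with a repeated key, each (name, tags) combination is its own meter, and the endpoint aggregates them.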

Not getting incremental data with jdbc importer from sql to elastic search

As per the jdbc importer documentation:
It is recommended to use timestamps in UTC for synchronization. This example fetches all product rows that have been added since the last run, using a millisecond-resolution column mytimestamp:
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/test",
    "user" : "",
    "password" : "",
    "sql" : [
      {
        "statement" : "select * from \"products\" where \"mytimestamp\" > ?",
        "parameter" : [ "$metrics.lastexecutionstart" ]
      }
    ],
    "index" : "my_jdbc_index",
    "type" : "my_jdbc_type"
  }
}
I want to load data incrementally based on a column modifiedDate whose format is 2015-08-20 14:52:09, and I use a scheduler that runs every minute. I tried the following value for the sql key:
"statement" : "select * from \"products\" where \"modifiedDate\" > ?",
But the data was not loaded.
Am I missing something?
The format of lastexecutionstart is like "2016-03-27T06:37:09.165Z": it contains a 'T' and a 'Z', while your modifiedDate column does not, so that is why your data was not loaded.
If you want to know more, here is the link:
https://github.com/jprante/elasticsearch-jdbc
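One possible workaround is to normalize the parameter inside the SQL itself. A sketch for MySQL, under the assumption that the importer binds the parameter as the ISO-8601 string shown above (verify the time zone of modifiedDate too, since lastexecutionstart is in UTC):

"statement" : "select * from \"products\" where \"modifiedDate\" > STR_TO_DATE(REPLACE(REPLACE(?, 'T', ' '), 'Z', ''), '%Y-%m-%d %H:%i:%s.%f')"

The two REPLACE calls strip the 'T' and 'Z', and STR_TO_DATE parses the remainder, including the fractional seconds, into a comparable DATETIME.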
