logstash output to kafka - set partition key - logstash-configuration

I am new to Logstash. I am trying to read JSON data from files and send it to Kafka. The JSON I am reading contains keys for the topic, the partition, and the actual message.
I can't find how to set the partition key.
input {
  file {
    path => "/data/files/*.*"
    start_position => "beginning"
    codec => "json"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "%{topic}"
    message_key => "%{dataAsString}"
  }
}
Help please...
Regards, ido

AFAIK, you can't set the partition number from Logstash. All you have is message_key, which the Logstash Kafka producer uses to select the partition; see Kafka's DefaultPartitioner.scala below.
package kafka.producer

private[kafka] class DefaultPartitioner[T] extends Partitioner[T] {
  private val random = new java.util.Random

  def partition(key: T, numPartitions: Int): Int = {
    if (key == null)
      random.nextInt(numPartitions)
    else
      key.hashCode % numPartitions
  }
}
As you can see, if you don't give a key, a random partition is selected. If you specify a key, the key's hashCode is taken modulo the number of partitions to pick among the available partitions (for example, with 4 partitions, a key whose hashCode is 17 goes to partition 17 % 4 = 1).
To achieve what you asked for, you would have to write a partitioner class along those lines, change the Logstash plugin so that it lets you specify that class, and select the partition number inside it.
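As an illustration only, such a partitioner might look like the hedged sketch below. The class name is hypothetical, and it assumes the message key carries the desired partition number as text, mirroring the kafka.producer.Partitioner[T] interface shown above:

package com.example.kafka // hypothetical package

import kafka.producer.Partitioner

// Interprets the message key as the target partition number.
// Falls back to partition 0 when the key is missing or not numeric.
class KeyAsPartitionNumberPartitioner[T] extends Partitioner[T] {
  def partition(key: T, numPartitions: Int): Int = {
    if (key == null) 0
    else
      try math.abs(key.toString.trim.toInt) % numPartitions
      catch { case _: NumberFormatException => 0 }
  }
}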
Apache Flume lets you set the default partitioner class, but I can't see a similar attribute in the Logstash Kafka output plugin.

Related

Not able to read data from elasticsearch using alpakka

I was trying to read documents (JSON data stored in ES) from Elasticsearch using Alpakka.
I found alpakka-Elasticsearch.
It says that you can stream messages from or to Elasticsearch using
ElasticsearchSource, ElasticsearchFlow or ElasticsearchSink.
I tried to implement the ElasticsearchSource approach, so my code looks like this:
val url = "http://localhost:9200"
val connectionSettings = ElasticsearchConnectionSettings(url)
val sourceSettings = ElasticsearchSourceSettings(connectionSettings)
val elasticsearchParamsV7 = ElasticsearchParams.V7("category_index")

val copy = ElasticsearchSource
  .typed[CategoryData](
    elasticsearchParamsV7,
    query = query,
    sourceSettings
  ).map { message: ReadResult[CategoryData] =>
    println("Inside message==================> " + message)
    WriteMessage.createIndexMessage(message.id, message.source)
  }.runWith(
    ElasticsearchSink.create[CategoryData](
      elasticsearchParamsV7, ElasticsearchWriteSettings(connectionSettings)
    )
  )

println("Final data==============>. " + copy)
At the end, the copy value returns a Future[Done], but I was not able to read data from ES.
Is there something I am missing?
Also, is there any other way to do the same using the Akka HTTP client API?
What is the preferred way to use ES from Akka?
To read data from Elasticsearch, something like this should be enough:
val matchAllQuery = """{"match_all": {}}"""

val result = ElasticsearchSource
  .typed[CategoryData](
    elasticsearchParamsV7,
    query = matchAllQuery,
    sourceSettings
  ).map { message: ReadResult[CategoryData] =>
    println("Read message==================> " + message)
  }.runWith(Sink.seq)

result.onComplete(res => res.foreach(col => println(s"Read: ${col.size} records")))
If the type CategoryData does not map correctly to what is stored in the index, the query may not return results.
If in doubt, it's possible to read raw JSON:
val elasticsearchSourceRaw = ElasticsearchSource
  .create(
    elasticsearchParamsV7,
    query = matchAllQuery,
    settings = sourceSettings
  )
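To actually see what the raw source returns, a minimal way to run it might look like the sketch below. It assumes an implicit ActorSystem (and its dispatcher as an ExecutionContext) is in scope to materialize the stream; in the raw variant each ReadResult carries its document as a spray-json JsObject:

// Collect the raw hits and report how many documents came back.
val rawResult = elasticsearchSourceRaw
  .map(readResult => readResult.source) // the raw JSON of each hit
  .runWith(Sink.seq)

rawResult.onComplete {
  case scala.util.Success(docs) => println(s"Raw read: ${docs.size} documents")
  case scala.util.Failure(err)  => println(s"Raw read failed: $err")
}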

How to expose graphql field with different name

I am exploring GraphQL and would like to know if there is any way of renaming a response field. For example, I have a POJO with these fields:
class POJO {
    Long id;
    String name;
}
GraphQL type:
type POJO {
    id: Long
    name: String
}
My response is something like this:
{
    "POJO": {
        "id": 123,
        "name": "abc"
    }
}
Can I rename the name field to something like userName so that my response looks like the one below?
{
    "POJO": {
        "id": 123,
        "userName": "abc"
    }
}
You can use GraphQL Aliases to modify individual keys in the JSON response.
If this is your original query
query {
  POJO {
    id
    name
  }
}
you can introduce a GraphQL alias userName for the field name like so:
query {
  POJO {
    id
    userName: name
  }
}
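With that alias in place, the server still resolves the underlying name field but returns it under the alias, so the response takes the shape asked for above (values taken from the question):
{
  "POJO": {
    "id": 123,
    "userName": "abc"
  }
}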
You can also use GraphQL aliases to use the same query or mutation field multiple times in the same GraphQL operation. This gets especially interesting when using field parameters:
query {
  first: POJO(first: 1) {
    id
    name
  }
  second: POJO(first: 1, skip: 1) {
    id
    name
  }
}
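The aliases then become the keys in the JSON response, so both results can coexist side by side. Roughly (illustrative values; the exact shape depends on what the POJO field returns):
{
  "first": {
    "id": 123,
    "name": "abc"
  },
  "second": {
    "id": 124,
    "name": "def"
  }
}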
The question is: how are you creating the schema in the first place? There's no intrinsic connection between Java and GraphQL types - they are completely unrelated unless you correlate them. So you can name the fields any way you want in the schema, and make a resolver (DataFetcher) that gets the value from anywhere (thus any POJO field too).
If you're using a tool to generate the schema from Java types (graphql-java-annotations, graphql-spqr etc), then use that tool's facilities to drive the mapping. Both the mentioned tools allow customizing the mapping via annotations. GraphQL-SPQR enables the same via external configuration as well.
If you clarify your question further, I'll be able to give a more precise answer.
Looks like the @GraphQLName annotation can help.
Example from the documentation:
"Additionally, @GraphQLName can be used to override field name. You can use @GraphQLDescription to set a description."
These can also be used for field parameters:
public String field(@GraphQLName("val") String value) {
    return value;
}
I know this question is very old, but the following code can be used for renaming a field:
public class ProductReviewType : ObjectGraphType<ProductReview>
{
    public ProductReviewType()
    {
        Field(x => x.ProductReviewId, type: typeof(IdGraphType)).Description("some desc here");
        Field(x => x.ProductId).Description("some desc here");
        Field("reviewername", x => x.ReviewerName).Description("some desc here");
        Field("reviewdate", x => x.ReviewDate).Description("some desc here");
        Field("emailaddress", x => x.EmailAddress).Description("some desc here");
        Field("rating", x => x.Rating).Description("some desc here");
        Field("comments", x => x.Comments).Description("some desc here");
        Field("modifieddate", x => x.ModifiedDate).Description("some desc here");
    }
}
In the above code, modifieddate would be the field name for property "ModifiedDate".

Using event field as hash variable

I'm receiving events in Logstash containing a measurement, values, and tags. I do not know ahead of time what the fields and tags are, so I wanted to do something like this:
input {
  http {}
}

filter {
  ruby {
    code => '
      tags = event.get("stats_tags").split(",")
      samples = event.get("stats_samples").split(" ")
      datapoints = {}
      samples.each { |s|
        splat = s.split(" ")
        datapoints[splat[0]] = splat[1]
      }
      event.set("[#metadata][stats-send-as-tags]", tags)
      event.set("[#metadata][stats-datapoints]", datapoints)
    '
  }
}

output {
  influxdb {
    host => "influxdb"
    db => "events_db"
    measurement => measurement
    send_as_tags => [#metadata][stats-send-as-tags]
    data_points => [#metadata][stats-datapoints]
  }
}
But this produces an error. After much googling to no avail, I'm starting to think this is impossible.
Is there a way to pass a hash and an array from event fields to output/filter configuration?
EDIT: If I double-quote it, the error I'm getting is:
output {
  influxdb {
    # This setting must be a hash
    # This field must contain an even number of items, got 1
    data_points => "[#metadata][stats-datapoints]"
    ...
  }
}

How to index an object with completion fields

Following http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
How can I index/insert (I can do the mapping) an object using the NEST client library so that I can provide the following options:
"input": ...,
"output": ...,
"payload" : ...,
"weight" : ...
I would like to be able to provide multiple values in the 'input' option.
I can't find any way of doing this using NEST.
Thank you.
NEST provides the SuggestField type to assist in indexing completion suggestions. You don't necessarily need to use this type; you can provide your own type that contains the expected completion fields (input, output, etc.), but the purpose of SuggestField is to make the whole process easier by providing a baked-in type.
Usage:
Add a suggest field to the document/type you are indexing:
public class MyType
{
    public SuggestField Suggest { get; set; }
}
Your mapping should look something like:
client.Map<MyType>(m => m
    .Properties(ps => ps
        .Completion(c => c.Name(x => x.Suggest).Payloads(true))
    )
);
Indexing example:
var myType = new MyType
{
    Suggest = new SuggestField
    {
        Input = new[] { "Nevermind", "Nirvana" },
        Output = "Nirvana - Nevermind",
        Payload = new { id = 1234 },
        Weight = 34
    }
};

client.Index<MyType>(myType);
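For reference, the document that ends up in Elasticsearch should then carry roughly the completion format described in the linked documentation, along the lines of the hedged sketch below (exact property names depend on the NEST serializer's casing settings):
{
  "suggest": {
    "input": [ "Nevermind", "Nirvana" ],
    "output": "Nirvana - Nevermind",
    "payload": { "id": 1234 },
    "weight": 34
  }
}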
Hope that helps.

Duplicate data in kibana from logstash

I'm trying to display some Mongo data that I've been collecting with Logstash using the Mongostat tool. Mongostat displays values with a suffix like "b", "k", "g" to signify bytes, kilobytes, or gigabytes, which is fine if I'm just reading the output, but I want to feed this into Kibana and display it graphically to see trends.
I've done this with several other log files and everything was fine. When I use only a grok filter everything is fine, but I've added a Ruby filter and now data seems to be duplicated in all fields other than the Logstash-generated fields and the new field created in my Ruby filter.
Here are the relevant parts of my conf file:
input {
  file {
    path => "/var/log/mongodb/mongostat.log"
    type => "mongostat"
    start_position => "end"
  }
}

filter {
  if [type] == "mongostat" {
    grok {
      patterns_dir => "/opt/logstash/patterns"
      match => ["message", "###a bunch of filtering that i know works###"]
      add_tag => "mongostat"
    }
    if [mongoMappedQualifier] == 'b' {
      ruby {
        code => "event['mongoMappedKB'] = event['mongoMapped'].to_f / 1024"
      }
    }
    if [mongoMappedQualifier] == 'k' {
      ruby {
        code => "event['mongoMappedKB'] = event['mongoMapped'].to_f * 1"
      }
    }
    if [mongoMappedQualifier] == 'm' {
      ruby {
        code => "event['mongoMappedKB'] = event['mongoMapped'].to_f * 1024"
      }
    }
    if [mongoMappedQualifier] == 'g' {
      ruby {
        code => "event['mongoMappedKB'] = event['mongoMapped'].to_f * 1048576"
      }
    }
  }
}

output {
  if [type] == "mongostat" {
    redis {
      host => "redis"
      data_type => "list"
      key => "logstash-mongostat"
    }
  }
}
Any idea why or how this can be fixed?
