Whitespace in statsd metric name - statsd

I need to pass whitespace as part of a metric name when pushing data into a statsd collector, for example:
ref locale:23.5|g
The specification (https://github.com/b/statsd_spec) doesn't say whether this is allowed, but inserting values whose names contain whitespace isn't working.
Is there a workaround such as special character escaping?

Where are you sending your statsd metrics to? Statsd might allow spaces in metric names, but I believe Graphite doesn't.
See https://answers.launchpad.net/graphite/+question/171766:
These metrics are stored on disk as the name provided. It's a core
part of the current Whisper storage system.
And http://mingbowan.blogspot.com.es/2012/08/enable-special-character-support-in.html:
Graphite doesn't support special characters like " " (space), "/" (slash), etc., because it expects everything to be plain ASCII so that it can split and process metric names and create directories based on them.
I'm sorry I couldn't find an authoritative source.
I think your best bet would be to replace spaces in your metric names with some other character like "_".
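If it helps, here is a minimal sketch in plain Java (no statsd client library) that sanitizes the name and sends the gauge over UDP; the host/port and the "_" replacement are just assumptions for the example:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class StatsdGauge {

    // Replace runs of whitespace with "_" so the name is safe for Graphite/Whisper.
    static String sanitize(String metricName) {
        return metricName.trim().replaceAll("\\s+", "_");
    }

    public static void main(String[] args) throws Exception {
        String payload = sanitize("ref locale") + ":23.5|g";   // "ref_locale:23.5|g"
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            // localhost:8125 is the conventional statsd UDP endpoint; adjust as needed.
            socket.send(new DatagramPacket(bytes, bytes.length,
                    new InetSocketAddress("localhost", 8125)));
        }
    }
}

Doing the replacement on the sending side keeps the wire format the plain name:value|type line that statsd expects.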

Related

Artemis ActiveMQ broker.xml wildcard-addresses configuration reference

I'm currently configuring an Artemis ActiveMQ broker and need to change the default wildcard-addresses. I found the wildcard-addresses tag, but not the information I need, so I have two questions:
I want to set routing-enabled to true but only for the any-words tag, and disable the single-word tag (or I'd simply like to know whether this is even possible).
I didn't find the answer in the official documentation, so I'm wondering if someone knows a good reference that explains the different tags of the wildcard-addresses configuration, in the style of the Configuration Reference but with a section about wildcard-addresses.
What I've found so far, which doesn't satisfy me:
Wildcard-syntax
Routing Messages with wildcards
Configuration reference
Thanks in advance,
Alex
There is no way to disable certain match types (i.e. single word or any words), and it's not clear why one would want to.
The wildcard-addresses block is for enabling/disabling wild-card routing and for customizing the wild-card syntax.
Here are the basics (as described in the documentation):
A wildcard expression contains words delimited by the character defined by delimiter (i.e. . by default).
The character defined by any-words (i.e. # by default) means "match any sequence of zero or more words."
The character defined by single-word (i.e. * by default) means "match a single word."
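Putting that together, the corresponding block in broker.xml looks roughly like this (the values shown are the defaults as I read the documentation; double-check the element names against your Artemis version):

<wildcard-addresses>
   <routing-enabled>true</routing-enabled> <!-- turns wildcard routing on or off as a whole -->
   <delimiter>.</delimiter>                <!-- character that separates words in an address -->
   <any-words>#</any-words>                <!-- character meaning "zero or more words" -->
   <single-word>*</single-word>            <!-- character meaning "exactly one word" -->
</wildcard-addresses>

Note that routing-enabled applies to wildcard routing as a whole, which is why the single-word and any-words matches can't be toggled independently.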

tarantool java connector & space ids

The Tarantool Java connector provides an API to select/update/insert/delete/... tuples in spaces. The first argument of these API methods is a space ID. There is no documentation for this API, and I don't clearly understand how to obtain these IDs.
The sample code on GitHub gets the IDs by evaluating box.space.<space>.id - not via the API, but by "writing" the command directly into the socket... It seems this is not a good approach (?).
As far as I can see, the system spaces _space/_vspace have the constant IDs 280/281. Is it a good approach to use these constants to look up space IDs?
UPD: I found the constant _VSPACE = 281 in the class SQLDatabaseMetadata, which is used by the Tarantool JDBC driver. It's protected.
You are right. You need to fetch the space id-name mapping from _VSPACE first and then use those values to perform requests against the specific spaces. Or you can rely on the fact that the first user-defined space has id 512, the next one 513, and so on.
We plan to support automatic schema loading and space names, but don't support it yet: https://github.com/tarantool/tarantool-java/issues/137
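A minimal sketch of that first step with the Java connector (the exact syncOps().select(...) signature and the client setup vary between connector versions, so treat this as an outline rather than the definitive API):

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.tarantool.TarantoolClient;

public class SpaceIds {

    private static final int VSPACE_ID = 281; // system space _vspace
    private static final int ITER_ALL = 2;    // ALL iterator code in the binary protocol

    // Selects every tuple from _vspace and builds a space-name -> space-id map.
    @SuppressWarnings("unchecked")
    static Map<String, Integer> loadSpaceIds(TarantoolClient client) {
        List<?> tuples = client.syncOps()
                .select(VSPACE_ID, 0, Collections.emptyList(), 0, Integer.MAX_VALUE, ITER_ALL);

        Map<String, Integer> ids = new HashMap<>();
        for (Object t : tuples) {
            List<Object> tuple = (List<Object>) t;
            // _vspace tuple layout: [id, owner, name, engine, field_count, flags, format]
            ids.put((String) tuple.get(2), ((Number) tuple.get(0)).intValue());
        }
        return ids;
    }
}

After that, ids.get("my_space") gives you the ID to pass to select/insert/update/delete, without relying on the 512, 513, ... numbering.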

I have an AWS Elasticsearch instance that I want to change the delimiter used when tokenizing

I'm currently using Jest to communicate with an AWS Elasticsearch instance running Elasticsearch 5.3. One of the fields is a URL, but I don't think a single period without a following whitespace is treated as a delimiter by default when Elasticsearch tokenizes. As a result I can't find "www.google.com" by searching for "google", for example.
I'd really like to be able to add a single period to the delimiter pattern. I've seen documentation on Elasticsearch's website about how to alter the delimiter when using Elasticsearch natively, but I haven't seen anyone change it through Jest. Is this possible, and if so, how would I go about doing so?
I'd like to configure it using some client in a Java application if possible.
I believe that a pattern tokenizer could help. See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html
Or a char filter where you replace the dot with a space? See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html
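From Java with Jest, you would create (or recreate) the index with those analysis settings. Here is a rough sketch; the index name, analyzer name and endpoint are made up, it uses a pattern_replace char filter to turn "." into a space (the mapping char filter linked above works similarly), and the settings(...) overloads differ between Jest versions (JSON String vs. Map):

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.indices.CreateIndex;

public class UrlIndexSetup {
    public static void main(String[] args) throws Exception {
        // Create-index body: a custom analyzer that replaces "." with a space
        // before the standard tokenizer runs, so "www.google.com" is indexed
        // as the terms "www", "google", "com".
        String settings = "{"
                + "\"settings\": {"
                + "  \"analysis\": {"
                + "    \"char_filter\": {"
                + "      \"dot_to_space\": {\"type\": \"pattern_replace\", \"pattern\": \"[.]\", \"replacement\": \" \"}"
                + "    },"
                + "    \"analyzer\": {"
                + "      \"url_analyzer\": {\"type\": \"custom\", \"char_filter\": [\"dot_to_space\"], \"tokenizer\": \"standard\"}"
                + "    }"
                + "  }"
                + "}"
                + "}";

        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(
                new HttpClientConfig.Builder("https://your-aws-es-endpoint").build()); // placeholder endpoint
        JestClient client = factory.getObject();

        client.execute(new CreateIndex.Builder("urls").settings(settings).build());
        client.close();
    }
}

Since analyzers are applied at index time, you would also need to point the URL field's mapping at url_analyzer and reindex the existing documents.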

How to read data in logs using logstash?

I have just started with Logstash. I have log files in which a whole object is printed. Since my object is huge I can't write grok patterns for the entire object, and I only expect two values out of it. Can you please let me know how I can get them?
My log files look like the one below:
2015-06-10 13:02:57,903 your done OBJ[name:test;loc:blr;country:india,acc:test#abe.com]
This is just an example; my object has a lot of attributes in it, and from the object I only need name and acc.
Regards
Mohan.
You can use the following pattern for this:
%{GREEDYDATA}\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]
GREEDYDATA is defined as follows:
GREEDYDATA .*
The key lies in understanding the GREEDYDATA macro: it consumes as many characters as it possibly can.
Logstash patterns don't have to match the entire line. You could also pull the leading information off (date, time, etc) in one grok{} and then use a different grok{} to pull off just the two fields that you want.
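In pipeline-config terms, that two-step idea could look something like the following sketch (field names are just illustrative):

filter {
  grok {
    # First pass: peel off the leading timestamp and keep the rest of the line.
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:rest}" }
  }
  grok {
    # Second pass: take only the two fields of interest from the object dump.
    match => { "rest" => "\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]" }
  }
}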

How to configure standard tokenizer in elasticsearch

I have a multi-language data set and a standard analyzer that takes care of tokenizing this data set very nicely. The only bad part is that it removes special characters like #, :, etc. Is there any way I can use the standard tokenizer and still be able to search on the special characters?
I have already looked into the combo analyzer plugin, which did not work as I had hoped. Apparently the combined analyzers do not work in a chain like token filters do; they work independently, which is not useful for me.
I also looked into the char mapping filter in order to process the data before tokenizing it, but it does not work like the word delimiter token filter, where we can specify "type_table" to convert a special character into ALPHANUM. It just maps one word to another word. As a result I won't be able to search on the special characters.
Also, I have looked into the pattern analyzers, which would work for the special characters but are not recommended for a multi-language data set.
Can anybody point me in the right direction in order to solve this problem?
Thanks in advance!
