I get the impression that there might be a way to write JSON data directly to the systemd journal without first converting it to the format the sd_journal* functions expect. Is this possible or not?
My suspicion stems from some comments about a built-in JSON parser. However, the man pages suggest otherwise.
Also, I note that if you write to stdout in the format
<priority> message
The priority will end up in the PRIORITY="priority" field and the message will end up in the MESSAGE="message" field. But can other structured field data be input?
Note: the man pages do not mention the last feature I describe, so I wouldn't be surprised if they are slightly out of date, which is why I am asking.
journald doesn't accept arbitrary JSON, just key/value pairs, so it's not possible to send nested data structures. You can, however, send fields directly via the Unix domain socket:
echo -e "MESSAGE=Hello\nFOO=BAR\nMY_ID=12345\n" |socat - UNIX-SENDTO:/run/systemd/journal/socket
results in:
{
"__CURSOR" : "s=46dc1bd66d0e4a48a6809e45228511e2;i=84cc;b=fd9144999d6846c8827d58f56c2635db;m=850161136;t=55669a307fdd6;x=887a021a37840789",
"__REALTIME_TIMESTAMP" : "1502386590318038",
"__MONOTONIC_TIMESTAMP" : "35703361846",
"_BOOT_ID" : "fd9144999d6846c8827d58f56c2635db",
"_TRANSPORT" : "journal",
"_UID" : "1001",
"_GID" : "1001",
"_CAP_EFFECTIVE" : "0",
"_SYSTEMD_OWNER_UID" : "1001",
"_SYSTEMD_SLICE" : "user-1001.slice",
"_SYSTEMD_USER_SLICE" : "-.slice",
"_MACHINE_ID" : "6e7b40640bf6473189165f19f8be2536",
"_HOSTNAME" : "samson",
"_SYSTEMD_UNIT" : "user#1001.service",
"_SYSTEMD_INVOCATION_ID" : "e5ed32fbb1004545b1ddf73a0d928d87",
"_SYSTEMD_CGROUP" : "/user.slice/user-1001.slice/user#1001.service/gnome-terminal-server.service",
"_SYSTEMD_USER_UNIT" : "gnome-terminal-server.service",
"_COMM" : "socat",
"_EXE" : "/usr/bin/socat",
"_CMDLINE" : "socat - UNIX-SENDTO:/run/systemd/journal/socket",
"FOO" : "BAR",
"MESSAGE" : "Hello",
"MY_ID" : "12345",
"_PID" : "19868",
"_SOURCE_REALTIME_TIMESTAMP" : "1502386590317991"
}
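If you would rather do this from code than from the shell, the same key/value datagram can be written from any language that can send to a Unix SOCK_DGRAM socket. Below is a minimal Python sketch of that idea; the FOO and MY_ID fields are just example names, and the socket path is the standard /run/systemd/journal/socket:
import socket

# Journald's native protocol: newline-separated KEY=VALUE pairs sent as a
# single datagram to the journal socket.
fields = {
    "MESSAGE": "Hello",
    "PRIORITY": "6",   # informational
    "FOO": "BAR",      # example user-defined field
    "MY_ID": "12345",  # example user-defined field
}

payload = "".join(f"{key}={value}\n" for key, value in fields.items()).encode()

sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
try:
    sock.sendto(payload, "/run/systemd/journal/socket")
finally:
    sock.close()
This is essentially what the socat one-liner above does; values containing newlines or binary data need the length-prefixed form of the native protocol, which is not shown here.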
Related: I came across a similar question here which wasn't truly addressed - https://github.com/nightwatchjs/nightwatch/issues/1911
You cannot do what @beatfactor suggested with the example above; the port is in the middle, i.e. "selenium_host" : "us1.appium.testobject.com:443/wd/hub",
I'm facing a similar problem right now: how do I provide arguments so that it attempts to hit a host like the one above? Currently, my failing options are providing no port, which defaults to 4444, or providing a port, which results in attempting to hit us1.appium.testobject.com/wd/hub:443.
The desired result is:
"selenium_host" : "us1.appium.testobject.com:443/wd/hub",
TL;DR - How do you provide a port in the middle of your selenium host argument, given that the port is always appended to the end and, if you don't provide one, a default is used?
Just define your selenium_port upstream, in the declaration section, and use a template literal:
// nightwatch.conf.js - template literals require a JS config file, not nightwatch.json
const selenium_port = '443';

module.exports = {
  "test_settings" : {
    "default" : {
      "launch_url" : "http://test.com",
      "selenium_port" : selenium_port,
      "selenium_host" : `us1.appium.testobject.com:${selenium_port}/wd/hub`,
      "silent" : true,
      "screenshots" : {
        "enabled" : true,
        "path" : "screenshots"
      }
    }
  }
};
Hope I understood correctly. Cheers!
I'm trying to create a visualization of the HDFS block distribution of a cluster.
I plan to create this using Tableau, but was wondering what types of visualizations would give an idea of which nodes need re-balancing, and also what would be an efficient way to get the server log data into Tableau.
Before investing too much time in this, you might want to take a look at Twitter's open source HDFS-DU project. This provides a view of utilization based on paths within the file system rather than DataNodes within the cluster, but perhaps that's still helpful for your requirements.
If the goal is just to identify nodes in need of rebalancing, then this information is already accessible on the NameNode web UI "Datanodes" tab. You could also run hdfs dfsadmin -report to get utilization stats for each node in a script.
If none of the above meets your requirements, and you need to proceed with integrating the information into an external reporting tool like Tableau, then a helpful integration point might be the JMX metrics exposed via HTTP on the NameNode. See below for an example curl command that queries some of this information from the NameNode. Note in particular the LiveNodes section, which contains capacity information about each DataNode.
Some additional information about these metrics is available in the Apache Hadoop Metrics documentation.
> curl 'http://127.0.0.1:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=NameNodeInfo",
"modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
"Threads" : 46,
"Version" : "3.0.0-alpha2-SNAPSHOT, rdf497b3a739714c567c9c2322608f0659da20cc4",
"Used" : 5263360,
"Free" : 884636377088,
"Safemode" : "",
"NonDfsUsedSpace" : 114431086592,
"PercentUsed" : 5.266863E-4,
"BlockPoolUsedSpace" : 5263360,
"PercentBlockPoolUsed" : 5.266863E-4,
"PercentRemaining" : 88.52252,
"CacheCapacity" : 0,
"CacheUsed" : 0,
"TotalBlocks" : 50,
"NumberOfMissingBlocks" : 0,
"NumberOfMissingBlocksWithReplicationFactorOne" : 0,
"LiveNodes" : "{\"192.168.0.117:9866\":{\"infoAddr\":\"127.0.0.1:9864\",\"infoSecureAddr\":\"127.0.0.1:0\",\"xferaddr\":\"127.0.0.1:9866\",\"lastContact\":2,\"usedSpace\":5263360,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":114431086592,\"capacity\":999334871040,\"numBlocks\":50,\"version\":\"3.0.0-alpha2-SNAPSHOT\",\"used\":5263360,\"remaining\":884636377088,\"blockScheduled\":0,\"blockPoolUsed\":5263360,\"blockPoolUsedPercent\":5.266863E-4,\"volfails\":0}}",
"DeadNodes" : "{}",
"DecomNodes" : "{}",
"BlockPoolId" : "BP-1429209999-10.195.15.240-1484933797029",
"NameDirStatuses" : "{\"active\":{\"/Users/naurc001/hadoop-deploy-trunk/data/dfs/name\":\"IMAGE_AND_EDITS\"},\"failed\":{}}",
"NodeUsage" : "{\"nodeUsage\":{\"min\":\"0.00%\",\"median\":\"0.00%\",\"max\":\"0.00%\",\"stdDev\":\"0.00%\"}}",
"NameJournalStatus" : "[{\"manager\":\"FileJournalManager(root=/Users/naurc001/hadoop-deploy-trunk/data/dfs/name)\",\"stream\":\"EditLogFileOutputStream(/Users/naurc001/hadoop-deploy-trunk/data/dfs/name/current/edits_inprogress_0000000000000000862)\",\"disabled\":\"false\",\"required\":\"false\"}]",
"JournalTransactionInfo" : "{\"MostRecentCheckpointTxId\":\"861\",\"LastAppliedOrWrittenTxId\":\"862\"}",
"NNStartedTimeInMillis" : 1485715900031,
"CompileInfo" : "2017-01-03T21:06Z by naurc001 from trunk",
"CorruptFiles" : "[]",
"NumberOfSnapshottableDirs" : 0,
"DistinctVersionCount" : 1,
"DistinctVersions" : [ {
"key" : "3.0.0-alpha2-SNAPSHOT",
"value" : 1
} ],
"SoftwareVersion" : "3.0.0-alpha2-SNAPSHOT",
"NameDirSize" : "{\"/Users/naurc001/hadoop-deploy-trunk/data/dfs/name\":2112351}",
"RollingUpgradeStatus" : null,
"ClusterId" : "CID-4526ea43-52e6-4b3f-9ddf-5fd4412e322e",
"UpgradeFinalized" : true,
"Total" : 999334871040
} ]
}
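If you do go the Tableau route, one convenient intermediate step is to flatten the LiveNodes map from that JMX output into a table (e.g., a CSV file) that Tableau can read. Here is a rough Python sketch of that idea; the NameNode address and the output file name are assumptions you would adjust for your environment:
import csv
import json
import urllib.request

# Assumed NameNode HTTP address; adjust for your cluster (9870 is the
# default in recent releases, older ones use 50070).
JMX_URL = "http://127.0.0.1:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"

with urllib.request.urlopen(JMX_URL) as resp:
    bean = json.load(resp)["beans"][0]

# LiveNodes is itself a JSON document embedded as a string.
live_nodes = json.loads(bean["LiveNodes"])

with open("datanode_usage.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["datanode", "capacity_bytes", "used_bytes",
                     "remaining_bytes", "block_pool_used_percent", "num_blocks"])
    for node, stats in live_nodes.items():
        writer.writerow([node, stats.get("capacity"), stats.get("used"),
                         stats.get("remaining"), stats.get("blockPoolUsedPercent"),
                         stats.get("numBlocks")])
With that in Tableau, a simple bar chart of block_pool_used_percent (or remaining_bytes) per DataNode makes nodes in need of rebalancing stand out quickly.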
I have a Titan database with Cassandra storage backend, and I am trying to create a mixed index based on two property keys.
I am able to register the index using the following commands:
graph=TitanFactory.open(config);
graph.tx().rollback()
m = graph.openManagement();
m.buildIndex("titleBodyMixed", Vertex.class).addKey(m.getPropertyKey("title")).addKey(m.getPropertyKey("body")).buildMixedIndex("search");
m.commit();
m.awaitGraphIndexStatus(graph, 'titleBodyMixed').status(SchemaStatus.REGISTERED).timeout(3, java.time.temporal.ChronoUnit.MINUTES).call();
When I check, the index is successfully registered after a few seconds. As the next step, I try to reindex the database using the following commands:
m = graph.openManagement();
m.updateIndex(m.getGraphIndex('titleBodyMixed'), SchemaAction.REINDEX).get();
However, the updateIndex command does not finish (even after 12 hours).
I have about 300k entries in the database, and each entry has one title and one body to index.
My question is: how can I speed up the indexing?
When I use the top command, I see that the CPU is not saturated by the indexing process.
My Titan config is as below:
config =new BaseConfiguration();
config.setProperty("storage.backend","cassandra");
config.setProperty("storage.hostname", "127.0.0.1");
config.setProperty("storage.cassandra.keyspace", "smartgraph");
config.setProperty("index.search.elasticsearch.interface", "NODE");
config.setProperty("index.search.backend", "elasticsearch");
The following shows the Elasticsearch service properties:
curl -X GET 'http://localhost:9200'
{
"status" : 200,
"name" : "Ms. Marvel",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "1.7.2",
"build_hash" : "e43676b1385b8125d647f593f7202acbd816e8ec",
"build_timestamp" : "2015-09-14T09:49:53Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}
The idea is that the reindexing process will not start unless all sessions are closed. You most probably have open sessions with the database, so the reindex job is never triggered.
With a Gremlin script that closes all open sessions, you should see that the indexing will take place afterwards.
Will that help?
I am using JMeter to test the performance of a mobile application which uses IBM Worklight. I am getting 3 dynamic values which come as a string, and I need to handle these values. I tried the Regular Expression Extractor but it didn't work. Can anyone help me find a solution? The dynamic values are:
["{\"jsessionid\":\"0000Mhn7GqWMU1P7Xi9dpJ7mgFb\",\"mbparam\":\"ZjurDsggbg9CZBgd5miAIHMIH%2B5oC7XdSukctItof7AJnpe8UNhlBsgM%2F8w%3D\",\"MP-AUTH-TOKEN\":\"leXozMVUXFYixuYwxgV58EXuRg1Vd0xtpZeouAMQtk6Pd0I1D618motg\"}"]
Updated
I tried the regular expression you provided, but it isn't working either.
These are the steps I have performed. Please guide me if I have done anything wrong.
Updated
This is the response I am getting:
{
    "customerName":"abc",
    "homeEmail":"",
    "profileDebitAcc":"01234567",
    "sessKey":"0000V3EgdxpY937GTWQ3yogRLGq",
    "mbParam":"hDurAxWHjPT%2BtB7xEyz7Huu51oDOAH8gyNSWIBnHmA9UWuF0lcHGiOy82S0%3D",
    "responseHeaders":
    {
        "Content-Language":"en-AU",
        "Date":"Thu, 12 Nov 2015 05:59:50 GMT",
        "Content-Length":"6759",
        "Expires":"0",
        "Content-Type":"text/html; charset=ISO-8859-1",
        "X-Powered-By":"Servlet/3.0",
        "Cache-Control":"no-cache",
        "Pragma":"no-cache"
    },
    "AuthToken":"AHWXZlUt6Rupm1FeBWGu2TEVHZemZwVGbmwmpVxXJR7TMhCA8pWN96ae",
    "statusCode":200,
    ...
I need to extract the sessKey, mbParam and AuthToken values and send them as a list in the next request body.
In the request these values are displayed as:
["{\"jsessionid\":\"0000gPQCV4FJ1NQvB8d4Ifd_P9I\",\"mbparam\":\"hDu7DhU%2FjA81TEjwbREmytgqIItmUS4b6rhEojYtcalv0PUs6iaewmtZu6U%3D\",\"MP-AUTH-TOKEN\":\"4fU7Bg20sRRUikHnzmZKcC4ZPyCjVxJnmm7QMnSm6mfT7GlqnySQS2YP\"}"]
How do I handle these values?
Use the following Regular Expression Extractor configuration:
Reference Name: anything meaningful, e.g. dynamicvalues
Regular Expression:
\["\{\\"jsessionid\\":\\"(.+?)\\",\\"mbparam\\":\\"(.+?)\\",\\"MP-AUTH-TOKEN\\":\\"(.+?)\\"\}"\]
Template: $1$$2$$3$
Refer to the extracted values as:
${dynamicvalues_g1} - for jsessionid
${dynamicvalues_g2} - for mbparam
${dynamicvalues_g3} - for MP-AUTH-TOKEN
While developing your regular expression remember that you need to escape the following characters with a backslash:
[
{
\
}
]
Other special characters which need escaping are: .^$*+?()|
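If you want to sanity-check the pattern while developing it, outside of JMeter, you can try it against the sample payload from the question. Here is a small Python sketch (Python's re engine accepts this PCRE-style pattern unchanged):
import re

# Response body exactly as it appears in the request (escaped JSON inside a string).
body = r'["{\"jsessionid\":\"0000gPQCV4FJ1NQvB8d4Ifd_P9I\",\"mbparam\":\"hDu7DhU%2FjA81TEjwbREmytgqIItmUS4b6rhEojYtcalv0PUs6iaewmtZu6U%3D\",\"MP-AUTH-TOKEN\":\"4fU7Bg20sRRUikHnzmZKcC4ZPyCjVxJnmm7QMnSm6mfT7GlqnySQS2YP\"}"]'

# Same expression as in the extractor above; note the escaped [ { \ } ] characters.
pattern = r'\["\{\\"jsessionid\\":\\"(.+?)\\",\\"mbparam\\":\\"(.+?)\\",\\"MP-AUTH-TOKEN\\":\\"(.+?)\\"\}"\]'

m = re.search(pattern, body)
if m:
    jsessionid, mbparam, auth_token = m.groups()
    print(jsessionid, mbparam, auth_token)
The three printed values correspond to the _g1, _g2 and _g3 references listed above.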
References:
Regular Expressions page of JMeter's User Manual
PCRE Regex Cheatsheet
Using RegEx (Regular Expression Extractor) with JMeter
jMeter - Regular Expressions
I've got a simple DataPipeline job which has only a single EmrActivity, with a single step attempting to execute a Hive script from my S3 bucket.
The config for the EmrActivity looks like this:
{
"name" : "Extract and Transform",
"id" : "HiveActivity",
"type" : "EmrActivity",
"runsOn" : { "ref" : "EmrCluster" },
"step" : ["command-runner.jar,/usr/share/aws/emr/scripts/hive-script --run-hive-script --args -f s3://[bucket-name-removed]/s1-tracer-hql.q -d INPUT=s3://[bucket-name-removed] -d OUTPUT=s3://[bucket-name-removed]"],
"runsOn" : { "ref": "EmrCluster" }
}
And the config for the corresponding EmrCluster resource it's running on:
{
"id" : "EmrCluster",
"type" : "EmrCluster",
"name" : "Hive Cluster",
"keyPair" : "[removed]",
"masterInstanceType" : "m3.xlarge",
"coreInstanceType" : "m3.xlarge",
"coreInstanceCount" : "2",
"coreInstanceBidPrice": "0.10",
"releaseLabel": "emr-4.1.0",
"applications": ["hive"],
"enableDebugging" : "true",
"terminateAfter": "45 Minutes"
}
The error message I'm getting is always the following:
java.io.IOException: Cannot run program "/usr/share/aws/emr/scripts/hive-script --run-hive-script --args -f s3://[bucket-name-removed]/s1-tracer-hql.q -d INPUT=s3://[bucket-name-removed] -d OUTPUT=s3://[bucket-name-removed]" (in directory "."): error=2, No such file or directory
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:139)
at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:13)
...
The main error message is "... (in directory "."): error=2, No such file or directory".
I've logged into the master node and verified the existence of /usr/share/aws/emr/scripts/hive-script. I've also tried specifying an s3-based location for the hive-script, among a few other places; always the same error result.
I can manually create a cluster directly in EMR that looks exactly like what I'm specifying in this DataPipeline, with a Step that uses the identical "command-runner.jar,/usr/share/aws/emr/scripts/hive-script ..." command string, and it works without error.
Has anyone experienced this, and can advise me on what I'm missing and/or doing wrong? I've been at this one for a while now.
I'm able to answer my own question, after some long research and trial and error.
There were 3 things, maybe 4, wrong with my Step script:
I needed 'script-runner.jar' rather than 'command-runner.jar', since we're running a script (which I ended up just pulling from EMR's libs dir on S3)
I needed to get the 'hive-script' from elsewhere, so I also went to the public EMR libs dir on S3 for this
a fun one, yay thanks AWS: the Step's args (everything after the 'hive-script' specification) need to be comma-separated in DataPipeline, as opposed to space-separated as when specifying args in a Step directly in EMR (see the short illustration after the working config below)
And then the "maybe 4th":
I included the base folder on S3 and the specific Hive release we're working with for the hive-script (I added this as a result of seeing something similar in an AWS blog, but haven't yet tested whether it makes a difference in my case; too drained with everything else)
So, in the end, my working EmrActivity ended up looking like this:
{
"name" : "Extract and Transform",
"id" : "HiveActivity",
"type" : "EmrActivity",
"runsOn" : { "ref" : "EmrCluster" },
"step" : ["s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,s3://us-east-1.elasticmapreduce/libs/hive/hive-script,--base-path,s3://us-east-1.elasticmapreduce/libs/hive/,--hive-versions,latest,--run-hive-script,--args,-f,s3://[bucket-name-removed]/s1-tracer-hql.q,-d,INPUT=s3://[bucket-name-removed],-d,OUTPUT=s3://[bucket-name-removed],-d,LIBS=s3://[bucket-name-removed]"],
"runsOn" : { "ref": "EmrCluster" }
}
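To make the comma-separation point concrete, here is a tiny Python illustration of the space-separated form (as you would type it in an EMR console Step) versus the comma-separated form DataPipeline expects in the step string; the bucket and script names are placeholders:
# The same step arguments written both ways. Bucket/script names are placeholders.
emr_console_step = (
    "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar "
    "s3://us-east-1.elasticmapreduce/libs/hive/hive-script "
    "--run-hive-script --args -f s3://my-bucket/my-script.q"
)

# DataPipeline wants every value comma-separated instead of space-separated.
datapipeline_step = ",".join(emr_console_step.split())
print(datapipeline_step)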
Hope this helps save someone else from the same time-sink I invested. Happy coding!