Golang: healthd and healthtop of the library "gocraft/health" - go

Im using gocraft/health to check the health of my service and have the metrics of each endPoint.
Im usin The JSON polling sink to get the metrics.
sink := health.NewJsonPollingSink(time.Minute*5, time.Minute*5)
stream.AddSink(sink)
I want to use healthtop and healthd here Link they explain how.
I fixed the environment variables: export HEALTHD_MONITORED_HOSTPORTS=:5001 HEALTHD_SERVER_HOSTPORT=:5002 healthd
as they said
after they said "Now you can run it". how, they didn't give any command to do it.I didn't realy understand what they mean.
I navigated to src/github.com/gocraft/health/cmd/healthd. I found main.go when I run it I got that in the console
[openrtb#sd-69536 healthd]$ go run main.go
[2015-06-17T23:04:20.871743758Z]: job:general event:starting kvs:[health_host_port::5002 monitored_host_ports::5001,:5002 server_host_port::5002]
[2015-06-17T23:04:20.87810814Z]: job:poll status:success time:4 ms kvs:[host_port::5002]
[2015-06-17T23:04:20.881896459Z]: job:poll status:success time:8 ms kvs:[host_port::5001]
[2015-06-17T23:04:20.882338024Z]: job:recalculate status:success time:231 μs
[2015-06-17T23:04:23.275370787Z]: job:recalculate status:success time:6 μs
[2015-06-17T23:04:30.875230839Z]: job:poll status:success time:1573 μs kvs:[host_port::5002]
[2015-06-17T23:04:30.881415193Z]: job:poll status:success time:7 ms kvs:[host_port::5001]
.
.
but no reslute on the those endpoints
localhost:5002/jobs: Lists top jobs
localhost:5002/hosts: Lists all monitored hosts and their statuses
it gave me {"error": "not_found"}
excepte this localhost:5002/health I got this JSON responce
{
"instance_id": "sd-69536.1291",
"interval_duration": 3600000000000,
"aggregations": [
{
"interval_start": "2015-06-18T01:00:00+02:00",
"serial_number": 48,
"jobs": {
"general": {
"timers": {},
"events": {
"starting": 1
},
"event_errs": {},
"count": 0,
"nanos_sum": 0,
"nanos_sum_squares": 0,
"nanos_min": 0,
"nanos_max": 0,
"count_success": 0,
"count_validation_error": 0,
"count_panic": 0,
"count_error": 0,
"count_junk": 0
},
"poll": {
"timers": {},
"events": {},
"event_errs": {},
"count": 24,
"nanos_sum": 107049159,
"nanos_sum_squares": 6.06770682813009e+14,
"nanos_min": 1581783,
"nanos_max": 8259442,
"count_success": 24,
"count_validation_error": 0,
"count_panic": 0,
"count_error": 0,
"count_junk": 0
},
"recalculate": {
"timers": {},
"events": {},
"event_errs": {},
"count": 23,
"nanos_sum": 3501601,
"nanos_sum_squares": 6.75958305123e+11,
"nanos_min": 70639,
"nanos_max": 290877,
"count_success": 23,
"count_validation_error": 0,
"count_panic": 0,
"count_error": 0,
"count_junk": 0
}
},
"timers": {},
"events": {
"starting": 1
},
"event_errs": {}
}
]
}
but no idea what this result mean, because it doesn't have any relation with my
localhost:5001/health EndPoint that should normaly aggregate as they said.

What you downloaded is a binary so you can just invoke it with healthd if you're in the correct directory, they actually provide this example;
HEALTHD_MONITORED_HOSTPORTS=:5020 HEALTHD_SERVER_HOSTPORT=:5032 healthd
Which isn't setting env var as much as invoking healthd with those two values (export or something would be required to persist the change beyond the one command). healthtop more clearly states what it is but as you can see by their paths, they're both commands gocraft/health/cmd/healthtop. They have several examples of using healthtop from bash, not so explicit about healthd but it's the same.
If you ran that command (as you show in your question) then you may want to try healthtop jobs or something to that effect. I don't know a ton about this project and don't care to research it but from what I can tell healthd is just a service that collects results from various /health endpoints and makes them available in on API. It seems like they intend for you to use healthtop to on top of it to view reports.
Also note this;
Great! To get a sense of the type of data healthd serves, you can manually navigate to:
/jobs: Lists top jobs
/aggregations: Provides a time series of aggregations
/aggregations/overall: Squishes all time series aggregations into one aggregation.
/hosts: Lists all monitored hosts and their statuses.
However, viewing raw JSON is just to give you a sense of the data. See the next section...
I'm not sure what the domain is (localhost:5032 if you're running locally?) but you should probably just be able to go to localhost:5032/jobs and see the healthd is running and doing something. Also check your apps to confirm it's up and running. Don't expect any output from it directly, that's what healthtop is for.

Related

Slack's files list API gives warning max_page_limit

I am using below API and listing 200 files per page.
https://slack.com/api/files.list?count=200&page={{pageNumber}}
I have 60000 files in my slack account. So on first API call received 200 files with pagination response like below.
"paging": {
"count": 200,
"total": 60000,
"page": 1,
"pages": 300
}
We continue fetching files with increasing page number in API query parameter like 2,3,4,.......
https://slack.com/api/files.list?count=200&page=2
"paging": {
"count": 200,
"total": 60000,
"page": 2,
"pages": 300
}
When we reached page number 101 the page parameter in paging response becomes 1 with warning max_page_limit. Can't we list all files with same pagination fashion? or Slack file list API allows us to list files till page 100 only? We didn't find anything in Slack documentation for this use case. Any help regarding this issue will be much appreciated.
https://slack.com/api/files.list?count=200&page=101
"paging": {
"count": 200,
"total": 60000,
"page": 1,
"pages": 300,
"warnings": [
"max_page_limit"
]
}
Here is what I got reply from slack forum.
There is indeed a page limit of 100 pages on files.list. I've contacted the documentation team to add this detail to the documentation for the method. You should be able to get your 60000 files with a highter count of 600 though.
There are other ways to filter down the expected number of results. For example, you could specify a time period for file creation date using the ts_from and ts_to arguments and do batches of calls within specified time periods, or batch your searches by channel by passing the channel argument. These techniques should always allow you to keep a batch within 100,000 files, as 1000 would be the max accepted limit.

Connect 3+ OpenDaylight controllers to mininet topology

I would like to ask
I have created a cluster according to this
https://docs.opendaylight.org/en/stable-magnesium/getting-started-guide/clustering.html
And i would like to verify it is working can someone help me how to do it?
Also is it able to connect this cluster or those 3 controllers to one mininet topology? Or it cant be done?
EDIT
I would like to ask why
Not all bundle are active?
Is there gonna be some problem with that ?
I'm not sure if you can specify multiple controllers on the mininet command
line, but it's worth a try. Otherwise you can try like this person explains
in this post setting up the controllers in a mininet .py config file.
To verify the cluster is working, there are many ways, but you can try some
rest calls to check the status of things. We have some examples in upstream
CSIT tests. If you install the feature odl-jolokia, you can send a GET to:
jolokia/read/org.opendaylight.controller:Category=Shards,name=member-1-shard-default-config,type=DistributedConfigDatastore
that is checking the default shard status for the config datastore. You'll get
some output like this:
content={
"request": {
"mbean": "org.opendaylight.controller:Category=Shards,name=member-1-shard-default-config,type=DistributedConfigDatastore",
"type": "read"
},
"status": 200,
"timestamp": 1588524930,
"value": {
"AbortTransactionsCount": 0,
"CommitIndex": 70,
"CommittedTransactionsCount": 0,
"CurrentTerm": 7,
"FailedReadTransactionsCount": 0,
"FailedTransactionsCount": 0,
"FollowerInfo": [],
"FollowerInitialSyncStatus": true,
"InMemoryJournalDataSize": 33,
"InMemoryJournalLogSize": 1,
"LastApplied": 70,
"LastCommittedTransactionTime": "1970-01-01 00:00:00.000",
"LastIndex": 70,
"LastLeadershipChangeTime": "2020-05-03 16:54:45.034",
"LastLogIndex": 70,
"LastLogTerm": 7,
"LastTerm": 7,
"Leader": "member-2-shard-default-config",
"LeadershipChangeCount": 1,
"PeerAddresses": "member-3-shard-default-config: akka.tcp://opendaylight-cluster-data#10.30.170.119:2550/user/shardmanager-config/member-3-shard-default-config, member-2-shard-default-config: akka.tcp://opendaylight-cluster-data#10.30.170.113:2550/user/shardmanager-config/member-2-shard-default-config",
"PeerVotingStates": "member-3-shard-default-config: true, member-2-shard-default-config: true",
"PendingTxCommitQueueSize": 0,
"RaftState": "Follower",
"ReadOnlyTransactionCount": 0,
"ReadWriteTransactionCount": 0,
"ReplicatedToAllIndex": 69,
"ShardName": "member-1-shard-default-config",
"SnapshotCaptureInitiated": false,
"SnapshotIndex": 69,
"SnapshotTerm": 7,
"StatRetrievalError": null,
"StatRetrievalTime": "557.3 \u03bcs",
"TxCohortCacheSize": 0,
"VotedFor": "member-2-shard-default-config",
"Voting": true
}
}
Lots of info there, but the raftstate says Follower, so you know this node
is one of the two followers. One node will be leader.
Another thing we check is syncstatus to make sure it's "true". Use this
URI:
jolokia/read/org.opendaylight.controller:Category=ShardManager,name=shard-manager-operational,type=DistributedOperationalDatastore
example output

Append an array to a json using jq in BASH

I have a json that looks like this:
{
"failedSet": [],
"successfulSet": [{
"event": {
"arn": "arn:aws:health:us-east-1::event/AWS_RDS_MAINTENANCE_SCHEDULED_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx",
"endTime": 1502841540.0,
"eventTypeCategory": "scheduledChange",
"eventTypeCode": "AWS_RDS_MAINTENANCE_SCHEDULED",
"lastUpdatedTime": 1501208541.93,
"region": "us-east-1",
"service": "RDS",
"startTime": 1502236800.0,
"statusCode": "open"
},
"eventDescription": {
"latestDescription": "We are contacting you to inform you that one or more of your Amazon RDS DB instances is scheduled to receive system upgrades during your maintenance window between August 8 5:00 PM and August 15 4:59 PM PDT. Please see the affected resource tab for a list of these resources. \r\n\r\nWhile the system upgrades are in progress, Single-AZ deployments will be unavailable for a few minutes during your maintenance window. Multi-AZ deployments will be unavailable for the amount of time it takes a failover to complete, usually about 60 seconds, also in your maintenance window. \r\n\r\nPlease ensure the maintenance windows for your affected instances are set appropriately to minimize the impact of these system upgrades. \r\n\r\nIf you have any questions or concerns, contact the AWS Support Team. The team is available on the community forums and by contacting AWS Premium Support. \r\n\r\nhttp://aws.amazon.com/support\r\n"
}
}]
}
I'm trying to add a new key/value under successfulSet[].event (key name as affectedEntities) using jq, I've seen some examples, like here and here, but none of those answers really show how to add a possible one key with multiple values (the reason why I say possible is because as of now, AWS is returning one value for the affected entity, but if there are more, then I'd like to list them).
EDIT: The value of the new key that I want to add is stored in a variable called $affected_entities and a sample of that value looks like this:
[
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
]
The value could look like this:
[
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
...
...
...
]
You can use this jq,
jq '.successfulSet[].event += { "new_key" : "new_value" }' file.json
EDIT:
Try this:
jq --argjson argval "$new_value" '.successfulSet[].event += { "affected_entities" : $argval }' file.json
Test:
sat~$ new_value='[
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
]'
sat~$ jq --argjson argval "$new_value" '.successfulSet[].event += { "affected_entities" : $argval }' file.json
Note that --argjson works with jq 1.5 and above.

Is there a way I can get historic performance data of various alerts in Nagios as json/xml?

I am looking to get performance data of various alerts setup in my Nagios Core/XI. I think it is stored in RRDs. Are there ways I can get access to it?
If you're using Nagios XI you can get this data a few different ways.
If you're using XI 5 or later, then the easiest way that springs to mind is the API. Log in to your XI server as an administrator, navigate to 'Help' menu, then select 'Objects Reference' on the left hand side navigation and find 'GET objects/rrdexport' from the Objects Reference navigation box (or just scroll down to near the bottom).
An example curl might look like this:
curl -XGET "http://nagiosxi/nagiosxi/api/v1/objects/rrdexport?apikey=YOURAPIKEY&pretty=1&host_name=localhost"
Your response should look something like:
{
"meta": {
"start": "1453838100",
"step": "300",
"end": "1453838400",
"rows": "2",
"columns": "4",
"legend": {
"entry": [
"rta",
"pl",
"rtmax",
"rtmin"
]
}
},
"data": {
"row": [
{
"t": "1453838100",
"v": [
"6.0373333333e-03",
"0.0000000000e+00",
"1.7536000000e-02",
"3.0000000000e-03"
]
},
{
"t": "1453838400",
"v": [
"6.0000000000e-03",
"0.0000000000e+00",
"1.7037333333e-02",
"3.0000000000e-03"
]
}
]
}
}
BUT WAIT, THERE IS ANOTHER WAY
This way will work no matter what version you're on, and would actually work if you were processing performance data with NPCD on a Core system as well.
Log in to your server via ssh or console and get your butt over to the /usr/local/nagios/share/perfdata directory. From here we're going to use the localhost object as an example..
$ cd /usr/local/nagios/share/perfdata/
$ ls
localhost
$ cd localhost/
$ ls
Current_Load.rrd Current_Users.xml HTTP.rrd PING.xml SSH.rrd Swap_Usage.xml
Current_Load.xml _HOST_.rrd HTTP.xml Root_Partition.rrd SSH.xml Total_Processes.rrd
Current_Users.rrd _HOST_.xml PING.rrd Root_Partition.xml Swap_Usage.rrd Total_Processes.xml
$ rrdtool dump _HOST_.rrd
Once you run the rrdtool dump command, there is going to be an awful lot of output, so I keep that as an exercise for you, the reader ;)
If you're trying to automate something of some kind, then you should note that the xml files contain meta data for the rrd files and could potentially be useful to parse first.
Also, if you're anything like me, you love reading technical manuals. Here is a great one to read: RRDTool documentation
Hope this helped!

NoShardAvailableException after starting the Elasticsearch.bat

I have started elasticsearch.bat and I completed first indexing using Nest
ElasticClient.Index query.
Then I made my first query using
var results = ElasticClient.Search<Product>(body =>
body.Query(query =>
query.QueryString(qs => qs.Query(key))));
This is all I have done. Later I restarted elasticsearch console using elasticsearch.bat and now it keeps giving me error message NoShardAvailableException. I deleted and redownloaded a new elasticsearch.bat and i keep getting same error. How can I resolve it?
I am using 1.7.1 version and btw I installed Marvel plugin also.
Your problem is not related with a version, so updating will not resolve the issue. The issue is that shards cannot be assigned to nodes. As shown by your call, see "status": "red" and "unassigned_shards": 8:
{
"cluster_name": "elasticsearch",
"status": "red",
"timed_out": false,
"number_of_nodes": 2,
"number_of_data_nodes": 2,
"active_primary_shards": 8,
"active_shards": 16,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 8,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0
}
First off, you can try reassigning the unassigned_shards, using (see es for more on this):
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands": [
{"allocate": {
"index": "{your_index_name}",
"shard": 3,
"node": "{your_assigning_node_ide}",
"allow_primary": true }
}]
}'
Which shards are unassigned? To see this, use:
curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $0}'
When you know which shards create the problem, you can start by trying to recover the indices, using (indices recovery:
curl -XGET http://localhost:9200/index1,index2/_recovery
I find the grep UNASSIGNED statement particularly useful if, a couple out of a lot, are unassigned. Sometimes it is just easier (of course depending on the ease of refilling you indices), to delete and refill you index, in that case (delete indices) :
curl -XDELETE 'http://localhost:9200/concept_cv,concept_pl,concept_pt/'
Then reinsert your data.
This issue most probably was due to incorrect shutdown from your cluster, possibly also OOM exceptions. For more information on status : red:
https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html
http://elasticsearch-users.115913.n3.nabble.com/how-to-resolve-elasticsearch-status-red-td4020369.html

Resources