Connect 3+ OpenDaylight controllers to mininet topology - bundle

I have created a cluster following this guide:
https://docs.opendaylight.org/en/stable-magnesium/getting-started-guide/clustering.html
How can I verify that it is working?
Also, is it possible to connect this cluster (all 3 controllers) to a single mininet topology, or can that not be done?
EDIT
Why are not all bundles active? Is that going to cause a problem?

I'm not sure whether you can specify multiple controllers on the mininet
command line, but it's worth a try. Otherwise, as explained in this post, you
can set up the controllers in a mininet .py config file.
There are many ways to verify that the cluster is working, but you can try some
REST calls to check the status of things. We have some examples in the upstream
CSIT tests. If you install the feature odl-jolokia, you can send a GET to:
jolokia/read/org.opendaylight.controller:Category=Shards,name=member-1-shard-default-config,type=DistributedConfigDatastore
This checks the status of the default shard of the config datastore. You'll get
some output like this:
content={
"request": {
"mbean": "org.opendaylight.controller:Category=Shards,name=member-1-shard-default-config,type=DistributedConfigDatastore",
"type": "read"
},
"status": 200,
"timestamp": 1588524930,
"value": {
"AbortTransactionsCount": 0,
"CommitIndex": 70,
"CommittedTransactionsCount": 0,
"CurrentTerm": 7,
"FailedReadTransactionsCount": 0,
"FailedTransactionsCount": 0,
"FollowerInfo": [],
"FollowerInitialSyncStatus": true,
"InMemoryJournalDataSize": 33,
"InMemoryJournalLogSize": 1,
"LastApplied": 70,
"LastCommittedTransactionTime": "1970-01-01 00:00:00.000",
"LastIndex": 70,
"LastLeadershipChangeTime": "2020-05-03 16:54:45.034",
"LastLogIndex": 70,
"LastLogTerm": 7,
"LastTerm": 7,
"Leader": "member-2-shard-default-config",
"LeadershipChangeCount": 1,
"PeerAddresses": "member-3-shard-default-config: akka.tcp://opendaylight-cluster-data@10.30.170.119:2550/user/shardmanager-config/member-3-shard-default-config, member-2-shard-default-config: akka.tcp://opendaylight-cluster-data@10.30.170.113:2550/user/shardmanager-config/member-2-shard-default-config",
"PeerVotingStates": "member-3-shard-default-config: true, member-2-shard-default-config: true",
"PendingTxCommitQueueSize": 0,
"RaftState": "Follower",
"ReadOnlyTransactionCount": 0,
"ReadWriteTransactionCount": 0,
"ReplicatedToAllIndex": 69,
"ShardName": "member-1-shard-default-config",
"SnapshotCaptureInitiated": false,
"SnapshotIndex": 69,
"SnapshotTerm": 7,
"StatRetrievalError": null,
"StatRetrievalTime": "557.3 \u03bcs",
"TxCohortCacheSize": 0,
"VotedFor": "member-2-shard-default-config",
"Voting": true
}
}
There is a lot of info there, but RaftState says Follower, so you know this
node is one of the two followers. One node will be the leader (here member-2,
according to the Leader field).
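As a rough sketch of how that response could be checked automatically, the Python below pulls the cluster-relevant fields out of an already-decoded Jolokia response. The function name summarize_shard and the shape of its return value are made up for illustration; fetching the JSON is left out because the host, port, and credentials depend on your deployment.

```python
def summarize_shard(response):
    """Extract the cluster-relevant fields from a Jolokia shard read response."""
    if response.get("status") != 200:
        raise ValueError("Jolokia returned status %r" % response.get("status"))
    value = response["value"]
    # PeerAddresses is one string of comma-separated "name: address" pairs.
    peers = [
        entry.split(":", 1)[0].strip()
        for entry in value.get("PeerAddresses", "").split(",")
        if entry.strip()
    ]
    leader = value.get("Leader")
    return {
        "shard": value["ShardName"],
        "raft_state": value["RaftState"],
        "leader": leader,
        "peers": peers,
        # A shard with no elected leader cannot commit transactions.
        "has_leader": bool(leader),
    }


# Trimmed copy of the response shown above.
sample = {
    "status": 200,
    "value": {
        "ShardName": "member-1-shard-default-config",
        "RaftState": "Follower",
        "Leader": "member-2-shard-default-config",
        "PeerAddresses": (
            "member-3-shard-default-config: akka.tcp://opendaylight-cluster-data"
            "@10.30.170.119:2550/user/shardmanager-config/member-3-shard-default-config, "
            "member-2-shard-default-config: akka.tcp://opendaylight-cluster-data"
            "@10.30.170.113:2550/user/shardmanager-config/member-2-shard-default-config"
        ),
    },
}
print(summarize_shard(sample))
```

Running the same check against each of the three members should show the same leader everywhere, with exactly one member reporting itself as the leader.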
Another thing we check is syncstatus to make sure it's "true". Use this
URI:
jolokia/read/org.opendaylight.controller:Category=ShardManager,name=shard-manager-operational,type=DistributedOperationalDatastore
example output


Slack's files list API gives warning max_page_limit

I am using the API below, listing 200 files per page.
https://slack.com/api/files.list?count=200&page={{pageNumber}}
I have 60000 files in my Slack account, so the first API call returned 200 files with a pagination response like the one below.
"paging": {
"count": 200,
"total": 60000,
"page": 1,
"pages": 300
}
We continue fetching files by increasing the page number in the API query parameter (2, 3, 4, ...).
https://slack.com/api/files.list?count=200&page=2
"paging": {
"count": 200,
"total": 60000,
"page": 2,
"pages": 300
}
When we reached page number 101, the page value in the paging response became 1, with the warning max_page_limit. Can't we list all files with the same pagination approach, or does the Slack files.list API only allow listing up to page 100? We didn't find anything in the Slack documentation for this use case. Any help with this issue would be much appreciated.
https://slack.com/api/files.list?count=200&page=101
"paging": {
"count": 200,
"total": 60000,
"page": 1,
"pages": 300,
"warnings": [
"max_page_limit"
]
}
Here is the reply I got from the Slack forum:
There is indeed a page limit of 100 pages on files.list. I've contacted the documentation team to add this detail to the documentation for the method. You should be able to get your 60000 files with a higher count of 600 though.
There are other ways to filter down the expected number of results. For example, you could specify a time period for file creation date using the ts_from and ts_to arguments and do batches of calls within specified time periods, or batch your searches by channel by passing the channel argument. These techniques should always allow you to keep a batch within 100,000 files, as 1000 would be the maximum accepted count.
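To make the arithmetic in that reply concrete, here is a small Python sketch. It only computes how many files one pagination run can reach and splits a time range into windows that could be passed as ts_from and ts_to; it makes no HTTP calls. The helper names and the window size are illustrative assumptions, and since I have not checked how the API treats a file created exactly on a window boundary, deduplicating results by file id would be a sensible precaution.

```python
def max_reachable_files(count, page_limit=100):
    """Files reachable in one pagination run: pages allowed times files per page."""
    return count * page_limit


def time_windows(ts_start, ts_end, window_seconds):
    """Yield (ts_from, ts_to) pairs that cover ts_start..ts_end without gaps."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    start = ts_start
    while start < ts_end:
        end = min(start + window_seconds, ts_end)
        yield start, end
        start = end


# count=200 reaches only 20000 of the 60000 files before the 100-page limit.
print(max_reachable_files(200))
# count=600 reaches 60000, and the maximum count of 1000 reaches 100000.
print(max_reachable_files(600), max_reachable_files(1000))
# One window per 30 days over roughly a year of Unix timestamps.
for ts_from, ts_to in time_windows(1546300800, 1577836800, 30 * 24 * 3600):
    print(ts_from, ts_to)
```

Each window would then be paged through separately, so no single run needs more than 100 pages.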

Elasticsearch cluster health intermittently flaps between 'GREEN' and 'YELLOW'

We are running a 7 node cluster with "ZERO" replicas, like this:
{
"cluster_name": "my_cluster",
"status": "green",
"timed_out": false,
"number_of_nodes": 7,
"number_of_data_nodes": 7,
"active_primary_shards": 3325,
"active_shards": 3325,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
The Elasticsearch cluster state changes from "Green" to "Yellow" intermittently. Another interesting thing I noticed is that shard initialization takes place during these intermittent state changes, which correlates with them. Is this due to the cluster running with "ZERO" replicas?
What could cause the above behavior?
1. Find the affected indices with
http://IP_MASTER:9200/_cat/indices?v
2. Find the node holding the shard of that index that keeps switching between assigned and unassigned:
http://IP_MASTER:9200/_cat/shards?v
3. Restart the elasticsearch service on that node.
If the problem persists, you have two options.
A. Lucene CheckIndex (checks only that shard; note that -exorcise drops corrupt segments, so the documents in them are lost):
java -cp lucene-core*.jar -ea:org.apache.lucene… org.apache.lucene.index.CheckIndex /mnt/nas/elasticsearch/graylog-production/nodes/0/indices/graylog_92/0/index/ -verbose -exorcise
If it says it cannot find the segments, locate that path, cd into it, and run the command from there.
B. Elasticsearch index fix (checks every index and is very slow):
index.shard.check_on_startup: fix
Set this in elasticsearch.yml on that node.
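Step 2 can be scripted. The Python sketch below filters the text returned by _cat/shards?v down to the shards that are not in the STARTED state. It assumes the default column order (index, shard, prirep, state, ...), and the sample rows are invented for illustration rather than taken from a real cluster.

```python
def non_started_shards(cat_shards_text):
    """Return (index, shard, prirep, state) for every shard that is not STARTED."""
    rows = []
    for line in cat_shards_text.splitlines():
        parts = line.split()
        # Skip blank lines and the header row printed by ?v.
        if len(parts) < 4 or parts[:4] == ["index", "shard", "prirep", "state"]:
            continue
        index, shard, prirep, state = parts[:4]
        if state != "STARTED":
            rows.append((index, int(shard), prirep, state))
    return rows


# Invented sample output; unassigned or initializing rows may lack trailing columns.
sample_output = """\
index      shard prirep state        docs store ip       node
graylog_92 0     p      INITIALIZING
graylog_92 1     p      STARTED      1200 3.1mb 10.0.0.5 node-3
graylog_91 0     p      STARTED       900 2.2mb 10.0.0.6 node-4
"""
print(non_started_shards(sample_output))
```

Polling this every few seconds while the status flaps should show which index and shard keep re-initializing.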

ElasticSearch v1.4 Recover Lost data

After restarting the cluster I lost my data. How can I recover it?
ElasticSearch Version 1.4
{
"cluster_name": "mycluster",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 113,
"active_shards": 113,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 110
}
"status": "yellow" means that not all the replicas could be assigned (actually "unassigned_shards": 110), but all your primary shards and thus all your data is there: "active_primary_shards": 113,
Why are the replicas not allocated? Because you only have a single node in your cluster ("number_of_nodes": 1), and a replica is never placed on the same node as its primary. So either only one node is up, or there are multiple nodes that couldn't form a cluster (in which case the logs should have more details).
Generally, your Elasticsearch version is ancient and a lot has been improved around resiliency. If you value your data, start planning an upgrade.
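The reading of the health output above can be written down as a small Python sketch. The function name explain_health and the wording of the messages are illustrative; the green, yellow, and red meanings follow the usual Elasticsearch definitions.

```python
def explain_health(health):
    """Turn a _cluster/health response (as a dict) into a one-line explanation."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    primaries = health.get("active_primary_shards", 0)
    nodes = health.get("number_of_nodes", 0)
    if status == "green":
        return "all primary and replica shards are allocated"
    if status == "yellow":
        message = "all %d primaries are active, %d replica shard(s) unassigned" % (
            primaries, unassigned)
        if nodes == 1:
            # A replica is never allocated on the node that holds its primary.
            message += "; a single node cannot host replicas of its own primaries"
        return message
    if status == "red":
        return "at least one primary shard is not allocated (%d unassigned)" % unassigned
    raise ValueError("unexpected status: %r" % status)


# Values taken from the health output in the question.
print(explain_health({
    "status": "yellow",
    "number_of_nodes": 1,
    "active_primary_shards": 113,
    "unassigned_shards": 110,
}))
```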

NoShardAvailableException after starting the Elasticsearch.bat

I started elasticsearch.bat and indexed my first documents using the NEST
ElasticClient.Index call.
Then I made my first query using
var results = ElasticClient.Search<Product>(body =>
body.Query(query =>
query.QueryString(qs => qs.Query(key))));
This is all I have done. Later I restarted the Elasticsearch console using elasticsearch.bat, and now it keeps giving me the error NoShardAvailableException. I deleted and re-downloaded elasticsearch.bat, and I keep getting the same error. How can I resolve it?
I am using version 1.7.1, and by the way I also installed the Marvel plugin.
Your problem is not related to the version, so updating will not resolve the issue. The issue is that shards cannot be assigned to nodes, as shown by your cluster health call; see "status": "red" and "unassigned_shards": 8:
{
"cluster_name": "elasticsearch",
"status": "red",
"timed_out": false,
"number_of_nodes": 2,
"number_of_data_nodes": 2,
"active_primary_shards": 8,
"active_shards": 16,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 8,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0
}
First off, you can try reassigning the unassigned_shards, using (see es for more on this):
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands": [
{"allocate": {
"index": "{your_index_name}",
"shard": 3,
"node": "{your_assigning_node_ide}",
"allow_primary": true }
}]
}'
Which shards are unassigned? To see this, use:
curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $0}'
Once you know which shards are causing the problem, you can start by inspecting the recovery status of their indices (indices recovery):
curl -XGET http://localhost:9200/index1,index2/_recovery
I find the grep UNASSIGNED statement particularly useful when only a few shards out of many are unassigned. Sometimes it is just easier (depending, of course, on how easily you can refill your indices) to delete and refill the index; in that case (delete indices):
curl -XDELETE 'http://localhost:9200/concept_cv,concept_pl,concept_pt/'
Then reinsert your data.
This issue was most probably caused by an unclean shutdown of your cluster, possibly also by OOM exceptions. For more information on status: red:
https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html
http://elasticsearch-users.115913.n3.nabble.com/how-to-resolve-elasticsearch-status-red-td4020369.html

Golang: healthd and healthtop of the library "gocraft/health"

I'm using gocraft/health to check the health of my service and to collect metrics for each endpoint.
I'm using the JSON polling sink to get the metrics.
sink := health.NewJsonPollingSink(time.Minute*5, time.Minute*5)
stream.AddSink(sink)
I want to use healthtop and healthd; they explain how here (Link).
I set the environment variables as they said: export HEALTHD_MONITORED_HOSTPORTS=:5001 HEALTHD_SERVER_HOSTPORT=:5002 healthd
After that they say "Now you can run it", but how? They don't give any command to do it, and I don't really understand what they mean.
I navigated to src/github.com/gocraft/health/cmd/healthd and found main.go. When I run it, I get this in the console:
[openrtb@sd-69536 healthd]$ go run main.go
[2015-06-17T23:04:20.871743758Z]: job:general event:starting kvs:[health_host_port::5002 monitored_host_ports::5001,:5002 server_host_port::5002]
[2015-06-17T23:04:20.87810814Z]: job:poll status:success time:4 ms kvs:[host_port::5002]
[2015-06-17T23:04:20.881896459Z]: job:poll status:success time:8 ms kvs:[host_port::5001]
[2015-06-17T23:04:20.882338024Z]: job:recalculate status:success time:231 μs
[2015-06-17T23:04:23.275370787Z]: job:recalculate status:success time:6 μs
[2015-06-17T23:04:30.875230839Z]: job:poll status:success time:1573 μs kvs:[host_port::5002]
[2015-06-17T23:04:30.881415193Z]: job:poll status:success time:7 ms kvs:[host_port::5001]
.
.
but I get no result on these endpoints:
localhost:5002/jobs: Lists top jobs
localhost:5002/hosts: Lists all monitored hosts and their statuses
They gave me {"error": "not_found"}.
The exception is localhost:5002/health, where I got this JSON response:
{
"instance_id": "sd-69536.1291",
"interval_duration": 3600000000000,
"aggregations": [
{
"interval_start": "2015-06-18T01:00:00+02:00",
"serial_number": 48,
"jobs": {
"general": {
"timers": {},
"events": {
"starting": 1
},
"event_errs": {},
"count": 0,
"nanos_sum": 0,
"nanos_sum_squares": 0,
"nanos_min": 0,
"nanos_max": 0,
"count_success": 0,
"count_validation_error": 0,
"count_panic": 0,
"count_error": 0,
"count_junk": 0
},
"poll": {
"timers": {},
"events": {},
"event_errs": {},
"count": 24,
"nanos_sum": 107049159,
"nanos_sum_squares": 6.06770682813009e+14,
"nanos_min": 1581783,
"nanos_max": 8259442,
"count_success": 24,
"count_validation_error": 0,
"count_panic": 0,
"count_error": 0,
"count_junk": 0
},
"recalculate": {
"timers": {},
"events": {},
"event_errs": {},
"count": 23,
"nanos_sum": 3501601,
"nanos_sum_squares": 6.75958305123e+11,
"nanos_min": 70639,
"nanos_max": 290877,
"count_success": 23,
"count_validation_error": 0,
"count_panic": 0,
"count_error": 0,
"count_junk": 0
}
},
"timers": {},
"events": {
"starting": 1
},
"event_errs": {}
}
]
}
but I have no idea what this result means, because it has no relation to my
localhost:5001/health endpoint, which it should normally aggregate, as they said.
What you downloaded is a binary, so you can simply invoke it as healthd if you're in the correct directory. They actually provide this example:
HEALTHD_MONITORED_HOSTPORTS=:5020 HEALTHD_SERVER_HOSTPORT=:5032 healthd
This isn't setting environment variables so much as invoking healthd with those two values (export or something similar would be required to persist the change beyond that one command). healthtop states more clearly what it is, but as you can see from their paths, they're both commands: gocraft/health/cmd/healthtop. They have several examples of using healthtop from bash; they're less explicit about healthd, but it works the same way.
If you ran that command (as you show in your question), you may want to try healthtop jobs or something to that effect. I don't know much about this project and don't care to research it, but from what I can tell healthd is just a service that collects results from various /health endpoints and makes them available in one API. It seems they intend for you to use healthtop on top of it to view reports.
Also note this:
Great! To get a sense of the type of data healthd serves, you can manually navigate to:
/jobs: Lists top jobs
/aggregations: Provides a time series of aggregations
/aggregations/overall: Squishes all time series aggregations into one aggregation.
/hosts: Lists all monitored hosts and their statuses.
However, viewing raw JSON is just to give you a sense of the data. See the next section...
I'm not sure what the host is (localhost:5032 if you're running locally?), but you should be able to go to localhost:5032/jobs and see that healthd is running and doing something. Also check your apps to confirm they're up and running. Don't expect any output from healthd directly; that's what healthtop is for.
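As for what the /health JSON in the question means, here is a hedged Python sketch that turns one job entry into a mean and standard deviation. It assumes, based only on the field names, that nanos_sum is the total run time in nanoseconds over count runs and that nanos_sum_squares is the sum of the squared run times; I have not verified this against the gocraft/health source. The function name job_timing is made up.

```python
import math


def job_timing(job):
    """Mean and standard deviation (in ms) for one entry of the "jobs" object."""
    count = job["count"]
    if count == 0:
        return {"count": 0, "mean_ms": None, "stddev_ms": None}
    mean = job["nanos_sum"] / count
    # Population variance from the running sums: E[x^2] - (E[x])^2.
    variance = max(job["nanos_sum_squares"] / count - mean * mean, 0.0)
    return {
        "count": count,
        "mean_ms": mean / 1e6,
        "stddev_ms": math.sqrt(variance) / 1e6,
    }


# The "poll" job from the response in the question.
poll = {
    "count": 24,
    "nanos_sum": 107049159,
    "nanos_sum_squares": 6.06770682813009e+14,
}
print(job_timing(poll))  # mean is about 4.46 ms per poll
```

Under that reading, the response in the question describes healthd's own jobs (general, poll, recalculate), which would explain why nothing from the service on :5001 shows up in it.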
