I have a Ruby script that uses the rsolr rubygem to generate XML and POST it to Apache Solr's update handler, served by Jetty. My script logs timing information using the following code:
unless docs.empty?
  begin
    log.info("Adding to solr")
    response = solr.add(docs)
    log.info("#{(id_2*100.0)/last_id}% Done")
    if response['responseHeader']['status'] != 0
      log.fatal("Document ids not sent")
      #log.fatal(Solr::Request::AddDocument.new(docs_single).to_s)
      log.close
      exit
    end
    log.info("#{Time.now.to_f - starttime}s to feed Solr. #{id_1} to #{id_2}")
  rescue Exception => e
    log.fatal("Document ids not sent => ")
    #log.fatal(Solr::Request::AddDocument.new(docs_single).to_s)
    #log.fatal(docs)
    log.close
    exit
  end
end
The generated log looks like this:
I, [2011-10-09T15:03:42.617048 #30092] INFO -- : Executing - SELECT * FROM solr_feeddata_2 WHERE id >= 5879999 AND id < 5881999
I, [2011-10-09T15:03:44.086661 #30092] INFO -- : External Data1 fetch time: 1.45462989807129
I, [2011-10-09T15:03:44.109514 #30092] INFO -- : External Data2 fetch time: 0.0226790904998779
I, [2011-10-09T15:03:44.109611 #30092] INFO -- : 1.49255704879761s to fetch details from database. 5879999 to 5881999
I, [2011-10-09T15:03:44.109702 #30092] INFO -- : Adding data1, data2, building docs
I, [2011-10-09T15:03:45.912603 #30092] INFO -- : 3.29554414749146s to build documents. 5879999 to 5881999
I, [2011-10-09T15:03:45.912730 #30092] INFO -- : Adding to solr
I, [2011-10-09T15:04:24.797620 #30092] INFO -- : 61.180855194502% Done
I, [2011-10-09T15:04:24.797744 #30092] INFO -- : 42.180694103241s to feed Solr. 5879999 to 5881999
According to this log, Solr took (42.18 - 3.29 - 1.49 - 2) ≈ 35.4s to respond.
At the same time my Solr log for this particular update goes like
INFO: {add=[5879999, 5880000, 5880001, 5880002, 5880003, 5880004, 5880005, 5880007, ... (1468 adds)]} 0 5780
Oct 9, 2011 3:04:24 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/update params={wt=ruby} status=0 QTime=5780
Oct 9, 2011 3:04:42 PM org.apache.solr.update.processor.LogUpdateProcessor finish
This clearly shows that Solr took 5.78s to add the docs, initiate sending the response, and close the log updater.
Both services run on the same machine, on the same network, and their ping summary is:
rtt min/avg/max/mdev = 0.008/0.010/0.022/0.006 ms
This pattern is clearly visible for every batch processed. Despite my best efforts to get to the bottom of this, I have not been able to find the reason for this behavior.
My Solr mergeFactor is 10, autoCommit is off.
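One way to narrow down where the extra ~35s goes (a sketch under assumptions, not from the original script) is to time the full rsolr call and a raw HTTP POST of an equivalent payload separately; the update URL and xml_payload below are placeholders for whatever the script actually uses:

require 'benchmark'
require 'net/http'
require 'uri'

# Time the full rsolr call: document serialization + HTTP POST + response parsing.
add_time = Benchmark.realtime { solr.add(docs) }
log.info("solr.add took #{add_time}s")

# Time only the HTTP round trip, using a pre-built XML payload for the same batch
# (xml_payload is assumed to be the <add><doc>...</doc></add> XML the script generates).
uri  = URI.parse('http://localhost:8983/solr/update')
http = Net::HTTP.new(uri.host, uri.port)
post_time = Benchmark.realtime do
  http.post(uri.path, xml_payload, 'Content-Type' => 'text/xml')
end
log.info("raw POST took #{post_time}s")

If solr.add is slow while the raw POST roughly matches Solr's QTime, the time is being spent on the Ruby side (building or parsing), not in Solr itself.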
I created a Kafka cluster with 3 brokers, with the following setup:
Created 3 topics, each one with replication factor=3 and partitions=2.
Created 2 producers each one writing to one of the topics.
Created a Streams application to process messages from 2 topics and write to the 3rd topic.
It was all running fine till now but I suddenly started getting the following warning when starting the Streams application:
[WARN ] 2018-06-08 21:16:49.188 [Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1] org.apache.kafka.clients.consumer.internals.Fetcher [Consumer clientId=Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1-restore-consumer, groupId=] Attempt to fetch offsets for partition Stream3-KSTREAM-OUTEROTHER-0000000005-store-changelog-0 failed due to: Disk error when trying to access log file on the disk.
Due to this warning, the Streams application is not processing anything from the 2 topics.
I tried the following things:
Stopped all brokers, deleted kafka-logs directory for each broker and restarted the brokers. It didn't solve the issue.
Stopped zookeeper and all brokers, deleted zookeeper logs as well as kafka-logs for each broker, restarted zookeeper and brokers and created the topics again. This too didn't solve the issue.
I am not able to find anything related to this error in the official docs or on the web. Does anyone have an idea why I am suddenly getting this error?
EDIT:
Out of the 3 brokers, 2 brokers (broker-0 and broker-2) continuously emit these logs:
Broker-0 logs:
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-2 logs:
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-1 shows the following logs:
[2018-06-09 01:21:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:31:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:39:44,667] ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2018-06-09 01:41:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
I again stopped ZooKeeper and the brokers, deleted their logs, and restarted. As soon as I create the topics again, I start getting the above logs.
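For reference, the topics are recreated with a command along these lines (an illustrative sketch; the script path and the --zookeeper flag correspond to pre-2.2 Kafka releases, and the ZooKeeper address is the one shown below):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic initial11_topic

(and similarly for initial12_topic and final11_topic).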
Topic details:
[zk: localhost:2181(CONNECTED) 3] get /brokers/topics/initial11_topic
{"version":1,"partitions":{"1":[1,0,2],"0":[0,2,1]}}
cZxid = 0x53
ctime = Sat Jun 09 01:25:42 EDT 2018
mZxid = 0x53
mtime = Sat Jun 09 01:25:42 EDT 2018
pZxid = 0x54
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 4] get /brokers/topics/initial12_topic
{"version":1,"partitions":{"1":[2,1,0],"0":[1,0,2]}}
cZxid = 0x61
ctime = Sat Jun 09 01:25:47 EDT 2018
mZxid = 0x61
mtime = Sat Jun 09 01:25:47 EDT 2018
pZxid = 0x62
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 5] get /brokers/topics/final11_topic
{"version":1,"partitions":{"1":[0,1,2],"0":[2,0,1]}}
cZxid = 0x48
ctime = Sat Jun 09 01:25:32 EDT 2018
mZxid = 0x48
mtime = Sat Jun 09 01:25:32 EDT 2018
pZxid = 0x4a
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
Any clue?
I found the issue. It was due to the following incorrect config in broker-1's server.properties:
advertised.listeners=PLAINTEXT://10.23.152.109:9094
The port in broker-1's advertised.listeners had mistakenly been changed to the same port as broker-2's advertised.listeners.
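In other words, each broker must advertise its own unique port. A corrected layout would look something like this (broker-2's 9094 is implied by the mix-up described above; the other two ports are only examples, and all brokers are assumed to advertise the same host):

# broker-0 server.properties (example port)
advertised.listeners=PLAINTEXT://10.23.152.109:9092
# broker-1 server.properties (example port; must differ from the other brokers)
advertised.listeners=PLAINTEXT://10.23.152.109:9093
# broker-2 server.properties (the port broker-1 had mistakenly copied)
advertised.listeners=PLAINTEXT://10.23.152.109:9094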
Below is my requirement:
Input:
0104919 ,08476,48528,2016,2016-08-29
00104919 ,08476,48528,2016,2016-09-05
00104919 ,08476,48528,2016,2016-09-12
00104919 ,08476,48528,2017,2016-08-29
Output after join should be:
2,00104919 ,08476,48528,2016,2016-09-05,2016-09-12
3,00104919 ,08476,48528,2016,2016-09-12,2016-08-29
Below is my code:
TABL = LOAD '/TABL/part-r-00000' using PigStorage('~') AS (a,b,c,d,e,f);
pre_Q1 = FOREACH TABL generate a,b,c,d,e;
DIST = DISTINCT pre_Q1;
ORDR = ORDER DIST BY *;
Q1 = rank ORDR;
Q2 = FOREACH Q1 GENERATE rank_ORDR + 1 AS rank_Q2, a, b, c, d, e;
Q_join = join Q2 by (rank_Q2, a, b, c, d), Q1 by (rank_ORDR, a, b, c, d);
C = limit Q_join 100;
dump C;
I am getting the below error. Can someone point out what might be causing it?
Failed Jobs:
JobId Alias Feature Message Outputs
job_1474127474437_528208 C,Q2,Q_join HASH_JOIN Message: Job failed!
Input(s):
Successfully read 5235587 records (1516199217 bytes) from: "/TABL/part-r-00000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1474127474437_528166 -> job_1474127474437_528185,
job_1474127474437_528185 -> job_1474127474437_528190,
job_1474127474437_528190 -> job_1474127474437_528204,
job_1474127474437_528204 -> job_1474127474437_528206,
job_1474127474437_528206 -> job_1474127474437_528208,
job_1474127474437_528208 -> null,
null
2017-01-04 04:02:37,407 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,569 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,729 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,887 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2017-01-04 04:02:37,945 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C
Details at logfile: /var/log/gphd/pig/pig.log
Try modifying the first line as below:
TABL = LOAD '/TABL/part-r-00000' using PigStorage(',') AS (a,b,c,d,e,f);
And watch out for the trailing space at the end of column a; it may affect the join!
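If the trailing space is indeed the culprit, one way to normalize it (a sketch, assuming TRIM is available as a Pig builtin and the field can be cast to chararray) is to trim column a right after loading, before the DISTINCT/rank/join:

TABL   = LOAD '/TABL/part-r-00000' using PigStorage(',') AS (a,b,c,d,e,f);
-- cast to chararray and strip leading/trailing whitespace so the join keys compare cleanly
pre_Q1 = FOREACH TABL GENERATE TRIM((chararray)a) AS a, b, c, d, e;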
Given the following regex
/(500 Internal Server Error)/
How do we match only the 1st occurrence of this pattern in a single string that has many repeats of the same match?
For example:
Match 1
1. 500 Internal Server Error
Match 2
1. 500 Internal Server Error
Match 3
1. 500 Internal Server Error
How do we get 1. as our only answer?
Sample text for the match as follows (one big string on purpose)
[SoapUIMultiThreadedHttpConnectionManager$SoapUIDefaultClientConnection] Receiving response: HTTP/1.1 500 Internal Server Error12:09:26,638 INFO [SoapUIProTestCaseRunner] Assertion [Match content of [question]] has status VALID12:09:26,638 DEBUG [SoapUIMultiThreadedHttpConnectionManager$SoapUIDefaultClientConnection] Connection 0.0.0.0:41494<->23.6.55.1:80 shut down12:09:26,638 DEBUG [SoapUIMultiThreadedHttpConnectionManager$SoapUIDefaultClientConnection] Connection 0.0.0.0:41494<->23.6.55.1:80 closed12:09:26,638 INFO [SoapUIProTestCaseRunner] Finished running SoapUI testcase [Test_fieldsParameter], time taken: 384ms, status: FINISHED12:09:26,640 INFO [SoapUIProTestCaseRunner] Assertion [Match content of [errorCode]] has status VALID12:09:26,641 INFO [SoapUIProTestCaseRunner] Assertion [Match content of [message]] has status VALID12:09:26,641 INFO [SoapUIProTestCaseRunner] Assertion [Valid HTTP Status Codes] has status VALID12:09:26,641 INFO [SoapUIProTestCaseRunner] running step [OtherSortBy]12:09:26,643 DEBUG [HttpClientSupport$SoapUIHttpClient] Stale connection check12:09:26,645 DEBUG [HttpClientSupport$SoapUIHttpClient] Attempt 1 to execute request12:09:26,646 DEBUG [SoapUIMultiThreadedHttpConnectionManager$SoapUIDefaultClientConnection] Sending request: GET /api/review/v1/questions?prodId=570043&_sortby=other HTTP/1.112:09:26,666 DEBUG [SoapUIMultiThreadedHttpConnectionManager$SoapUIDefaultClientConnection] Receiving response: HTTP/1.1 500 Internal Server Error12:09:26,667 DEBUG [SoapUIMultiThreadedHttpConnectionManager$SoapUIDefaultClientConnection] Connection 0.0.0.0:41475<->23.6.55.1:80 shut down12:09:26,667 DEBUG
Please have a look; this proves the string captured is the first occurrence:
str = "Match 1\n1. 500 Internal Server Error\nMatch 2\n1. 500 Internal Server Error\nMatch 3\n1. 500 Internal Server Error"
re = /(500 Internal Server Error)/
mdata = re.match(str)
puts mdata.begin(1)
puts mdata
Output of a sample program proving it is the first occurrence (index of 12 in the input string):
12
500 Internal Server Error
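As a side note (a small sketch of equivalent one-liners, not part of the original answer, using the same str and re as above), Ruby's other single-match APIs also return only the first occurrence:

str[re]        # => "500 Internal Server Error" -- the first occurrence only
str =~ re      # => character index of the first occurrence (nil if there is none)
str.index(re)  # => the same index, as an explicit method call

To get every occurrence instead, String#scan is the method that walks the whole string.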
I have a Sinatra app running on Rainbows.
I log the following:
before do
  logger.info("#{Process.pid} #{Time.now} #{request.ip} #{request.path_info} #{params.to_s}")
end
and
after do
logger.info("#{Process.pid} #{Time.now} #{request.ip} #{request.path_info} #{params.to_s} => #{response.headers['X-API-Status']} (#{response.successful?})")
end
and in my logs I can read:
25988 2012-11-13 11:57:52 +0100 192.168.90.1 /req {"u"=>"810000027"}
25988 2012-11-13 11:57:59 +0100 192.168.90.1 /req {"u"=>"810000027"} => 200 (true)
192.168.90.1 - - [13/Nov/2012 11:57:59] "POST /req HTTP/1.1" **200** 14 7.5862
25988 2012-11-13 11:57:59 +0100 192.168.90.1 /req {"u"=>"810000027"}
25988 2012-11-13 11:57:59 +0100 192.168.90.1 /req {"u"=>"810000027"} => 200 (true)
192.168.90.1 - - [13/Nov/2012 11:57:59] "POST /req HTTP/1.1" 200 14 0.0223
E, [2012-11-13T11:58:04.099913 #25875] ERROR -- : worker=2 PID:**25988** timeout (12s > 11s), killing
E, [2012-11-13T11:58:04.106428 #25875] ERROR -- : reaped #<Process::Status: pid 25988 SIGKILL (signal 9)> worker=2
My worker (pid 25988) is being killed as if it had not responded to the first request... but it obviously has! It even handled another request afterwards (and I use the Base concurrency model -> no concurrency).
My Rainbows configuration is:
Rainbows! do
timeout(10)
end
listen(3000)
pid('/tmp/rainbows.pid')
stderr_path('/var/log/rainbows.log')
stdout_path('/var/log/rainbows.log')
working_directory('/opt/app')
worker_processes(4)
Do you have any idea what is happening, or how I could investigate further?
Thanks!
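One way to investigate further (a hypothetical sketch, not part of the original app) is to log how long each request actually occupies the worker by pairing the before and after filters:

before do
  # Sinatra runs before/after filters and the route in the same instance,
  # so an instance variable set here is visible in the after filter.
  @request_started_at = Time.now
end

after do
  elapsed = Time.now - @request_started_at
  logger.info("#{Process.pid} #{request.path_info} occupied the worker for #{'%.3f' % elapsed}s")
end

If the logged time stays well under the configured Rainbows timeout while the worker is still reaped, the time is being spent outside the request body, which is consistent with the keep-alive explanation below.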
Actually, the issue lay in the request being "kept alive" too long by the client (Flash). Apparently there is no way to properly close the TCP connection in AS3...
I fixed my issue with:
Rainbows! do
keepalive_timeout(0)
end
which seems appropriate for me anyway.
When I run Apache Bench I get results like:
Command: abs.exe -v 3 -n 10 -c 1 https://mysite
Connection Times (ms)
min mean[+/-sd] median max
Connect: 203 213 8.1 219 219
Processing: 78 177 88.1 172 359
Waiting: 78 169 84.6 156 344
Total: 281 389 86.7 391 564
I can't seem to find the definition of Connect, Processing and Waiting. What do those numbers mean?
By looking at the source code we find these timing points:
apr_time_t start, /* Start of connection */
connect, /* Connected, start writing */
endwrite, /* Request written */
beginread, /* First byte of input */
done; /* Connection closed */
And when the request is done, some timings are stored as:
s->starttime = c->start;
s->ctime = ap_max(0, c->connect - c->start);
s->time = ap_max(0, c->done - c->start);
s->waittime = ap_max(0, c->beginread - c->endwrite);
And the 'Processing time' is later calculated as
s->time - s->ctime;
So if we translate this to a timeline:
t1: Start of connection
t2: Connected, start writing
t3: Request written
t4: First byte of input
t5: Connection closed
Then the definitions would be:
Connect: t1-t2 Most typically the network latency
Processing: t2-t5 Time to receive full response after connection was opened
Waiting: t3-t4 Time-to-first-byte after the request was sent
Total time: t1-t5
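As a quick sanity check against the mean column in the question: Connect 213 ms + Processing 177 ms = 390 ms, which matches the reported Total of 389 ms up to rounding; Waiting (169 ms) falls inside Processing, as the timeline above implies.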
From http://chestofbooks.com/computers/webservers/apache/Stas-Bekman/Practical-mod_perl/9-1-1-ApacheBench.html:
Connect and Waiting times
The amount of time it took to establish the connection and get the first bits of a response
Processing time
The server response time—i.e., the time it took for the server to process the request and send a reply
Total time
The sum of the Connect and Processing times
I equate this to:
Connect time: the amount of time it took for the socket to open
Processing time: first byte + transfer
Waiting: time till first byte
Total: Sum of Connect + Processing
Connect: Time it takes to connect to remote host
Processing: Total time minus time it takes to connect to remote host
Waiting: Response first byte receive minus last byte sent
Total: From before connect till after the connection is closed