Because my server may run for a long time, the log file can grow too large. Is there any way to split logs by size or time?
Since you are worried about a large log file, try conditional logging or occasional logging.
You can use the following macros to perform conditional logging:
LOG_IF(INFO, num_cookies > 10) << "Got lots of cookies";
The "Got lots of cookies" message is logged only when the variable num_cookies exceeds 10. If a line of code is executed many times, it may be useful to only log a message at certain intervals. This kind of logging is most useful for informational messages.
LOG_EVERY_N(INFO, 10) << "Got the " << google::COUNTER << "th cookie";
The above line outputs a log message on the 1st, 11th, 21st, ... times it is executed. Note that the special google::COUNTER value is used to identify which repetition is happening.
You can combine conditional and occasional logging with the following macro.
LOG_IF_EVERY_N(INFO, (size > 1024), 10) << "Got the " << google::COUNTER
<< "th big cookie";
Instead of outputting a message every nth time, you can also limit the output to the first n occurrences:
LOG_FIRST_N(INFO, 20) << "Got the " << google::COUNTER << "th cookie";
Outputs log messages for the first 20 times it is executed. Again, the google::COUNTER identifier indicates which repetition is happening.
You can check here for more info
Now I found a way to split logs.
Using a third-party library (e.g. https://github.com/natefinch/lumberjack).
In Ruby code I am running a system call with Open3.popen3 and using the resulting IO objects for stdout and stderr to do some log message formatting before writing to one log file. What would be the best way to do this so that log messages maintain the correct order? Note that I need to format error messages differently from stdout messages.
Here's my current code (assume logger is thread-safe):
Open3.popen3("my_custom_script with_some_args") do |_in, stdout, stderr|
stdout_thr = Thread.new do
while line = stdout.gets.chomp
logger.info(format(:info, line))
end
end
stderr_thr = Thread.new do
while line = stderr.gets.chomp
logger.error(format(:error, line))
end
end
[stdout_thr, stderr_thr].each(&:join)
end
This has worked for me so far, but I'm not so confident that I can guarantee the correct order of the log messages. Is there a better way?
What you're trying to achieve is not possible with a guarantee. The first thing to note is that your code can only order based on the time the data was received, not when it was produced, which is not quite the same. The only way to guarantee ordering would be to do something at the source that adds a guaranteed order across the two streams.
The code below should make it "more likely" to be correct by removing the threads. Assuming that you're using MRI, the two threads can't actually be running Ruby code at the same time anyway, so you're beholden to the scheduler choosing to run your thread at the "right" time.
Open3.popen3("my_custom_script with_some_args") do |_in, stdout, stderr|
  for_reading = [stdout, stderr]
  until for_reading.empty?
    wait_timeout = 1
    # IO.select blocks until one of the streams has something to read
    # or the wait timeout is reached
    readable, _writable, _errors = IO.select(for_reading, [], [], wait_timeout)
    # readable is nil in the case of a timeout - loop back again
    if readable.nil?
      Thread.pass
    else
      # In the case that both streams are readable (and thus have content)
      # read from each of them. In this case, we cannot guarantee any order
      # because we receive the items at essentially the same time.
      # We can still ensure that we don't mix data incorrectly.
      readable.each do |stream|
        buffer = ''
        # keep reading until there is an EOF (read returns nil)
        # or there is no more data to read right now (:wait_readable)
        loop do
          chunk = stream.read_nonblock(4096, exception: false)
          if chunk.nil?
            # stream is at EOF - nothing more to read on that one
            for_reading -= [stream]
            break
          elsif chunk == :wait_readable || chunk.empty?
            # nothing more to read right now...
            # continue on to process the buffer into lines and log them
            break
          end
          buffer << chunk
        end
        if stream == stdout
          buffer.split("\n").each { |line| logger.info(format(:info, line)) }
        elsif stream == stderr
          buffer.split("\n").each { |line| logger.error(format(:error, line)) }
        end
      end
    end
  end
end
Note that in a system generating a lot of output in a very short period of time, there is more likely to be an overlap where things get out of order. This likelihood increases with the amount of time taken to read a stream and process it, so it is best to do the absolute minimum of processing inside the loop. If the formatting (and writing) is expensive, consider moving it into a separate thread that reads from a single queue, and have the code inside the loop only push the buffer (and a source identifier) onto the queue.
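As a rough illustration of that last suggestion, here is a minimal sketch (it reuses the logger and format helpers from the question; the queue protocol and the nil stop signal are my own assumptions, not part of the original answer):

require 'thread'

log_queue = Queue.new

# Single consumer thread: does the (potentially expensive) formatting and
# writing, so the select loop above only has to push raw buffers.
log_thread = Thread.new do
  loop do
    item = log_queue.pop
    break if item.nil?                 # nil is used here as the stop signal
    source, buffer = item
    severity = (source == :stderr ? :error : :info)
    buffer.split("\n").each { |line| logger.send(severity, format(severity, line)) }
  end
end

# Inside the select loop, instead of formatting and logging directly:
#   log_queue.push([stream == stdout ? :stdout : :stderr, buffer])
#
# After the until loop finishes:
log_queue.push(nil)
log_thread.join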
My test code is as below:
std::string strLogPath = "d:/logtest/";
google::InitGoogleLogging("test1");
FLAGS_log_dir = strLogPath;
FLAGS_stderrthreshold = google::GLOG_INFO;
FLAGS_minloglevel = google::GLOG_INFO;
//FLAGS_colorlogtostderr = true;
std::string strLogPath1 = "d:/logtest/L";
google::SetLogDestination(google::GLOG_INFO, strLogPath1.c_str());
google::SetLogDestination(google::GLOG_ERROR, strLogPath1.c_str());
google::SetLogDestination(google::GLOG_WARNING, strLogPath1.c_str());
google::SetLogDestination(google::GLOG_FATAL, strLogPath1.c_str());
LOG(INFO) << "infoinfo";
Sleep(1000);
LOG(WARNING) << "wwwww";
LOG(WARNING) << "wwwww";
LOG(ERROR) << "eeeeee";
Sleep(2000);
//LOG(FATAL) << "ffffff";
LOG(WARNING) << "wwwww";
LOG(WARNING) << "wwwww";
LOG(WARNING) << "wwwww";
google::ShutdownGoogleLogging();
I got two log files: one file contains all messages (INFO, WARNING and ERROR), and the other contains all WARNING and ERROR messages, but no INFO. This is quite different from my expectation. I want all messages in one file, and I don't want the WARNING and ERROR messages to appear twice in different files.
It would be highly appreciated if someone could tell me the solution.
Thanks a lot in advance.
You probably already have a solution for your problem but I will post this anyway for other users.
glog is not designed for logging in one file as far as I know. Therefore you can't do what you want using only glog.
However there are other solutions for your problem.
First: Write your own little logging library. This isn't very complicated and is a great programming exercise ;)
Second: For *nix only. Activate logging to stderr in glog using the logtostderr flag and redirect stderr to the desired log file.
FLAGS_logtostderr=1;
LOG(INFO) << "Info";
LOG(WARNING) << "Warning";
LOG(ERROR) << "Error";
and on the shell: ./MyProg 2>logFile
Last: Keep everything as it is and delete the log files you don't need, either within your program using C/C++ or with a call to system().
I have a log file which contains the following data:
2015-07-06 11:07:24 +0522 [ERROR]
2015-07-06 11:07:29 +0522 [ERROR] index=healthe-int-legacy host=kdatamap.abc.com com.rp.keplar.collector.CollectorException: Could not process additional data, connection lost to data collector service
I want to store the data in different sections, like date, time, index value, and error-related information such as 'Could not process additional data, connection lost to data collector service', in a database. How do I parse it so that I can easily store it in the DB? Please guide me.
You want to read up on the really powerful File and String classes.
Consider this rather quick hack:
aFile = File.new("/your/file.dat")
aFile.each_line { |line|
  arr = line.split
  print "date = " + arr[0] + "\n"
  print "time = " + arr[1] + "\n"
  print "index = " + arr[4].split('=')[1]
}
It does not take into account that the file might not exist or that the lines might be aligned differently. Have a look at regular expressions for a more robust (but unfortunately harder to read) matching algorithm.
Basic I/O is described at http://ruby-doc.com/docs/ProgrammingRuby/html/tut_io.html.
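If you go the regular-expression route, a sketch might look like this (the pattern, field names, and file path are assumptions based on the two sample lines above, not a definitive parser; wire the resulting hash into whatever DB layer you use):

# Assumed line format, based on the samples:
#   <date> <time> <tz> [<LEVEL>] [index=<...>] [host=<...>] [<message>]
LINE_RE = /\A(?<date>\S+)\s+(?<time>\S+)\s+(?<tz>\S+)\s+\[(?<level>\w+)\]\s*(?<rest>.*)\z/

File.foreach("/your/file.dat") do |line|
  m = LINE_RE.match(line.chomp)
  next unless m                                  # skip lines that don't match

  rest    = m[:rest]
  index   = rest[/index=(\S+)/, 1]               # nil when the field is absent
  host    = rest[/host=(\S+)/, 1]                # nil when the field is absent
  message = rest.sub(/\A(index=\S+\s*)?(host=\S+\s*)?/, '')

  # These values can now be bound to an INSERT statement for your DB.
  puts({ date: m[:date], time: m[:time], level: m[:level],
         index: index, host: host, message: message }.inspect)
end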
I use the Spyder IDE. Usually, when I am running non-parallelized scripts, I tend to debug using print statements. Depending on which statements are printed (or not), I can see where errors are occurring.
For example:
print "Started while loop..."
doWhileLoop = False
while doWhileLoop == True:
print "Doing something important!"
time.sleep(5)
print "Finished while loop..."
Above, I am missing a line that changes doWhileLoop to False at some point, so I will be stuck in the while loop forever, but my print statements let me see where in my code I have hung up.
However, when running scripts that are parallelized, I get no output to the console until after the process has finished. Normally, what I do in this case is attempt to debug with a single process (i.e. temporarily deparallelize the program by running only one task, for instance), but currently, I am dealing with an error that seems to occur only when I am running more than one task.
So, I am having trouble figuring out what this error is using my usual methods -- how should I change my usual debugging practice in order to efficiently debug scripts employing multiprocessing?
Like #roippi said, debugging parallel things is hard. Another tool is using logging over print. Logging gives you severity, timestamps, and most importantly which process is doing something.
Example code:
import logging, multiprocessing, Queue
def myproc(arg):
return arg*2
def worker(inqueue, outqueue):
mylog = multiprocessing.get_logger()
mylog.info('start')
for job in iter(inqueue.get, 'STOP'):
mylog.info('got %s', job)
try:
outqueue.put( myproc(job), timeout=1 )
except Queue.Full:
mylog.error('queue full!')
mylog.info('done')
def executive(inqueue):
total = 0
mylog = multiprocessing.get_logger()
for num in iter(inqueue.get, 'STOP'):
total += num
mylog.info('got {}\ttotal{}', job, total)
logger = multiprocessing.log_to_stderr(
level=logging.INFO,
)
logger.info('setup')
inqueue, outqueue = multiprocessing.Queue(), multiprocessing.Queue()
if 0: # debug 'queue full!' issues
outqueue = multiprocessing.Queue(maxsize=1)
# prefill with 3 jobs
for num in range(3):
inqueue.put(num)
# signal end of jobs
inqueue.put('STOP')
worker_p = multiprocessing.Process(
target=worker, args=(inqueue, outqueue),
name='worker',
)
worker_p.start()
worker_p.join()
logger.info('done')
Example output:
[INFO/MainProcess] setup
[INFO/worker] child process calling self.run()
[INFO/worker] start
[INFO/worker] got 0
[INFO/worker] got 1
[INFO/worker] got 2
[INFO/worker] done
[INFO/worker] process shutting down
[INFO/worker] process exiting with exitcode 0
[INFO/MainProcess] done
[INFO/MainProcess] process shutting down
I saw an article which suggests the following code for a writer:
output = open("my_pipe", "w+") # the w+ means we don't block
output.puts "hello world"
output.flush # do this when we're done writing data
and a reader:
input = open("my_pipe", "r+") # the r+ means we don't block
puts input.gets # will block if there's nothing in the pipe
But could it happen that open, puts, or gets blocks the program? Is there some kind of timeout in place? Can one change it? Also, how come w+ means a non-blocking call? Which open system call flags is it converted to?
Okay, let me share with you my picture of the world. As rogerdpack said, there are two options: 1) using select in blocking mode, 2) using non-blocking mode (the O_NONBLOCK flag and the read_nonblock, write_nonblock, and select methods). I haven't tried it, so this is just speculation.
As to why open, puts and gets may block the thread: the open call blocks until there is at least one reader and at least one writer, and that must be the reason why we need to specify r+ or w+ for the open call. Judging from strace output, they are both converted to the O_RDWR flag. Then there must be some buffer where not-yet-received data is stored, and that must be the reason why write methods may block. Read methods may block because they expect more data to be available than there really is.
Update:
If a process attempts to read from an empty pipe, then read(2) will block until data is available. If a process attempts to write to a full pipe (see below), then write(2) blocks until sufficient data has been read from the pipe to allow the write to complete.
-- http://linux.die.net/man/7/pipe
The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.
Under Linux, opening a FIFO for read and write will succeed both in blocking and nonblocking mode. POSIX leaves this behavior undefined. This can be used to open a FIFO for writing while there are no readers available.
-- http://linux.die.net/man/7/fifo
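That is also why the article's writer snippet opens with "w+": the O_RDWR open succeeds immediately on Linux even with no reader attached. A tiny illustration (assuming my_pipe is an existing FIFO, e.g. created with mkfifo, and that this runs on Linux):

# "w" (O_WRONLY) would block here until a reader opens the FIFO;
# "w+" (O_RDWR) returns immediately on Linux even with no reader.
output = open("my_pipe", "w+")
output.puts "hello world"   # small writes land in the kernel pipe buffer
output.flush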
And here's the implementation I came up with:
#!/home/yuri/.rbenv/shims/ruby
require 'timeout'

data = ((0..15).to_a.map { |v|
  (v < 10 ? '0'.ord + v : 'a'.ord + v - 10).chr
} * 4096 * 2).reduce('', :+)

timeout = 10
start = Time.now
open('1.fifo', File::WRONLY | File::NONBLOCK) { |out|
  out.flock(File::LOCK_EX)
  nwritten = 0
  data_len = data.length
  begin
    delta = out.write_nonblock data
    data = data[delta..-1]
    nwritten += delta
  rescue IO::WaitWritable, Errno::EINTR
    timeout_left = timeout - (Time.now - start)
    if timeout_left < 0
      puts Time.now - start
      raise Timeout::Error
    end
    IO.select nil, [out], nil, timeout_left
    retry
  end while nwritten < data_len
}
puts Time.now - start
But for my problem at hand I decided to ignore the timeout handling. It will probably suffice to handle just the situation where there is no reader on the other end of the pipe (Errno::ENXIO):
open('1.fifo', File::WRONLY | File::NONBLOCK) { |out|
  out.flock(File::LOCK_EX)
  nwritten = 0
  data_len = data.length
  begin
    delta = out.write_nonblock data
    data = data[delta..-1]
    nwritten += delta
  rescue IO::WaitWritable, Errno::EINTR
    IO.select nil, [out]
    retry
  end while nwritten < data_len
}
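The Errno::ENXIO case itself isn't shown above; a minimal sketch of one way to handle it (the retry-with-sleep policy is just an illustration, not part of the original code):

# Opening a FIFO write-only and non-blocking raises Errno::ENXIO while
# no reader has the other end open; here we simply wait and retry.
begin
  out = open('1.fifo', File::WRONLY | File::NONBLOCK)
rescue Errno::ENXIO
  sleep 0.1
  retry
end
# ...proceed with the write loop above using out, then out.close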
P.S. Your feedback is appreciated.
This page should answer all your questions... http://www.ruby-doc.org/core-2.0.0/IO.html
In general, puts can always block the current thread, since it may have to wait for the IO to complete before it returns. gets can also block the current thread because it will keep reading until it hits the first newline, and then it returns everything it read. HTH.
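For instance, a quick sketch of gets blocking on a pipe until a full line shows up (the one-second delay is just for illustration):

r, w = IO.pipe
Thread.new { sleep 1; w.puts "hello"; w.close }
# gets blocks until a newline (or EOF) is available on the read end,
# so this line waits roughly one second before printing "hello".
puts r.gets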