Reading a message from a Chronicle Queue does not move the current index to the next cycle automatically

Reading a message from a Chronicle Queue does not move the current index to the next cycle automatically. I get the following log message:
697917 [SCHEDULER#4] INFO net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts - Rolled 2 times to find the next cycle file. This can occur if you appenders have not written anything for a while, leaving the cycle files with a gap.
What does this mean?
My queue files are: 20160824.cq4, 20160826.cq4, 20160829.cq4, 20160830.cq4. The 20160825.cq4 file does not exist, because there was no data added to the queue.

Can you clarify what error you are getting?
We have tests which show skipping missing cycles works both forwards and backwards.
"What does this mean?"
You have explained it yourself here:
"The 20160825.cq4 file does not exist, because there was no data added to the queue."
which is why a cycle is skipped and you see:
"Rolled 2 times to find the next cycle file. This can occur if you appenders have not written anything for a while, leaving the cycle files with a gap."

This is just FYI, since I can't comment as I just signed up. I was seeing a similar issue, though not with empty journals between days, just a gap of multiple days. I was getting a false return from ExcerptTailer.readDocument and noticed that the index hopped by a number greater than 2^32 between my last index and my first, indicating a cycle shift. I only switched versions from
compile 'net.openhft:chronicle-queue:4.5.5'
to
compile 'net.openhft:chronicle-queue:4.5.27'
and the issue was resolved. As I am just in a prototyping phase, I have not had a need to keep up to date, so I was quite far behind. I hope this helps; it does indeed look to be resolved, for my case anyway.

Related

ClickHouse log shows hash of uncompressed files doesn't match

The ClickHouse logs frequently print error messages like the one below:
2021.01.07 00:55:24.112567 [ 6418 ] {} <Error> vms.analysis_data (7056dab3-3677-455b-a07a-4d16904479b4):
Code: 40, e.displayText() = DB::Exception: Checksums of parts don't match:
hash of uncompressed files doesn't match (version 20.11.4.13 (official build)).
Data after merge is not byte-identical to data on another replicas. There could be several reasons:
1. Using newer version of compression library after server update.
2. Using another compression method.
3. Non-deterministic compression algorithm (highly unlikely).
4. Non-deterministic merge algorithm due to logical error in code.
5. Data corruption in memory due to bug in code.
6. Data corruption in memory due to hardware issue.
7. Manual modification of source data after server startup.
8. Manual modification of checksums stored in ZooKeeper.
9. Part format related settings like 'enable_mixed_granularity_parts' are different on different replicas.
We will download merged part from replica to force byte-identical result.
We use the same version (20.11.4.13) and the same compression method (LZ4) for all data nodes in the production environment, and we do not modify the data files or the values stored in ZooKeeper.
So my questions are:
How was the error caused? Furthermore, in which cases will the ClickHouse server throw these exceptions?
Is there a checksum-checking mechanism among the replicas during the merging of parts?
I also found that in one of our data nodes there are many folders named like "ignored_20201208_23116_23116_0" in the detached folder. Were these files the corrupted data caused by the problem referred to above?
Thanks.
You need to upgrade all nodes to 20.11.6.6 ASAP.
The reason for these errors is a serious bug related to AIO.
The ignored_ parts are not related; you can remove them.
gtranslate: Inactive parts are not deleted immediately, because fsync is not called when a new part is written, i.e. for some time the new part exists only in the server's RAM (OS cache). So if the server reboots unexpectedly, the new (merged) part can be lost or damaged. During startup, ClickHouse checks the integrity of the parts; if it detects a problem with a merged part, it returns the inactive parts to the active list and merges them again later. In that case, the broken part is renamed (the prefix broken_ is added) and moved to the detached folder. If the integrity check finds no problems with the merged part, the original inactive parts are renamed (the prefix ignored_ is added) and moved to the detached folder.

NiFi: how to avoid copying files that are partially written

I am trying to use NiFi to get a file from an SFTP server. The file can potentially be big, so my question is how to avoid fetching the file while it is still being written. I am planning to use ListSFTP + FetchSFTP, but I am also okay with GetSFTP if it can avoid copying partially written files.
Thank you.
In addition to Andy's solid answer, you can also be a bit more flexible with the ListSFTP/FetchSFTP processor pair by doing some metadata-based routing.
After ListSFTP, each flowfile will have attributes such as 'file.lastModifiedTime' and others. You can read about them here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.ListSFTP/index.html
You can put a RouteOnAttribute processor between the List and Fetch processors to detect objects that, at least based on the reported last modified time, are 'too new'. You could route those to a processor that is just a slow pass-through, to intentionally wait a bit, and then run them back through the first router until they are 'old enough'. Admittedly, this is a power-user approach, but it gives you a lot of flexibility and control. It is not foolproof, as the source system may not report the last modified time correctly, and a recent modification time may not mean the source file is done being written, etc. But it gives you additional options if you cannot do the definitely correct thing that Andy describes.
If you have control over the process which writes the file, a common pattern to solve this is to initially write the file with a specific naming convention, such as beginning with a dot (.). After the write operation completes successfully, the file is renamed without the leading dot and is then picked up by the processor. Both GetSFTP and ListSFTP have a processor property called Ignore Dotted Files, which is set to true by default and means those processors will not operate on or return files beginning with the dot character.
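As a generic illustration of that write-then-rename pattern (plain Java NIO, not SFTP-specific), a producer-side sketch; the directory and file names are hypothetical:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class WriteThenRename {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/data/outgoing");     // hypothetical drop directory
        Path hidden = dir.resolve(".report.csv");   // dotted name: ignored by the consumer
        Path finalName = dir.resolve("report.csv"); // visible name: picked up by the consumer

        Files.write(hidden, "col1,col2\n1,2\n".getBytes()); // write the full contents first

        // Rename only after the write has completed, so the consumer never sees a
        // half-written file. ATOMIC_MOVE keeps the rename atomic where supported.
        Files.move(hidden, finalName, StandardCopyOption.ATOMIC_MOVE);
    }
}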
There is a minimum file age property you can use. The last modification time gets updated as the file is being written, so setting this value to something other than 0 will help fix the problem.
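The idea behind a minimum file age is simply to skip anything modified too recently. A rough sketch of that check, independent of NiFi (the 30-second threshold is arbitrary):

import java.io.File;

public class MinimumFileAgeFilter {
    private static final long MIN_AGE_MILLIS = 30_000; // arbitrary 30-second threshold

    // True only if the file has not been modified for MIN_AGE_MILLIS, which is a
    // reasonable (though not bulletproof) sign that writing has finished.
    static boolean oldEnough(File f) {
        return System.currentTimeMillis() - f.lastModified() >= MIN_AGE_MILLIS;
    }
}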

Incrementally reading logs

I've looked around with numerous search strings but can't find anything quite like this:
I'm writing a custom log parser (à la analog or webalizer, except not for a webserver) and I want to be able to skip the hard work for the lines that have already been parsed. I have thought about using a history file like webalizer does, but I have no idea how it actually works internally, and my C is pretty poor.
I've considered hashing each line and writing the hashes out, then parsing the history file for their presence, but I think this will perform poorly.
The only other method I can think of is storing the line number of the last parse and skipping lines until that number is reached the next time round. I am not sure what happens when the log is rotated, though.
Any other ideas would be appreciated. I will be writing the parser in Ruby, but tips in a similar language will help as well.
The solutions I can think of right now are bound to be brittle.
Even if you store the line number and later realize it would be past the length of the current file, what happens if old lines have been trimmed? You would start reading (well) after the last position.
If, on the other hand, you are sure your log files won't be tampered with and they will only be rotated, I only see two ways of doing what you want, and I'm not sure the second is applicable to you.
Anyway, here goes.
First solution
You store the last line you parsed along with a timestamp. At the next run, you consider all the rotated log files, sorting them by their last-modified date, figure out which one you read last time, and start reading from there.
I haven't thought this through fully; there might be odd corner cases you will need to handle.
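A rough sketch of that first idea in Java, assuming the rotated files sit next to the active log and share its name prefix (the "app.log" prefix is made up):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RotatedLogs {
    // Lists every file whose name starts with the (made-up) "app.log" prefix,
    // oldest first, so the parser can find where it stopped last time and resume.
    static List<Path> rotationsOldestFirst(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> p.getFileName().toString().startsWith("app.log"))
                    .sorted(Comparator.comparingLong((Path p) -> p.toFile().lastModified()))
                    .collect(Collectors.toList());
        }
    }
}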
Second solution
You create a background script that continuously watches the log file. A quick search on Google turned up this gem, but I'm not sure whether that's even an option for you. Even then, you might want to integrate this solution with the previous one, in case your daemon gets interrupted (because that's clearly bound to happen at some point).
As you read the file and parse the lines, keep track of the byte count. Save that. On the next read, try to seek to that byte offset in the file. If the file is smaller than the saved byte count, it's a new file, so start at the beginning.
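A minimal sketch of that byte-offset idea, shown in Java here (Ruby's IO#seek and File.size support the same approach); the offset-file and log names are placeholders:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IncrementalLogReader {
    public static void main(String[] args) throws IOException {
        Path offsetFile = Paths.get("parser.offset"); // where the last byte count is saved (placeholder)
        Path log = Paths.get("app.log");              // the log being parsed (placeholder)

        long offset = Files.exists(offsetFile)
                ? Long.parseLong(new String(Files.readAllBytes(offsetFile)).trim())
                : 0L;

        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "r")) {
            // If the file is shorter than the saved offset, it was rotated: start over.
            if (raf.length() < offset) {
                offset = 0L;
            }
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                // parse(line); only lines written since the last run reach this point
            }
            // Remember how far we got for the next run.
            Files.write(offsetFile, Long.toString(raf.getFilePointer()).getBytes());
        }
    }
}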

Pascal External:SIGSEGV

I have a program that uses linked lists. It crashes with External:SIGSEGV when it gets to
new(R); R := queue; queue := queue.Next; dispose(R);
where I'm getting rid of the first element of the queue list; it crashes after dispose(R). What's even weirder: when I change it to queue := queue.Next, that is, just moving forward without disposing of the element, it still crashes after this statement. It's worth mentioning that the value of queue.Next is nil at this point. And when I tried just queue := nil; it crashed too, leaving me absolutely puzzled. Can somebody help me?
Edit: I've uploaded the whole code here; the relevant line is no. 128.
The problem was that I was dereferencing the pointer later, and Lazarus wasn't able to backtrack to that position, so it pointed at the line where the pointer was set to nil.

What can lead to failures in appending data to a file?

I maintain a program that is responsible for collecting data from a data acquisition system and appending that data to a very large (size > 4GB) binary file. Before appending data, the program must validate the header of this file in order to ensure that the meta-data in the file matches that which has been collected. In order to do this, I open the file as follows:
data_file = fopen(file_name, "rb+");
I then seek to the beginning of the file in order to validate the header. When this is done, I seek to the end of the file as follows:
_fseeki64(data_file, _filelengthi64(data_file), SEEK_SET);
At this point, I write the data that has been collected using fwrite(). I am careful to check the return values from all I/O functions.
One of the computers (Windows 7, 64-bit) on which we have been testing this program intermittently shows a condition where the data appears to have been written to the file, yet neither the file's last-changed time nor its size changes. If any of the calls to fopen(), fseek(), or fwrite() fail, my program throws an exception, which aborts the data collection process and logs the error. On this machine, none of these failures seem to be occurring. Something that makes the matter even more mysterious is that, if a restore point is set on the host file system, the problem goes away only to reappear intermittently at some future time.
We have tried to reproduce this problem on other machines (a Vista 32-bit operating system) but have had no success in replicating the issue (this doesn't necessarily mean anything, since the problem is so intermittent in the first place).
Has anyone else encountered anything similar to this? Is there a potential remedy?
Further Information
I have now found that the failure occurs when fflush() is called on the file, and that the Win32 error returned by GetLastError() is 665 (ERROR_FILE_SYSTEM_LIMITATION). Searching Google for this error leads to a bunch of reports related to "extents" for SQL Server files. I suspect that there is some sort of journaling resource limit that the file system is reporting, and that this is because we are growing a large file by opening it, appending a chunk of data, and closing it. I am now looking for a better understanding of this particular error, in the hope of coming up with a valid remedy.
The file append is failing because of a file system fragmentation limit. The question was answered in What factors can lead to Win32 error 665 (file system limitation)?

Resources