Out of memory when exporting a FAT partition using NFS - linux-kernel

After several hours of data transfer to a FAT partition exported by an NFSv3 server, the server has no free memory left.
What happens:
For each NFS Write command, the NFS daemon opens the file, writes the data received and releases the file.
When the FAT partition is mounted with the flush option, there is a call to a function named congestion_wait on each file release. This function can wait for up to 100 ms.
The kernel we are using is version 3.16, and we did not have the problem when we were using version 2.6.37. I discovered that one of the differences between them is that, since version 3.6, the fput function (called by nfsd) uses a work queue to release the file.
The problem is that the NFS daemon may have to process NFS Write commands faster than the FAT file system can release the files.
The work queue can grow until the memory is full.
In our case, the memory fills at 100 MB/hour with a transfer rate of 50 Mbit/s.
I am looking for a way to avoid this issue, and I am thinking of reducing the congestion_wait timeout from 100 ms to 10 ms.
Does anybody know why 100 ms was chosen and whether it is safe to reduce this value?
FYI, the FAT file system flush option was introduced by the commit https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/fat/file.c?id=ae78bf9c4f5fde3c67e2829505f195d7347ce3e4.
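
The backlog argument in the question can be made concrete with a simple worst-case queue model. This is a sketch under stated assumptions, not a description of the kernel's workqueue: it assumes the deferred releases are drained one at a time and that every release waits the full congestion_wait timeout, so at most 1/timeout releases complete per second. The arrival rate of 50 releases per second is hypothetical; the post does not give the actual NFS Write rate.

```python
def backlog_growth_per_second(arrivals_per_s: float, wait_s: float) -> float:
    """Net growth of the pending-release queue (worst-case, single serial worker).

    Assumption: every release blocks for the full timeout `wait_s`,
    so the queue is drained at 1 / wait_s releases per second.
    """
    service_per_s = 1.0 / wait_s
    # The queue only grows when releases arrive faster than they are drained.
    return max(0.0, arrivals_per_s - service_per_s)


def backlog_after(seconds: float, arrivals_per_s: float, wait_s: float) -> float:
    """Number of queued releases after `seconds` under the model above."""
    return backlog_growth_per_second(arrivals_per_s, wait_s) * seconds


if __name__ == "__main__":
    # Hypothetical arrival rate: 50 file releases per second.
    for wait in (0.100, 0.010):
        queued = backlog_after(3600, 50, wait)
        print(f"timeout {wait * 1000:.0f} ms: {queued:.0f} queued releases after 1 h")
```

Under these assumptions, a 10 ms timeout only stops the growth while files are released at fewer than 100 per second; with several concurrent workers, or when congestion_wait returns early, the real drain rate is higher.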

How much space does a Spark Streaming checkpoint take?

I am new to Spark Streaming and have little knowledge about checkpointing. Is the streaming data stored in the checkpoint? Is it stored in HDFS or in memory? How much space will it take?
According to Spark: The Definitive Guide:
The most important operational concern for a streaming application is
failure recovery. Faults are inevitable: you’re going to lose a
machine in the cluster, a schema will change by accident without a
proper migration, or you may even intentionally restart the cluster or
application. In any of these cases, Structured Streaming allows you to
recover an application by just restarting it. To do this, you must
configure the application to use checkpointing and write-ahead logs,
both of which are handled automatically by the engine. Specifically,
you must configure a query to write to a checkpoint location on a
reliable file system (e.g., HDFS, S3, or any compatible filesystem).
Structured Streaming will then periodically save all relevant progress
information (for instance, the range of offsets processed in a given
trigger) as well as the current intermediate state values to the
checkpoint location. In a failure scenario, you simply need to restart
your application, making sure to point to the same checkpoint
location, and it will automatically recover its state and start
processing data where it left off. You do not have to manually manage
this state on behalf of the application—Structured Streaming does it
for you.
I conclude that what is stored in the checkpoint is job progress information and intermediate results, not the data itself; the checkpoint location has to be a path on an HDFS-compatible file system; and the required space depends on the size of the intermediate output generated.
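
To check how much space a checkpoint actually uses, it is enough to total the file sizes under each top-level entry of the checkpoint directory. A minimal sketch for a checkpoint on a local or locally mounted file system; for a checkpoint on HDFS, hdfs dfs -du -h with the checkpoint path gives a similar per-entry breakdown. Entries such as offsets, commits and state are what Structured Streaming usually creates, but treat the exact layout as version dependent. The function name checkpoint_usage is hypothetical.

```python
import os
from pathlib import Path


def checkpoint_usage(root: str) -> dict[str, int]:
    """Return total bytes per top-level entry of a checkpoint directory.

    Hypothetical helper; works on any local directory tree.
    """
    usage: dict[str, int] = {}
    for entry in Path(root).iterdir():
        if entry.is_file():
            # e.g. a top-level metadata file
            usage[entry.name] = entry.stat().st_size
            continue
        total = 0
        # Sum every file below this subdirectory (offsets/, commits/, state/, ...).
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
        usage[entry.name] = total
    return usage
```

Calling checkpoint_usage on a checkpoint path (for example the hypothetical /tmp/query-checkpoint) returns bytes per entry; a large state entry reflects intermediate state rather than stored input data.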

How to archive data stored in HDFS files on another (non-distributed) server?

I have a project folder containing approximately 50 GB of Parquet files on a Hadoop cluster (CDH 5.14), which I need to archive and move to another host (non-distributed, with Windows or Linux). This is only a one-time job; I do not plan to bring the data back to HDFS any time soon, but there should be a way to deploy it back to a distributed file system. What would be the optimal way to do it? Unfortunately, I don't have another Hadoop cluster or a cloud environment where I could place this data.
I would appreciate any hints.
The optimal solution can depend on the actual data (e.g. tables, or many vs. few flat files). If you know how the data got in there, doing the inverse is a logical first step.
For example, if you used put to place the files, consider using get.
If you used NiFi to get it in, try NiFi to get it out.
After the data is on your Linux box, you can use SCP or something like FTP or a mounted drive to move it to the desired computer.
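
As an example of the inverse approach: if the files went in with hdfs dfs -put, they can come out with hdfs dfs -get onto a Linux box and then be packed into a single archive for the transfer. A minimal sketch of the packing step using Python's tarfile module; archive_dir and the paths are hypothetical. Parquet files are usually already compressed, so the sketch writes an uncompressed tar unless asked otherwise.

```python
import tarfile
from pathlib import Path


def archive_dir(src_dir: str, archive_path: str, compress: bool = False) -> list[str]:
    """Pack a local copy of the HDFS folder into one tar file.

    Hypothetical helper. Returns the archived member names as a quick check.
    """
    src = Path(src_dir)
    # Parquet is already compressed, so gzip usually gains little.
    mode = "w:gz" if compress else "w"
    with tarfile.open(archive_path, mode) as tar:
        # arcname keeps paths relative to the folder name, not the absolute local path
        tar.add(src, arcname=src.name)
    # Re-open ("r" detects compression) and list members to verify the archive.
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()
```

To deploy the data back later, extract the archive and upload the folder with hdfs dfs -put; keeping the folder name as the top-level entry makes that a single command.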

NiFi TailFile processor always ingesting full files

My TailFile processor keeps taking the entirety of the file I'm tailing. This creates a situation where a 20 MB file is added to my flow, then about a minute later a 21 MB file, etc. Why is it doing that?
Note that this processor feeds my NiFi flow through MiNiFi: it runs locally on the machine and sends the data through a remote process group to my NiFi instance.
First of all, make sure you have sufficient privileges to read the file and its directory.
Then set Initial Start Position to Beginning of Time.
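
For comparison, a tailing reader is expected to remember a byte offset and emit only what was appended since that offset, never the whole file again. A minimal sketch of that idea; it is not NiFi's implementation, and read_new_bytes is a hypothetical helper whose offset the caller persists between runs (TailFile keeps its own state for this).

```python
from pathlib import Path


def read_new_bytes(path: str, offset: int) -> tuple[bytes, int]:
    """Return the bytes appended since `offset` and the offset to store next.

    Hypothetical helper illustrating offset-based tailing.
    """
    size = Path(path).stat().st_size
    if size < offset:
        # The file shrank: assume it was truncated or rolled over and restart at 0.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)
```

If every run emits the whole file, as in the question, the stored position is effectively not being used or not being read back between runs.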

How do s3n/s3a manage files?

I've been using services like Kafka Connect and Secor to persist Parquet files to S3. I'm not very familiar with HDFS or Hadoop, but it seems like these services typically write temporary files either into local memory or to disk before writing in bulk to S3. Do the s3n/s3a file systems virtualize an HDFS-style file system locally and then push at configured intervals, or is there a one-to-one correspondence between a write to s3n/s3a and a write to S3?
I'm not entirely sure if I'm asking the right question here. Any guidance would be appreciated.
S3A/S3N just implement the Hadoop FileSystem APIs against the remote object store, including pretending it has directories you can rename and delete.
They have historically saved all the data you write to the local disk until you close() the output stream, at which point the upload takes place (which can be slow). This means that you must have as much temporary space as the biggest object you plan to create.
Hadoop 2.8 has a fast upload stream which uploads the file in blocks of 5 MB or more as it is written, then makes it visible in the object store in the final close(). This is measurably faster when generating lots of data in a single stream, and it also avoids needing so much disk space.
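
The difference between the two upload modes can be illustrated with a simplified writer that hands off each full block as soon as it is buffered and only finalizes the object on close(). This is a sketch of the idea, not the S3A code: upload_part and complete are hypothetical callbacks standing in for multipart-upload requests. The 5 MiB default mirrors the S3 rule that every multipart part except the last must be at least 5 MiB.

```python
from typing import Callable

MIN_PART_SIZE = 5 * 1024 * 1024  # S3: every multipart part except the last must be >= 5 MiB


class BlockUploadStream:
    """Simplified block-by-block upload stream (illustration, not the S3A code).

    upload_part(part_number, data) and complete(part_count) are hypothetical
    callbacks standing in for real multipart-upload requests.
    """

    def __init__(self, upload_part: Callable[[int, bytes], None],
                 complete: Callable[[int], None],
                 block_size: int = MIN_PART_SIZE) -> None:
        self._upload_part = upload_part
        self._complete = complete
        self._block_size = block_size
        self._buffer = bytearray()
        self._parts = 0
        self._closed = False

    def write(self, data: bytes) -> None:
        if self._closed:
            raise ValueError("write to closed stream")
        self._buffer.extend(data)
        # Upload every full block as soon as it is available.
        while len(self._buffer) >= self._block_size:
            self._send(bytes(self._buffer[:self._block_size]))
            del self._buffer[:self._block_size]

    def _send(self, block: bytes) -> None:
        self._parts += 1
        self._upload_part(self._parts, block)

    def close(self) -> None:
        if self._closed:
            return
        if self._buffer:
            # Final part may be shorter than block_size.
            self._send(bytes(self._buffer))
            self._buffer.clear()
        self._closed = True
        # Only now does the object become visible in the store.
        self._complete(self._parts)
```

Only about one block is held in memory at a time, which is why this approach needs far less temporary space than buffering the whole object until close().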

dncp_block_verification log file increases size in HDFS

We are using Cloudera CDH 5.3. I am facing a problem wherein the size of "/dfs/dn/current/Bp-12345-IpAddress-123456789/dncp_block_verification.log.curr" and "dncp_block_verification.log.prev" keeps increasing to TBs within hours. Some blogs mention that it is an HDFS bug. A temporary solution to this problem is to stop the DataNode services and delete these files. But we have observed that the log file grows again on either of the DataNodes (even on the same node after deleting it). Thus, it requires continuous monitoring.
Does anyone have a permanent solution to this problem?
One solution, although slightly drastic, is to disable the block scanner entirely by setting the key dfs.datanode.scan.period.hours to 0 in the HDFS DataNode configuration (the default is 504 hours). The downside is that your DataNodes may not auto-detect corrupted block files (they would instead have to wait for a future client reading the block to detect them); this isn't a big deal if your average replication is around 3, but you can consider the change a short-term one until you upgrade to a release that fixes the issue.
Note that this problem will not happen if you upgrade to CDH 5.4.x or a later release, which includes the HDFS-7430 rewrite and associated bug fixes. These changes do away with such a local file, thereby removing the problem.
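
Until an upgrade is possible, the continuous monitoring mentioned in the question can be scripted. A minimal sketch that walks a DataNode data directory (for example /dfs/dn from the question) and reports verification logs above a size threshold; the function name and the threshold are hypothetical, and removing the files still means stopping the DataNode first.

```python
import os

# File-name prefix of the block verification logs (.curr / .prev)
PREFIX = "dncp_block_verification.log"


def find_large_verification_logs(data_dir: str, limit_bytes: int) -> list[tuple[str, int]]:
    """Return sorted (path, size) pairs for verification logs larger than limit_bytes.

    Hypothetical monitoring helper; it only reports, it never deletes.
    """
    hits: list[tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            if name.startswith(PREFIX):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > limit_bytes:
                    hits.append((path, size))
    return sorted(hits)
```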