Solr ate all Memory and throws -bash: cannot create temp file for here-document: No space left on device on Server - bash

I have been started solr for long time approx 2 weeks then I saw that Solr ate around 22 GB from 28 GB RAM of my Server.
While checking status of Solr, using bin/solr -i it throws -bash: cannot create temp file for here-document: No space left on device
I stopped the Solr, and restarted the solr. It is working fine.
What's the problem actually. Didn't get?
And what is the solution for that?
I never want that Solr gets stop/halt while running.

First you should check the space on your file system. For example using df -h. Post the output here.
Is there any mount-point without free space?
2nd: find out the reason, why there is no space left. Your question handles two different thing: no space left on file system an a big usage of RAM.
Solr stores two different kind of data: the search index an the data himself.
Storing the data is only needed, if you like to output the documents after finding them in index. For example if you like to use highlighting. So take a look at your schema.xml an decide for every singe field, if it must be stored or if "indexing" the field is enough for your needs. Use the stored=true parameter for that.
Next: if you rebuild the index: keep in mind, that you need double space on disc during index rebuild.
You also could think about to move your index/data files to an other disk.
If you have solved you "free space" problem on disc, so you probably don't have an RAM issue any more.
If there is still a RAM problem, please post you java start parameter. There you can define, how much RAM is available for Solr. Solr needs a lot of virtual RAM, but an moderate size of physical RAM.
And: you could post the output of your logfile.

Related

Write to and read from free disk space using Windows API

Is it possible to write to free clusters on disk or read data from them using Windows APIs? I found Defrag API: https://learn.microsoft.com/en-gb/windows/desktop/FileIO/defragmenting-files
FSCTL_GET_VOLUME_BITMAP can be used to obtain allocation state of each cluster, FSCTL_MOVE_FILE can be used to move clusters around. But I couldn't find a way of reading data from free clusters or writing data to them.
Update: one of the workarounds which comes to mind is creating a small new file, writing some data to it, then relocating it to desired position and deleting the file (the data will remain in freed cluster). But that still doesn't solve reading problem.
What I'm trying to do is some sort of transparent cache, so user could still use his NTFS partition as usual and still see these clusters as free space, but I could store some data in them. Data safety is not of concern, it can be overwritten by user actions and will just be regenerated / redownloaded later when clusters become free again.
There is no easy solution in this way.
First of all, you should create own partition of the drive. It prevents from an accidental access to your data from OS or any process. Then call CreateFileA() with name of the partition. You will get raw access to the data. Please bear in mind that the function will fail for any partition accessed by OS.
You can perform the same trick with a physical drive too.
The docs
One way could be to open the volume directly via using CreateFile with the volumes UNC path as filename arguement (e.g.: \\.\C:).
You now can directly read and write to the volume.
So you maybe can achieve your desired goal with:
get the cluster size in bytes with GetDiskFreeSpace
get the map of free clusters with DeviceIoControl and FSCTL_GET_VOLUME_BITMAP
open the volume with CreateFile with its UNC path \\.\F:
(take a careful look into the documentation, especially the Remarks sections part about opening drives and volumes)
seek to the the offset of a free cluster (clusterindex * clusterByteSize) by using SetFilePointer
write/read your data with WriteFile/ReadFile on the handle, retreived by above CreateFile
(Also note that read/write access has to be sector aligned, otherwise the ReadFile/WriteFile calls fail)
Please note:
this is only meant as a starting point for your own research. This is not a bullet proof cooking receipt.
Backup your data before messing with the file system!!!
Also keep in mind that the free cluster bitmap will be outdated as soon as you get it (especially if using the system volume).
So I would strongly advise against use of such techniques in production or customer environments.

Creating an image of a drive cloned with ddrescue.

We have an old server with disk failures that we've tried to clone into VMSphere. This resulted in an error from what that error came from we couldn't pin point.
With ddrescue we cloned the machine to a 2TB external hard drive that we can use to lab around with without having any downtime.
Then we used normal dd to try to create an image that we then later could convert or insert into the virtual environment.
Problem is that we have don't have any workstations that are able to handle a 2TB file. Is there any way that we can create an image of the drive with the partitions, data and mbr? Basically everything except for the unallocated space.
You could try using "dd". If you have some idea how big the OS and data partitions were, just chop that much plus a bit extra off the front of the image and save in a new file. Say you guess it was 4GB of OS and 8GB of data, do something like this:
dd if=yourimage of=newsmallerimage bs=1024k count=14000
which will get you around 14GB. Any Virtual Machine will likely ignore any blank space at the end anyway.

How does file recovery software work?

I wanted to make some simple file recovery software, where I want to try to recover files which happen to have been deleted by pressing Shift + Delete. I'm working in Windows, can anyone show me any links or documents which can help me to do so programatically? I know C, C++, .NET. Any pointers?
http://www.google.hu/search?q=file+recovery+theory&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a :)
Mainly file recoveries are looking for file headers and/or filenames in the disk as I know, then try to get the whole file by the header information.
This could be a good start: http://geeksaresexy.blogspot.com/2006/02/theory-behind-deleted-files-recovery.html
The principle of all recovery tools is that deleting a file only removes a pointer in a folder and (quick) formatting of a partition only rewrites the first sectors of the partition which contains the headers of the filesystem. An in depth analysis of the partition data (at sector level) can rebuild a big part of the filesystem data, cluster allocation tables, folders, and file cluster chains.
All course if you use a surface test tool while formatting the partition that will rewrite all sectors to make sure that they are correct, nothing will be recoverable - unless you use specialized hardware to look at remanent magnetism on the edges of the actual tracks
In windows when a file is deleted(permanent delete) it's not actually deleted from disk but the file name added with char( _ I guess) in front of it and windows ignores these when showing in explorer... and recovery tools will search these kind of file names in the disk. And your file recover integrity based on some data over written on location of deleted file. Don't know this pattern still used by windows.. but long time back I have read this some where

Page error 0xc0000006 with VC++

I have a VS 2005 application using C++ . It basically importing a large XML of around 9 GB into the application . After running for more than 18 hrs it gave an exception 0xc0000006 In page error. THe virtual memory consumed is 2.6 GB (I have set the 3GB) flag.
Does any one have a clue as to what caused this error and what could be the solution
Instead of loading the whole file into the memory you can use SAX parsers to load only a part of the file to the memory.
9Gb seems overly large to read in. I would say that even 3Gb is too large in one go.
Is your OS 64bit?
What is the maximum pagefile size set to?
How much RAM do you have?
Were you running this in debug or release mode?
I would suggest that you try to reading the XML in smaller chunks.
Why are you trying to read in such a large file in one go?
I would imagine that your application took so long to run before failing as it started to copy the file into virtual memory, which is basically a large file on the hard disk. Thus the OS is reading the XML from the disk and writing it back onto a different area of disk.
**Edit - added text below **
Having had a quick peek at Expat XML parser it does look as if you're running into problems with stack or event handling, most likely you are adding too much to the stack.
Do you really need 3Gb of data on the stack? At a guess I would say that you are trying to process a XML database file, but I can't imagine that you have a table row that is so large.
I think that really you should use it to search for key areas and discard what is not wanted.
I know nothing other than what I have just read about Expat XML Parser but would suggest that you are not using it in the most efficient manner.

Are there alternatives for creating large container files that are cross platform?

Previously, I asked the question.
The problem is the demands of our file structure are very high.
For instance, we're trying to create a container with up to 4500 files and 500mb data.
The file structure of this container consists of
SQLite DB (under 1mb)
Text based xml-like file
Images inside a dynamic folder structure that make up the rest of the 4,500ish files
After the initial creation the images files are read only with the exception of deletion.
The small db is used regularly when the container is accessed.
Tar, Zip and the likes are all too slow (even with 0 compression). Slow is subjective I know, but to untar a container of this size is over 20 seconds.
Any thoughts?
As you seem to be doing arbitrary file system operations on your container (say, creation, deletion of new files in the container, overwriting existing files, appending), I think you should go for some kind of file system. Allocate a large file, then create a file system structure in it.
There are several options for the file system available: for both Berkeley UFS and Linux ext2/ext3, there are user-mode libraries available. It might also be possible that you find a FAT implementation somewhere. Make sure you understand the structure of the file system, and pick one that allows for extending - I know that ext2 is fairly easy to extend (by another block group), and FAT is difficult to extend (need to append to the FAT).
Alternatively, you can put a virtual disk format yet below the file system, allowing arbitrary remapping of blocks. Then "free" blocks of the file system don't need to appear on disk, and you can allocate the virtual disk much larger than the real container file will be.
Three things.
1) What Timothy Walters said is right on, I'll go in to more detail.
2) 4500 files and 500Mb of data is simply a lot of data and disk writes. If you're operating on the entire dataset, it's going to be slow. Just I/O truth.
3) As others have mentioned, there's no detail on the use case.
If we assume a read only, random access scenario, then what Timothy says is pretty much dead on, and implementation is straightforward.
In a nutshell, here is what you do.
You concatenate all of the files in to a single blob. While you are concatenating them, you track their filename, the file length, and the offset that the file starts within the blob. You write that information out in to a block of data, sorted by name. We'll call this the Table of Contents, or TOC block.
Next, then, you concatenate the two files together. In the simple case, you have the TOC block first, then the data block.
When you wish to get data from this format, search the TOC for the file name, grab the offset from the begining of the data block, add in the TOC block size, and read FILE_LENGTH bytes of data. Simple.
If you want to be clever, you can put the TOC at the END of the blob file. Then, append at the very end, the offset to the start of the TOC. Then you lseek to the end of the file, back up 4 or 8 bytes (depending on your number size), take THAT value and lseek even farther back to the start of your TOC. Then you're back to square one. You do this so you don't have to rebuild the archive twice at the beginning.
If you lay out your TOC in blocks (say 1K byte in size), then you can easily perform a binary search on the TOC. Simply fill each block with the File information entries, and when you run out of room, write a marker, pad with zeroes and advance to the next block. To do the binary search, you already know the size of the TOC, start in the middle, read the first file name, and go from there. Soon, you'll find the block, and then you read in the block and scan it for the file. This makes it efficient for reading without having the entire TOC in RAM. The other benefit is that the blocking requires less disk activity than a chained scheme like TAR (where you have to crawl the archive to find something).
I suggest you pad the files to block sizes as well, disks like work with regular sized blocks of data, this isn't difficult either.
Updating this without rebuilding the entire thing is difficult. If you want an updatable container system, then you may as well look in to some of the simpler file system designs, because that's what you're really looking for in that case.
As for portability, I suggest you store your binary numbers in network order, as most standard libraries have routines to handle those details for you.
Working on the assumption that you're only going to need read-only access to the files why not just merge them all together and have a second "index" file (or an index in the header) that tells you the file name, start position and length. All you need to do is seek to the start point and read the correct number of bytes. The method will vary depending on your language but it's pretty straight forward in most of them.
The hardest part then becomes creating your data file + index, and even that is pretty basic!
An ISO disk image might do the trick. It should be able to hold that many files easily, and is supported by many pieces of software on all the major operating systems.
First, thank-you for expanding your question, it helps a lot in providing better answers.
Given that you're going to need a SQLite database anyway, have you looked at the performance of putting it all into the database? My experience is based around SQL Server 2000/2005/2008 so I'm not positive of the capabilities of SQLite but I'm sure it's going to be a pretty fast option for looking up records and getting the data, while still allowing for delete and/or update options.
Usually I would not recommend to put files inside the database, but given that the total size of all images is around 500MB for 4500 images you're looking at a little over 100K per image right? If you're using a dynamic path to store the images then in a slightly more normalized database you could have a "ImagePaths" table that maps each path to an ID, then you can look for images with that PathID and load the data from the BLOB column as needed.
The XML file(s) could also be in the SQLite database, which gives you a single 'data file' for your app that can move between Windows and OSX without issue. You can simply rely on your SQLite engine to provide the performance and compatability you need.
How you optimize it depends on your usage, for example if you're frequently needing to get all images at a certain path then having a PathID (as an integer for performance) would be fast, but if you're showing all images that start with "A" and simply show the path as a property then an index on the ImageName column would be of more use.
I am a little concerned though that this sounds like premature optimization, as you really need to find a solution that works 'fast enough', abstract the mechanics of it so your application (or both apps if you have both Mac and PC versions) use a simple repository or similar and then you can change the storage/retrieval method at will without any implication to your application.
Check Solid File System - it seems to be what you need.

Resources