Ruby - Delete the last character in a file? - ruby

Seems like it must be easy, but I just can't figure it out. How do you delete the very last character of a file using Ruby IO?
I took a look at the answer for deleting the last line of a file with Ruby but didn't fully understand it, and there must be a simpler way.
Any help?

There is File.truncate:
truncate(file_name, integer) → 0
Truncates the file file_name to be at most integer bytes long. Not available on all platforms.
So you can say things like:
File.truncate(file_name, File.size(file_name) - 1)
That should truncate the file with a single system call to adjust the file's size in the file system without copying anything.
Note that not available on all platforms caveat though. File.truncate should be available on anything unixy (such as Linux or OSX), I can't say anything useful about Windows support.

I assume you are referring to a text file. The usual way of changing such is to read it, make the changes, then write a new file:
text = File.read(in_fname)
File.write(out_fname, text[0..-2])
Insert the name of the file you are reading from for in_fname and the name of the file you are writing to for 'out_fname'. They can be the same file, but if that's the intent it's safer to write to a temporary file, copy the temporary file to the original file then delete the temporary file. That way, if something goes wrong before the operations are completed, you will probably still have either the original or temporary file. text[0..-2] is a string comprised of all characters read except for the last one. You could alternatively do this:
File.write(out_fname, File.read(in_fname, File.stat(in_fname).size-1))

Related

Why truncate when we had to pass a 'w' as an extra parameter to open in Ruby

I am learning Ruby using the book Learn Ruby the Hard Way by Zed Shaw. I am stuck at Exercise 16: Reading and Writing Files, because I don't understand why we had to pass a 'w' as an extra parameter to open in Ruby and also called the truncate method later on the same file.
Here's the code sample
puts "Opening the file..."
target = open(filename, 'w')
puts "Truncating the file. Goodbye!"
target.truncate(0)
Any form of help will be highly appreciated.
Firstly, using the 'w' mode will tell Ruby to create a new file with the name you give to it, or it will completely overwrite any file that already has that name. This means it will replace everything in the existing file with whatever text that you give it.
While you can also overwrite the contents of a file using the truncate() method, with truncate(), you can declare how much of the file you want to remove, based on where you're currently at in the file. Without parameters (truncate()) or with a parameter of 0 (truncate(0)), truncate() acts like 'w', whereas 'w' always just wipes the whole file clean, truncate() helps you specify how much content of the file that you want to wipe.
So this line of codetarget = open(filename, 'w'), reads the file with the name filename and writes to it as well (it completely wipes the contents of this file). Also, the line target.truncate(0) also wipes the contents of this file with the filename filename.
In order words, the both methods 'w' and truncate() are doing the same operation of wiping the whole contents of the file. But Zed Shaw intuitively used the both of them in the exercise, and asked that research be conducted about why he used the both of them the same time, so that one can understand how the both methods are used, and that both of the methods perform similar operations.
Just to add as a form of extra knowledge, using the 'a' mode will still tell Ruby to create a new file with the name you give to it if that file doesn't exist already. But unlike the 'w' mode, if that file does exist, Ruby will start writing at the end of the file, so that you won't lose anything that's already there.
That's all.
I hope this helps

[WinError 32]The process cannot access the file because it is being used by another process:

I have code where I'm writing to a file, and the next time I run the code after the code successfully runs, it gives me the following error:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process:'minicube_HE022222.fits'
So every single time I have to change the name of the fits file and then there are no errors. It's just really frustrating having to change the filename everytime I run the code. Here is my code snippet:
new_hdu = fits.HDUList([fits.PrimaryHDU(mini_data), fits.ImageHDU(mini_error)])
new_hdu[0].header = qso_header
new_hdu.writeto('minicube_HE022222.fits',overwrite=True)
new_hdu.close()
I get the error at:
new_hdu.writeto('minicube_HE022222.fits',overwrite=True)
I close the file after writing to it but that doesn't help either.
Any suggestions are appreciated.
Update:
Here is another portion of a code where this error occurs:
hdus=[]
hdus.append(fits.PrimaryHDU())
hdus.append(fits.ImageHDU(par[0,:,:],name='amp_Hb'))
hdus.append(fits.ImageHDU(par[1,:,:],name='amp_OIII5007'))
hdus.append(fits.ImageHDU(par[2,:,:],name='amp_OIII5007_br'))
hdus.append(fits.ImageHDU(par[3,:,:],name='amp_Hb_br'))
hdus.append(fits.ImageHDU(par[4,:,:],name='amp_Hb1'))
hdus.append(fits.ImageHDU(par[5,:,:],name='amp_Hb2'))
hdus.append(fits.ImageHDU(par[6,:,:],name='amp_Fe5018_1'))
hdus.append(fits.ImageHDU(par[7,:,:],name='amp_Fe5018_2'))
hdus.append(fits.ImageHDU(par[8,:,:],name='m'))
hdus.append(fits.ImageHDU(par[9,:,:],name='c'))
hdu = fits.HDUList(hdus)
hdu.writeto('subcube_par_HE12_lsq.fits',overwrite=True)
It's only at the 'xxx.writeto' where the error occurs. If there's another way I can write to a file or update the existing file with the new data, please let me know. Thanks
As this comment notes, the way file I/O works on Windows is such that you can't overwrite a file if you have that file already open in another process. Are you writing this file and opening it in another program? If you have that file open in any other program then you can't overwrite it.
Do you need to be able to make in-place updates the file while it is open in another program? If so that may still be possible, but you can't use HDUList.writeto() as that effectively deletes the existing file and replaces it with a new one (rather than updating the existing file in-place).
Also, how are you running this code? Is it in a script? You mentioned having to change the filename every time but you could design things such that you wouldn't have to. I noticed that you have the filename hard-coded in your code, and that can and should be fixed if you want to write a more versatile script. For example, you could accept the filename as command-line argument. You could also have the script append a number to the filename or something if the file already exists.
All that said, you should figure out why you have the same file open in multiple programs.
One small usage aside:
The new_hdu.close() in your example doesn't actually do anything. The HDUList.close() method only makes sense when you open an existing FITS file from disk. Here you're creating an HDUList (the data structure representing a FITS file) in memory, and calling the high-level HDUList.writeto() which opens a file, writes the in-memory data to that file, and the closes the file. So the .close() in this case is a no-op. I'm guessing maybe you added it to try to fix your problem, but it's actually not relevant.

Rules for file extensions?

Are there any rules for file extensions? For example, I wrote some code which reads and writes a byte pattern that is only understood by that specific programm. I'm assuming my anti virus programm won't be too happy if I give it the name "pleasetrustme.exe"... Is it gerally allowed to use those extensions? And what about the lesser known ones, like ".arw"?
You can use any file extension you want (or none at all). Using standard extensions that reflect the actual type of the file just makes things more convenient. On Windows, file extensions control stuff like how the files are displayed in Windows Explorer and what happens when you double click on it.
I wrote some code which reads and writes a byte pattern that is only
understood by that specific programm.
A file extension is only an indication of what type of data will be inside, never a guarantee that certain data formatted in a specific way will be inside the file.
For your own specific data structure it is of course always best to choose an extension that is not already in use for other file formats (or use a general extension like .dat or .bin maybe). This also has the advantage of being able to use an own icon without it being overwritten by other software using the same extension - or the other way around.
But maybe even more important when creating a custom (binary?) file format, is to provide a magic number as the first bytes of that file, maybe followed by a file header structure containing a version number etc. That way your own software can first check the header data to make sure it's the right type and version (for example: anyone could rename any file type to your extension, so your program needs to have a way to do some checks inside the file before reading the remaining data).

Ruby: Create files with metadata

We're creating an app that is going to generate some text files on *nix systems with hashed filenames to avoid too-long filenames.
However, it would be nice to tag the files with some metadata that gives a better clue as to what their content is.
Hence my question. Does anyone have any experience with creating files with custom metadata in Ruby?
I've done some searching and there seem to be some (very old) gems that read metadata:
https://github.com/kig/metadata
http://oai.rubyforge.org/
I also found: system file, read write o create custom metadata or attributes extended which seems to suggest that what I need may be at the system level, but dropping down there feels dirty and scary.
Anyone know of libraries that could achieve this? How would one create custom metadata for files generated by Ruby?
A very old but interesting question with no answers!
In order for a file to contain metadata, it has to have a format that has some way (implicitly or explicitly) to describe where and how the metadata is stored.
This can be done by the format, such as having a header that says where the "main" data is stored and where the "metadata" is stored, or perhaps implicitly, such as having a length to the "main" data, and storing metadata as anything beyond the "main" data.
This can also be done by the OS/filesystem by storing information along with the files, such as permission info, modtime, user, and more comprehensive file information like "icon" as you would find with iOS/Windows.
(Note that I am using "quotes" around "main" and "metadata" because the reality is that it's all data, and needs to be stored in some way that tools can retrieve it)
A true text file does not contain any headers or any such file format, and is essentially just a continuous block of characters (disregarding how the OS may store it). This also means that it can be generally opened by any text editor, which will merely read and display all the characters it finds.
So the answer in some sense is that you can't, at least not on a true text file that is truly portable to multiple OS.
A few thoughts on how to get around this:
Use binary at the end of the text file with hope/requirements that their text editor will ignore non-ascii.
Store it in the OS metadata for the file and make it OS specific (such as storing it in the "comments" section that an OS may have for a file.
Store it in a separate file that goes "along with" the file (i.e., file.txt and file.meta) and hope that they keep the files together.
Store it in a separate file and zip the text and the meta file together and have your tool be zip aware.
Come up with a new file format that is not just text but has a text section (though then it can no longer be edited with a text editor).
Store the metadata at the end of the text file in a text format with perhaps comments or some indicator to leave the metadata alone. This is similar to the technique that the vi/vim text editor uses to embed vim commands into a file, it just puts them as comments at the beginning or end of the file.
I'm not sure there are many other ways to accomplish what you want, but perhaps one of those will work.

Prepending to a multi-gigabyte file

What would be the most performant way to prepend a single character to a multi-gigabyte file (in my practical case, a 40GB file).
There is no limitation on the implementation to do this. Meaning it can be through a tool, a shell script, a program in any programming language, ...
There is no really simple solution. There are no system calls to prepend data, only append or rewrite.
But depending on what you're doing with the file, you may get away with tricks.
If the file is used sequentially, you could make a named pipe and put cat onecharfile.txt bigfile > namedpipe and then use "namedpipe" as file. The same can be achieved by cat onecharfile.txt bigfile | program if your program takes stdin as input.
For random access a FUSE filesystem could be done, but probably waay too complicated for this.
If you want to get your hands really dirty, figure out howto
allocate a datablock (about inode and datablock structure)
insert it into a file's chain as second block (or first and then you're practically done)
write the beginning of file into that block
write the single character as first in file
mark first block as if it uses only one byte of available payload (this is possible for last block, I don't know if it's possible for blocks in middle of file chain).
This has possibilities to majorly wreck your filesystem though, so not recommended; good fun.
Let the file have an initial block of null characters. When you prepend a character, read the block, insert the character right-to-left, and write back the block. When the block is full, then do the more expensive full rewrite in order to prepend another null block. That way, you can reduce the number of times by a large factor that you have to do a full rewrite.
Added: Keep the file in two subfiles: A (a short one) and B (a long one). Prepend to A any way you like. When A gets "big enough", prepend A to B (by re-writing), and clear A.
Another way: Keep the file as a directory of small files ..., A000003, A000002, A000001.
Just prepend to the largest-numbered file. When it's big enough, make the next file in sequence.
When you need to read the file, just read them all in descending order.
You might be able to invert your implementation depending on your problem: append single characters to the end of your file. When it comes time to read the file, read it in reverse.
Hide this behind enough of an abstraction layer and it may not make a difference to your code how the bytes are physically stored.
If you use linux you could try to use a custom version of READ(2) loaded with LD_PRELOAD and have it prepend your data at the first read.
See https://zlibc.linux.lu/zlibc.html for implementation inspiration.
if you mean prepend that character to the start of the entire file, one way
$ echo "C" > tmp
$ cat my40gbfile >> tmp
$ mv tmp my40gbfile
or using sed
$ sed -i '1i C' my40gbfile
if you mean prepending the character to every line of the file
$ awk '{print "C"$0}' my40gbfile > temp && mv temp my40gbfile
As I understand, this is handled on the file system level, meaning if you prepend data to a file, it effectively rewrites the file. This is the same reason why the ID3 tags in MP3 files are zero padded, so that future updates don't rewrite the entire file, but just update those reserved bytes.
So whichever way you use will give roughly similar results. What you can try is do some tests with a custom copy function, that reads/writes in bigger chunks than the default system copy, say 2MB or 5MB, which might improve performance. Ultimately your disk I/O is the bottleneck here.
The absolutely most high-performance way would seem to be to get down into the level of sectors and how the file is actually stored. I'm not sure if the OS then becomes a factor, but the target platform might, anyway it's useful for us to know what you run on.
I think this is a case where C is the obvious choice, this kind of low-level stuff is exactly what a systems programming language is for.
Can you tell us what you end up doing, would be interesting.
Here's the Windows command line ("DOS") way:
Put your 1 char into prepend.txt
copy /b prepend.txt + myHugeFile fileNameOfCombinedFile

Resources