Read the file names or the number of files in tar.gz - ruby

I have a tar.gz file that contains multiple archived CSV files. I need to read the list of file names, or at least the number of files.
This is what I tried:
require 'zlib'

file = Zlib::GzipReader.open('test/data/file_name.tar.gz')
file.each_line do |line|
  p line
end
but this only prints each line in the CSV files, not the file names. I also tried this:
require 'zlib'

Zlib::GzipReader.open('test/data/file_name.tar.gz') { |f|
  p f.read
}
which similarly dumps the file contents, just as one big string instead of line by line.
Any idea how I could get the list of file names or at least the number of files within the archive?

You need to use a tar reader on the uncompressed output.
".tar.gz" means that two processes were applied to generate the file. First a set of files were "tarred" to make a ".tar" file which contains a sequence of (file header block, uncompressed file data) units. Then that was gzipped as a single stream of bytes, to make the ".tar.gz". In reality, the .tar file was very likely never stored anywhere, but generated as a stream of bytes and gzipped on the fly to write out the .tar.gz file directly.
To get the contents, you reverse the process, ungzipping, and then feeding the result of that to a tar reader to interpret the file header blocks and extract the data. Again, you can ungzip and read the tarred file contents on the fly, with no need to store the intermediate .tar file.
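One hedged way to do that entirely in Ruby is with Gem::Package::TarReader, which ships with RubyGems; a minimal sketch (tar_gz_entries is a made-up helper name, not a library API) might look like:

```ruby
require 'zlib'
require 'rubygems/package' # provides Gem::Package::TarReader

# Sketch: list the entry names inside a .tar.gz without extracting
# anything to disk. The gzip stream is fed straight into the tar reader.
def tar_gz_entries(path)
  names = []
  Zlib::GzipReader.open(path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each { |entry| names << entry.full_name }
    end
  end
  names
end
```

Then `tar_gz_entries('test/data/file_name.tar.gz').size` would give the number of files in the archive.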

Related

How do I combine txt files from a list of file locations

I have a problem: I used "Everything" to list every txt file in a specific directory so that I can merge them, but in EmEditor I can't find a way to merge files from a list of locations.
Here is what the Everything file looks like:
E:\Main directory\subdirectory 1\file.txt
E:\Main directory\subdirectory 2\file.txt
E:\Main directory\subdirectory 3\file.txt
E:\Main directory\subdirectory 4\file.txt
The list has over 40k locations. Is there a way to use a program to read all the locations in the text file and combine them?
Also, the subdirectories contain other txt files that I don't want, so I can't just merge every txt file under the main directory. Another complication: there are variations of "file.txt", like "Files.txt" for example.
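Since the list is just one path per line, a small Ruby sketch can do the merge without EmEditor (merge_listed_files is a hypothetical helper, and the filename filtering is left to whatever produced the list):

```ruby
# Sketch: concatenate every file named in a list (one absolute path per
# line) into a single output file, skipping blank lines and missing paths.
def merge_listed_files(list_path, out_path)
  File.open(out_path, 'w') do |out|
    File.foreach(list_path) do |line|
      path = line.strip # also drops a trailing \r from Windows-made lists
      next if path.empty? || !File.file?(path)
      out.write(File.read(path))
    end
  end
end
```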

Terminal - Unzip all .gz files in a folder without combining resulting files

I have a folder, TestFolder, that contains several .gz files. Each .gz file is a folder containing several sub-directories, with the deepest level of each .gz file containing 5 text files. For example, extracting one of the .gz files ultimately has 5 files at the deepest level of the directory, like:
Users/me/Desktop/TestFolderParent/TestFolder/folder1/subfolder1/subfolder2/subfolder3/subfolder4/subfolder5/subfolder6/TextFile1.txt
Users/me/Desktop/TestFolderParent/TestFolder/folder1/subfolder1/subfolder2/subfolder3/subfolder4/subfolder5/subfolder6/TextFile2.txt
Users/me/Desktop/TestFolderParent/TestFolder/folder1/subfolder1/subfolder2/subfolder3/subfolder4/subfolder5/subfolder6/TextFile3.txt
Users/me/Desktop/TestFolderParent/TestFolder/folder1/subfolder1/subfolder2/subfolder3/subfolder4/subfolder5/subfolder6/TextFile4.txt
Users/me/Desktop/TestFolderParent/TestFolder/folder1/subfolder1/subfolder2/subfolder3/subfolder4/subfolder5/subfolder6/TextFile5.txt
when I run gunzip -r /Users/myuser/Desktop/TestFolderParent/TestFolder in Terminal, it extracts all of the .gz files, each as a single text file containing all 5 constituent text files concatenated together. Is there any way to instead run a command that extracts each .gz file and returns each of the 5 constituent text files as a separate file?
.gz files themselves do not and cannot contain "several sub-directories". The gzip format compresses a single file, and that's it. gunzip will extract exactly one file from one .gz file.
That single file can itself be an uncompressed archive of files. That is often done using the tar archiver, so you end up with a .tar.gz file. Is that what you have? Then you need to use tar, not gunzip to extract the files.
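If they are indeed .tar.gz files and you would rather stay in Ruby than shell out to tar, a hedged sketch along these lines could extract them (extract_tar_gz is a made-up helper name):

```ruby
require 'zlib'
require 'rubygems/package' # provides Gem::Package::TarReader
require 'fileutils'

# Sketch: extract every regular file in a .tar.gz under dest,
# recreating the archive's directory structure on the way.
def extract_tar_gz(path, dest)
  Zlib::GzipReader.open(path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        next unless entry.file? # skip directory and link entries
        out = File.join(dest, entry.full_name)
        FileUtils.mkdir_p(File.dirname(out))
        File.open(out, 'wb') { |f| f.write(entry.read) }
      end
    end
  end
end
```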

Bash Script to read CSV file and search directory for files to copy

I'm working on creating a bash script to read a CSV file (comma-delimited). The file contains parts of names of files in another directory. I then need to take these names, use them to search the directory, and copy the correct files to a new folder.
I am able to read the CSV file. However, the CSV file only contains part of each file name, so I need to use wildcards to search the directory for the files. I have been unable to get the wildcards to work within the directory.
CSV File Format (in notepad):
12
13
14
15
Example file names in target directory:
IXI12_asfds.nii
IXI13_asdscds.nii
IXI14_aswe32fds.nii
IXI15_asf432ds.nii
The prefix to all of the files is the same: IXI. The csv file contains the unique numbers for each target file which appear right after the prefix. The middle portion of the filenames are unique to each file.
#!/bin/bash
# CSV file with comma-delimited numbers.
# CSV file only contains part of the file name. Need to add IXI to the
# beginning, and search with a wildcard at the end.
input="CSV_file.csv"
while IFS=',' read -r file_name1
do
  name=(IXI$file_name1)
  cp $name*.nii /newfolder
done < "$input"
The error I keep getting says that no file with the appropriate name can be found.
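For comparison, the same task can be sketched in Ruby (copy_matching is a hypothetical helper, not an answer from the thread). Note the strip: a CSV saved from Notepad often carries \r line endings, which silently break filename matching like this:

```ruby
require 'fileutils'

# Sketch: read one ID per line (first comma-separated field) from the CSV,
# strip whitespace including a Windows \r, and copy every IXI<id>*.nii
# match from source_dir to dest_dir.
def copy_matching(csv_path, source_dir, dest_dir)
  File.foreach(csv_path) do |line|
    id = line.strip.split(',').first
    next if id.nil? || id.empty?
    Dir.glob(File.join(source_dir, "IXI#{id}*.nii")).each do |f|
      FileUtils.cp(f, dest_dir)
    end
  end
end
```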

Append string to an existing gzipfile in Ruby

I am trying to read a gzip file and append a part of it (which is a string) to another existing gzip file. The string is ~3000 lines long, and I will have to do this many times (~10000) in Ruby. What would be the most efficient way of doing this? The zlib library does not support appending, and using backticks (gzip -c orig_gzip >> gzip.gz) seems to be too slow. The resulting file should be a gigantic text file.
It's not clear what you are looking for. If you are trying to join multiple files into one gzip archive, you can't get there. Per the gzip documentation:
Can gzip compress several files into a single archive?
Not directly. You can first create a tar file then compress it:
for GNU tar: gtar cvzf file.tar.gz filenames
for any tar: tar cvf - filenames | gzip > file.tar.gz
Alternatively, you can use zip, PowerArchiver 6.1, 7-zip or Winzip. The zip format allows random access to any file in the archive, but the tar.gz format usually gives a better compression ratio.
With the number of times you will be adding to the archive, it makes more sense to expand the source then append the string to a single file, then compress on demand or a cycle.
You will have a large file but the compression time would be fast.
If you want to accumulate data, not separate files, in a gzip file without expanding it all, it's possible from Ruby to append to an existing gzip file, however you have to specify the "a" ("Append") mode when opening your original .gzip file. Failing to do that causes your original to be overwritten:
require 'zlib'

File.open('main.gz', 'a') do |main_gz_io|
  Zlib::GzipWriter.wrap(main_gz_io) do |main_gz|
    5.times do
      print '.'
      main_gz.puts Time.now.to_s
      sleep 1
    end
  end
end

puts 'done'
puts 'viewing output:'
puts '---------------'
puts `gunzip -c main.gz`
Which, when run, outputs:
.....done
viewing output:
---------------
2013-04-10 12:06:34 -0700
2013-04-10 12:06:35 -0700
2013-04-10 12:06:36 -0700
2013-04-10 12:06:37 -0700
2013-04-10 12:06:38 -0700
Run that several times and you'll see the output grow.
Whether this code is fast enough for your needs is hard to say. This example artificially drags its feet to write once a second.
It sounds like your appended data is long enough that it would be efficient enough to simply compress the 3000 lines to a gzip stream and append that to the existing gzip stream. gzip has the property that two valid gzip streams concatenated is also a valid gzip stream, and that gzip stream decompresses to the concatenation of the decompressions of the two original gzip streams.
I don't understand "(gzip -c orig_gzip >> gzip.gz) seems to be too slow". That would be the fastest way. If you don't like the time spent compressing, you can reduce the compression level, e.g. gzip -1.
The zlib library actually supports quite a bit, when the low-level functions are used. You can see advanced examples of gzip appending in the examples/ directory of the zlib distribution. You can look at gzappend.c, which appends more efficiently, in terms of compression, than a simple concatenation, by first decompressing the existing gzip stream and picking up compression where the previous stream left off. gzlog.h and gzlog.c provide an efficient and robust way to append short messages to a gzip stream.
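In Ruby, the simple-concatenation variant of that idea might look like this sketch (append_gzipped is a made-up helper name): compress the chunk in memory, then append the raw gzip bytes to the existing file.

```ruby
require 'zlib'
require 'stringio'

# Sketch: gzip a chunk of text in memory and append the resulting gzip
# stream to an existing .gz file. Two concatenated gzip streams are
# themselves a valid gzip stream that decompresses to the concatenation.
def append_gzipped(path, text)
  buf = StringIO.new(''.b) # binary buffer for the raw gzip bytes
  Zlib::GzipWriter.wrap(buf) { |gz| gz.write(text) }
  File.open(path, 'ab') { |f| f.write(buf.string) }
end
```

One caveat worth knowing: Ruby's Zlib::GzipReader stops after the first gzip member by default, so reading such a file back from Ruby takes a loop over members (or a tool like gunzip/zcat, which handles concatenated streams natively).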
You need to open the gzipped file in binary mode (b) and also in append mode (a), in my case it is a gzipped CSV file.
file = File.open('path-to-file.csv.gz', 'ab')
gz = Zlib::GzipWriter.new(file)
gz.write("new,row,csv\n")
gz.close
If you open the file in w mode, you will overwrite the content of the file. Check the documentation for a full description of the open modes: http://ruby-doc.org/core-2.5.3/IO.html#method-c-new

Parsing a Zip file and extracting records from text files

I am really new to Ruby and could use some help with a program. I need to open a zip file that contains multiple text files, each with many rows of data, e.g.:
CDI|3|3|20100515000000|20100515153000|2008|XXXXX4791|0.00|0.00
CDI|3|3|20100515000000|20100515153000|2008|XXXXX5648|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX3276|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX4342|0.00|0.00
MITR|3|3|20100515000000|20100515153000|0000|XXXXX7832|0.00|0.00
HR|3|3|20100515000000|20100515153000|1114|XXXXX0238|0.00|0.00
I first need to extract the zip file, read the text files inside it, and write only the complete rows that start with CDI or CHO to two output files: one for the rows starting with CDI and one for the rows starting with CHO (basically parsing the file). I have to do it in Ruby, and ideally the program should run automatically as new zip files of the same structure arrive. I completely appreciate any advice, direction, or sample code anyone can give.
One means is using the ZipFile class from the rubyzip library.
require 'zip/zip'

# To open the zip file and pass each entry to a block
Zip::ZipFile.foreach(path_to_zip) do |text_file|
  # Read from entry, turn String into Array, and pass to block
  text_file.read.split("\n").each do |line|
    if line.start_with?("CDI") || line.start_with?("CHO")
      # Do something
    end
  end
end
I'm not sure if I entirely follow your question. For starters, if you're looking to unzip files using Ruby, check out this question. Once you've got the file unzipped to a readable format, you can try something along these lines to print to the two separate outputs:
cdi_output = File.open("cdiout.txt", "a") # Open an output file for CDI
cho_output = File.open("choout.txt", "a") # Open an output file for CHO

File.open("text.txt", "r") do |f|            # Open the input file
  while line = f.gets                        # Read each line in the input
    cdi_output.puts line if /^CDI/ =~ line   # Print if line starts with CDI
    cho_output.puts line if /^CHO/ =~ line   # Print if line starts with CHO
  end
end

cdi_output.close # Close cdi_output file
cho_output.close # Close cho_output file
