Currently we have a method that returns a formatted CSV file as a string:
string = EXPORT.tickets
We need to upload this CSV data to an FTP server, like so:
ftp = Net::FTP.new(server, username, password)
ftp.putbinaryfile(string)
However, the string variable is obviously a string, not a binary file as the putbinaryfile method expects. I see two ways to do this:
convert the string to an actual file on disk first, using File
wrap the string directly in an in-memory file-like object, with something like StringIO
Do these seem like viable options? If so, how would I approach doing this? Thanks in advance!
EDIT:
Since the putbinaryfile method is looking for a file path rather than an actual file, it looks like my best bet will be to create a File from the string variable. Can anyone give an example of how this can be accomplished?
After talking to another developer, he gave me this solution, which I found to be a better fit for my situation, since the file did not exist already. It skips writing the string to a Tempfile and uses StringIO to upload it directly. His solution:
The Net::FTP#putbinaryfile method takes the name of a file on the local filesystem to copy to the remote filesystem. Now, if your data is in a string (and wasn't read from a file on the filesystem) then you may want to use Net::FTP#storbinary instead:
require 'stringio'
require 'net/ftp'
BLOCKSIZE = 512
data = StringIO.new("Hello, world!\n")
hostname = "ftp.domain.tld"
username = "username"
password = "password"
remote_filename = "something.txt"
Net::FTP.open(hostname, username, password) do |ftp|
  # ...other ftp commands here...
  ftp.storbinary("STOR #{remote_filename}", data, BLOCKSIZE)
  # ...any other ftp commands...
end
The above avoids writing data that's not on disk to disk, just so you can upload it somewhere. However, if the data is already in a file on disk, you might as well just fix your code to reference its filename instead.
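For completeness, the on-disk case is just a couple of lines (a sketch; tickets.csv is a made-up local filename, and server, username, password, and remote_filename are the variables from above):
ftp = Net::FTP.new(server, username, password)
ftp.putbinaryfile('tickets.csv', remote_filename)
ftp.close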
Something like this should cover most of the bases:
require 'tempfile'
temp_file = Tempfile.new('for_you')
temp_file.write(string)
temp_file.close
ftp.putbinaryfile(temp_file)
temp_file.unlink
Using Tempfile relieves you of a lot of issues regarding unique filenames, thread safety, etc. Garbage collection will ensure your file gets deleted, even if putbinaryfile raises an exception or similar perils.
The uploaded file will get a name like for_you.23423423.423.423.4, both locally and on the remote server. If you want it to have a specific name on the remote server like 'daily_log_upload', do this instead:
ftp.putbinaryfile(temp_file, 'daily_log_upload')
It will still have a unique name for the local temp file, but you don't care about that.
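Putting the pieces together with explicit cleanup (a sketch; the begin/ensure guarantees deletion even if the upload raises, on top of the garbage collection mentioned above):
require 'tempfile'
require 'net/ftp'

temp_file = Tempfile.new('for_you')
begin
  temp_file.write(string)
  temp_file.close
  Net::FTP.open(server, username, password) do |ftp|
    ftp.putbinaryfile(temp_file.path, 'daily_log_upload')
  end
ensure
  temp_file.unlink  # remove the local temp file no matter what
end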
Related
So, I have a function that creates an object specifying user data. Then, using the Ruby YAML gem and some code, I write the object to a YAML file and save it. This saves the YAML file to the location the Ruby script was run from. How can I tell it to save to a certain directory? (A simplified version of) my code is this:
print "Please tell me your name: "
$name = gets.chomp
$name.capitalize!
print "Please type in a four-digit PIN number: "
$pin = gets.chomp
I also have a function that enforces that the pin be a four-digit integer, but that is not important.
Then, I add this to an object
new_user = Hash.new(false)
new_user["name"] = $name
new_user["pin"] = $pin
and then add it to a YAML file and save it. If the YAML file doesn't exist, one is created. It creates it in the same file directory as the script is run in. Is there a way to change the save location?
The script to save the object to a YAML file is this:
def put_to_yaml(new_user)
  File.write("#{new_user["name"]}.yaml", new_user.to_yaml)
end
put_to_yaml(new_user)
Ultimately, the question is this: How can I change the save location of the file? And when I load it again, how can I tell it where to get the file from?
Thanks for any help
Currently, when you use File.write, it takes your current working directory and appends the file name to that location. Try:
puts Dir.pwd # prints the location you ran the Ruby script from
You can specify the absolute path if you want to write to a specific location every time:
File.write("/home/chameleon/different_location/#{new_user["name"]}.yaml", new_user.to_yaml)
Or you can specify a path relative to your current working directory:
# write one level above your current working directory
File.write("../#{new_user["name"]}.yaml", new_user.to_yaml)
You can also specify a path relative to the currently executing Ruby file:
file_path = File.expand_path(File.dirname(__FILE__))
file_name = "#{new_user["name"]}.yaml"
absolute_path = File.join(file_path, file_name)
File.write(absolute_path, new_user.to_yaml)
You are supplying a partial pathname (a mere file name), so we read and write from the current directory. Thus you have two choices:
Supply a full absolute pathname (personally, I like to use the Pathname class for this; see the sketch after this list); or
Change the current directory first (with Dir.chdir)
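A minimal sketch of the first option using Pathname (the directory is a made-up example, reusing the path from the earlier answer); it also covers reading the file back, which answers the second half of the question:
require 'yaml'
require 'pathname'

SAVE_DIR = Pathname.new("/home/chameleon/different_location")  # made-up location

def put_to_yaml(new_user)
  SAVE_DIR.mkpath  # create the directory if it doesn't exist yet
  (SAVE_DIR + "#{new_user["name"]}.yaml").write(new_user.to_yaml)
end

def get_from_yaml(name)
  YAML.load_file(SAVE_DIR + "#{name}.yaml")  # load from the same location
end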
I have a tar.gz file saved on disk and I want to leave it packed there, but I need to open one file within the archive, read from it and save some information somewhere.
File structure:
base_folder
  file_i_need.txt
  other_folder
    other_file
Code (it is not much; I tried 10 million different ways and this is what is left):
require 'rubygems/package'
require 'zlib'

def self.open_file(file)
  uncompressed_file = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
  uncompressed_file.rewind
end
When I run it in a console I get
#<Gem::Package::TarReader:0x007fbaac178090>
and I can run commands on the entries. I just haven't figured out how to open an entry and read from it without saving it unpacked to disk. I mainly need the string from the text file.
Any help appreciated. I might just be missing something...
TarReader is Enumerable, yielding Entry objects. That said, to retrieve the text content from the file by its name, one might:
uncompressed = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
text = uncompressed.detect do |f|
  f.full_name == 'base_folder/file_i_need.txt'
end.read
#⇒ Hello, I'm the content of the text file, located inside the gzipped tar
Hope it helps.
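If you need more than one file from the archive, a variation on the same idea (a sketch, assuming the whole archive fits comfortably in memory) collects every regular file's contents keyed by path:
require 'rubygems/package'
require 'zlib'

contents = {}
tar = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
tar.each do |entry|
  contents[entry.full_name] = entry.read if entry.file?  # skip directories etc.
end
tar.close

contents['base_folder/file_i_need.txt']  # => the string you are after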
I need to run some shell commands on a number of files and sometimes I get back more than one file in response. The question is: How can I read back several files from IO.popen in Ruby?
For instance, imagine the following case:
file = grid.get(record['_id']) # fetch a file from database
IO.popen('tar -Oxmz', 'ab') {|pipe| pipe.write(file.read)} # pass to tar and extract
This necessitates that I reread all the extracted files from the filesystem. I figured out this is the speed bottleneck of my script, and I wonder if I can accomplish the same task in-memory. I tried the following:
file = grid.get(record['_id'])
IO.popen('tar -Oxmz', 'w+b') do |pipe|
  pipe.write(file.read)
  pipe.close_write
  output = pipe.read
end
It works, but I get the whole response, including several extracted files, in one piece (in the output variable). I need the files separate from each other, and possibly with their names. Is there any way to do this?
By the way, the resulting files are most of the time text, but sometimes binary. Running a pipe for each output file is not a solution, because the actual overhead of running the commands for each file outweighs the benefits of doing the transformation in-memory.
P.S. The actual use case does not rely on tar only. I use software that does not have Ruby wrappers.
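For the tar case specifically (the P.S. notes other tools are involved, which this sketch does not cover), one way to keep the files separate is to skip the subprocess entirely and unpack the gzipped tar in memory, reusing the TarReader approach from the previous question. This assumes file is the IO-like object fetched from the database:
require 'rubygems/package'
require 'zlib'
require 'stringio'

files = {}
gz = Zlib::GzipReader.new(StringIO.new(file.read))
Gem::Package::TarReader.new(gz).each do |entry|
  files[entry.full_name] = entry.read if entry.file?  # name => content, text or binary
end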
I'm having trouble trying to download Word documents from Dropbox using an app controlled by a Ruby program. (I would like to have the ability to download any file from a Dropbox.)
The code they provide is great for "downloading" a .txt file, but if you try using the same code to download a .docx file, the "downloaded" file won't open in Word due to "corruption."
The code I'm using:
contents = @client.get_file(path + filename)
open(filename, 'w') {|f| f.puts contents }
For variable examples, path could be '/', and filename could be 'aFile.docx'. This works, but the file, aFile.docx, that is created can not be opened. I am aware that this is simply grabbing the contents of the file and then creating a new file and inserting the contents.
Try this:
open(filename, 'wb') { |f| f.write contents }
Two changes from your code:
I used the file mode wb to specify that I'm going to write binary data. I don't think this makes a difference on Linux and OS X, but it matters on Windows.
I used write instead of puts. puts is meant for text and appends a newline, while you're trying to write arbitrary binary data. I assume this is the source of the "corruption."
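Putting both changes together with the code from the question (same @client, path, and filename as before):
contents = @client.get_file(path + filename)
# 'wb' writes binary; f.write dumps the bytes as-is, with no trailing newline.
open(filename, 'wb') { |f| f.write contents }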
I have a folder which contains many small .gz files (compressed csv text files). I need to read them in my Spark job, but the thing is I need to do some processing based on info which is in the file name. Therefore, I did not use:
JavaRDD<String> input = sc.textFile(...)
since to my understanding I do not have access to the file name this way. Instead, I used:
JavaPairRDD<String, String> files_and_content = sc.wholeTextFiles(...);
because this way I get a pair of file name and the content.
However, it seems that this way the input reader fails to read the text from the gz file, and instead reads binary gibberish.
So, I would like to know if I can set it to somehow read the text, or alternatively access the file name using sc.textFile(...).
You cannot read gzipped files with wholeTextFiles because it uses CombineFileInputFormat which cannot read gzipped files because they are not splittable (source proving it):
override def createRecordReader(
    split: InputSplit,
    context: TaskAttemptContext): RecordReader[String, String] = {

  new CombineFileRecordReader[String, String](
    split.asInstanceOf[CombineFileSplit],
    context,
    classOf[WholeTextFileRecordReader])
}
You may be able to use newAPIHadoopFile with a WholeFileInputFormat (not built into Hadoop, but found all over the internet) to get this to work correctly.
UPDATE 1: I don't think WholeFileInputFormat will work since it just gets the bytes of the file, meaning you may have to write your own class possibly extending WholeFileInputFormat to make sure it decompresses the bytes.
Another option would be to decompress the bytes yourself using GZIPInputStream.
UPDATE 2: If you have access to the directory name like in the OP's comment below you can get all the files like this.
Path path = new Path("");
FileSystem fileSystem = path.getFileSystem(new Configuration()); // just uses the default one
FileStatus[] fileStatuses = fileSystem.listStatus(path);
ArrayList<Path> paths = new ArrayList<>();
for (FileStatus fileStatus : fileStatuses) paths.add(fileStatus.getPath());
I faced the same issue while using spark to connect to S3.
My file was a gzipped CSV with no extension.
JavaPairRDD<String, String> fileNameContentsRDD = javaSparkContext.wholeTextFiles(logFile);
This approach returned corrupted values.
I solved it by using the code below:
JavaPairRDD<String, String> fileNameContentsRDD = javaSparkContext.wholeTextFiles(logFile+".gz");
By adding .gz to the S3 URL, Spark automatically picked up the file and read it as a gz file. (This seems like the wrong approach, but it solved my problem.)