For testing purposes, I'd like to serialize an axlsx spreadsheet to a string. The axlsx documentation indicates it is possible to "Output to file or StringIO". But I haven't found documentation or a code sample that explains how to output to a StringIO. How is it done?
From the code:
# Serialize to a stream
s = package.to_stream
File.open('example_streamed.xlsx', 'wb') { |f| f.write(s.read) }
In the end, an xlsx file is a zip archive containing multiple XML files and other assets. You can use Package#to_stream to generate an IO stream for streaming purposes, but viewing that archive as a string is probably not what you are looking to do.
If you are just looking to investigate the XML for a specific Worksheet, you can use Worksheet#to_xml_string, which will return a String object with all the goodies in there. (That is how worksheet validation works: we parse that XML and validate it against the schema for the object.)
Hope this helps!
I'm grabbing image data using the Request module. The data that comes back looks like interpreted binary data like so:
`����JFIF��>CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), default quality
��C
$.' ",#(7),01444'9=82<.342��C
2!!
I have tried saving using:
image = open("test.jpg", "wb")
image.write(image_data)
image.close()
But that complains that it needs a bytes-like object. I have tried doing result.text.encode() with various formats like "utf-8" etc but the resulting image file cannot be opened. I have also tried doing bytes(result.text, "utf-8") and bytearray(result.text, "utf-8") and same problem. I think those are all roughly equivalent, anyway. Can someone help me convert this to a bytes-like object without destroying the data?
Also, the headers in the request say 'image/jpeg', but it still sends me the data as a string.
Thanks!
Use the content field instead of text:
import requests
r = requests.get('https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png')
with open('test.png', 'wb') as file:
file.write(r.content)
See: https://requests.readthedocs.io/en/master/user/quickstart/#binary-response-content
I am using prawnpdf/pdf-inspector to test that content of a PDF generated in my Rails app is correct.
I would want to check that the PDF file contains a link with certain URL. I looked at yob/pdf-reader but haven't found any useful information related to this topic
Is it possible to test URLs within PDF with Ruby/RSpec?
I would want the following:
expect(urls_in_pdf(pdf)).to include 'https://example.com/users/1'
The https://github.com/yob/pdf-reader gem provides a method called text for each page.
Do something like
pdf = PDF::Reader.new("tmp/pdf.pdf")
assert pdf.pages[0].text.include? 'https://example.com/users/1'
assuming what you are looking for is on the first page.
Since pdf-inspector seems to return only text, you could try using pdf-reader directly (pdf-inspector uses it anyway).
reader = PDF::Reader.new("somefile.pdf")
reader.pages.each do |page|
  puts page.raw_content # this should also give you the link
end
Anyway, I only took a quick look at the GitHub page, so I am not sure what exactly raw_content returns. But there is also a low-level method to directly access the objects of the PDF:
reader = PDF::Reader.new("somefile.pdf")
puts reader.objects.inspect
With that it surely is possible to get the url.
I am building a command line app that will generate metadata files amongst other things. I have a series of values that I want included, and I would like to insert those values into JSON format and then write it to a .txt file.
The complicated part (to me at least) is that some of the values are dynamic (i.e. they may change every time a file is created), while other parts of the JSON file will need to be static. Is there any sort of templating that may help with this? (JSON ERB?)
If I were to use a JSON ERB template, how would I write the result of the template (after it has been populated) to a txt file, since this is not a Rails app and I thus would not be calling the view?
Thank you in advance for any help.
It seems like two things could be helpful to you, but your question is pretty open-ended...
First, if your JSON templates are complex (static and dynamic parts?), I suggest you look at a tool like RABL:
https://github.com/nesquena/rabl
There is a railscast on RABL here:
http://railscasts.com/episodes/322-rabl
RABL lets you create templates for generating custom JSON output.
Regarding writing to a file, you may or may not need to call the controller first. But the flow would be something like:
# sample_controller.rb
require 'json'

def get_sample
  @x = { :a => "apple", :b => "baker" }
  render json: @x
end
You can call the controller and get the rendered json.
z = get_sample
File.open(yourfile, 'w') { |file| file.write(z) }
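Outside Rails, though, you don't need a controller at all: plain ERB from the standard library can render a JSON template, and the result can be written straight to a .txt file. A minimal sketch (the template, variable names, and file name are all made up):

```ruby
require 'erb'
require 'json'
require 'time'

# Static keys stay literal; dynamic values come from Ruby via <%= %>
template = <<~ERB
  {
    "generator": "my-cli",
    "created_at": "<%= timestamp %>",
    "record_count": <%= count %>
  }
ERB

timestamp = Time.now.utc.iso8601
count = 42

json_text = ERB.new(template).result(binding)
File.write('metadata.txt', json_text)
```

Calling JSON.parse on the rendered string is a cheap sanity check that the populated template is still valid JSON.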
I'm trying to write some unit tests which involves Roo reading Excel 2007 files. I have the Excel file in my unit test file as a hex string, which in turn is fed into a StringIO instance. I can't simply pass the StringIO object to Roo::Spreadsheet.open, since that function actually checks if the passed object is a File instance:
def open(file, options = {})
file = File === file ? file.path : file
# ...
and if it isn't, proceeds to assume it's a string. Manually specifying the extension doesn't help:
doc = Roo::Spreadsheet.open(file, extension: :xlsx)
Are there any clever ways of getting Roo to use the StringIO instance as a file?
It looks like this version of roo has support for this. Instead of checking explicitly if it's a File class, it checks in duck-typing style if it's a stream based on whether it responds to #seek. The relevant code is here and here.
My company has data messages (json) stored in gzipped files on Amazon S3. I want to use Ruby to iterate through the files and do some analytics. I started to use the 'aws/s3' gem, and can get each file as an object:
#<AWS::S3::S3Object:0x4xxx4760 '/my.company.archive/data/msg/20131030093336.json.gz'>
But once I have this object, I do not know how to unzip it or even access the data inside of it.
You can see the documentation for S3Object here: http://amazon.rubyforge.org/doc/classes/AWS/S3/S3Object.html.
You can fetch the content by calling your_object.value; see if you can get that far. Then it should be a question of unpacking the gzip blob. Zlib should be able to handle that.
I'm not sure if .value returns you a big string of binary data or an IO object. If it's a string, you can wrap it in a StringIO object to pass it to Zlib::GzipReader.new, e.g.
json_data = Zlib::GzipReader.new(StringIO.new(your_object.value)).read
S3Object has a stream method, which I would hope behaves like an IO object (I can't test that here, sorry). If so, you could do this:
json_data = Zlib::GzipReader.new(your_object.stream).read
Once you have the unzipped json content, you can just call JSON.parse on it, e.g.
JSON.parse Zlib::GzipReader.new(StringIO.new(your_object.value)).read
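The StringIO + GzipReader + JSON.parse pipeline above can be exercised end to end without S3 by gzipping a string locally (the payload here is made up):

```ruby
require 'zlib'
require 'stringio'
require 'json'

# A made-up message standing in for one of the gzipped S3 blobs
raw_json = '{"event":"signup","user_id":7}'
gzipped  = Zlib.gzip(raw_json) # a binary String, like your_object.value

# Wrap the binary string in a StringIO, feed it to GzipReader, parse the JSON
message = JSON.parse(Zlib::GzipReader.new(StringIO.new(gzipped)).read)
# message => {"event"=>"signup", "user_id"=>7}
```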
For me the below set of steps worked:
1. Read the csv.gz from the S3 client and write it to a local file
2. Open the local csv.gz file using GzipReader and read the CSV from it
file_path = "/tmp/gz/x.csv.gz"
File.open(file_path, 'wb') do |f|
  s3_client.get_object(bucket: bucket, key: key) do |gzfiledata|
    f.write gzfiledata
  end
end
data = []
Zlib::GzipReader.open(file_path) do |gz_reader|
  csv_reader = ::FastestCSV.new(gz_reader)
  csv_reader.each do |csv|
    data << csv
  end
end
The S3Object documentation has been updated, and the stream method is no longer available: https://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
So, the best way to read data from an S3 object would be this:
json_data = Zlib::GzipReader.new(StringIO.new(your_object.read)).read