Convert MSSQL Image string to JPEG/PNG using Perl

Without using Image::Magick, is there a way to write an MSSQL Image string to the local file system as a JPEG/PNG file?
The following works very well in C#, but I'm struggling to find the equivalent in Perl.
string base64string = "/9j/4AAQSkZJRgABAQAAAQABAAD/4QB..."; // String shortened here; the full value would not fit
byte[] blob = Convert.FromBase64String(base64string);
File.WriteAllBytes(@"C:\image.jpg", blob);
Thanks.

It looks like you just need to call decode_base64 (from MIME::Base64) on your string. Then write the result to a file opened with > and call binmode on the file handle, so that the data is not corrupted by newline conversion.

Related

ruby pandoc convert html string to docx temp file

Using the ruby pandoc gem, I'm trying to convert an html page as a string ("...") to a temp docx file, that users of the site can then download.
The documentation for pandoc ruby says to use:
PandocRuby.html("<h1>hello</h1>").to_latex
I assume this works for docx as well, although this is the output
PandocRuby.html("<h1>hello</h1>").to_docx
"PK\x03\x04\x14\x00\x02\x00\b\x00\xF8.\x7FF\x8E\r\x16\xD8]\x01\x00\x00$\x06\x00\x00\x13\x00\x00\x00[Content_Types].xml\xB5\x94\xCBN\xC30\x10E\xF7|E\xE4-J\xDC\xB2#\b5\xE9\x82\xC7\x12*Q>\xC0u&\xAD\x85_\xB2\xA7\xAF\xBFg\x92\xD0\b\xA1*\xA9h\xBB\x89\x94\xCC\xCC\xBD\xC7W\xE3L\xA6;\xA3\x93\r\x84\xA8\x9C\xCD\xD98\e\xB1\x04\xACt\xA5\xB2\xCB\x9C}\xCE_\xD3\a6-n&\xF3\xBD\x87\x98P\xAF\x8D9[!\xFAG\xCE\xA3\\\x81\x111s\x1E,U*\x17\x8C#z\rK\xEE\x85\xFC\x12K\xE0w\xetc...."
Just creating a new file
File.open(yourfile, 'w') { |file| file.write(docx_conversion) }
throws encoding errors. I didn't expect it to work anyway, because docx files are zipped doc and xml files.
Many thanks.
Try:
File.open(yourfile, 'wb') { |file| file.write(docx_conversion) }
Setting the file mode to wb tells Ruby that you will be writing raw binary data: it sets the external encoding to ASCII-8BIT (binary), so the bytes are written without transcoding or newline conversion.
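A minimal sketch of the round trip, assuming nothing about pandoc itself: write_binary is a hypothetical helper name, the sample bytes are a made-up stand-in for the string returned by to_docx, and example.docx is an illustrative file name.

```ruby
require 'tmpdir'

# Hypothetical helper: write a binary string to path with no encoding
# or newline conversion ('wb' = write, binary mode).
def write_binary(path, data)
  File.open(path, 'wb') { |file| file.write(data) }
end

# Made-up stand-in for the to_docx output: bytes that are not valid UTF-8 text.
sample = "PK\x03\x04\x14\x00\xF8\xD8\r\n".b

Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.docx') # illustrative file name
  write_binary(path, sample)

  # File.binread returns the raw bytes, so this checks a byte-for-byte round trip.
  puts File.binread(path) == sample
end
```

If the comparison prints false, something between the conversion and the write is still treating the data as text.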

Ruby converting String to File for uploading to FTP

Currently we have a method that returns a string with a formatted CSV file.
string = EXPORT.tickets
We need to upload this csv file to a ftp server like so
ftp = Net::FTP.new(server, username, password)
ftp.putbinaryfile(string)
however, the string variable is obviously a string, and not a binary file as the putbinaryfile method expects. I see two ways to do this,
convert the string variable to a file first using File
convert the string directly to a file with something like StringIO
Do these seem like viable options? If so, how would I approach doing this, thanks in advance!
EDIT:
Since the putbinaryfile method is looking for a file path rather than an actual file, it looks like my best bet will be to create a File from the string variable. Can anyone give an example of how this can be accomplished?
After talking to another developer, he gave me this solution, which I found to be better for my situation since the file did not already exist. It skips writing the string to a Tempfile and uses StringIO to upload it directly. His solution:
The Net::FTP#putbinaryfile method takes the name of a file on the local filesystem to copy to the remote filesystem. Now, if your data is in a string (and wasn't read from a file on the filesystem) then you may want to use Net::FTP#storbinary instead:
require 'stringio'
require 'net/ftp'
BLOCKSIZE = 512
data = StringIO.new("Hello, world!\n")
hostname = "ftp.domain.tld"
username = "username"
password = "password"
remote_filename = "something.txt"
Net::FTP.open(hostname, username, password) do |ftp|
  # ...other ftp commands here...
  ftp.storbinary("STOR #{remote_filename}", data, BLOCKSIZE)
  # ...any other ftp commands...
end
The above avoids writing data that's not on disk to disk, just so you can upload it somewhere. However, if the data is already in a file on disk, you might as well just fix your code to reference its filename instead.
Something like this should cover most of the bases:
require 'tempfile'
temp_file = Tempfile.new('for_you')
temp_file.write(string)
temp_file.close
ftp.putbinaryfile(temp_file.path)
temp_file.unlink
Using Tempfile relieves you of many issues around unique filenames, thread safety, and so on. Garbage collection will ensure your file gets deleted, even if putbinaryfile raises an exception or similar perils.
The uploaded file will get a name like for_you.23423423.423.423.4, both locally and on the remote server. If you want it to have a specific name on the remote server like 'daily_log_upload', do this instead:
ftp.putbinaryfile(temp_file.path, 'daily_log_upload')
It will still have a unique name for the local temp file, but you don't care about that.
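If relying on garbage collection for cleanup feels too implicit, the same idea can be sketched with an explicit ensure. This is only a sketch under stated assumptions: with_tempfile_for is a hypothetical helper name, the CSV text is made up, and the block stands in for the place where the ftp.putbinaryfile call would go.

```ruby
require 'tempfile'

# Hypothetical helper: write contents to a temp file, yield its path,
# and remove the file afterwards even if the block raises.
def with_tempfile_for(contents, prefix = 'for_you')
  temp_file = Tempfile.new(prefix)
  temp_file.binmode
  temp_file.write(contents)
  temp_file.close # flush to disk so other code can open the file by path
  yield temp_file.path
ensure
  if temp_file
    temp_file.close unless temp_file.closed?
    temp_file.unlink # explicit removal; does not wait for garbage collection
  end
end

# Usage: inside the block is where something like
# ftp.putbinaryfile(path, 'daily_log_upload') would be called.
with_tempfile_for("id,subject\n1,Example ticket\n") do |path|
  puts File.read(path)
end
```

The helper returns whatever the block returns, so the caller can still inspect the result of the upload call.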

How to read gz files in Spark using wholeTextFiles

I have a folder which contains many small .gz files (compressed csv text files). I need to read them in my Spark job, but the thing is I need to do some processing based on info which is in the file name. Therefore, I did not use:
JavaRDD<String> input = sc.textFile(...)
since to my understanding I do not have access to the file name this way. Instead, I used:
JavaPairRDD<String, String> files_and_content = sc.wholeTextFiles(...);
because this way I get a pair of file name and the content.
However, it seems that this way the input reader fails to read the text from the gz file and instead returns binary gibberish.
So, I would like to know if I can set it to somehow read the text, or alternatively access the file name using sc.textFile(...)
You cannot read gzipped files with wholeTextFiles, because it uses CombineFileInputFormat, which cannot read gzipped files since they are not splittable (the source shows this):
override def createRecordReader(
    split: InputSplit,
    context: TaskAttemptContext): RecordReader[String, String] = {
  new CombineFileRecordReader[String, String](
    split.asInstanceOf[CombineFileSplit],
    context,
    classOf[WholeTextFileRecordReader])
}
You may be able to use newAPIHadoopFile with WholeFileInputFormat (not built into Hadoop, but available all over the internet) to get this to work correctly.
UPDATE 1: I don't think WholeFileInputFormat will work, since it just gets the bytes of the file. You may have to write your own class, possibly extending WholeFileInputFormat, to make sure it decompresses the bytes.
Another option would be to decompress the bytes yourself using GZIPInputStream.
UPDATE 2: If you have access to the directory name like in the OP's comment below you can get all the files like this.
Path path = new Path("");
FileSystem fileSystem = path.getFileSystem(new Configuration()); //just uses the default one
FileStatus [] fileStatuses = fileSystem.listStatus(path);
ArrayList<Path> paths = new ArrayList<>();
for (FileStatus fileStatus : fileStatuses) paths.add(fileStatus.getPath());
I faced the same issue while using Spark to connect to S3.
My file was a gzipped CSV with no extension.
JavaPairRDD<String, String> fileNameContentsRDD = javaSparkContext.wholeTextFiles(logFile);
This approach returned corrupted values.
I solved it by using the code below:
JavaPairRDD<String, String> fileNameContentsRDD = javaSparkContext.wholeTextFiles(logFile+".gz");
By adding .gz to the S3 URL, Spark automatically picked up the file and read it as a gzip file. (This seems like the wrong approach, but it solved my problem.)

Convert a PDF to .txt gives me an empty .txt file

Hi, I'm trying to read a PDF in Ruby. First of all I want to convert it to a .txt file; path is the path to the PDF. The problem is that I get an empty .txt file. Someone told me it is a pdftotext problem, but I don't know how to fix it.
spec = path.sub(/\.pdf$/, '')
`pdftotext #{spec}.pdf`
file = File.new("#{spec}.txt", "w+")
text = []
file.readlines.each do |l|
  if l.length > 0
    text << l
    Rails.logger.info l
  end
end
file.close
What's wrong with my code? Thanks!
It's not possible to extract text from every PDF. Some PDF files use a font encoding that makes it impossible to extract text with simple tools such as pdftotext (and some PDF files are even completely immune to direct text extraction with any tool known to me -- in these cases you'll have to apply OCR first to have a chance to extract text...).
So if you test your code with the same "weird" PDF file all the time, it may well happen that you're getting frustrated over your code while in reality the fault lies with the PDF.
First make sure that the command-line usage of pdftotext works well with a given PDF, then test (and develop further) your code with that PDF.
The problem is that you are opening the file in a write mode ("w+"), which truncates the file before you read from it. You can see a table of file modes and what they mean at http://ruby-doc.org/core-1.9.3/IO.html.
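The truncation is easy to see in isolation. The sketch below uses a hypothetical throwaway file (sample.txt) in place of the real pdftotext output, and lines_seen is a helper name made up for illustration:

```ruby
require 'tmpdir'

# Hypothetical helper: open path in the given mode and return what readlines sees.
def lines_seen(path, mode)
  File.open(path, mode) { |file| file.readlines }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.txt') # stand-in for the file pdftotext produced
  File.write(path, "line one\nline two\n")

  puts lines_seen(path, 'r').length  # "r" leaves the contents alone
  puts lines_seen(path, 'w+').length # "w+" truncates to zero length on open
  puts File.size(path)               # the original text is now gone from disk
end
```

Opening the generated .txt file with "r" (or simply File.readlines) keeps the text that pdftotext wrote.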
Try something like this. It uses a pdftotext option to send the text to stdout, which avoids creating a temporary file, and uses blocks for more idiomatic Ruby.
text = `pdftotext #{path} -`
# Split on line breaks; String#split with no argument would split on any whitespace.
text.lines.map(&:chomp).select { |line|
  line.length > 0
}.each { |line|
  Rails.logger.info(line)
}
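A variation on the same idea, sketched with Open3 from the standard library instead of backticks. It assumes pdftotext is installed and on the PATH; nonblank_lines and pdf_text_lines are hypothetical helper names. Passing the arguments as separate strings bypasses the shell, so a path containing spaces or quotes needs no escaping.

```ruby
require 'open3'

# Hypothetical helper: keep only lines that contain something other than whitespace.
def nonblank_lines(text)
  text.each_line.map(&:chomp).reject { |line| line.strip.empty? }
end

# Hypothetical helper: run pdftotext with "-" as the output file so the
# extracted text goes to stdout, then return its non-blank lines.
def pdf_text_lines(path)
  output, status = Open3.capture2('pdftotext', path, '-')
  raise "pdftotext failed for #{path}" unless status.success?
  nonblank_lines(output)
end
```

In the Rails code from the question this could be used as pdf_text_lines(path).each { |line| Rails.logger.info(line) }.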
You would need to open the txt file with write permission.
file = File.new("#{spec}.txt", "w")
You could consult How to create a file in Ruby
Update: your code is incomplete and looks buggy.
It is not clear what path is.
It looks like you are trying to read (file.readlines.each) from the same text file you intend to write to.
Check the spelling of length; you have l.lenght.
You may want to paste the actual code.
Check this gist https://gist.github.com/4160587
As mentioned, your code is not working because you are reading and writing to the same file.
Example
Ruby code file_write.rb to do the file write operation
pdf_file = File.open("in.txt")
output_file = File.open("out.txt", "w") # file to which you want to write
# iterate over the input file and write its content to the output file
pdf_file.readlines.each do |l|
  output_file.puts(l)
end
output_file.close
pdf_file.close
Sample txt file in.txt
Some text in file
Another line of text
1. Line 1
2. Not really line 2
Once you run file_write.rb you should see a new file called out.txt with the same content as in.txt. You could change the content of the input file if you want. In your case you would use a PDF reader to get the content and write it to the text file. Basically, only the first line of the code will change.

How to write a string to a text file without deleting/replacing any word in the original file?

In Cocoa, how can I write a string to a text file without replacing the contents of the file, like writing at the end of the file?
For example the following code:
BOOL written = [data writeToFile:[path stringByAppendingPathComponent:@"conf.txt"] options:NSAtomicWrite error:&error];
The data (string) was written to the text file however it replaced the original contents of the file.
Any suggestions?
Use an NSFileHandle.
First call -[NSFileHandle seekToEndOfFile] to seek to the end of the file.
Then use -[NSFileHandle writeData:] (instead of -[NSData writeToFile:]) to append your data to the end of the file.
