What might explain an "invalid stored block lengths" error? - ruby

I am running a Rails (3.2.3) application with Ruby 1.9.3p194 on the basic Ubuntu lucid32 image in a Vagrant virtual box. The virtual box is running on Leopard, for what it's worth. I'm trying to use rubyzip in the application to decompress a zip archive - 2009_da_lmp.zip. Using code directly from examples in the rubyzip repository, I can confirm that I can list the archive file contents:
#f is the absolute path to 2009_da_lmp.zip (string)
Zip::ZipFile.open(f) { |zf| zf.entries[0] }
=> 20090101_da_lmp.csv #that is indeed a file in the archive.
Using some more code from the examples in the repository, I try to get at an actual file in the archive:
Zip::ZipInputStream.open(f) { |zis|
  entry = zis.get_next_entry
  print "first line of '#{entry.name}' (#{entry.size} bytes: ) "
  puts "'#{zis.gets.chomp}'" }
=> first line of '20090101_da_lmp.csv' (826610 bytes: ) Zlib::DataError:
invalid stored block lengths #and a long stack trace I can provide
#if that might help
The Mac OS decompression utility unzips the archive fine. I was wondering if it was some kind of encoding-related thing (my locale is set to en_US.UTF-8 to make using PostgreSQL in dev less painful), but I don't know how to tell if that's the case. I can't find any information on what might cause this error.

This is a typical error found when feeding random data to an inflater. In fact you will get this error about 1/4 of the time from random data: when the low three bits of the first byte are 000 or 001, the block type is "stored", and the length fields that follow almost never match their required complement. So I would guess that the inflation is simply starting at the wrong byte for some reason.
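The effect described above can be reproduced with Ruby's standard zlib binding. This is only a sketch: the helper name inflate_error and the sample bytes are made up for illustration, and a raw deflate stream (no zlib or gzip header) is assumed, since that is what a zip entry contains.

```ruby
require 'zlib'

# Five arbitrary bytes. The first byte is 0x00, so its low three bits are
# 000: BFINAL=0 and BTYPE=00, which selects a "stored" block.
# The next four bytes are read as LEN and NLEN, which must be one's
# complements of each other. Here they are not.
garbage = [0x00, 0x01, 0x02, 0x03, 0x04].pack('C*')

# Hypothetical helper: returns the zlib error message, or nil on success.
def inflate_error(bytes)
  # Negative window bits ask zlib for a raw deflate stream.
  Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(bytes)
  nil
rescue Zlib::Error => e
  e.message
end

puts inflate_error(garbage)
```

If the same message appears for a real archive, it suggests the inflater was handed bytes that do not start at the beginning of the compressed data.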


Is there a way to change the working directory of fiddle?

I'm trying to load a C shared library within Ruby using Fiddle.
Here is a minimal example:
require 'fiddle'
require 'fiddle/import'
module Era
  extend Fiddle::Importer
  dlload './ServerApi.so'
  extern 'int era_init_lib()'
  extern 'void era_deinit_lib()'
  extern 'int era_process_request(const char* request, char** response)'
  extern 'void era_free(char* response)'
end
Era.era_init_lib
begin
  # ...
ensure
  Era.era_deinit_lib
end
The shared library loads without issues. However, when I call Era.era_init_lib it tries to load additional libraries (Network.so and Protobuf.so). I have these files located in the current working directory (in the same directory as ServerApi.so).
However when I try to execute the code above I receive the following error:
! Failed to load library: /home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so, error: /home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so: cannot open shared object file: No such file or directory
If I place the file at the location the error describes everything works fine.
My guess is that the C working directory of fiddle is different from the Ruby working directory. I would like to keep the project files within the project and not in the Ruby installation directory.
How can I use Network.so from my project folder?
All the *.so files are provided by a third-party. I do not have the source and as a result cannot change these files. The function signatures are provided by the documentation.
Searching for Network.so in the strace gives me these results:
readlink("/proc/self/exe", "/home/username/.rvm/rubies/ruby-2."..., 4096) = 44
openat(AT_FDCWD, "/home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
futex(0x7fcc16666d90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fcc16b44520, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "! Failed to load library: ", 26! Failed to load library: ) = 26
write(2, "/home/username/.rvm/rubies/ruby-2."..., 50/home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so) = 50
write(2, ", error: ", 9, error: ) = 9
write(2, "/home/username/.rvm/rubies/ruby-2."..., 109/home/username/.rvm/rubies/ruby-2.6.5/bin/Network.so: cannot open shared object file: No such file or directory) = 109
write(2, "\n", 1) = 1
I've also written a C program which does the same thing, and it works perfectly fine when the files are dropped into the same directory. So it might be the fault of the library, which I assume checks the location of the currently running program and then tries to load the library from that folder. This would explain the behavior when run as a Ruby script (since it runs as part of the Ruby program), whereas a C binary runs standalone.
For those who want to re-create the (Linux) issue: you can download the necessary files from here, which gives you the server-linux-x86_64.sh file.
Supported distros are: Suse, Ubuntu, Debian, Red Hat and CentOS but others may also work fine.
You can either run the installer, which should place the files in /opt/eset/RemoteAdministrator/Server, or, assuming most of you don't want to install the full application, you can run the following command:
sed '1,/^# Start of TAR\.GZ file #$/d' server-linux-x86_64.sh | sed '1d' > server-linux-x86_64.tar.gz
This removes all the installer instructions from the .sh file and leaves only the binary .tar.gz data, writing it to server-linux-x86_64.tar.gz.
Copy the files ServerApi.so, Protobuf.so and Network.so into a directory of your liking. Create a Ruby script (with the question code) in the same directory and run the script.
Because ServerApi.so checks /proc/self/exe for the location of all subsequent files to load, and it is very difficult to modify this target by normal means, it is easier to just modify ServerApi.so itself so that it uses something else besides proc for the source.
If we run strings ServerApi.so, we can verify that the location to check is stored inside a string in ServerApi.so:
strings ServerApi.so | grep 'proc/self/exe'
B/proc/self/exe
So now all we need to do is modify this string to something else that works for us.
The easiest way to modify the string is to replace it with something that is exactly the same length as the original. This way we do not have to worry about changing the end-of-string zero padding or accidentally changing the total size of ServerApi.so.
Here we can see a suitable candidate could be /tmp/scriptexe:
/proc/self/exe
/tmp/scriptexe <- same length
So let's do that:
sed -e 's/proc\/self\/exe/tmp\/scriptexe/' ServerApi.so > ServerApi_Mod.so
Now we can verify the change:
strings ServerApi_Mod.so | grep scriptexe
B/tmp/scriptexe
Next we need to create /tmp/scriptexe to actually point to our Ruby script:
ln -s /the/full/path/to/our/ruby/script.rb /tmp/scriptexe
Then we modify our script:
dlload './ServerApi_Mod.so'
Now we can run it as normal:
ruby script.rb
And everything should work.
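The same-length replacement described above could also be done from Ruby instead of sed. The following is a minimal sketch: patch_same_length is a hypothetical helper, and it refuses to write anything if the two strings differ in byte length or the original string is not present.

```ruby
# Hypothetical helper: copy src to dst, replacing the first occurrence of
# the byte string `old` with `new`. Both must have the same byte length so
# that no offsets inside the binary move.
def patch_same_length(src, dst, old, new)
  unless old.bytesize == new.bytesize
    raise ArgumentError, 'replacement must have the same byte length'
  end

  data = File.binread(src) # binary-safe read
  raise "#{old.inspect} not found in #{src}" unless data.include?(old.b)

  # Block form avoids any special handling of backslashes in `new`.
  File.binwrite(dst, data.sub(old.b) { new.b })
end

# Usage, with the file names from the steps above:
# patch_same_length('ServerApi.so', 'ServerApi_Mod.so',
#                   '/proc/self/exe', '/tmp/scriptexe')
```

The length check is the point of the exercise: a shorter or longer string would shift everything after it in the file.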
If we read the strace output, we see that the library obtains the current executable location from /proc/self/exe, and then searches for subsequent libraries from there.
/proc/self/exe is not easily modifiable, but by using a hard link to a Ruby executable in the current directory we can trick it into pointing to a new folder.
The problem is that making the hard link may require root.
In any case, here is a self-contained solution (note that it will ask for the root password the first time you run it, in order to create the hard link).
Put this at the top of your script:
# Obtain path to current executable
exe = File.readlink("/proc/self/exe")
# Check if we are running the hard-linked version
if !exe.match(/localruby/)
  if !File.exist?('localruby')
    # Create a hard link to the current Ruby exe using sudo
    # (arguments are passed separately so the path is not shell-interpreted)
    system('sudo', 'ln', exe, 'localruby')
  end
  puts "Restarting..."
  # In order to prevent infinite busy loop in case of some mishap
  sleep 1
  # Rerun self using the hard-linked Ruby executable.
  # This will make /proc/self/exe point to the hard-link, which then
  # allows the ESET library to search for .so files in current folder.
  exec('./localruby', File.expand_path(__FILE__))
end
require 'fiddle'
require 'fiddle/import'
# ...rest of your script goes here...
A simple solution without any extra Ruby code is to just create the hard link manually, and then always run the script with ./localruby myscript.rb, instead of using the normal ruby myscript.rb.

Why do I get 'Can't locate object method "init" via package "wlgmod::odt"' when I try to run wyd.pl in Cygwin?

I'm trying to run a Perl script called WyD using Cygwin on Windows. Cygwin is installed at C:\cygwin64, and WyD is installed at C:\wyd\wyd.pl. Both are in the Windows PATH environment variable as C:\cygwin64 and C:\wyd respectively.
When running WyD with bash/Mintty using:
wyd.pl -b -e -t -s 3 -o "OUTPUTTEDWORDLIST" "TARGETFOLDER"
...I get the following error:
Can't locate object method "init" via package "wlgmod::odt" (perhaps
you forgot to load "wlgmod::odt"?) at /cygdrive/c/WYD/wyd.pl line 284.
Sometimes wlgmod::odt is replaced with wlgmod::doc or any other document type, but running the script always generates that same basic error. A previous answer to this question recommended installing several dependencies, which turned out to be a mere copy-paste of an answer for Ubuntu systems, and didn't solve the error, so I've decided to start at the beginning and re-ask the question with more details. I also have all Perl packages in the Cygwin installer installed.
After everything I've tried and done to get this script working, I can personally think of two possible causes for the error. Think of these as a guide more than anything else.
The error above references line 284 in the wyd.pl script, so it's possible that something in that script is hardcoded so that it doesn't work with Cygwin/Windows, or just generally has a compatibility bug. I don't understand Perl, so I can't confirm this.
I notice that the installation of WyD at C:\wyd contains a folder called wlgmod, and that folder contains all the files that the above error seems to be looking for; doc.pm, html.pm, jpeg.pm, etc. If those files exist in that directory but bash is unable to find them, maybe it's due to the fact WyD needs to be run from within Cygwin itself. I've only recently thought about this possibility, and my knowledge of both Cygwin and WyD is too sparse to definitively know how both work. Is it even possible to run WyD from within the Cygwin folder? It's not a package so can't be installed as one, and therefore I'm not sure how that would work.
Here are the relevant sections of the script:
# Module hash containing module name and supported file extensions
# Multiple extensions are seperated using ';'
my %wlgmods = (
    'wlgmod::strings', '', # only used with command-line switch
    'wlgmod::plain' , '.txt', # used for all MIME text/plain as well
    'wlgmod::html' , '.html;.htm;.php;.php3;.php4',
    'wlgmod::doc' , '.doc',
    'wlgmod::pdf' , '.pdf',
    'wlgmod::mp3' , '.mp3',
    'wlgmod::ppt' , '.ppt',
    'wlgmod::jpeg' , '.jpeg;.jpg;.JPG;.JPEG',
    'wlgmod::odt' , '.odt;.ods;.odp'
);
...
# Initialize possible modules
foreach(keys %wlgmods) {
    eval("use $_;");
    my $ret = $_->init(); # line 284
    # If module failed, add errortext and remove from hash
    if($ret) {
        $retvals .= "$_: $ret\n";
        delete $wlgmods{$_};
        $ret = "";
    }
}

Problems reading large JSON file in Ruby

I have problems reading a large JSON file (2.9GB) in Ruby. I am using this code
json_file = File.read(filename)
results = JSON.parse(json_file)
and when I try to read the file I get the error:
Errno::EINVAL: Invalid argument - <filename>
I have tested the same code with smaller files and it works fine. To verify that the file is written correctly, I have tried to read it with Python and it works.
Is there a limitation on the size of the file for JSON.parse? If so, could you recommend an alternative?
I have looked into msgpack to reduce the size of the files, but unfortunately I am constrained by the fact that I cannot install gems.
This is a limitation of IO.read.
You may split your file into smaller parts (for example, 1 gigabyte) and read them separately:
require 'json'

# Split into 1 GB parts named <filename>.parts.aa, <filename>.parts.ab, ...
system('split', '-b', '1024m', filename, "#{filename}.parts.")
# Sort so the parts are concatenated in their original order
parts = Dir["#{filename}.parts.*"].sort
json = ''
parts.each do |partname|
  json << File.read(partname)
  File.delete(partname)
end
results = JSON.parse(json)
Be patient, this could take a while.
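If the limit really is on the size of a single read, the same idea can be sketched without the external split command by reading the file in fixed-size chunks. The helper read_in_chunks and the 64 MB chunk size are assumptions made for illustration, and the whole document still ends up in memory before parsing.

```ruby
require 'json'

# Hypothetical chunk size. The assumption is only that it stays far below
# whatever limit a single read request runs into.
CHUNK_SIZE = 64 * 1024 * 1024

# Hypothetical helper: read a file piece by piece into one string.
def read_in_chunks(filename, chunk_size = CHUNK_SIZE)
  buffer = String.new # binary buffer, so chunks may split multibyte characters
  File.open(filename, 'rb') do |file|
    # IO#read with a length returns nil at end of file
    while (chunk = file.read(chunk_size))
      buffer << chunk
    end
  end
  # JSON text is expected to be UTF-8
  buffer.force_encoding(Encoding::UTF_8)
end

# Usage:
# results = JSON.parse(read_in_chunks(filename))
```

This needs only the standard library, which fits the constraint that no gems can be installed.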

Error in nchar() when reading in stata file in R on Mac

I'm learning R and am simply trying to read in a stata data file but am getting the error below:
X <- Stata.file(Stata_File)
Error in nchar(varlabs) : invalid multibyte string 253
Multiple Mac users here are encountering this error with the program but it works fine on a PC. A google search of this error seems to say it has something to do with the R package but I can't find a solution. Any ideas? Thanks for your help!!
The R code up to the error point is below:
Root <- "/Users/Desktop/R_Training"
PathIn <- paste(Root,"Data/Example_0",sep="/")
# The 2007 Dominican Republic household member file (96 MB)
Stata_File <- "drpr51fl.dta"
# Load the memisc package:
library(memisc)
# Set the working directory:
setwd(PathIn)
# (1) Determine which variables we want:
# The Stata.file function (from memisc) reads the "header"
# of our Stata file so you can see what it contains
# and choose the variables you want.
X <- Stata.file(Stata_File)
Error in nchar(varlabs) : invalid multibyte string 253
Below is my session info:
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] memisc_0.95-33 MASS_7.3-13 lattice_0.19-30
This is what worked for me. You can force R to recognize every character by issuing the following command:
Sys.setlocale('LC_ALL','C')
Now run the previous command and all should be fine.
It seems like the encoding of strings in the file isn't what the program thinks it is...
I guess the file was generated on a PC? Does it contain non-ASCII column names or data strings?
Since you seem to have UTF-8 encoding, and (US/Western Europe) PCs typically use latin-1, that could be the problem. I'd expect the same problem on Linux then (also UTF-8).
Possible work-arounds:
Does the Stata.file method have an "encoding" option? Then you might try 'latin1' and hope for the best...
Another possibility is to start R with the --encoding=latin1 option.

Ruby: How to determine if file being read is binary or text

I am writing a program in Ruby which will search for strings in text files within a directory - similar to Grep.
I don't want it to attempt to search in binary files but I can't find a way in Ruby to determine whether a file is binary or text.
The program needs to work on both Windows and Linux.
If anyone could point me in the right direction that would be great.
Thanks,
Xanthalas
libmagic is a library which detects file types. For this solution I assume that all MIME types which start with text/ represent text files. Everything else is a binary file. This assumption is not correct for all MIME types (e.g. application/x-latex, application/json), but libmagic detects these as text/plain.
require "filemagic"
def binary?(filename)
  # Create the handle outside begin, so ensure never calls close on nil
  fm = FileMagic.new(FileMagic::MAGIC_MIME)
  begin
    !(fm.file(filename) =~ /^text\//)
  ensure
    fm.close
  end
end
gem install ptools
require 'ptools'
File.binary?(file)
An alternative to using the ruby-filemagic gem is to rely on the file command that ships with most Unix-like operating systems. I believe it uses the same libmagic library under the hood but you don't need the development files required to compile the ruby-filemagic gem. This is helpful if you're in an environment where it's a bit of work to install additional libraries (e.g. Heroku).
According to man file, text files will usually contain the word text in their description:
$ file Gemfile
Gemfile: ASCII text
You can run the file command through Ruby and capture the output:
require "open3"
def text_file?(filename)
  file_type, status = Open3.capture2e("file", filename)
  status.success? && file_type.include?("text")
end
The answer above gives a false positive when the file name itself includes "text", for example:
file /tmp/ball-texture.png
/tmp/ball-texture.png: PNG image data, 11 x 18, 8-bit/color RGBA, non-interlaced
So the updated code only checks the description after the last colon:
require 'open3'
def text_file?(filename)
  file_type, status = Open3.capture2e('file', filename)
  status.success? && file_type.split(':').last.include?('text')
end
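Since the question asks for something that works on both Windows and Linux, where neither libmagic nor the file command may be available, a rough pure-Ruby heuristic is another option. This sketch is not what libmagic does: probably_binary? is a hypothetical helper that simply treats a file as binary if its first few kilobytes contain a NUL byte, so UTF-16 text, for example, would be misclassified as binary.

```ruby
# Hypothetical heuristic: look for a NUL byte in the first sample_size bytes.
# Plain text in ASCII, UTF-8 or latin-1 normally contains none, while most
# binary formats have one early on.
def probably_binary?(filename, sample_size = 4096)
  # to_s covers the case where nothing could be read from an empty file
  sample = File.binread(filename, sample_size).to_s
  sample.include?("\0")
end

# Usage inside a grep-like loop (the directory name is illustrative):
# Dir.glob('some_dir/**/*').select { |f| File.file?(f) }.each do |path|
#   next if probably_binary?(path)
#   # ...search the file...
# end
```

Only a small sample is read, so the check stays cheap even for very large files.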
