How can I insert a string in an IO stream in Ruby?

I'm trying to write a Ruby script that will tweak a SQL dump (taken from pg_dump) so it can set up a table cleanly.
So far it's been all good the way I've set it up; I've been able to File.read the file, insert a word, append some stuff to the end, and File.write the file again.
However, I'm now working with a dump that's nearly 7 GB, and it won't cope (File.read is raising EINVAL errors, and there's no trouble with the filename). So I want to use a single stream to find the right spot to insert that word, and then jump to the end and append the extra stuff.
But I can't insert that word. I want to change
DROP TABLE public.programmes;
SET search_path = public, pg_catalog;
to
DROP TABLE public.programmes CASCADE;
SET search_path = public, pg_catalog;
but using file_stream.puts (#write and #<< aren't any better), I end up overwriting part of the following line:
DROP TABLE public.programmes CASCADE;
ch_path = public, pg_catalog;
... and I'd rather not have to loop (read eight characters, seek back eight characters, write previous eight characters) all the way to the end of the file, 7 GB away.
(I might be okay with doing it back to the start of the file – that's only 460 B – but I'd still have to know how to insert some characters at the start.)
Is there a way to do this?

Since the place where CASCADE needed to go was so close to the start, I ended up writing eight characters to the file first, then appending the results of pg_restore. Then I could loop through the file stream from the start and drop the string into place...
# Could be any eight characters, but these are a valid SQL comment in case it fails
File.write(path, "-------\n")
system("pg_restore #{pgr_options} >> #{path}")
File.open(path, 'r+') do |stream|
  content = ''
  stream.pos = 8
  # The semicolon is needed to delimit the table name
  content << stream.getc until content =~ /(DROP TABLE public\.[a-z_]*);/
  stream.rewind
  stream << content[0..-2] << ' CASCADE;'
  # ... before jumping to the end and appending stuff.
  stream.seek 0, :END
  stream.puts "ALTER TABLE ONLY blah blah blah..."
end
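If a second copy of the dump fits on disk, another option is to stream the file line by line into a new file and swap the two afterwards, so nothing ever has to be inserted in place. The following is only a minimal sketch under that assumption; the add_cascade helper, the ".tmp" suffix, and the trailer argument are made up for illustration and are not part of the original script.

```ruby
# Hypothetical helper: copies the dump line by line into a temporary file,
# adding CASCADE to any "DROP TABLE public.<name>;" line, appends the
# trailing SQL, and then replaces the original file with the rewritten copy.
def add_cascade(path, trailer)
  tmp_path = "#{path}.tmp" # assumed scratch location next to the dump
  File.open(tmp_path, 'w') do |out|
    # File.foreach yields one line at a time, so memory use stays small
    File.foreach(path) do |line|
      out.write(line.sub(/\A(DROP TABLE public\.[a-z_]+);/, '\1 CASCADE;'))
    end
    out.puts trailer
  end
  File.rename(tmp_path, path)
end
```

This avoids seeking entirely, at the cost of temporarily needing room for a second file of about the same size.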

Related

Call Incremental Datasets Created by Macro Function

I have a macro variable called max_attempts that I created from a PROC SQL step; it equals 4 for my current datafile. Then I used a macro function to create datasets up to max_attempts, so now I have attempt1_table, attempt2_table, attempt3_table, and attempt4_table. Now I'm having trouble merging the 4 datasets.
data final_table;
  set attempt1_table - attempt&max_attempts._table;
run;
The inputted datafile will have a different max_n each time, so I'm using macros to account for that.
The - shortcut only works if the number is at the end of the dataset name. Rename your datasets to be round_table1, round_table2, etc.:
data final_table;
  set round_table1 - round_table&max_n.;
run;
Use the trimmed option of the into :macrovar clause in order to remove the leading spaces that cause set attempt1_table - attempt&max_attempts._table; to resolve into erroneous syntax.
Example:
proc sql noprint;
  select <computation-for-max-attempts>
  into :max_attempts trimmed /* removes leading spaces when column is numeric */
  from ...
  ;
quit;
Thank you everyone for your help! There were two issues: the number has to be at the end of the dataset name when using the - shortcut, and trimmed is needed to remove the leading spaces.
proc sql feedback;
  select max(max_attempts)
  into: max_attempts trimmed
  from analysis_data;
quit;
data analysis_table;
  set unknown_table attempt_table1 - attempt_table&max_attempts;
run;

Ruby + Prawn PDF: How to add space prefix in string inside of a table?

I have a two dimensional array:
line_items = []
line_item.product.book_versions.each do |book_version|
  data = []
  data << ""
  data << " #{book_version.book.title} - #{book_version.isbn}" #<-- notice the extra spaces in the beginning of the string
  data << "#{line_item.quantity}"
  line_items << data
end
And I load this data into my table with pdf.table line_items ... do ... end
However, the extra spaces in my 2nd column don't show up. How would I escape these spaces so that they aren't stripped?
Depending on what you want to do, you can also use the constant Prawn::Text::NBSP. If it is purely blank space, then the column padding is what you want. However, I had a situation where I had to simulate a "checkmark space" such that an X character was underlined. My table looked like:
table([["<u>X</u>", "Do you agree to these terms and conditions?"]]) do
  columns(0).style(:inline_format => true)
end
However, this produced a simple X with an underline. I wanted the underlined section to be wider, i.e., space (blank) characters that still received an underline. So I changed the table data to be
table([["<u>#{Prawn::Text::NBSP*3} X #{Prawn::Text::NBSP*3}</u>", ...]]) do
Then in the PDF, it looked the way I wanted: ___X___ (with the X obviously underlined too).
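Applied to the table data from the question, the same idea could look roughly like the sketch below. It is written in plain Ruby so it runs without Prawn; the local NBSP constant is assumed to be the same character as Prawn::Text::NBSP (U+00A0), and indent_cell and the sample row are made up for illustration.

```ruby
# Assumption: U+00A0 (no-break space) is the character behind Prawn::Text::NBSP.
NBSP = "\u00A0"

# Hypothetical helper: prefixes a cell's text with `width` non-breaking
# spaces, which are not ordinary whitespace and so are less likely to be
# stripped when the table is rendered.
def indent_cell(text, width = 4)
  (NBSP * width) + text
end

# Made-up row in the same shape as the question's line_items entries
row = ['', indent_cell('Example Title - 0000000000'), '2']
```

Whether the indent survives rendering still depends on the font and on how Prawn handles the cell, so column padding remains the more predictable option for pure blank space.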
You can do a hack with a character that Prawn does not understand.
I was able to do it by adding the ⁔ character before the text.
Prawn did not understand it and ended up giving me a space :)
Your best bet will probably be to use a custom padding for that column. Something like this:
pdf.table line_items do
  column(1).padding = [5, 5, 5, 30] # Default padding is 5; just increase the left padding
  ...
end
I don't think it is an escaping problem. Maybe you should use a more formal way of spacing: try the \t character instead of spaces. It is intended for that use.
line_items = []
line_item.product.book_versions.each do |book_version|
  data = []
  data << ""
  data << "\t\t#{book_version.book.title} - #{book_version.isbn}"
  data << "#{line_item.quantity}"
  line_items << data
end

Ruby: How do you search for a substring, and increment a value within it?

I am trying to change a file by finding this string:
<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]>
and replacing {CLONEINCR} with an incrementing number. Here's what I have so far:
file = File.open('input3400.txt', 'rb')
contents = file.read.lines.to_a
contents.each_index do |i|
  contents.join["<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>"] = "<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>"
end
file.close
But this seems to go on forever - do I have an infinite loop somewhere?
Note: my text file is 533,952 lines long.
You are repeatedly concatenating all the elements of contents, making a substitution, and throwing away the result. This is happening once for each line, so no wonder it is taking a long time.
The easiest solution would be to read the entire file into a single string and use gsub on that to modify the contents. In your example you are inserting the (zero-based) file line numbers into the CDATA. I suspect this is a mistake.
This code replaces all occurrences of <![CDATA[{CLONEINCR}]]> with <![CDATA[1]]>, <![CDATA[2]]> etc. with the number incrementing for each matching CDATA found. The modified file is sent to STDOUT. Hopefully that is what you need.
File.open('input3400.txt', 'r') do |f|
  i = 0
  contents = f.read.gsub('<![CDATA[{CLONEINCR}]]>') { |m|
    m.sub('{CLONEINCR}', (i += 1).to_s)
  }
  puts contents
end
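For a file of half a million lines, the same counter-based substitution can also be done one line at a time, so the whole file never has to sit in memory as a single string. This is only a sketch of that variant; number_placeholders and the separate destination path are assumptions for illustration.

```ruby
# Hypothetical helper: copies src to dest line by line, replacing each
# <![CDATA[{CLONEINCR}]]> with <![CDATA[1]]>, <![CDATA[2]]>, and so on.
# Returns the number of placeholders that were replaced.
def number_placeholders(src, dest)
  counter = 0
  File.open(dest, 'w') do |out|
    File.foreach(src) do |line|
      # The block runs once per match, so the counter advances per
      # placeholder rather than per line
      out.write(line.gsub('<![CDATA[{CLONEINCR}]]>') { "<![CDATA[#{counter += 1}]]>" })
    end
  end
  counter
end
```

Writing to a separate file keeps the input untouched until the result has been checked.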
If what you want is to replace CLONEINCR with the line number, which is what your code above looks like it's trying to do, then this will work. Otherwise see the gsub-based answer above.
output = File.readlines('input3400.txt').map.with_index do |line, i|
  line.gsub "<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>",
            "<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>"
end
File.write('input3400.txt', output.join(''))
Also, you should be aware that when you read the lines into contents, you are creating a String distinct from the file. You can't operate on the file directly. Instead you have to create a new String that contains what you want and then overwrite the original file.

Escaping Strings For Ruby SQLite Insert

I'm creating a Ruby script to import a tab-delimited text file of about 150k lines into SQLite. Here it is so far:
require 'sqlite3'
file = File.new("/Users/michael/catalog.txt")
string = []
# Escape single quotes, remove newline, split on tabs,
# wrap each item in quotes, and join with commas
def prepare_for_insert(s)
  s.gsub(/'/,"\\\\'").chomp.split(/\t/).map {|str| "'#{str}'"}.join(", ")
end
file.each_line do |line|
  string << prepare_for_insert(line)
end
database = SQLite3::Database.new("/Users/michael/catalog.db")
# Insert each string into the database
string.each do |str|
  database.execute( "INSERT INTO CATALOG VALUES (#{str})")
end
The script errors out on the first line containing a single quote in spite of the gsub to escape single quotes in my prepare_for_insert method:
/Users/michael/.rvm/gems/ruby-1.9.3-p0/gems/sqlite3-1.3.5/lib/sqlite3/database.rb:91:
in `initialize': near "s": syntax error (SQLite3::SQLException)
It's erroring out on line 15. If I inspect that line with puts string[14], I can see where it's showing the error near "s". It looks like this: 'Touch the Top of the World: A Blind Man\'s Journey to Climb Farther Than the Eye Can See'
Looks like the single quote is escaped, so why am I still getting the error?
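Don't do it like that at all; string interpolation and SQL tend to be a bad combination. (SQLite escapes a single quote by doubling it, not with a backslash, which is why the gsub doesn't help.) Use a prepared statement instead and let the driver deal with quoting and escaping: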
# Ditch the gsub in prepare_for_insert and...
db = SQLite3::Database.new('/Users/michael/catalog.db')
ins = db.prepare('insert into catalog (column_name) values (?)')
string.each { |s| ins.execute(s) }
You should replace column_name with the real column name of course; you don't have to specify the column names in an INSERT but you should always do it anyway. If you need to insert more columns then add more placeholders and arguments to ins.execute.
Using prepare and execute should be faster, safer, easier, and it won't make you feel like you're writing PHP in 1999.
Also, you should use the standard CSV parser to parse your tab-separated files, XSV formats aren't much fun to deal with (they're downright evil in fact) and you have better things to do with your time than deal with their nonsense and edge cases and what not.
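Putting the two suggestions together, the parsing side might look like the sketch below: the standard CSV library splits each tab-delimited line into fields, and the statement gets one placeholder per field. The sample data and the three-column layout are assumptions, and the column list is omitted only because the real column names are not given in the question.

```ruby
require 'csv'

# Made-up sample standing in for two lines of the tab-delimited catalog file
sample = "Touch the Top of the World: A Blind Man's Journey\tBook\t9.99\n" \
         "Another Title\tDVD\t4.50\n"

# col_sep switches the parser from commas to tabs; each row is an Array of fields
rows = CSV.parse(sample, col_sep: "\t")

# One "?" per field, so no value is ever interpolated into the SQL text
placeholders = (['?'] * rows.first.size).join(', ')
sql = "INSERT INTO catalog VALUES (#{placeholders})"

# With the sqlite3 gem loaded and db opened as in the answer, each row
# could then be bound to the prepared statement like this:
#   stmt = db.prepare(sql)
#   rows.each { |row| stmt.execute(*row) }
```

The single quote in the first title needs no escaping here, because the value only reaches SQLite as a bound parameter.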

Problem with parsing string from Excel file

I have Ruby code that parses data in an Excel file using the Parseexcel gem. I need to save two columns of that file into a Hash. Here is my code:
worksheet.each { |row|
  if row != nil
    key = row.at(1).to_s.strip
    value = row.at(0).to_s.strip
    if !parts.has_key?(key) and key.length > 0
      parts[key] = value
    end
  end
}
However, it still saves duplicate keys into the hash: "020098-10". I checked the Excel file at the specified rows and found that the values differ: " 020098-10" and "020098-10". The first one has a leading space while the second doesn't. I don't understand: isn't .strip supposed to remove all leading and trailing whitespace?
Also, when I tried to print out key.length, it gave me these weird numbers:
020098-10 length 18
020098-10 length 17
when both should be 9.
If you inspect the strings you receive, you will probably get something like:
" \x000\x002\x000\x000\x009\x008\x00-\x001\x000\x00"
This happens because of the strings' encoding. Excel works with Unicode, while Ruby may default to a single-byte encoding such as ISO-8859-1. The encodings will differ on various platforms.
You need to convert the data you receive from Excel to a printable encoding.
However, you should not convert strings created in Ruby itself, as you will end up with garbage.
Consider this code:
@enc = Encoding::Converter.new("UTF-16LE", "UTF-8")
def convert(cell)
  if cell.numeric
    cell.value
  else
    @enc.convert(cell.value).strip
  end
end
parts = {}
worksheet.each do |row|
  next unless row
  key = convert row.at(1)
  value = convert row.at(0)
  parts[key] = value unless parts.has_key?(key) or key.empty?
end
You may want to change the encodings to different ones.
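The effect is easy to reproduce without a spreadsheet. The sketch below fakes the raw cell bytes by encoding the part number as UTF-16LE (an assumption about what the gem hands back), then shows that converting to UTF-8 before stripping yields the same 9-character key for both variants. The to_utf8 helper is made up for illustration.

```ruby
# Hypothetical helper: treats the bytes as UTF-16LE and transcodes them to UTF-8
def to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end

# Simulated raw cells: the same part number with and without a leading space,
# stored as UTF-16LE bytes but tagged as plain binary data
raw_cells = [" 020098-10", "020098-10"].map { |s| s.encode(Encoding::UTF_16LE).b }

# Two bytes per character, so 9 characters occupy 18 bytes before conversion
raw_size = raw_cells.last.bytesize

# Convert first, then strip, so real whitespace is removed from real characters
keys = raw_cells.map { |bytes| to_utf8(bytes).strip }
```

After the conversion both cells collapse to the same key, so the duplicate check in the hash works as intended.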
The newer Spreadsheet-gem handles charset conversion automatically for you, to UTF-8 I think as standard but you can change it, so I'd recommend using it instead.
