In Rhomobile (Rhodes), which is Ruby-based, I parse a file and save its contents to a SQLite db with code like this:
Questions.delete_all()
file_name = File.join(Rho::RhoApplication::get_model_path('app','Settings'), 'questions.txt')
file = File.new(file_name)
file.each_line("\n") do |row|
  col = row.split("|")
  question = Questions.create(
    {"id" => col[0], "question" => col[1], "answered" => '0', "show" => '1', "tutorial" => col[4]}
  )
  break if file.lineno > 1500
end
file.close
When a string in the text contains a single quote ('), for example the expression
It's funny
Then after parsing, saving and populating I get
It�s funny
Any idea where this comes from (Ruby, SQLite, or something else) and how to solve it?
I would check to make sure that your parsing isn't doing something funny. Rhodes handles all of the necessary escaping in its ORM. I've never had any issues with quotes in the db.
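One way to check the parsing step is to look at the bytes actually read from the file. The sketch below assumes (this is a guess, not something the question confirms) that questions.txt was saved as Windows-1252 and that the apostrophe is really a curly quote stored as byte 0x92. The helper name to_utf8 is made up for illustration.

```ruby
# Hypothetical helper: reinterpret raw bytes as the assumed source
# encoding and transcode them to UTF-8.
def to_utf8(bytes, source_encoding = "Windows-1252")
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Bytes as they would sit in a Windows-1252 file (assumption).
raw = "It\x92s funny".b

# Read as UTF-8, byte 0x92 is invalid, which is what shows up as a
# replacement character on screen.
broken = raw.dup.force_encoding("UTF-8")
puts broken.valid_encoding?   # false

# Declaring the real encoding first gives a valid UTF-8 string.
fixed = to_utf8(raw)
puts fixed.valid_encoding?    # true
```

In standard Ruby the same conversion can be requested while opening the file, for example with a mode string such as "r:Windows-1252:UTF-8". Whether the Ruby bundled with Rhodes supports this is not something the thread confirms.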
Related
I have a CSV file with some lines like:
col1,col "two",col3
so I get an Illegal quoting error, which I fix by setting :quote_char => "\x00"
["col1", "col\"two\"", "col3"]
but there is a line like
col1,col2,"col,3"
later in that file
["col1", "col2", "\"col", "3\""]
Then I read the file line by line and call parse_csv wrapped in a block: I set :quote_char => "\"", rescue CSV::MalformedCSVError exceptions, and for those particular lines set :quote_char => "\x00" and retry.
All works perfectly until we get line
col1,col "two","col,3"
In this case it rescues from the exception, sets :quote_char => "\x00", and the result is
["col1", "col\"two\"", "\"col", "3\""]
Apple Numbers is able to open that file absolutely correctly.
Is there any setting for parse_csv to handle this without preprocessing the string in some way?
UPD: I show the CSV lines as they are in the file, and the results (arrays) as they were printed by p. There are no actual \" in my strings.
This is an invalid csv file. If you have access to the source, you could (ask to) generate the data as follows:
col1,"col ""two""","col,3"
If not, the only option is to parse the data yourself:
pseudocode:
while(read_line) {
bool InsideQuotes = false
for each_char_in_line {
if(char == doublequote)
InsideQuotes = !InsideQuotes
if(char == ',' and !InsideQuotes)
// separator found - process field
}
}
This will also take care of escaped quotes like in col1,"col ""two""","col,3".
If the file contains multiline fields, some more work has to be done.
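The pseudocode above could be sketched in Ruby roughly as follows. The method name split_lenient is made up for illustration, and the final step (stripping the outer quotes and collapsing doubled quotes) is an addition that the pseudocode leaves to "process field". It handles single-line records only.

```ruby
# Hypothetical helper: split one line on separators that are outside
# double quotes, tolerating stray quotes in the middle of a field.
def split_lenient(line, sep = ",", quote = '"')
  fields = []
  current = String.new
  inside_quotes = false

  line.chomp.each_char do |ch|
    # Every quote character toggles the "inside quotes" state.
    inside_quotes = !inside_quotes if ch == quote

    if ch == sep && !inside_quotes
      fields << current        # separator found: finish the field
      current = String.new
    else
      current << ch
    end
  end
  fields << current

  # Unwrap fields that are quoted as a whole and unescape "" to ".
  fields.map do |f|
    if f.length >= 2 && f.start_with?(quote) && f.end_with?(quote)
      f[1..-2].gsub(quote * 2, quote)
    else
      f
    end
  end
end

p split_lenient('col1,col "two","col,3"')
p split_lenient('col1,"col ""two""","col,3"')
```

Both calls should yield the same three fields, with the comma kept inside the last one.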
CSV is less a standard and more of a name that everyone thinks they're using to describe their quirky format correctly, and this is despite there being an RFC standard for CSV, which is just another thing nobody pays attention to.
As such, a lot of programs that read CSV are very forgiving. Ruby's core CSV library is pretty good, but not as adaptable as others. That's because you've got Ruby there to get you out of a jam, and in Numbers you don't.
Try rewriting \" to "", which is conventional CSV formatting as defined in the RFC mentioned above:
# path is a placeholder for the location of your CSV file
CSV.parse(File.read(path).gsub(/\\"/, '""'))
Let's say I have a record in my database that has:
name: "World\u0092s Greatest Jet Fighter Pilot"
OK I need to get in there and clean out the \u0092 (there were a ton of these in the db). I can query like this:
# encoding: UTF-8
...
def self.by_partial name
return Movie.find(:all, :conditions => {:name => /^.*#{name}.*/i})
end
# console:
>> sel = Movie.by_partial(/Greatest/) and sel.size
=> 1
and get back the correct number of records. But when I throw in the unicode, it fails:
>> sel = Movie.by_partial(/\u0092/) and sel.size
=> 0
>> sel = Movie.by_partial(/\\u0092/) and sel.size
=> 0
>> sel = Movie.by_partial('\u0092') and sel.size
=> 0
>> sel = Movie.by_partial('\\u0092') and sel.size
=> 0
What do I need to do to be able to query for records that contain unicode characters? Is this a setting in the rails console? I managed to solve this by iterating the records and checking like so if mov.name =~ /\u0092/ ... but I can't figure out how to pass a unicode string into my mongoid selector. Iterating the records seemed way too brute force. Luckily I don't need to do this very often.
I don't think your problem is with Unicode, your problems are:
The string interpolation inside by_partial.
And \u only works inside double quoted strings.
Second things first:
> '\u0070'
=> "\\u0070"
> '\\u0070'
=> "\\u0070"
> "\u0070"
=> "p"
So Movie.by_partial("\u0092") should work.
Your first problem is that you're passing /\u0092/ (which does match the character in question) to by_partial but by_partial does this:
/^.*#{name}.*/i
So you get /^.*#{/\u0092/}.*/i, which ends up as /^.*(?-mix:\u0092).*/i. I'd guess that the MongoDB driver is having some issues translating that Ruby regex into a JavaScript regex.
The MongoDB driver doesn't seem to like \u in a regex at all. Feeding /^\u0070/ into MongoDB doesn't get me any matches but /^p/ does find what I'm expecting, /^#{"\u0070"}/ also works. I'm not sure what's going on in the guts of the MongoDB regex translator but we're not the only ones to come across this. I'd guess that the MongoDB regex translator doesn't understand \u so it ends up being converted to a raw \\u0092 and since you don't have that sequence of six characters in your database, you don't find anything.
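A small sketch of both points, runnable without a database. The helper partial_regex is hypothetical; it takes a plain string and builds the regex by interpolation, the form that /^#{"\u0070"}/ above suggests works. The leading ^.* and trailing .* from by_partial are dropped here because they do not change what an unanchored match finds.

```ruby
# Hypothetical helper: build a case-insensitive regex from a string,
# so the character itself (not a \u escape) ends up in the pattern.
def partial_regex(name)
  /#{Regexp.escape(name)}/i
end

single = '\u0092'   # six characters: backslash, u, 0, 0, 9, 2
double = "\u0092"   # one character: U+0092

puts single.length
puts double.length

title = "World\u0092s Greatest Jet Fighter Pilot"
pattern = partial_regex(double)
puts pattern.match?(title)                  # the real character is found
puts partial_regex(single).match?(title)    # the six-character text is not
```

The resulting pattern could then be used as the :name condition from the question. Whether the driver accepts it for every character is not something this sketch can verify.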
I'm creating a Ruby script to import a tab-delimited text file of about 150k lines into SQLite. Here it is so far:
require 'sqlite3'
file = File.new("/Users/michael/catalog.txt")
string = []
# Escape single quotes, remove newline, split on tabs,
# wrap each item in quotes, and join with commas
def prepare_for_insert(s)
s.gsub(/'/,"\\\\'").chomp.split(/\t/).map {|str| "'#{str}'"}.join(", ")
end
file.each_line do |line|
string << prepare_for_insert(line)
end
database = SQLite3::Database.new("/Users/michael/catalog.db")
# Insert each string into the database
string.each do |str|
database.execute( "INSERT INTO CATALOG VALUES (#{str})")
end
The script errors out on the first line containing a single quote in spite of the gsub to escape single quotes in my prepare_for_insert method:
/Users/michael/.rvm/gems/ruby-1.9.3-p0/gems/sqlite3-1.3.5/lib/sqlite3/database.rb:91:
in `initialize': near "s": syntax error (SQLite3::SQLException)
It's erroring out on line 15. If I inspect that line with puts string[14], I can see where it's showing the error near "s". It looks like this: 'Touch the Top of the World: A Blind Man\'s Journey to Climb Farther Than the Eye Can See'
Looks like the single quote is escaped, so why am I still getting the error?
Don't do it like that at all, string interpolation and SQL tend to be a bad combination. Use a prepared statement instead and let the driver deal with quoting and escaping:
# Ditch the gsub in prepare_for_insert and...
db = SQLite3::Database.new('/Users/michael/catalog.db')
ins = db.prepare('insert into catalog (column_name) values (?)')
string.each { |s| ins.execute(s) }
You should replace column_name with the real column name of course; you don't have to specify the column names in an INSERT but you should always do it anyway. If you need to insert more columns then add more placeholders and arguments to ins.execute.
Using prepare and execute should be faster, safer, easier, and it won't make you feel like you're writing PHP in 1999.
Also, you should use the standard CSV parser to parse your tab-separated files, XSV formats aren't much fun to deal with (they're downright evil in fact) and you have better things to do with your time than deal with their nonsense and edge cases and what not.
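As for why the original gsub did not help: SQLite escapes a single quote inside a string literal by doubling it (''), not with a backslash, so \' still ends the literal. With placeholders that detail stops mattering. Below is a minimal sketch of the parsing and statement-building side using only the standard CSV library. The column names id, title and price are invented for illustration, as are the two helper names.

```ruby
require "csv"

# Hypothetical helper: parse one tab-separated line into an array.
def parse_tab_line(line)
  CSV.parse_line(line, col_sep: "\t")
end

# Hypothetical helper: build an INSERT with one ? per column.
# Table and column names come from the program, never from the data.
def insert_sql(table, columns)
  placeholders = (["?"] * columns.size).join(", ")
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{placeholders})"
end

row = parse_tab_line("15\tA Blind Man's Journey\t9.99\n")
sql = insert_sql("catalog", %w[id title price])

p row
puts sql
```

The statement prepared from sql can then be executed once per parsed row, passing the row's values as the arguments to ins.execute. Note that CSV treats a double quote in the data as a quoting character, so lines containing stray double quotes may need different handling.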
I need to process a CSV file from FedEx.com containing shipping history. Unfortunately FedEx doesn't seem to actually test its CSV files as it doesn't quote strings that have commas in them.
For instance, a company name might be "Dog Widgets, Inc." but the CSV doesn't quote that string, so any CSV parser thinks that comma before "Inc." is the start of a new field.
Is there any way I can reliably parse those rows using Ruby?
The only differentiating characteristic that I can find is that the commas that are part of a string have a space after them. Commas that separate fields have no spaces. No clue how that helps me parse this, but it is something I noticed.
You can use a negative lookahead:
>> "foo,bar,baz,pop, blah,foobar".split(/,(?![ \t])/)
=> ["foo", "bar", "baz", "pop, blah", "foobar"]
Well, here's an idea: You could replace each instance of comma-followed-by-a-space with a unique character, then parse the CSV as usual, then go through the resulting rows and reverse the replace.
Perhaps something along these lines, using gsub to change the ', ' to something else:
ruby-1.9.2-p0 > "foo,bar,baz,pop, blah,foobar".gsub(/,\ /,'| ').split(',')
[
[0] "foo",
[1] "bar",
[2] "baz",
[3] "pop| blah",
[4] "foobar"
]
and then remove the | afterwards.
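The full round trip might look like the sketch below. It assumes that every comma followed by a space belongs to the data, as observed in the question, and that the placeholder character (here U+0001, an arbitrary choice) never occurs in the file. The method name is made up.

```ruby
require "csv"

# Arbitrary stand-in character, assumed never to appear in the data.
PLACEHOLDER = "\u0001"

# Hypothetical helper: hide ", " from the parser, parse the line,
# then restore the commas in each field.
def parse_unquoted_commas(line)
  protected_line = line.gsub(", ", "#{PLACEHOLDER} ")
  CSV.parse_line(protected_line).map { |field| field&.gsub(PLACEHOLDER, ",") }
end

p parse_unquoted_commas("foo,bar,baz,pop, blah,foobar")
```

Using the CSV parser for the middle step, rather than a plain split, keeps any fields that are quoted properly working as before.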
If you are so lucky as to only have one field like that, you can parse the leading fields off the start, the trailing fields off the end, and assume whatever is left is the offending field. In Python (no habla Ruby) this would look something like:
fields = line.split(',') # doesn't work if some fields are quoted
fields = fields[:5] + [','.join(fields[5:-3])] + fields[-3:]
Whatever you do, you should be able at a minimum determine the number of offending commas and that should give you something (a sanity check if nothing else).
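A rough Ruby equivalent of the Python snippet above, for readers who want to stay in Ruby. The method name and the leading/trailing keyword arguments are invented for illustration. Like the original, it assumes only one field can contain stray commas and that no field is quoted.

```ruby
# Hypothetical helper: keep `leading` fields from the start and
# `trailing` fields from the end, and glue everything in between
# back together as one field.
def rejoin_middle(line, leading:, trailing:)
  parts = line.chomp.split(",", -1)   # -1 keeps trailing empty fields
  extra = parts.size - leading - trailing
  raise ArgumentError, "too few fields in line" if extra < 1

  parts[0, leading] +
    [parts[leading, extra].join(",")] +
    parts[parts.size - trailing, trailing]
end

p rejoin_middle("a,b,Dog Widgets, Inc.,x,y", leading: 2, trailing: 2)
```

The value of extra minus one is the number of offending commas, which can serve as the sanity check mentioned above.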
I would like to store a MySQL query in a file. I plan to string-replace parts of it with variables from my program.
I played around with the 'eval' method in ruby, and it works, but it feels a little clumsy.
Using irb I did the following.
>> val = 7
=> 7
>> myQuery = "select * from t where t.val = \#{val}" #escaped hash simulates reading it from file
=> "select * from t where t.val = \#{val}"
>> myQuery = eval "\"#{myQuery}\""
=> "select * from t where t.val = 7"
As you can see it works! But to make it work I had to wrap the 'myQuery' variable in escaped quotes, and the whole thing looks a little messy.
Is there an easier way?
Generally, you should not use string interpolation to build SQL queries. Doing so will leave you open to SQL injection attacks, in which someone supplies input that has a closing quote character, followed by another query. For instance, using your example:
>> val = '7; DROP TABLE users;'
=> "7; DROP TABLE users;"
>> myQuery = "select * from t where t.val = \#{val}"
=> "select * from t where t.val = \#{val}"
>> eval "\"#{myQuery}\""
=> "select * from t where t.val = 7; DROP TABLE users;"
Even without malicious input, you could simply accidentally execute code that you weren't intending to, if for instance someone included quote marks in their input.
It is also generally a good idea to avoid using eval unless absolutely necessary; it makes it possible that if you have a bug in your program, someone could execute arbitrary code by getting it passed to eval, and it makes code less maintainable since some of your source code will be loaded from places other than your regular source tree.
So, how do you do this instead? Database APIs generally include a prepare command, which can prepare to execute an SQL statement. Within that statement, you can include ? characters, which represent parameters that can be substituted within that statement. You can then call execute on the statement, passing in values for those parameters, and they will be executed safely, with no way for someone to get an arbitrary piece of SQL executed.
Here's how it would work in your example. This is assuming you are using this MySQL/Ruby module; if you are using a different one, it will probably have a similar interface, though it may not be exactly the same.
>> val = 7
>> db = Mysql.new(hostname, username, password, databasename)
>> query = db.prepare("select * from t where t.val = ?")
>> query.execute(val)
You can use ERB templates instead - read them from the files and interpolate the variables (convert <%= something %> tags into the actual values).
Here's the official doc, it's quite complete and straightforward.
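A minimal sketch of the ERB approach using the standard library. The helper render_query is hypothetical, and the template text is written inline here where a real program would read it from the file.

```ruby
require "erb"

# Hypothetical helper: fill an ERB template using values from a hash.
def render_query(template_text, vars)
  ERB.new(template_text).result_with_hash(vars)
end

# In practice this would be something like File.read of the query file.
template_text = "select * from t where t.val = <%= val %>"

query = render_query(template_text, val: 7)
puts query
```

This avoids eval and the escaped quotes, but the value is still pasted into the SQL text, so the injection concern raised above applies to any value that comes from user input.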
You can use printf like syntax for string replacement
"123 %s 456" % 23 # => "123 23 456"
This only works if your program knows in advance which variables to use.
Could you use parametrized queries?
I don't know offhand how to do so in Ruby, but basically it involves marking your SQL statement with placeholders that the database recognizes and replaces with parameters that are sent in addition to your statement.
This link might help: http://sqlite-ruby.rubyforge.org/sqlite3/faq.html#538670816