How to match a string with `\xXXXX` and `\uXXXX`

I have a string that contains \xXXXX and \uXXXX:
str = "\nDefault\nRouterRandom=\x9Db\u0012\xD3,\x92r\xFC o\u007F\x9B+\u0005I`\nWebInit=1\n"
I want to delete the content:
"RouterRandom=\x9Db\u0012\xD3,\x92r\xFC o\u007F\x9B+\u0005I`\n"
How can I match the string or delete it? I tried:
content = str.sub(/RouterRandom=.*WebInit/, "")
It returns errors:
E:/Automation/experiment/ruby_test/string_test.rb:119:in `sub': invalid byte sequence in UTF-8 (ArgumentError)
from E:/Automation/experiment/ruby_test/string_test.rb:119:in `block in <top (required)>'
from E:/Automation/experiment/ruby_test/string_test.rb:110:in `open'
from E:/Automation/experiment/ruby_test/string_test.rb:110:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'

You are getting "invalid byte sequence in UTF-8" because the string contains bytes that are not valid UTF-8. You can replace the invalid bytes first:
content = str.encode( 'UTF-8', invalid: :replace )
Then split by newlines:
content = content.split( "\n" )
Delete the offending element in the array by index:
content.delete_at( 2 )
And then finally join the array back together into a newline delimited string:
new_str = content.join( "\n" )
# => "\nDefault\nWebInit=1"
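
The steps above can also be collapsed into a single substitution. The following is a minimal sketch under two assumptions: Ruby 2.1 or later (for `String#scrub`), and a sample string that writes `\u0012`, `\u007F`, and `\u0005` as `\x12`, `\x7F`, and `\x05`, which produce the same bytes. The first variant replaces invalid bytes before matching; the second matches on raw bytes, so no byte outside the removed line is altered.

```ruby
# Sample string with the same bytes as in the question.
str = "\nDefault\nRouterRandom=\x9Db\x12\xD3,\x92r\xFC o\x7F\x9B+\x05I`\nWebInit=1\n"

# Variant 1: replace invalid bytes with "?", then remove the whole line.
scrubbed = str.scrub("?")
without_line = scrubbed.sub(/^RouterRandom=.*\n/, "")

# Variant 2: work on a binary copy so the invalid sequences are never
# interpreted as UTF-8, then restore the UTF-8 encoding label.
binary_result = str.b.sub(/^RouterRandom=.*\n/, "").force_encoding("UTF-8")

p without_line
p binary_result
```

Unlike `delete_at(2)`, both variants find the line by its `RouterRandom=` prefix, so they do not depend on its position in the string.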

Related

How do I parse a tab-delimited line that contains a quote?

I'm using Ruby 2.4. How do I parse a tab-delimited line that contains a quote character? This is what's happening to me now ...
2.4.0 :003 > line = "11\tDave\tO\"malley"
=> "11\tDave\tO\"malley"
2.4.0 :004 > CSV.parse(line, col_sep: "\t")
CSV::MalformedCSVError: Illegal quoting in line 1.
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1912:in `block (2 levels) in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1868:in `block in shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `loop'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1828:in `shift'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1770:in `each'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `to_a'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1784:in `read'
from /Users/davea/.rvm/rubies/ruby-2.4.0/lib/ruby/2.4.0/csv.rb:1324:in `parse'
from (irb):4
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console.rb:65:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/console_helper.rb:9:in `start'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:78:in `console'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /Users/davea/.rvm/gems/ruby-2.4.0@global/gems/railties-5.0.1/lib/rails/commands.rb:18:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
Although the example illustrates my point, I can't easily control the input coming in. So, although an answer could be "Remove all quotes from the string before parsing," I want to preserve the data as closely as possible.

That's a malformed document if you're trying to adhere to the CSV standard. Instead, you might just brute-force it and pray there are no tabs in the data itself:
line.split(/\t/)
The CSV parsing library comes in handy when you're dealing with data like this:
"1\t2\t\"3a\t3b\"\t4"
Update: If you're prepared to abuse the CSV library a little then you can do this:
CSV.parse("11\tDave\tO\"malley", col_sep: "\t", quote_char: "\0")
That basically kills quote detection, so if there is other data that depends on that being processed correctly this may not work out.
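
A minimal sketch comparing the plain split with a less drastic CSV option. It assumes Ruby 2.4 or later, where `CSV` accepts `liberal_parsing: true` to tolerate quotes inside unquoted fields; that option is not part of the answer above and is shown only as an alternative to overriding `quote_char`.

```ruby
require "csv"

line = "11\tDave\tO\"malley"

# Plain split: no quote handling at all. The -1 limit keeps trailing empty fields.
split_fields = line.split("\t", -1)

# liberal_parsing asks CSV to accept a quote that appears inside an unquoted field.
csv_fields = CSV.parse_line(line, col_sep: "\t", liberal_parsing: true)

p split_fields
p csv_fields
```

With `liberal_parsing`, fields that are properly quoted are still parsed as quoted fields, which the `quote_char: "\0"` trick gives up.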

"11\tDave\tO\"malley" is not valid CSV data. Strangely enough, the answer is to double the embedded double quote and to wrap each element in double quotes:
2.3.1 :001 > require 'csv'
=> true
2.3.1 :002 > line = "\"11\"\t\"Dave\"\t\"O\"\"malley\""
=> "\"11\"\t\"Dave\"\t\"O\"\"malley\""
2.3.1 :003 > puts line # for clarity
"11" "Dave" "O""malley"
=> nil
2.3.1 :004 > CSV.parse(line, col_sep: "\t")
=> [["11", "Dave", "O\"malley"]]

How do I delete stray double-quotes in a CSV file before my code runs?

I have a CSV file I import from one site and export to another web app.
However, when running my Ruby file I get the following error:
C:\Users\ALilland\Documents\sinatra\csv_to_screen>ruby app.rb
C:/Ruby22/lib/ruby/2.2.0/csv.rb:1843:in `block (2 levels) in shift': Missing or
stray quote in line 762 (CSV::MalformedCSVError)
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1836:in `each'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1836:in `block in shift'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1796:in `loop'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1796:in `shift'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1738:in `each'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1122:in `block in foreach'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1273:in `open'
from C:/Ruby22/lib/ruby/2.2.0/csv.rb:1121:in `foreach'
from app.rb:129:in `lost'
from app.rb:157:in `scorecard'
from app.rb:203:in `<main>'
This is an example of where a quote does not need to be in my data "GENERAL CONTRACTOR, ...":
MM,MM,"VENABLE, LLP","LOS ANGELES","GENERAL CONTRACTOR, ...",,Dead,02/11/2016,0
This is an example of completely missing quotes on irvine:
GS,GS,"REPLACE 1.5 DIELECTRIC UNION,IRVINE,"THE IRVINE CO. - EXECU...",,Job162048,02/01/2016,0
These "missing or stray quotes" are scattered throughout the CSV, usually in project_name = row[2].
What can I do to overwrite the stray quotes before my foreach block runs?
Here is a post I found that relates to my problem, but I struggled to figure out how to implement it: "How to remove the extra double quote?" This error seems well documented on the internet, and I even heard a podcast about it the other day, but now that I have run into it firsthand I have come up empty.
My complete foreach block is:
def lost(initials, salesperson)
  my_status = 'Lost'
  lost = 0
  count = 0
  CSV.foreach(path1, :encoding => 'windows-1251:utf-8') do |row|
    salesman = row[0]
    project_manager = row[1]
    project_name = row[2]
    project_city = row[3]
    customer = row[4]
    status = row[6]
    if status[0,4] == 'Dead'
      status = 'Dead'
    end
    bid_date = Date.strptime(row[7], '%m/%d/%Y')
    amount = row[8].gsub(/(?<!^|,)"(?!,|$)/, '').tr(',', '').to_i
    next if salesman != initials || status != my_status || bid_date < fiscal_start
    dollar_amount = '$' + amount.to_s.reverse.gsub(/...(?=.)/,'\&,').reverse
    lost = lost + amount
    count = count + 1
  end
  @lost_count = count.to_s
  @lost_amount = "$" + lost.to_s.reverse.gsub(/...(?=.)/,'\&,').reverse
end
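
One possible first step, sketched under assumptions: parse the file one line at a time and record which lines raise `CSV::MalformedCSVError`, so the bad rows can be repaired or reviewed separately instead of aborting the whole `CSV.foreach`. The two-row sample is built from the rows quoted in the question; the variable names are hypothetical, and the approach assumes no field contains an embedded newline.

```ruby
require "csv"

# Row 1 is well-formed; row 2 has the stray quote shown in the question.
sample = <<~DATA
  MM,MM,"VENABLE, LLP","LOS ANGELES","GENERAL CONTRACTOR, ...",,Dead,02/11/2016,0
  GS,GS,"REPLACE 1.5 DIELECTRIC UNION,IRVINE,"THE IRVINE CO. - EXECU...",,Job162048,02/01/2016,0
DATA

good_rows = []
bad_lines = []

sample.each_line.with_index(1) do |line, number|
  begin
    good_rows << CSV.parse_line(line)
  rescue CSV::MalformedCSVError
    # Keep the line number so the row can be fixed or inspected later.
    bad_lines << number
  end
end

p good_rows.length
p bad_lines
```

This does not repair the data by itself; it only separates rows that parse from rows that need attention, which keeps a guess about where a quote belongs from silently changing the data.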

Encoding::UndefinedConversionError: U+00A0 from UTF-8 to US-ASCII

I'm trying to scrape the 52 between the anchor links:
<div class="zg_usedPrice">
52 new
</div>
With this code:
def self.parse_products
  product_hash = {}
  product = @data.css('#zg_centerListWrapper')
  product.css('.zg_itemImmersion').each do | product |
    product_name = product.css('.zg_title a').text
    product_used_price_status = product.css('.zg_usedPrice > a').text[/(\D+)/]
    product_hash[:product] ||= []
    product_hash[:product] << { :name => product_name,
                                :used_status => product_used_price_status }
  end
  product_hash
end
But I think the http://www.amazon.com/gp/offer-listing/B000O3GCFU/ref=zg_bs_baby-products_price?ie=UTF8&condition=new part in the URL is producing the following error:
Encoding::UndefinedConversionError:
U+00A0 from UTF-8 to US-ASCII
# ./parser_spec.rb:175:in `block (2 levels) in <top (required)>'
I tried what they suggested in "Ruby error UTF-8 to ASCII", but I'm still getting the same problem. Is there any workaround for that?
Full error trace:
1) Product (Baby) should return correct keys
Failure/Error: expect(product_hash[:product]["Pet Supplies"].keys).to eq(["Birds", "Cats", "Dogs", "Fish & Aquatic Pets", "Horses", "Insects", "Reptiles & Amphibians", "Small Animals"])
TypeError:
can't convert String into Integer
# ./parser_spec.rb:179:in `[]'
# ./parser_spec.rb:179:in `block (2 levels) in <top (required)>'
2) Product (Baby) should return correct values
Failure/Error: expect(product_hash[:product]["Pet Supplies"].values).to eq([16281, 245512, 513926, 46811, 14805, 364, 5816, 19769])
TypeError:
can't convert String into Integer
# ./parser_spec.rb:183:in `[]'
# ./parser_spec.rb:183:in `block (2 levels) in <top (required)>'
3) Product (Baby) should return correct hash
Failure/Error: expect(product_hash[:product]).to eq({"Pet Supplies"=>{"Birds"=>16281, "Cats"=>245512, "Dogs"=>513926, "Fish & Aquatic Pets"=>46811, "Horses"=>14805, "Insects"=>364, "Reptiles & Amphibians"=>5816, "Small Animals"=>19769}})
Encoding::UndefinedConversionError:
U+00A0 from UTF-8 to US-ASCII
# ./parser_spec.rb:187:in `block (2 levels) in <top (required)>'

Your HTML sample doesn't match the code you're showing, plus the URL you gave doesn't exist any more, so it's difficult to help you.
Here's a start:
require 'nokogiri'
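# The sample is missing its anchor tag; this one is a reconstruction with a placeholder href.
html = '<div class="zg_usedPrice">
  <a href="#">52&nbsp;new</a>
</div>
'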
doc = Nokogiri::HTML(html)
text = doc.at('div.zg_usedPrice a').text # => "52\u00A0new"
text.gsub(/\u00A0/, ' ') # => "52 new"
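
A minimal sketch of where the error comes from and two ways around it, using only core Ruby so it runs without Nokogiri. The string literal stands in for the text extracted above; the variable names are hypothetical.

```ruby
text = "52\u00A0new" # stands in for the text extracted from the anchor

# Converting to US-ASCII fails because U+00A0 has no ASCII equivalent.
error = begin
  text.encode("US-ASCII")
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Option 1: swap the no-break space for a regular space before any conversion.
normalized = text.gsub("\u00A0", " ")

# Option 2: let encode substitute it through a fallback table.
ascii = text.encode("US-ASCII", fallback: { "\u00A0" => " " })

# The leading digits can then be pulled out directly.
count = normalized[/\d+/].to_i

p error.class
p normalized
p ascii
p count
```

Option 1 keeps the string in UTF-8, which is usually enough; option 2 only matters if something downstream really requires US-ASCII.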

How do I open local image file, encode it, and post via URI? (posting via Tumblr API)

I'm trying to read a local image file, encode it properly, and post it to Tumblr. According to the Tumblr API, I can pass a parameter data, which is an Array (URL-encoded binary contents) with a limit of 5 MB.
I've tested my code with the http://api.tumblr.com/v2/blog/#{BLOG}/info request, and it works. But I can't post a photo. Here is my code:
require 'oauth'
require 'oauth/consumer'
require 'open-uri'
require 'active_support'
CONSUMER = 'foo'
SECRET = 'foo'
TOKEN = 'foo'
TOKEN_SECRET = 'foo'
BLOG = 'foo'
consumer=OAuth::Consumer.new(CONSUMER, SECRET, {:site=>"http://tumblr.com"})
access_token = OAuth::AccessToken.new(consumer, TOKEN, TOKEN_SECRET)
# Here I tried one of two lines:
# data = Base64.encode64(IO.binread('./resized')) #first try
data = URI::encode(IO.binread('./resized')) #second try
# response = access_token.get "http://api.tumblr.com/v2/blog/#{BLOG}/info?api_key=#{CONSUMER}"
# puts response
response=access_token.post "http://api.tumblr.com/v2/blog/#{BLOG}/post?api_key=#{CONSUMER}&type=photo&data=#{data}&link=http://ya.ru&"
puts response
1st try:
% ruby ./w_oauth.rb
/usr/lib/ruby/1.9.1/uri/common.rb:176:in `split': bad URI(is not URI?): http://api.tumblr.com/v2/blog/foo/post?api_key=foo&type=photo&data=/9j/4AAQSkZJRgABAQEASABIAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4w (URI::InvalidURIError)
ICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gODAK/9sAQwAGBAUG
BQQGBgUGBwcGCAoQCgoJCQoUDg8MEBcUGBgXFBYWGh0lHxobIxwWFiAsICMm
(!!!long piece of image data skipped!!!)
FI/16HfTbyHPWurqdE+TGH4wx2js5SKQb+6b4bIj3aurqCrEtcXrf/4yf/dS
DLet/wCzEB6sa6uoomxJN2eaQj5mkYuerQj611dQM7Fx/wDLF/8AbXV1dTA/
/9k=
&link=http://ya.ru&
from /usr/lib/ruby/1.9.1/uri/common.rb:211:in `parse'
from /usr/lib/ruby/1.9.1/uri/common.rb:747:in `parse'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/tokens/access_token.rb:7:in `request'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/tokens/access_token.rb:47:in `post'
from ./w_oauth.rb:23:in `<main>'
2nd try:
% ruby ./w_oauth.rb
/var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:14:in `force_encoding': can't modify frozen String (RuntimeError)
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:14:in `rescue in escape'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:12:in `escape'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:43:in `block (2 levels) in normalize'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:42:in `collect'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:42:in `block in normalize'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:37:in `map'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/helper.rb:37:in `normalize'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/request_proxy/base.rb:98:in `normalized_parameters'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/request_proxy/base.rb:113:in `signature_base_string'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/signature/base.rb:77:in `signature_base_string'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/signature/hmac/base.rb:12:in `digest'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/signature/base.rb:65:in `signature'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/signature.rb:23:in `sign'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/client/helper.rb:45:in `signature'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/client/helper.rb:75:in `header'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/client/net_http.rb:91:in `set_oauth_header'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/client/net_http.rb:30:in `oauth!'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/consumer.rb:224:in `sign!'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/consumer.rb:188:in `create_signed_request'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/consumer.rb:159:in `request'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/tokens/consumer_token.rb:25:in `request'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/tokens/access_token.rb:12:in `request'
from /var/lib/gems/1.9.1/gems/oauth-0.4.6/lib/oauth/tokens/access_token.rb:47:in `post'
from ./w_oauth.rb:23:in `<main>'
Update: ./resized is a proper JPEG file:
% file ./resized
./resized: JPEG image data, JFIF standard 1.01, comment: "CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 80"

URI encoding is not enough. You also need to encode: , / ? : @ & = + $ #.
Try:
URI.escape(IO.binread('./resized'), Regexp.new("[^#{URI::PATTERN::UNRESERVED}]"))
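
Note that `URI.escape` was deprecated for a long time and removed in Ruby 3.0, so the call above only runs on older versions. A minimal sketch of the same idea with `URI.encode_www_form_component`, which percent-encodes every byte outside a small safe set. The byte string is a hypothetical stand-in for `IO.binread('./resized')`, and whether the Tumblr API accepts the result is not verified here.

```ruby
require "uri"

# Hypothetical stand-in for the image data: JPEG magic bytes plus a few
# characters that are reserved in URIs.
image_bytes = "\xFF\xD8\xFF\xE0/+=&".b

# Percent-encode every byte that is not a letter, digit, or one of * - . _
encoded = URI.encode_www_form_component(image_bytes)

# Round trip to confirm that no byte was lost.
decoded = URI.decode_www_form_component(encoded)

p encoded
p decoded.bytes == image_bytes.bytes
```

For payloads of this size, sending the encoded data in the POST body rather than in the query string avoids URL length limits.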

ruby: `read': Invalid argument -(Errno::EINVAL) at File.read

I'm doing a simple script to check crc of all files...
require "zlib"
exit if Object.const_defined?(:Ocra)
files = Dir.glob("*")
File.open('dir.txt', 'a+') do |file|
file.puts files
end
File.read('dir.txt').each_line { |line|
file = File.read(line) ; nil
file_crc = Zlib.crc32(file,0).to_s(16)
puts line, file_crc
}
The problem is at the line File.read('dir.txt').each_line { |line|
I get this error:
test.rb:13:in `read': Invalid argument - 1.exe (Errno::EINVAL)
from C:/Users/Administrador/Desktop/1.rb:13:in `block in <main>'
from C:/Users/Administrador/Desktop/1.rb:12:in `each_line'
from C:/Users/Administrador/Desktop/1.rb:12:in `<main>'
P.S.: 1.exe is a file listed in "dir.txt".

Have you checked that the line doesn't contain extra characters? Try p line to inspect it.
IIRC, each line still contains its trailing newline character, so use line.chomp.
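
A minimal sketch of the loop with the newline removed. The helper name `crc_listing` and the temporary demo files are hypothetical; `File.binread` is used instead of `File.read` so binary files such as 1.exe are read without newline translation on Windows.

```ruby
require "zlib"
require "tmpdir"

# Returns [name, crc32-in-hex] pairs for every file named in the list.
def crc_listing(list_path)
  names = File.foreach(list_path).map(&:chomp).reject(&:empty?)
  names.map do |name|
    # chomp removed the trailing "\n", so the name now matches the file on disk.
    [name, Zlib.crc32(File.binread(name)).to_s(16)]
  end
end

# Demo data in a temporary directory (hypothetical file names).
listing = Dir.mktmpdir do |dir|
  sample = File.join(dir, "sample.bin")
  File.binwrite(sample, "hello")
  list = File.join(dir, "dir.txt")
  File.write(list, "#{sample}\n")
  crc_listing(list)
end

listing.each { |name, crc| puts name, crc }
```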
