How to remove unwanted character using hash in Ruby? - ruby

I have a set of data :
coords=ARRAY(0x940044c)
Label<=>Bikini beach
coords=ARRAY(0x95452ec)
City=Y
Label=Naifaru%*
How do I remove the unwanted character to make it like this?
coords=ARRAY(0x940044c)
Label=Bikini beach
coords=ARRAY(0x95452ec)
City=Y
Label=Naifaru
I tried this:
hashChar = {"!"=>nil, "#"=>nil, "$"=>nil, "%"=>nil, "*"=>nil, "<=>"=>nil, "<"=>nil, ">"=>nil}
readFile.each do |char|
unwantedChar = char.chomp
puts unwantedChar.gsub(/\W/, hashChar)
end
But the output I will get is this:
coordsARRAY0x940044c
LabelBikinibeach
coordsARRAY0x95452ec
CityY
LabelNaifaru
Please help.

If the input is not extremely long and you are fine to load it into memory, String#gsub would do. It’s always better to whitelist wanted characters, rather than blacklist unwanted ones.
readFile.gsub(/[^\w\s=\(\)]+/, '')
# coords=ARRAY(0x940044c)
# Label=Bikini beach
# coords=ARRAY(0x95452ec)
# City=Y
# Label=Naifaru

I assume from the code you posted, that readFile is a String holding the set of data you are referring to.
puts readFile.delete('!#$<>*')
should do the job.

Using a hash map with gsub
regex = Regexp.union(hashChar.keys)
puts your_string.gsub(regex, hashChar)

Related

file read in ruby getting output as spaces in character

I have an function to read data from file but I have problem with reading data
input in file:
1,S1­-88,S2­-53,S3­-69,S4­-64
File.open(file_path).each do |line|
p line.gsub(/\s+/, "")
end
Output:
"1,S1 ­-88,S2 ­-53,S3 ­-69,S4­ -64 \n"
The problem is, it adding an extra space after s1 -integer,s2 -integer like so, I have tried .gsub(/\s+/, "") to remove white space from string but it is not working, Please can any one help me why this happenning, How I can override this issue Or it may be file encoding issue?
If you binread, essentially you have UTF-8 characters in between
irb(main):013:0> f = File.binread('f2.txt')
=> "1,S1\xC2\xAD-88,S2\xC2\xAD-53,S3\xC2\xAD-69,S4\xC2\xAD-64"
\xC2\xAD are essentially whitespace characters
This may be because you have copied it from somewhere incorrectly or it was introduced in your text because of God. Don't know. You an check here, it shows there are hidden characters in between your text.
This will remove all characters not wanted.
File.foreach('f2.txt') do |f|
puts f.gsub(/[^\\s!-~]/, '')
end
=> 1,S1-88,S2-53,S3-69,S4-64

Deleting all special characters from a string - ruby

I was doing the challenges from pythonchallenge writing code in ruby, specifically this one. It contains a really long string in page source with special characters. I was trying to find a way to delete them/check for the alphabetical chars.
I tried using scan method, but I think I might not use it properly. I also tried delete! like that:
a = "PAGE SOURCE CODE PASTED HERE"
a.delete! "!", "#" #and so on with special chars, does not work(?)
a
How can I do that?
Thanks
You can do this
a.gsub!(/[^0-9A-Za-z]/, '')
try with gsub
a.gsub!(/[!#%&"]/,'')
try the regexp on rubular.com
if you want something more general you can have a string with valid chars and remove what's not in there:
a.gsub!(/[^abcdefghijklmnopqrstuvwxyz ]/,'')
When you give multiple arguments to string#delete, it's the intersection of those arguments that is deleted. a.delete! "!", "#" deletes the intersections of the sets ! and # which means that nothing will be deleted and the method returns nil.
What you wanted to do is a.delete! "!#" with the characters to delete passed as a single string.
Since the challenge is asking to clean up the mess and find a message in it, I would go with a whitelist instead of deleting special characters. The delete method accepts ranges with - and negations with ^ (similar to a regex) so you can do something like this: a.delete! "^A-Za-z ".
You could also use regular expressions as shown by #arieljuod.
gsub is one of the most used Ruby methods in the wild.
specialname="Hello!#$#"
cleanedname = specialname.gsub(/[^a-zA-Z0-9\-]/,"")
I think a.gsub(/[^A-Za-z0-9 ]/, '') works better in this case. Otherwise, if you have a sentence, which typically should start with a capital letter, you will lose your capital letter. You would also lose any 1337 speak, or other possible crypts within the text.
Case in point:
phrase = "Joe can't tell between 'large' and large."
=> "Joe can't tell between 'large' and large."
phrase.gsub(/[^a-z ]/, '')
=> "oe cant tell between large and large"
phrase.gsub(/[^A-Za-z0-9 ]/, '')
=> "Joe cant tell between large and large"
phrase2 = "W3 a11 f10a7 d0wn h3r3!"
phrase2.gsub(/[^a-z ]/, '')
=> " a fa dwn hr"
phrase2.gsub(/[^A-Za-z0-9 ]/, '')
=> "W3 a11 f10a7 d0wn h3r3"
If you don't want to change the original string - i.e. to solve the challenge.
str.each_char do |letter|
if letter =~ /[a-z]/
p letter
end
end
You will have to write down your own string sanitize function, could easily use regex and the gsub method.
Atomic sample:
your_text.gsub!(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
API sample:
Route: post 'api/sanitize_text', to: 'api#sanitize_text'
Controller:
def sanitize_text
return render_bad_request unless params[:text].present? && params[:text].present?
sanitized_text = params[:text].gsub!(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
render_response( {safe_text: sanitized_text})
end
Then you call it
POST /api/sanitize_text?text=abcdefghijklmnopqrstuvwxyz123456<>$!#%23^%26*[]:;{}()`,.~'"\|/

Extract a single line string having "foo: XXXX"

I have a file with one or more key:value lines, and I want to pull a key:value out if key=foo. How can I do this?
I can get as far as this:
if File.exist?('/file_name')
content = open('/file_name').grep(/foo:??/)
I am unsure about the grep portion, and also once I get the content, how do I extract the value?
People like to slurp the files into memory, which, if the file will always be small, is a reasonable solution. However, slurping isn't scalable, and the practice can lead to excessive CPU and I/O waits as content is read.
Instead, because you could have multiple hits in a file, and you're comparing the content line-by-line, read it line-by-line. Line I/O is very fast and avoids the scalability problems. Ruby's File.foreach is the way to go:
File.foreach('path/to/file') do |li|
puts $1 if li[/foo:\s*(\w+)/]
end
Because there are no samples of actual key/value pairs, we're shooting in the dark for valid regex patterns, but this is the basis for how I'd solve the problem.
Try this:
IO.readlines('key_values.txt').find_all{|line| line.match('key1')}
i would recommend to read the file into array and select only lines you need:
regex = /\A\s?key\s?:/
results = File.readlines('file').inject([]) do |f,l|
l =~ regex ? f << "key = %s" % l.sub(regex, '') : f
end
this will detect lines starting with key: and adding them to results like key = value,
where value is the portion going after key:
so if you have a file like this:
key:1
foo
key:2
bar
key:3
you'll get results like this:
key = 1
key = 2
key = 3
makes sense?
value = File.open('/file_name').read.match("key:(.*)").captures[0] rescue nil
File.read('file_name')[/foo: (.*)/, 1]
#=> XXXX

Ruby: Remove whitespace chars at the beginning of a string

Edit: I solved this by using strip! to remove leading and trailing whitespaces as I show in this video. Then, I followed up by restoring the white space at the end of each string the array by iterating through and adding whitespace. This problem varies from the "dupe" as my intent is to keep the whitespace at the end. However, strip! will remove both the leading and trailing whitespace if that is your intent. (I would have made this an answer, but as this is incorrectly marked as a dupe, I could only edit my original question to include this.)
I have an array of words where I am trying to remove any whitespace that may exist at the beginning of the word instead of at the end. rstrip! just takes care of the end of a string. I want whitespaces removed from the beginning of a string.
example_array = ['peanut', ' butter', 'sammiches']
desired_output = ['peanut', 'butter', 'sammiches']
As you can see, not all elements in the array have the whitespace problem, so I can't just delete the first character as I would if all elements started with a single whitespace char.
Full code:
words = params[:word].gsub("\n", ",").delete("\r").split(",")
words.delete_if {|x| x == ""}
words.each do |e|
e.lstrip!
end
Sample text that a user may enter on the form:
Corn on the cob,
Fibonacci,
StackOverflow
Chat, Meta, About
Badges
Tags,,
Unanswered
Ask Question
String#lstrip (or String#lstrip!) is what you're after.
desired_output = example_array.map(&:lstrip)
More comments about your code:
delete_if {|x| x == ""} can be replaced with delete_if(&:empty?)
Except you want reject! because delete_if will only return a different array, rather than modify the existing one.
words.each {|e| e.lstrip!} can be replaced with words.each(&:lstrip!)
delete("\r") should be redundant if you're reading a windows-style text document on a Windows machine, or a Unix-style document on a Unix machine
split(",") can be replaced with split(", ") or split(/, */) (or /, ?/ if there should be at most one space)
So now it looks like:
words = params[:word].gsub("\n", ",").split(/, ?/)
words.reject!(&:empty?)
words.each(&:lstrip!)
I'd be able to give more advice if you had the sample text available.
Edit: Ok, here goes:
temp_array = text.split("\n").map do |line|
fields = line.split(/, */)
non_empty_fields = fields.reject(&:empty?)
end
temp_array.flatten(1)
The methods used are String#split, Enumerable#map, Enumerable#reject and Array#flatten.
Ruby also has libraries for parsing comma seperated files, but I think they're a little different between 1.8 and 1.9.
> ' string '.lstrip.chop
=> "string"
Strips both white spaces...

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses through of bunch of java based html template files to replace a period (.) with an underscore if it's contained within a specific tag. I use ruby all the time for these types of utility apps, and thought it would be no problem to whip up something using ruby's regex support. So, I create a Regexp.new... object, open a file, read it in line by line, then match each line against the pattern, if I get a match, I create a new string using replaceString = currentMatch.gsub(/./, '_'), then create another replacement as whole string by newReplaceRegex = Regexp.escape(currentMatch) and finally replace back into the current line with line.gsub(newReplaceRegex, replaceString) Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second sub string it should otherwise be finding. More strange, is that when testing this same pattern and same test text using rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Text text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
tagMatch = tagRegex.match(htmlLine)
if(tagMatch)
matchesArray = tagMatch.to_a
firstMatch = matchesArray[0]
secondMatch = matchesArray[1]
puts "First match: #{firstMatch} and second match #{secondMatch}"
tagMatch.captures.each {|lineMatchCapture|
puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
#create a new regex using the match results; make sure to use auto escape method
originalPatternString = Regexp.escape(lineMatchCapture)
replacementRegex = Regexp.new(originalPatternString)
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
#replace original match with underscore replaced copy within line
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
}
end
}
I would think that I should get the first tag in matchData[0] then the second tag in matchData1, or, what I'm really doing because I don't know how many matches I'll get within any given line is matchData.to_a.each. And in this case, matchData has two captures, but they're both the first tag match
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use the on String#scan instead of the Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for next ruby questions:
newlines and spaces are your friends, you don't loose points for using more lines on your code ;-)
use do-end on blocks instead of {}, improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
if(oldMatches.size > 0)
oldMatches.each_index do |index|
arrayMatch = oldMatches[index]
aMatch = arrayMatch[0]
#create a new regex using the match results; make sure to use auto escape method
replacementRegex = Regexp.new(Regexp.escape(aMatch))
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
#replace original match with underscore replaced copy within line, matching against the new escaped literal regex
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
end # I kind of still prefer the brackets...;-)
end
end
Now, why does MatchData work the way it does? It seems like it's behavior is a bug really, and certainly not very useful in general if you can't get it provide a simple means of accessing all the matches. Just my $.02
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexpt but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').

Resources