File read in Ruby gives output with extra space characters

I have a function that reads data from a file, but I have a problem with the data it reads.
Input in the file:
1,S1­-88,S2­-53,S3­-69,S4­-64
File.open(file_path).each do |line|
p line.gsub(/\s+/, "")
end
Output:
"1,S1 ­-88,S2 ­-53,S3 ­-69,S4­ -64 \n"
The problem is that it adds an extra space after each field (S1 -integer, S2 -integer, and so on). I tried .gsub(/\s+/, "") to remove whitespace from the string, but it does not work. Can anyone explain why this is happening and how I can fix it? Could it be a file encoding issue?

If you read the file with binread, you can see that there are multi-byte UTF-8 characters between the fields:
irb(main):013:0> f = File.binread('f2.txt')
=> "1,S1\xC2\xAD-88,S2\xC2\xAD-53,S3\xC2\xAD-69,S4\xC2\xAD-64"
\xC2\xAD is the UTF-8 encoding of U+00AD (soft hyphen), an invisible character. It is not whitespace, so \s does not match it, which is why your gsub leaves it in place.
It may have come in when the text was copied from somewhere else; I don't know. You can check here, it shows that there are hidden characters in your text.
This removes every character that is neither whitespace nor printable ASCII:
File.foreach('f2.txt') do |line|
puts line.gsub(/[^\s!-~]/, '')
end
=> 1,S1-88,S2-53,S3-69,S4-64
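If you would rather remove only the soft hyphen and leave every other non-ASCII character alone, a minimal sketch could look like the following. The constant name and the sample line are made up for illustration; the sample just mimics the input from the question.

```ruby
# U+00AD (soft hyphen) is encoded in UTF-8 as the bytes C2 AD.
SOFT_HYPHEN = "\u00AD"

# Sample line imitating the file content from the question (an assumption).
line = "1,S1#{SOFT_HYPHEN}-88,S2#{SOFT_HYPHEN}-53\n"

# \s only covers space, tab, newline and similar, so the soft hyphens survive
# and only the trailing newline is removed.
after_gsub = line.gsub(/\s+/, "")

# Deleting the character by its code point removes it and nothing else.
cleaned = line.delete(SOFT_HYPHEN)

p after_gsub.include?(SOFT_HYPHEN) # true
p cleaned                          # "1,S1-88,S2-53\n"
```

This is narrower than the ASCII whitelist above, so it is the safer choice if the file may legitimately contain accented letters or other non-ASCII text.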

Related

How to remove unwanted character using hash in Ruby?

I have a set of data :
coords=ARRAY(0x940044c)
Label<=>Bikini beach
coords=ARRAY(0x95452ec)
City=Y
Label=Naifaru%*
How do I remove the unwanted character to make it like this?
coords=ARRAY(0x940044c)
Label=Bikini beach
coords=ARRAY(0x95452ec)
City=Y
Label=Naifaru
I tried this:
hashChar = {"!"=>nil, "#"=>nil, "$"=>nil, "%"=>nil, "*"=>nil, "<=>"=>nil, "<"=>nil, ">"=>nil}
readFile.each do |char|
unwantedChar = char.chomp
puts unwantedChar.gsub(/\W/, hashChar)
end
But the output I get is this:
coordsARRAY0x940044c
LabelBikinibeach
coordsARRAY0x95452ec
CityY
LabelNaifaru
Please help.
If the input is not extremely long and you are fine to load it into memory, String#gsub would do. It’s always better to whitelist wanted characters, rather than blacklist unwanted ones.
readFile.gsub(/[^\w\s=\(\)]+/, '')
# coords=ARRAY(0x940044c)
# Label=Bikini beach
# coords=ARRAY(0x95452ec)
# City=Y
# Label=Naifaru
I assume, from the code you posted, that readFile is a String holding the set of data you are referring to.
puts readFile.delete('!#$%<>*')
should do the job. (The % has to be in the list too, otherwise Label=Naifaru%* keeps its %.)
Using a hash map with gsub
regex = Regexp.union(hashChar.keys)
puts your_string.gsub(regex, hashChar)
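One thing to watch with the hash approach: if "<=>" maps to nil, the whole "<=>" is removed and you get "LabelBikini beach" rather than "Label=Bikini beach". A small sketch that maps "<=>" to "=" instead is shown below; the constant and method names are hypothetical, not something from the question.

```ruby
# Hypothetical replacement table: "<=>" collapses to "=", everything else is dropped.
REPLACEMENTS = {
  "<=>" => "=",
  "!" => "", "#" => "", "$" => "", "%" => "", "*" => "", "<" => "", ">" => ""
}.freeze

# Regexp.union escapes each key and tries the alternatives left to right,
# so "<=>" has to be listed before "<" to be matched as a whole.
CLEANUP_PATTERN = Regexp.union(REPLACEMENTS.keys)

def clean_line(line)
  # With a hash as the second argument, gsub looks up each match in the hash.
  line.gsub(CLEANUP_PATTERN, REPLACEMENTS)
end

puts clean_line("Label<=>Bikini beach")    # Label=Bikini beach
puts clean_line("Label=Naifaru%*")         # Label=Naifaru
puts clean_line("coords=ARRAY(0x940044c)") # coords=ARRAY(0x940044c)
```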

How to read a file's content and search for a string in multiple files

I have a text file that has around 100 plus entries like out.txt:
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
I have to read out.txt, line-by-line and check whether the strings like "domain\1esrt" are present in any of the files under a different directory. If present delete only that string occurrence and save the file.
I know how to read a file line-by-line and also know how to grep for a string in multiple files in a directory but I'm not sure how to join those two to achieve my above requirement.
You can create an array with all the words or strings you want to find and then delete/replace:
strings_to_delete = ['aaa', 'domain\1esrt', 'delete_me']
Then read the file and use reject to build an array of all the lines that match none of the elements in the array created before:
# read the file 'text.txt' and keep only the lines to preserve
lines = File.readlines('text.txt').reject do |line|
# drop the line if it equals any value in the strings_to_delete array
strings_to_delete.include?(line.strip)
end
And then open the file again, this time for writing, and write out all the lines that did not match a value in the strings_to_delete array:
File.open('text.txt', 'w') do |file|
lines.each do |element|
file.write element
end
end
The txt file looks like:
aaa
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
delete_me
I don't know how it'll work with a bigger file, anyways, I hope it helps.
I would suggest using gsub here. It will run a regex search on the string and replace it with the second parameter. So if you only have to replace any single string, I believe you can simply run gsub on that string (including the newline) and replace it with an empty string:
new_file_text = text.gsub(/regex_string\n/, "")
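To join the two halves of the question (read the list, then edit every file in a directory), a rough sketch might look like this. The method name and its parameters are hypothetical, and it assumes the files are small enough to read into memory and that the list file itself is not stored in the target directory.

```ruby
# Remove every string listed in list_path (one per line) from all regular
# files directly inside target_dir. All names here are illustrative.
def delete_listed_strings(list_path, target_dir)
  # Read the list, dropping line endings and blank lines.
  needles = File.readlines(list_path, chomp: true).reject(&:empty?)
  return if needles.empty?

  # Regexp.union escapes each string, so the backslash in "domain\1esrt"
  # is matched literally instead of being read as a backreference.
  pattern = Regexp.union(needles)

  Dir.glob(File.join(target_dir, '*')).each do |path|
    next unless File.file?(path)

    original = File.read(path)
    cleaned = original.gsub(pattern, '')
    # Only rewrite files that actually changed.
    File.write(path, cleaned) unless cleaned == original
  end
end
```

Calling it with the path to out.txt and the directory to clean removes only the matched occurrences and leaves the rest of each line intact. If whole lines should go instead, the pattern would need to include the line ending, as in the gsub example above.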

Why are strings beginning with a space converted with: ! ' with Ruby/YAML

I am writing a Ruby hash to a file using YAML.
File.open(output_file, "w") {|file| file.puts YAML::dump(final)}
The hash contains strings as keys and floats as values.
When my strings contain only letters, they are written to the file as such:
abc: 1.0
bcd: 1.0
cde: 1.0
When a string starts with a space it is outputted as such:
! ' ab': 1.0
When I read the file back in again everything is OK, but I want to know why this is happening and what it means.
I searched the YAML documentation and it says that a single exclamation point is used to represent local datatypes.
Why does this happen for strings starting with spaces?
The ! is known as the "non-specific tag". It forces the YAML engine to decode the following item as either a string, a hash, or an array. It basically disables interpreting it as a different type. I'm not sure why the engine is tagging them this way; it doesn't seem to be needed. Perhaps it is just overzealously attempting to remove ambiguity?
Edit: either way, it's unneeded syntax:
YAML.dump({' a'=>0})
=> "---\n! ' a': 0\n"
YAML.load("---\n! ' a': 0\n") # with the bang
=> {" a"=>0}
YAML.load("---\n' a': 0\n") # without the bang
=> {" a"=>0}
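If you just want to confirm that the leading space survives a dump and load, whatever quoting or tagging the engine chooses, a quick round-trip check along these lines should do. The sample hash is made up, and the exact text of the dump is deliberately not asserted because it depends on the YAML engine and Ruby version.

```ruby
require 'yaml'

# Sample data: string keys (one with a leading space) and float values.
original = { ' ab' => 1.0, 'abc' => 1.0 }

dumped   = YAML.dump(original)
restored = YAML.load(dumped)

puts dumped            # exact formatting varies by engine and version
p restored == original # true
```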

Ruby: Remove whitespace chars at the beginning of a string

Edit: I solved this by using strip! to remove leading and trailing whitespace, as I show in this video. Then I followed up by restoring the whitespace at the end of each string in the array by iterating through and adding whitespace. This problem differs from the "dupe" as my intent is to keep the whitespace at the end. However, strip! will remove both the leading and trailing whitespace if that is your intent. (I would have made this an answer, but as this is incorrectly marked as a dupe, I could only edit my original question to include this.)
I have an array of words where I am trying to remove any whitespace that may exist at the beginning of the word instead of at the end. rstrip! just takes care of the end of a string. I want whitespaces removed from the beginning of a string.
example_array = ['peanut', ' butter', 'sammiches']
desired_output = ['peanut', 'butter', 'sammiches']
As you can see, not all elements in the array have the whitespace problem, so I can't just delete the first character as I would if all elements started with a single whitespace char.
Full code:
words = params[:word].gsub("\n", ",").delete("\r").split(",")
words.delete_if {|x| x == ""}
words.each do |e|
e.lstrip!
end
Sample text that a user may enter on the form:
Corn on the cob,
Fibonacci,
StackOverflow
Chat, Meta, About
Badges
Tags,,
Unanswered
Ask Question
String#lstrip (or String#lstrip!) is what you're after.
desired_output = example_array.map(&:lstrip)
More comments about your code:
delete_if {|x| x == ""} can be replaced with delete_if(&:empty?)
Note that delete_if already modifies the array in place. reject! does the same, except that it returns nil when nothing was removed; plain reject is the one that returns a new array.
words.each {|e| e.lstrip!} can be replaced with words.each(&:lstrip!)
delete("\r") should be redundant if you're reading a windows-style text document on a Windows machine, or a Unix-style document on a Unix machine
split(",") can be replaced with split(", ") or split(/, */) (or /, ?/ if there should be at most one space)
So now it looks like:
words = params[:word].gsub("\n", ",").split(/, ?/)
words.reject!(&:empty?)
words.each(&:lstrip!)
I'd be able to give more advice if you had the sample text available.
Edit: Ok, here goes:
temp_array = text.split("\n").map do |line|
fields = line.split(/, */)
non_empty_fields = fields.reject(&:empty?)
end
temp_array.flatten(1)
The methods used are String#split, Enumerable#map, Enumerable#reject and Array#flatten.
Ruby also has libraries for parsing comma-separated files, but I think they're a little different between 1.8 and 1.9.
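Putting the pieces together, here is one possible sketch that splits on commas and line breaks in a single pass, strips only leading whitespace, and drops empty entries. The method name and the sample text are assumptions made for the example.

```ruby
# Split form input on commas and line breaks, strip leading whitespace only,
# and drop entries that end up empty. Trailing whitespace is left untouched.
def split_words(text)
  text.split(/[,\r\n]+/).map(&:lstrip).reject(&:empty?)
end

# Sample input imitating the form text from the question.
sample = "Corn on the cob,\r\nFibonacci,\r\nChat, Meta, About\r\nTags,,\r\nAsk Question "

p split_words(sample)
# ["Corn on the cob", "Fibonacci", "Chat", "Meta", "About", "Tags", "Ask Question "]
```

Running lstrip before reject matters: an entry that is only spaces becomes empty after stripping and is then removed.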
> ' string '.lstrip.chop
=> "string"
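This strips both spaces here, but only because the last character happens to be a space: chop removes the last character whatever it is. String#strip is the general way to remove both leading and trailing whitespace.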

Ruby string encoding problem

I've looked at the other ruby/encoding related posts but haven't been able to figure out why the following is not working. Likely just because I'm dense, but here's the situation.
Using Ruby 1.9 on windows. I have a set of CSV files that need some data appended to the end of each line. Whenever I run my script, the appended characters are gibberish. The input text appears to be IBM437 encoding, whereas my string I'm appending starts as US-ASCII. Nothing I've tried with respect to forcing encoding on the input strings or the append string seems to change the resultant output. I'm stumped. The current encoding version is simply the last that I tried.
def append_salesperson(txt, salesperson)
if txt.length > 2
return txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson}")
end
end
salespeople = Hash[
"fname", "Record Manager"]
outfile = File.open("ActData.csv", "w:US-ASCII")
salespeople.each do | filename, recordManager |
infile = File.open("#{filename}.txt")
infile.each do |line|
outfile.puts append_salesperson(line, recordManager)
end
infile.close
end
outfile.close
One small note that is related to your question is that you have your csv data as such %(, "", "", "#{salesperson}"). Here you have a space char before your double quotes. This can cause the #{salesperson} to be interpreted as multiple fields if there is a comma in this text. To fix this there can't be white space between the comma and the double quotes. Example: "this is a field","Last, First","and so on". This is one little gotcha that I ran into when creating reports meant to be viewed in Excel.
For reference, RFC 4180 (Common Format and MIME Type for Comma-Separated Values (CSV) Files) describes the grammar of a CSV file.
maybe txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson.force_encoding('something')}")
?
It sounds like the CSV data is coming in as UTF-16... hence the puts shows as the printable character (the first byte) plus a space (the second byte).
Have you tried encoding your appended data with .force_encoding(Encoding::UTF_16LE) or .force_encoding(Encoding::UTF_16BE)?
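It may also help to keep in mind that force_encoding only relabels the bytes a string already has, while encode actually converts them. A small sketch of the difference, using a hand-built UTF-16LE byte sequence as stand-in data (not taken from the question's files):

```ruby
# Bytes of "ab" in UTF-16LE: each ASCII character is followed by a zero byte.
raw = "a\x00b\x00".b

# force_encoding only changes the label on the string; the bytes stay the same.
utf16 = raw.force_encoding(Encoding::UTF_16LE)

# encode transcodes, producing new bytes in the target encoding.
utf8 = utf16.encode(Encoding::UTF_8)

p utf16.bytesize # 4
p utf8           # "ab"
p utf8.bytesize  # 2
```

So if the input really is in a different encoding from the text being appended, one option is to label the input with its true encoding and then encode both sides to a common encoding before concatenating.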
