Extract a single line string having "foo: XXXX" - ruby

I have a file with one or more key:value lines, and I want to pull a key:value out if key=foo. How can I do this?
I can get as far as this:
if File.exist?('/file_name')
content = open('/file_name').grep(/foo:??/)
I am unsure about the grep portion, and also once I get the content, how do I extract the value?

People like to slurp the files into memory, which, if the file will always be small, is a reasonable solution. However, slurping isn't scalable, and the practice can lead to excessive CPU and I/O waits as content is read.
Instead, because you could have multiple hits in a file, and you're comparing the content line-by-line, read it line-by-line. Line I/O is very fast and avoids the scalability problems. Ruby's File.foreach is the way to go:
File.foreach('path/to/file') do |li|
  puts $1 if li[/foo:\s*(\w+)/]
end
Because there are no samples of actual key/value pairs, we're shooting in the dark for valid regex patterns, but this is the basis for how I'd solve the problem.
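If you only need the first value and want to stop reading as soon as it's found, a small variation on the same idea (still guessing at the regex) could look like this:
def first_value(path, key)
  File.foreach(path) do |line|
    # String#[] with a regex performs the match and sets $1
    return $1 if line[/#{Regexp.escape(key)}:\s*(\w+)/]
  end
  nil # no matching line found
end

first_value('/file_name', 'foo') #=> "XXXX" for a line like "foo: XXXX"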

Try this:
IO.readlines('key_values.txt').find_all{|line| line.match('key1')}
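Note that this returns the whole matching lines. To pull out just the values, you could split each hit on the first colon (a sketch, assuming key:value lines as in the question):
IO.readlines('key_values.txt').find_all { |line| line.match('key1') }.
  map { |line| line.split(':', 2).last.strip }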

I would recommend reading the file into an array and selecting only the lines you need:
regex = /\A\s?key\s?:/
results = File.readlines('file').inject([]) do |acc, line|
  line =~ regex ? acc << "key = %s" % line.sub(regex, '').chomp : acc
end
This will detect lines starting with key: and add them to results as key = value,
where value is the portion after key: (chomp strips the trailing newline from each value),
so if you have a file like this:
key:1
foo
key:2
bar
key:3
you'll get results like this:
key = 1
key = 2
key = 3
makes sense?

value = File.open('/file_name').read.match("key:(.*)").captures[0] rescue nil
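Note that the trailing rescue nil swallows every error, not just a failed match; a missing file would also silently become nil. If that's too broad, an explicit guard (a sketch) keeps real errors visible:
m = File.read('/file_name').match(/key:(.*)/) if File.exist?('/file_name')
value = m && m[1].strip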

File.read('file_name')[/foo: (.*)/, 1]
#=> XXXX

Related

How to remove unwanted character using hash in Ruby?

I have a set of data:
coords=ARRAY(0x940044c)
Label<=>Bikini beach
coords=ARRAY(0x95452ec)
City=Y
Label=Naifaru%*
How do I remove the unwanted characters to make it like this?
coords=ARRAY(0x940044c)
Label=Bikini beach
coords=ARRAY(0x95452ec)
City=Y
Label=Naifaru
I tried this:
hashChar = {"!"=>nil, "#"=>nil, "$"=>nil, "%"=>nil, "*"=>nil, "<=>"=>nil, "<"=>nil, ">"=>nil}
readFile.each do |char|
  unwantedChar = char.chomp
  puts unwantedChar.gsub(/\W/, hashChar)
end
But the output I will get is this:
coordsARRAY0x940044c
LabelBikinibeach
coordsARRAY0x95452ec
CityY
LabelNaifaru
Please help.
If the input is not extremely long and you are fine with loading it into memory, String#gsub will do. It’s always better to whitelist wanted characters than to blacklist unwanted ones.
readFile.gsub(/[^\w\s=\(\)]+/, '')
# coords=ARRAY(0x940044c)
# Label=Bikini beach
# coords=ARRAY(0x95452ec)
# City=Y
# Label=Naifaru
I assume from the code you posted, that readFile is a String holding the set of data you are referring to.
puts readFile.delete('!#$<>*')
should do the job.
Using a hash map with gsub
regex = Regexp.union(hashChar.keys)
puts your_string.gsub(regex, hashChar)
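For example, with the question's data (note that the question's hashChar maps "<=>" to nil, which would delete the = as well; mapping it to "=" produces the desired output):
hashChar = { "%" => "", "*" => "", "<=>" => "=" }
regex = Regexp.union(hashChar.keys)
puts "Label<=>Bikini beach".gsub(regex, hashChar) #=> Label=Bikini beach
puts "Label=Naifaru%*".gsub(regex, hashChar)      #=> Label=Naifaru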

Ruby - Extra punctuation in file when using regex and csv class to write to a file

I'm using regex to grab parameters from an html file.
I've tested the regexp and it seems to be fine- it appears that the csv conversion is what's causing the issue, but I'm not sure.
Here is what I have:
mechanics_file= File.read(filename)
mechanics= mechanics_file.scan(/(?<=70%">)(.*)(?=<\/td)/)
id_file= File.read(filename)
id=id_file.scan(/(?<="propertyids\[]" value=")(.*)(?=")/)
puts id.zip(mechanics)
CSV.open('csvfile.csv', 'w') do |csv|
  id.zip(mechanics) { |row| csv << row }
end
The puts output looks like this:
2073
Acting
2689
Action / Movement Programming
But the contents of the csv look like this:
"[""2073""]","[""Acting""]"
"[""2689""]","[""Action / Movement Programming""]"
How do I get rid of all of the extra quotes and brackets? Am I doing something wrong in the process of writing to a csv?
This is my first project in ruby so I would appreciate a child-friendly explanation :) Thanks in advance!
String#scan returns an Array of Arrays (bold emphasis mine):
scan(pattern) → array
Both forms iterate through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block. If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
a = "cruel world"
# […]
a.scan(/(...)/) #=> [["cru"], ["el "], ["wor"]]
So, id looks like this:
id == [['2073'], ['2689']]
and mechanics looks like this:
mechanics == [['Acting'], ['Action / Movement Programming']]
id.zip(mechanics) then looks like this:
id.zip(mechanics) == [[['2073'], ['Acting']], [['2689'], ['Action / Movement Programming']]]
Which means that in your loop, each row looks like this:
row == [['2073'], ['Acting']]
row == [['2689'], ['Action / Movement Programming']]
CSV#<< expects an Array of Strings, or things that can be converted to Strings as an argument. You are passing it an Array of Arrays, which it will happily convert to an Array of Strings for you by calling Array#to_s on each element, and that looks like this:
[['2073'], ['Acting']].map(&:to_s) == [ '["2073"]', '["Acting"]' ]
[['2689'], ['Action / Movement Programming']].map(&:to_s) == [ '["2689"]', '["Action / Movement Programming"]' ]
Lastly, " is the string delimiter in CSV, and needs to be escaped by doubling it, so what actually gets written to the CSV file is this:
"[""2073""]", "[""Acting""]"
"[""2689""]", "[""Action / Movement Programming""]"
The simplest way to correct this, would be to flatten the return values of the scans (and maybe also convert the IDs to Integers, assuming that they are, in fact, Integers):
mechanics_file = File.read(filename)
mechanics = mechanics_file.scan(/(?<=70%">)(.*)(?=<\/td)/).flatten
id_file = File.read(filename)
id = id_file.scan(/(?<="propertyids\[]" value=")(.*)(?=")/).flatten.map(&:to_i)
CSV.open('csvfile.csv', 'w') do |csv|
  id.zip(mechanics) { |row| csv << row }
end
Another suggestion would be to forgo the Regexps completely and use an HTML parser to parse the HTML.
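For instance, a sketch with Nokogiri (the CSS selectors are reconstructed guesses from the regexes, so adjust them to the real markup):
require 'csv'
require 'nokogiri'

doc = Nokogiri::HTML(File.read(filename))
# the regexes suggest <input name="propertyids[]" value="..."> tags and
# name cells like <td width="70%">
ids       = doc.css('input[name="propertyids[]"]').map { |n| n['value'].to_i }
mechanics = doc.css('td[width="70%"]').map { |td| td.text.strip }

CSV.open('csvfile.csv', 'w') do |csv|
  ids.zip(mechanics) { |row| csv << row }
end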

Ruby script which can replace a string in a binary file to a different, but same length string?

I would like to write a Ruby script (repl.rb) which can replace a string in a binary file (string is defined by a regex) to a different, but same length string.
It works like a filter, outputs to STDOUT, which can be redirected (ruby repl.rb data.bin > data2.bin), regex and replacement can be hardcoded. My approach is:
#!/usr/bin/ruby
fn = ARGV[0]
regex = /\-\-[0-9a-z]{32,32}\-\-/
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
blk_size = 1024
File.open(fn, "rb") {|f|
while not f.eof?
data = f.read(blk_size)
data.gsub!(regex, str)
print data
end
}
My problem is when the string straddles the boundary between two reads. For example, with blk_size=1024, if the first occurrence of the string begins at byte position 1000, it spans two blocks and I will not find it in the "data" variable. The same can happen on any later read cycle. Should I process the whole file twice with different block sizes to avoid this worst-case scenario, or is there another approach?
I would posit that a tool like sed might be a better choice for this. That said, here's an idea: Read block 1 and block 2 and join them into a single string, then perform the replacement on the combined string. Split them apart again and print block 1. Then read block 3 and join block 2 and 3 and perform the replacement as above. Split them again and print block 2. Repeat until the end of the file. I haven't tested it, but it ought to look something like this:
File.open(fn, "rb") do |f|
last_block, this_block = nil
while not f.eof?
last_block, this_block = this_block, f.read(blk_size)
data = "#{last_block}#{this_block}".gsub(regex, str)
last_block, this_block = data.slice!(0, blk_size), data
print last_block
end
print this_block
end
There's probably a nontrivial performance penalty for doing it this way, but it could be acceptable depending on your use case.
Maybe a cheeky
f.pos = f.pos - replacement.size
at the end of the while loop, just before reading the next chunk. Bear in mind the rewound bytes then get read (and printed) twice, so you would also have to hold the overlap back from the output, which in practice pushes you toward carrying the tail between iterations as in the answer above.

Using Ruby to automate a large directory system

So I have the following little script to make a file setup for organizing reports that we get.
#This script is to create a file structure for our survey data
require 'fileutils'
f = File.open('CustomerList.txt') or die "Unable to open file..."
a = f.readlines
x = 0
while a[x] != nil
  Customer = a[x]
  FileUtils.mkdir_p(Customer + "/foo/bar/orders")
  FileUtils.mkdir_p(Customer + "/foo/bar/employees")
  FileUtils.mkdir_p(Customer + "/foo/bar/comments")
  x += 1
end
Everything seems to work before the while, but I keep getting:
'mkdir': Invalid argument - Cust001_JohnJacobSmith(JJS) (Errno::EINVAL)
Which would be the first line from CustomerList.txt. Do I need to do something to the array entry for it to be considered a string? Am I mismatching variable types or something?
Thanks in advance.
The following worked for me:
IO.foreach('CustomerList.txt') do |customer|
  customer.chomp!
  ["orders", "employees", "comments"].each do |dir|
    FileUtils.mkdir_p("#{customer}/foo/bar/#{dir}")
  end
end
with data like so:
$ cat CustomerList.txt
Cust001_JohnJacobSmith(JJS)
Cust003_JohnJacobSmith(JJS)
Cust002_JohnJacobSmith(JJS)
A few things to make it more like the Ruby way:
Use blocks when opening a file or iterating through arrays; that way you don't need to worry about closing the file or indexing into the array by hand (see the sketch after this list).
As noted by @inger, local vars start with lower case: customer.
When you want the value of a variable in a string, using #{} is more idiomatic than concatenating with +.
Also note that we took off the trailing newline using chomp! (which changes the var in place, as noted by the trailing ! on the method name); that trailing newline is most likely what was making mkdir_p raise Errno::EINVAL.
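For instance, the block form of File.open closes the handle for you (a generic illustration):
File.open('CustomerList.txt') do |f|
  f.each_line { |line| puts line.chomp }
end # the file is closed here, even if the block raises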

Using Ruby to find the first previous occurrence of a string

I'm creating some basic work-assistance utilities in Ruby. I've hit a problem that I don't really need to solve, but curiosity has gotten the better of me.
What I would like to be able to do is search the contents of a file, starting from a particular line and find the first PREVIOUS occurrence of a string.
For example, if I have the following text saved in a file, I would like to be able to search for "CREATE PROCEDURE" starting at line 4 and have this return/output "CREATE PROCEDURE sp_MERGE_TABLE"
CREATE PROCEDURE sp_MERGE_TABLE
AS
SOME HORRIBLE STATEMENT
HERE
CREATE PROCEDURE sp_SOMETHING_ELSE
AS
A DIFFERENT STATEMENT
HERE
Searching for content isn't a challenge, but specifying a starting line - no idea. And then searching backwards... well...
Any help at all appreciated!
TIA!
I think you have to read the file line by line; then the following will work:
flag = true
File.foreach(path) do |line|
  if flag && line.include?("CREATE PROCEDURE")
    puts line
    flag = false
  end
end
If performance isn't a big issue, you could just use a simple loop:
last_seen = nil
File.foreach(path).with_index(1) do |line, line_no|
  break if line_no > start_line
  last_seen = line_no if line.include?(search_string)
end
last_seen # line number of the closest match at or before start_line, or nil
I'm afraid you will have to work through the file line by line, unless you have some index over it pointing to the beginnings of the lines. That would make the loop a little simpler, but working backwards through the file is harder (unless you keep the whole file in memory).
Edit:
I just had a much better idea, but I'm going to include the old solution anyway.
The benefit of searching backwards is that you only have to read the first chunk of the file, up to the specified line number. As you scan toward start_line you get closer and closer to it, so whenever you find a match you just forget the old one. You still read in some redundant data at the beginning, but at least it's O(n):
path = "path/to/file"
start_line = 20
search_string = "findme!"
#assuming file is at least start_line lines long
match_index = nil
f = File.new(path)
start_line.times do |i|
  line = f.readline
  match_index = i if line.include? search_string
end
puts "Matched #{search_string} on line #{match_index}"
Of course, bear in mind that the size of this file plays an important role in answering your question.
If you wanted to get really serious, you could look into the IO class. There is no way to seek straight to a line number (IO#lineno= only resets the reported counter; it does not move the file position), but you can record each line's byte offset on the way down and then seek backwards. Untested, just a thought:
f = File.new(path)
offsets = [] # byte offset of the start of each line
start_line.times { offsets << f.pos; f.gets }
offsets.reverse_each do |pos|
  f.pos = pos
  line = f.gets
  (puts line; break) if line && line.include?(search_string)
end
Original:
For an exhaustive solution, you could try something like the following. The downside is you'd need to read the whole file into memory, but it takes into account continuing from the bottom-up if it gets to the top without a match. Untested.
path = "path/to/file"
start_line = 20
search_string = "findme!"
#get lines of the file into an array (chomp optional)
lines = File.readlines(path).map(&:chomp)
#"cut" the deck, as with playing cards, so start_line is first in the array
lines = lines.slice!(start_line..lines.length) + lines
#searching backwards can just be searching a reversed array forwards
lines.reverse!
#search through the reversed array for the first occurrence
reverse_occurrence = nil
lines.each_with_index do |line, index|
  if line.include?(search_string)
    reverse_occurrence = index
    break
  end
end
#reverse_occurrence is now either nil for no match, or an index into the reversed array
#un-cut and un-reverse that index (wrapping with %) to recover the original line number
if reverse_occurrence
  occurrence = (lines.size - reverse_occurrence - 1 + start_line) % lines.size
  line = lines[reverse_occurrence]
  puts "Matched #{search_string} on line #{occurrence}"
  puts line
end
1) Read the entire file into a string.
2) Reverse the file-data string.
3) Reverse the search string.
4) Search forward. Remember to match end-of-line instead of beginning-of-line, and to start from position end-minus-N rather than from N.
Not very fast or efficient, but it's elegant. Or at least clever.
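A minimal sketch of that trick, assuming the needle is a plain substring (a regex would need its pattern reversed as well), with path, start_line and search_string as in the earlier snippets:
data = File.read(path)
# offset of the end of start_line; searching the reversed string from
# data.size - cut is the "end minus N" starting position
cut = data.lines.first(start_line).sum(&:size)
idx = data.reverse.index(search_string.reverse, data.size - cut)
if idx
  from = data.size - idx - search_string.size # map back to a forward offset
  puts data[from, search_string.size]
end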
