I need to parse a file according to different rules.
The file contains several lines.
I go through the file line by line. When I find a specific string, I have to store the data present in the next lines until a specific character is found.
Example of file:
start {
  /* add comment */
  first_step {
    sub_first_step {
    };
    sub_second_step {
      code = 50,
      post = xxx (aaaaaa,
        bbbbbb,
        cccccc,
        eeeeee),
      number = yyyy (fffffff,
        gggggg,
        jjjjjjj,
        ppppppp),
    };
So, in this case:
File.open(file_to_convert, "r").each_line do |line|
In "line" I have my current line. I need to:
1) find when the line contains the string "xxx"
if line.include?("union") then
Correct?
2) store the next values (e.g. aaaaaa, bbbbbb, cccccc, eeeeee) in an array until I find the character ")". This signals that the section is finished.
I think when I reach the line with the string "xxx" I have to iterate over the next lines inside the "if" block.
Try this:
file_contents = File.read(file_to_convert)
lines = file_contents[/xxx \(([^)]+)\)/, 1].split
# => ["aaaaaa,", "bbbbbb,", "cccccc,", "eeeeee"]
The regex xxx \(([^)]+)\) captures all the text after xxx ( up to the next ), and split breaks it into individual items (note that the trailing commas are kept).
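If you want the items without those trailing commas, a small extension of the same snippet (just a sketch) is to strip them after splitting:
file_contents = File.read(file_to_convert)
lines = file_contents[/xxx \(([^)]+)\)/, 1].split.map { |s| s.chomp(',') }
# => ["aaaaaa", "bbbbbb", "cccccc", "eeeeee"]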
I think this is what you are looking for:
looking = true
results = []
File.open(file_to_convert, "r").each_line do |line|
  if looking
    if line.include?("xxx")
      looking = false
      results << line[/\(([^,]*)/, 1]  # grab the value right after the opening parenthesis
    end
  else
    if line.include?(")")
      results << line.strip.delete('),')
      break
    else
      results << line.strip.delete(',')
    end
  end
end
puts results
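With the sample file above, results ends up as ["aaaaaa", "bbbbbb", "cccccc", "eeeeee"].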
First post, try not to get mad at my formatting.
I am trying to do ETL on a CSV file with Python 3.5. The code I have successfully extracts, filters on the correct column, creates the desired end result in the new_string variable, and produces the correctly named txt file at the end of the run. But opening the txt file shows it is only one character long, as if only a single indexed element (i = [1]) were showing up; I was expecting the whole column to print out in string format. Clearly I am not taking the formatting of the list/string into consideration, but I am stuck for now.
If anyone sees something going on here. I would appreciate the heads up. Thanks in advance...
here is my code:
import os
import csv
import datetime

cdpath = os.getcwd()

def get_file_path(filename):
    currentdirpath = os.getcwd()
    file_path = os.path.join(os.getcwd(), filename)
    print(file_path)
    return file_path

path = get_file_path('cleanme.csv')  ## My test file to work on

def timeStamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):  ## Time stamp func
    return datetime.datetime.now().strftime(fmt).format(fname=fname)

def read_csv(filepath):
    with open(filepath, 'rU') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            new_list = row[2]
            new_string = str(new_list)
            print(new_string)
    with open(timeStamped('cleaned.txt'), 'w') as outf:
        outf.write(new_string)
In your code, you have:
def read_csv(filepath):
    with open(filepath, 'rU') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            new_list = row[2]
            new_string = str(new_list)
            print(new_string)
    with open(timeStamped('cleaned.txt'), 'w') as outf:
        outf.write(new_string)
As noted in my comment above, there was some question on whether the second with was properly indented, but actually, it doesn't matter:
You generate new_string inside the for loop (for row in reader), but because you don't use it inside the loop (except to print it), when the loop finishes the only value you still have access to is the last one.
Alternatively, if the with ... as outf were part of the loop, each time through you'd reopen the file in 'w' mode and overwrite it, so cleaned.txt would again hold only the last value at the end.
I think what you want is something like:
def read_csv(filepath):
    with open(filepath, 'rU') as csvfile:
        with open(timeStamped('cleaned.txt'), 'w') as outf:
            reader = csv.reader(csvfile)
            for row in reader:
                new_list = row[2]           # extract the 3rd column of each row
                new_string = str(new_list)  # optionally do some transforms here
                print(new_string)           # debug
                outf.write(new_string)      # store result
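Nesting the two with blocks keeps outf open for the entire loop, so every row's value gets written instead of just the last one. Depending on the desired output you may also want a separator between values, e.g. outf.write(new_string + '\n'), otherwise the column values will run together in cleaned.txt.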
I want to be able to read the file into an associative array where I can access the elements by the column head name.
My file is formatted as follows:
KeyName Val1Name Val2Name ... ValMName
Key1 Val1-1 Val2-1 ... ValM-1
Key2 Val1-2 Val2-2 ... ValM-2
Key3 Val1-3 Val2-3 ... ValM-3
.. .. .. .. ..
KeyN Val1-N Val2-N ... ValM-N
The only problem is I don't have a clue how to do it. So far I have:
scores = File.read("scores.txt")
lines = scores.split("\n")
lines.each { |x|
  y = x.to_s.split(' ')
}
This gets close to what I want, but I am still unable to get it into a format that is usable for me.
f = File.open("scores.txt") #get an instance of the file
first_line = f.gets.chomp #get the first line in the file (header)
first_line_array = first_line.split(/\s+/) #split the first line in the file via whitespace(s)
array_of_hash_maps = f.readlines.map do |line|
Hash[first_line_array.zip(line.split(/\s+/))]
end
#read the remaining lines of the file via `IO#readlines` into an array, split each read line by whitespace(s) into an array, and zip the first line with them, then convert it into a `Hash` object, and return a collection of the `Hash` objects
f.close #close the file
puts array_of_hash_maps #print the collection of the Hash objects to stdout
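Afterwards you can access a value by its column-head name; for example, with the sample file above:
array_of_hash_maps.first["Val1Name"]  #=> "Val1-1"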
Can be done in 3 lines (this is why I love Ruby):
scores = File.readlines('/scripts/test.txt').map{|l| l.split(/\s+/)}
headers = scores.shift
scores.map!{|score|Hash[headers.zip(score)]}
Now scores contains your array of hashes.
Here is a verbose explanation:
# open the file and read it,
# then split on newlines,
# then turn each line into an array by splitting on spaces and stripping extra whitespace
scores = File.open('scores.txt', &:read).split("\n").map { |l| l.split(" ").map(&:strip) }

# shift the array to capture the header row
headers = scores.shift

# initialize an array to hold the score hashes
scores_hash_array = []

# loop through each line
scores.each do |score|
  # pair each value with the header at the same index
  scores_hash_array << Hash[score.map.with_index { |l, i| [headers[i], l] }]
end
#=>[{"KeyName"=>"Key1", "Val1Name"=>"Val1-1", "Val2Name"=>"Val2-1", "..."=>"...", "ValMName"=>"ValM-1"},
{"KeyName"=>"Key2", "Val1Name"=>"Val1-2", "Val2Name"=>"Val2-2", "..."=>"...", "ValMName"=>"ValM-2"},
{"KeyName"=>"Key3", "Val1Name"=>"Val1-3", "Val2Name"=>"Val2-3", "..."=>"...", "ValMName"=>"ValM-3"},
{"KeyName"=>"..", "Val1Name"=>"..", "Val2Name"=>"..", "..."=>"..", "ValMName"=>".."},
{"KeyName"=>"KeyN", "Val1Name"=>"Val1-N", "Val2Name"=>"Val2-N", "..."=>"...", "ValMName"=>"ValM-N"}]
scores_hash_array now has a hash for each row in the sheet.
You can try something like this:
fh = File.open("scores.txt","r")
rh={} #result Hash
fh.readlines.each{|line|
kv=line.split(/\s+/)
puts kv.length
rh[kv[0]] = kv[1..kv.length-1].join(",") #***store the values joined by ","
}
puts rh.inspect
fh.close
If you want an array of values instead, replace the last line in the loop with:
rh[kv[0]] = kv[1..-1]
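Note that as written, the header row itself also becomes an entry in rh (keyed by "KeyName"). If you want to skip it, a small variation on the same snippet (just a sketch) is:
fh = File.open("scores.txt", "r")
rh = {}
fh.readlines.drop(1).each { |line|  # drop(1) skips the header row
  kv = line.split(/\s+/)
  rh[kv[0]] = kv[1..-1].join(",")
}
puts rh.inspect
fh.close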
Given a hash that contains function names like "find_by_user", "find_by_id", ...
I want to search in a directory of files and return an object that has each file name, along with the line numbers where the function names occur.
I have this so far:
files = Dir.glob(folder_path)
files.each do |file_name|
  content = File.read(file_name)
end
This will be scanning a few hundred files.
Here's the basic functionality you need:
# Given a path to a file and a regex,
# return an array of paired filename+line number matches
def matching_lines(file_path, regex)
  name = File.basename(file_path)
  File.readlines(file_path)
      .map.with_index { |line, i| [name, line, i] }
      .select { |name, line, i| line =~ regex }
      .map { |name, line, i| [name, i] }
end
You can choose to use this as you like, iterating over multiple files and/or patterns, or using Regexp.union to create a pattern matching any one of a set of strings.
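For example, here is a sketch that drives it across the globbed files from the question, building one pattern from the hash of function names (function_names and folder_path are assumed placeholders):
pattern = Regexp.union(function_names.keys)  # matches any of the function names
results = Dir.glob(folder_path).flat_map do |file_name|
  matching_lines(file_name, pattern)
end
# results is a flat array of [filename, line_index] pairs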
However, this is what grep was made for:
C:\>grep --line-number Nokogiri *.rb
push_nav_to_docs.rb:13: nav_dom = Nokogiri.XML(IO.read(NAV))
push_nav_to_docs.rb:39: landing = Nokogiri.XML(html)
push_nav_to_docs.rb:53: doc = Nokogiri.XML(IO.read(doc_path))
push_nav_to_docs.rb:73: if File.exists?(toc_path) && toc = Nokogiri.XML(IO.read(toc_path)).at('ul')
push_nav_to_docs.rb:104: container << Nokogiri.make("<ul/>").tap do |ul|
In Ruby you could call this code and get the output you want via:
lookfor = "Nokogiri"
grepped = `grep --line-number #{lookfor} *.rb`
results = grepped.scan(/^(.+?):(\d+)/)
#=> [["push_nav_to_docs.rb", "13"], ["push_nav_to_docs.rb", "39"], ["push_nav_to_docs.rb", "53"], ["push_nav_to_docs.rb", "73"], ["push_nav_to_docs.rb", "104"]]
Grep can also recurse into directories, match only particular file names, take regular expressions as patterns, and more.
I'm trying to read through a file, find a certain pattern, and then grab a set number of lines of text after the line that contains that pattern. I'm not really sure how to approach this.
If you want the n lines after the line matching pattern in the file filename:
lines = File.open(filename) do |file|
  line = file.readline until line =~ /pattern/ || file.eof?
  file.eof? ? nil : (1..n).map { file.eof? ? nil : file.readline }.compact
end
This should handle all cases, like the pattern not being present in the file (returns nil) or there being fewer than n lines after the matching line (the resulting array then simply contains the remaining lines of the file).
First parse the file into lines. Open, read, split on the line break
lines = File.read(file_name).split("\n")
Then get the index of the matching line:
index = lines.index { |x| x.match(/regex_pattern/) }
Where regex_pattern is the pattern you are looking for. Start from the position just after the match; the second argument is the number of lines you want (in this case 5):
lines[index + 1, 5]
It will return an array of 'lines'
You could combine it a bit more to reduce the number of lines, but I was attempting to keep it readable.
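For the curious, the condensed form (a sketch that assumes the pattern is present somewhere in the file) would be something like:
lines = File.read(file_name).split("\n")
lines[lines.index { |x| x.match(/regex_pattern/) } + 1, 5]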
If you're not tied to Ruby, grep -A 12 trivet will show the 12 lines after any line with trivet in it. Any regex will work in place of "trivet".
matched = false
num = 0
res = ""
File.foreach(filename) do |line|
  if matched
    res += line  # each line already ends in "\n", so don't append another
    num += 1
    break if num == num_lines_desired
  elsif line.match(/regex/)
    matched = true
  end
end
This has the advantage of not needing to read the whole file in the event of a match.
When done, res will hold the desired lines.
In Rails (the only difference is how I generate the file object):
file = File.open(File.join(Rails.root, 'lib', 'file.json'))

# convert the file into an array of strings, with \n as the separator
line_ary = file.readlines
line_count = line_ary.count

i = 0  # or however far down the document you want to start... you can get very fancy with this or just do it manually
hsh = {}
(line_count / 2).times do  # each pair of lines is one id line plus one array line
  child_id = JSON.parse(line_ary[i])
  i += 1
  parent_ary = JSON.parse(line_ary[i])
  i += 1
  hsh[child_id] = parent_ary
end
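For reference, this assumes file.json alternates one id line with one array line, something like (values made up for illustration):
"child_1"
["parent_a", "parent_b"]
"child_2"
["parent_c"]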
Haha, I've said too much; that should definitely get you started.
How can I, in Ruby, read strings from a file into an array, where for each line I only read and save up to a certain marker such as ":" and then stop?
Any help would be much appreciated =)
For example:
10.199.198.10:111 test/testing/testing (EST-08532522)
10.199.198.12:111 test/testing/testing (EST-08532522)
10.199.198.13:111 test/testing/testing (EST-08532522)
Should only read the following and be contained in the array:
10.199.198.10
10.199.198.12
10.199.198.13
This is a rather trivial problem, using String#split:
results = open('a.txt').map { |line| line.split(':')[0] }
p results
Output:
["10.199.198.10", "10.199.198.12", "10.199.198.13"]
String#split breaks a string at the specified delimiter and returns an array; so line.split(':')[0] takes the first element of that generated array.
In the event that there is a line without a : in it, String#split will return an array with a single element that is the whole line. So if you need to do a little more error checking, you could write something like this:
results = []
open('a.txt').each do |line|
  results << line.split(':')[0] if line.include? ':'
end
p results
which will only add split lines to the results array if the line has a : character in it.
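Equivalently, you can keep the one-liner style of the first snippet and still skip lines without a colon (a small sketch):
results = open('a.txt').map { |line| line.split(':')[0] if line.include?(':') }.compact
p results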