Sort an array depending on given set of characters within each element - ruby

I have an array made of a number of elements read from a .txt file. Each element is long and has a lot of information, for instance:
20201102066000000000000000000000000020052IC04008409Z8000000000030546676591AFIP
All the lines are already part of the array lines_array, but I need to sort them out depending on the content of the 36° character until the 51°, which in the example provided above would be:
20052IC04008409Z
I was already able to catch the patterns from each of the elements in the array:
lines_array = File.readlines(complete_filename)
pattern = nil
lines_array.each do |line|
pattern = line[36] + line[37] + line[38] + line[39] + line[40] + line[41] + line[42] + line[43] +
line[44] + line[45] + line[46] + line[47] + line[48] + line[49] + line[50] + line[51]
end
What I need to do now is sort all the elements of the array (the long elements) alphabetically, based on the content of the variable pattern. I tried the methods sort and sort_by, but I wasn't able to pass my variable pattern as a parameter. For example, a correct order of three given elements would be:
20201102066000000000000000000000000020001IC04180127X8000000000030546676591AFIP
20201104066000000000000000000000000020001IC04182757T8000000000030546676591AFIP
20201102066000000000000000000000000020001IC05020641D8000000000030546676591AFIP
Any help?

Firstly, there are easier and cleaner ways to extract the substring you're after. You could use String#[] or String#slice:
# These do the same thing, use whichever reads better to you.
pattern = line[36, 16]
pattern = line.slice(36, 16)
pattern = line[36..51]
pattern = line.slice(36..51)
Then you can sort on that slice by passing a block to Enumerable#sort_by:
sorted = lines_array.sort_by { |str| str[36, 16] }
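As a minimal sketch of the same idea on made-up data: the records below are short invented strings, and the key position (index 4, length 3) is an assumption standing in for str[36, 16] on the real lines. The .to_s guards against lines that are too short, for which String#[] returns nil.

```ruby
# Invented sample records; the sort key is assumed to sit at index 4, length 3.
lines = ["2020B02xx", "2021A10yy", "2019A02zz"]

# sort_by calls the block once per element and orders by the returned key.
# .to_s turns a nil key (line shorter than the offset) into "" so the
# comparison does not raise.
sorted = lines.sort_by { |line| line[4, 3].to_s }

puts sorted
```

If the lines come from File.readlines, each one still ends with a newline; that does not affect a fixed-offset key, but passing chomp: true to File.readlines removes it if it gets in the way.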

Related

Insert multiple characters in string at once

Whereas str[] will replace a character, str.insert will insert a character at a position. But it requires two lines of code:
str = "COSO17123456"
str.insert 4, "-"
str.insert 7, "-"
=> "COSO-17-123456"
I was thinking how to do this in one line of code. I came up with the following solution:
str = "COSO17123456"
str.each_char.with_index.reduce("") { |acc,(c,i)| acc += c + ( (i == 3 || i == 5) ? "-" : "" ) }
=> "COSO-17-123456"
Is there a built-in Ruby helper for this task? If not, should I stick with the insert option rather than combining several iterators?
Use each to iterate over an array of indices:
str = "COSO17123456"
[4, 7].each { |i| str.insert i, '-' }
str #=> "COSO-17-123456"
You can use slices and .join:
> [str[0..3], str[4..5],str[6..-1]].join("-")
=> "COSO-17-123456"
Note that the split points after the first one differ from the indices used with insert (the second dash goes between indices 5 and 6 here, not at 7), because no earlier insertion has shifted the string. You work with the absolute indices of the original string, not with relative indices that move as insertions are made, which is more natural (to me, anyway).
If you want to insert at specific absolute index values, you can also use .each_with_index and control the behavior character by character:
str2 = ""
tgts=[3,5]
str.split("").each_with_index { |c,idx| str2+=c; str2+='-' if tgts.include? idx }
Both of the above create a new string.
String#insert returns the string itself.
This means you can chain the method calls, which can be prettier and more efficient if you only have to do it a couple of times, as in your example:
str = "COSO17123456".insert(4, "-").insert(7, "-")
puts str
COSO-17-123456
Your reduce version can therefore be written more concisely as:
[4, 7].reduce(str) { |s, idx| s.insert(idx, '-') }
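One thing worth keeping in mind with the insert-based versions is that String#insert modifies the receiver in place. The sketch below, with variable names chosen only for illustration, works on a copy and inserts from right to left so that the indices can be read against the original string:

```ruby
original = "COSO17123456"

# dup gives a separate string, so the original stays untouched.
copy = original.dup

# Inserting at the higher index first means the lower index is not shifted,
# so 6 and 4 are both positions in the original string.
[6, 4].each { |i| copy.insert(i, "-") }

puts copy
puts original
```

Whether to mutate or copy is a design choice; the slice/join and unpack variants always build a new string.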
I'll bring one more variation to the table, String#unpack:
new_str = str.unpack("A4A2A*").join('-')
# or with String#%
new_str = "%s-%s-%s" % str.unpack("A4A2A*")

Regex to match a specific sequence of strings

Assuming I have two arrays of strings
position1 = ['word1', 'word2', 'word3']
position2 = ['word4', 'word1']
and I want to check, inside a text/string, whether the substring #{target} that occurs in the text is preceded by one of the words in position1, followed by one of the words in position2, or both at the same time. In other words, I am looking to the left and right of #{target}.
For example in the sentence "Writing reports and inputting data onto internal systems, with regards to enforcement and immigration papers" if the target word is data I would like to check if the word left (inputting) and right (onto) are included in the arrays or if one of the words in the arrays return true for the regex match. Any suggestions? I am using Ruby and I have tried some regex but I can't make it work yet. I also have to ignore any potential special characters in between.
One of them:
/^.*\b(#{joined_position1})\b.*$[\s,.:-_]*\b#{target}\b[\s,.:-_\\\/]*^.*\b(#{joined_position2})\b.*$/i
Edit:
I figured out this way with regex to capture the word left and right:
(\S+)\s*#{target}\s*(\S+)
However, what could I change if I would like to capture more than one word on the left and right?
If you have two arrays of strings, what you can do is something like this:
matches = /^.+ (\S+) #{target} (\S+) .+$/.match(text)
if matches and (position1.include?(matches[1]) or position2.include?(matches[2]))
do_something()
end
What this regex does is match the target word in your text and extract the words next to it using capture groups. The code then compares those words against your arrays, and does something if they're in the right places. A more general version of this might look like:
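def checkWords(target, text, leftArray, rightArray, numLeft = 1, numRight = 1)
  # Build the regex. The capture groups use single quotes so that \S
  # reaches the regex engine instead of being read as a string escape.
  regex = "^.+"
  regex += ' (\S+)' * numLeft
  regex += " #{Regexp.escape(target)}"
  regex += ' (\S+)' * numRight
  regex += " .+$"
  pattern = Regexp.new(regex)
  matches = pattern.match(text)
  return false if !matches
  # Groups 1..numLeft are the words to the left of the target
  for i in 1..numLeft
    return false if (!leftArray.include?(matches[i]))
  end
  # The following numRight groups are the words to the right
  for i in 1..numRight
    return false if (!rightArray.include?(matches[numLeft + i]))
  end
  return true
end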
Which can then be invoked like this:
do_something() if checkWords("data", text, position1, position2, 2, 2)
I'm pretty sure it's not terribly idiomatic, but it gives you a general sense of how you would do what you want in a more general way.
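A different way to get several words on each side, offered only as a sketch and not as the method used above, is to tokenize the text and look at the neighbours by index. The helper name neighbours and the two sample word lists are hypothetical; scanning for \w+ also drops punctuation between words, which covers the "ignore special characters" requirement in a rough way.

```ruby
# Hypothetical helper: returns [left_words, right_words] around the first
# occurrence of target, or nil if target is not a word in text.
def neighbours(text, target, count = 1)
  words = text.scan(/\w+/)                        # words only, punctuation dropped
  idx = words.index { |w| w.casecmp?(target) }    # case-insensitive lookup
  return nil if idx.nil?
  left = words[[idx - count, 0].max...idx]        # up to count words before
  right = words[idx + 1, count]                   # up to count words after
  [left, right]
end

text = "Writing reports and inputting data onto internal systems, " \
       "with regards to enforcement and immigration papers"

# Sample word lists, invented for this example.
position1 = ["inputting", "entering"]
position2 = ["onto", "into"]

left, right = neighbours(text, "data", 2)
hit = (left & position1).any? || (right & position2).any?
puts hit
```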

Adding backreferenced value to its replacement

I am trying to add a number from a backreference to another number, but I seem to get only concatenation:
textStr = "<testsuite errors=\"0\" tests=\"4\" time=\"4.867\" failures=\"0\" name=\"TestRateUs\">"
new_str = textStr.gsub(/(testsuite errors=\"0\" tests=\")(\d+)(\" time)/, '\1\2+4\3')
# => "<testsuite errors=\"0\" tests=\"4+4\" time=\"4.867\" failures=\"0\" name=\"TestRateUs\">"
I tried also using to_i on the backreferenced value, but I can't get the extracted value to add. Do I need to do something to the value to make it addable?
If you are manipulating XML, I'd suggest using some specific library for that. In this answer, I just want to show how to perform operations on the submatches.
You can sum up the values inside a block:
textStr="<testsuite errors=\"0\" tests=\"4\" time=\"4.867\" failures=\"0\" name=\"TestRateUs\">"
new_str = textStr.gsub(/(testsuite errors=\"0\" tests=\")(\d+)(\" time)/) do
Regexp.last_match[1] + (Regexp.last_match[2].to_i + 4).to_s + Regexp.last_match[3]
end
puts new_str
See IDEONE demo
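If we use {|m|...}, m is only the whole matched string (the same as Regexp.last_match[0]), so the captured texts are not available through m; inside the block they have to be read from Regexp.last_match (or $1, $2, ...).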
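If only the number needs to change, another option is to match just the digits and use lookarounds for the context, so the block parameter is the number itself. This is a sketch under a simplifying assumption: it anchors only on tests=" rather than on the full errors/time context used in the regex above, and text_str is a renamed copy of the sample string.

```ruby
text_str = '<testsuite errors="0" tests="4" time="4.867" failures="0" name="TestRateUs">'

# (?<=tests=") and (?=") are zero-width, so the match (and m) is only the digits.
new_str = text_str.gsub(/(?<=tests=")\d+(?=")/) { |m| (m.to_i + 4).to_s }

puts new_str
```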

RegEx to remove new line characters and replace with comma

I scraped a website using Nokogiri and after using xpath I was left with the following string (which is a few td's pushed into one string).
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
My goal is to make this into an array that looks like the following(it will be a nested array):
["Total First Downs", "359", "274"]
The issue is creating a regex that removes the escaped characters and substitutes a single "," for each run of them, but does not add a "," after the last set of integers. If the comma after the last set of integers is unavoidable, I could use #compact to get rid of the nil that occurs in the array. If you need the code I used to scrape the website, here it is (please note that I saved the webpage for testing so that my IP address would not get burned during the trial phase):
require 'nokogiri'
f = File.open('page')
doc = Nokogiri::HTML(f)
f.close
number = doc.xpath('//tr[@class="tbdy1"]').count
stats = Array.new(number) { Array.new }
i = 0
doc.xpath('//tr[@class="tbdy1"]').each do |tr|
  stats[i] << tr.text
  i += 1
end
Thanks for your help
I don't fully understand your problem, but the result can be easily achieved with this:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
.split(/[\n\t]+/)
# => ["Total First Downs", "359", "274"]
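To get the nested array mentioned in the question, the same split can be applied to each row's text. A small sketch, assuming the row texts have already been collected into an array; the second row is invented sample data, and the tab runs are shortened for readability:

```ruby
# Stand-ins for the tr.text values; only the first row comes from the question.
rows = [
  "Total First Downs\n\t\t359\n\t\t274\n\t",
  "Rushing\n\t\t120\n\t\t98\n\t"
]

# strip is not strictly needed (split drops trailing empty strings),
# but it also protects against leading whitespace in a cell.
stats = rows.map { |text| text.strip.split(/[\n\t]+/) }

p stats
```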
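Try with gsub, passing a regex literal rather than a string, and strip the text first so that no trailing comma is added:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t".strip.gsub(/[\n\t]+/, ",")
# => "Total First Downs,359,274"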

Incrementing numeric parameter in a URL parameter string?

I've had a look round and can't find what I need on Stack Overflow, and was wondering if someone had a simple solution.
I want to find a parameter within a URL and increment its value, so, as an example:
?kws=&pstc=&cty=&prvnm=1
I want to be able to locate the prvnm parameter no matter where it is in the string and increment its value by 1.
I know I could split the parameters into an array, find the key, increment it and write it back but that seems rather long winded and wondered if someone else had any ideas!
require "uri"
url = "http://example.com/?kws=&pstc=&cty=&prvnm=1"
def new_url(url)
uri = URI.parse(url)
hsh = Hash[URI.decode_www_form(uri.query)]
hsh['prvnm'] = hsh['prvnm'].next
uri.query = URI.encode_www_form(hsh).to_s
uri.to_s
end
new_url(url) # => "http://example.com/?kws=&pstc=&cty=&prvnm=2"
There are already four answers, so I had to come up with something a little different:
s = "?kws=&pstc=&cty=&prvnm=1"
head, sep, tail = s.partition(/(?<=[?&]prvnm=)\d+/)
head + (sep.to_i + 1).to_s + tail # => "?kws=&pstc=&cty=&prvnm=2"
String#partition returns an array of three strings [head, sep, tail] such that head + sep + tail == s, where sep is the part of s matched by partition's argument, which can be a string or a regex.
We want the separator to be the digits following ?prvnm= or &prvnm=. We therefore use a regex with \d+ preceded by that string, which we want to treat as having zero length so that it is not included in the separator. That calls for a "positive lookbehind": (?<=[?&]prvnm=). \d+ is "greedy", so it takes all consecutive digits.
For the given value of s, s.partition(/(?<=[?&]prvnm=)\d+/)
=> ["?kws=&pstc=&cty=&prvnm=", "1", ""].
Edit: my thanks to @quetzalcoatl for pointing out that I needed to change (?<=&prvnm=) in my regex to what I have now, as what I had would fail when ?prvnm= was at the beginning of the string.
split the string by `&`
then iterate over the parts
then split each part by `=` and inspect the results
when found `prvnm`, parse the integer and increment it
then join the bits by '='
then join the parts by '&'
Or, use regex like:
/[?&]prvnm=\d+/
and parse the result and then do a replacement.
Or, use a URL-parsing library.
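The first two suggestions above can be sketched roughly as follows. Both method names are hypothetical, and the code assumes the input is just the query string (with or without the leading ?) and that the parameter's value is a plain non-negative integer.

```ruby
# Split/inspect/join variant, following the listed steps.
def increment_param(query, key)
  prefix = query.start_with?("?") ? "?" : ""
  parts = query.delete_prefix("?").split("&").map do |part|
    name, value = part.split("=", 2)
    # Only touch the requested key, and only if its value is all digits.
    value = (value.to_i + 1).to_s if name == key && value.to_s.match?(/\A\d+\z/)
    value.nil? ? name : "#{name}=#{value}"
  end
  prefix + parts.join("&")
end

# Regex variant: the lookbehind keeps "?key=" or "&key=" out of the match,
# so the block receives only the digits.
def increment_param_regex(query, key)
  query.sub(/(?<=[?&]#{Regexp.escape(key)}=)\d+/) { |digits| (digits.to_i + 1).to_s }
end

puts increment_param("?kws=&pstc=&cty=&prvnm=1", "prvnm")
puts increment_param_regex("?prvnm=9&kws=", "prvnm")
```

The regex variant is shorter, while the split variant is easier to extend (for example, to add the parameter when it is missing).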
Try something like this (note \d+ rather than \d, so that values with more than one digit are captured in full):
params = "?kws=&pstc=&cty=&prvnm=1"
num = params.scan(/prvnm=(\d+)/)[0].join.to_i
puts num + 1
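Use:
require 'uri'
require 'cgi'
Then:
parsed_url = URI.parse(full_url) # full_url is a string holding your complete URL
r = CGI.parse(parsed_url.query)
r is now a hash of all your query parameters, where each value is an array of strings.
You can easily access it by using:
r["prvnm"].first.to_i + 1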

Resources