Ruby replace array list - ruby

I have two strings:
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
I need a string pkgs that has the content of packages without exclusion like this:
pkgs="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
I tried the following code:
pkgs = packages.gsub!( /(?<!^|,)#{exclusion}(?!,|$)/, '\1')
which does not seem to be working. What would be the best working solution in this case?

packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
(packages.split - exclusion.split).join(" ") # => "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"

You need your variables to be arrays, not strings. Then you can just use the - operator to "subtract" the items in exclusion from packages:
packages = [ "linux-image-3.2.0-4-amd64",
             "linux-libc-dev",
             "linux-headers-3.2.0-4-amd64",
             "linux-headers-3.2.0-4-common",
             "dnsutils",
             "mysql-server-5.5" ]
exclusion = [ "dnsutils", "mysql-server-5.5" ]
remaining = packages - exclusion
# => [ "linux-image-3.2.0-4-amd64",
#      "linux-libc-dev",
#      "linux-headers-3.2.0-4-amd64",
#      "linux-headers-3.2.0-4-common" ]
If you then need the values in a single string, join them together with the join method:
remaining_str = remaining.join(" ")
# => "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"

If you want to keep it simple, you can always split these strings into arrays, and join the difference.
(packages.split - exclusion.split).join ' '
String's split method, called without arguments, splits on whitespace. This gives you two arrays; subtracting the second from the first removes every element of the first that also appears in the second. You then join the resulting array with space characters.
Longer example:
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
one = packages.split
# >> ["linux-image-3.2.0-4-amd64", "linux-libc-dev", "linux-headers-3.2.0-4-amd64", "linux-headers-3.2.0-4-common", "dnsutils", "mysql-server-5.5"]
two = exclusion.split
# >> ["dnsutils", "mysql-server-5.5"]
difference = one - two
# >> ["linux-image-3.2.0-4-amd64", "linux-libc-dev", "linux-headers-3.2.0-4-amd64", "linux-headers-3.2.0-4-common"]
finished = difference.join ' '
# >> "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
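The same idea can be wrapped in a small method. This is only a sketch: the name without_packages is made up for illustration and does not come from the answers above. It also shows two properties worth knowing: Array#- removes every occurrence of an excluded name, and split without arguments collapses repeated whitespace.

```ruby
# Hypothetical helper (name chosen for illustration only).
# Both arguments are whitespace-separated lists of package names.
def without_packages(packages, exclusion)
  (packages.split - exclusion.split).join(' ')
end

packages  = "linux-image-3.2.0-4-amd64 linux-libc-dev dnsutils mysql-server-5.5"
exclusion = "dnsutils mysql-server-5.5"

pkgs = without_packages(packages, exclusion)

# Duplicates of an excluded name are all removed, and the double space is ignored.
deduped = without_packages("a b  c b", "b")
```

If the order of the remaining packages matters, this approach preserves it, because Array#- keeps the elements of the first array in their original order.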


Black is not honoring #fmt: skip

With below configuration in pyproject.toml :
[tool.black]
# How many characters per line to allow.
line-length = 120
# When processing Jupyter Notebooks, add the given magic to the list of known
# python-magics (timeit, prun, capture, pypy, python3, python, time).
# Useful for formatting cells with custom python magics.
# python-cell-magics =
# Require a specific version of Black to be running
# (useful for unifying results across many environments e.g. with a pyproject.toml file).
# It can be either a major version number or an exact version.
# required-version =
# A regular expression that matches files and directories that should be
# included on recursive searches. An empty value means all files are included
# regardless of the name. Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# include = "(\.pyi?|\.ipynb)$"
# A regular expression that matches files and directories that should be
# excluded on recursive searches. An empty value means no paths are excluded.
# Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# exclude = "/(\.direnv|\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|venv|\.svn|\.ipynb_checkpoints|_build|buck-out|build|dist|__pypackages__)/"
# Like 'exclude', but adds additional files and directories on top of the excluded ones.
# (Useful if you simply want to add to the default).
# extend-exclude =
# Like 'exclude', but files and directories matching this regex will be excluded
# even when they are passed explicitly as arguments.
# force-exclude =
# The name of the file when passing it through stdin.
# Useful to make sure Black will respect 'force-exclude' option on some editors that rely on using stdin.
# stdin-filename =
# Number of parallel workers.
# Can be a number or a range.
# workers =
and this command line :
black --config "pyproject.toml" --target-version py39 --check --diff .
the following line of code is flagged :
ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # cspell: disable-line # fmt: skip
--- properties/datasets/models.py 2022-11-30 00:01:16.590743 +0000
+++ properties/datasets/models.py 2022-11-30 00:01:18.692767 +0000
@@ -746,11 +746,13 @@
calculate the mean value of all the dataset points
return: numerical value of this function when all variables are zero
rtype: float
"""
- ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # fmt:skip
+ ave_quantity = self.exec_math(
+ math_iterable["mean"], "mean", []
+ ) # execute the "mean" fxn on the dataset # fmt:skip
return getattr(ave_quantity, "magnitude", 0.0)
def serialize(self, flat=False):
return {
"type": "dataset",
would reformat properties/datasets/models.py
Oh no! 💥 💔 💥
1 file would be reformatted, 102 files would be left unchanged.
What am I missing here?
Using black v22.10.0
Also asked here --> https://github.com/psf/black/issues/451#issuecomment-1331478945

How to get a block at an offset in the IO.foreach loop in ruby?

I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?
file_names = [""]
IO.foreach("a.txt") { |block|
if block =~ /^file_names*/
dir = # get the next block
file_names.append(dir)
end
}
Actually my input looks like this:
file_names[174]:
name: "vector"
dir_index: 1
mod_time: 0x00000000
length: 0x00000000
file_names[175]:
name: "stl_bvector.h"
dir_index: 2
mod_time: 0x00000000
length: 0x00000000
I have a list of file_names entries, and I want to capture each entry's name, dir_index, mod_time and length properties and store them in the file_names array at the index given in the text.
You can use #each_cons to get the value of the next 4 rows from the text file:
files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
  if block[0] =~ /file_names.*/
    o << block[1..4].map{|e| e.split(':')[1]}
  end
end
puts files
#=> "vector"
# 1
# 0x00000000
# 0x00000000
# "stl_bvector.h"
# 2
# 0x00000000
# 0x00000000
Keep in mind that the files array contains subarrays of 4 elements. If the : symbol occurs later in the lines, you could replace the third line of my code with this:
o << block[1..4].map{ |e| e.partition(':').last.strip}
I also added #strip in case you want to remove the whitespace around the values. With this line changed, the actual array will look something like this:
p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]
(the values don't contain the \ escape character, that's just the way #p shows it).
Another option: if you know that the pattern of one file_names line followed by four value lines holds throughout the entire text file, and that the file always starts with a file_names line, you can replace #each_cons with #each_slice and remove the regex completely. This will also speed up the entire process:
IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
  o << block[1..4].map{ |e| e.partition(':').last.strip }
end
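To make the difference between the two methods concrete, here is a tiny sketch on made-up data (the array contents are placeholders, not the real file): #each_cons yields overlapping windows, which is why the regex check is needed to pick the windows that start at a header, while #each_slice yields non-overlapping chunks that already line up with the records.

```ruby
# Placeholder data: two records of one header line plus two value lines each.
lines = %w[h1 a1 b1 h2 a2 b2]

# Overlapping windows: every run of 3 consecutive lines.
windows = lines.each_cons(3).to_a

# Non-overlapping chunks: lines 1-3, then lines 4-6.
chunks = lines.each_slice(3).to_a

# Only some windows start at a header, so they have to be filtered.
header_windows = windows.select { |w| w[0].start_with?('h') }
```

Here header_windows and chunks end up holding the same two records, but each_slice gets there without inspecting any line.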
It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:
File.readlines("data.txt").slice_before(/\Afile_names/)
Now you have an array of arrays that looks like:
[
  [
    "file_names[174]:\n",
    " name: \"vector\"\n",
    " dir_index: 1\n",
    " mod_time: 0x00000000\n",
    " length: 0x00000000\n"
  ],
  [
    "file_names[175]:\n",
    " name: \"stl_bvector.h\"\n",
    " dir_index: 2\n",
    " mod_time: 0x00000000\n",
    " length: 0x00000000"
  ]
]
Each of these groups could be transformed further, like for example into a Ruby Hash using those keys.
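As a rough sketch of that further transformation, assuming every group starts with a file_names[N]: header followed by key: value lines. The method name parse_file_names and the :index key are made up for illustration, and the sample array stands in for File.readlines("data.txt"):

```ruby
# Hypothetical helper: turns the dump lines into an array of hashes.
def parse_file_names(lines)
  lines.slice_before(/\Afile_names/).map do |header, *fields|
    # Split each "key: value" line at the first colon only,
    # so values that contain a colon stay intact.
    attrs = fields.to_h do |line|
      key, _, value = line.partition(':')
      [key.strip.to_sym, value.strip]
    end
    # The number inside file_names[...] becomes the :index entry.
    { index: header[/\d+/].to_i }.merge(attrs)
  end
end

# Sample input standing in for the lines read from the file.
sample = [
  "file_names[174]:\n",
  " name: \"vector\"\n",
  " dir_index: 1\n",
  " mod_time: 0x00000000\n",
  " length: 0x00000000\n",
  "file_names[175]:\n",
  " name: \"stl_bvector.h\"\n",
  " dir_index: 2\n",
  " mod_time: 0x00000000\n",
  " length: 0x00000000"
]

entries = parse_file_names(sample)
```

All values are kept as strings, including the double quotes around the name; converting dir_index to an integer or stripping the quotes would be a further step. Enumerable#to_h with a block needs Ruby 2.6 or later.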

PySpark - Sort RDD by Second Column

I've this RDD:
[[u''], [u'E01', u'Lokesh'], [u'E10', u'Venkat'], [u'EO2', u'Bhupesh'], [u'EO3', u'Amit'], [u'EO4', u'Ratan'], [u'EO5', u'Dinesh'], [u'EO6', u'Pavan'], [u'EO7', u'Tejas'], [u'EO8', u'Sheela']]
And I want to sort it by the second column (name), so that the result looks like this:
[u'EO3', u'Amit'],
[u'EO2', u'Bhupesh'],
[u'EO5', u'Dinesh'],
[u'E01', u'Lokesh'],
[u'EO6', u'Pavan'],
[u'EO8', u'Sheela'],
[u'EO7', u'Tejas'],
[u'E10', u'Venkat']
I try with this:
sorted = employee_rows.sortBy(lambda line: line[1])
But it gives me this:
IndexError: list index out of range
How can I sort by the second column?
Thanks!
In general, you should make all of your higher order rdd functions robust to bad inputs. In this case, your error is because you have at least one record that does not have a second column.
One way is to put a condition check on the length of line inside the lambda:
# an empty-string key sorts first and, unlike None, can be compared with strings on Python 3
employee_rows.sortBy(lambda line: line[1] if len(line) > 1 else "").collect()
#[[u''],
# [u'EO3', u'Amit'],
# [u'EO2', u'Bhupesh'],
# [u'EO5', u'Dinesh'],
# [u'E01', u'Lokesh'],
# [u'EO6', u'Pavan'],
# [u'EO4', u'Ratan'],
# [u'EO8', u'Sheela'],
# [u'EO7', u'Tejas'],
# [u'E10', u'Venkat']]
Or you could define a custom sort function with try/except. Here's a way to make the "bad" rows sort last:
def mysort(line):
    try:
        return line[1]
    except IndexError:
        # since you're sorting alphabetically
        return 'Z'

employee_rows.sortBy(mysort).collect()
#[[u'EO3', u'Amit'],
# [u'EO2', u'Bhupesh'],
# [u'EO5', u'Dinesh'],
# [u'E01', u'Lokesh'],
# [u'EO6', u'Pavan'],
# [u'EO4', u'Ratan'],
# [u'EO8', u'Sheela'],
# [u'EO7', u'Tejas'],
# [u'E10', u'Venkat'],
# [u'']]

Splitting a single string of hashes into an array of hashes

I can't get regex to split the string to give the desired result.
http://rubular.com/r/ytFwP3ivAv - according to rubular this expression should work.
str = "{"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "LOCATION"=>"12 S THORNTON AV", "DISTRICT"=>"C5", "INCIDENT"=>"2013-00496193"}, {"DATE"=>"11/26/2013 11:10", "DESC"=>"Hold-up alarm", "LOCATION"=>"4725 S KIRKMAN RD", "DISTRICT"=>"E5", "INCIDENT"=>"2013-00496235"}"
sub_str_array = str.split(/({"[\w"=>\/ :,()-]*})/)
# the desired result - each hash is an element in an array
puts the_split[0] #=> {"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "LOCATION"=>"12 S THORNTON AV", "DISTRICT"=>"C5", "INCIDENT"=>"2013-00496193"}
Is there another way (an easier way) to convert these string hashes into an array of hashes?
You can use this:
require 'json'
yourstr = '[' + '{"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "LOCATION"=>"12 S THORNTON AV", "DISTRICT"=>"C5", "INCIDENT"=>"2013-00496193"}, {"DATE"=>"11/26/2013 11:10", "DESC"=>"Hold-up alarm", "LOCATION"=>"4725 S KIRKMAN RD", "DISTRICT"=>"E5", "INCIDENT"=>"2013-00496235"}, {"DATE"=>"11/26/2013 11:08", "DESC"=>"Missing person - adult", "LOCATION"=>"4818 S SEMORAN BV 503", "DISTRICT"=>"K1", "INCIDENT"=>"2013-00496198"}, {"DATE"=>"11/26/2013 11:07", "DESC"=>"911 hang up", "LOCATION"=>"311 W PRINCETON ST", "DISTRICT"=>"C2", "INCIDENT"=>"2013-00496231"}' + ']'
my_hash = JSON.parse(yourstr.gsub("=>", ":"))
puts my_hash[0]
As written, str is not a valid string literal, because the inner double quotes end the string early. Quote it properly and it should work.
It may be better to use %Q(string goes here) rather than double quotes.
You can use eval "[#{str}]", if str is hardcoded and nobody can change it.
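Putting the quoting advice together with the JSON approach, a minimal sketch could look like this. The sample string is a shortened version of the one in the question, and the variable name incidents is made up for illustration. It uses %q(...) so the inner double quotes need no escaping; balanced parentheses inside the literal, such as (minor), are allowed.

```ruby
require 'json'

# %q(...) behaves like a single-quoted string, so the inner " characters are kept as-is.
str = %q({"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)"}, {"DATE"=>"11/26/2013 11:10", "DESC"=>"Hold-up alarm"})

# Turn the Ruby-style hash rockets into JSON colons and wrap the list in brackets.
# Caveat: this would also rewrite any "=>" that appears inside a value.
incidents = JSON.parse("[#{str.gsub('=>', ':')}]")

puts incidents[0]["DESC"]
```

Unlike eval, JSON.parse does not execute anything in the string, which makes it the safer choice when the input is not hardcoded.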

Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)

Let's say I have an array of Twitter account names:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
And a prepend and append variable:
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
How can I turn this into an array of as few strings as possible, each with a maximum length of 140 characters, starting with the prepend text, ending with the append text, and in between the Twitter account names, each prefixed with a # sign and separated by a space? Like this:
tweets = ['Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday', 'Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday', 'Check out these cool people: #example18 #example19 #example20 #FollowFriday']
(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)
Any suggestions? I'm thinking I should use the scan method, but haven't figured out the right way yet.
It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary when using the right Ruby methods. Here's what I came up with so far:
# Create one long string of #usernames separated by a space
tmp = twitter_accounts.map!{|a| a.insert(0, '#')}.join(' ')
# alternative: tmp = '#' + twitter_accounts.join(' #')
# Number of characters left for mentioning the Twitter accounts
length = 140 - (prepend + append).length
# This method would split a string into multiple strings
# each with a maximum length of 'length' and it will only split on empty spaces (' ')
# ideally strip that space as well (although .map(&:strip) could be used too)
tweets = tmp.some_method(' ', length)
# Prepend and append
tweets.map!{|t| prepend + t + append}
P.S.
If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:
arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
str = arr.map{|name|"##{name}"}.join(' ')
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size
until str.size <= max_chars do
  p str.slice!(0, str.rindex(" ", max_chars))
  str.lstrip! # get rid of the leading space
end
p str unless str.empty?
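The loop above only prints the chunks. A sketch of how it might be extended to return the finished tweets follows; the method name build_tweets and its keyword arguments are assumptions for illustration, not part of the answer. It also raises an error when a single name cannot fit, a case in which rindex returns nil.

```ruby
# Hypothetical helper built around the rindex technique.
def build_tweets(names, prepend:, append:, limit: 140)
  room = limit - prepend.size - append.size
  rest = names.map { |name| "##{name}" }.join(' ')
  chunks = []

  until rest.size <= room
    # Last space at or before index `room`; everything before it fits.
    cut = rest.rindex(' ', room)
    raise ArgumentError, "a single name does not fit in #{room} characters" if cut.nil?

    chunks << rest.slice!(0, cut)
    rest.lstrip! # drop the space left at the front
  end
  chunks << rest unless rest.empty?

  chunks.map { |chunk| "#{prepend}#{chunk}#{append}" }
end

names  = (1..20).map { |i| "example#{i}" }
tweets = build_tweets(names,
                      prepend: 'Check out these cool people: ',
                      append: ' #FollowFriday')
puts tweets
```

With the 20 names from the question this produces three tweets, each within the 140-character limit.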
I'd make use of reduce for this:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people:'
append = '#FollowFriday'
# Extra -1 is for the space before `append`
max_content_length = 140 - prepend.length - append.length - 1
content_strings = string.reduce([""]) { |result, target|
  result.push("") if result[-1].length + target.length + 2 > max_content_length
  result[-1] += " ##{target}"
  result
}
tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }
Which would yield:
"Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday"
"Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday"
"Check out these cool people: #example18 #example19 #example20 #FollowFriday"
