Print out the duplicates and the number of duplicates in Ruby arrays

If I gave you an array:
['apples', 'bananas', 'apples','apples','apples', 'cat', 'dog', 'dog', 'troll']
and said:
Print out the name of each item and how often it appears, such that the output was:
apples 4
bananas 1
cat 1
dog 2
troll 1
How would you do this? It seems simple, but it is stumping me.

Do as below:
array = [
  'apples', 'bananas', 'apples', 'apples',
  'apples', 'cat', 'dog', 'dog', 'troll'
]
array.group_by(&:to_s).each do |k, v|
  puts "#{k} #{v.size}"
end
# >> apples 4
# >> bananas 1
# >> cat 1
# >> dog 2
# >> troll 1
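On Ruby 2.7 or later, Array#tally builds the same element-to-count hash directly. A minimal sketch, reusing the array from the question (the variable names counts, duplicates and counts_legacy are just illustrative); selecting the entries whose count is above 1 keeps only the duplicates, as the title asks:

```ruby
array = ['apples', 'bananas', 'apples', 'apples', 'apples', 'cat', 'dog', 'dog', 'troll']

# Array#tally (Ruby 2.7+) maps each element to its number of occurrences,
# keeping the order in which elements first appear.
counts = array.tally
counts.each { |item, n| puts "#{item} #{n}" }

# Keep only the elements that occur more than once.
duplicates = counts.select { |_item, n| n > 1 }
duplicates.each { |item, n| puts "#{item} #{n}" }

# Pre-2.7 equivalent: a Hash whose missing keys default to 0.
counts_legacy = array.each_with_object(Hash.new(0)) { |item, h| h[item] += 1 }
```

The Hash.new(0) version is the usual fallback on older Rubies; unlike group_by, neither approach builds an intermediate array per element.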

Related

How to convert bash array to list and save to a file?

I have an array, and I want to convert it into a list and save it to a file.
Here is what I tried:
export arrVal=(a,b,c)
echo NEWLIST="${arrVal[@]}" >> newtextfile
Output:
NEWLIST=a,b,c
Expected Output:
NEWLIST=[a,b,c]
You can add the square brackets to your expression, something like this:
export arrVal=(a,b,c)
echo NEWLIST="[${arrVal[@]}]"
Output:
NEWLIST=[a,b,c]
As @pmf wrote in a comment,
arrVal=(a,b,c)
...is only one value, at index 0.
Look...
array=(a,b,c)
echo ${#array[@]} # puts out: 1
# Or only index 0...
echo ${array[0]} # puts out: a,b,c
Now...
array=(a b c)
echo ${#array[@]} # puts out: 3
# You can loop over it with...
for key in "${array[@]}"; do echo "${key}"; done
# That puts out...
a
b
c

Parse YAML to key value and include yaml categories

I was looking to parse a YAML file into plain key=value strings.
I have some initial structure, but I want to include some of the keys from the YAML as well.
test:
  line1: "line 1 text"
  line2: "line 2 text"
  line3: "line 3 text"
options:
  item1: "item 1 text"
  item2: "item 2 text"
  item3: "item 3 text"
Ruby:
File.open("test.yml") do |f|
  f.each_line do |line|
    line.chomp
    if line =~ /:/
      line.chop
      line.sub!('"', "")
      line.sub!(": ", "=")
      line.gsub!(/\A"|"\Z/, '')
      printline = line.strip
      puts "#{printline}"
      target.write( "#{printline}")
    end
  end
end
The results currently look like:
test:
line1=line 1 text
line2=line 2 text
line3=line 3 text
options:
item1=item 1 text
item2=item 2 text
item3=item 3 text
But I am looking to add the category in front, like:
test/line1=line 1 text
test/line2=line 2 text
test/line3=line 3 text
options/item1=item 1 text
options/item2=item 2 text
options/item3=item 3 text
What is the best way to include the category for each line?
You could use YAML.load_file and then map each category's entries into the format you need:
require 'yaml'
foo = YAML.load_file('file.yaml').flat_map do |key, value|
  value.map { |k, v| "#{key}/#{k}=#{v}" }
end
foo.each { |value| puts value }
# test/line1=line 1 text
# test/line2=line 2 text
# test/line3=line 3 text
# options/item1=item 1 text
# options/item2=item 2 text
# options/item3=item 3 text
You can easily convert YAML to a hash:
#test.yml
test:
  line1: "line 1 text"
  line2: "line 2 text"
  line3: "line 3 text"
options:
  item1: "item 1 text"
  item2: "item 2 text"
  item3: "item 3 text"
#ruby
require 'yaml'
hash = YAML.load File.read('test.yml')
Now you can do anything you want with the hash: get the keys, the values, etc.
hash['options']['item1'] #=> "item 1 text"
hash['test']['line1'] #=> "line 1 text"
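Both answers assume exactly one level of nesting. If categories can nest deeper, a small recursive helper can build the full path. A minimal sketch, where flatten_yaml is a hypothetical helper (not part of the YAML library) and the sample document is inlined and parsed with YAML.safe_load so it runs without test.yml; the extra nested level under options is added only to show the recursion:

```ruby
require 'yaml'

# Walk a nested Hash, joining keys with "/" and collecting "path=value" strings.
# flatten_yaml is a hypothetical helper name, not a YAML/Psych API.
def flatten_yaml(node, prefix = nil, out = [])
  node.each do |key, value|
    path = prefix ? "#{prefix}/#{key}" : key.to_s
    if value.is_a?(Hash)
      flatten_yaml(value, path, out) # recurse into a nested category
    else
      out << "#{path}=#{value}"
    end
  end
  out
end

# Inline sample adapted from the question (one extra nested level added).
doc = <<~DOC
  test:
    line1: "line 1 text"
    line2: "line 2 text"
  options:
    item1: "item 1 text"
    extra:
      item2: "item 2 text"
DOC

lines = flatten_yaml(YAML.safe_load(doc))
puts lines
```

With the file from the question, flatten_yaml(YAML.load_file('test.yml')) should produce the six test/... and options/... lines shown above.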

How to add lines after a pattern using sed

In shell script, how can I add lines after a certain pattern? Say I have the following file and I want to add two lines after block 1 and blk 2.
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
[block 1] and [blk 2] will be present in the file.
The output I am expecting is below.
abc
def
[block 1]
oranges = 5
pears = 2
apples = 3
grapes = 4
[blk 2]
oranges = 5
pears = 2
banana = 2
apples = 3
I thought of doing this with sed. I tried the command below, but it does not work on my Mac. I checked these posts but couldn't find what I am doing wrong.
$ sed -i '/\[block 1\]/a\n\toranges = 3\n\tpears = 2' sample2.txt
sed: 1: "sample2.txt": unterminated substitute pattern
How can I fix this? Thanks for your help!
[Edit]
I tried the following commands, and neither worked on my Mac.
$ sed -E '/\[block 1\]|\[blk 2\]/r\\n\\toranges = 3\\n\\tpears = 2' sample2.txt
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
$ sed -E '/\[block 1\]|\[blk 2\]/r\n\toranges = 3\n\tpears = 2' sample2.txt
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
Awk attempt:
$ awk -v RS= '/\[block 1\]/{$0 = $0 ORS "\toranges = 3" ORS "\tpears = 2" ORS}
/\[blk 2\]/{$0 = $0 ORS "\toranges = 5" ORS "\tpears = 2" ORS} 1' sample2.txt
abc
def
[block 1]
apples = 3
grapes = 4
[blk 2]
banana = 2
apples = 3
oranges = 3
pears = 2
oranges = 5
pears = 2
Note that the text provided to the a command has to be on a separate line:
sed '/\[block 1\]/ {a\
\toranges = 3\n\tpears = 2
}' file
and all embedded newlines have to be escaped. Another way to write it (probably more readable):
sed '/\[block 1\]/ {a\
oranges = 3\
pears = 2
}' file
Also, consider the r command as an alternative to the a command when larger amounts of text have to be inserted (e.g. more than one line). It reads the text to insert from the given file:
sed '/\[block 1\]/r /path/to/text' file
To handle multiple sections with one sed program, you can use the alternation operator (available in ERE, notice the -E flag):
sed -E '/\[block 1\]|\[blk 2\]/r /path/to/text' file
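This awk should work with an empty RS, which puts awk in paragraph mode and makes each blank-line-separated block a single record. It assumes the blocks are separated by blank lines; without them the whole file is one record, and the new lines are appended at the very end.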
awk -v RS= '/\[block 1\]/{$0 = $0 ORS "\toranges = 3" ORS "\tpears = 2" ORS}
/\[blk 2\]/{$0 = $0 ORS "\toranges = 5" ORS "\tpears = 2" ORS} 1' file
abc
def
[block 1]
apples = 3
grapes = 4
oranges = 3
pears = 2
[blk 2]
banana = 2
apples = 3
oranges = 5
pears = 2
This might work for you (GNU sed):
sed '/^\[\(block 1\|blk 2\)\]\s*$/{n;h;s/\S.*/oranges = 5/p;s//pears = 2/p;x}' file
Locate the required match, print it, and store the next line in the hold space. Replace everything from the first non-space character to the end of the line with the first required line and print it, repeat for the second required line, and then swap the original line back in from the hold space.

How can i delete an element in an array and then shift the array in Shell Script?

First let me state my problem clearly:
For example, let's pretend this is my array (the elements don't matter, as in my actual code they vary):
array=(jim 0 26 chris billy 78 hello foo bar)
Now say I want to remove the following elements:
chris 78 hello
So I ran unset array[$i] while looping through the array.
This removes the elements correctly; however, I end up with an array that looks like this:
array=(jim 0 26 '' billy '' '' foo bar)
I need it to look like this:
array=(jim 0 26 billy foo bar)
where jim is at index 0, 0 at index 1, 26 at index 2, etc.
How do I delete the elements in the array and move the other elements so that there are no null/empty spaces in the array?
Thanks!
Try this:
$ array=( "one two" "three four" "five six" )
$ unset array[1]
$ array=( "${array[@]}" )
$ echo ${array[0]}
one two
$ echo ${array[1]}
five six
Shell arrays aren't really intended as data structures that you can add and remove items from (they are mainly intended to provide a second level of quoting for situations like
arr=( "one two" "three four" )
somecommand "${arr[@]}"
to provide somecommand with two, not four, arguments). But this should work in most situations.
See http://www.thegeekstuff.com/2010/06/bash-array-tutorial
Remove an Element from an Array
...
Unix=('Debian' 'Red hat' 'Ubuntu' 'Suse' 'Fedora' 'UTS' 'OpenLinux');
pos=3
Unix=("${Unix[@]:0:$pos}" "${Unix[@]:$(($pos + 1))}")
This contracts the array around pos, which is what the original poster wanted. Quoting the expansions keeps elements that contain spaces, such as 'Red hat', intact.
Try this:
user@pc:~$ array=(jim 0 26 chris billy 78 hello foo bar)
user@pc:~$ for itm2rm in chris 78 hello; do array=(`echo ${array[@]} | sed "s/\<${itm2rm}\>//g"`); done ; echo ${array[@]}
jim 0 26 billy foo bar
This post has been revised and moved to its own post as a more in-depth tutorial on how to remove an array element correctly in a for loop.

Perform highly customized sort based on multiple columns of a CSV file?

I have a four-column CSV file, using # as the separator, e.g.:
0001 # fish # animal # eats worms
The first column is the only column guaranteed to be unique.
I need to perform three sort operations on columns 2, 3, and 4.
First, column 2 is sorted alphanumerically. The important feature of this sort is that it must guarantee that any duplicate entries within column 2 are next to each other, e.g.:
# a # #
# a # #
# a # #
# a # #
# a # #
# b # #
# b # #
# c # #
# c # #
# c # #
# c # #
# c # #
Next, within the first sort, split the lines into two categories. The first category is the lines that do not contain the words “arch.”, “var.”, “ver.”, “anci.” or “fam.” anywhere within column 4. The second category (sorted after the first) is the lines that contain any of those words, e.g.:
# a # # Does not have one of those words.
# a # # Does not have one of those words.
# a # # Does not have one of those words.
# a # # Does not have one of those words.
# a # # This sentence contains arch.
# b # # Does not have one of those words.
# b # # Has the word ver.
# c # # Does not have one of those words.
# c # # Does not have one of those words.
# c # # Does not have one of those words.
# c # # This sentence contains var.
# c # # This sentence contains fam.
# c # # This sentence contains fam.
Finally, sorting only within the separate categories of the second sort, sort the lines from “contains the most duplicate entries within column 3” to “contains the least number of duplicate entries within column 3”, e.g.:
# a # fish # Does not have one of those words.
# a # fish # Does not have one of those words.
# a # fish # Does not have one of those words.
# a # tiger # Does not have one of those words.
# a # bear # This sentence contains arch.
# b # fish # Does not have one of those words.
# b # fish # Has the word ver.
# c # bear # Does not have one of those words.
# c # bear # Does not have one of those words.
# c # fish # Does not have one of those words.
# c # tiger # This sentence contains var.
# c # tiger # This sentence contains fam.
# c # bear # This sentence contains fam.
How can I sort the file alphanumerically by column 2, by the appearance of some key words in column 4, and by most common duplicate to least common duplicate in column 3?
TXR (http://www.nongnu.org/txr):
@(bind special-words ("arch." "var." "ver." "anci." "fam."))
@(bind ahash @(hash :equal-based))
@(repeat)
@id # @alpha # @animal # @words
@  (rebind words @(split-str words " "))
@  (bind record (id alpha animal words))
@  (do (push record [ahash alpha]))
@(end)
@(bind sorted-rec-groups nil)
@(do
   (defun popularity-sort (recs)
     (let ((histogram [group-reduce (hash)
                                    third (do inc @1)
                                    recs 0]))
       [sort recs > [chain third histogram]]))
   (dohash (key records ahash)
     (let (contains does-not combined)
       (each* ((r records)
               (w [mapcar fourth r]))
         (if (isec w special-words)
           (push r contains)
           (push r does-not)))
       (push (append (popularity-sort does-not)
                     (popularity-sort contains))
             sorted-rec-groups)))
   (set sorted-rec-groups [sort sorted-rec-groups :
                                [chain first second]]))
@(output)
@  (repeat)
@    (repeat)
@(rep)@{sorted-rec-groups} # @(last)@{sorted-rec-groups " "}@(end)
@    (end)
@  (end)
@(end)
Data:
0001 # b # fish # Does not have one of those words.
0002 # a # bear # Does not have one of those words.
0003 # b # bear # Has the word ver.
0004 # a # fish # Does not have one of those words.
0005 # c # bear # Does not have one of those words.
0006 # c # bear # Does not have one of those words.
0007 # a # fish # Does not have one of those words.
0008 # c # fish # Does not have one of those words.
0009 # a # fish # Does not have one of those words.
0010 # c # tiger # This sentence contains var.
0011 # c # bear # This sentence contains fam.
0012 # a # fish # Does not have one of those words.
0013 # c # tiger # This sentence contains fam.
Run:
$ txr sort.txr data.txt
0004 # a # fish # Does not have one of those words.
0007 # a # fish # Does not have one of those words.
0009 # a # fish # Does not have one of those words.
0012 # a # fish # Does not have one of those words.
0002 # a # bear # Does not have one of those words.
0001 # b # fish # Does not have one of those words.
0003 # b # bear # Has the word ver.
0005 # c # bear # Does not have one of those words.
0006 # c # bear # Does not have one of those words.
0008 # c # fish # Does not have one of those words.
0010 # c # tiger # This sentence contains var.
0013 # c # tiger # This sentence contains fam.
0011 # c # bear # This sentence contains fam.
Here's an answer to your first requirement to help you get started:
sort data -t "#" -k 2,2 -k 3,4
How it works:
-t specifies the field separator, which for you is the "#" sign.
-k 2,2 means sort on field 2 only.
-k 3,4 means resolve ties using a second key that runs from field 3 through field 4.
Here's a solution in Ruby.
#!/usr/bin/env ruby
class Row
  SEPARATOR = " # "
  attr_accessor :cols

  def initialize(text)
    @cols = text.chomp.split(SEPARATOR)
    @cols.size == 4 or raise "Expected text to have four columns: #{text}"
    duplicate_increment
  end

  def has_words?
    cols[3] =~ /arch\.|var\.|ver\.|anci\.|fam\./ ? true : false
  end

  def to_s
    SEPARATOR +
      @cols[1,3].join(SEPARATOR) +
      " -- id:#{cols[0]} duplicates:#{duplicate_count}"
  end

  ### Comparison
  def <=>(other)
    other or raise "Expected other to exist"
    cmp = self.cols[1] <=> other.cols[1]
    return cmp if cmp != 0
    cmp = (self.has_words? ? 1 : -1) <=> (other.has_words? ? 1 : -1)
    return cmp if cmp != 0
    other.duplicate_count <=> self.duplicate_count
  end

  ### Track duplicate entries
  @@duplicate_count = Hash.new { |h, k| h[k] = 0 }

  def duplicate_key
    [cols[1], has_words?]
  end

  def duplicate_count
    @@duplicate_count[duplicate_key]
  end

  def duplicate_increment
    @@duplicate_count[duplicate_key] += 1
  end
end

### Main
lines = ARGF
rows = lines.map { |line| Row.new(line) }
sorted_rows = rows.sort
sorted_rows.each { |row| puts row }
Input:
0001 # b # fish # text
0002 # a # bear # text
0003 # b # bear # ver.
0004 # a # fish # text
0005 # c # bear # text
0006 # c # bear # text
0007 # a # fish # text
0008 # c # fish # text
0009 # a # fish # text
0010 # c # lion # var.
0011 # c # bear # fam.
0012 # a # fish # text
0013 # c # lion # fam.
Output:
$ cat data.txt | ./sorter.rb
# a # fish # text -- id:0007 duplicates:5
# a # bear # text -- id:0002 duplicates:5
# a # fish # text -- id:0012 duplicates:5
# a # fish # text -- id:0004 duplicates:5
# a # fish # text -- id:0009 duplicates:5
# b # fish # text -- id:0001 duplicates:1
# b # bear # ver. -- id:0003 duplicates:1
# c # bear # text -- id:0005 duplicates:3
# c # fish # text -- id:0008 duplicates:3
# c # bear # text -- id:0006 duplicates:3
# c # lion # var. -- id:0010 duplicates:3
# c # bear # fam. -- id:0011 duplicates:3
# c # lion # fam. -- id:0013 duplicates:3
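The Row class above keys its counter on column 2 and the keyword flag only, so every row in a group gets the same count (duplicates:5 for all the a rows) and, as the output shows, fish and bear rows can end up interleaved. A minimal sketch of a variant that counts column 3 duplicates within each (column 2, keyword flag) group and sorts with one composite key via sort_by. It assumes Ruby 2.4 or later (String#match?, lines(chomp: true)) and inlines the sample input above; the names sep, keywords, flag and counts are illustrative:

```ruby
sep = ' # '
keywords = /arch\.|var\.|ver\.|anci\.|fam\./

lines = <<~DATA.lines(chomp: true)
  0001 # b # fish # text
  0002 # a # bear # text
  0003 # b # bear # ver.
  0004 # a # fish # text
  0005 # c # bear # text
  0006 # c # bear # text
  0007 # a # fish # text
  0008 # c # fish # text
  0009 # a # fish # text
  0010 # c # lion # var.
  0011 # c # bear # fam.
  0012 # a # fish # text
  0013 # c # lion # fam.
DATA

rows = lines.map { |line| line.split(sep) } # [id, col2, col3, col4]

# 0 when column 4 has none of the keywords, 1 when it has one (sorted after).
flag = ->(row) { row[3].match?(keywords) ? 1 : 0 }

# Count column 3 values within each (column 2, flag) group.
counts = Hash.new(0)
rows.each { |row| counts[[row[1], flag.(row), row[2]]] += 1 }

# Key: column 2, flag, most duplicates first (negated count), then column 3
# so equal counts stay grouped, then the id as a final tie-breaker.
sorted = rows.sort_by do |row|
  f = flag.(row)
  [row[1], f, -counts[[row[1], f, row[2]]], row[2], row[0]]
end

sorted.each { |row| puts row.join(sep) }
```

Negating the count lets every key component sort ascending, the same trick the q answer below uses; the trailing id makes every key unique, so the result does not depend on sort stability.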
Here's a solution in q.
First, I load the "csv" and get it into the right shape. The test data is called "worms" on my computer, but because q doesn't use strings as the file-name type (to protect against, e.g., injection attacks), I need to use hsym to make a "file name":
t:flip `id`a`b`c!("SSSS";"#")0:hsym`worms;
Next, I worked out which fourth-field entries contain one of your words. I built a bitmap by applying like to each row (left) and each pattern (right), then reducing with any, to get 0 where none of the words is present or 1 where one of them is:
t:update p:any each c like/:\:("*arch.*";"*var.*";"*ver.*";"*anci.*";"*fam.*") from t;
Then I want to find the number of duplicates. This is simply the count of rows grouped by column 2 (a), column 3 (b) and the keyword flag (p):
t:update d:neg count i by a,b,p from t;
Finally, because I negated the count, all of my values "go the same way", so I can simply sort ascending by those three columns:
`a`p`d xasc t
This might work for you (very inelegant!):
sed 's/[^#]*#\([^#\]*\)#\([^#]*\)/\1\t\2\t&/;h;s/#/&\n/3;s/.*\n//;/\(arch\|var\|ver\|anci\|fam\)\./!ba;s/.*/1/;bb;:a;s/.*/0/;:b;G;s/\(.\)\n\([^\t]*\)/\2\t\1/' file |
sort |
tee file1 |
sed 's/\(.*\)\t.*/\1/' |
uniq -c |
sed 's|^\s*\(\S*\) \(.*\t.*\t\(.*\)\)|/^\2/s/\3/\1/|' >file.sed
sed -f file.sed file1 |
sort -k1,2 -k3,3nr |
sed 's/\t/\n/3;s/.*\n//'
1 # a # fish # Does not have one of those words.
2 # a # fish # Does not have one of those words.
3 # a # fish # Does not have one of those words.
4 # a # tiger # Does not have one of those words.
5 # a # bear # This sentence contains arch.
6 # b # fish # Does not have one of those words.
7 # b # fish # Has the word ver.
8 # c # bear # Does not have one of those words.
9 # c # bear # Does not have one of those words.
10 # c # fish # Does not have one of those words.
11 # c # tiger # This sentence contains var.
12 # c # tiger # This sentence contains fam.
13 # c # bear # This sentence contains fam.
Explanation:
Make sort keys consisting of the 2nd field; a 0/1 flag, where 0 marks a 4th field without arch./var./etc. and 1 marks one with; and the count of 3rd-field duplicates within the groups formed by the first two keys.
The file is then sorted on these keys, and the keys are deleted.
