Greedy conditional algorithm - ruby

I have a list of k artistes mapped to their respective music videos that they have starred in. This is represented in a multidimensional array:
musicvid_arr =
[["MUSICVID 1", 2014, ["ARTISTE 1", "ARTISTE 2", "ARTISTE 3"]],
["MUSICVID 2", 2014, ["ARTISTE 4", "ARTISTE 1", "ARTISTE 9", "ARTISTE 10"]],
["MUSICVID 3", 1935, ["ARTISTE 2", "ARTISTE 10", "ARTISTE 6"]],
["MUSICVID 4", 2010, ["ARTISTE 1", "ARTISTE 2", "ARTISTE 3"]],
["MUSICVID 5", 2009, ["ARTISTE 4", "ARTISTE 1", "ARTISTE 9", "ARTISTE 2", "ARTISTE 6", "ARTISTE 5"]],
["MUSICVID 6", 2014, ["ARTISTE 18", "ARTISTE 10", "ARTISTE 6", "ARTISTE 2"]],
["MUSICVID 7", 2014, ["ARTISTE 9", "ARTISTE 2", "ARTISTE 3", "ARTISTE 0", "ARTISTE 9"]],
["MUSICVID 8", 2000, ["ARTISTE 8", "ARTISTE 3", "ARTISTE 9", "ARTISTE 11", "ARTISTE 2", "ARTISTE 1"]],
["MUSICVID 9", 2014, ["ARTISTE 21", "ARTISTE 0", "ARTISTE 6"]],
["MUSICVID 10", 2014, ["ARTISTE 12", "ARTISTE 2", "ARTISTE 3"]],
["MUSICVID 11", 2013, ["ARTISTE 14", "ARTISTE 1", "ARTISTE 9", "ARTISTE 12"]],
["MUSICVID 12", 2014, ["ARTISTE 2"]]]
I want to create a method get_artistes that takes the parameters: k , r, and musicvid_arr:
def get_artistes(k, r, musicvid_arr)
# the code here
end
where
k: the number of artistes to return
r: the minimum number of the k returned artistes that must appear in a music video for that video to count as valid
This method should return a list of artistes. If k = 3:
["ARTISTE 1", "ARTISTE 2", "ARTISTE 9"]
This image gives a better understanding of how k and r affect the number of valid music videos. With reference to the image above,
# for r = 1 , it would have 11 valid music videos.
# for r = 2 , it would have 6 valid music videos.
# for r = 3 , it would have 3 valid music videos.
Whatever r and k we pass to this method, we want the array of k artistes that yields the largest number of valid music videos.
What would be an effective and efficient approach to tackling this problem?
I attempted to do this via the following algorithm. I do not think that it is the most effective. With big datasets, it takes very long to run.
def get_artistes(k, r, musicvid_arr)
  musicvid_arr = musicvid_arr.select { |t| t[2].size >= r }
  artiste_arr = musicvid_arr.map.reduce({}) { |a, vs| vs[2].each { |v| (a[v] ||= []) << vs[0] }; a }.to_a.sort_by { |x| -x[1].count }
  output = []
  for i in 0...k
    output << artiste_arr[i]
  end
  return output
end

Junior programmers tackling something new often see the problem as complex and feel that the solution should be equally complex.
However complex the problem, the code that solves it should be clear and readable. If you can't read your code aloud, it's too complicated.
Create a class, e.g. ParseArtists, whose single responsibility is parsing artists, with descriptive variable names and small, obvious methods that each clearly state one step of the process.
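As a starting point for such small, obvious methods, here is a minimal sketch of the core step: counting how many videos are valid for a candidate set of artistes. The names MUSICVIDS, valid_video_count and best_artistes_brute_force are hypothetical and not from the question. The exhaustive search is only a correctness baseline for small inputs, not the efficient greedy approach the question asks about.

```ruby
# Data copied from the question.
MUSICVIDS = [
  ["MUSICVID 1", 2014, ["ARTISTE 1", "ARTISTE 2", "ARTISTE 3"]],
  ["MUSICVID 2", 2014, ["ARTISTE 4", "ARTISTE 1", "ARTISTE 9", "ARTISTE 10"]],
  ["MUSICVID 3", 1935, ["ARTISTE 2", "ARTISTE 10", "ARTISTE 6"]],
  ["MUSICVID 4", 2010, ["ARTISTE 1", "ARTISTE 2", "ARTISTE 3"]],
  ["MUSICVID 5", 2009, ["ARTISTE 4", "ARTISTE 1", "ARTISTE 9", "ARTISTE 2", "ARTISTE 6", "ARTISTE 5"]],
  ["MUSICVID 6", 2014, ["ARTISTE 18", "ARTISTE 10", "ARTISTE 6", "ARTISTE 2"]],
  ["MUSICVID 7", 2014, ["ARTISTE 9", "ARTISTE 2", "ARTISTE 3", "ARTISTE 0", "ARTISTE 9"]],
  ["MUSICVID 8", 2000, ["ARTISTE 8", "ARTISTE 3", "ARTISTE 9", "ARTISTE 11", "ARTISTE 2", "ARTISTE 1"]],
  ["MUSICVID 9", 2014, ["ARTISTE 21", "ARTISTE 0", "ARTISTE 6"]],
  ["MUSICVID 10", 2014, ["ARTISTE 12", "ARTISTE 2", "ARTISTE 3"]],
  ["MUSICVID 11", 2013, ["ARTISTE 14", "ARTISTE 1", "ARTISTE 9", "ARTISTE 12"]],
  ["MUSICVID 12", 2014, ["ARTISTE 2"]]
]

# Number of videos that feature at least r of the given artistes.
# Array#& de-duplicates, so an artiste listed twice in one video counts once
# (an assumption about the intended rule).
def valid_video_count(artistes, r, musicvid_arr)
  musicvid_arr.count { |_title, _year, cast| (cast & artistes).size >= r }
end

# Exhaustive baseline: try every combination of k artistes and keep the one
# with the most valid videos. Cost grows combinatorially with the number of
# distinct artistes, so this is only usable on small data sets.
def best_artistes_brute_force(k, r, musicvid_arr)
  all_artistes = musicvid_arr.flat_map { |_title, _year, cast| cast }.uniq
  all_artistes.combination(k).max_by { |set| valid_video_count(set, r, musicvid_arr) }
end

best = best_artistes_brute_force(3, 2, MUSICVIDS)
puts "#{best.inspect} -> #{valid_video_count(best, 2, MUSICVIDS)} valid videos"
```

A faster heuristic can be checked against this baseline on small samples before it is trusted on large data sets.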

Related

Elasticsearch how to index an existing json file

I use a PUT command:
curl -XPUT "http://localhost:9200/music/lyrics/2" --data-binary @D:\simple\caseyjones.json
caseyjones.json:
{
"artist": "Wallace Saunders",
"year": 1909,
"styles": ["traditional"],
"album": "Unknown",
"name": "Ballad of Casey Jones",
"lyrics": "Come all you rounders if you want to hear
The story of a brave engineer
Casey Jones was the rounder's name....
Come all you rounders if you want to hear
The story of a brave engineer
Casey Jones was the rounder's name
On the six-eight wheeler, boys, he won his fame
The caller called Casey at half past four
He kissed his wife at the station door
He mounted to the cabin with the orders in his hand
And he took his farewell trip to that promis'd land
Chorus:
Casey Jones--mounted to his cabin
Casey Jones--with his orders in his hand
Casey Jones--mounted to his cabin
And he took his... land"
}
I get the warning: failed to parse, document is empty. But the log shows the contents of the *.json file.
JSON does not allow literal line breaks inside a string value. So, you need to replace every line break (\r\n or \n, depending on the platform) with the escape sequence \n and store the text as a single line.
Like:
{
"artist": "Wallace Saunders",
"year": 1909,
"styles": ["traditional"],
"album": "Unknown",
"name": "Ballad of Casey Jones",
"lyrics": "Come all you rounders if you want to hear\nThe story of a brave engineer\nCasey Jones was the rounder's name...."
}
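Rather than escaping the line breaks by hand, the file can be produced with a JSON library, which escapes them automatically. A minimal sketch in Ruby, assuming the lyrics are available as an ordinary multi-line string (the variable names are illustrative):

```ruby
require 'json'

# Multi-line text as it might be read from a plain text file.
lyrics = <<~LYRICS.chomp
  Come all you rounders if you want to hear
  The story of a brave engineer
  Casey Jones was the rounder's name....
LYRICS

document = {
  "artist" => "Wallace Saunders",
  "year"   => 1909,
  "styles" => ["traditional"],
  "album"  => "Unknown",
  "name"   => "Ballad of Casey Jones",
  "lyrics" => lyrics
}

# JSON.generate writes each line break inside a string as the two
# characters \n, so the whole document ends up on a single line.
json = JSON.generate(document)
puts json
```

The resulting string can be written to caseyjones.json and sent with the same curl command.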

Sort a hash by key containing date in string format

I have a hash whose keys are dates in string format:
{Oct 2014: "some value", Aug 2012: "some value", July 2011: "new value"}
I want to sort them based on those. I tried calling sort_by or sort on keys, but since they are stored in string format, it sorts them alphabetically.
hash_name.keys.sort
This gives me the order Aug 2012, July 2011, Oct 2014, whereas I want them sorted by year and month: July 2011, Aug 2012, Oct 2014.
Use strptime method to convert each key to a Date object :
require 'date'
hash = {
'Oct 2014' => "some value",
'Aug 2012' => "some value",
'July 2011' => "new value"
}
hash.sort_by { |k,_| Date.strptime(k,"%b %Y") }
# => [["July 2011", "new value"],
# ["Aug 2012", "some value"],
# ["Oct 2014", "some value"]]
Note :
%b - The abbreviated month name ('Jan')
%Y - Year with century (can be negative, 4 digits at least) - 0001, 0000, 1995, 2009, 14292, etc.
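Note that sort_by returns an array of [key, value] pairs. If a Hash is needed again, the result can be converted back with to_h, which keeps the sorted insertion order. A short sketch using the same data (the variable name sorted is illustrative):

```ruby
require 'date'

hash = {
  'Oct 2014'  => "some value",
  'Aug 2012'  => "some value",
  'July 2011' => "new value"
}

# Sort by the parsed date, then rebuild a Hash from the sorted pairs.
# Ruby's strptime accepts full month names such as "July" for %b as well.
sorted = hash.sort_by { |key, _| Date.strptime(key, "%b %Y") }.to_h

p sorted.keys
```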

What is a regex to capture the total amount from strings? [closed]

I need to parse the total amount from different files. The layout of each file is different so the lines I need to parse vary.
What should be the regex for capturing from a string a number that falls after "Total"?
It needs to be case insensitive and should consider the closest match after "Total". There can be anything before or after the word "Total", and I need the first number that comes after it.
For example:
from string "Service charges: 10 Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount: 100 Shipping: 10"
from string "Service charges: 10 Grand Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"
The output should be 100 in all the above cases.
If all you're really asking about is a pattern match for various strings, look at using scan and grab the numeric strings:
[
"Service charges: 10 Total: 100 Shipping: 10",
"Service charges: 10 Total Amount: 100 Shipping: 10",
"Service charges: 10 Grand Total: 100 Shipping: 10",
"Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s.scan(/\d+/)[1] }
=> ["100", "100", "100", "100"]
This assumes you want the second number in each string.
If that order is going to change, which is unlikely because it looks like you're scanning invoices, then variations on the pattern and/or scan will work. This switches it up and uses a standard regex search based on the location of "Total", some possible intervening text, followed by ":" and the total value:
[
"Service charges: 10 Total: 100 Shipping: 10",
"Service charges: 10 Total Amount: 100 Shipping: 10",
"Service charges: 10 Grand Total: 100 Shipping: 10",
"Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1] }
=> ["100", "100", "100", "100"]
To get the integer values append to_i inside the map statement:
[
"Service charges: 10 Total: 100 Shipping: 10",
"Service charges: 10 Total Amount: 100 Shipping: 10",
"Service charges: 10 Grand Total: 100 Shipping: 10",
"Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1].to_i }
=> [100, 100, 100, 100]
For your example strings, it's probably preferable to use case-sensitive patterns to match "Total" unless you have knowledge that you will encounter "total" in lower-case. And, in that case, you should show such an example.
I think you can do this:
/Total[^:]*:\s+([0-9]+)/i
Explanation:
Total search for "total"
[^:]* followed by any characters (or none) up to the next colon ":"
:\s+ match the colon and the whitespace after it (use * instead of + if the whitespace may be absent)
([0-9]+) capture the digits in a group for later retrieval -> 100
I am not sure how to indicate case insensitivity in the environment you use, but usually this can be done with some flags like I indicated with the i
here is a fiddle as an example
# assuming you have all your files ready in an array
a = ["Service charges: 10 Total: 100 Shipping: 10", "Service charges: 10 Total Amount: 100 Shipping: 10", "Service charges: 10 Grand Total: 100 Shipping: 10", "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"]
# we find every total with the following regexp
a.map {|s| s[/total[^\d]*(?<total>\d+)/i, 'total']}
#=> ["100", "100", "100", "100"]
The regexp is /total[^\d]*(?<total>\d+)/i. It looks for the word "total", skips any non-digit characters that follow, and captures the first number it finds in a named group. The i option makes it case insensitive.
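When the lines come from real files, some of them may contain no total at all. The same pattern can be wrapped in a small helper that returns an Integer, or nil when nothing matches. This is a sketch only; TOTAL_PATTERN and extract_total are hypothetical names:

```ruby
# Same pattern as above: "total" in any case, then non-digits, then the number.
TOTAL_PATTERN = /total[^\d]*(?<total>\d+)/i

# Returns the first number after "total" as an Integer, or nil if the
# line has no such number.
def extract_total(line)
  line[TOTAL_PATTERN, 'total']&.to_i
end

lines = [
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
  "service charges: 10 grand total: 250 shipping: 10",
  "Shipping: 10"
]

p lines.map { |line| extract_total(line) }
```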

How to delete an element start with underscore ruby array

I have an array
["You purchased 2 tickets to: \n",
"____________________________________________________________________________\n",
"_________________ \n",
"The Temper Trap\n", "Webster Hall, New York, NY\n", "Fri, Apr 2, 2010 07:00 PM \n",
"\n", "Order for: Vikas Sekhri\n"]
I want to remove the elements that start with an underscore (the second and third elements of the array). I need a result like this:
["You purchased 2 tickets to: \n", "The Temper Trap\n", "Webster Hall, New York, NY\n", "Fri, Apr 2, 2010 07:00 PM \n", "\n", "Order for: Vikas Sekhri\n"]
Can anyone help me?
arr = ["You purchased 2 tickets to: \n", "____________________________________________________________________________\n", "_________________ \n", "The Temper Trap\n", "Webster Hall, New York, NY\n", "Fri, Apr 2, 2010 07:00 PM \n", "\n", "Order for: Vikas Sekhri\n"]
arr.reject { |elt| elt.start_with? "_" }
array.reject! {|element| element =~ /\A_/ }
This modifies the array. If you don't want it modified, use reject (without the bang) instead.
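The difference between the two variants can be seen in a short sketch on the array from the question. The names lines and cleaned are illustrative; start_with? is the plain Ruby method, while starts_with? is an alias that is only available with ActiveSupport.

```ruby
lines = [
  "You purchased 2 tickets to: \n",
  "____________________________________________________________________________\n",
  "_________________ \n",
  "The Temper Trap\n", "Webster Hall, New York, NY\n", "Fri, Apr 2, 2010 07:00 PM \n",
  "\n", "Order for: Vikas Sekhri\n"
]

# reject returns a new array and leaves the original untouched.
cleaned = lines.reject { |line| line.start_with?("_") }
puts "original: #{lines.size} elements, cleaned: #{cleaned.size} elements"

# reject! removes the matching elements from the receiver itself.
lines.reject! { |line| line.start_with?("_") }
puts "after reject!: #{lines.size} elements"
```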

Merging multiple Google Calendar feeds for display on my desktop

I have several Google calendars that I'd like to merge and place on my Windows desktop using Samurize. I've tried using Samurize's Page Scraper plugin, but it doesn't appear to be up to the task.
I can get Samurize to run a script and place its output on the desktop, but I'm not sure what the best tools are to do this.
All the URLs I have are of the form:
http://www.google.com/calendar/feeds/example%40gmail.com/private-REMOVED/basic?futureevents=true&orderby=starttime&sortorder=ascending&singleevents=true
So I could fetch them using curl, but then I need to filter them.
I want something that looks like:
2009 12 02 Event from calendar 1's description
2009 12 03 Event from calendar 2's description
2009 12 04 Event from calendar 1's description
2009 12 05 Event from calendar 3's description
2009 12 06 Event from calendar 1's description
However the dates in the calendar feeds are formatted like this:
<title type='html'>Event from calendar 1's description</title><summary type='html'>When: Fri 5 Dec 2008<br>
So how do I filter out the dates and descriptions, and convert the dates?
(I have cygwin installed so something using perl or sed/awk would be perfect as I'm familiar enough with them that I'd be confident about altering them in future, but I'm open to suggestions.)
I'm learning perl so please don't laugh too hard, but here's something that might get you most of the way towards parsing:
#!C:\Perl\bin -w
use strict;
my %months = ("Jan", "01", "Feb", "02", "Mar", "03", ... etc. etc. ... "Dec", "12");
$_ = "<title type='html'>Event from calendar 1's description</title><summary type='html'>When: Fri 5 Dec 2008<br>";
if (/<title type='html'>([\d\D]*)<\/title><summary type='html'>When: (\S+) (\S+) (\S+) (\S+)<br>/)
{
print "$5 $months{$4} $3 $1\n";
}
Two ideas.
You could use Yahoo Pipes (see this article.)
Or, if you don't want to wait around for Yahoo to refresh its data, here is a python script under development to merge ICAL files.
Building on John W's script this is what I'm using
#!c:\cygwin\bin\perl.exe -w
use strict;
use LWP::Simple qw(get);
my %calendars = ( "Sam Hasler", "http://www.google.com/calendar/feeds/blah/blah/basic"
, "Family ", "http://www.google.com/calendar/feeds/blah/blah/basic"
, "Work ", "http://www.google.com/calendar/feeds/blah/blah/basic"
);
my $params = "?futureevents=true&orderby=starttime&sortorder=ascending&singleevents=true";
my %months = ( "Jan", "01", "Feb", "02", "Mar", "03", "Apr", "04"
, "May", "05", "Jun", "06", "Jul", "07", "Aug", "08"
, "Sep", "09", "Oct", "10", "Nov", "11", "Dec", "12");
my $calendar_name;
my $calendar_url;
my @lines;
while (($calendar_name, $calendar_url) = each(%calendars)){
    my $calendar_data = get "$calendar_url$params";
    @lines = split(/\n/, $calendar_data);
    foreach (@lines) {
        if (/<title type='html'>([\d\D]*)<\/title><summary type='html'>When: (\S+) (\S+) (\S+) (\S+)<br>/)
        {
            my $day = "$3";
            if ($3 < 10 ) {
                $day = "0$3";
            }
            print "$5 $months{$4} $day\t$calendar_name\t$1\n";
        }
    }
}
I just pipe the output through sort to get it in date order.
Update: I've converted my script to a plugin and submitted it to the Samurize website: Merge Google Calendar feeds.
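For comparison, the date conversion step can also be done with a date library instead of a month lookup table. The following is a minimal sketch in Ruby, not the plugin's code; ENTRY_PATTERN and format_event are hypothetical names, and it assumes each entry has the title/summary layout shown in the question.

```ruby
require 'date'

# Captures the event title and the "5 Dec 2008" part, skipping the weekday.
ENTRY_PATTERN = %r{<title type='html'>(.*?)</title><summary type='html'>When: \S+ (\d+ \S+ \d+)<br>}

# Returns "YYYY MM DD title" for a matching feed line, or nil otherwise.
def format_event(line)
  match = line.match(ENTRY_PATTERN)
  return nil unless match

  date = Date.strptime(match[2], "%d %b %Y")
  "#{date.strftime('%Y %m %d')} #{match[1]}"
end

sample = "<title type='html'>Event from calendar 1's description</title>" \
         "<summary type='html'>When: Fri 5 Dec 2008<br>"
puts format_event(sample)
```

Because the year, month and day are zero-padded and come first, sorting the resulting lines as plain strings puts them in date order.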
