At my company, we recently started using Rally as our project management tool. Initially, someone external to our team invested a lot of time manually creating iterations using a naming convention that is just not going to jibe with our team's existing scheme. Instead of asking this poor soul to delete these empty iterations by hand, one by one, I would like to automate the process using Rally's REST API. In short, we need to delete 100+ empty iterations which span 3 different projects (all sharing a common parent).
I have spent some time looking at the rally-rest-api Ruby gem, and although I have some limited Ruby experience, the Query interface of the API remains confusing to me, and I am having trouble wrapping my head around it. I know what my regex would look like, but I just don't know how to supply it to the query.
Here is what I have so far:
require 'rubygems'
require 'rally_rest_api'

rally = RallyRestAPI.new(:username => "myuser",
                         :password => "mypass")

regex = /ET-VT-100/

# get all names that match criteria
iterations = rally.find(:iteration) { "query using above regex?" }

# delete all the matching iterations
iterations.each do |iteration|
  iteration.delete
end
Any pointers in the right direction would be much appreciated. I feel like I'm almost there.
I had to do something similar to this a few months back when I wanted to rename a large group of iterations.
First, make sure that the user you are authenticating with has at least the "Editor" role in every project from which you want to delete iterations. Also, if there are any projects in your workspace for which you do not have read permissions, you will have to supply a project element (or elements) for the query to start from. (You might not even know about them; someone else in your organization could have created them.)
The following gets a reference to the projects and then loops through the iterations with the specified regex:
require 'rubygems'
require 'rally_rest_api'

rally = RallyRestAPI.new(:username => "myuser",
                         :password => "mypass")

# Assumes all projects contain "FooBar" in the name
projects = rally.find(:project) { contains :name, "FooBar" }

projects.each do |project|
  project.iterations.each do |iteration|
    if iteration.name =~ /ET-VT-100/
      iteration.delete
    end
  end
end
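Since delete is irreversible, you might want a dry run first. Here's a sketch reusing the same loop (assuming project exposes name the same way iteration does) that only prints what would be removed:

# Dry run: report matching iterations without deleting anything.
projects.each do |project|
  project.iterations.each do |iteration|
    if iteration.name =~ /ET-VT-100/
      puts "Would delete: #{project.name} / #{iteration.name}"
    end
  end
end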
Try:
iterations = rally.find(:iteration) { contains :name, "ET-VT-100" }
This assumes that the iteration has ET-VT-100 in the name; you may need to query against some other field. Regexes are not supported by the REST API, AFAICT.
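If a fixed substring isn't selective enough, one option (a sketch; the "ET-VT" prefix and the /ET-VT-\d+/ pattern are assumptions about your naming scheme) is to narrow the results server-side with contains and then apply your Ruby regex client-side before deleting:

# Server-side substring filter, then a client-side regex before deleting.
regex = /ET-VT-\d+/ # hypothetical; adjust to your scheme

iterations = rally.find(:iteration) { contains :name, "ET-VT" }
iterations.each do |iteration|
  iteration.delete if iteration.name =~ regex
end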
Related
I'm working on a JSON file, I think. But regardless, I'm working with a lot of different hashes, fetching different values, etc. This is the structure:
{"notification_rule"=>
{"id"=>"0000000",
"contact_method"=>
{"id"=>"000000",
"address"=>"cod.lew#gmail.com",}
{"notification_rule"=>
{"id"=>"000000"
"contact_method"=>
{"id"=>"PO0JGV7",
"address"=>"cod.lew#gmail.com",}
Essentially, this is the type of hash I'm currently working with.
I wanted to stop duplicates of the same thing being written to the text file, because whenever I run this code it brings in the addresses from both of these hashes. I understand why (it's looping over them again), but I thought the code I added below would resolve that issue:
Final UPDATE
if jdoc["notification_rule"]["contact_method"]["address"].to_s.include?(".com")
numbers.print "Employee Name: "
numbers.puts jdoc["notification_rule"]["contact_method"]["address"].gsub(/#target.com/, '').gsub(/\w+/, &:capitalize)
file_names = ['Employee_Information.txt']
file_names.each do |file_name|
text = File.read(file_name)
lines = text.split("\n")
new_contents = lines.uniq.join("\n")
File.open(file_name, "w") { |file| file.puts new_contents }
end
else
nil
end
This code looks really confused and lacks a specific purpose. Generally, Ruby that's this tangled up is on the wrong track; with Ruby there's usually a simple way of expressing something simple, and testing for duplicated addresses is one of those things that shouldn't be hard.
One of the biggest sources of confusion is the responsibility of a chunk of code. In that example you're not only trying to import data, loop over documents, clean up email addresses, and test for duplicates, but somehow facilitate printing out the results. That's a lot of things going on all at once, and they all have to work perfectly for that chunk of code to be fully operational. There's no way of getting it partially working, and no way of knowing if you're even on the right track.
Always try and break down complex problems into a few simple stages, then chain those stages together as necessary.
Here's how you can define a method to clean up your email addresses:
def address_scrub(address)
  address.gsub(/@target\.com/, '').gsub(/\w+/, &:capitalize)
end
Where that can be adjusted as necessary, and presumably tested to ensure it's working correctly, which you can now do independently of the other code.
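A quick sanity check, with a made-up address:

address_scrub("jane.doe@target.com")
#=> "Jane.Doe"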
As for the rest, it looks like this:
require 'set'

# Read in duplicated addresses from a file, clean up with chomp, using a Set
# for fast lookups.
duplicates = Set.new(
  File.open("Employee_Information.txt", "r").readlines.map(&:chomp)
)

# Extract addresses from jdoc document array
filtered = jdocs.map do |jdoc|
  # Convert to jdoc/address pair
  [ jdoc, address_scrub(jdoc["notification_rule"]["contact_method"]["address"]) ]
end.reject do |jdoc, address|
  # Remove any that are already in the duplicates list
  duplicates.include?(address)
end.map do |jdoc, _|
  # Return only the document
  jdoc
end
Where that processes jdocs, an array of jdoc structures, and removes duplicates in a series of simple steps.
With the chaining approach you can see what's happening before you add on the next "link", so you can work incrementally towards a solution, adjusting as you go. Any mistakes are fairly easy to catch because you're able to, at any time, inspect the intermediate products of those stages.
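A final stage (sketched here with the same names as above) might print the survivors and append their addresses to the file, so later runs treat them as duplicates:

# Print each new address and record it for future runs.
File.open("Employee_Information.txt", "a") do |file|
  filtered.each do |jdoc|
    address = address_scrub(jdoc["notification_rule"]["contact_method"]["address"])
    puts "Employee Name: #{address}"
    file.puts address
  end
end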
I am successfully scraping building data from a website (www.propertyshark.com) using a single address, but it looks like I get blocked once I use a loop to scrape multiple addresses. Is there a way around this? FYI, the information I'm trying to access is not prohibited according to their robots.txt.
The code for a single run is as follows:
require 'mechanize'

class PropShark
  def initialize(key, link_key)
    @key = key
    @link_key = link_key
    @result_hash = {}
  end

  def crawl_propshark_single
    agent = Mechanize.new { |agent|
      agent.user_agent_alias = 'Mac Safari'
    }
    agent.ignore_bad_chunking = true
    agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

    page = agent.get('https://www.google.com/')
    form = page.forms.first
    form['q'] = @key
    page = form.submit

    page.links.each do |link|
      if link.text.include?(@link_key)
        if link.text.include?("PropertyShark")
          property_page = link.click
        else
          next
        end
        if property_page
          data_value = property_page.css("div.cols").css("td.r_align")[4].text # <--- error points to these commands
          data_name  = property_page.css("div.cols").css("th")[4].text
          @result_hash[data_name] = data_value
        else
          next
        end
      end
    end

    return @result_hash
  end
end #endof: class PropShark
# run
key = '41 coral St, Worcester, MA 01604 propertyshark'
key_link = '41 Coral Street'
spider = PropShark.new(key,key_link)
puts spider.crawl_propshark_single
I get the following error, but in an hour or two the error disappears:
undefined method `text' for nil:NilClass (NoMethodError)
When I run a loop using the above code, I delay the process with a sleep 80 between addresses.
The first thing you should do, before anything else, is contact the website owner(s). Right now, your actions could be interpreted as anywhere between overly aggressive and illegal. As others have pointed out, the owners may not want you scraping the site. Alternatively, they may have an API or product feed available for this particular data. Either way, if you are going to be depending on this website for your product, you may want to consider playing nice with them.
With that being said, you are moving through their website with all of the grace of an elephant in a china shop. Between the abnormal user agent, unusual usage patterns from a single IP, and a predictable delay between requests, you've completely blown your cover. Consider taking a more organic path through the site, with a more natural, human-emulating delay. Also, you should either disguise your user agent or make it super obvious (Josh's Big Bad Scraper). You may even consider using something like Selenium, which drives a real browser, instead of Mechanize, to give away fewer hints.
You may also consider adding more robust error handling. Perhaps the site is under excessive load (or something), and the page you are parsing is not the desired page, but some random error page. A simple retry may be all you need to get that data in question. When scraping, a poorly-functioning or inefficient site can be as much of an impediment as deliberate scraping protections.
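For example, a small retry wrapper around the fetch (a sketch; the attempt count and backoff are choices to adapt, and Mechanize::ResponseCodeError is the exception Mechanize raises for non-2xx responses):

# Retry a flaky request a few times, backing off between attempts.
def fetch_with_retries(agent, url, attempts = 3)
  tries = 0
  begin
    agent.get(url)
  rescue Mechanize::ResponseCodeError => e
    tries += 1
    sleep(5 * tries) # simple linear backoff
    retry if tries < attempts
    raise e
  end
end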
If none of that works, you could consider setting up an elaborate array of proxies, but at that point you would be much better off using one of the many web-scraping/API-creation/data-extraction services that currently exist. They are fairly inexpensive and already do everything discussed above, plus more.
It is very likely nothing is "blocking" you. As you pointed out
property_page.css("div.cols").css("td.r_align")[4].text
is the problem. So let's focus on that line of code for a second.
Say the first time around your columns are columns = [1,2,3,4,5]; then columns[4] will return 5 (the element at index 4).
Now, for fun, assume the next time around your columns are columns = ['a','b','c','d']; then columns[4] will return nil, because there is nothing at index 4.
This appears to be your case: sometimes there are 5 columns and sometimes there are not, leading to nil.text and the error you are receiving.
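One way to guard against that (a sketch using the same selectors as the question) is to check the cell exists before calling .text:

cells   = property_page.css("div.cols").css("td.r_align")
headers = property_page.css("div.cols").css("th")

# Only read the fifth column when the page actually has one.
if cells[4] && headers[4]
  data_value = cells[4].text
  data_name  = headers[4].text
else
  # Layout was not what we expected; skip, log, or retry here.
end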
I've been flirting with Redis for a while now.
I watched this series some time ago, and it was awesome. I've been through some of the documentation, and the mention of the time complexity of the queries blew me away; this is something that's rarely mentioned in web materials but is of huge importance for app building.
Anyhow, I'm trying to make my app use Redis on the consumer end, so users can fetch their data as fast as possible.
So I'm trying to save some objects to hash as:
$redis->hmset("taxi_car", array(
    "brand" => "Toyota",
    "model" => "Yaris",
    "license number" => "RO-01-PHP",
    "year of fabrication" => 2010,
    "nr_stats" => 0));
as found here and this works nicely.
However I can't find a way to delete the whole entry anywhere.
Did I get this hash thing wrong?
Following this example, I would like to delete the entry with a given license number. All I could find is how to delete the license number field from the object:
$redis->hdel("taxi_car", "license number");
and I can't figure out how to delete the whole hash row (please do correct me with the proper word for row here).
Another problem is that it seems this only allows me to save a single taxi_car in Redis. How do I set a UUID so I can have multiple taxi cars?
I'm going to play with this a bit, any help is welcome. Thanks!
To delete a key of any type, Hash included, call the Redis DEL command.
To have multiple keys, give them different names, e.g. taxi_car:1, taxi_car:2 etc.
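For example, with the phpredis client used in the question (the IDs and the second car are made up):

$redis->hmset("taxi_car:1", array(
    "brand" => "Toyota",
    "model" => "Yaris",
    "license number" => "RO-01-PHP"));
$redis->hmset("taxi_car:2", array(
    "brand" => "Dacia",
    "model" => "Logan",
    "license number" => "RO-02-PHP"));

// Delete one car entirely (the whole hash, not just one field):
$redis->del("taxi_car:1");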
I have a web app that allows users to upload text documents (of about 2-3000 words), and a database table with about 50,000 phrases (as strings).
How can I most efficiently find out which phrases appear in each of those uploaded documents? (i.e. is there anything better than brute-forcing it by checking each phrase separately?)
The web app flow should ideally be that on the page load after upload, the app knows which phrases it has found in that one document.
Ideally I'd like a solution in ruby, but suggestions as to other technologies or data structures or anything would be a real help.
I don't know which database you are using, so I'll just give a MySQL solution:
require 'mysql2'

content = File.read('/path/to/document.txt')

client = Mysql2::Client.new(:host => "localhost", :username => "root")
sql = "SELECT phrase FROM phrases ORDER BY LENGTH(phrase)"

appeared = client.query(sql, as: :array, stream: true).each.with_object([]) do |row, array|
  array << row[0] if content.gsub!(%r[#{Regexp.escape(row[0])}]i, '')
end
The idea is to shrink the content after each match so that next search will be faster.
DISCLAIMER: Not tested.
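As an aside (not part of the answer above): if deleting matched text via gsub! turns out to be slow, a single-pass alternative is to build one union regex from all phrases and scan the document once. Note that Regexp.union over 50,000 phrases can be expensive to compile, so benchmark both approaches:

# Load every phrase once, build one case-insensitive pattern,
# and collect all phrases that occur anywhere in the document.
phrases = client.query("SELECT phrase FROM phrases", as: :array).map { |row| row[0] }
pattern = Regexp.new(Regexp.union(phrases).source, Regexp::IGNORECASE)
appeared = content.scan(pattern).uniq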
I don't know if it can be called an algorithm, but I think it's close.
I will be pulling data from an API that will have certain words in the title, e.g.:
Great Software 2.0 Download Now
Buy Great Software for just $10
Great Software Torrent Download
So, I want to do different things based on the presence of certain words, such as Download, Buy, etc. For example, if the title has the word 'buy' in it, I would like to extract the word buy and the amount value present in the title and show them in another div, so in this case it would be "Buy for $10" or "Buy $10", etc. I could do if/else as well, but I don't want to use if/else because there could be more such conditions in the future. So what I am thinking about is using the send method, e.g.:
def buy(string)
  'Buy for just ' + string.scan(/\$\d+/).first
end

def whichkeyword(title)
  send (title.scan(/(download|buy)/i)[0][0]).downcase.to_sym, title
end

whichkeyword('Buy this software for $10 now')
Is there a better way to do this? Or is this even a good way to do it? Any help would be appreciated.
First of all, use send if and only if you are calling a private method; use public_send otherwise.
In this particular case metaprogramming is overkill. It requires too much redundant code, plus it requires the code to be changed for new items. I would go with building a hash, like:
@hash = { 'buy' => { text: 'Buy for just %{placeholder}', re: /\$\d+/ } }
This hash might be placed somewhere outside of the code; e.g., it might be stored in a YAML file near the code and loaded in advance. That way you can change the behaviour without modifying the code, which is handy, for instance, in a gem.
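A sketch of that loading step (the file name and layout are assumptions, with the regexp stored as a string and compiled on load):

require 'yaml'

# Hypothetical keywords.yml sitting next to the code:
#
#   buy:
#     text: 'Buy for just %{placeholder}'
#     re: '\$\d+'
#
@hash = YAML.load_file('keywords.yml').map do |key, value|
  [key, { text: value['text'], re: Regexp.new(value['re'], Regexp::IGNORECASE) }]
end.to_h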
As we have a hash defined/loaded, I would call the method:
def format string
  key = string[/#{Regexp.union(@hash.keys).source}/i].downcase
  puts @hash[key][:text] % { placeholder: string[@hash[key][:re]] }
end
Yielding:
▶ format("Buy this software for $10 now")
#⇒ Buy for just $10
There are many advantages over declaring methods; e.g., matchers may now contain spaces, and you can easily add or remove matchers.
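For instance, adding a 'download' matcher (a made-up entry) needs no new method:

@hash['download'] = { text: 'Download %{placeholder} now', re: /\d+(\.\d+)?/ }

format("Great Software 2.0 Download Now")
#⇒ Download 2.0 now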
First of all, your algorithm can work, but it has some issues, like what happens if no keyword matches.
I have two solutions for you:
NLP
If you want to make it much more dynamic, you can use NLP (Natural Language Processing). NLP will find the main words in your sentence, and then you can work out the right handling for each.
A good gem for that is Treat, which you can use with stanford-core-nlp. After processing the data you can find the verbs, and even synonyms, in the sentence and figure out what to do.
sentence('Buy this software for $10 now').verbs # ['buy']
Simple Hash
This solution is less dynamic, but much simpler. Like you did with the scan, just use a constant to manage your keywords and the output for them (I would do it with lambdas). You can also give the hash a default:
# String keys, so the downcased match can look them up directly; the default
# lambda handles titles with no keyword. The bodies here are illustrative.
KEYWORDS = Hash.new(->(_title) { 'Default Title' }).merge(
  'buy'      => ->(title) { "Buy for just #{title[/\$\d+/]}" },
  'download' => ->(title) { 'Download now' }
)

KEYWORDS[sentence[/(#{KEYWORDS.keys.join('|')})/i].to_s.downcase].call(sentence)
I think this solution is good enough.
The only thing that looks strange is scan(/(download|buy)/i)[0][0].
As for me, I don't much like using the [] syntax in Ruby.
I think using scan here is not necessary.
What about
def whichkeyword(title)
  title =~ /(download|buy)/i
  send $1.downcase.to_sym, title unless $1.nil?
end
UPDATE
def whichkeyword(title)
  action = title[/(download|buy)/i]
  public_send action.downcase.to_sym, title if action
end