How to get the number of forks of a GitHub repo with the GitHub API?

I use the GitHub API v3 to get the fork count for a repository:
GET /repos/:owner/:repo/forks
The request brings back only 30 results even if the repository contains more. I googled a bit and found that, to limit memory use, the API returns only 30 results per page, and if I want the next results I have to ask for them page by page.
But I don't need all that information; all I need is the number of forks.
Is there any way to get only the fork count?
If I start looping page by page, my script risks crashing when a repository has thousands of forks.

You can try using a search query.
For instance, for my repo VonC/b2d, I would use:
https://api.github.com/search/repositories?q=user%3AVonC+repo%3Ab2d+b2d
The JSON answer gives me "forks_count": 5.
Here is one with more than 4000 forks (consider only the first result, meaning the one whose "full_name" is actually "strongloop/express"):
https://api.github.com/search/repositories?q=user%3Astrongloop+repo%3Aexpress+express
"forks_count": 4114,

I had a job where I needed to get all forks of a GitHub project as git remotes.
I wrote a simple Python script: https://gist.github.com/urpylka/9a404991b28aeff006a34fb64da12de4
At its core is a recursive function for getting the forks of each fork. And I met the same problem (the GitHub API was returning me only 30 items).
I solved it by incrementing a ?page= parameter and checking for an empty response from the server.
import requests

def get_fork(username, repo, forks, auth=None):
    page = 1
    while True:
        request = "https://api.github.com/repos/{}/{}/forks?page={}".format(username, repo, page)
        if auth is None:
            r = requests.get(request)
        else:
            r = requests.get(request, auth=(auth['login'], auth['secret']))
        j = r.json()
        r.close()
        if 'message' in j:
            # the API answered with an error object instead of a list of forks
            print("username: {}, repo: {}".format(username, repo))
            print(j['message'] + " " + j['documentation_url'])
            if str(j['message']) == "Not Found":
                break
            else:
                exit(1)
        if len(j) == 0:
            break  # empty page: no more forks
        else:
            page += 1
        for item in j:
            forks.append({'user': item['owner']['login'], 'repo': item['name']})
            # recurse to collect the forks of this fork as well
            if auth is None:
                get_fork(item['owner']['login'], item['name'], forks)
            else:
                get_fork(item['owner']['login'], item['name'], forks, auth)
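A hypothetical usage example (the owner and repo names here are placeholders, and this assumes an unauthenticated run):

forks = []
get_fork("urpylka", "some-repo", forks)  # fills `forks` with every fork, recursively
print("forks found:", len(forks))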

Related

How to use entrezpy and Biopython Entrez libraries to access ClinVar data from genomic position of variant

[Disclaimer: I posted this question 3 weeks ago on Biostars with no answers yet. I really would like to get some ideas/discussion to find a solution, so I am posting it here too.
Biostars post link: https://www.biostars.org/p/447413/]
For one of my PhD projects, I would like to access all variants found in the ClinVar db that are in the same genomic position as the variant in each row of the input GSVar file. The language constraint is Python.
Up to now I have used the entrezpy module entrezpy.esearch.esearcher. Please see more on entrezpy at: https://entrezpy.readthedocs.io/en/master/
From the entrezpy docs I followed this guide to access UIDs using the genomic position of a variant: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html — in code:
# first get UIDs for ClinVar records at the same position
# credits: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html
import entrezpy.esearch.esearcher

chr = variants["chr"].split("chr")[1]
start, end = str(variants["start"]), str(variants["end"])
es = entrezpy.esearch.esearcher.Esearcher('esearcher', self.entrez_email)
genomic_pos = chr + "[chr]" + " AND " + start + ":" + end  # + "[chrpos37]"
entrez_query = es.inquire(
    {'db': 'clinvar',
     'term': genomic_pos,
     'retmax': 100000,
     'retstart': 0,
     'rettype': 'uilist'})  # 'usehistory': False
entrez_uids = entrez_query.get_result().uids
Then I used Entrez from Biopython to get the available ClinVar records:
from Bio import Entrez
import xml.etree.ElementTree as ET

# process each VariationArchive of each UID
handle = Entrez.efetch(db='clinvar', id=current_entrez_uids, rettype='vcv')
clinvar_records = {}
tree = ET.parse(handle)
root = tree.getroot()
This approach is working. However, I have two main drawbacks:
1. entrezpy fills up my log file, recording every interaction with Entrez, which makes the log too big to be read by the hospital collaborator, who is a variant curator.
2. The entrezpy call entrez_query.get_result().uids returns all UIDs retrieved so far from all requests (one request per variant in the GSVar file), so the retrieval is space-inefficient: the entrez_uids list quickly grows as I process all the variants in a GSVar file. The simple solution I have implemented is to check which UIDs are new in the current request and keep only those for Entrez.efetch(). However, I still need to keep all UIDs seen for previous variants in order to know which UIDs are new. I do this in code by:
# first snippet's first lines go here
entrez_uids = entrez_query.get_result().uids
current_entrez_uids = [uid for uid in entrez_uids if uid not in self.all_entrez_uids_gsvar_file]
self.all_entrez_uids_gsvar_file += current_entrez_uids
Does anyone have suggestion(s) on how to address these two drawbacks?

Code is not delivered by SendCodeRequest, no error is raised, and this happens only server-side on Heroku

Here is my situation: the same Telethon code is used on my local machine and on the server. Requesting an authorization code from the local machine works fine. Requesting the code from the server does not produce any error, yet the code is not sent. Sometimes it even works from the server, without any changes in the code.
I suppose there might be some IP blocks or something related to the IP, because that is the only thing that should differ on the server side: Heroku assigns IP addresses dynamically, so there might be some subnets that are blocked by the Telegram API for some reason. But there is no error, and that is really strange. There are too many IP addresses to disprove the hypothesis; I would need to catch at least one IP address that gives opposite results: one time the code is received, another time it is not. So I am stuck with this situation and have no idea how it could be fixed or clarified.
global t
t = None

async def ssssendCode(phone):
    global t
    try:
        if os.path.isfile(phone + '.session'):
            logger.debug('client file exists')
        else:
            logger.debug('client file does not exist')
        if t is None:
            t = TelegramClient(phone, settings['telegramClientAPIId'], settings['telegramClientAPIHash'])
            t.phone = phone
            #t.phone_code_hash = None
            await t.connect()
            #response = await t.send_code_request(phone=phone, force_sms=True)
        s3_session.resource('s3').Bucket('telethon').upload_file(str(phone) + ".session", str(phone) + ".session")
        logger.debug(str(requests.get('https://httpbin.org/ip').text))
        response = await t.send_code_request(phone=phone)
        logger.debug(str(t.is_connected()))
    except Exception as e:
        response = str(e)
    return str(response)
Example of the response to a local-machine request:
SentCode(type=SentCodeTypeSms(length=5), phone_code_hash='b5b069a2a4122040f1', next_type=CodeTypeCall(), timeout=120)
Example of the response to a server-side request:
SentCode(type=SentCodeTypeSms(length=5), phone_code_hash='0e89db0324c1af0149', next_type=CodeTypeCall(), timeout=120)
send_code_request is from Telethon, without modifications:
async def send_code_request(
        self: 'TelegramClient',
        phone: str,
        *,
        force_sms: bool = False) -> 'types.auth.SentCode':
    """
    Sends the Telegram code needed to login to the given phone number.

    Arguments
        phone (`str` | `int`):
            The phone to which the code will be sent.

        force_sms (`bool`, optional):
            Whether to force sending as SMS.

    Returns
        An instance of :tl:`SentCode`.

    Example
        .. code-block:: python

            phone = '+34 123 123 123'
            sent = await client.send_code_request(phone)
            print(sent)
    """
    result = None
    phone = utils.parse_phone(phone) or self._phone
    phone_hash = self._phone_code_hash.get(phone)

    if not phone_hash:
        try:
            result = await self(functions.auth.SendCodeRequest(
                phone, self.api_id, self.api_hash, types.CodeSettings()))
        except errors.AuthRestartError:
            return await self.send_code_request(phone, force_sms=force_sms)

        # If we already sent a SMS, do not resend the code (hash may be empty)
        if isinstance(result.type, types.auth.SentCodeTypeSms):
            force_sms = False

        # phone_code_hash may be empty, if it is, do not save it (#1283)
        if result.phone_code_hash:
            self._phone_code_hash[phone] = phone_hash = result.phone_code_hash
    else:
        force_sms = True

    self._phone = phone

    if force_sms:
        result = await self(
            functions.auth.ResendCodeRequest(phone, phone_hash))

        self._phone_code_hash[phone] = result.phone_code_hash

    return result
Just in case: there is much more than 2 minutes between attempts to get a code from the local machine and from the server, so it is definitely not a timeout issue. Moreover, even when requesting the code locally just half a minute after a failed server-side attempt, the code arrives almost immediately.
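To pin down the IP hypothesis, one possibility (a sketch of my own, not part of the original code; httpbin.org/ip simply echoes the caller's public IP) is to log the egress IP next to each attempt's outcome and look for an IP that appears in both successful and failed attempts:

import logging
import requests

logger = logging.getLogger(__name__)

async def send_code_logging_ip(client, phone):
    # record which Heroku egress IP this attempt goes out from
    ip = requests.get('https://httpbin.org/ip').json()['origin']
    sent = await client.send_code_request(phone=phone)
    # correlate the IP with whether the code actually arrives on the phone
    logger.debug('egress ip %s -> phone_code_hash %s', ip, sent.phone_code_hash)
    return sent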

Typeahead remote call is triggered even if there is data in local/prefetch

The remote call is triggered even if there are values in the prefetch/local data.
Sample code:
var jsonObj = ["Toronto", "Montreal", "Calgary", "Ottawa", "Edmonton", "Peterborough"];
$('input.countries-cities').typeahead([
    {
        name: 'Canada',
        local: jsonObj,
        remote: {
            url: 'http://localhost/typeahead/ajaxcall.php?q=QUERY',
            cache: true
        },
        limit: 3,
        minLength: 1,
        header: '<h3>Canada</h3>'
    }
]);
What I expect is for the remote call to be triggered only if there are no matches in the local data, but each time I type a location the remote call is triggered. Any help will be highly appreciated.
I know this question is a couple of months old, but I ran into a similar issue and found the answer.
The problem is that your limit is set to 3 and your search is turning up fewer results than your limit, which triggers the remote call. If you had set your limit to 1, you wouldn't get a remote call unless there were no local results.
Not a great design IMO, since you probably still want to see 3 results if there are 3 local results. And worse, say your local/prefetch search returns only 1 result: if your remote returns the same result, it will be duplicated in your list. I haven't found a solution to that problem yet.
In bloodhound.js, replace
matches.length < this.limit ? cacheHit = ...
with
matches.length < 1 ? cacheHit = ...

Rate Exceeding in workflow_execution polling

I am currently trying to modify a plugin for posting metrics to New Relic via AWS. I have successfully made the plugin post metrics from SWF to New Relic (not originally in the plugin), but I have encountered a problem when the program runs for too long.
When the program has been running for about 10 minutes I get the following error:
Error occurred in poll cycle: Rate exceeded
I believe this is coming from my polling SWF for the workflow executions:
domain.workflow_executions.each do |execution|
  starttime = execution.started_at
  endtime = execution.closed_at
  isOpen = execution.open?
  status = execution.status
  if endtime != nil
    running_workflow_runtime_total += (endtime - starttime)
    number_of_completed_executions += 1
  end
  if status.to_s == "open"
    openCount = openCount + 1
  elsif status.to_s == "completed"
    completedCount = completedCount + 1
  elsif status.to_s == "failed"
    failedCount = failedCount + 1
  elsif status.to_s == "timed_out"
    timed_outCount = timed_outCount + 1
  end
end
This is called in a polling cycle every 60 seconds.
Is there a way to set the polling rate? Or another way to get the workflow executions?
Thanks. Here's a link to the Ruby SDK for SWF => link
The issue is likely that you are creating a large number of workflow executions, and each iteration through the loop in workflow_executions causes a lookup, which eventually exceeds your rate limit.
This could also be getting a bit expensive, so be careful.
It's not clear what you're really trying to do, so I can't tell you how to fix it unless you post all your code (or at least the parts around the calls to SWF).
You can see here:
https://github.com/aws/aws-sdk-ruby/blob/05d15cd1b6037e98f2db45f8c2597014ee376a59/lib/aws/simple_workflow/workflow_execution_collection.rb
that a call is made to SWF for each workflow in the collection.

How to make a 'perishable' link in Tornado

I want to make a link that is valid only for 24 hours, for validation purposes, so my question is simple: how do I make a link valid for only that time? I have a hint:
Get the epoch time.
Make a link using only this value: something.com/time/1359380374
When the user clicks on the link, extract this value and compare it.
I heard about hash values. Why? We can't get the time back from a hash value (invert the process), so how is this done?
Your best bet is to have the user's email sent as an argument and then query the database to see if their link has expired:
Query when the link is requested: update users set locked_stamp = now();
Request URL: http://yourdomain.com/?email=useremail
Query: select true from users where email = '$email' and locked_stamp between now() - interval 1 hour and now() limit 1
Result: you have a person requesting within the hour with email: $email.
I have a script that uses base64 to encode the timestamp... but it's not secure by any means.

import tornado.ioloop
import tornado.web
import base64, re, time

def get_time():
    """Method used to get the current time in b64"""
    return base64.b64encode(str(int(time.mktime(time.localtime()))))

class WebHandler(tornado.web.RequestHandler):
    def get(self, _time):
        timecheck = base64.b64decode(_time)
        try:
            # require it to be all digits
            assert re.match(r'^\d+$', timecheck) is not None
            # must be within 1 hour: greater than 1 hour ago and less than now
            assert int(timecheck) > int(time.mktime(time.localtime())) - 3600 and \
                   int(timecheck) < int(time.mktime(time.localtime()))
        except AssertionError:
            raise tornado.web.HTTPError(401, 'Woops! Unauthorized.')
        else:
            self.write('Pass')

# Route
application = tornado.web.Application([
    (r"/([^\/]+)/?", WebHandler),
])

if __name__ == "__main__":
    application.listen(8889)
    tornado.ioloop.IOLoop.instance().start()
Do it the same way Tornado sets secure cookies:
signed_message = tornado.web.create_signed_value(secret, name, value)
Then you can check it:
message = tornado.web.decode_signed_value(secret, name, value, max_age_days=31, clock=None, min_version=None)
The secret should be a long random string, but you only need one per app. min_version could be DEFAULT_SIGNED_VALUE_VERSION (which is currently 2).
Don't roll your own solution. Use the one in the library. It's there. It works.
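A minimal runnable sketch along those lines (the route, secret, and token payload are my assumptions, not from the answer), using Tornado's module-level signing helpers to make a link expire after one day:

import tornado.web

SECRET = "replace-with-a-long-random-secret"  # one per app

def make_link(user_id):
    # hypothetical helper: embeds a signed, timestamped token in the URL
    token = tornado.web.create_signed_value(SECRET, "link", user_id).decode()
    return "https://something.com/validate/" + token

class ValidateHandler(tornado.web.RequestHandler):
    def get(self, token):
        # returns None if the signature is bad or the token is older than 1 day
        value = tornado.web.decode_signed_value(SECRET, "link", token, max_age_days=1)
        if value is None:
            raise tornado.web.HTTPError(401)
        self.write("Valid for user %s" % value.decode())

application = tornado.web.Application([
    (r"/validate/([^/]+)", ValidateHandler),
])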
