I need a faster way to find repos with PyGitHub - pygithub

My organization has a ton of repos in GitHub. I have a subset of these repos that I want to programmatically detect. All of the repos I want start with the same string.
The code below correctly finds them, but it takes about a minute because of the sheer number of repos. Is there a faster way to do this?
from github import Github

def FindMyRepos():
    global ACCESS_TOKEN, ORGANIZATION, PREFIX
    repos = []
    # Log into GitHub
    gh = Github(ACCESS_TOKEN)
    if not gh: return 1
    # Connect to organization
    org = gh.get_organization(ORGANIZATION)
    if not org: return 2
    # Look for repos that start with PREFIX
    for r in org.get_repos():
        if r.name.startswith(PREFIX):
            repos.append(r.name)
    return repos

After a lot of searching, I have found the answer. I need to use the search_repositories method. The one 'gotcha' is that this method is part of the Github object but not the organization object.
from github import Github

def FindMyRepos():
    global ACCESS_TOKEN, ORGANIZATION, PREFIX
    repos = []
    # Log into GitHub
    gh = Github(ACCESS_TOKEN)
    if not gh: return 1
    # Look for repos in ORGANIZATION containing PREFIX
    for r in gh.search_repositories(query=ORGANIZATION + '/' + PREFIX):
        # Only include repos starting with PREFIX
        if r.name.startswith(PREFIX):
            repos.append(r.name)
    return repos
The if r.name.startswith(PREFIX) check is needed because the search returns every repo in ORGANIZATION whose name contains PREFIX anywhere, not just those whose name starts with it.
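For reference, GitHub's search syntax also supports qualifiers such as org: and in:name, which push most of the narrowing to the server. A minimal, untested sketch along those lines (the function and argument names are mine, not from the original code; the startswith check is kept because search still matches the prefix anywhere in the name):

from github import Github

def find_repos_with_prefix(access_token, organization, prefix):
    gh = Github(access_token)
    # 'org:' restricts the search to one organization,
    # 'in:name' restricts the text match to the repository name.
    query = f'org:{organization} {prefix} in:name'
    # Search matches substrings, so keep the explicit prefix check.
    return [r.name for r in gh.search_repositories(query=query)
            if r.name.startswith(prefix)]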

Related

Process Substitution in Ansible for Path-based Parameters

Many Ansible modules are designed to accept file paths as a parameter, but lack the possibility to supply the contents of the file directly. In cases where the input data actually comes from something other than a file, this forces one to create a temporary file somewhere on disk, write the intended parameter value into it, and then supply the path of this temporary file to the Ansible module.
For illustration purposes, a real-life example: the java_cert Ansible module takes the parameter pkcs12_path for the path to a PKCS12 keystore containing a keypair to be imported into a given Java keystore. Now say this data is retrieved through a Vault lookup; in order to supply the module with the path it demands, we must write the Vault lookup result into a temporary file, use the file's path as the parameter, and then handle the secure deletion of the temporary file, seeing as the data is likely confidential.
When a situation such as this arises within the context of Shell/bash scripting, namely a command line tool's flag only supporting interaction with a file, the magic of process substitution (e.g. --file=<(echo $FILE_CONTENTS)) allows for the tool's input and output data to be linked with other commands by transparently providing a named pipe that acts as if it were a (mostly) normal file on disk.
Within Ansible, is there any comparable mechanism to replace file-based parameters with more flexible constructs that allow for the usage of data from variables or other commands? If there is no built-in method to achieve this, are there maybe 3rd-party solutions that allow for it, or that simplify workflows like the one I described? For example something like a custom lookup plugin which is supplied with the file content data and then handles, transparently and in the background, the file management (i.e. creation, writing the data, and ultimately deletion) and provides the temporary path as its return value, without the user necessarily ever having to know it.
Exemplary usage of such a plugin could be:
...
pkcs_path: "{{ lookup('as_file', '-----BEGIN PRIVATE KEY-----...-----END PRIVATE KEY----- ') }}"
...
with the plugin then creating a file under e.g. /tmp/as_file.sg7N3bX containing the textual key from the second parameter and returning this file path as the lookup result. I am however unsure how exactly the continued management of the file (especially the timely deletion of sensitive data) could be realized in such a context.
Disclaimer:
I am (obviously!) the author of the collection below, which was created as a reaction to the above question.
The lookup plugin was not thoroughly tested and might fail with particular modules.
Since this was a pretty good idea and nothing existed, I decided to give it a try. This all ended up in a collection now called thoteam.var_as_file which is available in a GitHub repo. I won't paste all the files in this answer as they are all available in the mentioned repo, with full README documentation to install, test and use them.
The global idea was the following:
Create a lookup plugin responsible for pushing new temporary files with a given content and returning a path to use them.
Clean-up the created files at the end of playbook run. For this step, I created a callback plugin which launches the cleanup action listening to v2_playbook_on_stats events.
I still have some concerns about concurrency (files yet to be cleaned are stored in a static JSON file on disk) and reliability (not sure that the stats stage happens in all situations, especially on crashes). I'm also not entirely sure that using a callback for this is a good practice / the best choice.
Meanwhile, this was quite fun to code and it does the job. I will see if this work gets used by others and might very well enhance it all in the coming weeks (and if you have PRs to fix the already known issues, I'm happy to accept them).
Once installed and the callback plugin enabled (see https://github.com/ansible-ThoTeam/thoteam.var_as_file#installing-the-collection), the lookup can be used anywhere to get a file path containing the passed content. For example:
- name: Get a filename with the given content for later use
  ansible.builtin.set_fact:
    my_tmp_file: "{{ lookup('thoteam.var_as_file.var_as_file', some_variable) }}"

- name: Use in place in a module where a file is mandatory and you have the content in a var
  community.general.java_cert:
    pkcs12_path: "{{ lookup('thoteam.var_as_file.var_as_file', pkcs12_store_from_vault) }}"
    cert_alias: default
    keystore_path: /path/to/my/keystore.jks
    keystore_pass: changeit
    keystore_create: yes
    state: present
These are the relevant parts of the two plugin files. I removed the Ansible documentation vars (for conciseness), which you can find directly in the git repo if you wish.
plugins/lookup/var_as_file.py
from ansible.errors import AnsibleError
from ansible.plugins.lookup import LookupBase
from ansible.module_utils.common.text.converters import to_native
from ansible_collections.thoteam.var_as_file.plugins.module_utils.var_as_file import VAR_AS_FILE_TRACK_FILE
from hashlib import sha256
import tempfile
import json
import os


def _hash_content(content):
    """
    Returns the hex digest of the sha256 sum of content
    """
    return sha256(content.encode()).hexdigest()


class LookupModule(LookupBase):

    created_files = dict()

    def _load_created(self):
        if os.path.exists(VAR_AS_FILE_TRACK_FILE):
            with open(VAR_AS_FILE_TRACK_FILE, 'r') as jfp:
                self.created_files = json.load(jfp)

    def _store_created(self):
        """
        serialize the created files as json in tracking file
        """
        with open(VAR_AS_FILE_TRACK_FILE, 'w') as jfp:
            json.dump(self.created_files, jfp)

    def run(self, terms, variables=None, **kwargs):
        '''
        terms contains the content to be written to the temporary file
        '''
        try:
            self._load_created()
            ret = []

            for content in terms:
                content_sig = _hash_content(content)
                file_exists = False

                # Check if a file was already created for this content and verify it.
                if content_sig in self.created_files:
                    if os.path.exists(self.created_files[content_sig]):
                        with open(self.created_files[content_sig], 'r') as efh:
                            if content_sig == _hash_content(efh.read()):
                                file_exists = True
                                ret.append(self.created_files[content_sig])
                            else:
                                os.remove(self.created_files[content_sig])

                # Create / Replace the file
                if not file_exists:
                    temp_handle, temp_path = tempfile.mkstemp(text=True)
                    with os.fdopen(temp_handle, 'a') as temp_file:
                        temp_file.write(content)
                    self.created_files[content_sig] = temp_path
                    ret.append(temp_path)

            self._store_created()
            return ret

        except Exception as e:
            raise AnsibleError(to_native(repr(e)))
plugins/callback/clean_var_as_file.py
from ansible.plugins.callback import CallbackBase
from ansible_collections.thoteam.var_as_file.plugins.module_utils.var_as_file import VAR_AS_FILE_TRACK_FILE
from ansible.module_utils.common.text.converters import to_native
from ansible.errors import AnsibleError
import os
import json


def _make_clean():
    """Clean all files listed in VAR_AS_FILE_TRACK_FILE"""
    try:
        with open(VAR_AS_FILE_TRACK_FILE, 'r') as jfp:
            files = json.load(jfp)

        for f in files.values():
            os.remove(f)

        os.remove(VAR_AS_FILE_TRACK_FILE)

    except Exception as e:
        raise AnsibleError(to_native(repr(e)))


class CallbackModule(CallbackBase):
    ''' This Ansible callback plugin cleans up files created by the thoteam.var_as_file.var_as_file lookup '''

    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'utility'
    CALLBACK_NAME = 'thoteam.var_as_file.clean_var_as_file'
    CALLBACK_NEEDS_WHITELIST = False
    # This one doesn't work for a collection plugin
    # Needs to be enabled anyway in ansible.cfg callbacks_enabled option
    CALLBACK_NEEDS_ENABLED = False

    def v2_playbook_on_stats(self, stats):
        _make_clean()
I'll be happy to get any feedback if you give it a try.

AWS-CDK: Cross account Resource Access and Resource reference

I have a secret key-value pair in Secrets Manager in Account-1 in us-east-1. This secret is encrypted using a Customer managed KMS key - let's call it KMS-Account-1. All this has been created via console.
Now we turn to CDK. We have cdk.pipelines.CodePipeline which deploys Lambda to multiple stages/environments - so 1st to { Account-2, us-east-1 } then to { Account-3, eu-west-1 } and so on. This has been done.
The lambda code in all stages/environments above now needs to be changed to use the secret key-value pair present in Account-1's us-east-1 Secrets Manager, by getting it via the secretsmanager client. That code should probably look like this (Python):
import json
import boto3

client = boto3.session.Session().client(
    service_name='secretsmanager',
    region_name='us-east-1'
)
resp = client.get_secret_value(
    SecretId='arn:aws:secretsmanager:us-east-1:<ACCOUNT-1>:secret:name/of/the/secret'
)
secret = json.loads(resp['SecretString'])
All lambdas in various accounts and regions (ie. environments) will have the exact same code as above since the secret needs to be fetched from Account-1 in us-east-1.
Firstly I hope this is conceptually possible. Is that right?
Next, how do I change the CDK code to facilitate this? How will the code deploy in the pipeline get permission to import this custom KMS key and Secrets Manager secret, and apply the correct permissions for cross-account access by the lambdas that the CDK pipeline creates?
Can someone please give some pointers?
This is a bit tricky because CloudFormation, and hence CDK, doesn't allow cross-account/cross-stage references, since CloudFormation exports don't work across accounts as far as my understanding goes. All these patterns of "centralised" resources fall into that category - i.e. a resource in one account (or a stage in CDK) referenced by other stages.
If the resource is created outside the context of CDK (like via the console), then you might as well hardcode the names/ARNs/etc. throughout the CDK code where they are used, and that should be sufficient.
For resources that can hold resource-based policies, it's simpler: you can just attach the cross-account access permissions to them directly - again, offline via the console, since you are maintaining them manually anyway. Each time you add a stage (account) to your pipeline, you will need to go to the resource and add the cross-account permissions manually.
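For instance, Secrets Manager secrets accept a resource policy, and boto3 exposes put_resource_policy to attach one. A hedged sketch of granting another account read access (the account ID and secret name are placeholders; note that the CMK used to encrypt the secret also has to allow kms:Decrypt for that account, as mentioned further down):

import json
import boto3

# Placeholder values for illustration only.
SECRET_ID = 'name/of/the/secret'
CONSUMER_ACCOUNT = '222222222222'

policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'AWS': f'arn:aws:iam::{CONSUMER_ACCOUNT}:root'},
        'Action': 'secretsmanager:GetSecretValue',
        'Resource': '*',
    }],
}

client = boto3.client('secretsmanager', region_name='us-east-1')
# Attach the cross-account permission directly to the secret.
client.put_resource_policy(SecretId=SECRET_ID, ResourcePolicy=json.dumps(policy))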
For resources that don't have resource-based policies, like SSM for example, things are a bit more roundabout, as you will need to create a Role that can be assumed cross-account and then used to access the resource. In that case you will also have to maintain the IAM Role separately and manually update its trust policy for other accounts as you add stages to your CDK pipeline. Then, as usual, hardcode the role ARN in your CDK code, assume it in some CustomResource lambda and use it.
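To make that role-based path concrete, here is a hedged boto3 sketch of what such a CustomResource lambda (or any script in a consumer account) would do: assume the shared role, then read the resource with the temporary credentials. The role ARN and parameter name are placeholders:

import boto3

# Placeholder names for illustration only.
SHARED_ROLE_ARN = 'arn:aws:iam::111111111111:role/central-reader-role'
PARAM_NAME = '/central/some-parameter'

sts = boto3.client('sts')
creds = sts.assume_role(
    RoleArn=SHARED_ROLE_ARN,
    RoleSessionName='read-central-params',
)['Credentials']

# Call SSM in the central account using the temporary credentials.
ssm = boto3.client(
    'ssm',
    region_name='us-east-1',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)
value = ssm.get_parameter(Name=PARAM_NAME)['Parameter']['Value']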
It gets more interesting if the creation is also done in the CDK code itself (i.e. managed by CloudFormation, not done separately via the console/aws-cli etc.). In this case, you often won't "know" the exact ARNs, since the physical ID is generated by CloudFormation and is likely part of the ARN. Even influencing the physical ID yourself (for example by hardcoding a bucket name) might not solve it in all cases: KMS ARNs and Secrets Manager ARNs, for instance, append unique IDs or hashes to the end of the ARN.
Instead of trying to work all that out, it is best left untouched; let CloudFormation generate whatever random name/ARN it chooses. To then reference these constructs/ARNs, just put them into SSM Parameters in the source/central account. SSM doesn't have a resource-based policy that I know of, so additionally create a role in CDK that trusts the accounts in your CDK code. Once done, there is no more maintenance - each time you add new environments/accounts to CDK (assuming it's a CDK pipeline here), the "loop" construct that you will create will automatically add the new account into the trust relationship.
Now all you need to do is distribute this role ARN and the SSM parameter names to the other stages. Choose an explicit role name and explicit SSM parameter names; manually constructing a role ARN from a role name is straightforward. So distribute those around your CDK code to the other stages (as compile-time strings instead of references). In the target stages, create custom resource(s) (AwsCustomResource) backed by an AwsSdkCall lambda that simply assumes this role ARN and makes the SDK call to retrieve the SSM parameter values. These values can be anything you couldn't easily guess, like your KMS ARNs or Secrets Manager full ARNs. Now simply use them.
A roundabout way to do a simple thing, but so far that is all I could do to get this to work.
# You need to maintain this list no matter what you do - so it's nothing extra
all_other_accounts = [ <list of accounts that this cdk deploys to> ]
account_principals = [iam.AccountPrincipal(a) for a in all_other_accounts]

role = iam.Role(
    assumed_by = iam.CompositePrincipal(*account_principals), # auto-updated as you change the list above
    role_name = some_explicit_name,
    ...
)
role_arn = f'arn:aws:iam::<account-of-this-stack>:role/{some_explicit_name}'

kms0 = kms.Key(...)
kms0.grant_decrypt(role)
# Because KMS also needs an explicit resource policy even if the role policy allows access to it
kms0.add_to_resource_policy(iam.PolicyStatement(principals = [iam.ArnPrincipal(role_arn)], actions = ...))

kms1 = kms.Key(...)
kms1.grant_decrypt(role)
kms1.add_to_resource_policy(... same as above ...)

secrets0 = secretsmanager.Secret(...) # maybe this is based off kms0
secrets0.grant_read(role)
secrets1 = secretsmanager.Secret(...) # maybe this is based off kms1
secrets1.grant_read(role)

# You can turn all this into a loop ofc.
ssm0 = ssm.StringParameter(self, '...', parameter_name = 'kms0_arn', string_value = kms0.key_arn, ...)
ssm0.grant_read(role)
ssm1 = ssm.StringParameter(self, '...', parameter_name = 'kms1_arn', string_value = kms1.key_arn, ...)
ssm1.grant_read(role)
ssm2 = ssm.StringParameter(self, '...', parameter_name = 'secrets0_arn', string_value = secrets0.secret_full_arn, ...)
ssm2.grant_read(role)
...

# Now simply pass around the role and ssm parameter names
for env in environments:
    MyApplicationStage(self, <...>, ..., role_arn = role_arn, params = [ 'kms0_arn', 'kms1_arn', ... ], ...)
And then in the target stage(s):
for param in params:
    fn = AwsSdkCall('ssm', 'get_parameter', { "Name": param }, ...)
    acr = AwsCustomResource(..., on_create = fn, on_update = fn, ...)
    collect[param] = acr.get_response_field('Parameter.Value')
Now do whatever you want with the collected artifacts, including supplying them as environment variables to your main service lambda (which will be resolved at deploy time).
Remember they will all be Tokens, resolved only at deploy time, but that's true of any resource reference, whether or not it goes through a custom resource, so it shouldn't matter.
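For completeness, a hedged sketch of what "supplying them as environment variables" could look like in CDK Python; the construct id, runtime and handler are placeholders of my own, and collect holds the tokens gathered from the AwsCustomResource calls above:

from aws_cdk import aws_lambda as _lambda

fn = _lambda.Function(
    self, 'MyServiceLambda',  # hypothetical construct id
    runtime=_lambda.Runtime.PYTHON_3_9,
    handler='index.handler',
    code=_lambda.Code.from_asset('lambda'),
    environment={
        # Both values are compile-time strings / unresolved tokens until deploy time.
        'CENTRAL_ROLE_ARN': role_arn,
        'KMS0_KEY_ARN': collect['kms0_arn'],
    },
)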
That's a generic pattern which should work for any case.
(GitHub link where this question was asked and I had answered it there too)

How to fetch all the deployed commits from octokit github client for ruby?

I am using the Ruby Octokit client to fetch deployments (list_deployments). At present it only lists the latest 30. I need to filter these on the basis of payload, and I need access to all deployments to date.
The GitHub API provides a link in the header to access the next page. Is there something similar in Octokit?
client = Octokit::Client.new(
  :access_token => ENV.fetch("GITHUB_TOKEN"),
)
repo = "repo_name"
env = "env_name"
options = {
  :environment => env,
  :task => "task_name"
}
deployments = client.deployments(repo, options)
Octokit provides pagination but also auto-pagination.
You may be able to do something like:
client.auto_paginate = true
deployments = client.deployments 'username/repository' # same as list_deployments
deployments.length
Update: I tested this locally and pagination in this manner, while documented, doesn't work as expected for deployments. You'll need to fetch the deployments manually.
The deployment documentation indicated that listing all deployments should be available in the latest version.
If that doesn't work you may need to do it manually:
# fetch your first list of deployments
deployments = client.deployments 'username/repository'
while true
  begin
    deployments.concat client.last_response.rels[:next].get.data
    puts deployments.length
    break if deployments.length > 500
  rescue StandardError
    # No next page left (or the request failed) - stop paginating
    puts 'quitting'
    break
  end
end
puts deployments.length

ruby-fog: Delete an item from the object storage in less than 3 requests

I started using fog storage for a project. I do the simplest actions: upload an object, get the object, delete the object. My code looks something like this:
storage = get_storage(...) # S3 / OpenStack / ...

dir = storage.directories.get(bucket) # 1st request
if !dir.nil?
  dir.files.create(key: key, body: body) # 2nd request
  # or:
  dir.files.get(key) # 2nd request
  # or:
  file = dir.files.get(key) # 2nd request
  if !file.nil?
    file.destroy # 3rd request
  end
end
In all cases there's a first step to get the directory, which makes a request to the storage engine (it returns nil if the directory doesn't exist).
Then there's another step to do whatever I'd like to do (in the case of delete there's even a third step in the middle).
However, if I look at, say, the Amazon S3 API, it's clear that deleting an object doesn't require three requests to Amazon.
Is there a way to use fog but make it do less requests to the storage provider?
I think this was already answered on the mailing list, but if you use #new on directories/files it will give you just a local reference (vs #get which does a lookup). That should get you what you want, though it may raise errors if the file or directory does not exist.
Something like this:
storage = get_storage(...) # S3 / OpenStack / ...

dir = storage.directories.new(key: bucket)

dir.files.create(key: key, body: body) # 1st request
# or:
dir.files.get(key) # 1st request
# or:
file = dir.files.new(key: key)
if !file.nil?
  file.destroy # 1st request
end
Working in this way should allow any of the three modalities to work in a single request, but may result in errors if the bucket does not exist (trying to add a file to a non-existent bucket is an error). So it is more efficient, but needs different error handling. Conversely, you can make the extra requests if you need to be sure.

Ruby-Git gets wrong tag info

Using ruby-git with a private repo cloned from GitHub, located at PATH_TO_REPO_NAME:
g = Git.open("PATH_TO_REPO_NAME")
tag="us_prod"
puts g.tag(tag)
The result is not what appears under https://github.com/gtforge/REPO_NAME/tags.
The issue eventually turned out to be that while I had pulled from the repo, I hadn't fetched the tags separately, because I did not know that was required. False alarm.
