I have a Python Scrapy spider that I want to run at regular intervals on Heroku or similar.
It produces a JSON file that I would like to commit to a GitHub repo.
Given Heroku or another similar platform, how can I set it up to automatically commit that file after each run?
You could write an item pipeline that keeps a static list of items.
Give this pipeline a method called spider_closed and use the dispatcher to attach that method as a signal handler for the spider_closed signal.
In spider_closed, serialize your data to a file with json.dump (not json.loads, which parses JSON rather than producing it), then commit that file to GitHub.
This repo has good code examples:
https://github.com/dm03514/CraigslistGigs/blob/master/craigslist_gigs/pipelines.py
This seems to be a good library to use for the git part:
https://pypi.python.org/pypi/GitPython/
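Putting those pieces together, here is a rough sketch of what the pipeline could look like (a sketch only, not tested: the output file name and commit message are placeholders, and newer Scrapy versions connect signals via from_crawler instead of the old dispatcher):

# pipelines.py -- minimal sketch; output.json and the commit message are placeholders
import json

import git  # GitPython
from scrapy import signals


class JsonCommitPipeline:
    def __init__(self):
        self.items = []

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        # Newer Scrapy versions hook up signals here instead of using the dispatcher
        crawler.signals.connect(pipeline.spider_closed, signal=signals.spider_closed)
        return pipeline

    def process_item(self, item, spider):
        self.items.append(dict(item))
        return item

    def spider_closed(self, spider):
        # Serialize with json.dump (json.loads would parse JSON, not produce it)
        with open("output.json", "w") as f:
            json.dump(self.items, f, indent=2)

        # Commit and push with GitPython; assumes the working directory is a clone
        # that already has push access configured (e.g. via an access token)
        repo = git.Repo(".")
        repo.index.add(["output.json"])
        repo.index.commit("Update scraped data")
        repo.remote("origin").push()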
HTH - if you need any more help I'd be happy to update this response.
When a collaborator creates/updates a pull request against my repository's default branch I want two things to happen in a specific order:
Automatic code formatting + commit formatting changes to the PR branch
Run code quality tests and unit tests
If tests complete without errors the default branch's branch protection rules should allow merging.
The problem is that when step 1 completes, the current workflow runs become invalid because there is a new commit on the PR branch. Because of this, the test results cannot validate the PR, rendering the PR impossible to merge.
Step no. 1 does not trigger another round of Actions, since the commit was made and pushed by an Action itself, and that behavior would just create an endless loop of Actions anyway.
What I want is a way to run step no. 1 automatically before anything else happens so that simple warnings get squashed without developers having to do anything manually.
I am trying to avoid doing this through pre-commit hooks since that would require developers to manually set up their environments the same way.
How can I create the flow I am describing by using GitHub Actions?
Since we don't have the actual action script I can only assume what steps it performs, and in what order (see the details under TL;DR).
The true issue is that your action violates the separation of concerns principle: it is doing validation (code quality analysis, read-only) as it should, but it is also doing modification (code formatting correction).
Code formatting is, in general, a task better suited to a pre-commit or pre-push hook. If badly formatted code does get pushed, the code formatting check should fail the build instead of auto-correcting it.
TL;DR
A simplified example... The origin repository is github.com/example/app, the pull request is for the branch new-feature, and the action looks something like this:
steps:
  - name: Checkout repository (1)
    uses: actions/checkout@v4
  - name: Check code formatting (2)
    run: run-lint -autofix && git add . && git commit -m "formatted"
  - name: Run tests and check code quality (3)
    run: run-tests && run-sonar
  - name: Merge it (4)
    ...
When a contributor pushes changes to the new-feature branch a build will be triggered, and it will be done by e.g. GitHub's build-bot-42.
The build-bot-42 will go through the action's steps in order:
(1) get a copy of the code in question onto itself - build-bot-42 is not the same computer as the one storing GitHub's git repository. The checkout action will basically do the following:
cd ${unique-temp-dir}
git init
git remote add origin https://github.com/example/app.git
git fetch --all
git checkout origin/new-feature
(2) run a lint tool in auto-correct mode, and then apply the changes in a git commit.
(3) run tests and the static code analysis.
If all checks pass we have a local git repository ready to be merged and pushed to origin (in that order).
Of course, if the tests and/or code quality checks don't pass, the pull request still has the badly formatted code, because the action didn't push the code formatting changes.
I have a few different CI/CD flows; one of them automatically creates GitLab merge requests for specific branches. Each merge request has a generated description and title, with links to resolved issues, etc. After a merge request is merged, GitLab creates a merge commit with the default schema, which looks like this:
Merge branch '<my branch>' into '<my other branch>'
<Title of merge request>
See merge request <number of merge request>
I'd like this merge commit to be different and to contain only the merge request description, since CD should use it to generate changelogs for each build. I've tried to find an option to change it in the GitLab API, but I can't find any parameter or request that would allow me to set the merge commit message when it's created, or change it afterwards.
Is there any way to copy merge request description to merge commit body automatically? Maybe some API fields, or templates that can be used?
Based on this issue, which has been open for three years, that functionality isn't even in the UI, so an API operation for it most likely won't exist until that does. https://gitlab.com/gitlab-org/gitlab/-/issues/2551
Your best option until then is to use git:
Clone the repo
Rebase and reword the commit message
Push it back to the remote
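Since an interactive rebase is awkward to script, a simpler variant of the same idea is to amend the merge commit and force-push it. A rough sketch with GitPython; the URL and branch name are placeholders, and it assumes the merge commit is the tip of the target branch and that force-pushing that branch is permitted:

# reword_merge_commit.py -- sketch only; assumes the merge commit is the tip of the
# target branch and that force-pushing that branch is permitted
import git  # GitPython

REPO_URL = "git@gitlab.example.com:group/project.git"  # placeholder
TARGET_BRANCH = "main"                                  # placeholder


def reword_tip(new_message):
    repo = git.Repo.clone_from(REPO_URL, "workdir", branch=TARGET_BRANCH)
    # Amend the tip commit (the merge commit) with the merge request description;
    # amending keeps both parents of the merge commit
    repo.git.commit("--amend", "-m", new_message)
    # The amended commit has a new SHA, so the push must be forced
    repo.remote("origin").push(refspec=f"{TARGET_BRANCH}:{TARGET_BRANCH}", force=True)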
I am trying to find a portable way to produce code annotations for GitHub that avoids vendor lock-in.
Mainly, I want to dump annotations into a file (YAML, JSON, ...) during the build process and have a task at the end that transforms this file into GitHub annotations.
The main goal here is to avoid hardcoding support for github-annotation into the tools that produce them, so other CI/CD systems could also consume the annotation-reports and display them in their UI.
linters -> annotations.report -> github-upload
Tools like flake8 are able to produce output in a parsable format (file:line:column: message), but I need to know whether there is any attempt to standardize annotations so we can collect and combine them from multiple tools and feed them to the CI/CD engine.
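For illustration, the first step of that flow (linter output to a neutral report file) could be as small as the sketch below. The JSON layout here is made up for the example, not a standard; adapt it to whatever your upload step expects:

# flake8_to_annotations.py -- illustrative only; the JSON fields below are an
# ad hoc schema, not a standard
import json
import re
import sys

# flake8's default output looks like: path/to/file.py:12:5: E303 too many blank lines
LINE_RE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<message>.*)$")


def parse(lines):
    annotations = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match:
            annotations.append({
                "file": match.group("file"),
                "line": int(match.group("line")),
                "column": int(match.group("col")),
                "message": match.group("message"),
                "severity": "warning",
            })
    return annotations


if __name__ == "__main__":
    # Usage: flake8 . | python flake8_to_annotations.py > annotations.json
    json.dump(parse(sys.stdin), sys.stdout, indent=2)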
Today I googled what the heck those "GitHub Action Annotations" are all about, and this was among the hits:
https://github.com/marketplace/actions/annotations-action
GitHub action for creating annotations from JSON file
As of now that page also contains:
This repository uses npm packages from #attest scope on github; we are working hard to open source these packages.
Annotations Action is not certified by GitHub. It is provided by a third-party and is governed by separate terms of service, privacy policy, and support documentation.
I didn't try it; again, it was just a random Google hit.
I am currently using https://github.com/yuzutech/annotations-action
Sample action code:
- name: Annotate
  uses: yuzutech/annotations-action@v0.3.0
  with:
    repo-token: ${{secrets.GITHUB_TOKEN}}
    input: ./annotations.json
    title: 'Findings'
    ignore-missing-file: true
It does its job well, but with one minor defect. If you have findings on a commit/PR, you get to see each finding with a beautiful annotation right where you need it. If you re-push changes, even if the finding persists, the annotation is not displayed on later commits. I have opened an issue but have not yet received an answer.
The annotations-action mentioned above has not been updated and did not work for me at all (deprecated calls).
I haven't found anything else that worked exactly as I wanted it to.
Update: I found that you can use reviewdog to annotate based on findings. I also created a GitHub action that can be used for static code analysis: https://github.com/tsigouris007/action-semgrep-reviewdog. You can visit the entrypoint.sh file and check how I piped the custom output to reviewdog using jq.
I am experimenting with the GitHub API using the octokit ruby gem. My goal is to be able to extract the 'tag' that a commit SHA belongs to.
Now I can easily do this on the command line using
> git describe 688ae0b --tags
and get output
> 3.0.1-122-g688ae0b
which tells me the tag, the number of commits since that tag, and the abbreviated commit hash.
How do I get the same info from the GitHub API?
Answers using the GitHub API or an Octokit client would both do, as I can translate from one to the other just fine.
I have looked at a bunch of things like releases, tags, commits, etc., but none of them gives me the information that I can get in one line from the command line.
I am not looking for 'how to use the GitHub API'. I am looking for the specific request or set of requests that will let me derive this information.
Since there is no easy way to run a query like git describe with the GitHub API, that leaves you with an iterative process involving:
listing all tags
trying to diff a tag against your specific commit, with the compare 2 commits API
GET /repos/:owner/:repo/compare/:base...:head
(with base being the commit, and head being the tag)
If the comparison reports the tag as "ahead" of (or "identical" to) the commit, the commit is reachable from that tag.
(I use a similar approach in "Github API: Finding untagged commits")
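A rough sketch of that loop, calling the REST API directly with requests (owner, repo, and token are placeholders; pagination of the tag list is ignored):

# tag_for_commit.py -- sketch of the iterative approach above
import requests

OWNER, REPO = "example", "app"   # placeholders
TOKEN = "ghp_..."                # placeholder personal access token
API = "https://api.github.com"
HEADERS = {"Authorization": f"token {TOKEN}", "Accept": "application/vnd.github+json"}


def tags_containing(commit_sha):
    """Return (tag, distance) pairs for tags from which the commit is reachable."""
    results = []
    tags = requests.get(f"{API}/repos/{OWNER}/{REPO}/tags", headers=HEADERS).json()
    for tag in tags:
        url = f"{API}/repos/{OWNER}/{REPO}/compare/{commit_sha}...{tag['name']}"
        cmp = requests.get(url, headers=HEADERS).json()
        # base = commit, head = tag: "ahead" or "identical" means the tag contains
        # the commit; ahead_by is the number of commits between them
        if cmp.get("status") in ("ahead", "identical"):
            results.append((tag["name"], cmp.get("ahead_by", 0)))
    return results

To approximate git describe itself (the nearest tag preceding the commit, plus a commit count), check for a status of "behind" instead, which means the tag is an ancestor of the commit, and pick the tag with the smallest behind_by.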
Hey, so I have a main repo and then a development fork of the repo. I work off the dev fork and submit pull requests for code review to the main repo; if a pull request gets accepted, my boss will merge it into the main repo. We want to set up an event hook similar to "Post-Receive URLs" that will send a POST to my main web app once a pull request is accepted, so it can do a git pull. If I have this right, "Post-Receive URLs" only works if I commit directly to the repository, is that correct? So it won't work if I merge a pull request.
If I have this right, "Post-Receive URLs" only works if I commit directly to the repository, is that correct?
Yes, so it is not activated in the case of a merge done directly within the repo.
And this thread mentions that a "hook on merge" (i.e. on the auto-commit created by a merge) might not work.
A background job in charge of monitoring new commits (and checking whether a commit is the result of a merge by looking at its parents: more than one parent means "merge") is more appropriate.
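A sketch of that parent check against the GitHub API (owner, repo, and token are placeholders):

# is_merge_commit.py -- sketch of the "more than one parent means merge" check
import requests

OWNER, REPO = "example", "app"   # placeholders
TOKEN = "ghp_..."                # placeholder personal access token
API = "https://api.github.com"


def is_merge_commit(sha):
    url = f"{API}/repos/{OWNER}/{REPO}/commits/{sha}"
    commit = requests.get(url, headers={"Authorization": f"token {TOKEN}"}).json()
    # A commit created by a merge has more than one parent
    return len(commit.get("parents", [])) > 1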