How to prevent GitLab CI/CD from deleting the whole build - continuous-integration

I'm currently having a frustrating issue.
I have GitLab CI set up on a VPS, and it works fine: my pipelines run without a problem.
The issue appears when I have to re-run a pipeline. Each time, GitLab deletes the whole folder that contains the build and builds it again to deploy it. My problem is that I have an "uploads" folder that stores all user-uploaded content. Each time I re-run a pipeline, everything in this folder gets deleted, and I obviously need this content because it is the purpose of the app.
I have tried the GitLab CI cache, with no luck. I have also tried creating a new folder that isn't in the repository, but it gets deleted too.
The log of my first job looks like this:
(screenshot of the job log)
As you can see, there are a lot of lines that say "Removing ..."

To persist a folder of local files across CI pipelines, the best approach is to use Docker data persistence. You can delete everything from the last build while keeping the local files of your application between builds, and you still start from scratch every time a new pipeline runs. Docker offers two ways to do this:
Bind-mount volumes
Volumes managed by Docker
GitLab's CI/CD Documentation provides a short briefing on how to persist storage between jobs when using Docker to build your applications.
I'd also like to point out that if you're using GitLab Runner through SSH, the documentation explicitly states that caching between builds is not supported with this executor. Even with the standard Shell executor, saving data to the builds folder is highly discouraged. So it can be argued that the best practice is to use a bind-mount volume on your host and isolate the application from the user-uploaded data.
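As a rough illustration of the bind-mount approach, a Compose file along these lines keeps the uploads outside the directory that the runner wipes. This is only a sketch: the service name, image name, port and both paths (/srv/myapp/uploads on the host, /app/uploads in the container) are assumptions, not taken from the question.

```yaml
# docker-compose.yml (sketch; all names and paths are hypothetical)
services:
  app:
    image: registry.example.com/myapp:latest   # hypothetical image built by the pipeline
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      # Bind mount: host directory on the left, container path on the right.
      # The host directory lives outside the runner's builds folder,
      # so re-running a pipeline does not touch it.
      - /srv/myapp/uploads:/app/uploads
```

A volume managed by Docker would look much the same, with a named volume (declared under a top-level volumes key) in place of the host path.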

Related

How to cache job results only for running pipeline?

I wrote a pipeline to build my Java application with Maven. I have feature branches and a master branch in my Git repository, so I have to separate the Maven goals package and deploy. Therefore I created two jobs in my pipeline. The last job needs the job results from the first job.
I know that I have to cache the job results, but I don't want to
expose the job results to GitLab UI
expose it to the next run of the pipeline
I tried following solutions without success.
Using cache
I followed How to deploy Maven projects to Artifactory with GitLab CI/CD:
Caching the .m2/repository folder (where all the Maven files are stored), and the target folder (where our application will be created), is useful for speeding up the process by running all Maven phases in a sequential order, therefore, executing mvn test will automatically run mvn compile if necessary.
but this solution shares job results between pipelines, see Cache dependencies in GitLab CI/CD:
If caching is enabled, it’s shared between pipelines and jobs at the project level by default, starting from GitLab 9.0. Caches are not shared across projects.
and also it should not be used for caching in the same pipeline, see Cache vs artifacts:
Don’t use caching for passing artifacts between stages, as it is designed to store runtime dependencies needed to compile the project:
cache: For storing project dependencies
Caches are used to speed up runs of a given job in subsequent pipelines, by storing downloaded dependencies so that they don’t have to be fetched from the internet again (like npm packages, Go vendor packages, etc.) While the cache could be configured to pass intermediate build results between stages, this should be done with artifacts instead.
artifacts: Use for stage results that will be passed between stages.
Artifacts are files generated by a job which are stored and uploaded, and can then be fetched and used by jobs in later stages of the same pipeline. This data will not be available in different pipelines, but is available to be downloaded from the UI.
Using artifacts
This solution is exposing the job results to the GitLab UI, see artifacts:
The artifacts will be sent to GitLab after the job finishes and will be available for download in the GitLab UI.
and there is no way to expire the artifacts as soon as the pipeline finishes, see artifacts:expire_in:
The value of expire_in is an elapsed time in seconds, unless a unit is provided.
Is there any way to cache job results only for the running pipeline?
There is no way to send build artifacts between jobs in GitLab that only keeps them as long as the pipeline is running. This is how GitLab has designed their CI solution.
The recommended way to send build artifacts between jobs in GitLab is to use artifacts. This feature always uploads the files to the GitLab instance, which is called the coordinator in this context. These files are available through the GitLab UI, as you write. In most cases this is a complete waste of space, but occasionally it is very useful, as you can download the artifacts and check why your pipeline broke.
The artifacts are available for download by project members that are at least Reporters, but can be viewed by everybody if public pipelines is enabled. You can read more about permissions here.
To avoid filling up your hard disk or quotas, you should set expire_in. You could set it to just a few hours if you really don't want to waste space. I would not recommend this though: if a job that depends on these artifacts fails and you retry it after the artifacts have expired, you will have to restart the whole pipeline. I usually set this to one week for intermediate build artifacts, as that often fits my needs.
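A minimal sketch of that setup for the two Maven jobs in the question could look like the following. The job and stage names are made up for illustration, and the sketch assumes the runner already has Maven available.

```yaml
# .gitlab-ci.yml (sketch; job and stage names are hypothetical)
stages:
  - build
  - deploy

package:
  stage: build
  script:
    - mvn package
  artifacts:
    paths:
      - target/          # build output handed to later stages
    expire_in: 1 week    # removed from GitLab after one week

deploy_master:
  stage: deploy
  script:
    - mvn deploy         # artifacts from the build stage are downloaded first
  only:
    - master
```

Jobs in later stages download the artifacts of earlier stages by default, so deploy_master finds the target folder without extra configuration.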
If you want to use caches for keeping build artifacts, maybe because your build artifacts are huge and you need to optimize it, it should be possible to use CI_PIPELINE_ID as the key of the cache (I haven't tested this):
cache:
  key: ${CI_PIPELINE_ID}
  paths:
    - target/
The files in the cache should be stored where your runner is installed. If you make sure that all jobs that need these build artifacts are executed by runners that have access to this cache, it should work.
You could also try some of the other predefined environment variables as the key of your cache.

Deploy code from gitlab on ec2 WITHOUT .gitlab-ci.yml file

I am using gitlab as my repository and want to push my code to ec2 whenever a commit is made on gitlab. The gitlab CI/CD documentation states that I have to add a file .gitlab-ci.yml at the root directory of my repo. This is a problem for me, because I want the project repo to contain only code and no configuration-related info such as build and deploy settings. Also, anybody who clones the repo would see the location where my code is deployed on ec2. Is there any workaround for this problem?
You'll need to use a .gitlab-ci.yml file to deploy your application. The file provides instructions and a pipeline "infrastructure" which, if properly configured, will build, test and automatically deploy your code.
If you are worried about leaking credentials, you should use CI/CD variables, defined in the project or instance settings and optionally masked in job logs, for your important bits, like "$SERVERNAME" or "$DB_PASSWORD" for instance.
Lastly, you can use the power of .gitignore to keep files that contain credentials or other sensitive bits out of the repository.
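To make the CI/CD variables suggestion more concrete, a deploy job might reference them roughly like this. It is a sketch under assumptions: $DEPLOY_USER, the target path and the use of rsync over SSH are hypothetical, and the runner is assumed to already have SSH access to the server.

```yaml
# .gitlab-ci.yml (sketch; variable names, user and target path are hypothetical)
deploy_to_ec2:
  stage: deploy
  script:
    # $SERVERNAME and $DEPLOY_USER are defined under Settings > CI/CD > Variables,
    # so their values never appear in the repository.
    - rsync -az --exclude .git ./ "$DEPLOY_USER@$SERVERNAME:/var/www/myapp/"
  only:
    - master
```

Anyone cloning the repository then sees only the variable names, not the server address or credentials.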

CI based on docker-compose?

I am currently building a little application that requires some massively annoying software to be installed and running in the background. To ease the pain of developing, I wrote a set of docker-compose files that run the necessary daemons, create some jobs, and throw in some test data.
Now, I'd like to run this in a CI-like manner. I currently have Jenkins check all the different repositories and execute a shell script that calls docker-compose up --abort-on-container-exit. That gets the job done, but it seems like a hack, and I'm not such a huge fan of Jenkins.
What I want to ask is: is there a more beautiful way of doing this? Specifically, is there a CI that will
watch a set of git repositories,
re-execute docker-compose (possibly multiple times with different sets of parameters), and
nicely collect and split the logs and tell me which container exactly failed how?
(Optionally) is not some cloud service but installable on my local server?
If the answer to this is "write a Jenkins module", then fine, so be it.
I'm aware that there are options like gitlab-ci, but I'd like to keep the CI script in a fashion that can also be easily executed during development, before pushing to a repo.

gcloud automatic redeployment Golang app

I have a Golang app running on Google Cloud App Engine that I can update manually with "gcloud app deploy" but I cannot figure out how to schedule automatic redeployments. I'm assuming I have to use cron.yaml, but then I'm confused about what url to use. Basically it's just a web app with one main index.html page with changing content, and I would like to schedule automatic redeployments... how do I have to go about that?
If you want to automatically re-deploy your app when the code changes, you need what's called CI/CD (Continuous integration/deployment). What a CI does is, for each new commit to your repository, check out the new code and run a test script. If all the tests pass (or if you don't have any tests at all), the CI server can then deploy your code to App Engine, all automatically.
One CI provider that is free for open-source projects is Travis CI. To configure it, you need to create an account with Travis and add a file called .travis.yml to the root of your repository. To set up App Engine deploys, you can follow this guide to set up a service account and add the encrypted key file to your repo. Travis will then run gcloud app deploy from a container on their servers whenever you push code to a certain branch (master by default) in your repo.
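For orientation, such a .travis.yml might look roughly like the sketch below. The key file name, the project ID and the encrypted_* variable names are placeholders (Travis generates the real variable names when the key file is encrypted), so treat this as an assumption-laden outline and check the Travis documentation for the exact options.

```yaml
# .travis.yml (sketch; file names, project ID and encrypted_* names are placeholders)
language: go

before_install:
  # Decrypt the service account key that was encrypted with `travis encrypt-file`.
  - openssl aes-256-cbc -K $encrypted_XXXXXXXXXXXX_key -iv $encrypted_XXXXXXXXXXXX_iv -in client-secret.json.enc -out client-secret.json -d

deploy:
  provider: gae                  # Travis' Google App Engine deployment provider
  keyfile: client-secret.json    # decrypted service account key
  project: my-project-id         # hypothetical Google Cloud project ID
```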
Another option, which avoids setting up CI at all, is to simply change your app to generate the dynamic parts of the page when it gets requested. Reading the documentation for html/template would point you in the right direction.

Heroku: Can I commit remotely

We have a CMS on heroku, some files were generated by the CMS, how can I pull those changes down? Can I commit the changes remotely and pull them down? Is there an FTP option of some kind?
See: https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem
It's not designed for persistent file generation and usage.
In practice, it works like this: a user puts some code into a repository. That code is dynamically pulled into temporary Amazon EC2 instances and executed. The code can be moved from virtual machine to virtual machine, node to node, without disruption, across data centers. There is no real "place" to get the products of your code from the environment, because anything generated by the checked-out code can (and will) be destroyed as your deployed code skips around between the temporary machines.
That being said, there are some workarounds:
If your app includes something like a file browser within your deployed code, you can grab the (entirely) temporary files using that file browser, and commit them back to your persistent code trunk.
Another option is using something like S3 for your persistent storage, with your application reading from, and writing to, a data storage service, knowing that while heroku will just re-write and destroy your local data on a frequent basis, the external service will maintain the files.
Similarly, you can change your application to use heroku's postgres for persistent data storage, or use Amazon's RDS, (etc.).
Alternately, you can edit your application in such a way as to ensure that any files generated by it will be regenerated every time the code is refreshed, redeployed, and moved around.
