How to connect persistent storage to Google Cloud Build?

How to connect persistent storage to Google Cloud Build? - maven

I have a maven spring-boot project deployed on appengine that I am building and deploying using Google Cloud Build using the following builder image: https://github.com/strudeau/mvn-gcloud-builder
When performing a build, most of the time is spent downloading the plugins and dependencies from maven. I would like to be able to mount a persistent volume to this Docker image so as to be able to keep a persistent .M2 directory where my plugins and dependencies would be stored to avoid having them downloaded each time I do a build.
Google Cloud Filestore would probably be ideal if it weren't for the fact that you have to provision 1TB of data or more which becomes ridiculously expensive for a small non-production profit project.
Is there a way to mount a bucket as a filesystem on the docker image?
Can I mount a Google Persistent Disk?

You can't mount a bucket into the build, but you can copy your .M2 directory out to a bucket at the end of a build, then restore it at the beginning of a subsequent build.
I've lifted the example directly from the documentation, in case it disappears.
steps:
- name: gcr.io/cloud-builders/gsutil
args: ['cp', 'gs://mybucket/results.zip', 'previous_results.zip']
# operations that use previous_results.zip and produce new_results.zip
- name: gcr.io/cloud-builders/gsutil
args: ['cp', 'new_results.zip', 'gs://mybucket/results.zip']
Watch out when mixing this strategy with concurrent builds.

Related

Read properties file from an openshift directory

Apparently i have a basic spring boot application that i need to deploy in openshift. The openshift contains aan application.yml/application.properties in it in a var/config directory. After deployment of the application in openshift, I need to read the propert/yml file from the directory in my application. Is there any process how to do the same?

How are you building your container? Are you using S2I? Building the container yourself with Docker/podman? And what's "var/config" relative to? When you say "the OpenShift contains", what do you mean.
In short, an application running in an OpenShift container has access to whatever files you add to the container. There is effectively no such thing as an "OpenShift directory": you have access to what is in the container (and what is mounted to it). The whole point of containers is that you are limited to that.
So yourquestion probably boils down to, "how do I add a config file into my container". That will, however, depend on where the file is. Most tools for building containers will grab the standard places you'd have a config file, but if you have it somewhere non-standard you will have to copy it into the container yourself.
See this Spring Boot guide for how to use a Dockerfile to do it yourself. See here for using S2I to assemble the image (what you'd do if you were using the developer console of OpenShift to pull dir. (In general S2I tends to do what is needed automatically, but if your config is somewhere odd you may have to write an assemble script.)
This Dockerfile doc on copying files into a container might also be helpful.

How to prevent GitLab CI/CD from deleting the whole build

I'm currently having a frustrating issue.
I have a setup of GitLab CI on a VPS server, which is working completely fine, I have my pipelines running without a problem.
The issue comes after having to redo a pipeline. Each time GitLab deletes the whole folder, where the build is and builds it again to deploy it. My problem is that I have a "uploads" folder, that stores all user content, that was uploaded, and each time I redo a pipeline everything gets deleted from this folder and I obviously need this content, because it's the purpose of the app.
I have tried GitLab CI cache - no luck. I have also tried making a new folder, that isn't in the repository, it deletes it too.
Running my first job looks like so:
Job
As you can see there are a lot of lines, that says "Removing ..."

In order to persist a folder with local files while integrating CI pipelines, the best approach is to use Docker data persistency, as you'll be able to delete everything from the last build while keeping local files inside your application between your builds, while maintains the ability to start from stretch every time you start a new pipeline.
Bind-mount volumes
Volumes managed by Docker
GitLab's CI/CD Documentation provides a short briefing on how to persist storage between jobs when using Docker to build your applications.
I'd also like to point out that if you're using Gitlab Runner through SSH, they explicitly state they do not support caching between builds when using this functionality. Even when using the standard Shell executor, they highly discourage saving data to the Builds folder. so it can be argued that the best practice approach is to use a bind-mount volume to your host and isolate the application from the user uploaded data.

How can I update an Image in Google Artifact Registry?

When viewing Images in Google Cloud Platform's Artifact Registry, there is an "Updated" time column. However, whenever I build the same image and push it again, it creates a new image.
As part of a Cloud Build process, I am pulling this Ruby-based image, updating gems, then pushing it back to the Artifact Registry for use in later build steps (DB migration, unit tests). My hope is that upon updating the Ruby gems, nothing would happen in most cases, resulting in an identical Docker Image. In such a case, I'd expect no new layers to be pushed. However, every time I build, there is always a new layer pushed, and therefore a new Artifact.
Thus, the problem may be with how Cloud Build's gcr.io/cloud-builders/gsutil works rather than Artifact Registry itself. Here're my relevant build steps in case it matters:
- id: update_gems
name: 'gcr.io/cloud-builders/docker'
args: [ 'build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy',
'-f', 'docker/bundled.Dockerfile', '.' ]
- id: update_image
name: 'gcr.io/cloud-builders/docker'
args: [ 'push', 'us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy' ]
The first step refers to "bundled.Dockerfile" which has these contents:
FROM us-central1-docker.pkg.dev/{same project as above}/{my repo}/{my image}:deploy
WORKDIR /workspace
RUN bundle update
RUN bundle install
Is there a way to accomplish what I'm currently doing (ie update a Deploy-time container used to run rspec tests and run rake db:migrate without making new images every time we build? I assume those images are taking up space and I'm getting billed for it. I assume there's a way to "Update" an existing Image in the Artifact Registry since there is an "Updated" column.

You are not looking at container "images". You are looking at "layers" of an image. The combination of layers results in a container image. These can also be artifacts for Cloud Build, etc.
You cannot directly modify a layer in Artifact Registry. Any changes you make to the image creation will result in one or more layers changing while results in one or more new layers being created. Creating an image does usually does not result in all layers changing. Your new image is probably the result of old and new layers. Layers are cached in Artifact Registry for future images/builds.
More than one container image can use the same layers. If Google allowed you to modify individual layers, you would break/corrupt the resulting containers.

How to cache job results only for running pipeline?

I wrote a pipline to build my Java application with Maven. I have feature branches and master branch in my Git repository, so I have to separate Maven goal package and deploy. Therefore I created two jobs in my pipeline. Last job needs job results from first job.
I know that I have to cache the job results, but I don't want to
expose the job results to GitLab UI
expose it to the next run of the pipeline
I tried following solutions without success.
Using cache
I followed How to deploy Maven projects to Artifactory with GitLab CI/CD:
Caching the .m2/repository folder (where all the Maven files are stored), and the target folder (where our application will be created), is useful for speeding up the process by running all Maven phases in a sequential order, therefore, executing mvn test will automatically run mvn compile if necessary.
but this solution shares job results between piplines, see Cache dependencies in GitLab CI/CD:
If caching is enabled, it’s shared between pipelines and jobs at the project level by default, starting from GitLab 9.0. Caches are not shared across projects.
and also it should not be used for caching in the same pipeline, see Cache vs artifacts:
Don’t use caching for passing artifacts between stages, as it is designed to store runtime dependencies needed to compile the project:
cache: For storing project dependencies
Caches are used to speed up runs of a given job in subsequent pipelines, by storing downloaded dependencies so that they don’t have to be fetched from the internet again (like npm packages, Go vendor packages, etc.) While the cache could be configured to pass intermediate build results between stages, this should be done with artifacts instead.
artifacts: Use for stage results that will be passed between stages.
Artifacts are files generated by a job which are stored and uploaded, and can then be fetched and used by jobs in later stages of the same pipeline. This data will not be available in different pipelines, but is available to be downloaded from the UI.
Using artifacts
This solution is exposing the job results to the GitLab UI, see artifacts:
The artifacts will be sent to GitLab after the job finishes and will be available for download in the GitLab UI.
and there is no way to expire the cache after finishing the pipeline, see artifacts:expire_in:
The value of expire_in is an elapsed time in seconds, unless a unit is provided.
Is there any way to cache job results only for the running pipline?

There is no way to send build artifacts between jobs in GitLab that only keeps them as long as the pipeline is running. This is how GitLab has designed their CI solution.
The recommended way to send build artifacts between jobs in GitLab is to use artifacts. This feature always upload the files to the GitLab instance, that they call the coordinator in this case. These files are available through the GitLab UI, as you write. For most cases this is a complete waste of space, but in rare cases it is very useful as you can download the artifacts and check why your pipeline broke.
The artifacts are available for download by project members that are at least Reporters, but can be viewed by everybody if public pipelines is enabled. You can read more about permissions here.
To not fill up your hard disk or quotas, you should use an expire_in. You could set it to just a few hours if you really don't want to waste space. I would not recommend this though, as if a job that depend on these artifacts fails and you retry it, if the artifacts have expired, you will have to restart the whole pipeline. I usually put this to one week for intermediate build artifacts as that often fits my needs.
If you want to use caches for keeping build artifacts, maybe because your build artifacts are huge and you need to optimize it, it should be possible to use CI_PIPELINE_ID as the key of the cache (I haven't tested this):
cache:
key: ${CI_PIPELINE_ID}
The files in the cache should be stored where your runner is installed. If you make sure that all jobs that need these build artifacts are executed by runners that have access to this cache, it should work.
You could also try some of the other predefined environment variables as key our your cache.

Cache Data in Google Cloudbuild

I would like to use Google Cloudbuild to run integration tests. Currently, my tests take 30 minutes to run. The main bottleneck is that the tests query lots of data from external sources. I don't mind reusing the same data every time I run the tests. Is there a way for me to cache that data somewhere local to Cloudbuild so that it loads much faster?

There is a contributed cache cloud builder at https://github.com/GoogleCloudPlatform/cloud-builders-community/tree/master/cache that facilitates less anemic caching functionality into a GCS bucket.
I'd still love to see something more functional with more pre-fabbed cache rules like Travis CI has.

The only cache that I know in Cloud Build is Kaniko cache which allow to cache the layer of your container.
Cloud Build also have an internal cache for caching the "cloud builder" image, (the image that you set in the name of your steps). You can see that in your Cloud Build logs:
Starting Step #0
Step #0: Already have image (with digest): gcr.io/cloud-builders/gcloud
The only way that I see is to build a custom "cloud builder" container with all your static file in it. Cloud Build have to download it only once and it will be cached (I don't know the TTL). In any case, the download from GCR will be very quick.
However, when your files change, you have to rebuild it. This is a new CI pipeline in your project.

You could store a static copy of the requirements in a folder in GCS and use an rsync and GCP's internal bandwidth to pull the files before you execute your build. This is much quicker than pulling them from across the internet. Just add a step early in the build like this.
- name: gcr.io/cloud-builders/gsutil
args: ['rsync', '-r', 'gs://my-cache-bucket/repository', 'local-cache-dir']

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio