When viewing Images in Google Cloud Platform's Artifact Registry, there is an "Updated" time column. However, whenever I build the same image and push it again, it creates a new image.
As part of a Cloud Build process, I am pulling this Ruby-based image, updating gems, then pushing it back to Artifact Registry for use in later build steps (DB migration, unit tests). My hope was that updating the Ruby gems would usually change nothing, resulting in an identical Docker image; in that case, I'd expect no new layers to be pushed. However, every time I build, a new layer is pushed, and therefore a new artifact is created.
Thus, the problem may be with how Cloud Build's gcr.io/cloud-builders/docker step works rather than with Artifact Registry itself. Here are my relevant build steps in case it matters:
- id: update_gems
  name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy',
          '-f', 'docker/bundled.Dockerfile', '.' ]
- id: update_image
  name: 'gcr.io/cloud-builders/docker'
  args: [ 'push', 'us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy' ]
The first step refers to "bundled.Dockerfile" which has these contents:
FROM us-central1-docker.pkg.dev/{same project as above}/{my repo}/{my image}:deploy
WORKDIR /workspace
RUN bundle update
RUN bundle install
Is there a way to accomplish what I'm currently doing (i.e., update a deploy-time container used to run rspec tests and rake db:migrate) without making new images every time we build? I assume those images are taking up space and I'm being billed for them. I also assume there's a way to "update" an existing image in Artifact Registry, since there is an "Updated" column.
You are not looking at container "images". You are looking at "layers" of an image. The combination of layers results in a container image. These can also be artifacts for Cloud Build, etc.
You cannot directly modify a layer in Artifact Registry. Any change you make to the image creation results in one or more layers changing, which means one or more new layers being created. Creating an image usually does not result in all layers changing; your new image is probably the result of old and new layers. Layers are cached in Artifact Registry for future images/builds.
More than one container image can use the same layers. If Google allowed you to modify individual layers, you would break/corrupt the resulting containers.
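One way to lean on that layer cache from Cloud Build is Docker's --cache-from flag. The following is only a minimal sketch, reusing the registry path and placeholders from the question (the pull_previous step id is invented): pull the current :deploy image first, then build with --cache-from so unchanged steps resolve to existing layers and only genuinely new layers get pushed.
- id: pull_previous
  name: 'gcr.io/cloud-builders/docker'
  entrypoint: 'bash'
  # tolerate a missing image on the very first build
  args: [ '-c', 'docker pull us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy || exit 0' ]
- id: update_gems
  name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy',
          '--cache-from', 'us-central1-docker.pkg.dev/$PROJECT_ID/{my repo}/{my image}:deploy',
          '-f', 'docker/bundled.Dockerfile', '.' ]
Be aware that if the RUN bundle update layer is served from the cache, the gems are not actually re-resolved, so this only helps for builds where you genuinely expect nothing to change.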
I'm currently having a frustrating issue.
I have GitLab CI set up on a VPS server, which is working completely fine; my pipelines run without a problem.
The issue comes when I have to re-run a pipeline. Each time, GitLab deletes the whole folder where the build is and builds it again to deploy it. My problem is that I have an "uploads" folder that stores all user-uploaded content, and every time I re-run a pipeline everything in that folder gets deleted. I obviously need this content, because it's the purpose of the app.
I have tried the GitLab CI cache with no luck. I have also tried making a new folder that isn't in the repository; it deletes that too.
Running my first job looks like so:
Job
As you can see, there are a lot of lines that say "Removing ..."
In order to persist a folder of local files across CI pipeline runs, the best approach is to use Docker data persistence: you can delete everything from the last build while keeping local files used by your application between builds, and you still retain the ability to start from scratch every time you run a new pipeline. Docker gives you two options for this:
- Bind-mount volumes
- Volumes managed by Docker
GitLab's CI/CD Documentation provides a short briefing on how to persist storage between jobs when using Docker to build your applications.
I'd also like to point out that if you're running GitLab Runner through SSH, they explicitly state that caching between builds is not supported with that executor. Even with the standard Shell executor, they strongly discourage saving data to the Builds folder. So it can be argued that the best-practice approach is to use a bind-mount volume on the host and isolate the application from the user-uploaded data.
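As a rough illustration of the bind-mount idea (the service name, image, and paths below are invented for the example), a Compose file could keep the uploads directory on the host, outside anything the runner deletes:
services:
  app:
    image: my-app:latest            # hypothetical image name
    volumes:
      # host directory outside the runner's builds folder; it survives every pipeline re-run
      - /srv/app-data/uploads:/app/public/uploads
The application keeps writing to /app/public/uploads as usual, but the files physically live in /srv/app-data/uploads on the host, so wiping and rebuilding the checkout never touches them.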
I recently deployed a Spring Boot application to AWS using Docker.
I'm going crazy trying to update my image/container. I've tried deleting everything (used and unused containers, images, tags, etc.) and pushing everything again, docker system prune, docker rm, docker rmi, even using a different account... It still runs the old version of the project.
It all indicates that there's something going on at the server level. I'm using PuTTY.
Help is much appreciated.
What do you mean by old container? Are there changes you made in version control that were never applied to the container? Or have you tried just running docker restart <container id>?
There's a lot to unpack here, so if you can provide more information, that'd be good. But here's a likely common denominator.
If you're using any AWS service (EKS, Fargate, ECS), these just run the Docker image you provide them. So if your image isn't correctly updated, they won't update.
The same situation occurs with docker run: if you keep pointing to the same image (or the image is conceptually unchanged), then you won't see a change.
So I doubt the problem is within Docker or AWS.
Most likely:
- You're not rebuilding the Spring application binaries with each change
- Your Dockerfile is pulling in the wrong binaries
If you are using an image hosting service like ECR or Nexus, then you need to make sure the image name points to the correct image:tag combination. When you update the image, give it a unique tag, and then reference that tagged image from Docker/AWS.
After you build an image, you can verify that the correct binaries were copied by using docker export <container_id> | tar -xf - <location_of_binary_in_image_filesystem>
That will pull out the binary. Then you can run it locally to test if it's what you wanted.
You can view the entire filesystem with docker export <container_id> | tar -tf - | less
I would like to use Google Cloudbuild to run integration tests. Currently, my tests take 30 minutes to run. The main bottleneck is that the tests query lots of data from external sources. I don't mind reusing the same data every time I run the tests. Is there a way for me to cache that data somewhere local to Cloudbuild so that it loads much faster?
There is a contributed cache cloud builder at https://github.com/GoogleCloudPlatform/cloud-builders-community/tree/master/cache that provides basic caching functionality backed by a GCS bucket.
I'd still love to see something more functional with more pre-fabbed cache rules like Travis CI has.
The only cache that I know of in Cloud Build is the Kaniko cache, which allows you to cache the layers of your container image.
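For reference, a build step that turns on the Kaniko cache looks roughly like this (the destination image path is a placeholder):
steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args:
  - --destination=gcr.io/$PROJECT_ID/my-image:latest   # placeholder image path
  - --cache=true        # push the layer cache to the registry and reuse it on later builds
  - --cache-ttl=6h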
Cloud Build also has an internal cache for the "cloud builder" images (the images that you set in the name field of your steps). You can see this in your Cloud Build logs:
Starting Step #0
Step #0: Already have image (with digest): gcr.io/cloud-builders/gcloud
The only way that I see is to build a custom "cloud builder" container with all your static files in it. Cloud Build has to download it only once and it will be cached (I don't know the TTL). In any case, the download from GCR is very quick.
However, when your files change, you have to rebuild it. That means an additional CI pipeline in your project, along the lines of the sketch below.
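A minimal sketch of that extra pipeline, assuming the repository contains a Dockerfile that copies the static files into the builder image (the image name is made up):
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'gcr.io/$PROJECT_ID/test-data-builder', '.' ]
images: [ 'gcr.io/$PROJECT_ID/test-data-builder' ]
Later builds then reference gcr.io/$PROJECT_ID/test-data-builder in the name field of a step and find the static files already present in its filesystem.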
You could store a static copy of the requirements in a folder in GCS and use rsync and GCP's internal bandwidth to pull the files before you execute your build. This is much quicker than pulling them from across the internet. Just add a step early in the build like this:
- name: gcr.io/cloud-builders/gsutil
  args: ['rsync', '-r', 'gs://my-cache-bucket/repository', 'local-cache-dir']
I have a Maven Spring Boot project deployed on App Engine that I am building and deploying with Google Cloud Build, using the following builder image: https://github.com/strudeau/mvn-gcloud-builder
When performing a build, most of the time is spent downloading plugins and dependencies from Maven. I would like to be able to mount a persistent volume into this Docker image so I can keep a persistent .m2 directory where my plugins and dependencies are stored, to avoid downloading them every time I do a build.
Google Cloud Filestore would probably be ideal, if it weren't for the fact that you have to provision 1 TB of data or more, which becomes ridiculously expensive for a small non-production project.
Is there a way to mount a bucket as a filesystem on the docker image?
Can I mount a Google Persistent Disk?
You can't mount a bucket into the build, but you can copy your .M2 directory out to a bucket at the end of a build, then restore it at the beginning of a subsequent build.
I've lifted the example directly from the documentation, in case it disappears.
steps:
- name: gcr.io/cloud-builders/gsutil
  args: ['cp', 'gs://mybucket/results.zip', 'previous_results.zip']
# operations that use previous_results.zip and produce new_results.zip
- name: gcr.io/cloud-builders/gsutil
  args: ['cp', 'new_results.zip', 'gs://mybucket/results.zip']
Watch out when mixing this strategy with concurrent builds.
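Adapted to the Maven setup in the question, the same pattern might look roughly like the sketch below. The bucket name and paths are placeholders, and the standard mvn cloud builder stands in for the custom mvn-gcloud-builder image:
steps:
# restore the cached local repository from a bucket; tolerate a missing cache on the first run
- name: gcr.io/cloud-builders/gsutil
  entrypoint: bash
  args: ['-c', 'mkdir -p /workspace/.m2 && gsutil -m rsync -r gs://my-build-cache/m2 /workspace/.m2 || true']
# run the build with Maven's local repository pointed at the restored cache
- name: gcr.io/cloud-builders/mvn
  args: ['package', '-Dmaven.repo.local=/workspace/.m2']
# write the (possibly updated) cache back to the bucket for the next build
- name: gcr.io/cloud-builders/gsutil
  args: ['-m', 'rsync', '-r', '/workspace/.m2', 'gs://my-build-cache/m2']
The same concurrent-build caveat applies here.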
I know that an image consists of many layers.
For example, if you run docker history [image], you get a sequence of IDs; the ID at the top is the same as the image ID, and the rest are layer IDs.
In this case, do those remaining layer IDs correspond to some other images? If so, can I view a layer as an image?
Layers are what compose the file system for both Docker images and Docker containers.
It is thanks to layers that, when you pull an image, you often don't have to download its entire filesystem: if you already have another image that shares some of the layers of the image you're pulling, only the missing layers are actually downloaded.
Do those remaining layer IDs correspond to some other images?
Yes, they are just like images, but without any tag to identify them.
Can I view a layer as an image?
Yes.
Showcase:
docker pull busybox
docker history busybox
IMAGE          CREATED        CREATED BY                                      SIZE       COMMENT
d7057cb02084   39 hours ago   /bin/sh -c #(nop) CMD ["sh"]                    0 B
cfa753dfea5e   39 hours ago   /bin/sh -c #(nop) ADD file:6cccb5f0a3b3947116   1.096 MB
Now create a new container from layer cfa753dfea5e as if it was an image:
docker run -it cfa753dfea5e sh -c "ls /"
bin dev etc home proc root sys tmp usr var
Layers and images are not strictly synonymous, though.
https://windsock.io/explaining-docker-image-ids/
When you pull an image from Docker Hub, its layers show "<missing>" image IDs.
When you build images locally, those intermediate layers do have image IDs, but only until you push to Docker Hub: for other users pulling the image you uploaded, only the leaf image has an image ID.
From docker documentation:
A Docker image is a read-only template. For example, an image could contain an Ubuntu operating system with Apache and your web application installed. Images are used to create Docker containers. Docker provides a simple way to build new images or update existing images, or you can download Docker images that other people have already created. Docker images are the build component of Docker.
Each image consists of a series of layers. Docker makes use of union file systems to combine these layers into a single image. Union file systems allow files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system.
One of the reasons Docker is so lightweight is because of these layers. When you change a Docker image—for example, update an application to a new version— a new layer gets built. Thus, rather than replacing the whole image or entirely rebuilding, as you may do with a virtual machine, only that layer is added or updated. Now you don’t need to distribute a whole new image, just the update, making distributing Docker images faster and simpler.
The way I like to look at these things is like backup types. We can create a full backup and afterwards create incremental backups. The full backup is not changed (in some systems the full backup is updated after each incremental backup to decrease restore time, but we can ignore that case here); only the changes are backed up separately. So we can have different layers of backups, just as we have different layers of images.
EDIT:
View the following links for more information:
Docker image vs container
Finding the layers and layer sizes for each Docker image