tar: file changed as we read it - makefile

I am using make and tar for backups. When the makefile runs, the tar command reports "file changed as we read it". In this case:
the tar package is OK when the warning comes up,
but the warning stops the tar commands for the following backups;
the file named in the warning has in fact not changed, so it is really strange that the warning comes up;
the files named in the warning vary randomly: every time I run my makefile, different files trigger the warning.
--ignore-failed-read doesn't help. I am using tar 1.23 in MinGW.
I have just moved to a new computer running 64-bit Windows 7. The script worked well on the old 32-bit Windows 7 machine, but the tar version there was older than 1.23.
How can I stop tar's warning from aborting the backups that follow it?
Edit-2: it might be the reason
As I said above, the bash shell script worked well on my old computer. Compared with the old computer, the msys version is different, and so is the version of the tar command: tar is 1.13.19 on the old computer and 1.23 on the new one. I copied the old tar command to the new computer, without copying its dependency msys-1.0.dll, and renamed it tar_old. I then changed the tar command in the shell script accordingly and ran the script. Everything was OK. So it seems that the problem is the tar command. I am sure that no file changed while tar was running. Is it a bug in the new version of tar? I don't know.
Edit-1: add more details
The backup is invoked by a bash shell script. It scans the target directory, builds a makefile, and then invokes make, which uses the tar command for the backup. Below is a typical makefile built by the bash shell script.
#--------------------------------------------
# backup VC
#--------------------------------------------
# the program for packing
PACK_TOOL=tar
# the option for packing tool
PACK_OPTION=cjvf
# M$: C driver
WIN_C_DIR=c:
# M$: D driver
WIN_D_DIR=d:
# M$: where the software is
WIN_PRG_DIR=wuyu/tools
# WIN_PRG_DIR=
# where to save the backup files
BAKDIR=/home/Wu.Y/MS_bak_MSYS
VC_FRAMEWORK=/home/Wu.Y/MS_bak_MSYS/tools/VC/VC_framework.tar.bz2
VC_2010=/home/Wu.Y/MS_bak_MSYS/tools/VC/VC_2010.tar.bz2
.PHONY: all
all: $(VC_FRAMEWORK) $(VC_2010)
$(VC_FRAMEWORK): $(WIN_C_DIR)/$(WIN_PRG_DIR)/VC/Framework/*
	@$(PACK_TOOL) $(PACK_OPTION) "$@" --ignore-failed-read /c/$(WIN_PRG_DIR)/VC/Framework
$(VC_2010): $(WIN_C_DIR)/$(WIN_PRG_DIR)/VC/VS2010/*
	@$(PACK_TOOL) $(PACK_OPTION) "$@" --ignore-failed-read /c/$(WIN_PRG_DIR)/VC/VS2010
As you can see, the tar package is stored in ~/MS_bak_MSYS/tools/VC/VC_2010.tar.bz2. I run the script in ~/qqaa. ~/MS_bak_MSYS is excluded from tar command. So, the tar file I am creating is not inside a directory I am trying to put into tar file. This is why I felt it strange that the warning came up.

I also ran into the tar message "file changed as we read it". For me it occurred when I was making a tar file of a Linux file system in a bitbake build environment. The error was sporadic.
For me this was not due to creating the tar file in the same directory. I assume some file really was overwritten or changed during tar file creation.
The message is a warning, and tar still creates the tar file. We can suppress the warning message by setting the option
--warning=no-file-changed
(http://www.gnu.org/software/tar/manual/html_section/warnings.html)
Even so, the exit code returned by tar is still "1" in the warning case:
http://www.gnu.org/software/tar/manual/html_section/Synopsis.html
So if we are calling tar from some function in a script, we can handle the exit code something like this:
set +e
tar -czf sample.tar.gz dir1 dir2
exitcode=$?
if [ "$exitcode" != "1" ] && [ "$exitcode" != "0" ]; then
exit $exitcode
fi
set -e

Although it's very late, I recently had the same issue.
The issue is that the directory . changes, because xyz.tar.gz is created in it once the command starts running. There are two solutions:
Solution 1:
tar will not mind if the archive is created in a directory inside . (the current directory), as long as that directory is excluded. There can be reasons why you can't create the archive outside the workspace. I worked around it by creating a temporary directory to hold the archive:
mkdir artefacts
tar -zcvf artefacts/archive.tar.gz --exclude=./artefacts .
echo $?
0
Solution 2:
This one I like. Create the archive file before running tar:
touch archive.tar.gz
tar --exclude=archive.tar.gz -zcvf archive.tar.gz .
echo $?
0

If you want help debugging a problem like this you need to provide the make rule or at least the tar command you invoked. How can we see what's wrong with the command if there's no command to see?
However, 99% of the time an error like this means that you're creating the tar file inside a directory that you're trying to put into the tar file. So, when tar tries to read the directory it finds the tar file as a member of the directory, starts to read it and write it out to the tar file, and so between the time it starts to read the tar file and when it finishes reading the tar file, the tar file has changed.
So for example something like:
tar cf ./foo.tar .
There's no way to "stop" this, because it's not wrong. Just put your tar file somewhere else when you create it, or find another way (using --exclude or whatever) to omit the tar file.
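A minimal sketch of the "put your tar file somewhere else" option, assuming GNU tar and mktemp are available. The function name archive_outside and its two arguments are made up for illustration; it builds the archive in a temporary directory and only moves it to its final place after tar has finished reading:

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts):
#   archive_outside SRC_DIR DEST_FILE
# Builds the archive in a temporary directory, so tar never meets its own
# output while walking SRC_DIR, then moves it to DEST_FILE.
archive_outside() {
    src_dir=$1
    dest=$2
    tmp_dir=$(mktemp -d) || return 1

    # -C switches into src_dir first, so member names are relative (./...)
    tar -czf "$tmp_dir/archive.tar.gz" -C "$src_dir" .
    status=$?

    # Only publish the archive if tar reported full success.
    if [ "$status" -eq 0 ]; then
        mv "$tmp_dir/archive.tar.gz" "$dest" || status=$?
    fi
    rm -rf "$tmp_dir"
    return "$status"
}
```

With this, something like archive_outside ./project ./project/backup.tar.gz should be safe even though the final location is inside the tree, because the move happens after tar is done.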

Here is a one-liner for ignoring the tar exit status if it is 1. There is no need to set +e as in sandeep's script. If the tar exit status is 0 or 1, this one-liner will return with exit status 0. Otherwise it will return with exit status 1. This is different from sandeep's script where the original exit status value is preserved if it is different from 1.
tar -czf sample.tar.gz dir1 dir2 || [[ $? -eq 1 ]]

To enhance Fabian's one-liner; let us say that we want to ignore only exit status 1 but to preserve the exit status if it is anything else:
tar -czf sample.tar.gz dir1 dir2 || ( export ret=$?; [[ $ret -eq 1 ]] || exit "$ret" )
This does everything sandeep's script does, on one line.

Simply using an outer directory for the output solved the problem for me.
sudo tar czf ./../31OCT18.tar.gz ./

Exit codes for tar are limited, so you don't get too much information.
You can assume that exit code 1 is safe to ignore, but that assumption might trip you up, e.g. the gzip example in other posts (exit codes from external programs).
The reasons for the "file changed as we read it" error/warning can vary:
A log file inside the directory.
Writing to a tar file in the same directory you are trying to back up.
etc.
Possible workarounds can involve:
exclude known files (log files, tar-files, etc)
ensure log files are written to other directories
This can be quite involved, so you might want to still just run the tar command and preferably safely ignore some errors / warnings.
To do this you will have to:
Save the tar output.
Save the exit code
Check the output against known warnings and errors, not unlike tar's own ignore.
Conditionally pass another exit code to the next program in the pipe.
In OP's case this would have to be wrapped in a script and run as PACK_TOOL.
# List of errors and warnings from "tar" which we will safely ignore.
# Adapt to your findings and needs
IGNORE_ERROR="^tar:.*(Removing leading|socket ignored|file changed as we read it)"
# Save stderr from "tar"
RET=$(tar zcf $BACKUP --exclude Cache --exclude output.log --exclude "*cron*sysout*" $DIR 2>&1)
EC=$? # Save "tar's" exit code
echo "$RET"
if [ $EC -ne 0 ]
then
# Check the RET output, remove (grep -v) any errors / warning you wish to ignore
REAL_ERRORS=$(echo "$RET" | grep "^tar: " | grep -Ev "${IGNORE_ERROR:?}")
# If there is any output left you actually got an error to check
if [ -n "$REAL_ERRORS" ]
then
echo "ERROR during backup of ${DIR:?} to ${BACKUP:?}"
else
echo "OK backup of (warnings ignored) ${DIR:?}"
EC=0
fi
else
echo "OK backup of ${DIR:?}"
fi

It worked for me after adding a simple sleep of 20 seconds.
This can happen if your source directory is still being written to. Put in a sleep so that the backup finishes first, and then tar should work fine. This also helped me get the right exit status.
sleep 20
tar -czf ${DB}.${DATE}.tgz ./${DB}.${DATE}

I am not sure whether it suits you, but I noticed that tar does not fail on changed/deleted files in pipe mode. See what I mean.
Test script:
#!/usr/bin/env bash
set -ex
tar cpf - ./files | aws s3 cp - s3://my-bucket/files.tar
echo $?
Deleting random files manually...
Output:
+ aws s3 cp - s3://my-bucket/files.tar
+ tar cpf - ./files
tar: ./files/default_images: File removed before we read it
tar: ./files: file changed as we read it
+ echo 0
0
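One likely reason for the 0 above, offered as an assumption about this run rather than something the post confirms: after a pipeline, $? is the status of the last command (aws here), and set -e follows the same rule by default. So tar may well still have exited with 1. A small sketch with stand-in functions (fake_tar and fake_upload are hypothetical placeholders for tar and aws s3 cp):

```shell
#!/bin/sh
# Stand-ins for illustration only: fake_tar behaves like a tar that warns
# and exits 1, fake_upload like a consumer that reads everything and succeeds.
fake_tar() { echo "archive bytes"; return 1; }
fake_upload() { cat >/dev/null; }

status_file=$(mktemp)

# By default the status of a pipeline is that of its LAST command.
fake_tar | fake_upload
pipeline_status=$?

# Portable way to keep the first stage's status: write it to a file.
{ fake_tar; echo "$?" > "$status_file"; } | fake_upload
tar_status=$(cat "$status_file")
rm -f "$status_file"

echo "pipeline=$pipeline_status tar=$tar_status"
# prints: pipeline=0 tar=1
```

In bash specifically, set -o pipefail or ${PIPESTATUS[0]} gives access to the first stage's status without the temporary file.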

Answer should be very simple: Don't save your tar file while "Taring" it in the same directory.
Just do: tar -cvzf resources/docker/php/php.tar.gz .
Eventually,
it will tar the current directory and save it to another directory.
That's easy peasy, lemon squeezy fellas

Related

Bash script to check if a new file has been created on a directory after run a command

By using bash script, I'm trying to detect whether a file has been created on a directory or not while running commands. Let me illustrate the problem;
#!/bin/bash
# give base directory to watch file changes
WATCH_DIR=./tmp
# get list of files on that directory
FILES_BEFORE= ls $WATCH_DIR
# actually a command is running here but lets assume I've created a new file there.
echo >$WATCH_DIR/filename
# and I'm getting new list of files.
FILES_AFTER= ls $WATCH_DIR
# detect changes and if any changes has been occurred exit the program.
After that I've just tried to compare these FILES_BEFORE and FILES_AFTER however couldn't accomplish that. I've tried;
comm -23 <($FILES_AFTER |sort) <($FILES_BEFORE|sort)
diff $FILES_AFTER $FILES_BEFORE > /dev/null 2>&1
cat $FILES_AFTER $FILES_BEFORE | sort | uniq -u
None of them gave me a result to understand there is a change or not. What I need is detecting the change and exiting the program if any. I am not really good at this bash script, searched a lot on the internet however couldn't find what I need. Any help will be appreciated. Thanks.
Thanks to informative comments, I've just realized that I'd missed the basics of bash scripting, but I finally made it work. I'll leave my solution here as an answer for those who struggle like me:
WATCH_DIR=./tmp
FILES_BEFORE=$(ls $WATCH_DIR)
echo >$WATCH_DIR/filename
FILES_AFTER=$(ls $WATCH_DIR)
if diff <(echo "$FILES_AFTER") <(echo "$FILES_BEFORE")
then
echo "No changes"
else
echo "Changes"
fi
It outputs "Changes" on the first run and "No changes" on later runs, unless you delete the newly added files.
I'm trying to interpret your script (which contains some errors) into an understanding of your requirements.
I think the simplest way is to redirect the ls command output to named files and then diff those files:
#!/bin/bash
# give base directory to watch file changes
WATCH_DIR=./tmp
# get list of files on that directory
ls $WATCH_DIR > /tmp/watch_dir.before
# actually a command is running here but lets assume I've created a new file there.
echo >$WATCH_DIR/filename
# and I'm getting new list of files.
ls $WATCH_DIR > /tmp/watch_dir.after
# detect changes and if any changes has been occurred exit the program.
diff -c /tmp/watch_dir.after /tmp/watch_dir.before
If any files are modified by the 'commands', i.e. a file exists in the 'before' list but its contents change, the above will not show that as a difference.
In this case you might be better off using a 'marker' file, created to mark the instant the monitoring started, and then using the find command to list any files created or modified since the marker file. Something like this:
#!/bin/bash
# give base directory to watch file changes
WATCH_DIR=./tmp
# get list of files on that directory
ls $WATCH_DIR > /tmp/watch_dir.before
# actually a command is running here but lets assume I've created a new file there.
echo >$WATCH_DIR/filename
# and I'm getting new list of files.
find $WATCH_DIR -type f -newer /tmp/watch_dir.before -exec ls -l {} \;
What this won't do is show any files that were deleted, so perhaps a hybrid list could be used.
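Such a hybrid could look roughly like the sketch below, which compares sorted name listings with comm for added and deleted entries and uses find -newer for modified ones. The function names and the snapshot/marker file names are invented for this example; it assumes file names without newlines and a find that supports -maxdepth (GNU and BSD find do):

```shell
#!/bin/sh
# Hypothetical helpers; names and layout are illustrative only.

# snapshot DIR STATE_DIR: record the current names and drop a time marker.
snapshot() {
    ls -A "$1" | sort > "$2/before.list"
    touch "$2/marker"
}

# report_changes DIR STATE_DIR: print added, deleted and modified entries.
report_changes() {
    ls -A "$1" | sort > "$2/after.list"
    # comm -13: lines only in the second file (new names)
    comm -13 "$2/before.list" "$2/after.list" | sed 's/^/added: /'
    # comm -23: lines only in the first file (names that disappeared)
    comm -23 "$2/before.list" "$2/after.list" | sed 's/^/deleted: /'
    # files newer than the marker that already existed in the snapshot
    find "$1" -maxdepth 1 -type f -newer "$2/marker" -exec basename {} \; |
        sort | comm -12 - "$2/before.list" | sed 's/^/modified: /'
}
```

Call snapshot before the commands run and report_changes afterwards. The state directory should live outside the watched directory so the helper files do not show up as changes themselves.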
Here is how I got it to work. It's also set up so that you can watch multiple directories with the same script via cron.
For example, if you wanted one to run every minute:
* * * * * /usr/local/bin/watchdir.sh /makepdf
and one every hour:
0 * * * * /usr/local/bin/watchdir.sh /incoming
#!/bin/bash
WATCHDIR="$1"
NEWFILESNAME=.newfiles$(basename "$WATCHDIR")
if [ ! -f "$WATCHDIR"/.oldfiles ]
then
ls -A "$WATCHDIR" > "$WATCHDIR"/.oldfiles
fi
ls -A "$WATCHDIR" > $NEWFILESNAME
DIRDIFF=$(diff "$WATCHDIR"/.oldfiles $NEWFILESNAME | cut -f 2 -d " ")
for file in $DIRDIFF
do
if [ -e "$WATCHDIR"/$file ];then
#do what you want to the file(s) here
echo $file
fi
done
rm $NEWFILESNAME

how to unzip a file using unzip command?

I have a script which creates a folder named "data". Then it downloads files using wget, and these files (.zip format) are moved from the current directory to the folder "data". After that, I want to unzip these files. I'm using unzip filename.zip and it works when I use it on the command line, however I don't know why it's not working in the script.
Here is the script:
#!/bin/bash
mkdir data
wget http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip && mv datos_abiertos_covid19.zip data && unzip datos_abiertos_covid19.zip
wget http://187.191.75.115/gobmx/salud/datos_abiertos/diccionario_datos_covid19.zip && mv diccionario_datos_covid19.zip data && unzip diccionario_datos_covid19.zip
datos_abiertos_covid19.zip and diccionario_datos_covid19.zip are the files I want to unzip once they are in my folder "data". I would really appreciate if someone can help me. Thanks in advance!
It fails because unzip foo.zip assumes foo.zip is in the current directory, but you just moved it to a subdirectory data. Interactively, you probably cd data first and that's why it works.
To make it work in your script, just have your script cd data as well:
#!/bin/bash
mkdir data
cd data || exit 1
wget http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip && unzip datos_abiertos_covid19.zip
That way, the file is downloaded directly to the data directory so no mv is necessary, and the unzip command works as expected.
My approach:
#!/bin/bash
set -e # Exit if any command fails
mkdir data
pushd ./data >/dev/null
for i in 'datos_abiertos_covid19.zip' 'diccionario_datos_covid19.zip'; do
# Don't unzip (or exit) if 'wget' fails, don't exit if 'unzip' fails
wget "http://187.191.75.115/gobmx/salud/datos_abiertos/$i" -O "./$i" || continue
unzip "./$i" || true
done
popd >/dev/null
The file names don't need to be quoted in this case, but I did so anyway, to emphasise you can/should do so if necessary
You could of course use variables for the file list, URL, download dir, etc. if you wanted to build a more general script for downloading zip files
I know it's marked bash, but worth mentioning: pushd and popd are not defined in POSIX, you can change those to cd ./data and cd .. for more portability. Obviously wget is not POSIX either, but very common (see this thread for interesting info on that topic)

Extracting certain files from a tar archive on a remote ssh server

I am running numerous simulations on a remote server (via ssh). The outcomes of these simulations are stored as .tar archives in an archive directory on this remote server.
What I would like to do, is write a bash script which connects to the remote server via ssh and extracts the required output files from each .tar archive into separate folders on my local hard drive.
These folders should have the same name as the .tar file from which the files come (To give an example, say the output of simulation 1 is stored in the archive S1.tar on the remote server, I want all '.dat' and '.def' files within this .tar archive to be extracted to a directory S1 on my local drive).
For the extraction itself, I was trying:
for f in *.tar; do
(
mkdir ../${f%.tar}
tar -x -f "$f" -C ../${f%.tar} "*.dat" "*.def"
)
done
wait
Every .tar file is around 1GB and there is a lot of them. So downloading everything takes too much time, which is why I only want to extract the necessary files (see the extensions in the code above).
Now the code works perfectly when I have the .tar files on my local drive. However, what I can't figure out is how I can do it without first having to download all the .tar archives from the server.
When I first connect to the remote server via ssh username@host, the terminal stops running the script and just connects to the server.
Btw I am doing this in VS Code and running the script through terminal on my MacBook.
I hope I have described it clear enough. Thanks for the help!
Stream the results of tar back with filenames via SSH
To get the data you wish to retrieve from .tar files, you'll need to pass the results of tar to a string of commands with the --to-command option. In the example below, we'll run three commands.
# Send the files name back to your shell
echo $TAR_FILENAME
# Send the contents of the file back
cat /dev/stdin
# Send EOF (Ctrl+d) back (note: since we're already in a $'' we don't use the $ again)
echo '\004'
Once the information is captured in your shell, we can start to process the data. This is a three-step process.
Get the file's name
note that, in this code, we aren't handling directories at all (simply stripping them away; i.e. dir/1.dat -> 1.dat)
you can write code to create directories for the file by replacing the forward slashes / with spaces and iterating over each directory name but that seems out-of-scope for this.
Check for the EOF (end-of-file)
Add content to file
# Get the files via ssh and tar
files=$(ssh -n <user@server> $'tar -xf <tar-file> --wildcards \'*\' --to-command=$\'echo $TAR_FILENAME; cat /dev/stdin; echo \'\004\'\'')
# Keeps track of what state we're in (filename or content)
state="filename"
filename=""
# Each line is one of these:
# - file's name
# - file's data
# - EOF
while read line; do
if [[ $state == "filename" ]]; then
filename=${line/*\//}
touch $filename
echo "Copying: $filename"
state="content"
elif [[ $state == "content" ]]; then
# look for EOF (ctrl+d)
if [[ $line == $'\004' ]]; then
filename=""
state="filename"
else
# append data to file
echo $line >> <output-folder>/$filename
fi
fi
# Double quotes here are very important
done < <(echo -e "$files")
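The directory handling that the steps above leave out does not really need the slash-to-space trick; mkdir -p on the member's parent path should be enough. A minimal sketch, where save_member and its arguments are hypothetical names, and absolute paths or .. components in member names are not guarded against:

```shell
#!/bin/sh
# Hypothetical helper: append one line of content to OUT_DIR/MEMBER_PATH,
# creating intermediate directories (e.g. dir/1.dat -> OUT_DIR/dir/1.dat).
save_member() {
    out_dir=$1
    member=$2
    line=$3
    target="$out_dir/$member"
    # Create the parent directories of the target file if they are missing.
    mkdir -p "$(dirname "$target")" || return 1
    printf '%s\n' "$line" >> "$target"
}
```

In the loop above, this would stand in for the line that strips the directory part of the name and for the echo that appends to the output file.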
Alternative: tar + scp
If the above example seems overly complex for what it's doing, it is. An alternative that touches the disk more and requires two separate ssh connections would be to extract the files you need from your .tar file to a folder and scp that folder back to your workstation.
ssh -n <username>@<server> 'mkdir output/; tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"'
scp -r <username>@<server>:output/ ./
The breakdown
First, we'll make a place to keep our outputted files. You can skip this if you already know the folder they'll be in.
mkdir output/
Then, we'll extract the matching files to this folder we created (if you don't want them to be in a different folder remove the -C output/ option).
tar -C output/ -xf <tar-file> --wildcards "*.dat" "*.def"
Lastly, now that we're running commands on our machine again, we can run scp to reconnect to the remote machine and pull the files back.
scp -r <username>@<server>:output/ ./

Short tar script: 'command not found' when trying to add today's date to a compressed file name

I'm trying to create a script that will do the following:
create /home/testuser/backup as a directory if it doesn't exist (and won't show an error message if it does exist)
obtain the current date and store it as a variable
Using Tar:
backup the entire projectfiles directory
backup is compressed, in gzip format, in archive format
uses the stored variable to include the date in the tar filename
the backup goes to the /home/testuser/backup directory
create a log file called testuser.log with all messages generated by the tar command (using verbose mode)
save the log file in /home/testuser/backup/testuser.log
I'm having trouble with the command syntax and I don't quite understand what I'm doing wrong.
cd /home/testuser
mkdir -p /home/testuser/backup
today=$(date'+%d-%m-%y')
tar -zcvf testuserbackup-$today.tar.gz projectfiles &&
testuserbackup-$today.tar.gz /home/testuser/backup
testuserbackup-$today.tar.gz >> testuser.log 2>/dev/null
mv testuser.log /home/testuser/backup
When I try to run the script I get the following terminal output:
./script2.sh: line 6: date+%d-%m-%y: command not found
projectfiles/
projectfiles/budget/
projectfiles/budget/testuserbudget1.txt
projectfiles/budget/testuserbudget2.txt
projectfiles/old/
projectfiles/old/testuserold2.txt
projectfiles/old/testuserold1.txt
projectfiles/documents/
projectfiles/documents/testuserdoc2.txt
projectfiles/documents/testuserdoc1.txt
./script2.sh: line 7: testuserbackup-.tar.gz: command not found
I'm open to any suggestions. This task is from an old assignment from last semester that I'm revisiting for fun...
According to my old assignment notes this task should be able to be done in no more than 4 lines of code.
EDIT: Finished script (with assistance of John)
#!/bin/bash
mkdir -p /home/testuser/backup
today=$(date '+%d-%m-%y')
tar -zcvf backup/testuserbackup-"$today".tar.gz projectfiles > backup/testuser.log 2>&1
You're missing a space:
today=$(date '+%d-%m-%y')
# ^
Additionally, these lines should all be combined:
tar -zcvf testuserbackup-$today.tar.gz projectfiles &&
testuserbackup-$today.tar.gz /home/testuser/backup
testuserbackup-$today.tar.gz >> testuser.log 2>/dev/null
mv testuser.log /home/testuser/backup
The log file needs to be created in the same line as the tar command, and making the tarball and the log file show up in the right location can be done by writing out their full paths. That gets rid of the need to move them later.
tar -zcvf backup/testuserbackup-"$today".tar.gz projectfiles > backup/testuser.log 2>&1
It's a good idea to capture stderr as well as stdout, so I changed 2>/dev/null to 2>&1.
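The order of the two redirections matters here. A small sketch with a stand-in function (noisy is hypothetical and simply writes one line to each stream) shows the difference:

```shell
#!/bin/sh
# Stand-in for a verbose command: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_dir=$(mktemp -d)

# stdout goes to the file first, then stderr is pointed at the same place:
# both lines end up in the log.
noisy > "$log_dir/both.log" 2>&1

# Reversed order: stderr is duplicated onto the ORIGINAL stdout before
# stdout is redirected, so only the stdout line reaches the file.
noisy 2>&1 > "$log_dir/stdout_only.log"

# Count the lines that reached each log file.
both_lines=$(grep -c '' "$log_dir/both.log")
stdout_only_lines=$(grep -c '' "$log_dir/stdout_only.log")
```

That is why the tar command writes > backup/testuser.log before 2>&1.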

Shell Script to redirect to different directory and create a list file

src_dir="/export/home/destination"
list_file="client_list_file.txt"
file=".csv"
echo "src directory="$src_dir
echo "list_file="$list_file
echo "file="$file
cd /export/home/destination
touch $list_file
x=`ls *$file | sort >$list_file`
if [ -s $list_file ]
then
echo "List File is available, archiving now"
y=`tar -cvf mystuff.tar $list_file`
else
echo "List File is not available"
fi
The above script is working fine: it creates a list file of all .csv files and tars it.
However, I am trying to run the script from a different directory, so it should go to the destination directory, make a list file of all the .csv files there, and make a .tar from the list file (i.e. archive the list file).
So I am not sure what to change.
There are a lot of pitfalls in filename handling. The one thing you should know is that file naming under POSIX is messy: commands like ls or find may not return the expected result (though 99% of the time they will). So here is what you have to do to get the list of files reliably:
for file in "$src_dir"/*.csv; do
basename "$file" >> "$src_dir/$list_file"
done
tar cvf "$src_dir/mystuff.tar" "$src_dir/$list_file"
Maybe you should learn bash more seriously and try searching first before asking a question on SO next time.
http://www.gnu.org/software/bash/manual/html_node/index.html#SEC_Contents
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
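For the original question (running the script from some other directory), a sketch along these lines may be enough. It reuses the file names from the question's script, wraps the work in a function whose name make_list_archive is invented here, and runs in a subshell so the caller's working directory is left alone:

```shell
#!/bin/sh
# Hypothetical helper: make_list_archive SRC_DIR LIST_FILE
# Writes the names of all .csv files in SRC_DIR to SRC_DIR/LIST_FILE and
# archives that list file as SRC_DIR/mystuff.tar.
# The ( ... ) body runs in a subshell, so the cd does not leak out.
make_list_archive() (
    cd "$1" || exit 1
    list_file=$2
    : > "$list_file"                    # start with an empty list
    for f in *.csv; do
        [ -e "$f" ] || continue         # the pattern matched nothing
        printf '%s\n' "$f" >> "$list_file"
    done
    if [ -s "$list_file" ]; then
        echo "List File is available, archiving now"
        tar -cf mystuff.tar "$list_file"
    else
        echo "List File is not available"
        exit 1
    fi
)
```

Called as make_list_archive /export/home/destination client_list_file.txt, it should behave the same from any starting directory.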

Resources