Shell script to copy the files - shell

I worked very little with scripts and I don't know..
I need to create a script (in Ubuntu) that copies only files where a certain user modified more than 20 lines at a given time.
I know that to copy a file elsewhere I use this code:
$ ls dir1/
dir2/
$ cp -r dir1/ dir1.copy
$ ls dir1.copy
dir2/
And to count lines: wc -l file1
But how could I check if a user has modified more than 20 lines in a file (eg a simple txt, for example today)?
Thank you in advance !

In the first place, if by "modifying lines in a file" you mean "adding lines to a file", then you can do something about it. If you are literally talking about modifying lines in files, there is nothing you can do to track that activity without setting up some version control first.
So, assuming we are talking about files in which your users will be adding lines, a workaround for that may consist of setting up some scheduled tasks to check the line numbers of those files "at a given time" (as you said) and compare that value to a previous result, and then if there are more than 20 additional lines than from the last value, copy the files elsewhere.
First things first, counting the lines of your files is something you have already mentioned and it is right: I will propose using wc -l too.
Once here you will need two things: one place (tipically a file) to periodically save the number of lines of your files at a given time and one trigger that would start copying the files in case there have been more than 20 lines added.
So for example, in this case you can set up a cron job, like this (i.e. to run every hour):
0 */1 * * * cat ${FILE} |wc -l > /tmp/${FILE}_counter
That one will check the number of lines of a given file and send the output to a temporary file that we will be using soon. In case you have multiple files you can easily script that and make a loop, like this:
#!/bin/bash
for FILE in file1 file2 file3; do
cat ${FILE} |wc -l > /tmp/${FILE}_counter
done
Don't forget to add the path to the script in the cron job if you do that this way. After that, you will have something like this in your /tmp directory:
/tmp/file1_counter
/tmp/file2_counter
/tmp/file3_counter
...
At this point you only need a trigger, which can be another script, to compare the current number of lines of a file at a given time and start copying it elsewhere in case there are more than 20 additional lines than in the previous check. Consider this:
#!/bin/bash
LAST_VALUE=$(cat /tmp/${FILE}_counter)
CURRENT_VALUE=$(cat ${FILE} |wc -l)
if [ ${CURRENT_VALUE} -gt $(expr ${LAST_VALUE} + 20) ]; then
# Your cp stuff here
fi
Of course you can add a loop here too in case of handling multiple files:
#!/bin/bash
LAST_VALUE=$(cat /tmp/${FILE}_counter)
CURRENT_VALUE=$(cat ${FILE} |wc -l)
for FILE in file1 file2 file3; do
if [ ${CURRENT_VALUE} -gt $(expr ${LAST_VALUE} + 20) ]; then
# Your cp stuff here
fi
done
Then you only have to add this last script to a cron job too, and you should be done.
Hope you find this useful.

You can use diff to compare 2 files. With the -u0 option, it will show you the added/deleted/modified lines, prefixed with "+" or "-". You can then count lines starting with "+" or "-" with grep and it's -c option.
So for the number of lines added or modified, which begin with "+" :
diff -u0 $file_before $file_after | grep -c '^+'
and this will count the deleted or modified lines, which start with "-" :
diff -u0 $file_before $file_after | grep -c '^-'
Note that there are 2 header lines in this format, which also start with "+" and "-", so you may want to take that in account.

Related

Show only newly added lines of logfile in terminal

I use tail -f to show the contents of a logfile.
What I want is when the logfile content changes, instead of appending the new lines to my screen, only the newly added lines should be shown on my screen.
So as if a clearscreen was made every time before printing the new lines.
I tried to find a solution by web search but couldn't find anything useful.
edit:
In my case it happens that several lines will be added at once (it is a php error logfile). So I am looking for a solution where more than the single last line can be shown on screen.
The watch command in combination with the tail command shows the last line of a log file with the intervall of every 2 seconds. Basically it doesn't refresh whenever a new line is appended to the log file but since you could specifiy an intervall it might help you for your use case.
watch -t tail -1 <path_to_logfile>
If you need a faster intervall like every 0.5 seconds, then you could specify it with the 'n' option i.e.:
watch -t -n 0.5 tail -1 <path_to_logfile>
Try
$ watch 'tac FILE | grep -m1 -C2 PATTERN | tac'
where
PATTERN is any keyword (or regexp) to identify errors you seek in the log,
tac prints the lines in reverse,
-m is a max count of matching lines to grep,
-C is any number of lines of context (before and after the match) to show (optional).
That would be similar to
$ tail -f FILE | grep -C2 PATTERN
if you didn't mind just appending occurrences to the output in real-time.
But if you don't know any generic PATTERN to look for at all,
you'd have to just follow all the updates as the logfile grows:
$ tail -n0 -f FILE
Or even, create a copy of the logfile and then do a diff:
Copy: cp file.log{,.old}
Refresh the webpage with your .php code (or whatever, to trigger the error)
Run: diff file.log{,.old}
(or, if you prefer sort to diff: $ sort file.log{,.old} | uniq -u)
The curly braces is shorthand for both filenames (see Brace Expansion in $ man bash)
If you must avoid any temp copies, store the line count in memory:
z=$(grep -c ^ file.log)
Refresh the webpage to trigger an error
tail -n +$z file.log
The latter approach can be built upon, to create a custom scripting solution more suitable for your needs (check timestamps, clear screen, filter specific errors, etc). For example, to only show the lines that belong to the last error message in the log file updated in real-time:
$ clear; z=$(grep -c ^ FILE); while true; do d=$(date -r FILE); sleep 1; b=$(date -r FILE); if [ "$d" != "$b" ]; then clear; tail -n +$z FILE; z=$(grep -c ^ FILE); fi; done
where
FILE is, obviously, your log file name;
grep -c ^ FILE counts all lines in a file (that is almost, but not entirely unlike cat FILE|wc -l that would only count newlines);
sleep 1 sets the pause/delay between checking the file timestamps to 1 second, but you could change it to even a floating point number (the less the interval, the higher the CPU usage).
To simplify any repetitive invocations in future, you could save this compound command in a Bash script that could take a target logfile name as an argument, or define a shell function, or create an alias in your shell, or just reverse-search your bash history with CTRL+R. Hope it helps!

Executing a bash loop script from a file

I am trying to execute this in unix. So let's for example say I have five files named after dates, and in each of those files there are thousand of numerical values (six to ten digit number). Now, lets say I also have bunch of numerical values and I want to know which value belongs to which file.I am trying to do it the hard way like below but how do I put all my values in a file and just do a loop from there.
FILES:
20170101
20170102
20170103
20170104
20170105
Code:
for i in 5555555 67554363 564324323 23454657 666577878 345576867; do
echo $i; grep -l $i 201701*;
done
Or, why loop at all? If you have a file containing all your numbers (say numbers.txt you can find in which date file each are contained and on what line with a simple
grep -nH -w -f numbers.txt 201701*
Where the -f option simply tells grep to use the values contained in the file numbers.txt to search in each of the files matching 201701*. The -nH options for listing the line number and filename associated with each match, respectively. And as Ed points out below, the -w option to insure grep only select lines containing the whole word sought.
You can also do it with a while loop and read from the file if you create it as #Barmar suggested:
while read -r i; do
...
done < numbers.txt
Put the values in a file numbers.txt and do:
for i in $(cat numbers.txt); do
...
done

How to remove the path part from a list of files and copy it into another file?

I need to accomplish the following things with bash scripting in FreeBSD:
Create a directory.
Generate 1000 unique files whose names are taken from other random files in the system.
Each file must contain information about the original file whose name it has taken - name and size without the original contents of the file.
The script must show information about the speed of its execution in ms.
What I could accomplish was to take the names and paths of 1000 unique files with the commands find and grep and put them in a list. Then I just can't imagine how to remove the path part and create the files in the other directory with names taken from the list of random files. I tried a for loop with the basename command in it but somehow I can't get it to work and I don't know how to do the other tasks as well...
[Update: I've wanted to come back to this question to try to make my response more useful and portable across platforms (OS X is a Unix!) and $SHELLs, even though the original question specified bash and zsh. Other responses assumed a temporary file listing of "random" file names since the question did not show how the list was constructed or how the selection was made. I show one method for constructing the list in my response using a temporary file. I'm not sure how one could randomize the find operation "inline" and hope someone else can show how this might be done (portably). I also hope this attracts some comments and critique: you never can know too many $SHELL tricks. I removed the perl reference, but I hereby challenge myself to do this again in perl and - because perl is pretty portable - make it run on Windows. I will wait a while for comments and then shorten and clean up this answer. Thanks.]
Creating the file listing
You can do a lot with GNU find(1). The following would create a single file with the file names and three, tab-separated columns of the data you want (name of file, location, size in kilobytes).
find / -type f -fprintf tmp.txt '%f\t%h/%f\t%k \n'
I'm assuming that you want to be random across all filenames (i.e. no links) so you'll grab the entries from the whole file system. I have 800000 files on my workstation but a lot of RAM, so this doesn't take too long to do. My laptop has ~ 300K files and not much memory, but creating the complete listing still only took a couple minutes or so. You'll want to adjust by excluding or pruning certain directories from the search.
A nice thing about the -fprintf flag is that it seems to take care of spaces in file names. By examining the file with vim and sed (i.e. looking for lines with spaces) and comparing the output of wc -l and uniq you can get a sense of your output and whether the resulting listing is sane or not. You could then pipe this through cut, grep or sed, awk and friends in order to to create the files in the way you want. For example from the shell prompt:
~/# touch `cat tmp.txt |cut -f1`
~/# for i in `cat tmp.txt|cut -f1`; do cat tmp.txt | grep $i > $i.dat ; done
I'm giving the files we create a .dat extension here to distinguish them from the files to which they refer, and to make it easier to move them around or delete them, you don't have to do that: just leave off the extension $i > $i.
The bad thing about the -fprintf flag is that it is only available with GNU find and is not a POSIX standard flag so it won't be available on OS X or BSD find(1) (though GNU find may be installed on your Unix as gfind or gnufind). A more portable way to do this is to create a straight up list of files with find / -type f > tmp.txt (this takes about 15 seconds on my system with 800k files and many slow drives in a ZFS pool. Coming up with something more efficient should be easy for people to do in the comments!). From there you can create the data values you want using standard utilities to process the file listing as Florin Stingaciu shows above.
#!/bin/sh
# portably get a random number (OS X, BSD, Linux and $SHELLs w/o $RANDOM)
randnum=`od -An -N 4 -D < /dev/urandom` ; echo $randnum
for file in `cat tmp.txt`
do
name=`basename $file`
size=`wc -c $file |awk '{print $1}'`
# Uncomment the next line to see the values on STDOUT
# printf "Location: $name \nSize: $size \n"
# Uncomment the next line to put data into the respective .dat files
# printf "Location: $file \nSize: $size \n" > $name.dat
done
# vim: ft=sh
If you've been following this far you'll realize that this will create a lot of files - on my workstation this would create 800k of .dat files which is not what we want! So, how to randomly select 1000 files from our listing of 800k for processing? There's several ways to go about it.
Randomly selecting from the file listing
We have a listing of all the files on the system (!). Now in order to select 1000 files we just need to randomly select 1000 lines from our listing file (tmp.txt). We can set an upper limit of the line number to select by generating a random number using the cool od technique you saw above - it's so cool and cross-platform that I have this aliased in my shell ;-) - then performing modulo division (%) on it using the number of lines in the file as the divisor. Then we just take that number and select the line in the file to which it corresponds with awk or sed (e.g. sed -n <$RANDOMNUMBER>p filelist), iterate 1000 times and presto! We have a new list of 1000 random files. Or not ... it's really slow! While looking for a way to speed up awk and sed I came across an excellent trick using dd from Alex Lines that searches the file by bytes (instead of lines) and translates the result into a line using sed or awk.
See Alex's blog for the details. My only problems with his technique came with setting the count= switch to a high enough number. For mysterious reasons (which I hope someone will explain) - perhaps because my locale is LC_ALL=en_US.UTF-8 - dd would spit incomplete lines into randlist.txt unless I set count= to a much higher number that the actual maximum line length. I think I was probably mixing up characters and bytes. Any explanations?
So after the above caveats and hoping it works on more than two platforms, here's my attempt at solving the problem:
#!/bin/sh
IFS='
'
# We create tmp.txt with
# find / -type f > tmp.txt # tweak as needed.
#
files="tmp.txt"
# Get the number of lines and maximum line length for later
bytesize=`wc -c < $files`
# wc -L is not POSIX and we need to multiply so:
linelenx10=`awk '{if(length > x) {x=length; y = $0} }END{print x*10}' $files`
# A function to generate a random number modulo the
# number of bytes in the file. We'll use this to find a
# random location in our file where we can grab a line
# using dd and sed.
genrand () {
echo `od -An -N 4 -D < /dev/urandom` ' % ' $bytesize | bc
}
rm -f randlist.txt
i=1
while [ $i -le 1000 ]
do
# This probably works but is way too slow: sed -n `genrand`p $files
# Instead, use Alex Lines' dd seek method:
dd if=$files skip=`genrand` ibs=1 count=$linelenx10 2>/dev/null |awk 'NR==2 {print;exit}'>> randlist.txt
true $((i=i+1)) # Bourne shell equivalent of $i++ iteration
done
for file in `cat randlist.txt`
do
name=`basename $file`
size=`wc -c <"$file"`
echo -e "Location: $file \n\n Size: $size" > $name.dat
done
# vim: ft=sh
What I could accomplish was to take the names and paths of 1000 unique files with the commands "find" and "grep" and put them in a list
I'm going to assume that there is a file that holds on each line a full path to each file (FULL_PATH_TO_LIST_FILE). Considering there's not much statistics associated with this process, I omitted that. You can add your own however.
cd WHEREVER_YOU_WANT_TO_CREATE_NEW_FILES
for file_path in `cat FULL_PATH_TO_LIST_FILE`
do
## This extracts only the file name from the path
file_name=`basename $file_path`
## This grabs the files size in bytes
file_size=`wc -c < $file_path`
## Create the file and place info regarding original file within new file
echo -e "$file_name \nThis file is $file_size bytes "> $file_name
done

Create files using grep and wildcards with input file

This should be a no-brainer, but apparently I have no brain today.
I have 50 20-gig logs that contain entries from multiple apps, one of which addes a transaction ID to its log lines. I have 42 transaction IDs I need to review, and I'd like to parse out the appropriate lines into separate files.
To do a single file, the command would be simply,
grep CDBBDEADBEEF2020X02393 server.log* > CDBBDEADBEEF2020X02393.log
that creates a log isolated to that transaction, from all 50 server.logs.
Now, I have a file with 42 txnIDs (shortening to 4 here):
CDBBDEADBEEF2020X02393
CDBBDEADBEEF6548X02302
CDBBDE15644F2020X02354
ABBDEADBEEF21014777811
And I wrote:
#/bin/sh
grep $1 server.\* > $1.log
But that is not working. Changing the shebang to #/bin/bash -xv, gives me this weird output (obviously I'm playing with what the correct escape magic must be):
$ ./xtrakt.sh B7F6E465E006B1F1A
#!/bin/bash -xv
grep - ./server\.\*
' grep - './server.*
: No such file or directory
I have also tried the command line
grep - server.* < txids.txt > $1
But OBVIOUSLY that $1 is pointless and I have no idea how to get a file named per txid using the input redirect form of the command.
Thanks in advance for any ideas. I haven't gone the route of doing a foreach in the shell script, because I want grep to put the original filename in the output lines so I can examine context later if I need to.
Also - it would be great to have the server.* files ordered numerically (server.log.1, server.log.2 NOT server.log.1, server.log.10...)
try this:
while read -r txid
do
grep "$txid" server.* > "$txid.log"
done < txids.txt
and for the file ordering - rename files with one digit to two digit, with leading zeroes, e.g. mv server.log.1 server.log.01.

how to make a winmerge equivalent in linux

My friend recently asked how to compare two folders in linux and then run meld against any text files that are different. I'm slowly catching on to the linux philosophy of piping many granular utilities together, and I put together the following solution. My question is, how could I improve this script. There seems to be quite a bit of redundancy and I'd appreciate learning better ways to script unix.
#!/bin/bash
dir1=$1
dir2=$2
# show files that are different only
cmd="diff -rq $dir1 $dir2"
eval $cmd # print this out to the user too
filenames_str=`$cmd`
# remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different
tmp1=`echo "$filenames_str" | sed -n '/ differ$/p'`
# grab just the first filename for the lines of output
tmp2=`echo "$tmp1" | awk '{ print $2 }'`
# convert newlines sep to space
fs=$(echo "$tmp2")
# convert string to array
fa=($fs)
for file in "${fa[#]}"
do
# drop first directory in path to get relative filename
rel=`echo $file | sed "s#${dir1}/##"`
# determine the type of file
file_type=`file -i $file | awk '{print $2}' | awk -F"/" '{print $1}'`
# if it's a text file send it to meld
if [ $file_type == "text" ]
then
# throw out error messages with &> /dev/null
meld $dir1/$rel $dir2/$rel &> /dev/null
fi
done
please preserve/promote readability in your answers. An answer that is shorter but harder to understand won't qualify as an answer.
It's an old question, but let's work a bit on it just for fun, without thinking in the final goal (maybe SCM) nor in tools that already do this in a better way. Just let's focus in the script itself.
In the OP's script, there are a lot of string processing inside bash, using tools like sed and awk, sometimes more than once in the same command line or inside a loop executing n times (one per file).
That's ok, but it's necessary to remember that:
Each time the script calls any of those programs, it's created a new process in the OS, and that is expensive in time and resources. So the less programs are called, the better is the performance of script that is executing:
diff 2 times (1 just to print to user)
sed 1 time processing diff result and 1 time for each file
awk 1 time processing sed result and 2 times for each file (processing file result)
file 1 time for each file
That doesn't apply to echo, read, test and others that are builtin commands of bash, so no external program is executed.
meld is the final command that will display the files to user, so it doesn't count.
Even with the builtin commands, redirection pipelines | has a cost too, because the shell has to create pipes, duplicate handles, and maybe even creating forks of the shell (that is a process itself). So again: less is better.
The messages of diff command are locale dependants, so if the system is not in english, the whole script won't work.
Thinking that, let's clean a bit the original script, mantaining the OP's logic:
#!/bin/bash
dir1=$1
dir2=$2
# Set english as current language
LANG=en_US.UTF-8
# (1) show files that are different only
diff -rq $dir1 $dir2 |
# (2) remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different, delete all but left filename
sed '/ differ$/!d; s/^Files //; s/ and .*//' |
# (3) determine the type of file
file -i -f - |
# (4) for each file
while IFS=":" read file file_type
do
# (5) drop first directory in path to get relative filename
rel=${file#$dir1}
# (6) if it's a text file send it to meld
if [[ "$file_type" =~ "text/" ]]
then
# throw out error messages with &> /dev/null
meld ${dir1}${rel} ${dir2}${rel} &> /dev/null
fi
done
A little explaining:
Unique chain of commands cmd1 | cmd2 | ... where the output (stdout) of previous one is the input (stdin) of the next one.
Execute sed just once to execute 3 operations (separated with ;) in diff output:
Deleting lines ending with " differ"
Delete "Files " at the beginning of remaining lines
Delete from " and " to the end of remaining lines
Execute command file once to process the file list in stdin (option -f -)
Use the while bash sentence to read two values separated by : for each line line of stdin.
Use bash variable substitution to extract filename from a variable
Use bash test to compare a file type with a regular expression
For clarity reasons, I didn't considerate that file and directory names may have spaces. In such cases, both scripts will fail. To avoid that is necessary enclose in double quotes any reference to file/dir name variable.
I didn't use awk, because it is powerful enough that can replace almost the entire script ;-)

Resources