I have a bash script that I'm trying to put together that finds all of the images in a folder and then puts the names of those files into a pre-formatted CSV. I actually have the more complicated parts of it figured out and working well... I'm stuck on a really basic part. I have a variable that I need to increment for each file found, simple enough right? I've tried a bunch of different things and cannot for the life of me get it to increment. Here's the script I'm working with:
EDITED to show less context
i=0
find "$(pwd)" -maxdepth 1 -type f -exec file {} \; | awk -F: '{if ($2 ~/image/) print $1}' | grep -o -E '[^/]*$' | sed -e "s/^/$((++i))/" > "$(pwd)/inventory-$(date +%Y%m%d)-$(date +%I%M).csv"
I've tried incrementing it with i++, i=+1, i=i+1 as well as putting the dollar sign before the different iterations of the i variable... nothing seems to actually increment the variable. My best guess is that this isn't a true loop so it doesn't save the changes to the variable? Any guidance would be greatly appreciated!
The $((++i)) is performed by your shell. But the shell executes this line only once. This line cannot do what you need.
I would increment in inside awk, print it alongside the file name, and then combine output in the further commands.
Related
I have a script that takes a snapshot of the contents of a directory and then sleeps for a certain amount of time.
After it wakes up, it should accumulate the sum of the size of the files DELETED, not shrunk or grown.
My approach to this was something like this:
FILES_SNAPSHOT="$(ls -l | awk 'NR>1 {print $5,$9}')"
echo "Sleeping for $1 seconds..."
sleep $1
FILES_CURRENT="$(ls -l | awk 'NR>1 {print $5,$9}')"
So the snapshot basically stores this:
762 filename.sh
16 anotherfile.sh
...
the first column is the size of the file, second name of the file.
after I store this, the script sleeps, and I take another snapshot.
Now, I need to compare the 2nd column of my 2 variables and check for missing filenames, that have been deleted.
After, I need to take these names and match them to the size, using the first var I declared where I still have this information.
I am not quite sure, but I assume these variables just store a regular string that is formatted nicely, now how can I iterate over both strings I store, and know which file is missing? should I use a loop to iterate and then inside the loop should I look for index -1 for missing files?
Or should I use awk somehow, in a combination with merging these variables?
Any help would be much appreciated, thanks in advance.
init:
snap1=$(mktemp)
snap2=$(mktemp)
find -maxdepth 1 -type f -printf "%p\t%k\n" | sort > $snap2
loop:
cp $snap2 $snap1
sleep 60
find -maxdepth 1 -type f -printf "%p\t%k\n" | sort > $snap2
awk 'BEGIN {FS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {delete a[$1]}
END {for(k in a) {sum+=a[k]}; print sum+0}' $snap1 $snap2
print total size of deleted files in kb.
end:
unlink $snap1
unlink $snap2
Using ls is not recommended in scripts, use find instead as shown. Creating files is no concern in Unix systems, you can avoid it if you like to but will just make it harder. I'm not sure how your script is currently written but the loop content can be replaced with what I suggested. You just need to initialize one of the snapshots and at the end delete both temp files.
I have a small script which basically generates a menu of all the scripts in my ~/scripts folder and next to each of them displays a sentence describing it, that sentence being the third line within the script commented out. I then plan to pipe this into fzf or dmenu to select it and start editing it or whatever.
1 #!/bin/bash
2
3 # a script to do
So it would look something like this
foo.sh a script to do X
bar.sh a script to do Y
Currently I have it run a for loop over all the files in the scripts folder and then run sed -n 3p on all of them.
for i in $(ls -1 ~/scripts); do
echo -n "$i"
sed -n 3p "~/scripts/$i"
echo
done | column -t -s '#' | ...
I was wondering if there is a more efficient way of doing this that did not involve a for loop and only used sed. Any help will be appreciated. Thanks!
Instead of a loop that is parsing ls output + sed, you may try this awk command:
awk 'FNR == 3 {
f = FILENAME; sub(/^.*\//, "", f); print f, $0; nextfile
}' ~/scripts/* | column -t -s '#' | ...
Yes there is a more efficient way, but no, it doesn't only use sed. This is probably a silly optimization for your use case though, but it may be worthwhile nonetheless.
The inefficiency is that you're using ls to read the directory and then parse its output. For large directories, that causes lots of overhead for keeping that list in memory even though you only traverse it once. Also, it's not done correctly, consider filenames with special characters that the shell interprets.
The more efficient way is to use find in combination with its -exec option, which starts a second program with each found file in turn.
BTW: If you didn't rely on line numbers but maybe a tag to mark the description, you could also use grep -r, which avoids an additional process per file altogether.
This might work for you (GNU sed):
sed -sn '1h;3{H;g;s/\n/ /p}' ~/scripts/*
Use the -s option to reset the line number addresses for each file.
Copy line 1 to the hold space.
Append line 3 to the hold space.
Swap the hold space for the pattern space.
Replace the newline with a space and print the result.
All files in the directory ~/scripts will be processed.
N.B. You may wish to replace the space delimiter by a tab or pipe the results to the column command.
New to UNIX, currently learning UNIX via secureshell in a class. We've been given a few basic assignments such as creating loops and finding files. Our last assignment asked us to
write code that will estimate the number of shell scripts in the current directory and then print out that total number as "Estimated number of shell script files in this directory:"
Unlike in our previous assignments we are now allowed to use conditional loops, we are encouraged to use grep and wc statements.
On a basic level I know I can enter
ls * .sh
to find all shell scripts in the current directory. Unfortunately, this doesn't estimate the total number or use grep. Hence my question, I imagine he wants us to go
grep -f .sh (or something)
but I'm not exactly sure if I am on the right path and would greatly appreciate any help.
Thank You
You can do it like:
echo "Estimated number of shell script files in this directory:" `ls *.sh | wc -l`
I'd do it this way:
find . -executable -execdir file {} + | egrep '\.sh: | Bourne| bash' | wc -l
Find all files in the current directory (.) which are executable.
For each file, run the file(1) command, which tries to guess what type of file it is (not perfect).
Grep for known patterns: filenames ending with .sh, or file types containing "Bourne" or "bash".
Count lines.
Huhu, there's a trap, .sh file are not always shell script as the extension is not mandatory.
What tells you this is a shell script will be the Shebang #!/bin/*sh ( I put a * as it could be bash, csh, tcsh, zsh, which are shells) at top of line, hence the hint to use grep, so the best answer would be:
grep '^#!/bin/.*sh' * | wc -l
This give output:
sensible-pager:#!/bin/sh
service:#!/bin/sh
shelltest:#!/bin/bash
smbtar:#!/bin/sh
grep works with regular expression by default, so the match #!/bin/.*sh will match files with a line starting (the ^) by #!/bin/ followed by 0 or unlimited characters .* followed by sh
You may test regex and get explanation of them on http://regex101.com
Piping the result to wc -l to get the number of files containing this.
To display the result, backticks or $() in an echo line is ok.
grep -l <string> *
will return a list of all files that contain in the current directory. Pipe that output into wc -l and you have your answer.
Easiest way:
ls | grep .sh > tmp
wc tmp
That will print the number of lines, bytes and charcters of 'tmp' file. But in 'tmp' there's a line for each *.sh file in your working directory. So the number of lines will give an estimated number of shell scripts you have.
wc tmp | awk '{print $1}' # Using awk to filter that output like...
wc -l tmp # Which it returns the number of lines follow by the name of file
But as many people say, the only certain way to know a file is a shell script is by taking a look at the first line an see if there is #!/bin/bash. If you wanna develop it that way, keep in mind:
cat possible_script.x | head -n1 # That will give you the first line.
I have a directory with files that I want to process one by one and for which each output looks like this:
==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)
Then I want to calculate the average percentage (column 6) by piping the output to AWK. I would prefer to do this all in one script and wrote the following code:
for f in $dir; do
echo -ne "$f "
process $f
done | awk '{print $7}' | awk -F "=" '{sum+=$2}END{print sum/NR}'
When I run this several times, I often get different results although in my view nothing really changes. The result is almost always incorrect though.
However, if I only put the for loop in the script and pipe to AWK on the command line, the result is always the same and correct.
What is the difference and how can I change my script to achieve the correct result?
Guessing a little about what you're trying to do, and without more details it's hard to say what exactly is going wrong.
for f in $dir; do
unset TEMPVAR
echo -ne "$f "
TEMPVAR=$(process $f | awk '{print $7}')
ARRAY+=($TEMPVAR)
done
I would append all your values to an array inside your for loop. Now all your percentages are in $ARRAY. It should be easy to calculate the average value, using whatever tool you like.
This will also help you troubleshoot. If you get too few elements in the array ${#ARRAY[#]} then you will know where your loop is terminating early.
# To get the percentage of all files
Percs=$(sed -r 's/.*WER=([[:digit:].]*).*/\1/' *)
# The divisor
Lines=$(wc -l <<< "$Percs")
# To change new lines into spaces
P=$(echo $Percs)
# Execute one time without the bc. It's easier to understand
echo "scale=3; (${P// /+})/$Lines" | bc
Straight to the point, I'm wondering how to use grep/find/sed/awk to match a certain string (that ends with a number) and increment that number by 1. The closest I've come is to concatenate a 1 to the end (which works well enough) because the main point is to simply change the value. Here's what I'm currently doing:
find . -type f | xargs sed -i 's/\(\?cache_version\=[0-9]\+\)/\11/g'
Since I couldn't figure out how to increment the number, I captured the whole thing and just appended a "1". Before, I had something like this:
find . -type f | xargs sed -i 's/\?cache_version\=\([0-9]\+\)/?cache_version=\11/g'
So at least I understand how to capture what I need.
Instead of explaining what this is for, I'll just explain what I want it to do. It should find text in any file, recursively, based on the current directory (isn't important, it could be any directory, so I'd configure that later), that matches "?cache_version=" with a number. It will then increment that number and replace it in the file.
Currently the stuff I have above works, it's just that I can't increment that found number at the end. It would be nicer to be able to increment instead of appending a "1" so that the future values wouldn't be "11", "111", "1111", "11111", and so on.
I've gone through dozens of articles/explanations, and often enough, the suggestion is to use awk, but I cannot for the life of me mix them. The closest I came to using awk, which doesn't actually replace anything, is:
grep -Pro '(?<=\?cache_version=)[0-9]+' . | awk -F: '{ print "match is", $2+1 }'
I'm wondering if there's some way to pipe a sed at the end and pass the original file name so that sed can have the file name and incremented number (from the awk), or whatever it needs that xargs has.
Technically, this number has no importance; this replacement is mainly to make sure there is a new number there, 100% for sure different than the last. So as I was writing this question, I realized I might as well use the system time - seconds since epoch (the technique often used by AJAX to eliminate caching for subsequent "identical" requests). I ended up with this, and it seems perfect:
CXREPLACETIME=`date +%s`; find . -type f | xargs sed -i "s/\(\?cache_version\=\)[0-9]\+/\1$CXREPLACETIME/g"
(I store the value first so all files get the same value, in case it spans multiple seconds for whatever reason)
But I would still love to know the original question, on incrementing a matched number. I'm guessing an easy solution would be to make it a bash script, but still, I thought there would be an easier way than looping through every file recursively and checking its contents for a match then replacing, since it's simply incrementing a matched number...not much else logic. I just don't want to write to any other files or something like that - it should do it in place, like sed does with the "i" option.
I think finding file isn't the difficult part for you. I therefore just go to the point, to do the +1 calculation. If you have gnu sed, it could be done in this way:
sed -r 's/(.*)(\?cache_version=)([0-9]+)(.*)/echo "\1\2$((\3+1))\4"/ge' file
let's take an example:
kent$ cat test
ello
barbaz?cache_version=3fooooo
bye
kent$ sed -r 's/(.*)(\?cache_version=)([0-9]+)(.*)/echo "\1\2$((\3+1))\4"/ge' test
ello
barbaz?cache_version=4fooooo
bye
you could add -i option if you like.
edit
/e allows you to pass matched part to external command, and do substitution with the execution result. Gnu sed only.
see this example: external command/tool echo, bc are used
kent$ echo "result:3*3"|sed -r 's/(result:)(.*)/echo \1$(echo "\2"\|bc)/ge'
gives output:
result:9
you could use other powerful external command, like cut, sed (again), awk...
Pure sed version:
This version has no dependencies on other commands or environment variables.
It uses explicit carrying. For carry I use the # symbol, but another name can be used if you like. Use something that is not present in your input file.
First it finds SEARCHSTRING<number> and appends a # to it.
It repeats incrementing digits that have a pending carry (that is, have a carry symbol after it: [0-9]#)
If 9 was incremented, this increment yields a carry itself, and the process will repeat until there are no more pending carries.
Finally, carries that were yielded but not added to a digit yet are replaced by 1.
sed "s/SEARCHSTRING[0-9]*[0-9]/&#/g;:a {s/0#/1/g;s/1#/2/g;s/2#/3/g;s/3#/4/g;s/4#/5/g;s/5#/6/g;s/6#/7/g;s/7#/8/g;s/8#/9/g;s/9#/#0/g;t a};s/#/1/g" numbers.txt
This perl command will search all files in current directory (without traverse it, you will need File::Find module or similar for that more complex task) and will increment the number of a line that matches cache_version=. It uses the /e flag of the regular expression that evaluates the replacement part.
perl -i.bak -lpe 'BEGIN { sub inc { my ($num) = #_; ++$num } } s/(cache_version=)(\d+)/$1 . (inc($2))/eg' *
I tested it with file in current directory with following data:
hello
cache_version=3
bye
It backups original file (ls -1):
file
file.bak
And file now with:
hello
cache_version=4
bye
I hope it can be useful for what you are looking for.
UPDATE to use File::Find for traversing directories. It accepts * as argument but will discard them with those found with File::Find. The directory to begin the search is the current of execution of the script. It is hardcoded in the line find( \&wanted, "." ).
perl -MFile::Find -i.bak -lpe '
BEGIN {
sub inc {
my ($num) = #_;
++$num
}
sub wanted {
if ( -f && ! -l ) {
push #ARGV, $File::Find::name;
}
}
#ARGV = ();
find( \&wanted, "." );
}
s/(cache_version=)(\d+)/$1 . (inc($2))/eg
' *
This is ugly (I'm a little rusty), but here's a start using sed:
orig="something1" ;
text=`echo $orig | sed "s/\([^0-9]*\)\([0-9]*\)/\1/"` ;
num=`echo $orig | sed "s/\([^0-9]*\)\([0-9]*\)/\2/"` ;
echo $text$(($num + 1))
With an original filename ($orig) of "something1", sed splits off the text and numeric portions into $text and $num, then these are combined in the final section with an incremented number, resulting in something2.
Just a start since it doesn't consider cases with numbers within the file name or names with no number at the end, but hopefully helps with your original goal of using sed.
This can actually be simplified within sed by using buffers, I believe (sed can operate recursively), but I'm really rusty with that aspect of it.
perl -pi -e 's/(\?cache_version=)(\d+)/$1.($2+1)/ge' FILE [FILE...]
or for a complete solution:
find . -type f | xargs perl -pi -e 's/(\?cache_version=)(\d+)/$1.($2+1)/ge'
perl substitution operator
/e modifier evaluates the replacement as if it were a Perl statement, using its return value as the replacement text.
. operator concatenates strings in Perl. The parentheses ensures that the arithmetic operation $2+1 takes precedence over concatenation.
/g modifier applies substitution to all matched strings within line
perl options
-p ensures that perl will execute the command on every line of each file
-i ensures that each file will be edited inplace
-e specifies the perl command(s) that are executed (in this case, the substitution operation)