number and string manipulation in bash

I wrote a little bash script (my first) that does the following:
sed -e 's/Alpha=0/Alpha=x/' -e 's/Beta=0/Beta=y/' <file.pov >tmpfile.pov
povray Width=480 Height=360 +Itmpfile.pov +Ofile_x_y.png
everything works as intended, but now I would like to pack these two lines in a loop for x=0:30:180 and y=0:30:90 (edit: I mean all possible combinations of x in {0,30,60,90,120,150,180} and y in {0,30,60,90}).
So for example for x=60 and y=30 the code should behave like this:
sed -e 's/Alpha=0/Alpha=60/' -e 's/Beta=0/Beta=30/' <file.pov >tmpfile.pov
povray Width=480 Height=360 +Itmpfile.pov +Ofile_60_30.png
I am aware it should not be too hard, but for some reason I just couldn't work it out by myself.
Sorry to bother you with my newbie questions!

Use seq with for loops:
for x in `seq 0 30 180`; do
  for y in `seq 0 30 90`; do
    sed -e 's/Alpha=0/Alpha='$x'/' -e 's/Beta=0/Beta='$y'/' <file.pov >tmpfile.pov
    fn="file_${x}_${y}.png"
    povray Width=480 Height=360 +Itmpfile.pov +O${fn}
  done
done
seq is a great tool even if you don't need the numbers:
for i in `seq 10`; do
  call_me_ten_times
done
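If you would rather not spawn seq at all, bash (4.0+) brace expansion with a step covers the same ranges; a small sketch, with echo standing in for the sed/povray commands:
for x in {0..180..30}; do
  for y in {0..90..30}; do
    echo "x=$x y=$y"
  done
done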

Here is a tricky bash hack, but it helps to reduce the two nested for loops to a single while loop:
while read x y; do
  sed -e 's/Alpha=0/Alpha='"$x"'/' -e 's/Beta=0/Beta='"$y"'/' <file.pov >"tmpfile${x}_$y.pov"
done < <(echo {0..180..30}' '{0..90..30} | tr ' ' '\n' | paste - -)
The process substitution at the end generates all combinations of the ranges you specified.
One remaining problem is the output file name, since you would otherwise be overwriting the same file every time; I think it is better to change the filename as you go, as done above.
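Putting that together with the povray call from the question, a sketch of the complete loop might look like this (untested; it reuses the file_x_y.png naming from the question):
while read x y; do
  sed -e 's/Alpha=0/Alpha='"$x"'/' -e 's/Beta=0/Beta='"$y"'/' <file.pov >tmpfile.pov
  povray Width=480 Height=360 +Itmpfile.pov +O"file_${x}_${y}.png"
done < <(echo {0..180..30}' '{0..90..30} | tr ' ' '\n' | paste - -)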

Related

How to use local variable and 'delete last line' in sed

I am trying to delete the last three lines of a file in a shell bash script.
Since I am using local variables in combination with sed's regex syntax, the answer proposed in How to use sed to remove the last n lines of a file does not cover this case. The cases covered there deal with sed in a terminal and do not cover the syntax in shell scripts, nor the use of variables in sed expressions.
The commands I have available are limited, since I am not on Linux but use MINGW64.
sed does a great job so far, but deleting the last three lines gives me some headaches in terms of how to format the expression.
I use wc to find out how many lines the file has and then subtract three with expr.
n=$(wc -l < "$distribution_area")
rel=$(expr $n - 3)
The starting point for deleting lines is defined by rel, but the local variable is accessed through $, and unfortunately sed's syntax uses $ to denote the last line of the file. Hence,
sed -i "$rel,$d" "$distribution_area"
won't work, and whatever combination I try, e.g. '"'"$rel"'",$d', gives me sed: -e expression #1, char 1: unknown command: `"' or something similar.
Can somebody show me how to combine the variable with the $d regex syntax of sed?
sed -i "$rel,$d" "$distribution_area"
Here you're missing the variable name (n) for the second arg.
Consider the following example on a file called test that contains 1-10:
n=$(wc -l < test)
rel=$(($n - 3))
sed "$rel,$n d" test
Result:
1
2
3
4
5
6
To make sure the d is not read as part of the variable name (as $nd would be), you can add a space instead of escaping.
If you have a recent head available, I'd recommend something like:
head -n -3 test
Can somebody show me how to combine the variable with the $d regex syntax of sed?
$d expands to a variable d; you have to escape it.
"$rel,\$d"
or:
"$rel"',$d'
But I would use:
head -n -3 "$distribution_area" > "$distribution_area".tmp
mv "$distribution_area".tmp "$distribution_area"
You can remove the last N lines using only pure Bash, without forking additional processes (such as sed). Such scripts look ugly, but they would work in any environment where only Bash runs and nothing else is available, no other binaries like sed, awk etc.
If the entire file fits in RAM, a straightforward solution is to split it by lines and print all but the N trailing ones:
delete_last_n_lines() {
  local -ir n="$1"
  local -a lines
  readarray lines
  ((${#lines[@]} > n)) || return 0
  printf '%s' "${lines[@]::${#lines[@]} - n}"
}
If the file does not fit in RAM, you can keep a FIFO buffer that stores N lines (N + 1 in the “implementation” below, but that’s just a technical detail), let the file (arbitrarily large) flow through the buffer and, after reaching the end of the file, not print out what remains in the buffer (the last N lines to remove).
delete_last_n_lines() {
  local -ir n="$1 + 1"
  local -a lines
  local -i pos i
  for ((i = 0; i < n; ++i)); do
    IFS= read -r lines[i] || return 0
  done
  printf '%s\n' "${lines[pos]}"
  while IFS= read -r lines[pos++]; do
    ((pos %= n))
    printf '%s\n' "${lines[pos]}"
  done
}
The following example gets 10 lines of input, 0 to 9, but prints out only 0 to 6, removing 7, 8 and 9 as desired:
printf '%s' {0..9}$'\n' | delete_last_n_lines 3
Last but not least, this simple hack lacks sed’s -i option to edit files in-place. That could be implemented (e.g.) using a temporary file to store the output and then renaming the temporary file to the original. (A more sophisticated approach would be needed to avoid storing the temporary copy altogether. I don’t think Bash exposes an interface like lseek() to read files “backwards”, so this cannot be done in Bash alone.)
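For instance, a minimal sketch of that temp-file approach, mirroring the head-based answer above (the wrapper function name is mine, and it does reach for mv):
delete_last_n_lines_in_place() {
  delete_last_n_lines "$1" < "$2" > "$2".tmp && mv "$2".tmp "$2"
}
delete_last_n_lines_in_place 3 "$distribution_area"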

Can I use sed to generate UUIDs inline instead of echo?

This bash snippet works to add a UUID and a tab character (\t) at the start of each line of a file.
while read; do
echo "$(uuidgen | tr 'A-Z' 'a-z')\t$REPLY"
done < sourcefile.tsv > temp_destination.tsv
(Note: the reason for the pipe to tr is to convert the UUIDs to lowercase, since the macOS version of uuidgen generates them in uppercase.)
Although that performs well for smaller files, it doesn't seem efficient.
sed -i '' "s/^/$(uuidgen | tr 'A-Z' 'a-z')\t/" sourcefile.tsv
Again, I am using macOS bash, so the '' after the -i flag is required since I don't want a backup file.
I think sed would perform better, but I think I have to have the UUID generation in some sort of loop.
I'm just looking to make this faster and/or perform more efficiently. It's working, but it's pretty slow on a 20,000-line file, and all other attempts by me have stumped me.
EDIT: I tested my bash script just generating the UUIDs in a while loop, without any of the other subprocesses. With my configuration, I can generate about 250-300 per second, so updating a 20,000-line file will take a minimum of 72 seconds just because of the weak link of UUID generation. As described below, using Perl or Python will likely be faster.
EDIT 2: This little Python script kills the bash script. The snippet only does part of what I need, but just for comparison, it generated about 200,000 UUIDs per second, or 1,000,000 in 5 seconds, compared to the 250-300 per second in the bash subprocess. Wow, what a difference.
#!/usr/bin/env python3
# this generates 1,000,000 UUIDs in about 5 seconds
import uuid
import sys

sys.stdout = open('lots-of-uuid.txt', 'w')
i = 1
while i < 1000000:
    print(uuid.uuid4())
    i += 1
sys.stdout.close()
Did you try something like this:
{
uuidgen | tr 'A-Z' 'a-z'
echo -n "\t"
cat 'sourcefile.tsv'
} > temp_destination.tsv
You may think it is not much different from your "read" version, but it is:
You don't capture the result of uuidgen
cat will probably perform faster than read + $REPLY
Try this out:
while read; do printf "%s\t%s\n" $(uuidgen) "$REPLY"; done < input.tsv > output.tsv
No monkeying around with building strings.
Using sed
$ sed -i '' 's/.*/printf &#;\Luuidgen/e;s/\([^#]*\)#\(.*\)/\2\t\1/' sourcefile.tsv
This might work for you (GNU sed):
sed -i 'h;s/.*/uuidgen/e;s/.*/\L&/;G;s/\n/\t/' file
Make a copy of the current line.
Replace the current line by an evaluated uuidgen command and convert the result to lowercase.
Append the copy and replace the newline by a tab.

BASH Palindrome Checker

This is my first time posting on here so bear with me please.
I received a bash assignment but my professor is completely unhelpful and so are his notes.
Our assignment is to filter and print out palindromes from a file. In this case, the file is:
/usr/share/dict/words
The word lengths range from 3 to 45, and I am supposed to filter for lowercase letters only (the dictionary given contains special characters and uppercase letters as well as lowercase ones), e.g. "-dkas-das". So something like "q-evvavve-q" may count as a palindrome, but I shouldn't be getting that as a proper result.
Anyways, I can get it to filter words of a given length and return matches (not filtering for lowercase only, though):
grep "^...$" /usr/share/dict/words |
grep "\(.\).\1"
And I can use similar lines for 5-letter words, 7-letter words, and so on:
grep "^.....$" /usr/share/dict/words |
grep "\(.\)\(.\).\2\1"
But the prof does not want that. We are supposed to use a loop. I get the concept but I don't know the syntax, and like I said, the notes are very unhelpful.
What I tried was setting variables x=... and y=.. and in a while loop, having x=$x$y but that didn't work (syntax error) and neither did x+=..
Any help is appreciated. Even getting my non-lowercase letters filtered out.
Thanks!
EDIT:
If you're providing a solution or a hint to a solution, the simplest method is preferred.
Preferably one that uses 2 grep statements and a loop.
Thanks again.
Like this:
for word in `grep -E '^[a-z]{3,45}$' /usr/share/dict/words`; do
  [ $word == `echo $word | rev` ] && echo $word
done
Output using my dictionary:
aha
bib
bob
boob
...
wow
Update
As pointed out in the comments, reading in most of the dictionary into a variable in the for loop might not be the most efficient, and risks triggering errors in some shells. Here's an updated version:
grep -E '^[a-z]{3,45}$' /usr/share/dict/words | while read -r word; do
  [ $word == `echo $word | rev` ] && echo $word
done
Why use grep? Bash will happily do that for you:
#!/bin/bash
is_pal() {
  local w=$1
  while (( ${#w} > 1 )); do
    [[ ${w:0:1} = ${w: -1} ]] || return 1
    w=${w:1:-1}
  done
}

while read word; do
  is_pal "$word" && echo "$word"
done
Save this as banana, chmod +x banana and enjoy:
./banana < /usr/share/dict/words
If you only want to keep the words with at least three characters:
grep ... /usr/share/dict/words | ./banana
If you only want to keep the words that only contain lowercase and have at least three letters:
grep '^[[:lower:]]\{3,\}$' /usr/share/dict/words | ./banana
The multiple greps are wasteful. You can simply do
grep -E '^([a-z])[a-z]\1$' /usr/share/dict/words
in one fell swoop, and similarly, put the expressions on grep's standard input like this:
echo '^([a-z])[a-z]\1$
^([a-z])([a-z])\2\1$
^([a-z])([a-z])[a-z]\2\1$' | grep -E -f - /usr/share/dict/words
However, regular grep does not permit backreferences beyond \9. With grep -P you can use double-digit backreferences, too.
The following script constructs the entire expression in a loop. Unfortunately, grep -P does not allow for the -f option, so we build a big thumpin' variable to hold the pattern. Then we can actually also simplify to a single pattern of the form ^(.)(?:.|(.)(?:.|(.)....\3)?\2)?\1$, except we use [a-z] instead of . to restrict to just lowercase.
head=''
tail=''
for i in $(seq 1 22); do
  head="$head([a-z])(?:[a-z]|"
  tail="\\$i${tail:+)?}$tail"
done
grep -P "^${head%|})?$tail$" /usr/share/dict/words
The single grep should be a lot faster than individually invoking grep 22 or 43 times on the large input file. If you want to sort by length, just add that as a filter at the end of the pipeline; it should still be way faster than multiple passes over the entire dictionary.
The expression ${tail:+)?} evaluates to a closing parenthesis and question mark only when tail is non-empty, which is a convenient way to force the \1 back-reference to be non-optional. Somewhat similarly, ${head%|} trims the final alternation operator from the ultimate value of $head.
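If those parameter expansions are unfamiliar, a quick illustration of what each one does:
tail='\2)?\1'
echo "${tail:+)?}"    # prints ")?" because tail is non-empty; with an empty tail it prints nothing
head='([a-z])(?:[a-z]|'
echo "${head%|}"      # prints "([a-z])(?:[a-z]", i.e. head with the trailing "|" removed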
Ok here is something to get you started:
I suggest using the plan you have above; just generate the required number of "." characters with a for loop.
This question will explain how to make a for loop from 3 to 45:
How do I iterate over a range of numbers defined by variables in Bash?
for i in {3..45}; do
  # put your code from above here
done
Now you just need to figure out how to make a string of i dots "." for your first grep and you are done (one way is sketched below).
Also, look into sed; it can nuke the non-lowercase answers for you.
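To make that hint concrete, here is one way to build the dots in each pass (just a sketch of the filtering part; the palindrome test itself is still left to you, as intended):
for i in {3..45}; do
  dots=$(printf '.%.0s' $(seq 1 "$i"))       # a string of exactly i dots
  grep "^${dots}\$" /usr/share/dict/words |  # words of exactly i characters
    grep '^[a-z]*$'                          # keep only all-lowercase words
done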
Another solution that uses a Perl-compatible regular expression (PCRE) with recursion, heavily inspired by this answer:
grep -P '^(?:([a-z])(?=[a-z]*(\1(?(2)\2))$))++[a-z]?\2?$' /usr/share/dict/words

Fastest way to print a single line in a file

I have to fetch one specific line out of a big file (1,500,000 lines), multiple times in a loop over multiple files, and I was asking myself what would be the best option (in terms of performance).
There are many ways to do this; I mainly use these two:
cat ${file} | head -1
or
cat ${file} | sed -n '1p'
I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) first open the whole file and then fetch row 1?
Drop the useless use of cat and do:
$ sed -n '1{p;q}' file
This will quit the sed script after the line has been printed.
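The same early-quit idea works for any line number, not just the first; for example, to print line 20 and stop reading right after it:
sed -n '20{p;q}' file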
Benchmarking script:
#!/bin/bash
TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')

# files up to a hundred million lines (if you're on a slow machine, decrease!!)
for (( j=1; j<=100000000; j=j*10 ))
do
    echo "Lines in file: $j"
    # create file containing j lines
    seq 1 $j > file
    # initial read of file
    cat file > /dev/null

    for comm in {0..3}
    do
        avg=0
        echo
        echo ${heading[$comm]}
        for (( i=1; i<=$n; i++ ))
        do
            case $comm in
                0)
                    t=$( { time head -1 file > /dev/null; } 2>&1);;
                1)
                    t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
                2)
                    t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
                3)
                    t=$( { time read line < file && echo $line > /dev/null; } 2>&1);;
            esac
            avg=$avg+$t
        done
        echo "scale=3;($avg)/$n" | bc
    done
done
Just save as benchmark.sh and run bash benchmark.sh.
Results:
head -1 file
.001
sed -n 1p file
.048
sed -n '1{p;q}' file
.002
read line < file && echo $line
0
Results from a file with 1,000,000 lines.
So the times for sed -n 1p will grow linearly with the length of the file but the timing for the other variations will be constant (and negligible) as they all quit after reading the first line:
Note: timings are different from original post due to being on a faster Linux box.
If you are really just getting the very first line and reading hundreds of files, then consider shell builtins instead of external commands: use read, which is a shell builtin in bash and ksh. This eliminates the overhead of process creation with awk, sed, head, etc.
The other issue is doing timed performance analysis on I/O. The first time you open and then read a file, the file data is probably not cached in memory. However, if you try a second command on the same file again, the data as well as the inode have been cached, so the timed results may be faster, pretty much regardless of the command you use. Plus, inodes can stay cached practically forever (they do on Solaris, for example, or at any rate for several days).
For example, Linux caches everything and the kitchen sink, which is a good performance attribute. But it makes benchmarking problematic if you are not aware of the issue.
All of this caching effect "interference" is both OS and hardware dependent.
So: pick one file, read it with a command. Now it is cached. Run the same test command several dozen times; this samples the effect of the command and child process creation, not your I/O hardware.
This is sed vs. read for 10 iterations of getting the first line of the same file, after reading the file once:
sed: sed '1{p;q}' uopgenl20121216.lis
real 0m0.917s
user 0m0.258s
sys 0m0.492s
read: read foo < uopgenl20121216.lis ; export foo; echo "$foo"
real 0m0.017s
user 0m0.000s
sys 0m0.015s
This is clearly contrived, but it does show the difference between builtin performance and using an external command.
If you want to print only 1 line (say the 20th one) from a large file you could also do:
head -20 filename | tail -1
I did a "basic" test with bash and it seems to perform better than the sed -n '1{p;q} solution above.
Test takes a large file and prints a line from somewhere in the middle (at line 10000000), repeats 100 times, each time selecting the next line. So it selects line 10000000,10000001,10000002, ... and so on till 10000099
$wc -l english
36374448 english
$time for i in {0..99}; do j=$((i+10000000)); sed -n $j'{p;q}' english >/dev/null; done;
real 1m27.207s
user 1m20.712s
sys 0m6.284s
vs.
$time for i in {0..99}; do j=$((i+10000000)); head -$j english | tail -1 >/dev/null; done;
real 1m3.796s
user 0m59.356s
sys 0m32.376s
For printing a line out of multiple files
$wc -l english*
36374448 english
17797377 english.1024MB
3461885 english.200MB
57633710 total
$time for i in english*; do sed -n '10000000{p;q}' $i >/dev/null; done;
real 0m2.059s
user 0m1.904s
sys 0m0.144s
$time for i in english*; do head -10000000 $i | tail -1 >/dev/null; done;
real 0m1.535s
user 0m1.420s
sys 0m0.788s
How about avoiding pipes?
Both sed and head support the filename as an argument. This way you avoid the pipe through cat. I didn't measure it, but head should be faster on larger files, as it stops the computation after N lines (whereas sed goes through all of them, even if it doesn't print them, unless you specify the quit option as suggested above).
Examples:
sed -n '1{p;q}' /path/to/file
head -n 1 /path/to/file
Again, I didn't test the efficiency.
I have done extensive testing, and found that, if you want every line of a file:
while IFS=$'\n' read LINE; do
echo "$LINE"
done < your_input.txt
is much, much faster than any other (Bash-based) method out there. All other methods (like sed) read the file each time, at least up to the matching line. If the file is 4 lines long, you will get: 1 -> 1,2 -> 1,2,3 -> 1,2,3,4 = 10 reads, whereas the while loop just maintains a position cursor (based on IFS), so it only does 4 reads in total.
On a file with ~15k lines, the difference is phenomenal: ~25-28 seconds (sed-based, extracting a specific line each time) versus ~0-1 seconds (while...read-based, reading through the file once).
The above example also shows how to set IFS in a better way to newline (with thanks to Peter from the comments below), and this will hopefully fix some of the other issues seen when using while...read... in Bash at times.
For the sake of completeness you can also use the basic Linux command cut:
cut -d $'\n' -f <linenumber> <filename>

Replacing specific lines in multiple files using bash

I'm relatively new to bash scripting, having started out of the need to manage my simulations on supercomputers. I'm currently stuck on writing a script to change specific lines in my pbs files.
There are two stages to my problem. First, I need to replace a number of lines in a text file (another script) and overwrite that file for my later use. The rough idea is:
Replace lines 27, 28 and 29 of 'filename005' with 'text1=000', 'text2=005' and 'text3=010'
Next, I'd like to do that recursively for a set of text files with numbered suffixes, and the numbering influences the replaced text.
My code so far is:
#!/bin/bash
for ((i = 1; i < 10; i++))
do
  let NUM=i*5
  let OLD=NUM-5
  let NOW=NUM
  let NEW=NUM+5
  let FILE=$(printf "filename%03g" $NUM)
  sed "27 c\text1=$OLD" $FILE
  sed "28 c\text2=$NOW" $FILE
  sed "29 c\text3=$NEW" $FILE
done
I know there are some errors in the last 4 lines of my code, and I'm still studying up on the proper way to implement sed. Appreciate any tips!
Thanks!
CS
Taking the first line of your specification:
Replace lines 27:29 of filename005, with text1=000; text2=005; text3=010
That becomes:
sed -e '27,29c\
text1=000\
text2=005\
text3=010' filename005
Rinse and repeat. The backslashes indicate to sed that the change continues. It's easier on yourself if your actual data lines do not need to end with backslashes.
You can play with:
seq 1 35 |
sed -e '27,29c\
text1=000\
text2=005\
text3=010'
to see what happens without risking damage to precious files. Given the specification lines, you could write a sed script to generate sed scripts from the specification (though I'd be tempted to use Perl or awk instead; indeed, I'd probably do the whole job in Perl).
Okay, I managed to get my code to work after finding out that for in-place replacement, I need to write to a temporary file. So, with the recursive loop and multi-line replacement (and other small tweaks):
for ((i = 1; i < 10; i++ ))
do
  let NUM=i*5
  let OLD=NUM-5
  let NOW=NUM
  let NEW=NUM+5
  FILE=`printf "filename%03g" $NUM`
  sed -e "27,29 c\
OLD=$OLD\n\
NOW=$NOW\n\
NEW=$NEW" $FILE >temp.tmp && mv temp.tmp $FILE
done
Do let me know if there is a more elegant way to use sed in this context. Thanks again @Johnathan!
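Since you ask about a more elegant route: if the sed on your cluster is GNU sed, its -i option edits the file in place and removes the need for the temporary file. A sketch only, reusing the same expression as above (untested on your pbs files):
sed -i "27,29 c\
OLD=$OLD\n\
NOW=$NOW\n\
NEW=$NEW" $FILE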
