How to keep only one character after the dot in a shell script variable - shell

I need to remove all characters after the first one after the dot:
example:
the temperature is 28.34567 C°
I need only 28.3
I've tried cut -d'.' -f1, but that cuts everything after the dot.
Thanks a lot

If you are using Bash:
$ var=23.123
$ [[ $var =~ [0-9]*(\.[0-9]{,1})? ]] && echo ${BASH_REMATCH[0]}
Output:
23.1
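A regex is not strictly required for this. As a minimal sketch, assuming the value contains at most one dot, parameter expansion alone can truncate (not round) to one decimal. The function name truncate_tenths is made up for illustration and is not part of the answer above:

```shell
#!/usr/bin/env bash
# truncate_tenths is a hypothetical helper name used only for this sketch.
truncate_tenths() {
    local value=$1 whole frac first
    case $value in
        *.*)
            whole=${value%%.*}            # everything before the first dot
            frac=${value#*.}              # everything after the first dot
            first=${frac%"${frac#?}"}     # first character of the fraction
            printf '%s.%s\n' "$whole" "$first"
            ;;
        *)
            printf '%s\n' "$value"        # no dot: nothing to trim
            ;;
    esac
}

truncate_tenths 28.34567    # prints 28.3
```

Because only string operations are involved, no external process is started and the locale's decimal separator does not matter.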

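You could add this line to your script if you have Python 3.6 or later (f-strings are required):
python3 -c "print(f\"{28.34567:.1f}\")"
This solution rounds the result to the nearest tenth; it does not truncate or take the ceiling, so 28.37567 would give 28.4.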
Output:
28.3

There are a few ways to do this, each with its own issues. The trivial solution is sed. Something like:
$ echo "the temperature is 28.37567 C°" | sed -e 's/\([[:digit:]]\.[0-9]\)[0-9]*/\1/g'
the temperature is 28.3 C°
but you probably don't want truncation. Rounding is probably more appropriate, in which case:
$ echo "the temperature is 28.37567 C°" | awk '{$4 = sprintf("%.1f", $4)}1'
the temperature is 28.4 C°
but that's pretty fragile in matching the 4th field. You could add a loop to check all the fields, but this gives the idea. Also note that awk will squeeze all your whitespace.
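The loop over all fields mentioned above could look roughly like this. It is only a sketch: the function name round_numbers and the rule "a field made of digits, a dot, and digits counts as a number" are assumptions, not something the answer specifies.

```shell
#!/usr/bin/env bash
# round_numbers is a hypothetical helper: it reads lines on stdin and rounds
# every field that looks like a decimal number to one decimal place.
round_numbers() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+$/)
                $i = sprintf("%.1f", $i)
        print
    }'
}

echo "the temperature is 28.37567 C" | round_numbers
# prints: the temperature is 28.4 C
```

As with the single-field version, assigning to a field makes awk rebuild the line, so runs of whitespace are still squeezed to single spaces.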

Two brute-force ways, depending on whether you want to keep the Celsius sign or not:
mawk '$!_=int(10*$_)/10' <<<'28.34567 C°'
28.3 C°
mawk '$!NF=int(10*$_)/10' <<<'28.34567 C°'
28.3

Related

How to loop a variable range in cut command

I have a file with 2 columns, and I want to use the values from the second column to set the range in the cut command to select a range of characters from another file. The range I want is the character at the position given by the value in the second column plus the next 10 characters. I will give an example below.
My files are something like that:
File with 2 columns and no blank lines between lines (file1.txt):
NAME1 10
NAME2 25
NAME3 48
NAME4 66
File from which I want to extract the variable ranges of characters (just one very long line with no spaces) (file2.txt):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
Desired resulting file, one sequence per line (result.txt):
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
The resulting file would have the characters from 10-20, 25-35, 48-58 and 66-76, each range in a new line. So, it would always keep the range of 10, but in different start points and those start points are set by the values in the second column from the first file.
I tried the command:
for i in $(awk '{print $2}' file1.txt);
do
p1=$i;
p2=`expr "$1" + 10`
cut -c$p1-$2 file2.txt > result.txt;
done
I don't get any output or error message.
I also tried:
while read line; do
set $line
p2=`expr "$2" + 10`
cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt
This last command gives me an error message:
cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
There's no need for cut here; dd can do the job of indexing into a file, and reading only the number of bytes you want. (Note that status=none is a GNUism; you may need to leave it out on other platforms and redirect stderr otherwise if you want to suppress informational logging).
while read -r name index _; do
dd if=file2.txt bs=1 skip="$index" count=10 status=none
printf '\n'
done <file1.txt >result.txt
This approach avoids excessive memory requirements (as present when reading the whole of file2 -- assuming it's large), and has bounded performance requirements (overhead is equal to starting one copy of dd per sequence to extract).
Using awk
$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2 file1
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
If file2.txt is not too large, then you can read it in memory,
and use Bash sub-strings to extract the desired ranges:
data=$(<file2.txt)
while read -r name index _; do
echo "${data:$index:10}"
done <file1.txt >result.txt
This will be much more efficient than running cut or another process for every single range definition.
(Thanks to @CharlesDuffy for the tip to read data without a useless cat, and the while loop.)
One way to solve it:
#!/bin/bash
while read line; do
pos=$(echo "$line" | cut -f2 -d' ')
x=$(head -c $(( $pos + 10 )) file2.txt | tail -c 10)
echo "$x"
done < file1.txt > result.txt
It's not the solution an experienced bash hacker would use, but it is very good for someone who is new to bash. It uses tools that are very versatile, although somewhat slow if you need high performance. Shell scripting is commonly used by people who rarely write shell scripts but know a few commands and just want to get the job done. That's why I'm including this solution, even if the other answers are superior for more experienced people.
The first line is pretty easy. It just extracts the numbers from file1.txt. The second line uses the very nice tools head and tail. Usually, they are used with lines instead of characters. Nevertheless, I print the first pos + 10 characters with head. The result is piped into tail which prints the last 10 characters.
Thanks to @CharlesDuffy for improvements.
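For completeness, a cut-based loop close to the original attempts can also work. Reading the first attempt, it uses the positional parameters $1 and $2 where p1 and p2 were presumably meant, and both attempts redirect with > inside the loop, which overwrites result.txt on every iteration. The sketch below assumes that the second column is a 0-based offset, as the other answers do; the function name extract_ranges is hypothetical.

```shell
#!/usr/bin/env bash
# extract_ranges is a hypothetical helper.
#   $1 = file with "NAME OFFSET" lines, $2 = file holding one long sequence line
extract_ranges() {
    while read -r name index rest; do
        [ -n "$index" ] || continue    # skip blank or malformed lines
        # cut counts characters from 1, so a 0-based offset needs +1
        cut -c "$((index + 1))-$((index + 10))" "$2"
    done < "$1"
}

# Usage: redirect once, outside the loop, so earlier lines are not overwritten:
#   extract_ranges file1.txt file2.txt > result.txt
```

This still starts one cut process per range, so the in-memory Bash substring approach remains the faster option when file2.txt fits in memory.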

Different output for pipe in script vs. command line

I have a directory with files that I want to process one by one and for which each output looks like this:
==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)
Then I want to calculate the average percentage (column 6) by piping the output to AWK. I would prefer to do this all in one script and wrote the following code:
for f in $dir; do
echo -ne "$f "
process $f
done | awk '{print $7}' | awk -F "=" '{sum+=$2}END{print sum/NR}'
When I run this several times, I often get different results although in my view nothing really changes. The result is almost always incorrect though.
However, if I only put the for loop in the script and pipe to AWK on the command line, the result is always the same and correct.
What is the difference and how can I change my script to achieve the correct result?
I'm guessing a little about what you're trying to do; without more details it's hard to say exactly what is going wrong.
ARRAY=()
for f in $dir; do
echo -ne "$f "
# WER=...% is the 6th field of the output of process; strip "WER=" and "%"
TEMPVAR=$(process "$f" | awk '{gsub(/WER=|%/, "", $6); print $6}')
ARRAY+=("$TEMPVAR")
done
I would append all your values to an array inside your for loop. Now all your percentages are in the array, available as "${ARRAY[@]}". It should be easy to calculate the average value, using whatever tool you like.
This will also help you troubleshoot. If you get too few elements in the array (check the count with ${#ARRAY[@]}), then you will know your loop is terminating early.
# To get the percentage of all files
Percs=$(sed -r 's/.*WER=([[:digit:].]*).*/\1/' *)
# The divisor
Lines=$(wc -l <<< "$Percs")
# To change new lines into spaces
P=$(echo $Percs)
# Execute one time without the bc. It's easier to understand
echo "scale=3; (${P// /+})/$Lines" | bc
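The average can also be computed in a single awk pass that looks for the WER= field by name instead of by column number, so a filename printed in front of the line (even one containing spaces) does not shift the result. This is only a sketch: the function name average_wer is hypothetical, and it assumes every result line contains exactly one field of the form WER=NN.NNN%.

```shell
#!/usr/bin/env bash
# average_wer is a hypothetical helper: it reads result lines on stdin and
# prints the mean of all WER percentages with three decimals.
average_wer() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^WER=/) {
                v = $i
                sub(/^WER=/, "", v)    # drop the label
                sub(/%$/, "", v)       # drop the percent sign
                sum += v
                n++
            }
    }
    END { if (n) printf "%.3f\n", sum / n }'
}

# Usage, with process and dir as in the question:
#   for f in $dir; do process "$f"; done | average_wer
```

Dividing by the number of WER fields actually seen, rather than by NR, keeps stray or empty lines from skewing the average.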

sed: replace a character only between two positions

Sorry for this apparently simple question, but spent too long trying to find the solution everywhere and trying different sed options.
I just need to replace all dots by commas in a text file, but just between two positions.
As an example, from:
1.3.5.7.9
to
1.3,5,7.9
So, replace . by , between positions 3 to 7.
Thanks!
EDITED: Sorry, I tried to simplify the problem, but as none of the first 3 answers work due to a lack of detail in my question, let me go a bit deeper. The important point is replacing all dots with commas within an interval of positions, without knowing the rest of the string:
Here some text. I don't want to change. 10.000 usd 234.566 usd Continuation text.
More text. No need to change this part. 345 usd 76.433 usd Text going on. So on.
This is a fixed width text file, in columns, and I need to change the international format of numbers, replacing dots by commas. I just know the initial and final positions where I need to search and eventually replace the dots. Obviously, not all figures have dots (only those over 1000).
Thanks.
Rewriting the answer after the clarification of the question:
This is hard to handle with sed only, but can be simplified with other standard utilities like cut and paste:
$ start=40
$ end=64
$ paste -d' ' <(cut -c -$((start-1)) example.txt) \
> <(cut -c $((start+1))-$((end-1)) example.txt | sed 'y/./,/') \
> <(cut -c $((end+1))- example.txt)
Here some text. I don't want to change. 10,000 usd 234,566 usd Continuation text.
More text. No need to change this part. 345 usd 76,433 usd Text going on. So on.
(The > characters just mark the continuation of the previous line; the < characters are real.) This is of course very inefficient, but conceptually simple.
I used all the +1 and -1 stuff to get rid of extra spaces. Not sure if you need it.
A pure sed solution (brace yourself):
$ sed "s/\(.\{${start}\}\)\(.\{$((end-start))\}\)/\1\n\2\n/;h;s/.*\n\(.*\)\n.*/\1/;y/./,/;G;s/^\(.*\)\n\(.*\)\n\(.*\)\n\(.*\)$/\2\1\4/" example.txt
Here some text. I don't want to change. 10,000 usd 234,566 usd Continuation text.
More text. No need to change this part. 345 usd 76,433 usd Text going on. So on.
GNU sed:
$ sed -r "s/(.{${start}})(.{$((end-start))})/\1\n\2\n/;h;s/.*\n(.*)\n.*/\1/;y/./,/;G;s/^(.*)\n(.*)\n(.*)\n(.*)$/\2\1\4/" example.txt
Here some text. I don't want to change. 10,000 usd 234,566 usd Continuation text.
More text. No need to change this part. 345 usd 76,433 usd Text going on. So on.
I tried to simplify the regex, but it is more permissive.
echo 1.3.5.7.9 | sed -r "s/^(...).(.).(..)/\1,\2,\3/"
1.3,5,7.9
PS: It doesn't work with BSD sed.
$ echo "1.3.5.7.9" |
gawk -v s=3 -v e=7 '{
print substr($0,1,s-1) gensub(/\./,",","g",substr($0,s,e-s+1)) substr($0,e+1)
}'
1.3,5,7.9
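The gawk command above relies on gensub, which is a gawk extension. Where only a POSIX awk is available, the same idea can be sketched with substr and gsub on a copy of the middle part. The function name dots_to_commas is hypothetical; the positions are 1-based and inclusive, as in the gawk version.

```shell
#!/usr/bin/env bash
# dots_to_commas is a hypothetical helper.
#   $1 = first position, $2 = last position (1-based, inclusive); reads stdin
dots_to_commas() {
    awk -v s="$1" -v e="$2" '{
        mid = substr($0, s, e - s + 1)    # the part that may be changed
        gsub(/\./, ",", mid)              # replace dots only inside that part
        print substr($0, 1, s - 1) mid substr($0, e + 1)
    }'
}

echo "1.3.5.7.9" | dots_to_commas 3 7    # prints 1.3,5,7.9
```

For the fixed-width file in the question, the same function can be applied to the whole file with the known column boundaries, for example dots_to_commas 40 64 < example.txt.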
This is rather awkward to do in pure sed. If you're not strictly constrained to sed, I suggest using another tool to do this. Ed Morton's gawk-based solution is probably the least-awkward (no pun intended) way to solve this.
Here's an example of using sed to do the grunt work, but wrapped in a bash function for simplicity:
function transform () {
local line=$1 start=$2 end=$3 front back
# Save beginning and end of line
front=$(printf '%s\n' "$line" | sed -e "s/\(^.\{$start\}\).*$/\1/")
back=$(printf '%s\n' "$line" | sed -e "s/^.\{$end\}//")
# Translate every dot to a comma (quoting "$line" preserves its spacing)
line=$(printf '%s\n' "$line" | sed -e 'y/./,/')
# Restore unmodified beginning/end
printf '%s\n' "$line" | sed -e "s/^.\{$start\}/$front/" -e "s/\(^.\{$end\}\).*$/\1$back/"
}
Call this function like:
$ transform "1.3.5.7.9" 3 7
1.3,5,7.9
Thank you all.
What I found elsewhere (not my own work) as simple solutions:
For fixed width files:
awk -F "" 'OFS="";{for (j=2;j<= 5;j++) if ($j==".") $j=","}'1
Will change all dots into commas from the 2nd position to the 5th.
For tab delimited fields files:
awk -F'\t' 'OFS="\t" {for (j=2;j<=5;j++) gsub(/\./,",",$j)}'1
Will change all dots into commas from the 2nd field to the 5th.
Hope that can help someone: I couldn't imagine it would be so tough in the beginning.

BashScripting: Reading out a specific variable

my question is actually rather easy, but I suck at bash scripting and google was no help either. So here is the problem:
I have an executable that writes me a few variables to stdout. Something like that:
MrFoo:~$ ./someExec
Variable1=5
Another_Weird_Variable=12
VARIABLENAME=42
What I want to do now is to read in a specific one of these variables (I already know its name), store the value and use it to give it as an argument to another executable.
So, a simple call like
./function2 5 // which comes from ./function2 Variable1 from above
I hope you understand the problem and can help me with it
With awk you can do something like this (this is for passing value of 1st variable)
./someExec | awk -F= 'NR==1{system("./function2 " $2)}'
or
awk -F= 'NR==1{system("./function2 " $2)}' <(./someExec)
Easiest way to go is probably to use a combination of shell and perl or ruby. I'll go with perl since it's what I cut my teeth on. :)
someExec.sh
#!/bin/bash
echo Variable1=5
echo Another_Weird_Variable=12
echo VARIABLENAME=42
my_shell_script.sh
#!/bin/bash
myVariable=`./someExec | perl -wlne 'print $1 if /Variable1=(.*)/'`
echo "Now call ./function2 $myVariable"
[EDIT]
Or awk, as Jaypal pointed out 58 seconds before I posted my answer. :) Basically, there are a lot of good solutions. Most importantly, though, make sure you handle both security and error cases properly. In both of the solutions so far, we're assuming that someExec will provide guaranteed well-formed and innocuous output. But, consider if someExec were compromised and instead provided output like:
./someExec
5 ; rm -rf / # Uh oh...
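One way to limit that risk is to avoid both eval and awk's system(): extract the value as plain text and pass it on as a single quoted argument, so the shell never interprets it as code. The sketch below assumes the output consists of NAME=value lines; the function name get_var is hypothetical.

```shell
#!/usr/bin/env bash
# get_var is a hypothetical helper: it reads NAME=value lines on stdin and
# prints the value of the first line whose name is exactly $1.
get_var() {
    awk -F= -v name="$1" '
        $1 == name {
            # print everything after "NAME=", including any later "=" signs
            print substr($0, length(name) + 2)
            exit
        }'
}

# Usage, with someExec and function2 as in the question:
#   value=$(./someExec | get_var Variable1)
#   ./function2 "$value"
```

With the quotes around "$value", output such as 5 ; rm -rf / reaches function2 as one literal argument instead of being executed, although function2 still has to validate it.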
You can use awk like this:
./function2 $(./someExec | awk -F "=" '/Variable1/{print $2}')
which is equivalent to:
./function2 5
If you can make sure someExec's output is safe you can use eval.
eval $(./someExec)
./function2 $Variable1
You can use this very simple and straight forward way:
./exp1.sh | grep "Variable1" | awk -F "=" '{print $2}'
If you want to use only one of the variables that someExec prints, use:
eval "$(./someExec | grep '^Variable1=')"
./function2 "$Variable1"
And if you want to use all of the variables it prints, use:
eval "$(./someExec)"
./function2 $<FILE_VARIABLE_NAME>

Bash substring with pipes and stdin

My goal is to cut the output of a command down to an arbitrary number of characters (let's use 6). I would like to be able to append this command to the end of a pipeline, so it should be able to just use stdin.
echo "1234567890" | your command here
# desired output: 123456
I checked out awk, and I also noticed bash has a substr command, but both of the solutions I've come up with seem longer than they need to be and I can't shake the feeling I'm missing something easier.
I'll post the two solutions I've found as answers, I welcome any critique as well as new solutions!
Solution found, thank you to all who answered!
It was close between jcollado and Mithrandir - I will probably end up using both in the future. Mithrandir's answer was an actual substring and is easier to view the result, but jcollado's answer lets me pipe it to the clipboard with no EOL character in the way.
Do you want something like this:
echo "1234567890" | cut -b 1-6
What about using head -c/--bytes?
$ echo t9p8uat4ep | head -c 6
t9p8ua
I had come up with:
echo "1234567890" | ( read h; echo ${h:0:6} )
and
echo "1234567890" | awk '{print substr($0,1,6)}'
But both seemed like I was using a sledgehammer to hit a nail.
This might work for you:
printf "%.6s" 1234567890
123456
If your_command_here is cat:
% OUTPUT=t9p8uat4ep
% cat <<<${OUTPUT:0:6}
t9p8ua
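The difference noted in the question's closing remark, that head -c leaves no EOL character while cut does, can be checked by counting bytes. This is a small sketch using only the commands already shown in the answers; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Count the bytes each command emits for the same 10-character input line.
with_head=$(printf '1234567890\n' | head -c 6 | wc -c)
with_cut=$(printf '1234567890\n' | cut -b 1-6 | wc -c)

# $(( )) strips the padding some wc implementations add around the number.
echo "head -c 6: $((with_head)) bytes, cut -b 1-6: $((with_cut)) bytes"
# prints: head -c 6: 6 bytes, cut -b 1-6: 7 bytes
```

head -c stops after exactly six bytes, whereas cut works line by line and re-appends the newline, which is why cut's output is one byte longer.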
