Multiplying all values in a txt file with another value - bash

My aim is to multiply all values in a text file by a number, in my case 1000.
Original text in file:
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
I want the output to look like:
(so, changing the contents of the file to...)
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
Or, even better, truncated to one decimal place:
4.9
43.8
149.7
443.1
882.0
975.7
995.7
1000
I am using bash on macOS in the terminal.

If you have dc:
cat infile | dc -f - -e '1k1000sa[la*Sdz0!=Z]sZzsclZx[Ld1/psblcd1-sc1<Y]sYlYx'

Using Perl
perl -lpe ' $_=$_*1000 '
With sample input and in-place replacement:
$ cat andy.txt
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
$ perl -i -lpe ' $_=$_*1000 ' andy.txt
$ cat andy.txt
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
$
One decimal place:
perl -lpe ' $_=sprintf("%0.1f",$_*1000 ) '
Zero decimal places, rounding off:
perl -lpe ' $_=sprintf("%0.0f",$_*1000 ) '
Zero decimal places, truncating:
perl -lpe ' $_=sprintf("%0.0f",int($_*1000) ) '
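For illustration (not from the original post), here is the difference between the last two on one of the sample values - sprintf rounds 975.789607 up, while int truncates it first:
$ echo 0.975789607 | perl -lpe ' $_=sprintf("%0.0f",$_*1000) '
976
$ echo 0.975789607 | perl -lpe ' $_=sprintf("%0.0f",int($_*1000)) '
975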

awk to the rescue!
$ awk '{printf "%.1f\n", $1*1000}' file > tmp && mv tmp file
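If you have GNU awk 4.1 or later (for example via Homebrew on macOS), its inplace extension avoids the temp file; a sketch, assuming gawk is installed:
gawk -i inplace '{printf "%.1f\n", $1*1000}' file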

Using num-utils. For answers to 8 decimal places:
numprocess '/*1000/' n.txt
For rounded answers to 1 decimal place:
numprocess '/*1000/' n.txt | numround -n '.1'

Use sed to prefix each line with 1000*, then process the resulting mathematical expressions with bc. To show only the first digit after the decimal point you can use sed again.
sed 's/^/1000*/' yourFile | bc | sed -E 's/(.*\..).*/\1/'
This will print the latter of your expected outputs. Just as you wanted, decimals are cut rather than rounded (1.36 is converted to 1.3).
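For illustration, the first sed stage only builds the expressions that bc evaluates; with the sample file the stream starts like this:
$ sed 's/^/1000*/' yourFile | head -n 2
1000*0.00493293814
1000*0.0438981727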
To remove all decimal digits either replace the last … | sed … with sed -E 's/\..*//' or use the following command
sed 's:^.*$:1000*&/1:' yourFile | bc
With these commands overwriting the file directly is not possible. You have to write to a temporary file (append > tmp && mv tmp yourFile) or use the sponge command from the package moreutils (append | sponge yourFile).
However, if you want to remove all decimal digits after the multiplication, there is a trick. Instead of actually multiplying by 1000 we can syntactically shift the decimal point. This can be done in a single sed command. sed has the -i option to overwrite input files.
sed -i.bak -E 's/\..*/&000/;s/^[^.]*$/&.000/;s/\.(...).*/\1/;s/^(-?)0*(.)/\1\2/' yourFile
The command changes yourFile's content to
4
43
149
443
882
975
995
1000
A backup yourFile.bak of the original is created.
The single sed command should work with every input number format too (even for things like -.1 → -100).
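For reference, the four substitutions in that command do the following (an annotated breakdown, not a new command):
s/\..*/&000/          pad any fractional part with three extra zeros
s/^[^.]*$/&.000/      give integer lines a ".000" fraction so every line has a dot
s/\.(...).*/\1/       drop the dot and keep exactly three digits after it
s/^(-?)0*(.)/\1\2/    strip leading zeros, keeping the sign and at least one digit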

Related

How to insert a random string into the sed 's//' replace string for each line actioned?

This:
echo "    " | tr ' ' '\n' | sed "s|.*|$RANDOM|"
Or this:
echo "    " | tr ' ' '\n' | sed "s|.*|$(echo $RANDOM)|"
Will print a list of 5 numbers (space is replaced by newline, sed replaces each line with $RANDOM), and those 5 numbers will all be the same:
$ echo "    " | tr ' ' '\n' | sed "s|.*|$(echo $RANDOM)|"
21590
21590
21590
21590
21590
This is because the shell expands $RANDOM to a random value only once, before sed is called.
What is the easiest and shortest way, preferably using only sed, to print a different random number for each line, i.e. to provide some "secondary input stream" that differs for each line sed handles?
I could do it with xargs, but I wonder if there is a way to do it with sed only.
No, it is impossible to do this with sed only as sed has no capability to generate random numbers. It's not even sed that's doing it in your posted example, it's the shell generating a random number before sed is called. All sed sees is "s|.*|21590|".
If you want to do this using standard tools in any shell on any UNIX box then you can do this with any awk:
$ echo "    " | tr ' ' '\n' | awk '{sub(/.*/,rand())}1'
0.924046
0.593909
0.306394
0.578941
0.740133
See https://www.gnu.org/software/gawk/manual/gawk.html#Numeric-Functions for more info on rand() and I can provide C&V if you're interested in pursuing an awk solution.
One way to get a different sequence on each run is to seed the generator yourself:
$ seq 5 | awk -v seed="$RANDOM" 'BEGIN{srand(seed)} {print rand()}'
0.0873814
0.536876
0.535788
0.881146
0.354652
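If you want 5-digit integers, as in the sed example, rather than fractions, a sketch along the same lines (srand() with no argument seeds from the current time of day):
seq 5 | awk 'BEGIN{srand()} {printf "%d\n", 10000+int(rand()*90000)}'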
Depends on your version of sed, but -
$: cat x
a
b
c
d
e
$: sed 's/.*/printf "%05d\n" $RANDOM/e' x
30181
30514
24742
28555
26267
The e flag means execute: the replacement is run as a shell command in a subshell for each line, so be wary of the cost.
Addendum
You can format the output. printf "%07.7s\n" $(($RANDOM%10000))$(($RANDOM%10000)) will always print 7 digits with leading zero padding, and it ameliorates the skew of the default range: $RANDOM only goes up to 32,767, so besides the normal 0-9999 values there are 10,000 numbers that start with 1, 10,000 that start with 2, and only 2,768 that start with 3 - not an ideal distribution. You might also just use awk for better random numbers.
This might work for you (GNU sed and Bash):
seq 5 | sed 's/.*/echo $RANDOM/e'
This replaces the numbers 1 to 5 by echo $RANDOM and evaluates each line.
N.B. GNU sed uses /bin/sh, which may be a symbolic link to /bin/dash, in which case the environment variable $RANDOM will not be set. There are various ways around this, but probably the easiest (though possibly dangerous) is:
sudo ln -sf /bin/bash /bin/sh
# run sed
sudo ln -sf /bin/dash /bin/sh
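A less invasive workaround is to let the executed command generate the number itself instead of relying on $RANDOM, e.g. with shuf (a sketch, assuming GNU sed and coreutils shuf; works even when /bin/sh is dash):
seq 5 | sed 's|.*|shuf -i 0-32767 -n 1|e'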
Another alternative:
shuf -i 1-100000 -n 5
Question:
What is the easiest and shortest way, preferably by only using sed, to actually print a different random number for each line?
Answer:
echo "    " | tr ' ' '\n' | perl -pe 's|.*|int(rand(89999))+10000|e'
or simply:
perl -e 'printf "%05s\n",int(rand(100000)) for (1..5)'
Explanation:
I don't see why one shouldn't use Perl, or any tool that is possibly more suitable for the job. In the OP there is no hint of a reason to stick to sed, which is, on the contrary, only stated as a preference.
I wonder why one would go through a substitution in order to print five random numbers. No details are given about the bigger picture, but it certainly makes me curious.
If you want the same random number on every line:
random=$(cat /dev/urandom | tr -cd '0-9' | head -c 5)
echo -e "$random\n$random\n$random\n$random\n$random"
If you want a different number each time, the problem is that it's harder to control how many digits you get:
echo -e "$RANDOM\n$RANDOM\n$RANDOM\n$RANDOM\n$RANDOM"
Or, more logically, for the same number:
random=$(cat /dev/urandom | tr -cd '0-5' | head -c 8)
x=0
while (($x < 5)); do echo "$random";((x++));done
Or, more logically, for a different number each time:
x=0
while (($x < 5)); do random=$(cat /dev/urandom | tr -cd '0-5' | head -c 8); echo "$random";((x++));done
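A sketch that gives both a fixed width and a different number on each line, using the same /dev/urandom idea:
x=0
while (($x < 5)); do tr -cd '0-9' < /dev/urandom | head -c 5; echo; ((x++)); done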
I found a solution in the following:
$ seq 5 | sed "s/.*/date +%N | grep -o '[0-9][0-9][0-9][0-9][0-9]$'/e"
16794
76358
17143
60690
70506
It uses the last 5 digits of the nanosecond timer to generate a number of reasonable entropy quality.

Grep - Getting the character position in the line of each occurrence

According to the manual, the -b option can give the byte offset of a given occurrence, but it counts from the beginning of the input rather than from the start of each line.
I need to retrieve the position within the line of each match returned by grep. I used this pipeline, but it's quite ugly:
grep '<REGEXP>' | while read -r line ; do echo $line | grep -bo '<REGEXP>' ; done
How to get it done in a more elegant way, with a more efficient use of GNU utils?
Example:
$ echo "abcdefg abcdefg" > test.txt
$ grep 'efg' test.txt | while read -r line ; do echo "$line" | grep -bo 'efg' ; done
4:efg
12:efg
(Indeed, this command line doesn't output the line number, but it's not difficult to add it.)
With any awk (GNU or otherwise) in any shell on any UNIX box:
$ awk -v re='efg' -v OFS=':' '{
    end = 0
    while( match(substr($0,end+1),re) ) {
        print NR, end+=RSTART, substr($0,end,RLENGTH)
        end += RLENGTH - 1
    }
}' test.txt
1:5:efg
1:13:efg
All strings, fields, and array indices in awk start at 1, not zero, hence the output not matching yours: to awk, your input string is:
123456789012345
abcdefg abcdefg
rather than:
012345678901234
abcdefg abcdefg
Feel free to change the code above to end+=RSTART-1 and end+=RLENGTH if you prefer 0-indexed strings.
Perl is not a GNU util, but can solve your problem nicely:
perl -nle 'print "$.:$-[0]" while /efg/g'
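For the sample test.txt above this prints line number and 0-indexed offset pairs, for example:
$ perl -nle 'print "$.:$-[0]" while /efg/g' test.txt
1:4
1:12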

bash - how do I use 2 numbers on a line to create a sequence

I have this file content:
2450TO3450
3800
4500TO4560
And I would like to obtain something of this sort:
2450
2454
2458
...
3450
3800
4500
4504
4508
..
4560
Basically I need a one-liner in sed/awk that reads the values on either side of the TO separator and feeds them to a seq command (or does the loop on its own), dumping the result to the same file, one value per line, with an arbitrary increment - say 4 in the example above.
I know I could use a temp file, a read loop and so on, but I would like to do it in a one-liner starting with cat filename | etc., as it is already part of a bigger script.
Correctness of the input is guaranteed, so the left side of TO is always smaller than the right side.
Thanks
Like this:
awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
or, if you like starting with cat:
cat file | awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}'
Something like this might work:
awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
This would tell awk to system (execute) the command seq 10 4 10 for lines just containing 10 (which outputs 10), and something like seq 10 4 40 for lines like 10TO40. The output seems to match your example.
Given:
txt="2450TO3450
3800
4500TO4560"
You can do:
echo "$txt" | awk -F TO '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i++) print i}'
If you want an increment greater than 1:
echo "$txt" | awk -F TO -v p=4 '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i+=p) print i}'
Give this a try:
sed 's/TO/ /' file.txt | while read first second; do if [ ! -z "$second" ] ; then seq $first 4 $second; else printf "%s\n" $first; fi; done
sed is used to replace TO with a space.
read reads each line; if there are two numbers, seq generates the sequence, otherwise the single number is printed.
This might work for you (GNU sed):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file
This evaluates the RHS of the substitution command if the LHS contains TO.
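For the sample input the first line is rewritten to seq 2450 4 3450 before evaluation, while plain numbers pass through untouched; to write the result back, the usual temp-file pattern applies (a sketch):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file > tmp && mv tmp file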

how to delete a large number of lines from a file

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the line numbers in a text file, but I don't know if I can use it as input to sed - maybe perl?
Thanks
A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
    %del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
    $f = file("data_file")->openr();
    while (<$f>) {
        print unless $del{$.};
    }
'
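For example, the awk variant reads the line numbers into an array on its first pass over lines_file, then prints only the data lines whose number is not in that array; a sketch using the file names from the question (lines holding one line number per line):
$ awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines myfile.txt > myfile.new && mv myfile.new myfile.txt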
perl -ne'
    BEGIN{ local @ARGV = pop; @h{<>} = () }
    exists $h{"$.\n"} or print;
' myfile.txt lines
You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line):
$ cat lines
1
34
45
678
Convert this file to sed script format:
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now give this file to sed as its script:
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile
You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
A first sed selects only the numbers from file lines.txt (just in case).
There is one subtlety to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines from the end: start with 678, then 45, and so on - that's why we use sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to each number.
Then we issue the w (write) and q (quit) commands.
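With the lines.txt above, the script fed to ed looks like this (illustrative):
678d
45d
34d
1d
w
q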
Note that this overwrites the original file!

kind of transpose needed of a file with inconsistent number of columns in each row

I have a tab delimited file (in which number of columns in each row is not fixed) which looks like this:
chr1 92536437 92537640 NM_024813 NM_053274
I want to produce a file from this in the following form (the first three columns are identifiers which I need while splitting):
chr1 92536437 92537640 NM_024813
chr1 92536437 92537640 NM_053274
Suggestions for a shell script.
#!/bin/bash
{
    IFS=' '
    while read a b c rest
    do
        for fld in $rest
        do
            echo -e "$a\t$b\t$c\t$fld"
        done
    done
}
Note that you should enter a real tab there (IFS)
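If you would rather not type a literal tab, bash's ANSI-C quoting gives the same result (an equivalent for that line, not a change to the script's logic):
IFS=$'\t'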
I also thought I should do a perl version:
#!/bin/perl -n
($a,$b,$c,@r) = (chomp and split /\t/); print "$a\t$b\t$c\t$_\n" for @r
To do it all from the commandline, reading from in.txt and outputting to out.txt:
perl -ne '($a,$b,$c,@r)=(chomp and split /\t/); print "$a\t$b\t$c\t$_\n" for @r' in.txt > out.txt
Of course if you save the perl script (say as script.pl)
perl script.pl in.txt > out.txt
If you also make the script file executable (chmod +x script.pl):
./script.pl in.txt > out.txt
HTH
Not shell, and the other answer is perfectly fine, but I one-lined it in Perl:
perl -F'/\s/' -lane '$,="\t"; print @F,$_ for splice @F,3' $FILE
Edit: New (even more unreadable ;) version, inspired by the other answers. Abusing perl's command line parameters and special variables for autosplitting and line ending handling.
Means: for each of the fields after the first three (for splice @F,3), print the first three and it (print @F,$_).
-F sets the field separator to \s (should be \t) for -a autosplitting into @F.
-l turns on line ending handling for -n which runs the -e code for each line of the input.
$, is the output field separator.
So you want to duplicate the first three columns for each remaining item?
$ cat File | while read X
  do PRE=$(echo "$X" | cut -f1-3 -d ' ')
     for Y in $(echo "$X" | cut -f4- -d ' ')
     do echo $PRE $Y >> OutputFilename
     done
  done
Returns:
chr 786 789 NM
chr 786 789 NR
chr 786 789 NT
chr 123 345 NR
This cuts the first three space delimited columns as a prefix, and then abuses the fact that a for loop will step through a space delimited list to call echo.
Enjoy.
This is just a subset of your data comparison in two files question.
Extracting my slightly hacky solution from there:
for i in 4 5 6 7; do join -e _ -j $i f f -o 1.1,1.2,1.3,0; done | sed '/_$/d'
