Windows utility to find numbers and do operations?

Is there a tool that finds numbers using a regex and can perform simple arithmetic operations on them?
Imagine you have a source/config file storing positions, and a later code change means every position now needs an offset. How do you normally go about this without doing it manually?
Edit: I knew I should've added this bit to the original post. I'd prefer something small and easily acquired from anywhere. I am aware of Cygwin and the wonderful utility sets of Linux, which is why I explicitly put Windows in the title.

Get yourself a copy of Cygwin and train yourself up on bash, awk, sed, grep and their brethren.
The Windows cmd language has come a long way since the brain-dead days of MSDOS 3.3 but it's still not a wart on the rear end of the UNIX tools. Cygwin gives you all those tools and more.
A way of doing your specific task (if I understand it correctly) is to change:
a b 70
c d 82
e f 90
into:
offset 60
a b 10
c d 22
e f 30
The following command shows how to use awk to achieve that:
$ echo 'a b 70
c d 82
e f 90' | awk '
BEGIN {
print "offset 60"
}
{
print $1, $2, $3-60
}'
That's formatted for readability - I would tend to do it all on one line and get my input from a file rather than echoing it, but this is just for demo purposes.
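As a rough sketch of that one-line, file-based form: the function name apply_offset and the file name positions.txt are made up for illustration, and the offset is passed in with awk's -v option instead of being hard-coded.

```shell
# Hypothetical helper: subtract an offset from the third field of every line.
# $1 = offset, $2 = input file (both are assumptions of this sketch).
apply_offset() {
    awk -v off="$1" 'BEGIN { print "offset", off } { print $1, $2, $3 - off }' "$2"
}

# Demo: build a small sample file, shift it by 60, then clean up.
printf 'a b 70\nc d 82\ne f 90\n' > positions.txt
apply_offset 60 positions.txt
rm -f positions.txt
```

The same awk program should run unchanged under Cygwin or a standalone gawk build, since it relies only on standard awk features.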
If you want something a little more lightweight (in terms of what you have to install; it's still very powerful), GnuWin32 can give you individual packages. Just install gawk or whatever you need.

Related

Octal dump in Linux shell without od or hd

I'm trying to control a Kasa Smartplug from an Axis embedded Linux product by following what George Georgovassilis has done here: https://blog.georgovassilis.com/2016/05/07/controlling-the-tp-link-hs100-wi-fi-smart-plug/
I've managed to switch the plug on and off from the Axis box but I've come unstuck trying to query the on/off status of the Smartplug because I don't have od (or hd, hexdump or xxd) and the Smartplug output is binary. The snippet of George's code which does this is:
decode(){
code=171
input_num=`od $ODOPTS`
IFS=' ' read -r -a array <<< "$input_num"
args_for_printf=""
for element in "${array[@]}"
do
output=$(( $element ^ $code ))
args_for_printf="$args_for_printf\x$(printf %x $output)"
code=$element
done
printf "$args_for_printf"
}
Is there a way I can do this using basic shell commands instead of using od please?
The Axis box says it's Linux 2.6.29 on a crisv32
I used to use Unix about 30 years ago so I'm struggling...

This seems simple enough with awk. Borrowing the character-to-number lookup table from one answer and the character splitting from another, you can do:
awk -v ORS='' -v OFS='' 'BEGIN{ for(n=0;n<256;n++)ord[sprintf("%c",n)]=n }
{ split($0, chars, "");
chars[length($0)+1]=RS; # add the newline to output. One could check if RS is empty.
# and print
for (i=1;i<=length($0)+1;++i) printf("%o\n", ord[chars[i]]) }'

I managed to solve this in the end:
I couldn't work out how to replicate od using just the shell commands available on the target machine, so I created a very simple C program to read each byte from the binary and print it out as a readable character. That included replicating the weird XOR, which seems to be something to do with rudimentary encryption (probably). I then pulled out the value which I needed using sed. Cross-compiling this C on my Lubuntu machine for the target CRIS architecture wasn't too difficult for a simple program.
Everything was much easier once I'd reduced the model code to a minimal reproducible example for myself. Thanks all.
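For readers who already have the byte values as decimal numbers, however they were obtained, the XOR step itself needs nothing beyond shell arithmetic and printf. This is a minimal sketch, assuming a shell with $(( )) arithmetic and a printf that supports %b; decode_values is a made-up name, and the starting key 171 is taken from the snippet in the question.

```shell
# Hypothetical helper: each argument is one encoded byte as a decimal number.
# Every byte is XORed with the previous *encoded* byte, starting from key 171.
decode_values() {
    code=171
    escaped=""
    for value in "$@"; do
        # Append the decoded byte as an octal escape such as \0150 (for %b).
        escaped="$escaped\\0$(printf '%03o' $(( value ^ code )))"
        code=$value
    done
    printf '%b' "$escaped"
}

# Demo: 195 and 170 are the two encoded bytes of the text "hi".
decode_values 195 170
echo
```

A decoded zero byte would not survive this approach, so it only suits text payloads.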

Pipe input redirection to random spot in my new function

Sorry for the bad wording of the question; I couldn't really think of a decent way to say this.
Problem:
I want to print a certain sequence, given a number, 19 for example.
$ echo 19 | seq 1 2 [INPUT FROM PIPE HERE]
So I want the sequence to start at 1 and go up in increments of 2 until it reaches the input number 19.
I don't know how to do this, although it's probably very easy; I'm still very new to the shell.
PS: Sorry if this is a duplicate, I couldn't find what I was looking for after 15 minutes of searching.

I'm not sure whether you can pipe parameters into seq directly, but you can capture the output of a command and pass it to seq as an argument.
One way is:
e=$(echo 19) && seq 1 2 "$e"
This runs a command (echo 19 in this case) and assigns its output to the variable e. The next command runs only if the assignment succeeded (this is ensured by &&) and receives the variable as a parameter via "$e". The double quotation marks are not strictly necessary here, but they protect against word splitting in other uses of this method.
A more straightforward way is:
seq 1 2 $(echo 19)
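seq itself does not read its arguments from standard input, but two common ways to bridge the pipe can be sketched as follows (seq_from_stdin is a made-up name for this example):

```shell
# Variant 1: read the upper bound from standard input, then call seq.
seq_from_stdin() {
    read -r last
    seq 1 2 "$last"
}
echo 19 | seq_from_stdin

# Variant 2: xargs appends whatever arrives on the pipe to the end of
# the command line, so this runs "seq 1 2 19".
echo 19 | xargs seq 1 2
```

Both variants keep the pipe from the question intact; the xargs form is the shorter of the two when the piped value is the last argument.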

Trying to get an average using the contents of two files

So I have two files in my directory that contain a number in each of them. I want to make a script that calculates the average of these two numbers. How would I write it? Would this be correct?
avg=$((${<file1.txt}-${<file2.txt})/2)

Your example does not work. Furthermore, your formula is probably incorrect. Here are two options without unnecessary cat:
avg=$(( (`<file1.txt` + `<file2.txt`) / 2 ))
or
avg=$(( ($(<file1.txt) + $(<file2.txt)) / 2 ))
I find the first one more readable though. Also be warned: this trivial approach will cause problems when your files contain more than just the plain numbers.
EDIT:
I should have noted that the first option, the legacy syntax using backticks (` `), is no longer recommended and should be avoided. You can read more about the WHY here. Thanks to mklement0 for the link!
EDIT2:
According to Eric, the values are floating point numbers. You can't do this directly in bash because only integer numbers are supported. You have to use a little helper (note the -l option: without it, bc's scale defaults to 0 and the division is truncated to an integer):
avg=$(bc -l <<< "( $(<file1.txt) + $(<file2.txt) ) / 2")
or maybe easier to understand
avg=$(echo "( $(<file1.txt) + $(<file2.txt) ) / 2" | bc -l)
For those who might wonder what bc is (see man bc):
bc is a language that supports arbitrary precision numbers with
interactive execution of statements.
Here is another alternative since perl is usually installed by default:
avg=$(perl -e 'print( ($ARGV[0] + $ARGV[1]) / 2 )' -- $(<file1.txt) $(<file2.txt))
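If neither bc nor perl is at hand, awk can do the same floating-point arithmetic. This is a minimal sketch, assuming each file holds one number in the first field of a non-empty line; average_files and the demo file names are made up for this example.

```shell
# Hypothetical helper: average the first field of every non-empty line
# across all files given as arguments.
average_files() {
    awk 'NF { sum += $1; count++ } END { if (count) printf "%.2f\n", sum / count }' "$@"
}

# Demo with two throwaway files holding one number each.
printf '1.5\n' > demo1.txt
printf '2.5\n' > demo2.txt
avg=$(average_files demo1.txt demo2.txt)
echo "$avg"
rm -f demo1.txt demo2.txt
```

Because awk converts non-numeric text to 0 rather than failing, stray words in the files would skew the result silently, so the earlier warning about file contents still applies.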

You'll want to use a command substitution:
avg=$(( ($(cat file1.txt) + $(cat file2.txt)) / 2 ))
However, Bash is a pretty bad language for doing maths (at least unless it's completely integer maths). You might want to look into bc or a "real" language like Python.

Bash: Trying to append to a variable name in the output of a function

This is my very first post on Stack Overflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB, etc.
If you could explicitly explain your answers, that would be much appreciated, as I'm trying to learn as I go. I apologise if there is an answer elsewhere that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these into variables and passing these variables to each of the programs I use in turn. So for example, my idea thus far was to assign them as wildcards, using the R1 and R2 (which appear in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified @ghoti's for loop (does the job wonderfully I might add, rep for you :D ) and now I append trim_, because the loop as it was before ended up giving me a .fastq.trim, which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?

Explicit is usually better than implied, when matching filenames. Your wildcards may match more than you expect, especially if you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in "${sequences[@]}"; do
# Step through the files in this sequence...
for file in SOL*_${seq}.fastq; do
seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
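To see the parameter expansions in isolation, here is a small sketch. The helper name trimmed_name is made up for illustration, and the folder expansion is only one possible way to pick up the directory name the question asks about.

```shell
# Hypothetical helper: insert "_trim" before the .fastq extension.
trimmed_name() {
    printf '%s\n' "${1%.fastq}_trim.fastq"
}

file="SOL2511_S5_L001_R1_001.fastq"
trimmed_name "$file"     # prints SOL2511_S5_L001_R1_001_trim.fastq

# Other expansions that may be handy when naming output files:
sample="${file%%_*}"     # drop the longest suffix starting at "_": SOL2511
folder="${PWD##*/}"      # last component of the current directory
echo "$sample" "$folder"
```

A single % removes the shortest matching suffix and %% the longest; # and ## do the same for prefixes.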

Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I@ sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- @
xargs -I@ takes each line of the previous command's output and substitutes it for @, which sh then receives as $1 for seqtk to use.

Smart split file with gzipping each part?

I have a very long file with numbers. Something like output of this perl program:
perl -le 'print int(rand() * 1000000) for 1..10'
but way longer - around hundreds of gigabytes.
I need to split this file into many others. For test purposes, let's assume 100 files, where the output file number is the number modulo 100.
With normal files, I can do it simply with:
perl -le 'print int(rand() * 1000000) for 1..1000' | awk '{z=$1%100; print > z}'
But I have a problem when I need to compress the split parts. Normally, I could:
... | awk '{z=$1%100; print | "gzip -c - > "z".txt.gz"}'
But when ulimit is configured to allow fewer open files than the number of "partitions", awk breaks with:
awk: (FILENAME=- FNR=30) fatal: can't open pipe `gzip -c - > 60.txt.gz' for output (Too many open files)
This doesn't break with normal file output, as GNU awk is apparently smart enough to recycle file handles.
Do you know any way (aside from writing my own stream-splitting-program, implementing buffering, and some sort of pool-of-filehandles management) to handle such case - that is: splitting to multiple files, where access to output files is random, and gzipping all output partitions on the fly?

I didn't mention this in the question itself, but since the additional information belongs with the solution, I'll write it all here.
So, the problem was on Solaris. Apparently there is a limitation that no program using stdio on Solaris can have more than 256 open filehandles?!
It is described here in detail. The important point is that it's enough to set one environment variable before running my problematic program, and the problem is gone:
export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1
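Where raising the limit is not an option, a different technique (not the one used above) is to cap the number of gzip pipes awk keeps open and to append with >>, relying on the fact that concatenated gzip streams form a valid gzip file. This is only a sketch under those assumptions; split_gzip and its two parameters are made up for illustration.

```shell
# Hypothetical helper: split numbers read from stdin into <n % parts>.txt.gz,
# keeping at most "max" gzip pipes open at any time.
# $1 = number of partitions, $2 = maximum number of open pipes.
# Output is appended, so files left over from an earlier run must be removed first.
split_gzip() {
    awk -v parts="$1" -v max="$2" '
    {
        cmd = "gzip -c >> " ($1 % parts) ".txt.gz"
        if (!(cmd in open)) {
            if (nopen >= max) {
                # Close every pipe; close() waits for each gzip to finish.
                for (c in open) close(c)
                split("", open)   # portable way to empty the array
                nopen = 0
            }
            open[cmd] = 1
            nopen++
        }
        print | cmd
    }'
}
```

It would be called as, for example, `perl -le 'print int(rand() * 1000000) for 1..1000' | split_gzip 100 20`. Every reopen starts a new gzip member, so a very low cap costs both compression ratio and speed.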
