How is simple math done to only part of stdin? - bash

I am working on an Automator service, and in my situation I have stdin as
B-Funny Flash Nonfiction 202105131635 and I want to get to B-Funny Flash Nonfiction 202105131636, incrementing the "5" by 1 to "6".
I'd think I'd first want to separate the text from the number before doing the add 1 then rejoin them?
Would egrep or sed or awk be best?
Tips?

Bash has simple integer arithmetic built in.
str='B-Funny Flash Nonfiction 202105131635'
# parse into prefix and number
num=${str##*[!0-9]}
prefix=${str%"$num"}
echo "$prefix$((num+1))"
The parameter expansion ${var#pat} produces the value of the variable var with any prefix matching pat removed; % does the same for suffixes, and doubling the operator changes to matching the longest possible pattern match instead of the shortest. The pattern *[!0-9] matches a string which ends on a character which isn't a digit; with ##, it strips everything up to and including the last non-digit character, leaving only the trailing run of digits in num. Removing that number from the end of str with % then yields the prefix. (Digits earlier in the string are harmless, because ## always takes the longest match. Two cases do need tweaking: a string that does not end in a digit leaves num empty, and a number with a leading zero is read as octal inside $((...)), so 08 is an error; writing $((10#$num+1)) forces base 10.)
Finally, the secret sauce which evaluates an arithmetic expression is the $((...)) arithmetic context.
For more involved number crunching, try bc or Awk. In fact, this could be a one-liner in Awk:
awk '{ $NF +=1 }1' <<<"$str"
The here string passes the value as standard input to Awk, which increments the last field $NF. The final 1 is a common Awk shorthand for "print all input lines to output".

I don't know the bash tools well enough to give a cool one-line answer, so here is a python script instead.
Usage
Save the code in a file increment.py;
Make the file executable with chmod +x increment.py;
Run the script with ./increment.py blablabla 123.
Code
#!/usr/bin/env python3
import sys
def print_help(argv0):
    print('increment numbers by 1')
    print('example usage:')
    print(' {} B-Funny Flash Nonfiction 202105131635'.format(argv0))
    print(' B-Funny Flash Nonfiction 202105131636')
def main(argv):
    if len(argv) < 2:
        print_help(argv[0])
    else:
        for s in argv[1:]:
            if s.isnumeric():
                print(int(s) + 1, end=' ')
            else:
                print(s, end=' ')
        print()
if __name__=='__main__':
    main(sys.argv)
Explanation
In a python program called from the command-line, the command-line arguments are stored in the array sys.argv.
The first element of the array, with index 0, is the name that was used to call the program, most likely "./increment.py" in our case.
The remaining elements are the parameters that were passed to the program; the words "B-Funny", "Flash", "Nonfiction", "202105131635" in our case.
The for-loop for s in argv[1:]: iterates on the elements of argv, but starting with the element 1 (thus ignoring the element 0). Each of these elements is a string; the method .isnumeric is used to check whether this string represents a number or not. Refer to the documentation on .isnumeric.
If the string is not numeric, we print it as-is. If the string is numeric, we compute the number it represents by calling int(s), then we add 1, and we print the result.
Apart from that, the line if len(argv) < 2: checks whether argv contains fewer than two elements; if so, that means it only contains its element 0, which is "./increment.py"; in this case, instead of printing the arguments, the script calls the function print_help which explains how to use the program.
Finally, the bit about if __name__ == '__main__': is a python idiom to check whether the file increment.py was run as a program or as a module imported by another file. Refer to this question.
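The script above increments every numeric argument. As a variant closer to what the question asks, here is a minimal sketch that bumps only the digit run at the very end of one string. The function name increment_trailing_number is made up for this example, and the choices to keep the original width (so 007 becomes 008) and to return unmatched input unchanged are assumptions, not something the question specifies.

```python
import re

def increment_trailing_number(text):
    """Add 1 to the digit run at the very end of text, keeping its width."""
    match = re.search(r"\d+$", text)
    if match is None:
        # Assumption: input without a trailing number is returned unchanged.
        return text
    digits = match.group(0)
    # zfill pads back to the original width, so leading zeros survive.
    bumped = str(int(digits) + 1).zfill(len(digits))
    return text[:match.start()] + bumped

print(increment_trailing_number("B-Funny Flash Nonfiction 202105131635"))
# B-Funny Flash Nonfiction 202105131636
```

To use it on stdin, one option is to loop over sys.stdin and call the function on each line with its trailing newline stripped.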

Related

What does the # symbol mean in this bash for loop? [duplicate]

I know that one can get the length of an array in bash by doing ${#arrayname[@]}.
My question is: is this just something that I have to memorize, or can this syntax be broken down into understandable parts? For instance, what does the @ symbol mean where one would expect to find the index? Why the #?
# at the beginning of a variable reference means to get the length of the variable's value. For a normal variable this means its length in characters. # is the "number" sign, so you can remember this as meaning "the number of things in the variable".
@ or * in an array index means to use the whole array, not a specific element, and instead of returning the number of characters, it returns the number of array elements. * is used as a wildcard in many contexts, so this should be easy to remember. Also, $* and $@ are used to mean all the arguments to a shell script, so the parallel with all the array elements should be obvious.
You can't just write ${#arrayname} because when you use an array variable without a subscript, it's equivalent to element 0 of the array. So ${#arrayname} is the same as ${#arrayname[0]}, which is the number of characters in the first element of the array.
You should memorize. :) The # usually means number. e.g. the
$# - is the number of arguments
${#str} - length of the string $str
${#arr[@]} - length (number of elements) of the array arr
${#arr} - the length of the 1st element of the array (like the str above)
Unfortunately the ${parameter#word} or ${parameter##word} has nothing to do with numbers (it removes the shortest/longest match of word from the beginning of the parameter).
And also, the # .... is comment ;)
In general usage of form ${#PARAMETER} returns the length in number of characters and NOT bytes of the parameter's value.
myString="Hello StackOverflow!"
printf "%s\n" "${#myString}"
20
But for arrays, this expansion type has two meanings:
For individual elements, it reports the string length of the element (as for every "normal" parameter)
For the mass subscripts @ and *, it reports the number of set elements in the array
Consider an example over arrays,
myArray=(1 2 3 4 15)
printf "%s\n" "${myArray[@]}" # <-- Gives me list of elements
1
2
3
4
15
printf "%s\n" "${#myArray[@]}" # <-- Gives me number of elements
5
It gets interesting now: the length of the last element (15), which is 2, can be obtained by doing
printf "%s\n" "${#myArray[4]}"
2
The '@' acts the same way as '*'. Instead of providing a specific index this references the full thing.
The '#' is telling bash you want the length
https://www.cyberciti.biz/faq/finding-bash-shell-array-length-elements/

Ruby command line script: Trying to pass variable in switch case

I'm creating a ruby command line tool which has a switch case statement, I'd like to pass through variables on this switch case statement for example:
input = gets.chomp
case input
when 'help'
  display_help
when 'locate x, y' # this is the bit i'm stuck on
  find_location(x, y)
when 'disappear s'
  disappear_timer(s)
when 'exit'
  exit
else
  puts "incorrect input"
end
Essentially I want the user to be able to type in locate 54, 30 or sleep 5000 and then call a function which handles the numbers they passed. How can I pass arguments from the user through a case statement like this in my command line tool?
Use Regexp matcher inside when:
when /locate \d+, \d+/
find_location *input.scan(/\d+/).map(&:to_i)
Here we basically match whatever is locate followed by digits, comma, space, digits. If matched, we extract the digits from the string with String#scan and then convert to Integers, finally passing them as an argument to find_location method.
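The same idea (match the command with a regular expression, pull out the digits, convert them to integers) can be sketched outside Ruby as well. The Python version below is only an illustration: parse_command and its pattern table are hypothetical names, and unlike the unanchored Ruby pattern above it uses re.fullmatch, so trailing junk after the numbers is rejected.

```python
import re

# Hypothetical command table: command name -> pattern with one group per number.
PATTERNS = {
    "locate": r"locate (\d+), (\d+)",
    "disappear": r"disappear (\d+)",
}

def parse_command(text):
    """Return (command_name, integer_args), or None if nothing matches."""
    for name, pattern in PATTERNS.items():
        match = re.fullmatch(pattern, text.strip())
        if match:
            return name, [int(group) for group in match.groups()]
    return None

print(parse_command("locate 54, 30"))   # ('locate', [54, 30])
print(parse_command("locate x, y"))     # None
```

The returned list can then be unpacked into the handler, much like the splat in find_location *input.scan(...).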

Find lines that have partial matches

So I have a text file that contains a large number of lines. Each line is one long string with no spacing; however, the line contains several pieces of information. The program knows how to differentiate the important information in each line. The program identifies that the first 4 numbers/letters of the line correspond to a specific instrument. Here is a small example portion of the text file.
example text file
1002IPU3...
POIPIPU2...
1435IPU1...
1812IPU3...
BFTOIPD3...
1435IPD2...
As you can see, there are two lines that contain 1435 within this text file, which coincides with a specific instrument. However these lines are not identical. The program I'm using can not do its calculation if there are duplicates of the same station (ie, there are two 1435* stations). I need to find a way to search through my text files and identify if there are any duplicates of the partial strings that represent the stations within the file so that I can delete one or both of the duplicates. If I could have BASH script output the number of the lines containing the duplicates and what the duplicates lines say, that would be appreciated. I think there might be an easy way to do this, but I haven't been able to find any examples of this. Your help is appreciated.
If all you want to do is detect if there are duplicates (not necessarily count or eliminate them), this would be a good starting point:
awk '{ if (++seen[substr($0, 1, 4)] > 1) printf "Duplicates found : %s\n",$0 }' inputfile.txt
For that matter, it's a good starting point for counting or eliminating, too, it'll just take a bit more work...
If you want the count of duplicates:
awk '{a[substr($0,1,4)]++} END {for (i in a) {if(a[i]>1) print i": "a[i]}}' test.in
1435: 2
or:
{
    a[substr($0,1,4)]++ # put prefixes to array and count them
}
END { # in the end
    for (i in a) { # go thru all indexes
        if(a[i]>1) print i": "a[i] # and print out the duplicate prefixes and their counts
    }
}
Slightly roundabout but this should work-
cut -c 1-4 file.txt | sort -u > list
for i in $(cat list)
do
    echo -n "$i "
    grep -c "^$i" file.txt # This tells you how many occurrences of each 'station'
done
Then you can do whatever you want with the ones that occur more than once.
Use the following Python script (Python 2.7 syntax):
#!/usr/bin/python
file_name = "device.txt"
f1 = open(file_name,'r')
device = {}
line_count = 0
for line in f1:
    line_count += 1
    if device.has_key(line[:4]):
        device[line[:4]] = device[line[:4]] + "," + str(line_count)
    else:
        device[line[:4]] = str(line_count)
f1.close()
print device
Here the script reads each line and treats its first 4 characters as the device name. It builds a dictionary, device, whose keys are the device names and whose values are the line numbers on which each name occurs.
The output would be:
{'POIP': '2', '1435': '3,6', '1002': '1', '1812': '4', 'BFTO': '5'}
this might help you out!!
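Since the script above relies on Python 2 features (has_key and the print statement), here is a rough Python 3 sketch of the same idea that also reports what the duplicate lines say, as the question asked. The function name find_duplicate_prefixes and the width parameter are invented for this example, and the sample list stands in for the lines of the real file.

```python
from collections import defaultdict

def find_duplicate_prefixes(lines, width=4):
    """Map each prefix seen more than once to its (line_number, line) pairs."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        seen[line[:width]].append((number, line))
    # Keep only the prefixes that occur on more than one line.
    return {prefix: hits for prefix, hits in seen.items() if len(hits) > 1}

sample = [
    "1002IPU3...", "POIPIPU2...", "1435IPU1...",
    "1812IPU3...", "BFTOIPD3...", "1435IPD2...",
]
for prefix, hits in find_duplicate_prefixes(sample).items():
    for number, line in hits:
        print(f"{prefix}: line {number}: {line}")
# 1435: line 3: 1435IPU1...
# 1435: line 6: 1435IPD2...
```

For a real file, passing the open file object instead of sample should work, since iterating over it yields one line at a time.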

Ruby: Difference between these two?

I was wondering what the difference between print x and print "#{x}", in Ruby was. Does it really matter which one we use?
The expression print "#{foo}" roughly translates to print foo.to_s.
Kernel#print is a thin wrapper around IO#print which ultimately calls IO#write. From write's documentation:
[...] If the argument is not a string, it will be converted to a string using to_s. [...]
So in the end, there is close to no difference. print "#{foo}" will however first create a String representation of foo and secondly interpolate that result into an otherwise empty string—but I think that could (should) easily be optimized by the interpreter.
print "#{foo}" - here you are doing string interpolation. Whatever object foo refers to (if it is a local variable) or returns (if it is a method) has to_s called on it, and the result is inserted into the string.
print foo outputs the object that foo refers to (if it is a local variable) or returns (if it is a method), after to_s has been applied to it.
There is no difference, they both apply to_s implicitly at some point. You should use print x and not print "#{x}". Why would you wonder which to use? print "#{x}" is obviously less simple than print x.
The print name is usually used if you only need to print that thing and nothing more.
String interpolation is used when you want to insert the values in other strings.
print "My name is #{my_name} and I am currently #{my_age} years old."
It is even possible to insert some logic:
print "My name is #{my_name.capitalize} and"
print "I am currently #{my_age} year#{my_age>1 ? 's':''} old." #print years instead of year if age is greater than 1.

Explain Ruby Commify large integer method

I found this snippet online and the purpose is to commify any number including numbers with decimals ... 99999999 => 99,999,999. I can see that it uses regex but I am confused by "$1.reverse, $2"
def commify(n)
  n.to_s =~ /([^\.]*)(\..*)?/
  int, dec = $1.reverse, $2 ? $2 : ""
  while int.gsub!(/(,|\.|^)(\d{3})(\d)/, '\1\2,\3')
  end
  int.reverse + dec
end
Can anyone explain what is going on in this code?
$1, $2, $3 ... are Perl legacy. They are capture group variables, that is, they capture the groups inside the regular expression.
A capture group is indicated by parentheses. So, the first capture group, ([^\.]*), matches any run of non-dot characters, and (\..*) matches a dot character \. and any characters after it.
Note that the second group is optional, so in the line below you have the ternary expression $2 ? $2 : "", which is a slightly cryptic way to get either the value of the capture or an empty string.
The int, dec = $1.reverse, $2_or_blank_string is a parallel assignment. Ruby supports assigning more than one variable at once; it is no different from doing int = $1.reverse and then dec = $2 (or ""). So int now holds the integer part (reversed) and dec the decimal part of the number. We are interested in the first one for now.
The next, empty while loop does a string substitution. The method gsub! replaces all occurrences of the regular expression with the value in the second argument, but it returns nil if no change happened, which ends the while.
The /(,|\.|^)(\d{3})(\d)/ expression matches:
(,|\.|^) A comma, a point or the beginning of the string
(\d{3}) Three digits
(\d) A fourth digit
It then replaces the match with \1\2,\3. The \n in a replacement string means the nth capture group, just as the $n variables do. So it basically says: if there are four digits in a row, add a comma after the third one. Repeat until no group of four digits is found.
Then, just reverse the integer part again and append the decimal part.
n.to_s =~ /([^\.]*)(\..*)?/ takes the number as a string and stores everything before the decimal point (or simply everything if there is no decimal point) in $1 and everything after and including it in $2.
int, dec = $1.reverse, $2 ? $2 : "" stores the reverse of $1 in int and $2, or "" if $2 is nil, in dec. In other words int now contains the part before the decimal point reversed and dec contains the part after the point (not reversed).
The next line inserts a comma every three places into int. So by reversing int again we get the original integral part of the number with commas inserted every three places from the end. Now we add dec again at the end and get the original number with commas at the right places.
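To see the reverse, substitute, reverse steps in isolation, here is a rough port to Python (not Ruby), offered only as a sketch. It swaps in str.partition for the first regex, and uses re.subn, whose returned count of 0 plays the role of gsub! returning nil.

```python
import re

def commify(n):
    """Insert thousands separators using the reverse-and-substitute approach."""
    # Split at the decimal point; dot and frac are empty if there is none.
    int_part, dot, frac = str(n).partition(".")
    rev = int_part[::-1]  # work on the reversed integer part
    while True:
        # Same pattern as the Ruby code: after a comma, a dot, or the start
        # of the string, three digits followed by a fourth get a comma.
        rev, count = re.subn(r"(,|\.|^)(\d{3})(\d)", r"\1\2,\3", rev)
        if count == 0:  # nothing left to change, like gsub! returning nil
            break
    return rev[::-1] + dot + frac

print(commify(99999999))   # 99,999,999
print(commify(1234.5678))  # 1,234.5678
```

For 99999999 the reversed string goes from 99999999 to 999,99999 on the first pass and to 999,999,99 on the second; the third pass changes nothing, and reversing gives 99,999,999.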
Another way:
class Integer
  def commify
    self.to_s.gsub(/(\d)(?=(\d{3})+$)/,'\1,')
  end
end
Then you can do 12345678.commify and get the string 12,345,678
And here's one that handles floating point numbers:
class Float
  def commify
    self.to_s.reverse.gsub(/(\d\d\d)(?=\d)(?!\d*\.)/,'\1,').reverse
  end
end

Resources