How to separate variables without spaces - bash

My question is similar to this one, but not really.
The issue is that I have variables in my script that will echo/printf control characters directly next to the previous. Unfortunately I have to put spaces between the variables or everything gets misinterpreted, but that's not going to work either, as I can't have spaces between them.
str="25 cents"
one=1
two=2
printf "\x3${one},${two}${str}\x30"\
(without spaces this string messes up)
printf "\x3${one},${two}%s\x30" "${str}" # outputs "5 cents"
So it ends up being either " 25 cents " (wrong), or "5 cents" (wrong x 2)... It should be:
25 cents
I've tried just about everything, escaping the variables, putting them in quotes and no luck. Evidently there's a correct way to handle this that I'm unaware of, so any help is great - thanks.

If what you are trying to do is insert mIRC colour codes into a string -- and you would have made it easier to be helped if you had said so -- then you need to be aware of two things:
The C-style hexadecimal escapes interpreted by GNU printf have the format \x followed by two hexadecimal digits. (You can use just one digit, but only if the next character is not a hexadecimal digit, so it's better to think of it as always being two digits.) A control-C (character code 3) is written \x03. \x30 through \x39 are the character codes for the digits 0 through 9. The translation of the escape code is done by printf, not by the shell, so parameter substitution happens first. So if the value of $one is 1, printf "\x3${one}" will be expanded to printf "\x31" by the shell, and then printf will print the digit 1. I presume that is not what you want, since there are obviously much less roundabout ways to insert the value of a variable, which don't limit the variable to a single decimal digit.
Not all printf implementations handle hexadecimal escapes, and not all shells have a built-in printf. So while you can use \x03 with bash, you might find that it is not portable. All printf implementations should handle octal escapes, though, and 3 is still 3 in octal, but now you need three digits: \003.
The mIRC colour codes have the form control-C followed by up to two numbers separated by a comma. These numbers have a maximum of two digits, and if the next character after the colour code is a digit, you must use the two-digit form. (Coincidentally similar to the hex escape codes above, but it is truly just a coincidence.) So if you wanted the text 25 with foreground colour 1 and background colour 2, you would need to send ^C1,0225^C; if you sent ^C1,225^C, that would be interpreted as foreground colour 1 and background colour 22 (which is not a valid colour code), with the text being 5.
This is mentioned in the mIRC documentation linked above:
Note: if you want to color text that begins with numbers, this syntax requires that you specify the color value as two digits.
So a better printf invocation might be:
printf "\003%02d,%02d%s\003" "$one" "$two" "$str"
Note: It is, of course, possible that my guess about what string you are seeking to produce is completely wrong; it is just a guess based on an off-hand comment which was not deleted. If so, and if you are serious about getting your question answered, I strongly suggest you provide a clearer explanation of precisely what byte-string you are attempting to produce with your printf statement.
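To check which bytes the suggested printf actually emits, a minimal sketch along these lines may help. The function name mirc_colour is made up for illustration, and it assumes the colour numbers are plain decimals without a leading zero (printf's %d would try to read 08 as octal).

```shell
# Hypothetical helper: wrap TEXT in an mIRC colour code.
# Assumes FG and BG are plain decimal numbers with no leading zero.
mirc_colour() {  # mirc_colour FG BG TEXT
    # \003 is control-C in octal; %02d forces the two-digit form so that
    # text starting with a digit is not swallowed into the colour code.
    printf '\003%02d,%02d%s\003' "$1" "$2" "$3"
}

one=1
two=2
str="25 cents"

# Pipe through od to make the control characters visible.
mirc_colour "$one" "$two" "$str" | od -c
```

The od dump should show a 003 byte, then 01,02, then the untouched text, then the closing 003.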

Related

Finding number range with grep

I have a database in this format:
username:something:UID:something:name:home_folder
Now I want to see which users have a UID ranging from 1000-5000. This is what I tried to do:
ypcat passwd | grep '^.*:.*:[1-5][0-9]\{2\}:'
My thinking is this: I go to the third column and find numbers that start with a number from 1-5, the next number can be any number - range [0-9] and that range repeats itself 2 more times making it a 4 digit number. In other words it would be something like [1-5][0-9][0-9][0-9].
My output, however, lists even UID's that are greater than 5000. What am I doing wrong?
Also, I realize the code I wrote could potentially list numbers up to 5999. How can I limit the numbers to 1000-5000?
EDIT: I'm intentionally not using awk since I want to understand what I'm doing wrong with grep.
There are several problems with your regex:
As Sundeep pointed out in a comment, ^.*:.*: will match two or more columns, because the .* parts can match field delimiters (":") as well as field contents. To fix this, use ^[^:]*:[^:]*: (or, equivalently, ^\([^:]*:\)\{2\}; see the notes on bracket expressions and basic vs extended RE syntax below).
[0-9]\{2\} will match exactly two digits, not three
As you realized, it matches numbers starting with "5" followed by digits other than "0"
As a result of these problems, the pattern ^.*:.*:[1-5][0-9]\{2\}: will match any record with a UID or GID in the range 100-599.
To do it correctly with grep, use grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):' (again, see Sundeep's comments).
[Added in edit:]
Concerning bracket expressions and what ^ means in them, here's the relevant section of the re_format man page:
A bracket expression is a list of characters enclosed in '[]'. It
normally matches any single character from the list (but see below).
If the list begins with '^', it matches any single character (but see
below) not from the rest of the list. If two characters in the list
are separated by '-', this is shorthand for the full range of
characters between those two (inclusive) in the collating sequence,
e.g. '[0-9]' in ASCII matches any decimal digit.
(bracket expressions can also contain other things, like character classes and equivalence classes, and there are all sorts of special rules about things like how to include characters like "^", "-", "[", or "]" as part of a character list, rather than negating, indicating a range, class, or end of the expression, etc. It's all rather messy, actually.)
Concerning basic vs. extended RE syntax: grep -E uses the "extended" syntax, which is just different enough to mess you up. The relevant differences here are that in a basic RE, the characters "(){}" are treated as literal characters unless escaped (if escaped, they're treated as RE syntax indicating grouping and repetition); in an extended RE, this is reversed: they're treated as RE syntax unless escaped (if escaped, they're treated as literal characters).
That's why I suggest ^\([^:]*:\)\{2\} in the first bullet point, but then actually use ^([^:]*:){2} in the proposed solution -- the first is basic syntax, the second is extended.
The other relevant difference -- and the reason I switched to extended for the actual solution -- is that only extended RE allows | to indicate alternatives, as in this|that|theother (which matches "this" or "that" or "theother"). I need this capability to match a 4-digit number starting with 1-4 or the specific number 5000 ([1-4][0-9]{3}|5000). There's simply no way to do this in a POSIX basic RE (GNU grep accepts \| in basic syntax as an extension, but that is not portable), so grep -E and the extended syntax are required here.
(There are also many other RE variants, such as Perl-compatible RE (PCRE). When using regular expressions, always be sure to know which variant your regex tool uses, so you don't use syntax it doesn't understand.)
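As a quick way to see the corrected pattern at work without a NIS server, here is a small sketch. The sample records and the function name uid_range are invented for illustration; only the regular expression comes from the answer above.

```shell
# Made-up records in the username:something:UID:something:name:home_folder layout.
sample='alice:x:999:100:Alice:/home/alice
bob:x:1000:100:Bob:/home/bob
carol:x:4999:100:Carol:/home/carol
dave:x:5000:100:Dave:/home/dave
erin:x:5001:100:Erin:/home/erin
frank:x:12345:100:Frank:/home/frank'

# Skip exactly two colon-terminated fields, then require a UID that is
# either four digits starting with 1-4, or exactly 5000.
uid_range() {
    grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):'
}

printf '%s\n' "$sample" | uid_range | cut -d: -f1
```

With this sample only bob, carol and dave are printed: 999 has too few digits, 5001 matches neither alternative, and 12345 fails because the colon must follow immediately after four digits.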
ypcat passwd | awk -F: '$3 >= 1000 && $3 <= 5000 {print $1}'
awk can do this task simply. Here ":" is set as the field delimiter, and the condition requires the third field to be at least 1000 and at most 5000. If the condition is met, the first field is printed.

How to format a US currency string using python or sed

I have numerous invoices that I sent to clients with this string at the bottom:
Total: 1,000.00
or whatever the amount. Some are 2 figures, some 5 figures + the decimal part.
The thing is that the number's format is inconsistent across all invoices. Sometimes it's 1.000,00 and it keeps on switching the dot and the comma.
So with grep, awk and sed, I am able to get only the amount part from all invoices, without the dollar sign, in order to sum them up to a grand total. But the dot and comma switching confuses python, obviously.
So in python (could be in sed as well), I am looking to convert the third char from the right to a dot and then, from there on, convert every fourth char it finds to a comma.
In other words, it has to be able to separate the digits into groups of 3 from the right and add a comma in between each of them, except for the first group at the far right, which would be 2 digits separated by a dot.
Hope that is clear enough...
Try this:
yourstring = yourstring[:(len(yourstring)-3)].replace(".",",") + "." + yourstring[-2:]
I tried this on python and I think that works.
sed 's/$/ /
:coma
s/\([0-9]\)[.]\([0-9]\{3\}\)/\1,\2/g;t coma
:dot
s/\([0-9]\),\([0-9][0-9][^0-9]\)/\1.\2/g;t dot
s/ $//
' YourFile
This applies a general, repeated substitution to every number on each line.
It first changes every dot that precedes a group of three digits into a comma, then changes the last comma (the one followed by exactly two digits) into a dot.
A trick is needed to handle a number at the end of the line: a space is appended at the start of the script and removed at the end (this could be optimized with a preliminary test).
It is POSIX compliant.
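To try the script on a few amounts without touching real invoices, it can be wrapped in a function that reads standard input. The wrapper name normalise_amounts and the sample lines are assumptions for illustration; the sed commands are the ones shown above.

```shell
# Hypothetical wrapper around the sed script above; reads lines on stdin.
normalise_amounts() {
    sed 's/$/ /
:coma
s/\([0-9]\)[.]\([0-9]\{3\}\)/\1,\2/g;t coma
:dot
s/\([0-9]\),\([0-9][0-9][^0-9]\)/\1.\2/g;t dot
s/ $//
'
}

# Made-up invoice lines in both notations.
printf '%s\n' 'Total: 1.000,00' 'Total: 1,000.00' 'Total: 12.345,67' 'Total: 45,00' |
    normalise_amounts
```

All four lines should come out in the 1,000.00 style. An amount such as 1.234 with no decimal part stays ambiguous and is treated as one thousand two hundred thirty-four.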
Well, the simplest way I've found to handle this uses a bit of sed, some bash and, for the final print, printf, which allows easy currency formatting with "%'.2f" (note the ' character, it is mandatory):
# Get rid of every character that is not a digit
totals=$( echo "$totals" | sed 's/[^0-9]*//g' )
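# Sum up the amounts (now integers in cents); 10# forces base 10 so that
# values with a leading zero, such as 050, are not read as octal
sum=0
for n in $totals; do
    sum=$(( sum + 10#$n ))
done
# Put back the commas at each thousand, the dot at decimals and the $ sign
sansdec=$(( ${#sum} - 2 ))
sum="${sum:0:$sansdec}.${sum: -2}"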
printf "%s" "\$"
printf "%'.2f\n" "$sum"

How can I make my terminal prompt extend the width of the terminal?

I noticed in this video, that the terminal prompt extends the entire width of the terminal before breaking down to a new line. How can I set my PS1 variable to fill the remaining terminal space with some character, like the way this user did?
The issue is, I don't know how to update the PS1 variable per command. It seems to me that the string value for PS1 is only read in once, just as the .bashrc file is only read in once. Do I have to write some kind of hook after each command or something?
I should also point out that the PS1 variable will be evaluated to a different length based on the escape characters that make it up. For example, \w prints the path.
I know I can get the terminal width using $COLUMNS, and the width of the current PS1 variable with ${#PS1}, do the math, and print the right amount of buffer characters, but how do I get it to update every time? Is there a preferred way?
Let's suppose you want your prompt to look something like this:
left text----------------------------------------------------------right text
prompt$
This is pretty straight-forward provided that right text has a known size. (For example, it might be the current date and time.) What we do is to print the right number of dashes (or, for utf-8 terminals, the prettier \u2500), followed by right text, then a carriage return (\r, not a newline) and the left text, which will overwrite the dashes. The only tricky bit is "the right number of dashes", but we can use $(tput cols) to see how wide the terminal is, and fortunately bash will command-expand PS1. So, for example:
PS1='\[$(printf "%*s" $(($(tput cols)-20)) "" | sed "s/ /-/g") \d \t\r\u#\h:\w \]\n\$ '
Here, $(($(tput cols)-20)) is the width of the terminal minus 20, which is based on \d \t being exactly 20 characters wide (including the initial space).
PS1 does not understand utf-8 escapes (\uxxxx), and inserting the appropriate substitution into the sed command involves an annoying embedded quote issue, although it's possible. However, printf does understand utf-8 escapes, so it is easier to produce the sequence of dashes in a different way:
PS1='\[$(printf "\\u2500%.0s" $(seq 21 $(tput cols))) \d \t\r\u#\h:\w \]\n\$ '
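The %.0s idiom used here is easy to misread, so here is a small sketch of it in isolation. The helper name repeat_char is invented, and it assumes the character contains no % or backslash because it is spliced into the format string.

```shell
# Hypothetical helper: print CHAR repeated COUNT times.
# Assumes CHAR contains no % or backslash, since it becomes part of the format.
repeat_char() {  # repeat_char CHAR COUNT
    # With no arguments printf would still print the format once, so stop early.
    [ "$2" -gt 0 ] || return 0
    # printf reuses the format for every argument; %.0s prints each argument
    # truncated to zero characters, so only CHAR is emitted, once per number.
    # The -- keeps a CHAR of "-" from being parsed as an option.
    printf -- "$1%.0s" $(seq 1 "$2")
}

repeat_char - 30
echo
```

The early return matters for the prompt as well: if the terminal were narrower than the reserved width, seq would print nothing and printf would still emit one character.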
Yet another way to do this involves turning off the terminal's autowrap, which is possible if you are using xterm or a terminal emulator which implements the same control codes (or the linux console itself). To disable autowrap, output the sequence ESC[?7l. To turn it back on, use ESC[?7h. With autowrap disabled, once output reaches the end of a line, the last character will just get overwritten with the next character instead of starting a new line. With this technique, it's not really necessary to compute the exact length of the dash sequence; we just need a string of dashes which is longer than any console will be wide, say the following:
DASHES="$(printf '\u2500%0.s' {1..1000})"
PS1='\[\e[?7l\u#\h:\w $DASHES \e[19D \d \t\e[?7h\]\n\$ '
Here, \e[19D is the terminal-emulator code for "move cursor backwards 19 characters". I could have used $(tput cub 19) instead. (There might be a tput parameter for turning autowrap on and off, but I don't know what it would be.)
The example in the video also involves inserting a right-aligned string in the actual command line. I don't know any clean way of doing this with bash; the console in the video is almost certainly using zsh with the RPROMPT feature. Of course, you can output right-aligned prompts in bash, using the same technique as above, but readline won't know anything about them, so as soon as you do something to edit the line, the right prompt will vanish.
Use PROMPT_COMMAND to reset the value of PS1 before each command.
PROMPT_COMMAND=set_prompt
set_prompt () {
PS1=...
}
Some system script (or you yourself) may already use PROMPT_COMMAND for something, in which case you can simply append to it:
PROMPT_COMMAND="$PROMPT_COMMAND; set_prompt"
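As one possible way to fill in such a function, the sketch below rebuilds PS1 from the current width each time it runs. Everything in the body is an assumption for illustration: the 80-column fallback, the 20 columns reserved for the date and time, the use of \w as the left text, and a terminal wider than 20 columns.

```shell
# Hypothetical set_prompt: rebuilds PS1 so the rule matches the current width.
set_prompt() {
    # COLUMNS is maintained by interactive bash; 80 is an assumed fallback.
    width=${COLUMNS:-80}
    # Reserve 20 columns for " \d \t"; assumes the terminal is wider than that.
    fill=$(printf '%*s' "$((width - 20))" '' | tr ' ' '-')
    # Rule, date and time, then \r so the left text overwrites the rule's start.
    PS1='\['"$fill"' \d \t\r\w \]\n\$ '
}

PROMPT_COMMAND=set_prompt
```

Because bash runs PROMPT_COMMAND before printing each prompt, resizing the window changes the rule length on the next command.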

Ruby (on Rails) Regex: removing thousands comma from numbers

This seems like a simple one, but I am missing something.
I have a number of inputs coming in from a variety of sources and in different formats.
Number inputs
123
123.45
123,45 (note the comma used here to denote decimals)
1,234
1,234.56
12,345.67
12,345,67 (note the comma used here to denote decimals)
Additional info on the inputs
Numbers will always be less than 1 million
EDIT: These are prices, so will either be whole integers or go to the hundredths place
I am trying to write a regex and use gsub to strip out the thousands comma. How do I do this?
I wrote a regex: myregex = /\d+(,)\d{3}/
When I test it in Rubular, it shows that it captures the comma only in the test cases that I want.
But when I run gsub, I get an empty string: inputstr.gsub(myregex,"")
It looks like gsub is capturing everything, not just the comma in (). Where am I going wrong?
result = inputstr.gsub(/,(?=\d{3}\b)/, '')
removes commas only if exactly three digits follow.
(?=...) is a lookahead assertion: it must be able to match at the current position, but what it matches does not become part of the text that is actually matched (and subsequently replaced).
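For checking the rule against the sample inputs from a shell, a rough sed transliteration can be used. This is not the Ruby code: sed has no lookahead, so the sketch captures the three digits and the character after them and writes both back. The function name strip_thousands is made up.

```shell
# Hypothetical helper: drop a comma only when exactly three digits follow it.
# The digits and the following non-digit (or end of line) are captured and
# written back, standing in for the Ruby lookahead.
strip_thousands() {
    sed -E 's/,([0-9]{3})([^0-9]|$)/\1\2/g'
}

# The inputs listed in the question.
printf '%s\n' 123 123.45 123,45 1,234 1,234.56 12,345.67 12,345,67 | strip_thousands
```

This prints 123, 123.45, 123,45, 1234, 1234.56, 12345.67 and 12345,67. Because the character after the digits is consumed, a value with two thousands commas would need a second pass, which the question rules out by keeping numbers below 1 million.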
You are confusing "match" with "capture": to "capture" means to save something so you can refer to it later. You want to capture not the comma, but everything else, and then use the captured portions to build your substitution string.
Try
myregex = /(\d+),(\d{3})/
inputstr.gsub(myregex,'\1\2')
In your example, it is possible to tell from the number of digits after the last separator (either , or .) that it is a decimal point, since there are 2 lone digits. In most cases, if the last group of digits does not have 3 digits, you can assume that the separator in front of it is the decimal point. Another sign is that a separator appearing more than once in a big number must be the thousands separator, not the decimal point.
However, I can give you a string such as 123,456 or 123.456 without any sort of context, and it is impossible to tell whether it means "123 thousand 456" or "123 point 456".
You need to scan the document for clues as to whether , is used as the thousands separator or the decimal point, and vice versa for the dot. With that context, you can safely apply the same method to remove the thousands separators.
You may also want to check out this article on Wikipedia on the less common ways to specify separators or decimal points. Knowing about them and deciding not to support them is better than assuming things will work.

How to stop ANSI colour codes messing up printf alignment?

I discovered this while using ruby printf, but it also applies to C's printf.
If you include ANSI colour escape codes in an output string, it messes up the alignment.
Ruby:
ruby-1.9.2-head > printf "%20s\n%20s\n", "\033[32mGreen\033[0m", "Green"
      Green # 6 spaces to the left of this one
               Green # correctly padded to 20 chars
=> nil
The same line in a C program produces the same output.
Is there any way to get printf (or something else) to align output and not add spaces for non-printed characters?
Is this a bug, or is there a good reason for it?
Update: Since printf can't be relied upon to align data when there's ANSI codes and wide chars, is there a best practice way of lining up coloured tabular data in the console in ruby?
It's not a bug: there's no way ruby should know (at least within printf, it would be a different story for something like curses) that its stdout is going to a terminal that understands VT100 escape sequences.
If you're not adjusting background colours, something like this might be a better idea:
GREEN = "\033[32m"
NORMAL = "\033[0m"
printf "%s%20s%s\n", GREEN, "Green", NORMAL
I disagree with your characterization of '9 spaces after the green Green'. I use Perl rather than Ruby, but if I use a modification of your statement, printing a pipe symbol after the string, I get:
perl -e 'printf "%20s|\n%20s|\n", "\033[32mGreen\033[0m", "Green";'
      Green|
               Green|
This shows to me that the printf() statement counted 14 characters in the string, so it prepended 6 spaces to produce 20 characters right-aligned. However, the terminal swallowed 9 of those characters, interpreting them as colour changes. So, the output appeared 9 characters shorter than you wanted it to. However, the printf() did not print 9 blanks after the first 'Green'.
Regarding the best practices for aligned output (with colourization), I think you'll need to have each sized-and-aligned field surrounded by simple '%s' fields which deal with the colourization:
printf "%s%20.20s%s|%s%-10d%s|%s%12.12s%s|\n",
co_green, column_1_data, co_plain,
co_blue, column_2_data, co_plain,
co_red, column_3_data, co_plain;
Where, obviously, the co_XXXX variables (constants?) contain the escape sequences to switch to the named colour (and co_plain might be better as co_black). If it turns out that you don't need colourization on some field, you can use the empty string in place of the co_XXXX variables (or call it co_empty).
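The same layout idea can be tried directly with the shell's printf. In this sketch the helper name row and the two-column layout are assumptions; the co_green and co_plain names follow the convention used above.

```shell
esc=$(printf '\033')          # a literal ESC byte
co_green="${esc}[32m"
co_plain="${esc}[0m"

# Hypothetical helper: one table row with colour codes kept outside the
# sized fields, so only the visible text is padded or truncated.
row() {  # row NAME COUNT
    printf '%s%-10.10s%s|%s%5d%s|\n' \
        "$co_green" "$1" "$co_plain" \
        "$co_green" "$2" "$co_plain"
}

row apple 3
row watermelon-slices 12
```

Both rows line up on the pipe symbols: the first field is always 10 columns wide (padded or truncated) and the second is 5, regardless of the escape sequences around them.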
printf field width specifiers are not useful for aligning tabular data, interface elements, etc. Aside from the issue of control characters which you have already discovered, there are also nonspacing and double-width characters which your program will have to deal with if you don't want to limit things to legacy character encodings (which many users consider deprecated).
If you insist on using printf this way, you probably need to do something like:
printf("%*s\n%*s\n", bytestopad("\033[32mGreen\033[0m", 20), "\033[32mGreen\033[0m", bytestopad("Green", 20), "Green");
where bytestopad(s,n) is a function you write that computes how many total bytes are needed (string plus padding spaces) to result in the string s taking up n terminal columns. This would involve parsing escapes and processing multibyte characters and using a facility (like the POSIX wcwidth function) to look up how many terminal columns each takes. Note the use of * in place of a constant field width in the printf format string. This allows you to pass an int argument to printf for runtime-variable field widths.
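A much reduced shell analogue of that idea is sketched below. It is not the C function described above: the name pad_left is invented, it only strips SGR colour sequences, and it assumes every remaining character occupies exactly one column (no wide or combining characters).

```shell
esc=$(printf '\033')   # a literal ESC byte

# Hypothetical helper: right-align TEXT in WIDTH columns, ignoring colour codes.
# Assumes every visible character is one column wide.
pad_left() {  # pad_left TEXT WIDTH
    # Measure the text with the colour sequences removed.
    visible=$(printf '%s' "$1" | sed "s/${esc}\[[0-9;]*m//g")
    pad=$(( $2 - ${#visible} ))
    [ "$pad" -gt 0 ] || pad=0
    # The * takes the field width from the argument list at run time.
    printf '%*s%s\n' "$pad" '' "$1"
}

pad_left "${esc}[32mGreen${esc}[0m" 20
pad_left "Green" 20
```

Both lines end in the same column because the padding is computed from the visible text only.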
I would separate out any escape sequences from actual text to avoid the whole matter.
# in Ruby
printf "%s%20s\n%s%20s\n", "\033[32m", "Green", "\033[0m", "Green"
or
/* In C */
printf("%s%20s\n%s%20s\n", "\033[32m", "Green", "\033[0m", "Green");
Since ANSI escape sequences are not part of either Ruby or C, neither thinks that it needs to treat these characters specially, and rightfully so.
If you are going to be doing a lot of terminal color stuff then you should look into curses and ncurses which provide functions to do color changes that work for many different types of terminals. They also provide much much more functionality, like text based windows, function keys, and sometimes even mouse interaction.
Here's a solution I came up with recently. This allows you to use color("my string", :red) in a printf statement. I like using the same formatting string for headers and the data -- DRY. This makes that possible. Also, I use the rainbow gem to generate the color codes; it's not perfect but gets the job done. The CPAD hash contains two values for each color, corresponding to left and right padding, respectively. Naturally, this solution should be extended to facilitate other colors and modifiers such as bold and underline.
CPAD = {
:default => [0, 2],
:green => [0, 3],
:yellow => [0, 2],
:red => [0, 1],
}
def color(text, color)
"%*s%s%*s" % [CPAD[color][0], '', text.color(color), CPAD[color][1], '']
end
Example:
puts "%-10s %-10s %-10s %-10s" % [
color('apple', :red),
color('pear', :green),
color('banana', :yellow),
color('kiwi', :default)
]
