Bash can generate multiple strings from a single one if you use the {...,...} brace syntax. For example:
$ echo pgdb{200,10{0,1}}
pgdb200 pgdb100 pgdb101
Is there any way to take a list of strings and produce a (hopefully shorter) string that, when processed by bash brace expansion, produces the original list (not necessarily in the original order)?
For example, I'd like a tool or algorithm that, given:
postgresql
mysql
postgres
miata
would produce (for example): {postgres{ql,},m{iata,ysql}}
I thought about using a trie to represent the input strings, but I can't figure out how to process the trie to build the output string.
use Compress::BraceExpansion;?
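The trie idea from the question can also be sketched directly in bash. This is only a minimal sketch, not the algorithm used by the module mentioned above: it sorts the words, groups them by first character, factors out each group's longest common prefix, and recurses on the remainders. The function names (brace_compress, _compress_sorted, _emit_group) are made up for illustration, and it assumes bash 4+ (for mapfile) and words that contain no braces, commas, whitespace, or glob characters.

```bash
#!/usr/bin/env bash

# Emit one group of words that all share the same first character:
# print their longest common prefix, then compress what is left.
_emit_group() {
  if (( $# == 1 )); then printf '%s' "$1"; return; fi
  local prefix=$1 w
  for w in "$@"; do
    # Shorten the candidate prefix until this word starts with it.
    while [[ $w != "$prefix"* ]]; do prefix=${prefix%?}; done
  done
  local -a rest=()
  for w in "$@"; do rest+=("${w#"$prefix"}"); done
  printf '%s' "$prefix"
  _compress_sorted "${rest[@]}"
}

# Compress a sorted list of distinct words into one brace expression.
_compress_sorted() {
  if (( $# == 1 )); then printf '%s' "$1"; return; fi
  local -a parts=() group=()
  local w
  for w in "$@"; do
    # A change of first character closes the current group.
    if (( ${#group[@]} > 0 )) && [[ ${w:0:1} != "${group[0]:0:1}" ]]; then
      parts+=("$(_emit_group "${group[@]}")")
      group=()
    fi
    group+=("$w")
  done
  parts+=("$(_emit_group "${group[@]}")")
  if (( ${#parts[@]} == 1 )); then
    printf '%s' "${parts[0]}"
  else
    local IFS=,
    printf '{%s}' "${parts[*]}"   # join the alternatives with commas
  fi
}

# Entry point: de-duplicate and sort bytewise, then compress.
brace_compress() {
  local -a sorted
  mapfile -t sorted < <(printf '%s\n' "$@" | LC_ALL=C sort -u)
  _compress_sorted "${sorted[@]}"
}

brace_compress postgresql mysql postgres miata; echo
# prints {m{iata,ysql},postgres{,ql}}
```

The result is not guaranteed to be the shortest possible expression, since only shared prefixes are factored out, never shared suffixes.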
I've got a massive file of hex-encoded MD5 values that I'm sorting with the Linux sort utility. The result is that the hashes come out in sequential order (which is what I need for the next stage of processing). E.g.:
000001C35AE83CEFE245D255FFC4CE11
000003E4B110FE637E0B4172B386ACAC
000004AAD0EB3D896B654A960B0111FA
In the interest of speeding up the sort operation (and making the files smaller), I was considering encoding the data as base32 or base64.
The question is, would an alpha-sort of the base32/64 data get me the same result? My quick tests seem to indicate that it would work. For example, the above three hex strings correspond 1:1 to these base64 strings:
AAABw1roPO/iRdJV/8TOEQ==
AAAD5LEQ/mN+C0Fys4asrA==
AAAEqtDrPYlrZUqWCwER+g==
But I'm unsure as to the sort order when it comes to special characters used in Base64 like "/" and "+" and how those would be treated in the context of an alpha sort.
Note: I happen to be using the Linux sort utility, but the question applies equally to other alpha-sorting tools. The tool used is not really part of the question.
I've since discovered that this isn't possible with the standard base32/64 implementations. There exists however a base32 variation called "base32hex" which preserves sort ordering, but there is no official "base64hex" equivalent.
Looks like that leaves creating a custom encoding like this.
EDIT:
This turned out to be trivial to solve. Encode in base64, then translate character by character using a custom table whose characters are in ascending sort order.
Simply map from the standard MIME base64 characters:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
To something like this:
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz|~"
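Then sorting will work, provided the sort compares bytes in ASCII order (for example, LC_ALL=C sort). The trailing = padding is harmless here because every encoded MD5 value has the same length.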
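As a rough sketch of that translation step, tr can apply both tables. The helper names to_sortable and from_sortable are made up for illustration, and the example assumes the input is already standard base64 text, one value per line.

```bash
#!/usr/bin/env bash

# Standard base64 alphabet, and a replacement alphabet in ascending ASCII order.
std='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
ord='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz|~'

# Hypothetical helpers: map base64 text to the sortable alphabet and back.
to_sortable()   { tr "$std" "$ord"; }
from_sortable() { tr "$ord" "$std"; }

# Translate, sort bytewise, and translate back. The three values from the
# question come out in the same order as their hex forms.
printf '%s\n' \
  'AAAEqtDrPYlrZUqWCwER+g==' \
  'AAABw1roPO/iRdJV/8TOEQ==' \
  'AAAD5LEQ/mN+C0Fys4asrA==' |
  to_sortable | LC_ALL=C sort | from_sortable
```

The translation matters for values such as the single byte 0xFC, which encodes to /A== and would sort before AA== (the byte 0x00) in plain base64, because / precedes A in ASCII.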
I'm modifying an old bash file and am having some trouble manipulating strings. The problem is that the strings can be anything random to the left of _<date>.<num>. For example, from ThisIsAString-Sub_tag_150827.1, I need to extract _150827.1. In bash, this seems very difficult to do. In any other language, I would split on _, and just grab the last element of the list. How do I do this in bash? I've tried a few different ways (including with awk), but cannot seem to get it right.
With bash's Parameter Expansion:
a="ThisIsAString-Sub_tag_150827.1"
echo "${a##*_}"
Output:
150827.1
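Note that ${a##*_} removes the longest prefix matching *_, so the underscore itself is dropped. A small sketch of the related forms, in case the leading underscore or the left-hand part is needed (the variable names suffix, with_sep, and base are only illustrative):

```bash
#!/usr/bin/env bash

a="ThisIsAString-Sub_tag_150827.1"

suffix=${a##*_}       # strip the longest prefix ending in "_"  -> 150827.1
with_sep="_${suffix}" # put the separator back                  -> _150827.1
base=${a%_*}          # strip the shortest suffix from last "_" -> ThisIsAString-Sub_tag

printf '%s\n' "$suffix" "$with_sep" "$base"
```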
I want to parse the following word in a shell script:
VERSION=METER1.2.1
I want to split it into two words:
WORD1=METER
WORD2=1.2.1
Can you help me work out how to parse it?
Far more efficient than using external tools such as sed is bash's built-in parameter expansion support. For instance, if you want the name variable to contain everything up to the first digit, and the numbers variable to contain everything after the last alphabetic character:
version=METER1.2.1
name=${version%%[0-9]*}
numbers=${version##*[[:alpha:]]}
To understand this, see the BashFAQ entry on string manipulation in general, or the BashFAQ entry on parameter expansion in particular.
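The two expansions above do not check that the value has the expected shape. If validation matters, a regex match with [[ =~ ]] is a different technique that splits and validates in one step. This is a sketch assuming the value is always letters followed by digits and dots; the pattern in re is an assumption about the format, not something stated in the question.

```bash
#!/usr/bin/env bash

version=METER1.2.1

# Assumed format: one or more letters, then digits and dots to the end.
re='^([[:alpha:]]+)([0-9.]+)$'

if [[ $version =~ $re ]]; then
  name=${BASH_REMATCH[1]}      # first capture group: the letters
  numbers=${BASH_REMATCH[2]}   # second capture group: the version number
  printf 'name=%s numbers=%s\n' "$name" "$numbers"
else
  printf 'unexpected format: %s\n' "$version" >&2
fi
# prints name=METER numbers=1.2.1
```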
I'm getting a string of a few lines from the shell. Is it possible to get an Array with each line as an element?
Sure, depending on the output you could just split it. For example:
lines = `ls`.split
This solution is independent of the method you're using to execute the program. As long as you get the complete string you can split it.
The original question was splitting on lines, and the split function, by default, splits on white space. While that may be sufficient, you may want to pass in a regular expression, as in:
`ls -l`.split(/$/)
Which returns each line in a separate element in the array. However, it doesn't get rid of the initial carriage return or line feed. For that, you will want to use the map function to iterate over the array and apply strip to each, as in:
`ls -l`.split(/$/).map(&:strip)
Which form is most efficient?
1)
v=''
v+='a'
v+='b'
v+='c'
2)
v2='a'` `'b'` `'c'
Assuming readability were exactly the same to you, and that's a stretch, would 1) mean creating and throwing away a few string immutables (like in Python) or act as a Java "StringBuffer" with periodical expansion of the buffer capacity? How are string concatenations handled internally in Bash?
If 2) were just as readable to you as 1), would the backticks spawn subshells and would that be more costly, even as a potential 'no-op' than what is done in 1) ?
Well, the simplest and most efficient mechanism would be option 0:
v="abc"
The first mechanism involves four assignments.
The second mechanism is bizarre (and is definitely not readable). It (nominally) runs an empty command in two sub-shells (the two ` ` parts) and concatenates the outputs (an empty string) with the three constants. If the shell simply executes the back-tick commands without noting that they're empty (and it's not unreasonable that it won't notice; it is a weird thing to try — I don't recall seeing it done in my previous 30 years of shell scripting), this is definitely vastly slower.
So, given only options (1) and (2), use option (1), but in general, use option (0) shown above.
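If you want to check the difference on your own machine rather than take it on trust, one rough way is to time both forms in a loop. This is only a sketch: the iteration count is arbitrary, the timings depend on the system and bash version, and no particular result is claimed here.

```bash
#!/usr/bin/env bash

n=1000   # arbitrary iteration count

# Option 1: repeated appends with +=
time for ((i = 0; i < n; i++)); do
  v=''
  v+='a'
  v+='b'
  v+='c'
done

# Option 2: blank command substitutions between the literals
time for ((i = 0; i < n; i++)); do
  v2='a'` `'b'` `'c'
done

# Both forms produce the same string.
printf '%s %s\n' "$v" "$v2"
```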
Why would you be building up the string piecemeal like that? What's missing from your example that makes the original code sensible but the reduced code shown less sensible?
v=""
x=$(...)
v="$v$x"
y=$(...)
v="$v$y"
z=$(...)
v="$v$z"
This would make more sense, especially if you use each of $x, $y and $z later, and/or use intermediate values of $v (perhaps in the commands represented by triple dots). The concatenation notation used will work with any Bourne-shell derivative; the alternative += notation will work with fewer shells, but is probably slightly more efficient (with the emphasis on 'slightly').
The portable and straightforward method would be to use double quotes and curly brackets for variables:
VARA="beginning text ${VARB} middle text ${VARC}..."
You can even set default values for empty variables this way (note that the ${VARC:0:3} substring form is a bash/ksh/zsh extension rather than POSIX sh):
VARA="${VARB:-default text} substring manipulation 1st 3 characters ${VARC:0:3}"
Using the curly brackets prevents situations where there is a $VARa and you want to write ${VAR}a but end up getting the contents of ${VARa}.
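A short sketch of that pitfall and of the two expansions used above. The variable values are made up for illustration.

```bash
#!/usr/bin/env bash

VAR="file"
VARa="other"

unbraced="$VARa"     # the shell reads the name as VARa  -> other
braced="${VAR}a"     # braces end the name after VAR     -> filea

unset VARB
VARC="abcdef"
with_default="${VARB:-default text}"   # VARB is unset   -> default text
first_three="${VARC:0:3}"              # bash substring  -> abc

printf '%s\n' "$unbraced" "$braced" "$with_default" "$first_three"
```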