Store and query a mapping in a file, without re-inventing the wheel - shell

If I were using Python, I'd use a dict. If I were using Perl, I'd use a hash. But I'm using a Unix shell. How can I implement a persistent mapping table in a text file, using shell tools?
I need to look up mapping entries based on a string key, and query one of several fields for that key.
Unix already has colon-separated records for mappings like the system passwd table, but there doesn't appear to be a tool for reading arbitrary files formatted in this manner. So people resort to:
key=foo
fieldnum=3
value=$(cat /path/to/mapping | grep "^$key:" | cut -d':' -f$fieldnum)
but that's pretty long-winded. Surely I don't need to make a function to do that? Hasn't this wheel already been invented and implemented in a standard tool?

Given the conditions, I don't see anything hairy in your approach. But maybe consider awk to extract the data: the awk approach allows picking only the first or the last entry, or imposing arbitrary additional conditions:
value=$(awk -F: "/^$key:/{print \$$fieldnum}" /path/to/mapping)
Once bundled in a function, it's not that scary. :)
I'm afraid there's no better way, at least within POSIX. But you may also have a look at the join command.
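For instance, a minimal sketch of how such a function might look (the name mapget and the file path are placeholders, not a standard tool):
# Usage: mapget KEY FIELDNUM [FILE]
# Prints the requested field of the first record whose first field equals KEY.
mapget () {
    key=$1 fieldnum=$2 file=${3:-/path/to/mapping}
    awk -F: -v key="$key" -v f="$fieldnum" '$1 == key { print $f; exit }' "$file"
}
# e.g.
value=$(mapget foo 3)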

Bash supports arrays, which are not exactly the same thing. See for example this guide.
area[11]=23
area[13]=37
area[51]=UFOs
echo ${area[11]}

See this LinuxJournal article for Bash >= 4.0, which has associative arrays. For older versions of Bash you can fake it:
hput () {
  eval hash"$1"='$2'             # stores $2 in a variable named hash<key>
}
hget () {
  eval echo '"${hash'"$1"'}"'    # prints the value stored under hash<key>
}
# then
hput a blah
hget a # yields blah
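For completeness, a minimal sketch of the Bash >= 4.0 associative-array approach mentioned above, loading it from a colon-separated mapping file (the file name is a placeholder; the array itself is not persistent):
declare -A map                      # requires Bash >= 4.0
while IFS=: read -r key rest; do    # first field is the key,
    map[$key]=$rest                 # the remaining fields stay colon-separated
done < /path/to/mapping
echo "${map[foo]}"                  # look up the record for key "foo"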

Your example is one of several ways to do this using shell tools. Note that cat is unnecessary.
key=foo
fieldnum=3
filename=/path/to/mapping
value=$(grep "^$key:" "$filename" | cut -d':' -f$fieldnum)
Sometimes join comes in handy, too.
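For example, a rough sketch of joining two colon-separated files on their first field (join requires both inputs to be sorted on the join field; the file names are placeholders):
sort -t: -k1,1 /path/to/mapping > mapping.sorted
sort -t: -k1,1 /path/to/other   > other.sorted
join -t: -1 1 -2 1 mapping.sorted other.sorted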
AWK, Python, Perl, sed, and various XML, JSON, and YAML tools, as well as databases such as MySQL and SQLite, can also be used, of course.
Without them, everything else can get convoluted. Unfortunately, there isn't any "standard" utility for this. I would say that the answer posted by pooh comes closest: AWK is especially adept at dealing with plain-text fields and records.

The answer in this case appears to be: no, there's no widely-available implementation of the ‘passwd’ file format for the general case, and wheel re-invention is necessary in each case.

Related

How to read text between two particular words in Unix shell scripting

I want to read text between two particular words from a text file in unix shell scripting.
For example in the following:
"My name is Sasuke Uchiha."
I want to get Sasuke.
This is one of the many ways it can be done:
To capture the text between "is " and " Uchiha" (without the surrounding spaces):
sed -n "s/^.*is \(.*\) Uchiha.*/\1/p" inFile
I'm tempted to add a "let me google that for you" link, but it seems like you're having a hard enough time as is.
What's the best way to find a string/regex match in files recursively? (UNIX)
Take a look at that. It's similar to what you're looking for. Regular expressions are the go-to tool for matching strings like this, and grep is the easiest way to use them from the shell in Unix.
Take a look at this as well: http://www.robelle.com/smugbook/regexpr.html
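A quick sketch (the pattern and directory are placeholders; -r is supported by GNU and BSD grep, though not strictly POSIX):
grep -rn 'Sasuke' /path/to/dir    # recurse, showing file names and line numbers
grep -rl 'Sasuke' /path/to/dir    # just list the files that match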

bash command splitting elements of xpath

I'm searching for some Linux shell command that will help me examine the elements of an XPath.
For example, given:
/a[#e=2]/bee/cee[#e<9]/#dee
I would need three commands returning, respectively:
a
[#e=2]
/bee/cee[#e<9]/#dee
Later I can repeat the process, in order to analyse the whole XPath.
I have tried using sed and regular expressions, but I was not able to get it working.
Strange request ... but you can do this pretty easily with sed, using the pattern ^/\([-a-zA-Z0-9_]*\)\(\[[^]]*\]\)\(.*\)$. The segments that you requested are the three captures.
zsh% data='/a[#e=2]/bee/cee[#e<9]/#dee'
zsh% pattern='^/\([-a-zA-Z0-9_]*\)\(\[[^]]*\]\)\(.*\)$'
zsh% sed "s#$pattern#\1#" <<< $data
a
zsh% sed "s#$pattern#\2#" <<< $data
[#e=2]
zsh% sed "s#$pattern#\3#" <<< $data
/bee/cee[#e<9]/#dee
zsh%
The pattern is very much specialized for the request. XPath expressions are pretty difficult to rip apart in shell; it is both cumbersome and expensive. Depending on what you are trying to accomplish, you would probably be better off translating the shell script into Python, Ruby, or Perl, depending on language preference. You might also want to take a look at using zsh instead of Bash; I've found it to be much more capable for advanced scripting.
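If you do try zsh, here is a hedged sketch of splitting the path with zsh's own parameter-expansion flags, with no external commands (the variable names are placeholders):
data='/a[#e=2]/bee/cee[#e<9]/#dee'
parts=( ${(s:/:)data} )    # split on "/"; empty elements are dropped
print -rl -- $parts        # a[#e=2], bee, cee[#e<9], #dee -- one per line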

Text editor to view giant log files

As I have not yet set up a log-rotation solution, I have a 3 GB (38-million-line) log file in which I need to find some information from a certain date. As using cat | grep is horribly slow, and my current editor (Large Text File Viewer) is equally slow, I was wondering: is there any text editor that works well for viewing log files of more than 35 million lines? I could just use the cat | grep solution and leave it running overnight, but with millions of errors to sort through there has to be a better way.
You might want to try using grep by itself:
grep 2011-04-09 logfile.txt
instead of needlessly using cat:
cat logfile.txt | grep 2011-04-09
When dealing with large amounts of data, this can make a difference.
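A quick way to measure the difference on your own machine (the file name is a placeholder; output is discarded so only the scan time is reported):
time grep 2011-04-09 logfile.txt > /dev/null
time cat logfile.txt | grep 2011-04-09 > /dev/null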
Interesting reading is a Usenet posting from last year: why GNU grep is fast.
Since you are on Windows, you should really try multiple implementations of grep. Not all implementations of grep are equal. There are some truly awful implementations.
It is not necessary to use cat: Grep can read directly from the log file, unless it is locked against being shared with readers.
grep pattern logfile > tmpfile
should do the trick. Then you can use almost any editor to examine the selected records, assuming the pattern is reasonably selective.
I don't think you're going to get any faster than grep alone (as others have noted, you don't need the cat).
I personally find "more" and "less" are useful (for smaller files). The reason is that sometimes a pattern will get you in the general vicinity of where you want (i.e. a date and time) and then you can scroll through the file at that point.
the "/" is the search command for regular expressions in more.

inserting information into a colon-delimited file

Similar to /etc/passwd, I'd like a 6-field, colon-separated file to store some textual and numerical information. Primarily using Bash, how can I read/write this file efficiently, by field index (0-5, or 1-6)? I will be writing to the file using >, awk, sed, tee, and other similar text-manipulation tools.
I'm assuming that I can use read with -d\: to bring the information from the file back into script variables ... Any ideas appreciated.
A comment from a helpful reader states:
"You should provide more concrete examples of what you are trying to do..."
I have a backup script that records
starttime
end time
backup type (H/C) hot/cold
backup success (Y/N)
Error Reason (reason backup failed)
extra field
Ideally I'd like to have (empty fields still delimited)
e.g.
16.20:17.55:H:Y::
or
e.g.
17.30:18.45:H:N:files not found:
Afterwards, I have a script that imports this information into a database, into the correct locations. It's easier to just use indexes to import these data.
You can just use awk, for efficiency, instead of pure Bash (or another tool with the ability to split on fields).
To read:
awk -F":" '{print $1,$2,$3,$4,$5,$6}' file
You should provide more concrete examples of what you are trying to do...

How do you convert character case in UNIX accurately? (assuming i18N)

I'm trying to get a feel for how to manipulate characters and character sets in UNIX accurately, given the existence of differing locales, and without requiring special tools outside of standard UNIX items.
My research has shown me the problem of the German sharp-s character: one character changes into two - and other problems. Using tr is apparently a very bad idea. The only alternative I see is this:
echo StUfF | perl -n -e 'print lc($_);'
but I'm not certain that will work, and it requires Perl - not a bad requirement necessarily, but a very big hammer...
What about awk and grep and sed and ...? That, more or less, is my question: how can I be sure that text will be lower-cased in every locale?
Perl lc/uc works fine for most languages, but it won't work correctly with Turkish; see this bug report of mine for details. But if you don't need to worry about Turkish, Perl is good to go.
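For example, a hedged sketch of a UTF-8-aware Perl one-liner (note the single quotes so the shell does not expand $_, and -CSD so standard input/output are treated as UTF-8; assumes a UTF-8 locale):
echo 'StRaßE' | perl -CSD -ne 'print lc'
# by contrast, GNU tr works byte by byte, so non-ASCII characters may be
# mangled or left unchanged:
echo 'StRaßE' | tr '[:upper:]' '[:lower:]'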
You can't be sure that text will be correct in every locale. That's not possible; there are always some errors in software libraries' implementations of i18n-related stuff.
If you're not afraid of using C++ or Java, you may take a look at ICU, which implements a broad set of collation, normalization, and related rules.

Resources