What is the meaning of IFS with a newline between single quotes?

Hi, IFS=' ' is for space, but what is
IFS='
'

It means you are setting IFS to a newline, so word splitting happens at line boundaries. This would be similar to doing:
IFS=$'\n'
The difference is that your way is POSIX compliant.
My sources for this answer are here and here
You may find that the different methods are preferred depending on which shell implementation you are using (I think that's the right term?)
NOTE: My answer is based purely on the last 10 minutes of research, I have no prior experience or knowledge with this.
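A quick sketch of the splitting behaviour (the sample data here is made up for illustration):

```sh
# Two lines of sample data, each containing a space:
list='first item
second item'

# Assign a quoted literal newline to IFS (POSIX-portable form):
IFS='
'
# The bash/ksh equivalent would be: IFS=$'\n'

set -- $list    # unquoted, so the shell splits at newlines only
echo "$#"       # two fields; the spaces inside each field are preserved
```

With the default IFS (space, tab, newline), the same `set -- $list` would have produced four words instead of two.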


sed find and replace variation using ^ instead of / [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 2 years ago.
I stumbled upon this variation of sed today in my company's codebase.
Essentially we want to swap out ssh://git# for https:// for our CI.
I found this syntax pretty strange and have not been able to find documentation for it on Google... Could someone link me to some documentation or provide some insight?
I see it written this way, with ^ separating the search and replace strings. This has removed the need for the / separator as well as any escapes. Is this maybe a regex trick?
sed 's^ssh://git#^https://^g' -i terraform.tfvars
It performs the same as this. Which is what I would have written originally. But the one above just seems so much cleaner and more readable.
sed 's/ssh:\/\/git#/https:\/\//g' -i terraform.tfvars
Some insight is greatly appreciated! Thanks in advance.
sed manual:
The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character.
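To see that the delimiter really is arbitrary, here is a small sketch on a demo file (the file name and its contents are made up, standing in for the terraform.tfvars from the question):

```sh
# Demo file standing in for terraform.tfvars:
printf 'url = "ssh://git#example.com/org/repo.git"\n' > tfvars.demo

# Any single character can follow the s as the delimiter;
# these three commands are equivalent:
sed 's^ssh://git#^https://^g'     tfvars.demo
sed 's,ssh://git#,https://,g'     tfvars.demo
sed 's/ssh:\/\/git#/https:\/\//g' tfvars.demo
```

The delimiter is chosen per s command, so you simply pick one that does not occur in the pattern or replacement and thereby avoid the escaping.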

Unix: Optimized command for substituting words in a large file

This question is not related to any code issue. Just need your suggestions.
We have a file which is ~100 GB, and we are applying sed to substitute a few parameters.
This process is taking a long time and eating up CPU as well.
Would replacing sed with awk/tr/perl or any other Unix utility help in this scenario?
Note:
Any suggestion other than the time command.
You can do a couple of things to speed it up:
use fixed pattern matching instead of regexes wherever you can
run sed for example as LANG=C sed '...'
These two are likely to help a lot. Anything else will lead to just minor improvements, even different tools.
About LANG=C: normally, matching is done in whatever encoding your environment is set to, which is likely UTF-8, and decoding multi-byte UTF-8 characters costs extra lookups. If your patterns use just ASCII, then definitely go for LANG=C.
Other things that you can try:
if you have to use regexes then use the longest fixed character strings you can - this will allow the regex engine to skip non-matching parts of the file faster (it will skip bigger chunks)
avoid line by line processing if possible - the regex engine will not have to spend time looking for the newline character
Try different awk implementations: mawk has been particularly fast for me.
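A minimal sketch of the two main suggestions above (the file here is a tiny made-up stand-in for the 100 GB file, so it only shows the commands, not the speedup):

```sh
# Tiny stand-in for the large file:
printf 'foo=1\nbar=2\nfoo=3\n' > big.demo

# LC_ALL=C forces byte-wise, single-byte matching,
# overriding any UTF-8 locale settings:
LC_ALL=C sed 's/foo/FOO/g' big.demo

# When you only need to find lines rather than edit them,
# grep -F searches for fixed strings with no regex engine at all:
LC_ALL=C grep -F 'foo=' big.demo
```

On a genuinely large file, benchmark both the C locale and your normal locale on a sample before committing to a full run.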

Store and query a mapping in a file, without re-inventing the wheel

If I were using Python, I'd use a dict. If I were using Perl, I'd use a hash. But I'm using a Unix shell. How can I implement a persistent mapping table in a text file, using shell tools?
I need to look up mapping entries based on a string key, and query one of several fields for that key.
Unix already has colon-separated records for mappings like the system passwd table, but there doesn't appear to be a tool for reading arbitrary files formatted in this manner. So people resort to:
key=foo
fieldnum=3
value=$(cat /path/to/mapping | grep "^$key:" | cut -d':' -f$fieldnum)
but that's pretty long-winded. Surely I don't need to make a function to do that? Hasn't this wheel already been invented and implemented in a standard tool?
Given the conditions, I don't see anything hairy in the approach. But maybe consider awk to extract the data. The awk approach allows picking only the first or the last entry, or imposing arbitrary additional conditions:
value=$(awk -F: "/^$key:/{print \$$fieldnum}" /path/to_mapping)
Once bundled in a function it's not that scary:)
I'm afraid there's no better way at least within POSIX. But you may also have a look at join command.
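One refinement to the awk lookup above: passing the shell values in with -v, instead of interpolating them into the awk program, means a key containing regex or quote characters cannot break the script. A sketch on a made-up mapping file (the question's real path is /path/to/mapping):

```sh
# Toy mapping file:
printf 'foo:one:alpha\nbar:two:beta\n' > mapping.demo

key=foo
fieldnum=3
# -v hands the shell values to awk as plain data; $1 == k is an exact
# string comparison, and exit stops at the first matching entry:
value=$(awk -F: -v k="$key" -v f="$fieldnum" '$1 == k { print $f; exit }' mapping.demo)
echo "$value"
```

Dropping the exit (or replacing it with collecting into a variable printed in END) gives the "last entry" behaviour instead.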
Bash supports arrays, which is not exactly the same. See for example this guide.
area[11]=23
area[13]=37
area[51]=UFOs
echo ${area[11]}
See this LinuxJournal article for Bash >= 4.0. For other versions of Bash you can fake it:
hput () {
    eval hash"$1"='$2'
}
hget () {
    eval echo '${hash'"$1"'#hash}'
}
# then
hput a blah
hget a # yields blah
Your example is one of several ways to do this using shell tools. Note that cat is unnecessary.
key=foo
fieldnum=3
filename=/path/to/mapping
value=$(grep "^$key:" "$filename" | cut -d':' -f$fieldnum)
Sometimes join comes in handy, too.
AWK, Python, Perl, sed and various XML, JSON and YAML tools as well as databases such as MySQL and SQLite can also be used, of course.
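For the join suggestion, a minimal sketch for the colon-separated case (file names and contents are made up; join requires both inputs sorted on the join field):

```sh
# Two colon-separated tables keyed on field 1, both already sorted:
printf 'bar:2\nfoo:1\n' > ids.demo
printf 'bar:beta\nfoo:alpha\n' > names.demo

# -t: sets the field separator; output is key, then the
# remaining fields of each file:
join -t: ids.demo names.demo
```

Unsorted inputs would first need a pass through sort -t: -k1,1.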
Without using them, everything else can sometimes be convoluted. Unfortunately, there isn't any "standard" utility. I would say that the answer posted by pooh comes closest. AWK is especially adept at dealing with plain-text fields and records.
The answer in this case appears to be: no, there's no widely-available implementation of the ‘passwd’ file format for the general case, and wheel re-invention is necessary in each case.

How do you convert character case in UNIX accurately? (assuming i18N)

I'm trying to get a feel for how to manipulate characters and character sets in UNIX accurately given the existence of differing locales - and doing so without requiring special tools outside of UNIX standard items.
My research has shown me the problem of the German sharp-s character (ß): upper-casing turns one character into two (SS) - and there are other problems. Using tr is apparently a very bad idea. The only alternative I see is this:
echo StUfF | perl -n -e 'print lc($_);'
but I'm not certain that will work, and it requires Perl - not a bad requirement necessarily, but a very big hammer...
What about awk and grep and sed and ...? That, more or less, is my question: how can I be sure that text will be lower-cased in every locale?
Perl lc/uc works fine for most languages but it won't work with Turkish correctly, see this bug report of mine for details. But if you don't need to worry about Turkish, Perl is good to go.
You can't be sure that text will be correct in every locale. That's not possible; there are always some errors in software libraries' implementations of i18n-related stuff.
If you're not afraid of using C++ or Java, you may take a look at ICU, which implements a broad set of collation, normalization, and other rules.
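For comparison, the standard-tool alternatives look like this (sample string made up; note the Perl command is single-quoted so the shell does not expand $_):

```sh
s='StUfF'

# tr with POSIX character classes; a strict one-to-one mapping,
# so sharp-s-to-"ss" style conversions remain out of reach:
printf '%s\n' "$s" | tr '[:upper:]' '[:lower:]'

# awk has a built-in tolower():
printf '%s\n' "$s" | awk '{ print tolower($0) }'

# the Perl variant from the question, quoting fixed:
printf '%s\n' "$s" | perl -n -e 'print lc($_);'
```

All three honour the current locale to varying degrees, but only character-by-character; none of them handles the one-to-many cases.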

What is the best character to use as a delimiter in a custom batch syntax?

I've written a little program to download images to different folders from the web. I want to create a quick and dirty batch file syntax and was wondering what the best delimiter would be for the different variables.
The variables might include urls, folder paths, filenames and some custom messages.
So are there any characters that cannot be used for the first three? That would be the obvious choice to use as a delimiter. How about the good old comma?
Thanks!
You can use either:
A control character: control characters rarely appear in text files, and Tab (\t) is probably the best choice here.
Some combination of characters which is unlikely to occur in your files, e.g. #s#.
Tab is the generally preferred choice though.
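A sketch of reading one tab-delimited batch line back in the shell (the field layout here - url, folder, filename, message - is just an assumed example):

```sh
# One made-up batch line: url <TAB> folder <TAB> filename <TAB> message
line=$(printf 'http://example.com/a.png\timages\ta.png\tsaved ok')

tab=$(printf '\t')
# Split on tab only; the last variable soaks up the rest,
# so the message keeps its internal spaces:
IFS=$tab read -r url folder file msg <<EOF
$line
EOF
echo "$url -> $folder/$file ($msg)"
```

This is one reason tab works well: URLs, paths, and filenames essentially never contain it, so no escaping is needed.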
Why not just use something that exists already? There are one or two choices: perl, python, ruby, bash, sh, csh, Groovy, ECMAScript, heaven forbid, Windows scripting files.
I can't see what you'd gain by writing yet another batch file syntax.
Tabs. And then expand or compress any tabs found in the text.
Choose a delimiter that has the least chance of collision with the names of any variable that you may have (which precludes #, /, : etc). The comma (,) looks good to me (unless your custom message has a few) or < and > (subject to previous condition).
However, you may also need to 'escape' delimiter characters occurring as part of the variables you want to delimit.
This sounds like a really bad idea. There is no need to create yet another (data-representation) language, there are plenty ones which might fit your needs. In addition to Ruby, Perl, etc., you may want to consider YAML.
Designing good syntax for this sort of thing is difficult and fraught with peril. Does reinventing the wheel ring a bell?
I would use '|'
It's one of the rarest characters.
How about String.fromCharCode(1)?
