bash: setting tab field separator with variable - bash

Some bash tools such as sort, join, cut (all coreutils?) require field separator to be passed in a somewhat peculiar way for tabs: sort -t $'\t' .... There are many questions here that address this behavior.
My problem is I am trying to pass the field separator as a variable, such as:
SEP="\t"
sort -t $SEP ...
With normal characters, that works, but not with tabs. I tried a few variations, but none of them work. How can this be done?

Declare it using ANSI-C quoting:
sep=$'\t'
And call it as "$sep". The quotes are important: unquoted, the tab is removed by word splitting before sort ever sees it, so sort takes the next argument as the separator:
sort -t "$sep" file.txt
Example:
$ cat file.txt
foo bar
spam egg
abc def
$ sep=$'\t'
$ sort -t $sep file.txt
sort: multi-character tab ‘file.txt’
$ sort -t "$sep" file.txt
abc def
foo bar
spam egg
Also note that, to avoid clashing with environment variables, I have used a lowercase variable name; unless you have a very good reason not to, you should do the same.
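The same quoted variable works with the other field-oriented tools mentioned in the question. A minimal sketch, assuming bash (for the $'...' quoting) and a made-up two-column sample file written to a temporary directory:

```shell
#!/usr/bin/env bash
sep=$'\t'                       # ANSI-C quoting: a real tab character

tmp=$(mktemp -d)                # scratch directory for the sample data
printf 'foo\tbar\nspam\tegg\nabc\tdef\n' > "$tmp/file.txt"

# Quote "$sep" every time so the shell passes the tab through intact.
sorted=$(sort -t "$sep" -k 2,2 "$tmp/file.txt")   # sort by the second field
second=$(cut -d "$sep" -f 2 "$tmp/file.txt")      # extract the second field

printf '%s\n' "$sorted"
printf '%s\n' "$second"
rm -r "$tmp"
```

The names tmp, sorted and second are illustrative only; the point is that every use of the separator is written as "$sep".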

Use the keys [CONTROL]+[V] before pressing [TAB] to introduce the tab char.
echo "a b c" |cut -d" " -f2
b
Be careful if you copy paste the code, as the tabs may be lost, as they are lost in fact, in this post :-)
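Where typing or pasting a literal tab is fragile, the tab can be generated instead. A sketch, assuming only a POSIX shell with printf; the variable names tab and field are illustrative:

```shell
# Command substitution keeps the tab; only trailing newlines are stripped.
tab=$(printf '\t')

# Pass the generated tab to cut as the delimiter (quoted, as always).
field=$(printf 'a\tb\tc\n' | cut -d "$tab" -f 2)
printf '%s\n' "$field"
```

For cut specifically, tab is already the default delimiter, so -d is only needed for other separators; the variable is more useful with tools such as sort -t.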

Related

Colorize specific strings in a text

I would like to highlight a few strings while outputting a text file. For example the literals [2], quick and lazy in:
... => any number of lines with non-matching content
He’s quick but lazy.
...
• The future belongs to those who believe in the beauty of their dreams [2].
...
I’m lazy but quick (2 times faster); is there a difference when "lazy" comes before "quick"?
...
My intuitive approach would be to use grep for the colorization (in fact I'm not fixed on any specific tool):
grep -F -e '[2]' -e 'quick' -e 'lazy' --color file.txt
But it has two problems:
It filters out the lines that don't match while I want to include them in the output.
It doesn't highlight all the matching strings; it seems like the order in which the -e expressions are provided matters (problem noticed with macOS grep).
My expected output (with <...> standing for the colorization) would be:
... => any number of lines with non-matching content
He’s <quick> but <lazy>.
...
• The future belongs to those who believe in the beauty of their dreams <[2]>.
...
I’m <lazy> but <quick> (2 times faster); is there a difference when "<lazy>" comes before "<quick>"?
...
grep -n -F -e '[2]' -e 'quick' -e 'lazy' --color=always file.txt |
awk -F':' '
FILENAME==ARGV[1] { n=substr($1,9,length($1)-22); sub(/[^:]+:/,""); a[n]=$0; next }
{ print (FNR in a ? a[FNR] : $0) }
' - file.txt
This uses grep to find and highlight the strings; awk then prints the grep output for the matching lines and the original lines from the input file otherwise.
UPDATE
I found a way using grep -E instead of grep -F. As a side-effect, matching a literal string will require its ERE-escaping.
The method is to build a single regex composed of the union of the search strings plus an additional $ anchor (for selecting the "non-matching" lines).
Hence, for highlighting the literals [2], quick and lazy in the sample text, you can use:
grep -E '\[2]|quick|lazy|$' --color file.txt
edit: I replaced the ^ anchor with the $ one because on macOS:
grep -E '\[2]|quick|lazy|^' --color doesn't highlight any word
grep -E -e '\[2]|quick|lazy' -e '^' --color SEGFAULTS !!!
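A small self-contained check of the $-anchor approach, assuming a grep that accepts --color=always (GNU grep does); the sample lines are made up for illustration:

```shell
sample=$(printf '%s\n' 'nothing to see here' 'quick but lazy.' 'their dreams [2].')

# '$' matches (emptily) at the end of every line, so no line is filtered out;
# only the non-empty matches of the other alternatives get colored.
highlighted=$(printf '%s\n' "$sample" | grep -E --color=always '\[2]|quick|lazy|$')

printf '%s\n' "$highlighted"
printf '%s\n' "$highlighted" | wc -l    # same number of lines as the input
```

--color=always is used here only so that the escape sequences survive the command substitution; at an interactive prompt plain --color is enough.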

Using sed with a regex to replace strings

I want to replace some string which contain specific words with another word.
Here is my code
#!/bin/bash
arr='foo/foo/baz foo/bar/baz foo/baz/baz';
for i in ${arr[@]}; do
echo $i | sed -e 's|foo/(bar\|baz)/baz|test|g'
done
Result
foo/foo/baz
foo/bar/baz
foo/baz/baz
Expected
foo/foo/baz
foo/test/baz
foo/test/baz
There are several things you can improve. The reason you are using the alternate delimiter '|' for the sed substitution expression (to avoid the "picket fence" appearance of \/\/\/) complicates the use of '|' as the OR (alternation) regex component. Choose an alternate delimiter that does not also serve as part of the regular expression; '#' works fine.
Next, there is no reason to loop. Simply use a here string to redirect the contents of arr to sed, and place it all in a command substitution passed to printf with the "%s\n" format specifier to provide newline-separated output. (That's a mouthful, but it is actually nothing more than:)
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
Example Use/Output
To test it out, just select the expressions above and middle-mouse paste the selection into your terminal, e.g.
$ arr='foo/foo/baz foo/bar/baz foo/baz/baz'
> printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
foo/foo/baz
foo/test/baz
foo/test/baz
Look things over and let me know if you have further questions.
How about something like this:
sed -e 's/\(bar\|baz\)\//test\//g'
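A sketch of the same substitution with a real bash array and extended regular expressions, which avoids escaping the group and the alternation. It assumes bash and a sed that accepts -E (GNU and BSD sed both do):

```shell
#!/usr/bin/env bash
arr=(foo/foo/baz foo/bar/baz foo/baz/baz)    # a real array, one element per path

# "${arr[@]}" expands to one word per element; printf prints each on its own line.
# '#' is the s-command delimiter, so '/' and '|' need no escaping.
result=$(printf '%s\n' "${arr[@]}" | sed -E 's#/(bar|baz)/#/test/#g')
printf '%s\n' "$result"
```

Keeping the slashes in the pattern ensures that only a middle path component is replaced, so the trailing baz is left alone.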

Replace tab for newline and remove newlines together with SED

I have tabs (five) between these words:
cat dog
and I want this output:
cat
dog
I tried this: sed 's/\t/\n/g; /^$/d' pets
and the output was the same as with sed 's/\t/\n/g' pets:
cat
dog
I had to run sed twice to get what I wanted: first sed 's/\t/\n/g' pets > temp, then sed '/^$/d' temp.
Is there a way to get the desired output with one command?
Continuing from my comment. The problem with sed 's/\t/\n/g' is that it replaces each '\t' with a '\n'. You want to replace a sequence of tabs with a single newline. For that you need:
sed 's/\t\t*/\n/g'
or if you like explicitly enclosing the '\t' in [ ] that's fine as well.
The expression '\t' matches a single tab; followed by '\t*', it matches a tab and zero or more tabs that follow, replacing the whole sequence with a single '\n', which gives your desired output:
cat
dog
The g (globally) at the end will just replace each sequence with a single newline, e.g.
"cat dog fish"
(separated by tabs), becomes
cat
dog
fish
Let me know if you have any questions.
If the number of tabs is important, BRE will let you specify bounds by escaping the curly braces, like this:
$ sed $'s/\t\{5\}/\\\n/' <<<$'one\t\t\t\t\ttwo'
one
two
Note that you haven't specified an operating system, so it's unknown whether you're using GNU sed, which would let you include things like \t in your regex. (I use FreeBSD and macOS, where sed does not have this capability.) But you HAVE mentioned that you're using bash, which supports ANSI-C quoting ($'...'). You can use this bash feature to insert "literal" special characters into your script.
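Both ideas can be combined into a version that does not rely on sed understanding \t. A sketch assuming bash for the $'...' quoting; the tr line is a swapped-in alternative, not taken from the answers above:

```shell
#!/usr/bin/env bash
input=$'cat\t\t\t\t\tdog'

# $'...' turns \t into a real tab and \\\n into backslash + newline,
# which is how a newline is written in a portable sed replacement.
with_sed=$(printf '%s\n' "$input" | sed $'s/\t\t*/\\\n/g')

# tr translates tabs to newlines, and -s squeezes each run of newlines into one.
with_tr=$(printf '%s\n' "$input" | tr -s '\t' '\n')

printf '%s\n' "$with_sed"
printf '%s\n' "$with_tr"
```

One difference worth knowing: tr -s also collapses blank lines that were already in the input, whereas the sed version only touches runs of tabs.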

Remove certain lines in string? Shell

Let's say I have a variable containing the following string (separated by blank spaces, but when echoed it looks like lines):
animal:whale
animal:dog
animal:2_mice
animal:cat
And I have another variable containing the names of the animals I want to delete from the first string separated by spaces.
Let's say names="cat dog"
How would I go about deleting the lines that contain cat and dog from the first string?
I've tried looking up sed and grep approaches but haven't found any info that would work so far.
I should also point out that the expression needs to match exactly, so if the second variable contained just ca instead of cat, cat should not get deleted.
echo "$variable" | grep -Ev ":($(echo "$names" | sed 's/ /|/g'))$"
Using grep's inverted match; this assumes the values in the names variable are space-separated, one word each:
$ echo "$var"
animal:whale
animal:dog
animal:2_mice
animal:cat
$ echo "$names"
cat dog
$ echo "$var" | grep -v $(printf ' -e :%s$' $names)
animal:whale
animal:2_mice
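An alternative sketch that avoids building a regex at all, by comparing the part after the colon against the names as plain strings. The variable names follow the examples above; using awk here is a swapped-in technique, and filtered is a name chosen for illustration:

```shell
var=$(printf '%s\n' animal:whale animal:dog animal:2_mice animal:cat)
names="cat dog"

filtered=$(printf '%s\n' "$var" | awk -F: -v names="$names" '
    BEGIN {
        # Turn the space-separated names into a lookup table.
        n = split(names, list, " ")
        for (i = 1; i <= n; i++) drop[list[i]] = 1
    }
    !($2 in drop)    # print only lines whose second field is not listed
')
printf '%s\n' "$filtered"
```

Because the comparison is an exact string lookup, a name such as ca does not remove animal:cat, and names containing regex metacharacters need no escaping.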

How can I strip first X characters from string using sed?

I am writing shell script for embedded Linux in a small industrial box. I have a variable containing the text pid: 1234 and I want to strip first X characters from the line, so only 1234 stays. I have more variables I need to "clean", so I need to cut away X first characters and ${string:5} doesn't work for some reason in my system.
The only thing the box seems to have is sed.
I am trying to make the following to work:
result=$(echo "$pid" | sed 's/^.\{4\}//g')
Any ideas?
The following should work:
var="pid: 1234"
var=${var:5}
Are you sure bash is the shell executing your script?
Even the POSIX-compliant
var=${var#?????}
would be preferable to using an external process, although this requires you to hard-code the 5 in the form of a fixed-length pattern.
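If the count should not be hard-coded, the same POSIX expansion can be applied in a loop. A sketch with a hypothetical helper named strip_first, using only shell built-ins:

```shell
# strip_first STRING COUNT: print STRING without its first COUNT characters.
# Note: POSIX sh has no local variables, so s and n are globals here.
strip_first() {
    s=$1
    n=$2
    while [ "$n" -gt 0 ] && [ -n "$s" ]; do
        s=${s#?}          # drop one leading character
        n=$((n - 1))
    done
    printf '%s\n' "$s"
}

var="pid: 1234"
result=$(strip_first "$var" 5)
printf '%s\n' "$result"
```

The loop stops early when the string runs out, so asking for more characters than the string holds simply yields an empty line.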
Here's a concise method to cut the first X characters using cut(1). This example removes the first 4 characters by cutting a substring starting with 5th character.
echo "$pid" | cut -c 5-
Use the -r option of GNU sed ("use extended regular expressions in the script"; -E is the more portable spelling) in order to use the unescaped {n} syntax:
$ echo 'pid: 1234'| sed -r 's/^.{5}//'
1234
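The interval also works without -r when the braces are escaped, as in the question's own attempt; the count there was 4, which leaves the space in front of the number. A sketch with the count held in a variable n (a name chosen for illustration):

```shell
pid="pid: 1234"
n=5                                   # number of leading characters to remove

# Double quotes let the shell expand $n; \{ and \} reach sed unchanged (BRE interval).
result=$(printf '%s\n' "$pid" | sed "s/^.\{$n\}//")
printf '%s\n' "$result"
```

The g flag from the question is unnecessary here, since a pattern anchored with ^ can match only once per line.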
Cut first two characters from string:
$ string="1234567890"; echo "${string:2}"
34567890
Pipe it through awk '{print substr($0,42)}' where 42 is one more than the number of characters to drop. For example:
$ echo abcde| awk '{print substr($0,2)}'
bcde
$
Chances are, you'll have cut as well. If so:
[me@home]$ echo "pid: 1234" | cut -d" " -f2
1234
Well, there have been solutions here with sed, awk, cut and using bash syntax. I just want to throw in another POSIX conform variant:
$ echo "pid: 1234" | tail -c +6
1234
-c tells tail at which byte offset to start, counting from the end of the input data; if the number starts with a + sign, the offset is counted from the beginning of the input data instead.
Another way, using cut instead of sed.
result=`echo $pid | cut -c 5-`
I found the answer in pure sed supplied by this question (admittedly, posted after this question was posted). This does exactly what you asked, solely in sed:
result=`echo "$pid" | sed '/./ { s/pid:\ //g; }'`
The dot in sed '/./' is whatever you want to match. Your question is exactly what I was attempting to do, except in my case I wanted to match a specific line in a file and then uncomment it. In my case it was:
# Uncomment a line (edit the file in-place):
sed -i '/#\ COMMENTED_LINE_TO_MATCH/ { s/#\ //g; }' /path/to/target/file
The -i after sed is to edit the file in place (remove this switch if you want to test your matching expression prior to editing the file).
(I posted this because I wanted to do this entirely with sed, as this question asked, and none of the previous answers solved that problem.)
Rather than removing n characters from the start, perhaps you could just extract the digits directly. Like so...
$ echo "pid: 1234" | grep -Po "\d+"
This may be a more robust solution, and seems more intuitive.
This will do the job too:
echo "$pid"|awk '{print $2}'
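When the goal is the value rather than a fixed offset, POSIX parameter expansion can also strip by pattern without any external tool. A sketch, assuming the variable holds text shaped like pid: 1234; after_prefix and digits are illustrative names:

```shell
pid="pid: 1234"

after_prefix=${pid#"pid: "}      # remove the literal prefix "pid: "
digits=${pid##*[!0-9]}           # remove everything up to the last non-digit

printf '%s\n' "$after_prefix"
printf '%s\n' "$digits"
```

Both forms are plain POSIX, so they should also work in shells where ${string:5} is not available. The second form returns an empty string if the value does not end in digits.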

Resources