How to use tab separators with grep in ash or dash script? - bash

Task at hand:
I have a file with four tab separated values:
peter 123 five apples
jane 1234 four rubberducks
jimmy 01234 seven nicknames
I need to get a line out of this file based on the second column, and the value is in a variable. Let's assume I have the number 123 stored in a variable foo. In bash I can do
grep $'\s'$foo$'\s'
and I get peter's line and nothing else. Is there a way to achieve the same in dash or ash?

You can use awk here:
var='1234'
awk -v var="$var" '$2 == var ""' f
jane 1234 four rubberducks
PS: I am appending "" to var to make sure it is treated as a string instead of as a number.
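The string comparison matters with the sample data above: jimmy's second column 01234 is numerically equal to 1234 but not string-equal, so appending "" keeps him out of the match. A quick check, recreating the sample rows in a scratch file f (space-separated here for brevity):

```shell
# recreate the sample file; column 2 of jimmy's line is 01234
printf '%s\n' 'peter 123 five apples' \
              'jane 1234 four rubberducks' \
              'jimmy 01234 seven nicknames' > f

awk -v var=1234 '$2 == var ""' f   # string compare: prints only jane's line
```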

If your file is small enough that the inefficiency of doing iteration in a shell doesn't matter, you don't actually need grep for this at all. The following is valid in any POSIX-compliant shell, including ash or dash:
var=123
while read -r first second rest; do
  if [ "$second" = "$var" ]; then
    printf '%s\t' "$first" "$second"; printf '%s\n' "$rest"
  fi
done <f
(In practice, I'd probably use awk here; consider the demonstration just that).
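To stay with grep in dash or ash, one portable trick is to capture a literal tab with printf and interpolate it into the pattern; anchoring at line start restricts the match to the second column. A sketch, assuming the data lives in a file f with real tab separators:

```shell
# recreate the sample file with real tabs
printf 'peter\t123\tfive\tapples\n'       > f
printf 'jane\t1234\tfour\trubberducks\n' >> f
printf 'jimmy\t01234\tseven\tnicknames\n' >> f

foo=123
tab=$(printf '\t')                  # a literal tab character
grep "^[^$tab]*$tab$foo$tab" f      # prints only peter's line
```

Because the pattern is anchored and skips everything up to the first tab, a 123 in any other column cannot produce a false match.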

grep all lines that have at least X words in them

I have a script named a.sh that produces an output - some lines of text.
I have another script named b.sh and I'd like to take the output of a.sh and hold it in a variable.
or, even better, to pipe it immediately and remove all lines that are too short, meaning all lines that have fewer than X words.
Each word is separated by a space (or multiple spaces).
how can I do that?
I would pipe the script's output to awk and let it count words: awk '{ if (NF>4) { print }}'
Awk's default field separator splits the line into words. This means that if the number of fields (NF) is more than (>) 4, awk prints the line.
It can be shortened to awk 'NF>4' since awk's default action is to print.
An alternative approach would be to use wc (since it literally stands for word count). You could use it in the b script like this:
while read -r line; do
  if [[ $(wc -w <<< "$line") -gt 4 ]]
  then
    echo "$line"
  fi
done
inside b.sh:
./a.sh "$1" "$2" "$3" | awk -v COUNT="$4" 'NF>=COUNT'
I still haven't been able to hold the output in a variable, but this worked for the parsing.
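To also hold the filtered output in a variable, wrap the same pipeline in command substitution. A sketch, with a shell function standing in for the real a.sh:

```shell
# stand-in for ./a.sh; the real script's output is what gets filtered
a() { printf '%s\n' 'too short' 'this line has five words'; }

out=$(a | awk 'NF>4')    # keep lines with more than four words
printf '%s\n' "$out"     # prints: this line has five words
```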

print lines where the third character is a digit

for example our bash script's name is masodik and there is a text.txt with these lines:
qwer
qw2qw
12345
qwert432
Then I write ./masodik text.txt and I get
qw2qw
12345
I tried it many ways and I don't know why this is not working:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
$ grep -E '^.{2}[0-9]' text.txt
qw2qw
12345
In a script it could be something like:
#!/bin/sh
grep -E '^.{2}[0-9]' "$1"
To print lines whose third character is a digit:
grep '^..[0-9]' text.txt
^ matches the start of the line. The dot . matches any character. [0-9] matches any digit.
You can do it with awk quite easily as well:
awk '/^..[0-9]/' file
Result
With your input in file:
$ awk '/^..[0-9]/' file
qw2qw
12345
(sed works as well, sed -n '/^..[0-9]/p' file)
The problem with the code here:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
...is that the for syntax is wrong:
read u is treated as a word list, so the $u variable is never set and stays empty.
The for loop runs twice -- the 1st time $i is set to the string "read", the 2nd time $i is set to the string "u". Since neither string contains a number, the grep returns nothing.
The code never reads text.txt.
See Sasha Khapyorsky's answer for actual working code.
If for some odd reason all external utils, (grep, awk, etc.), are forbidden, this pure POSIX code would work:
#!/bin/sh
while read -r u ; do
  case "$u" in
    [a-zA-Z0-9][a-zA-Z0-9][0-9]*) echo "$u" ;;
  esac
done
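Fed the sample text.txt (redirected into the loop), it prints the same two lines as the grep version:

```shell
printf '%s\n' qwer qw2qw 12345 qwert432 > text.txt

while read -r u ; do
  case "$u" in
    [a-zA-Z0-9][a-zA-Z0-9][0-9]*) echo "$u" ;;
  esac
done < text.txt
# prints: qw2qw and 12345
```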
If perl is installed on the system, then the shell script will look like
#!/bin/bash
perl -e 'print if /^.{2}\d/' text.txt

Error in assigning awk variable to bash variable

Variable b has a string. Awk retrieves a substring which I want to assign to variable c. This is what I did:
#!/bin/bash
b=$(llsubmit multiple.cmd)
echo $b | c=$(awk '{
ret=match($0,".in.")
rwt=match($0,"\" has")
rqt=rwt-(ret+4)
subs=substr($0,(ret+4),rqt)
}')
... but I get blank output from echo $c.
You can't pipe into an assignment.
c=$(echo "$b" | awk '{
ret=match($0,".in.")
rwt=match($0,"\" has")
rqt=rwt-(ret+4)
subs=substr($0,(ret+4),rqt)
}')
(Notice also the quoting around $b.)
But your Awk script looks rather complex. And it doesn't produce any output. Should it print something at the end? Without access to sample output from llsubmit this is mildly speculative, but I'm guessing something like this could work:
c=$(echo "$b" | sed -n 's/.*\(\.in\.[^"]*\)" has .*/\1/p')
(Notice also the backslashes to make the dots match literally.)
You should then also use double quotes in echo "$c" (unless you are completely sure that the output cannot contain any shell metacharacters).
... And, of course, very often you don't want or need to store results in a variable in shell scripts if you can refactor your code into a pipeline. Perhaps you are really looking for something like
llsubmit multiple.cmd |
sed -n 's/.*\(\.in\.[^"]*\)" has .*/\1/p' |
while read -r job; do
: things with "$job"
done
It's hard to tell from your question since you didn't provide sample input and expected output but is this what you're trying to do:
$ b='foo .in.bar has'
$ c="${b% has*}"
$ c="${c#*.in.}"
$ echo "$c"
bar
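For completeness, if awk is to do the extraction itself, the original script only needs a print at the end. A sketch against a made-up llsubmit-style line (the real output format may differ; the ".in." and '" has' anchors are assumptions carried over from the question):

```shell
# hypothetical llsubmit output line
b='Job "/path/job.in.123" has been submitted'

c=$(printf '%s\n' "$b" | awk '{
  ret = match($0, /\.in\./)    # start position of ".in."
  rwt = match($0, /" has/)     # start position of the closing quote
  print substr($0, ret + 4, rwt - (ret + 4))
}')
echo "$c"    # prints: 123
```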

Extract first three elements from an URL with a Regular expression

Given the following URL:
http://www.example.com/path1/path2/page
Is there a simple way to extract the first three blocks of it with a regular expression, that is:
http://www.example.com/path1/path2
I've found some examples of how to do it with some coding (perl/javascript), however I'd really appreciate it if somebody pointed me to a sed/awk example which uses a regular expression to do it.
Thanks
Solution 1st: With simple parameter expansion.
echo "${val%/*}"
Solution 2nd: with awk.
echo "$val" | awk 'match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}'
Solution 3rd: With one more awk.
echo "$val" | awk -F"/" 'NF--;1' OFS="/"
Solution 4th: With sed.
echo "$val" | sed 's/\(.*\/\).*/\1/;s/\/$//'
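With the sample URL in val, Solution 1st is self-contained; the trailing-block strip is plain POSIX parameter expansion:

```shell
val='http://www.example.com/path1/path2/page'
echo "${val%/*}"    # prints: http://www.example.com/path1/path2
```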
To extract the first three blocks (as opposed to, for example, removing the last block) with a regular expression, using Bash regex:
$ [[ "$var" =~ ^(https?://)?([^/]+/){0,3} ]] && echo $BASH_REMATCH
http://www.example.com/path1/path2/
Explained:
^(https?://)? matches an optional protocol prefix
([^/]+/){0,3} 0 to 3 blocks matched to output
It supports for example:
$ var=https://www.example.com/path1/path2/page
https://www.example.com/path1/path2/
$ var=www.example.com/path1/path2/page
www.example.com/path1/path2/
$ var=www.example.com/path1/
www.example.com/path1/

How can I strip first X characters from string using sed?

I am writing a shell script for embedded Linux in a small industrial box. I have a variable containing the text pid: 1234 and I want to strip the first X characters from the line, so only 1234 stays. I have more variables I need to "clean", so I need to cut away the first X characters, and ${string:5} doesn't work for some reason in my system.
The only thing the box seems to have is sed.
I am trying to make the following to work:
result=$(echo "$pid" | sed 's/^.\{4\}//g')
Any ideas?
The following should work:
var="pid: 1234"
var=${var:5}
Are you sure bash is the shell executing your script?
Even the POSIX-compliant
var=${var#?????}
would be preferable to using an external process, although this requires you to hard-code the 5 in the form of a fixed-length pattern.
Here's a concise method to cut the first X characters using cut(1). This example removes the first five characters by printing the substring that starts at the 6th character.
echo "$pid" | cut -c 6-
Use the -r option ("use extended regular expressions in the script") to sed in order to use the {n} syntax:
$ echo 'pid: 1234'| sed -r 's/^.{5}//'
1234
Cut first two characters from string:
$ string="1234567890"; echo "${string:2}"
34567890
pipe it through awk '{print substr($0,42)}' where 42 is one more than the number of characters to drop. For example:
$ echo abcde| awk '{print substr($0,2)}'
bcde
$
Chances are, you'll have cut as well. If so:
[me#home]$ echo "pid: 1234" | cut -d" " -f2
1234
Well, there have been solutions here with sed, awk, cut and bash syntax. I just want to throw in another POSIX-conformant variant:
$ echo "pid: 1234" | tail -c +6
1234
-c tells tail at which byte offset to start. Normally it counts from the end of the input data, but if the number starts with a + sign, it counts from the beginning of the input data instead.
Another way, using cut instead of sed.
result=$(echo "$pid" | cut -c 6-)
I found the answer in pure sed supplied by this question (admittedly, posted after this question was posted). This does exactly what you asked, solely in sed:
result=$(echo "$pid" | sed '/./ { s/pid: //; }')
The dot in sed '/./' is whatever you want to match. Your question is exactly what I was attempting, except in my case I wanted to match a specific line in a file and then uncomment it. In my case it was:
# Uncomment a line (edit the file in-place):
sed -i '/#\ COMMENTED_LINE_TO_MATCH/ { s/#\ //g; }' /path/to/target/file
The -i after sed is to edit the file in place (remove this switch if you want to test your matching expression prior to editing the file).
(I posted this because I wanted to do this entirely with sed as this question asked and none of the previous answered solved that problem.)
Rather than removing n characters from the start, perhaps you could just extract the digits directly. Like so...
$ echo "pid: 1234" | grep -Po "\d+"
This may be a more robust solution, and seems more intuitive.
This will do the job too:
echo "$pid"|awk '{print $2}'
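As a sanity check, the POSIX-friendly variants above all agree on the sample string:

```shell
pid='pid: 1234'
echo "${pid#?????}"               # parameter expansion:       1234
echo "$pid" | sed 's/^.\{5\}//'   # POSIX sed, no -r needed:   1234
echo "$pid" | tail -c +6          # start at the 6th byte:     1234
echo "$pid" | awk '{print $2}'    # second whitespace field:   1234
```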
