grep all lines that have at least X words in them - bash

I have a script named a.sh that produces some lines of text as output.
I have another script named b.sh, and I'd like to take the output of a.sh and hold it in a variable,
or, even better, pipe it immediately and remove all lines that are too short, meaning all lines that have fewer than X words.
Each word is separated by a space (or multiple spaces).
How can I do that?

I would pipe the script's output to awk and let it count words: awk '{ if (NF>4) { print }}'
Awk's default field separator splits the line into words, so if the number of fields (NF) is greater than (>) 4, awk prints the line.
It can be shortened to awk 'NF>4', since awk's default action is to print.
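With the scripts from the question, that filter can sit directly on the pipe; a minimal sketch, using X=4 as in the example above:
./a.sh | awk 'NF>4'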
An alternative approach would be to use wc (since it literally stands for word count). You could use it in b.sh like this:
while IFS= read -r line; do
    # keep the line only if it has more than 4 words
    if [[ $(wc -w <<< "$line") -gt 4 ]]; then
        echo "$line"
    fi
done

Inside b.sh:
./a.sh "$1" "$2" "$3" | awk -v COUNT="$4" 'NF>=COUNT'
I still haven't been able to hold the output in a variable, but this worked for the filtering.
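To also hold the output in a variable, command substitution works; a minimal sketch, assuming the same argument layout as in the line above:
output=$(./a.sh "$1" "$2" "$3")
# filter afterwards, using the 4th argument as the minimum word count
printf '%s\n' "$output" | awk -v COUNT="$4" 'NF>=COUNT'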

Related

print lines where the third character is a digit

For example, our bash script's name is masodik and there is a text.txt with these lines:
qwer
qw2qw
12345
qwert432
Then I run ./masodik text.txt and I should get
qw2qw
12345
I have tried it many ways and I don't know why this is not working:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
$ grep -E '^.{2}[0-9]' text.txt
qw2qw
12345
and in a script it could be something like:
#!/bin/sh
grep -E '^.{2}[0-9]' "$1"
To print lines whose third character is a digit:
grep ^..[0-9] text.txt
^ matches the start of the line. The dot . matches any character. [0-9] matches any digit.
You can do it with awk quite easily as well:
awk '/^..[0-9]/' file
Result
With your input in file:
$ awk '/^..[0-9]/' file
qw2qw
12345
(sed works as well, sed -n '/^..[0-9]/p' file)
The problem with the code here:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
...is that the for syntax is wrong:
read u is treated as a word list, not as a command, so the variable $u is never set and stays empty.
The for loop runs twice: the first time $i is set to the string "read", the second time to the string "u". Since neither string contains a digit, grep prints nothing.
The code never reads text.txt.
See Sasha Khapyorsky's answer for actual working code.
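If you do want to keep a read-in-a-loop structure, a minimal corrected version might look like this (a sketch; running grep once over the whole file, as shown above, is far more efficient):
#!/bin/bash
# read the file named in $1 line by line and grep each line individually
while IFS= read -r u; do
    printf '%s\n' "$u" | grep '^..[0-9]'
done < "$1"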
If for some odd reason all external utilities (grep, awk, etc.) are forbidden, this pure POSIX code would work:
#!/bin/sh
# read the file named in $1 line by line and test the third character
while IFS= read -r u; do
    case "$u" in
        [a-zA-Z0-9][a-zA-Z0-9][0-9]*) echo "$u" ;;
    esac
done < "$1"
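Run against the sample text.txt from the question, only the two lines whose third character is a digit should come out:
$ ./masodik text.txt
qw2qw
12345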
If perl is installed on the system, then the shell script could look like this:
#!/bin/bash
perl -e 'print if /^.{2}\d/' text.txt

Extract first word in colon separated text file

How do I iterate through a file and print only the first word of each line? The lines are colon separated, for example:
root:01:02:toor
The file contains several lines. This is what I've done so far, but it doesn't work:
FILE=$1
k=1
while read line; do
echo $1 | awk -F ':'
((k++))
done < $FILE
I'm not good with bash scripting at all, so this is probably very trivial for one of you.
Edit: the variable k is there to count the lines.
Use cut:
cut -d: -f1 filename
-d specifies the delimiter
-f specifies the field(s) to keep
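For example, fed the sample line from the question:
$ echo 'root:01:02:toor' | cut -d: -f1
root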
If you need to count the lines, just
count=$( wc -l < filename )
-l tells wc to count lines
awk -F: '{print $1}' FILENAME
That will print the first word when separated by colon. Is this what you are looking for?
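Since the edit mentions counting the lines as well, awk can do that in the same pass via NR; a small sketch:
awk -F: '{ print $1 } END { print NR " lines" }' FILENAME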
To use a loop, you can do something like this:
$ cat test.txt
root:hello:1
user:bye:2
test.sh
#!/bin/bash
while IFS=':' read -r line || [[ -n $line ]]; do
    echo "$line" | awk -F: '{print $1}'
done < test.txt
Example of reading line by line in bash: Read a file line by line assigning the value to a variable
Result:
$ ./test.sh
root
user
A solution using perl
%> perl -F: -ane 'print "$F[0]\n";' [file(s)]
Change the "\n" to " " if you don't want a newline printed.
You can get the first word without any external commands in bash like so:
printf '%s' "${line%%:*}"
This takes the variable named line and deletes the longest suffix matching the glob :*, i.e. everything from the first colon onward (that's what %% does, as opposed to a single %, which removes the shortest match).
With this solution you do need to write the loop yourself, though. If this is the only thing you want to do with each line, the cut solution is better, since you don't have to iterate over the file yourself.
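A minimal sketch of that loop, assuming the file name is passed as the first argument:
#!/bin/bash
while IFS= read -r line; do
    printf '%s\n' "${line%%:*}"   # first field only, no external commands
done < "$1"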

How to use tab separators with grep in ash or dash script?

Task at hand:
I have a file with four tab separated values:
peter 123 five apples
jane 1234 four rubberducks
jimmy 01234 seven nicknames
I need to get a line out of this file based on the second column, and the value is in a variable. Let's assume I have the number 123 stored in a variable foo. In bash I can do
grep $'\s'$foo$'\s'
and I get peter's line and nothing else. Is there a way to achieve the same in dash or ash?
You can use awk here:
var='1234'
awk -v var="$var" '$2 == var ""' f
jane 1234 four rubberducks
PS: I am appending "" to var to make sure it is treated as a string instead of as a number.
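If you'd rather keep the grep approach from the question, a literal tab character can be produced portably with printf, which works in dash and ash too; a sketch, assuming the data is in a file named file:
foo=123
tab=$(printf '\t')
grep "$tab$foo$tab" file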
If your file is small enough that the inefficiency of doing iteration in a shell doesn't matter, you don't actually need grep for this at all. The following is valid in any POSIX-compliant shell, including ash or dash:
var=123
while read -r first second rest; do
    if [ "$second" = "$var" ]; then
        printf '%s\t' "$first" "$second"; printf '%s\n' "$rest"
    fi
done < file   # feed the data file on the loop's stdin
(In practice, I'd probably use awk here; consider the demonstration just that).

Use sed to extract an ASCII hex string from a single line in a file

I have a file that looks like this:
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
Only one line in the file contains only hex characters (234324fc234ba253069); how do I extract that line? I tried sed -ne 's/^\([a-f0-9]*\)$/\1/p' file, using the line start and line end anchors (^ and $), but I am obviously missing something...
Grep does the job,
$ grep '^[a-f0-9]\+$' file
234324fc234ba253069
With awk,
$ awk '/^[a-f0-9]+$/{print}' file
234324fc234ba253069
Based on the given search pattern, both awk and grep print the matching line:
^           # start of line
[a-f0-9]\+  # one or more lowercase hex characters (no capital A-F)
$           # end of line
sed can do it too:
sed -n '/^[a-f0-9]*$/p' file
234324fc234ba253069
By the way, your command sed -ne 's/^\([a-f0-9]*\)$/\1/p' file works for me. Note also that it is not necessary to capture the line and print it back with \1. That is handy in many cases, but here it is overkill because you want to print the whole line; a plain sed -n '/pattern/p' does the job, as shown above.
As there is just one match in the whole file, you may want to exit once it is found (thanks NeronLeVelu!):
sed -n '/^[a-f0-9]*$/{p;q}' file
Another approach is to let printf decide when the line is hexadecimal:
while IFS= read -r line; do
    # printf succeeds only if the line is a valid hexadecimal number
    printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"
done < file
Based on Hexadecimal To Decimal in Shell Script, printf "%f" 0xNUMBER executes successfully if the number is indeed hexadecimal. Otherwise, it returns an error.
Hence, using printf ... >/dev/null 2>&1 && echo "$line" does not let printf print anything (redirects to /dev/null) but then prints the line if it was hexadecimal.
For your given file, it returns:
$ while read line; do printf "%f\n" "0x"$line >/dev/null 2>&1 && echo "$line"; done < a
234324fc234ba253069
Using egrep you can restrict your regex to select only the lines made up entirely of valid hex characters, i.e. [a-fA-F0-9]:
egrep '^[a-fA-F0-9]+$' file
234324fc234ba253069

variable reference in a sed expression in a while loop

I have been working on a KornShell (ksh) script, and I am stuck on an error with a sed expression.
I have a file named abc.txt with 100 entries, and I want to assign the 8th field of every line of the file to a variable.
I have used something like this:
#!/bin/ksh
typeset -i x=1
while read line ; do
var1=$(sed -n '$xp' abc.txt | awk '{print $8}')
print $var1
x="$x+1"
done < abc.txt
exit
I want to use the variable x as the line number, but I am getting an error from the sed expression where it references x. Please help me out.
Your quotes are wrong. Anything in single quotes is a verbatim string; if you want variable interpolation, you need to use double quotes (or, in very special circumstances, no quoting at all).
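Concretely, the sed line from the question starts working once the variable is double-quoted (the braces keep the p from being read as part of the variable name):
var1=$(sed -n "${x}p" abc.txt | awk '{print $8}')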
You might as well refactor everything into Awk, too. Trivially:
var1=$(awk -v n="$x" 'NR==n{ print $8 }' abc.txt)
However, having the main loop re-read the whole file just to get one line out of it is highly inefficient. Maybe you want something like:
awk '{ print NR, $8 }' abc.txt |
while read -r x var1; do
    print "$var1"
    # presumably do something with $x too?
done

Resources