How can I combine multiple regular expression conditions in a single awk command? [closed] - bash

Write a shell command that counts how many times a number of at least four digits, surrounded by whitespace, occurs in a given file (for example: " 1945 ").
When I tried to solve the above exercise I could not get the result I wanted, so I am asking for your help.
First of all, I created a txt file and filled it with random numbers. The - sign represents a space.
---234352432-
-123---
-12342---
-1-
-12345-
122333
I wrote a command to count the numbers that have four or more digits and whitespace both before and after them.
cat text1.txt | awk '/^[[:space:]]&&[0-9]{4,}&&[[:space:]]$/' | awk 'END {print NR}'
returned 0
cat text1.txt | awk '/^" "/' | awk '/[0-9] {4, }/' | awk '/" "$/' | awk '{print NR}'
returned 6

This might be easier with grep:
$ grep -Ec '\s[0-9]{4,}\s' file
3
To verify the matches:
$ grep -E '\s[0-9]{4,}\s' file | tr ' ' '-'
---234352432--
-12342---
-12345-

To match a line that starts with white space then has 4 or more contiguous digits then white space to the end of the line:
$ awk '/^[[:space:]]+[0-9]{4,}[[:space:]]+$/{c++} END{print c+0}' file
3
To match a line that starts with white space, ends with white space and contains 4 or more contiguous digits somewhere on the line:
$ awk '/^[[:space:]]+/ && /[0-9]{4,}/ && /[[:space:]]+$/{c++} END{print c+0}' file
3
They'll behave the same with the input you provided but try them both with:
3 foo 12345 bar 7
for example (where that line has blanks at the start and end).
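A quick way to see the difference (feeding that line in with printf, blanks included, instead of editing the file):
$ printf ' 3 foo 12345 bar 7 \n' | awk '/^[[:space:]]+[0-9]{4,}[[:space:]]+$/{c++} END{print c+0}'
0
$ printf ' 3 foo 12345 bar 7 \n' | awk '/^[[:space:]]+/ && /[0-9]{4,}/ && /[[:space:]]+$/{c++} END{print c+0}'
1
The anchored version requires the digits to be the only thing on the line, so only the second one counts it.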
You never need to cat a file into a pipe to awk (or any other command), nor do you need a pipeline of multiple awk commands (nor pipes of awk+sed+grep, etc.), so if you ever find yourself doing any of that, know that you're using the wrong approach.

$ awk '{for(i=1; i<=NF; i++) {if($i ~ /^[0-9]/ && $i>999) {print $i}}}' text1.txt >> text2.txt; awk 'END {print NR}' text2.txt
That worked in my case. Thank you for everything.
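For what it's worth, the temporary file isn't needed; a single awk pass can do the counting directly (a sketch that counts the all-digit fields greater than 999):
$ awk '{for(i=1; i<=NF; i++) if($i ~ /^[0-9]+$/ && $i > 999) c++} END{print c+0}' text1.txt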

Related

I need to filter only duplicated lines from many files using bash [duplicate]

This question already has answers here:
How do I use shell variables in an awk script?
I have the following three files
filea
a
bc
cde
fileb
a
bc
cde
frtdff
filec
a
bc
cddeeer
erer34
I am able to filter the duplicated lines from these three files.
I am using the following command:
ls file* | wc -l
which returns 3. Then, I am launching
sort file* | uniq --count --repeated | awk '{ if ($1 == 3) { print $2} }'
The last command returns precisely what I need, but only as long as no more files starting with "file" are created.
However, since thousands of such files may be created while a script is running, I need to retrieve the exact number of files from this command:
n=`ls file* | wc -l`
sort file* | uniq --count --repeated | awk '{ if ($1 == $n) { print $2} }'
Unfortunately, the variable n is not expanded inside the awk command.
My issue is that I am not able to use the value of the shell variable n as a comparison criterion inside the if condition of the awk command.
You can use:
awk '!line[$0]++' file*
This will print any line only once, even if it is present in several files and/or repeated within the same file.
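If you do need the count-based approach from the question, the shell variable can be passed into awk with its -v option instead of being spliced into the single-quoted script, for example:
n=$(ls file* | wc -l)
sort file* | uniq --count --repeated | awk -v n="$n" '$1 == n { print $2 }'
Bear in mind that uniq also counts repeats within a single file, so a line occurring n times in one file would pass this test too.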

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make cat show only word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Note that I will welcome any command that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only be so long. If it were as long as a whole .txt file, it wouldn't work.
I look forward to your answers!
You could use awk for this job: it splits the input at spaces and prints the field whose number is in wordnumber, and tr is used to remove newlines first:
cat myFile.txt | tr -d '\n' | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want, for example, the 5th character, you could do it like so:
head -c 5 myFile.txt | tail -c 1
Since you have NOT shown samples of Input_file or the expected output, I couldn't test it, but you could simply do this with awk as follows:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Here we tell awk to look only at the 1st line (FNR==1), and substr($0,2342,1) means: starting at character position 2342, take only 1 character. You could increase those values or keep them as per your need.
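As a quick check of the mechanics, the same command with a smaller offset (position 3 instead of 2342):
printf 'abcdef\n' | awk 'FNR==1{print substr($0,3,1); next}'
c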
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters (which includes newlines) and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file
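For example, picking the 3rd whitespace-separated word out of a small test input:
printf 'one two\nthree four\n' | gawk 'BEGIN{RS="[[:space:]]+"} NR==3{print; exit}'
three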

read values of txt file from bash [duplicate]

This question already has answers here:
How to grep for contents after pattern?
I'm trying to read values from a text file.
I have test1.txt which looks like
sub1 1 2 3
sub8 4 5 6
I want to obtain values '1 2 3' when I specify 'sub1'.
The closest I get is:
subj="sub1"
grep "$subj" test1.txt
But the answer is:
sub8 4 5 6
I've read that grep gives you the next line to the match, so I've tried to change the text file to the following:
test2.txt looks like:
sub1
1 2 3
sub8
4 5 6
However, when I type
grep "$subj" test2.txt
The answer is:
sub1
It should be something super simple, but I've tried awk, sed, grep, egrep, cat and none is working... I've also read some somewhat related posts but none was really helpful.
Awk works: awk '$1 == "'"$subj"'" { print $2, $3, $4 }' test1.txt
The command outputs fields two, three, and four for all lines in test1.txt where the first field is $subj (i.e.: the contents of the variable named subj).
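An equivalent that avoids splicing the shell variable into the quoted script is to pass it in with awk's -v option:
awk -v subj="$subj" '$1 == subj { print $2, $3, $4 }' test1.txt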
With your original text file format:
target=sub1
while IFS=$' \t\n' read -r key values; do
  if [[ $key = "$target" ]]; then
    echo "Found values: $values"
  fi
done <test1.txt
This requires no external tools, using only functionality built into bash itself. See BashFAQ #1.
As has come up during debugging in comments, if you have a traditional Apple-format text file (CR newlines only), then you might want something more like:
target=sub1
while IFS=$' \t\n' read -r -d $'\r' key values || [[ $key ]]; do
  if [[ $key = "$target" ]]; then
    echo "Found values: $values"
  fi
done <test1.txt
Alternately, using awk (for a standard UNIX text file):
target="sub1"
awk -v target="$target" '$1 == target { $1 = ""; print; }' <test1.txt
...or, for a file with CR-only newlines:
target="sub1"
tr '\r' '\n' <test1.txt | awk -v target="$target" '$1 == target { $1 = ""; print; }'
This version will be slower if the text file being read is small (since awk, like any other external tool, takes time to start up); but faster if it's large (since awk's operation is much faster than that of bash's built-ins once it's done starting up).
grep "sub1" test1.txt | cut -c6-
or
grep -A 1 "sub1" test2.txt | tail -n 1
You're doing it right, but it seems like test1.txt has a wrong value in it.
With grep foo you get all lines with foo in them. Use grep -m1 foo to find only the first line with foo in it.
Then you can use cut -d" " -f2- to get all the values after foo, separated by spaces.
In the end the command would look like this ...
$ subj="sub1"
$ grep -m1 "$subj" test1.txt | cut -d" " -f2-
But this doesn't explain why you could not find sub1 in the first place.
Did you read the proper file?
There's a bunch of ways to do this (and shorter/more efficient answers than what I'm giving you), but I'm assuming you're a beginner at bash, and therefore I'll give you something that's easy to understand:
egrep "^$subj\>" file.txt | sed "s/^\S*\>\s*//"
or
egrep "^$subj\>" file.txt | sed "s/^[^[:blank:]]*\>[[:blank:]]*//"
The first part, egrep, will search for your subject at the beginning of the line in file.txt (that's what the ^ symbol does in the grep string). It is also looking for a whole word (the \> is looking for an end-of-word boundary -- that way sub1 doesn't match sub12 in the file). Notice you have to use egrep to get the \>, as grep by default doesn't recognize that escape sequence. Once done finding the lines, egrep then passes its output to sed, which will strip the first word and trailing whitespace off of each line. Again, the ^ symbol in the sed command specifies it should only match at the beginning of the line. The \S* tells it to read as many non-whitespace characters as it can. Then the \s* tells sed to gobble up as much whitespace as it can. sed then replaces everything it matched with nothing, leaving the other stuff behind.
BTW, there's a help page in Stack overflow that tells you how to format your questions (I'm guessing that was the reason you got a downvote).
-------------- EDIT ---------
As pointed out, if you are on a Mac or something like that, you have to use [^[:blank:]] instead of \S and [[:blank:]] instead of \s in your sed expression (as these are portable to all platforms); that is what the second version above does.
awk '/sub1/{ print $2,$3,$4 }' file
1 2 3
What happens? After the regexp /sub1/ matches, the three following fields are printed.
Any drawbacks? The original spacing between the fields is not preserved.
Sed also works: sed -n -e 's/^'"$subj"' *//p' file1.txt
It outputs all lines matching $subj at the beginning of a line after having removed the matching word and the spaces following. If TABs are used the spaces should be replaced by something like [[:space:]].
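With the test1.txt from the question, that looks like:
$ subj="sub1"
$ sed -n -e 's/^'"$subj"' *//p' test1.txt
1 2 3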

Searching a file (grep/awk) for 2 carriage return/line-feed characters

I'm trying to write a script that'll simply count the occurrences of \r\n\r\n in a file. (Opening the sample file in vim binary mode shows me the ^M character in the proper places, and the newline is still read as a newline).
Anyway, I know there are tons of solutions, but they don't seem to get me what I want.
e.g. awk -e '/\r/,/\r/!d' or using $'\n' as part of the grep statement.
However, none of these seem to produce what I need. I can't find the \r\n\r\n pattern with grep's "trick", since that just expands one variable. The awk solution is greedy, and so gets me way more lines than I want/need.
Switching grep to binary/Perl/no-newline mode seems to be closer to what I want,
e.g. grep -UPzo '\x0D', but really what I want then is grep -UPzo '\x0D\x00\x0D\x00', which doesn't produce the output I want.
It seems like such a simple task.
By default, awk treats \n as the record separator. That makes it very hard to count \r\n\r\n. If we choose some other record separator, say a letter, then we can easily count the appearance of this combination. Thus:
awk '{n+=gsub("\r\n\r\n", "")} END{print n}' RS='a' file
Here, gsub returns the number of substitutions made. These are summed and, after the end of the file has been reached, we print the total number.
Example
Here, we use bash's $'...' construct to explicitly add carriage returns and newlines:
$ echo -n $'\r\n\r\n\r\n\r\na' | awk '{n+=gsub("\r\n\r\n", "")} END{print n}' RS='a'
2
Alternate solution (GNU awk)
We can tell it to treat \r\n\r\n as the record separator and then return the count (minus 1) of the number of records:
cat file <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
In awk, RS is the record separator and NR is the count of the number of records. Since we are using a multiple-character record separator, this requires GNU awk.
If the file ends with \r\n\r\n, the above would be off by one. To avoid that, the echo 1 statement is used to assure that there is always at least one character after the last \r\n\r\n in the file.
Examples
Here, we use bash's $'...' construct to explicitly add carriage returns and newlines:
$ echo -n $'abc\r\n\r\n' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
1
$ echo -n $'abc\r\n\r\ndef' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
1
$ echo -n $'\r\n\r\n\r\n\r\n' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
2
$ echo -n $'1\r\n\r\n2\r\n\r\n3' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
2

Print everything on line after match [duplicate]

This question already has answers here:
Get string after character [duplicate]
I have a large textfile that contains a unique string in the middle. What I want to do is to print everything AFTER the string by using grep.
cat textfile | grep "target_string"
This highlights target_string but prints the whole file
cat textfile | grep -o "target_string"
This prints only target_string
cat textfile | grep -o "target_string*"
This prints only target_string
How can I print everything after target_string and nothing before?
Strangely, the accepted answer printed out the whole line, where I just wanted all the info after the target string. This worked for me:
sed -n 's/target_string//p' filename
Adapted from this post
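Note that this substitution deletes only the matched string itself, so any text before it on the line is still printed; to drop everything up to and including the match, anchor it with .*:
sed -n 's/.*target_string//p' filename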
With GNU grep, try -B0 -A999999999 or similar. A better choice might be awk:
awk '/target_string/ {seen = 1}
seen {print}'
If (your problem specification is slightly unclear) you don't also need to print the matching line, sed is even shorter:
sed '1,/target_string/d'
You forgot the '.':
cat textfile | grep -o "target_string.*"
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /target_string(.*)/' textfile
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} else{if (/target_string(.*)/){print $1; $found++}}' textfile
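For example, given a hypothetical one-line input:
printf 'pre target_string post\n' | perl -lne 'print $1 if /target_string(.*)/'
This prints " post" (the capture starts immediately after the string, so the leading space is kept).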
