Delete lines where 3rd character equals a number - bash

I have a consistent file with numbers like
0123456
0234566
.
.
.
etc
With bash tools, command line preferable, how can I remove each line if the third digit equals 2 .
eg, with cut -c3 I can get the correct digit but I cannot combine it effectively with sed or something similar. I am not looking for a pattern, only the 3rd digit.
(I have done it in a script in python but I was wondering how its done through a one-line bash command). Thank you!
EDIT: Additionally, if I want to delete the lines where the third digit NOT equals to 2 (opposite question)

You can just do this with sed
sed -i '/^..2/d' file
If you want to do the opposite you can do:
sed -i '/^..[^2]/d' file
since you are dealing with a specific character.

I would use awk:
$ awk -F "" '$3!=2' file
0234566
by setting the field separator to "" (empty, just valid on GNU-awk), every character is stored in a different field. Then, saying $3 != 2 checks if the 3rd character is not 2 and, if so, the line is printed.
Or with pure bash, using Using shell parameter expansion ${parameter:offset:length}:
while IFS= read -r line
do
[ "${line:2:1}" != "2" ] && echo "$line"
done < file

Related

Grep lines between two patterns, one unique and one repeated

I have a text file which looks like this
1
bbbbb
aaa
END
2
ttttt
mmmm
uu
END
3
....
END
The number of lines between the single number patterns (1,2,3) and END is variable. So the upper delimiting pattern changes, but the final one does not. Using some bash commands, I would like to grep lines between a specified upper partner and the corresponding END, for example a command that takes as input 2 and returns
2
ttttt
mmmm
uu
END
I've tried various solutions with sed and awk, but still can't figure it out. The main problem is that I may need to grep a entry in the middle of the file, so I can't use sed with /pattern/q...Any help will be greatly appreciated!
With awk we set a flag f when matching the start pattern, which is an input argument. After that row, the flag is on and it prints every line. When reaching "END" (AND the flag is on!) it exits.
awk -v p=2 '$0~p{f=1} f{print} f&&/END/{exit}' file
Use sed and its addresses to only print a part of the file between the patterns:
#!/bin/bash
start=x
while [[ $start = *[^0-9]* ]] ; do
read -p 'Enter the start pattern: ' start
done
sed -n "/^$start$/,/^END$/p" file
You can use the sed with an address range. Modify the first regular expression (RE1) in /RE1/,/RE2/ as your convenience:
sed -n '/^[[:space:]]*2$/,/^[[:space:]]*END$/p' file
Or,
sed '
/^[[:space:]]*2$/,/^[[:space:]]*END$/!d
/^[[:space:]]*END$/q
' file
This quits upon reading the END, thus may be more efficient.
Another option/solution using just bash
#!/usr/bin/env bash
start=$1
while IFS= read -r lines; do
if [[ ${lines##* } == $start ]]; then
print=on
elif [[ ${lines##* } == [0-9] ]]; then
print=off
fi
case $print in on) printf '%s\n' "$lines";; esac
done < file.txt
Run the script with the number as the argument, 1 can 2 or 3 or ...
./myscript 1
This might work for you (GNU sed):
sed -n '/^\s*2$/{:a;N;/^\s*END$/M!ba;p;q}' file
Switch off implicit printing by setting the -n option.
Gather up the lines beginning with a line starting with 2 and ending in a line starting with END, print the collection and quit.
N.B. The second regexp uses the M flag, which allows the ^ and $ to match start and end of lines when multiple lines are being matched. Another thing to bear in mind is that using a range i.e. sed -n '/start/,/end/p' file, will start printing lines the moment the first condition is met and if the second match does not materialise, it will continue printing to the end of the file.

Looping and grep writes output for the last line only

I am looping through the lines in a text file. And performing grep on each lines through directories. like below
while IFS="" read -r p || [ -n "$p" ]
do
echo "This is the field: $p"
grep -ilr $p * >> Result.txt
done < fields.txt
But the above writes the results for the last line in the file. And not for the other lines.
If i manually execute the command with the other lines, it works (which mean the match were found). Anything that i am missing here? Thanks
The fields.txt looks like this
annual_of_measure__c
attached_lobs__c
apple
When the file fields.txt
has DOS/Windows lineending convention consisting of two character (Carriage-Return AND Linefeed) and
that file is processed by Unix-Tools expecting Unix lineendings consisting of only one character (Linefeed)
then the line read by the read command and stored in the variable $p is in the first line annual_of_measure__c\r (note the additional \r for the Carriage-Return). Then grep will not find a match.
From your description in the question and the confirmation in the comments, it seems that the last line in fields.txt has no lineending at all, so the variable $p is the ordinary string apple and grep can find a match on the last line of the file.
There are tools for converting lineendings, e.g. see this answer or even more options in this answer.

Delete everything before a pattern

I'm trying to clean a text file.
I want to delete everything start before the first 12 numbers.
1:0:135103079189:0:0:2:0::135103079189:000011:00
A:908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Output desired:
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Here's my command but seems not working.
sed '/:\([0-9]\{12\}\)/d' t.txt
the d command in sed will delete entire line on matching the given regex, you need to use s command to search and replace only part of line... however, for given problem, sed is not suitable as it doesn't support non-greedy regex
you can use perl instead
$ perl -pe's/^.*?(?=\d{12}:)//' ip.txt
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
.*? match zero or more characters as minimally as possible
(?=\d{12}:) only if it is followed by 12-digits ending with :
use perl -i -pe for in-place editing
some possible corner cases
$ # this is matching part of field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe's/^.*?(?=\d{12}:)//'
135103079189:23:603307102606:1
$ # this is not matching 12-digit field at end of line
$ echo 'foo:123:135103079189' | perl -pe's/^.*?(?=\d{12}:)//'
foo:123:135103079189
$ # so, add start/end of line matching cases and restrict 12-digits to whole field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe 's/^(?:.*?:)?(?=\d{12}(:|$))//'
603307102606:1
$ echo 'foo:123:135103079189' | perl -pe's/^(?:.*?:)?(?=\d{12}(:|$))//'
135103079189
Could you please try following.
awk --re-interval 'match($0,/[0-9]{12}/){print substr($0,RSTART)}' Input_file
Since I have OLD version of awk so I am using --re-interval you could remove it in case you have new version of it.
This might work for you (GNU sed):
sed -n 's/[0-9]\{12\}/\n&/;s/.*\n//p' file
We only want to print specific lines so use the -n option to turn off automatic printing. If a line contains a 12 digit number, insert a newline before it. Remove any characters before and including a newline and print the result.
If you want to print lines that do not contain a 12 digit number as is, use:
sed 's/[0-9]\{12\}/\n&/;s/.*\n//' file
The crux of the problem is to identify the start of a multi-character string, insert a unique marker and delete all characters before and including the unique marker. As sed uses the newline to delimit lines, only the user can introduce newlines into the pattern space and as a result, newlines will always be unique.
Taking the nice answer from #Sundeep, in case you would like to use grep or pcregrep (macOS/BSD) you could give a try to:
$ grep -oP '^(?:.*?:)?(?=\d{12})\K.*' file
or
$ pcregrep -o '^(?:.*?:)?(?=\d{12})\K.*' file
The \K will ignore everything after the pattern
Alternative thoughts - I almost think your data is too dirty for a quick sed fix but if generally it's all similar to your sample set of data then certainly pick one of the answers with sed etc. However if you wanted to be more particular about it you could build up a set of commands to ensure the values. I like doing this for debugging and when speed isn't urgent.
Take this tiny sample of code, you could do this other ways but I'm getting the value for each part of the string and I know the order because it contiguous. You could then set up controls on which parts to keep and such as it builds out say a new string per line. Overwrought for sure, but sometimes that is a better long term approach.
#!/bin/bash
while IFS= read -r line ;do
IFS=':' read -r -a array <<< "$line"
for ((i=0; i<${#array[#]}; i++)) ;do
echo "part : ${array[$i]}"
done
done < "test_data.txt"
You could then build the data back up how you wanted and more easily understand what's happening every step of the way ..
part : 1
part : 0
part : 135103079189
part : 0
part : 0
part : 2
part : 0
part :
part : 135103079189
part : 000011
part : 00
part : A
part : 908529896240
part : 0

Output lines of a file, specified by line number, with only bash (no sed, grep, or awk)

Without using tools such as sed, grep, or awk, only standard shell, I need to retrieve line numbers of lines containing a pattern then, then for each line number retrieved, output lines:
[(line_retrieved) + 1 - (line_retrieved % 6)] to [line_retrieved + 6]
while skipping duplicate blocks of lines.
I was able to output lines of a file, specified by line number with sed "${START_LINE}, ${END_LINE}" file. I haven't found how to retrieve line numbers of lines containing a pattern yet.
Here you go.
Assuming your pattern is "giga.*" (for example if you want to catch "gigaaks") and the file is /tmp/1.txt then you'd do something like:
you would do:
pattern="giga.*"; linenum=1; while read line; do for word in `echo $line`; do if [[ "$word" =~ "$string_to_search" ]]; then echo $linenum:$line; fi; done; ((linenum++)); done < /tmp/1.txt | sort -u
Overview:
1) I set my pattern in pattern variable. Value can be exact string or an actual pattern.
2) You read the file line by line in outer loop and read that whole line into a variable $line.
3) Then you read the whole line to read all the words in that line, into a variable $word.
4) Then you use IF statement with =~ SHELL level direct regex match (using =~ operator)
5) If 4th statement is true, then you echo the $linenum (containing the line#) and the whole $line line.
6) You increment $linenum per every outer loop run (i.e. for each line).
7) sort -u will be used to print the line only ONCE if pattern/string occurs more than one time in a line's word(s).
It'll spit all the patterns with the full line, prefixed with the LINE# of each line from the file.
NOTE: if you want to do grep -i (ignore case), then you may try the actual grep -i or similar capability.

'grep +A': print everything after a match [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3 http://www.yahoo.com
But I don't know how to get the file after line number 3. Also, I know there is a flag in grep -A print the lines after your match. However, you need to specify how many lines you want after the match. I am wondering is there something to get around that issue. Like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, if you refer to a variable, that variable will be created and is either zero or empty string, depending on context. Before encounter yahoo, variable y is 0, so the script does not print anything. After encounter yahoo, y is 1, so every line after that will be printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the -e directive means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get index of the "yahoo" word
index=`grep -n "yahoo" filepath | cut -d':' -f1`
# Get the total number of lines in the file
totallines=`wc -l filepath | cut -d' ' -f1`
# Subtract totallines with index
result=`expr $total - $index`
# Gives the desired output
grep -A $result "yahoo" filepath

Resources