Regexp in bash for number between "quotes" - bash

Input:
hello world "22" bye world
I need a regex that will work in bash that can get me the numbers between the quotes. The regex should match 22.
Thanks!

Hmm have you tried \"([0-9]+)\" ?

In Bash >= 3.2:
while read -r line
do
if [[ $line =~ .*\"([0-9]+)\".* ]]; then # print only when the line actually matched, so a stale BASH_REMATCH is never echoed
echo "${BASH_REMATCH[1]}"
fi
done < inputfile.txt
Same thing using sed so it's more portable:
while read -r line
do
result=$(printf '%s\n' "$line" | sed -n 's/.*"\([0-9]\{1,\}\)".*/\1/p') # feed the current line to sed; \{1,\} instead of \+ for POSIX portability
echo "$result"
done < inputfile.txt

Pure Bash, no Regex. Number is in array element 1.
IFS=\" # input field separator is a double quote now
while read -r -a line ; do
echo "${line[1]}" # element 1 is the text between the first pair of quotes
done < "$infile"

Bash itself has only limited built-in regex support (the [[ =~ ]] operator used above); more often you reach for external programs that use regexes, among them grep and sed.
grep's main job is to filter lines that match a given regex, i.e. you give it some data on stdin or in a file and it prints the lines that match the regex.
sed transforms data. It doesn't just return the matching lines; you tell it what to return with the s/regex/replacement/ command. The replacement part can contain references to capture groups (\x, where x is the number of the group). With the -r option you write groups as (...) in the regex; with basic regex syntax they are written \(...\).
So what we need is sed. Your input contains some stuff (^.*), a ", some digits ([0-9]+), a ", and some stuff (.*$). We later need to reference the digits, so we need to make the digits a group. So our complete matching regex is: ^.*"([0-9]+)".*$. We want to replace that with only the digits, so the replacement part is just \1.
Building the complete sed command is left as an exercise to you :-)
(Note that sed does not transform lines that don't match; it prints them unchanged. If your input is only the line you provided above, that's fine. If there are other lines you'd like to silently skip, you need to specify the option -n (no automatic printing) and add a p flag to the end of the s command, which instructs it to print the line when a substitution was made. That way it only prints the matching line(s).)
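For reference, one way the finished command could look (a sketch only; inputfile.txt is a placeholder name, and -r is the extended-regex option mentioned above):
sed -rn 's/^.*"([0-9]+)".*$/\1/p' inputfile.txt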

Related

SED command to check DATE is palindromic

I have a file with dates in the format MM/DD/YYYY, called dates.txt:
02/02/2020
08/25/1998
03/02/2030
12/02/2021
06/19/1960
01/10/2010
03/07/2100
I need a single-line sed command to print just the palindromic dates. For example 02/02/2020 is palindromic while 08/25/2020 is not. The expected output is:
02/02/2020
03/02/2030
12/02/2021
What I did till now is remove the / characters from the date format. How do I check whether that output reads the same from the start and from the end?
sed -E "s|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3\2\1|" dates.txt
Here is what I get:
20200202
19982508
20300203
20210212
19601906
20101001
21000703
You can backreference in the pattern match:
sed -n '/\([0-9]\)\([0-9]\)\/\([0-9]\)\([0-9]\)\/\4\3\2\1/p'
Using extended regex and plain dots looks neater:
sed -rn '/(.)(.)\/(.)(.)\/\4\3\2\1/p'
sed -rn '\#(.)(.)/(.)(.)/\4\3\2\1#p' # means the same
You may delete any line that does not match the M1M2/d1d2/d2d1M2M1 pattern. To check that, match and capture each month and day digit separately:
sed -E '/^([0-9])([0-9])\/([0-9])([0-9])\/\4\3\2\1$/!d' file > outfile
Or, with GNU sed:
sed -i -E '/^([0-9])([0-9])\/([0-9])([0-9])\/\4\3\2\1$/!d' file
The ^ stands for start of string position and $ means the end of string.
The !d at the end tells sed to "drop" the lines that do not follow this pattern.
Alternatively, when you have more complex cases, you may read the file line by line, swap the digits in days and months and concatenate them, and compare the value with the year part. You may perform more operations there if need be:
while IFS= read -r line; do
p1="$(sed -En 's,([0-9])([0-9])/([0-9])([0-9])/.*,\4\3\2\1,p' <<< "$line")";
p2="${line##*/}";
if [[ "$p1" == "$p2" ]]; then
echo "$line"
fi
done < file > outfile
The sed -En 's,([0-9])([0-9])/([0-9])([0-9])/.*,\4\3\2\1,p' part gets the first four digits and reorders them. The "${line##*/}" uses parameter expansion to remove as many chars as possible from the start up to the last / (including it).
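As a quick illustration of that parameter expansion (the sample date here is made up):
line="02/02/2020"
echo "${line##*/}"   # -> 2020  (strip everything up to and including the last /)
echo "${line%%/*}"   # -> 02    (the mirror form, stripping from the first / to the end)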

Text processing in bash - extracting information between multiple HTML tags and outputting it into CSV format [duplicate]

I can't figure out how to tell sed to let the dot match a newline:
echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'
I expect to get:
one
three
instead I get original:
one
two
three
sed is a line-based tool. I don't think there is such an option.
You can use h/H (hold) and g/G (get).
$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
Maybe you should try vim
:%s/one\_.*two/one/g
If you use GNU sed, you may match any character, including line break chars, with a mere .; as the documentation puts it, . matches any character, including newline.
All you need is the -z option (it splits the input on NUL bytes instead of newlines, so the whole text ends up in one pattern space):
echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
# three
However, one.*two might not be what you need since * is always greedy in POSIX regex patterns. So, one.*two will match the leftmost one, then any 0 or more chars as many as possible, and then the rightmost two. If you need to remove one, then any 0+ chars as few as possible, and then the leftmost two, you will have to use perl:
perl -i -0 -pe 's/one.*?two//sg' file # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file
The -0 option sets the record separator to the NUL character, which effectively slurps a file containing no NUL bytes so it is processed as a whole rather than line by line; -i enables in-place file modification; the s modifier makes . match any char including line break chars; and .*? matches any 0 or more chars as few as possible thanks to the non-greedy *?. The -CSD -Mutf8 part makes sure your input is decoded and the output re-encoded correctly.
You can use python (Python 2 here) this way:
$ echo -e "one\ntwo\nthree" | python -c 'import re, sys; s=sys.stdin.read(); s=re.sub("(?s)one.*two", "one", s); print s,'
one
three
$
This reads python's entire standard input (sys.stdin.read()), then substitutes "one" for "one.*two" with the dot-matches-all setting enabled (using (?s) at the start of the regular expression), and then prints the modified string (the trailing comma in print is used to prevent print from adding an extra newline).
This might work for you:
<<<$'one\ntwo\nthree' sed '/two/d'
or
<<<$'one\ntwo\nthree' sed '2d'
or
<<<$'one\ntwo\nthree' sed 'n;d'
or
<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'
Sed does match all characters (including \n) with a dot ., but usually the trailing \n has already been stripped off as part of the cycle, so the newline is no longer present in the pattern space to be matched.
Only certain commands (N, H and G) introduce newlines into the pattern/hold space.
N appends a newline to the pattern space and then appends the next line.
H does exactly the same except it acts on the hold space.
G appends a newline to the pattern space and then appends whatever is in the hold space too.
The hold space is empty until you place something in it so:
sed G file
will insert an empty line after each line.
sed 'G;G' file
will insert 2 empty lines etc etc.
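As a small sketch of how this helps with the original question (assuming GNU sed, where N at the last line simply prints the pattern space and exits): once N has appended the second line, the embedded \n is an ordinary character in the pattern space, so . can match it:
printf 'one\ntwo\nthree\n' | sed 'N;s/one.two/one/'
# => one
# three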
How about two sed calls:
(get rid of the 'two' first, then get rid of the blank line)
$ echo -e 'one\ntwo\nthree' | sed 's/two//' | sed '/^$/d'
one
three
Actually, I prefer Perl for one-liners over Python:
$ echo -e 'one\ntwo\nthree' | perl -pe 's/two\n//'
one
three
The discussion below is based on GNU sed.
sed operates in a line-by-line manner, so it's not possible to simply tell it that dot should match a newline. However, there are some tricks that can achieve this. You can use a loop structure (of sorts) to put all the text into the pattern space, and then do the operation.
To put everything in the pattern space, use:
:a;N;$!ba;
To make "dot match newline" indirectly, you use:
(\n|.)
So the result is:
root@u1804:~# echo -e "one\ntwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
Note that in this case, (\n|.) matches newlines as well as all other characters. See the example below:
root@u1804:~# echo -e "oneXXXXXX\nXXXXXXtwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#

Bash sed replace with exact match of a text in a file

I have a file pattern.txt which is composed of one very long line of complicated code (~8200 chars).
This code can be found in multiple files inside multiple directories.
I can easily identify a list of these files using
grep -rli 'uniquepartofthecode' *
My problem is: how do I do the replacement in those files, matching the exact text from pattern.txt?
I tried to do:
var=$(cat pattern.txt)
sed -i "s/$var//g" targetfile.txt
but I got the following error :
sed: -e expression #1, char 96: unknown option to `s'
sed is interpreting the content of $var as a regular expression; I would like it to just match the exact text.
The content of pattern.txt could be more or less any combination of characters, so I'm afraid I cannot escape every character reliably.
Is there a solution using sed ? Or should I use another tool for that ?
EDIT:
I tried using this solution to make a proper regex pattern from my text file.
Is it possible to escape regex metacharacters reliably with sed
the overall process is:
var=$(cat pattern.txt)
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$var")
sed -n "s/$searchEscaped/foo/p" <<<"$var" # if ok, echoes 'foo'
This last command displays "foo". $searchEscaped seems to be properly escaped.
However, this returns nothing (it should display the matching line with the pattern replaced by foo):
sed -n "s/$searchEscaped/foo/p" targetfile.txt
I think that the best solution is to not use regular expressions at all and resort to string replacement.
One way to do this is using perl:
$ echo "$string_to_replace"
some other stuff abc$^%!# some more
$ echo "$search"
abc$^%!#
$ perl -spe '$n = 0; $len = length $search;           # reset the search offset for every input line
    while (($pos = index($_, $search, $n)) > -1) {    # find the next literal occurrence
        substr($_, $pos, $len) = "replacement";       # splice in the replacement text
        $n = $pos + length "replacement";             # continue searching right after it
    }' <<<"$string_to_replace" -- -search="$search"
some other stuff replacement some more
The -p switch tells perl to loop through each line of the variable $string_to_replace (which could easily be replaced by a file). -s allows options to be passed to the script - in this case, I've passed a shell variable containing the search string.
For each line of the file, the while loop runs through all of the matches of the search string. substr is used on the left-hand side of the assignment to replace a substring of $_, which refers to the current line being processed.
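If the text in pattern.txt really is a single line, as in the question, another sketch that avoids manual escaping is perl's \Q...\E (quotemeta); SEARCH, REPLACE and targetfile.txt are placeholder names here:
SEARCH="$(cat pattern.txt)" REPLACE="replacement text" \
  perl -i -pe 's/\Q$ENV{SEARCH}\E/$ENV{REPLACE}/g' targetfile.txt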

How can I read words (instead of lines) from a file?

I've read this question about how to read n characters from a text file using bash. I would like to know how to read a word at a time from a file that looks like:
example text
example1 text1
example2 text2
example3 text3
Can anyone explain that to me, or show me an easy example?
Thanks!
The read command by default reads whole lines. So the solution is probably to read the whole line and then split it on whitespace with e.g. for:
#!/bin/sh
while read -r line; do
for word in $line; do
echo "word = '$word'"
done
done <"myfile.txt"
The way to do this with standard input is by passing the -a flag to read:
read -a words
echo "${words[#]}"
This will read your entire line into an indexed array variable, in this case named words. You can then perform any array operations you like on words with shell parameter expansions.
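For instance, a quick illustration with made-up words:
read -a words <<< "example text more"
echo "${#words[@]}"   # number of words: 3
echo "${words[1]}"    # second word: text
echo "${words[@]^^}"  # all words upper-cased (bash 4+)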
For file-oriented operations, current versions of Bash also support the mapfile built-in. For example:
mapfile < /etc/passwd
echo ${MAPFILE[0]}
Either way, arrays are the way to go. It's worth your time to familiarize yourself with Bash array syntax to make the most of this feature.
Ordinarily, you should read from a file using a while read -r line loop. To do this and parse the words on the lines requires nesting a for loop inside the while loop.
Here is a technique that works without requiring nested loops:
for word in $(<inputfile)
do
echo "$word"
done
In the context given, where the number of words is known:
while read -r word1 word2 _; do
echo "Read a line with word1 of $word1 and word2 of $word2"
done
If you want to read each line into an array, read -a will put the first word into element 0 of your array, the second into element 1, etc:
while read -r -a words; do
echo "First word is ${words[0]}; second word is ${words[1]}"
declare -p words # print the whole array
done
In bash, just use space as delimiter (read -d ' '). This method requires some preprocessing to translate newlines into spaces (using tr) and to merge several spaces into a single one (using sed):
{
tr '\n' ' ' | sed 's/  */ /g' | while read -d ' ' WORD
do
echo -n "<${WORD}> "
done
echo
} << EOF
Here you have some words, including * wildcards
that don't get expanded,
multiple spaces between words,
and lines with spaces at the beginning.
EOF
The main advantage of this method is that you don't need to worry about the array syntax and just work as with a for loop, but without wildcard expansion.
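A quick way to see the wildcard point (a sketch; the * would be glob-expanded by a plain for loop but stays literal with read):
line='count * stars'
for w in $line; do echo "$w"; done                              # * expands to file names here
printf '%s ' "$line" | while read -d ' ' w; do echo "$w"; done  # * is printed as-is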
I came across this question and the proposed answers, but I don't see this simple possible solution listed:
for word in `cat inputfile`
do
echo $word
done
This can be done using AWK too:
awk '{for(i=1;i<=NF;i++) {print $i}}' text_file
You can combine xargs, which reads words delimited by spaces or newlines, with echo to print one per line:
<some-file xargs -n1 echo
some-command | xargs -n1 echo
That also works well for large or slow streams of data because it does not need to read the whole input at once.
I’ve used this to read 1 table name at a time from SQLite which prints table names in a column layout:
sqlite3 db.sqlite .tables | xargs -n1 echo | while read table; do echo "1 table: $table"; done

How to ignore all lines before a match occurs in bash?

I would like to ignore all lines which occur before a match in bash (also ignoring the matched line). An example of the input could be:
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already sorted input I would like to get
R2-02.sql
R2-03.sql
Many ways are possible. For example, assuming that your input is in list.txt:
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because the 0,/pattern/ form works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This prepends one dummy line, so the real pattern can only appear on line 2 or later, and the 1,/pattern/ range then works as intended, deleting everything from line 1 (the dummy one) up to and including the pattern.
Or you can print lines after the pattern and delete the 1st, like:
sed -n '/pattern/,$p' < list.txt | sed '1d'
with awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
or, my favorite, the following perl command:
perl -ne 'print unless 1../pattern/' < list.txt
(when the pattern is on the 1st line, only that line is deleted)
Another solution is reverse-delete-reverse:
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works when the pattern is on the last line, but 1,/pattern/d doesn't when it is on the first.
How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 $match $file
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a too-small value the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A$(wc -l < $file) $match $file
Of course at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can simply pipe this command into tail -n +2 to skip the first line of output.
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
'
You can do it with this, but I think jomo666's answer is better:
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END
Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable a simple argument parser
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file
This can be accomplished with grep, by printing a large enough context following the $match. This example will output the first matching line followed by 999,999 lines of "context".
grep -A999999 $match $file
For added safety (in case the $match begins with a hyphen, say) you should use -e to force $match to be used as an expression.
grep -A999999 -e "$match" $file
