SED command to check whether a DATE is palindromic - bash

I have a file with dates in the format MM/DD/YYYY, called dates.txt:
02/02/2020
08/25/1998
03/02/2030
12/02/2021
06/19/1960
01/10/2010
03/07/2100
I need a single-line SED command to print just the palindromic dates. For example, 02/02/2020 is palindromic while 08/25/2020 is not. Expected output is:
02/02/2020
03/02/2030
12/02/2021
What I have done so far is remove the / from the date and reorder the fields. How can I check whether that output reads the same from the start and from the end?
sed -E "s|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3\2\1|" dates.txt
Here is what I get:
20200202
19982508
20300203
20210212
19601906
20101001
21000703

You can use backreferences in the pattern itself:
sed -n '/\([0-9]\)\([0-9]\)\/\([0-9]\)\([0-9]\)\/\4\3\2\1/p' dates.txt
With extended regex (-r in GNU sed; -E is the more portable spelling) and dots it looks nicer:
sed -rn '/(.)(.)\/(.)(.)\/\4\3\2\1/p' dates.txt
sed -rn '\#(.)(.)/(.)(.)/\4\3\2\1#p' dates.txt # means the same
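As a quick check, the backreference pattern can be run against the sample dates from the question. This is a minimal sketch: the dates are fed on stdin instead of from dates.txt, the variable names are only illustrative, and the ^ and $ anchors are an addition so that the whole line has to match.

```shell
# Sample dates from the question, one per line.
dates='02/02/2020
08/25/1998
03/02/2030
12/02/2021
06/19/1960
01/10/2010
03/07/2100'

# \1..\4 capture the month and day digits; the year must repeat them reversed.
palindromes=$(printf '%s\n' "$dates" |
  sed -n '/^\([0-9]\)\([0-9]\)\/\([0-9]\)\([0-9]\)\/\4\3\2\1$/p')

printf '%s\n' "$palindromes"
```

Only 02/02/2020, 03/02/2030 and 12/02/2021 survive, which matches the expected output in the question.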

You may delete any line that does not match the M1M2/D1D2/D2D1M2M1 pattern. To check that, capture each month and day digit separately:
sed -E '/^([0-9])([0-9])\/([0-9])([0-9])\/\4\3\2\1$/!d' file > outfile
Or, with GNU sed, edit the file in place:
sed -i -E '/^([0-9])([0-9])\/([0-9])([0-9])\/\4\3\2\1$/!d' file
The ^ stands for the start of the line and $ for the end of the line.
The !d at the end tells sed to delete ("drop") the lines that do not match this pattern.
Alternatively, for more complex cases, you may read the file line by line, reverse the digits of the month and day, and compare the result with the year part. You can perform more operations there if need be:
while IFS= read -r line; do
p1="$(sed -En 's,([0-9])([0-9])/([0-9])([0-9])/.*,\4\3\2\1,p' <<< "$line")";
p2="${line##*/}";
if [[ "$p1" == "$p2" ]]; then
echo "$line"
fi
done < file > outfile
The sed -En 's,([0-9])([0-9])/([0-9])([0-9])/.*,\4\3\2\1,p' part gets the first four digits and reverses their order. The "${line##*/}" uses parameter expansion to remove as many chars as possible from the start up to and including the last /.
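The parameter expansion used in the loop can be tried on a single value. A small sketch, assuming a date in MM/DD/YYYY form; the variable names are made up for illustration and are not part of the answer above.

```shell
line='12/02/2021'

year=${line##*/}       # drop the longest prefix ending in "/"    -> 2021
month_day=${line%/*}   # drop the shortest suffix starting at "/" -> 12/02
month=${line%%/*}      # drop the longest suffix starting at "/"  -> 12

printf '%s %s %s\n' "$year" "$month_day" "$month"
```

The doubled operators (## and %%) remove the longest match, the single ones (# and %) the shortest.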

Related

How to properly validate a part of the output of a command in BASH [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d' ' sets the delimiter to a quoted space character; something like -d" " would also have worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to drop everything matched so far (potato: and the space) from the output
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
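To see the sed one-liner in action without creating a file, the sample data can be piped in. This is only a sketch; the input and values variables are illustrative names.

```shell
# Sample data from the question.
input='potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789'

# -n suppresses automatic printing; the p flag prints only the lines
# on which the substitution actually happened.
values=$(printf '%s\n' "$input" | sed -n 's/^potato:[[:space:]]*//p')

printf '%s\n' "$values"
```

Because the pattern is anchored with ^, a line that merely contains potato: somewhere in the middle would not be printed.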
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of the pattern "*: " from the front of $line.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of the pattern ": *" from the end of $line.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is ":\ " because the space character must be escaped with the backslash.
You can find more like these at the linux documentation project.
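The difference between the longest and the shortest match only shows up when the separator occurs more than once. A minimal sketch, using a made-up line with two ": " separators (the line and the variable names are assumptions for illustration):

```shell
# Hypothetical line containing the separator twice.
line='potato: 1234: extra'

longest_front=${line##*:\ }    # -> extra
shortest_front=${line#*:\ }    # -> 1234: extra
longest_back=${line%%:\ *}     # -> potato
shortest_back=${line%:\ *}     # -> potato: 1234

printf '%s\n' "$longest_front" "$shortest_front" "$longest_back" "$shortest_back"
```

For the question's data each line has a single ": ", so both forms give the same result there.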
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done < file.txt
grep potato file | grep -o "[0-9].*"

Use sed to extract ascii hex string from a single line in a file

I have a file that looks like this:
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
Only one line in the file contains only hex characters (234324fc234ba253069). How do I extract that? I tried sed -ne 's/^\([a-f0-9]*\)$/\1/p' file. I used line start and line end (^ and $) as delimiters, but I am obviously missing something...
Grep does the job,
$ grep '^[a-f0-9]\+$' file
234324fc234ba253069
Through awk,
$ awk '/^[a-f0-9]+$/{print}' file
234324fc234ba253069
Based on the search pattern given, awk and grep print the matching line.
^ # start
[a-f0-9]\+ # hex characters without capital A-F one or more times
$ # End
sed can make it:
sed -n '/^[a-f0-9]*$/p' file
234324fc234ba253069
By the way, your command sed -ne 's/^\([a-f0-9]*\)$/\1/p' file works for me. Note also that it is not necessary to use \1 to print the match back. It is handy in many cases, but here it is overkill because you want to print the whole line. Just sed -n '/pattern/p' does the job, as I show above.
As there is just one match in the whole file, you may want to exit once it is found (thanks NeronLeVelu!):
sed -n '/^[a-f0-9]*$/{p;q}' file
Another approach is to let printf decide when the line is hexadecimal:
while read line
do
printf "%f\n" "0x"$line >/dev/null 2>&1 && echo "$line"
done < file
Based on Hexadecimal To Decimal in Shell Script, printf "%f" 0xNUMBER executes successfully if the number is indeed hexadecimal. Otherwise, it returns an error.
Hence, using printf ... >/dev/null 2>&1 && echo "$line" does not let printf print anything (redirects to /dev/null) but then prints the line if it was hexadecimal.
For your given file, it returns:
$ while read line; do printf "%f\n" "0x"$line >/dev/null 2>&1 && echo "$line"; done < a
234324fc234ba253069
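One caveat with the sed patterns above, offered as a side note rather than as part of the original answer: * also accepts zero characters, so an empty line would be printed too. A minimal sketch, assuming an input with one blank line added (the input and the variable names are made up for illustration):

```shell
# Hypothetical input: lines from the question plus one empty line.
input='some random

00ab46f891c2emore random
234324fc234ba253069'

# * allows zero hex characters, so the empty line matches as well.
with_star=$(printf '%s\n' "$input" | sed -n '/^[a-f0-9]*$/p')

# Requiring at least one hex character skips the empty line.
with_plus=$(printf '%s\n' "$input" | sed -n '/^[a-f0-9][a-f0-9]*$/p')

printf 'with *: [%s]\nwith +: [%s]\n' "$with_star" "$with_plus"
```

The [a-f0-9][a-f0-9]* form is the portable way to write "one or more" in a basic regular expression.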
Using egrep you can restrict your regex to select lines that only match valid hex characters i.e. [a-fA-F0-9]:
egrep '^[a-fA-F0-9]+$' file
234324fc234ba253069

How to filter text with several parentheses in bash?

I have a bash script that creates a text file and then manipulates it with sed commands. However, on occasion there is a line which contains multiple parentheses.
For example:
fileInfo: (2014) (b2b) (analog) (digital) (some-text)
This line could have as few as 1 set of (), but usually has at least 2. In the end I am only interested in extracting the last set of ():
fileInfo: (some-text)
I can get it to work if there is a set number of (), but not when it varies from each file.
Until I encountered a file that had more than 2 sets of () the following has worked:
if grep -q "textInfo: (.*) (.*)" "$TXT"; then
SG=`egrep textInfo "$TXT" | sed "s/.*) (//"| sed "s/)$//"`
else
SG=`egrep textInfo "$TXT" | sed "s/.* (//"| sed "s/)$//"`
fi
Try this gnu sed command,
sed -r 's/^([^ ]+)( )+.*\((.*)\)/\1\2(\3)/g' file
Example:
$ echo 'fileInfo: (2014) (b2b) (analog) (digital) (some-text)' | sed -r 's/^([^ ]+)( )+.*\((.*)\)/\1\2(\3)/g'
fileInfo: (some-text)
^([^ ]+) - Matches one or more non-space characters and stores them in the first group. (Matching stops at the first space.)
( )+ - Matches one or more spaces; the second group keeps the last space matched.
.*\( - Matches any characters up to a literal (. Because .* is greedy, sed matches up to the last ( when the line contains more than one.
(.*)\) - Captures the characters inside the last () into the third group.
\1\2(\3) - Finally, using backreferences, sed replaces the matched text with these captured groups.
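The greedy behaviour described above can be checked on lines with different numbers of groups. This is a sketch with a slightly simplified variant of the expression (two capture groups instead of three, -E instead of -r, and a trailing $ anchor); the last_group helper is a hypothetical name, not something from the answer.

```shell
# Hypothetical helper: keep the label and the last parenthesised group of a line.
last_group() {
  printf '%s\n' "$1" | sed -E 's/^([^ ]+) +.*\((.*)\)$/\1 (\2)/'
}

many=$(last_group 'fileInfo: (2014) (b2b) (analog) (digital) (some-text)')
single=$(last_group 'fileInfo: (only)')

printf '%s\n%s\n' "$many" "$single"
```

Because .* takes as much as it can, the literal \( that follows it always lands on the last opening parenthesis, however many groups precede it.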
Regular expressions can do this
I am not an expert in sed, but this should match the text in the last parentheses. You only need to add the other fixed text that you need. Note that it needs -E, since the unescaped grouping parentheses and + are extended-regex syntax:
sed -nE '/\(([^)]+)\)$/p'
Using BASH regex:
s='fileInfo: (2014) (b2b) (analog) (digital) (some-text)'
[[ "$s" =~ ^([^:]+:).*(\([^()]*\))[^()]*$ ]] && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
fileInfo: (some-text)
This might work for you (GNU sed):
sed 's/:.*(/: (/' file
Replace everything from the : to the last ( with ": (".
N.B. .* is greedy and always aims for the longest match.
Using sed:
$ sed -r 's/([^ ]+ +).*(\(.*)/\1\2/' file
fileInfo: (some-text)

How to ignore all lines before a match occurs in bash?

I would like to ignore all lines which occur before a match in bash (also ignoring the matched line). An example of input could be
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already sorted input I would like to get
R2-02.sql
R2-03.sql
Many ways possible. For example: assuming that your input is in list.txt
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because 0,/pattern/ works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This adds one dummy line at the start, so the real pattern can only be on line 2 or later, and 1,/pattern/ works: it deletes everything from line 1 (the dummy one) up to the pattern.
Or you can print lines after the pattern and delete the 1st, like:
sed -n '/pattern/,$p' < list.txt | sed '1d'
with awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
or, my favorite, use the following perl command:
perl -ne 'print unless 1../pattern/' < list.txt
which deletes only the first line when the pattern is on the first line.
another solution is reverse-delete-reverse
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works when the pattern is on the last line, but 1,/pattern/d doesn't when it is on the first.
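The first-line pitfall with 1,/pattern/d is easy to reproduce. A minimal sketch, assuming a shortened list in which the match sits on line 1; the awk flag variant is shown for comparison, and all variable names are illustrative.

```shell
# Shortened sample: the matching line is the first line.
list='R2-01.sql
R2-02.sql
R2-03.sql'

# sed starts looking for the end of the range on line 2, finds no match,
# and so deletes everything up to the end of the input.
range_result=$(printf '%s\n' "$list" | sed '1,/R2-01\.sql/d')

# A flag set after the match is seen does not have this problem.
awk_result=$(printf '%s\n' "$list" | awk 'found {print} /R2-01\.sql/ {found = 1}')

printf 'sed: [%s]\nawk: [%s]\n' "$range_result" "$awk_result"
```

In the awk program the print rule comes before the rule that sets the flag, so the matching line itself is never printed.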
How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 $match $file
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a too-small value the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A$(wc -l < "$file") "$match" "$file"
Of course at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can simply pipe this command into tail -n +2 to skip the first line of output.
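Dropping the matched line from the grep output could look like the sketch below. It assumes the sample list from the question fed on stdin and a pattern that matches only once; tail -n +2 starts printing at the second line. The variable names are illustrative.

```shell
# Sample list from the question.
list='R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql'
match='R2-01\.sql'

# Print the match plus a generous amount of trailing context,
# then skip the first output line (the match itself).
after=$(printf '%s\n' "$list" | grep -A 999999 -e "$match" | tail -n +2)

printf '%s\n' "$after"
```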
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
'
You can do it with this, but I think jomo666's answer is better.
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END
Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable a simple argument parser
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file
This can be accomplished with grep, by printing a large enough context following the $match. This example will output the first matching line followed by 999,999 lines of "context".
grep -A999999 $match $file
For added safety (in case $match begins with a hyphen, say) you should use -e to force $match to be used as an expression, and double-quote the variables so that they are still expanded:
grep -A999999 -e "$match" "$file"

Regexp in bash for number between "quotes"

Input:
hello world "22" bye world
I need a regex that will work in bash that can get me the numbers between the quotes. The regex should match 22.
Thanks!
Hmm have you tried \"([0-9]+)\" ?
In Bash >= 3.2:
while read -r line
do
[[ $line =~ .*\"([0-9]+)\".* ]]
echo "${BASH_REMATCH[1]}"
done < inputfile.txt
Same thing using sed so it's more portable:
while read -r line
do
result=$(printf '%s\n' "$line" | sed -n 's/.*"\([0-9][0-9]*\)".*/\1/p')
echo "$result"
done < inputfile.txt
Pure Bash, no Regex. Number is in array element 1.
IFS=\" # input field separator is a double quote now
while read -a line ; do
echo -e "${line[1]}"
done < "$infile"
Apart from the =~ operator inside [[ ]], bash itself has little regex support. Regexes are mostly used through external programs, amongst them grep and sed.
grep's main functionality is to filter lines that match a given regex, ie you give it some data to stdin or a file and it prints the lines that match the regex.
sed transforms data. It doesn't just return the matching lines; you can tell it what to return with the s/regex/replacement/ command. The replacement part can contain references to groups (\x where x is the number of the group). If you specify the -r option, groups are written as plain ( ) and + needs no backslash.
So what we need is sed. Your input contains some stuff (^.*), a ", some digits ([0-9]+), a ", and some stuff (.*$). We later need to reference the digits, so we need to make the digits a group. So our complete matching regex is: ^.*"([0-9]+)".*$. We want to replace that with only the digits, so the replacement part is just \1.
Building the complete sed command is left as an exercise to you :-)
(Note that sed does not transform lines that don't match. If your input is only the line you provided above, that's fine. If there are other lines you'd like to silently skip, you need to specify the option -n (no automatic printing) and add a p to the end of the sed expression, which instructs it to print the line. That way it only prints the matching line(s).)
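One possible solution to the exercise, as a sketch rather than the answer's own command: it uses -E (the portable spelling of -r) together with -n and the p flag, and a second made-up input line to show that non-matching lines are skipped. The variable names are illustrative.

```shell
# The question's input plus a hypothetical line without a quoted number.
input='hello world "22" bye world
no number here'

# Replace the whole line with the digits captured between the quotes,
# and print only the lines where that substitution happened.
number=$(printf '%s\n' "$input" | sed -n -E 's/^.*"([0-9]+)".*$/\1/p')

printf '%s\n' "$number"
```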

Resources