Use sed to extract an ASCII hex string from a single line in a file - bash

I have a file that looks like this:
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
Only one line in the file contains only hex characters (234324fc234ba253069). How do I extract it? I tried sed -ne 's/^\([a-f0-9]*\)$/\1/p' file, using line start and line end (^ and $) as delimiters, but I am obviously missing something...

Grep does the job,
$ grep '^[a-f0-9]\+$' file
234324fc234ba253069
Through awk,
$ awk '/^[a-f0-9]+$/{print}' file
234324fc234ba253069
Given this search pattern, both awk and grep print the matching line:
^ # start
[a-f0-9]\+ # hex characters without capital A-F one or more times
$ # End
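A minimal sketch of the same idea that also accepts uppercase digits, assuming a grep that supports POSIX character classes. The function name hex_lines is hypothetical, not part of the answer. With -x the pattern has to match the whole line, so ^ and $ can be left out:

```shell
# hex_lines [FILE...]: print every line that consists solely of hex digits.
# Reads stdin when no file is given. The name is an illustrative assumption.
hex_lines() {
    # -E: extended regex, so + needs no backslash
    # -x: the pattern must match the entire line
    grep -Ex '[[:xdigit:]]+' "$@"
}
```

Called as hex_lines file, it should print the same single line as the grep command above.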

sed can make it:
sed -n '/^[a-f0-9]*$/p' file
234324fc234ba253069
By the way, your command sed -ne 's/^\([a-f0-9]*\)$/\1/p' file works for me. Note also that you do not need \1 to print the match back. It is handy in many cases, but here it is overkill because you want to print the whole line. Just sed -n '/pattern/p' does the job, as shown above.
As there is just one match in the whole file, you may want to exit once it is found (thanks NeronLeVelu!):
sed -n '/^[a-f0-9]*$/{p;q}' file
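One thing to watch with the q variant: * also matches an empty line, so a blank line ahead of the hex line would be printed and end the search early. A hedged sketch that requires at least one hex digit, using the POSIX interval \{1,\}; the function name first_hex_line is an assumption made for illustration:

```shell
# first_hex_line [FILE]: print the first non-empty all-hex line, then stop reading.
# Reads stdin when no file is given.
first_hex_line() {
    # \{1,\} is the basic-regex spelling of "one or more"
    # p prints the matching line, q quits before the rest is read
    sed -n '/^[a-f0-9]\{1,\}$/{p;q;}' "$@"
}
```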
Another approach is to let printf decide when the line is hexadecimal:
while IFS= read -r line
do
printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"
done < file
Based on Hexadecimal To Decimal in Shell Script, printf "%f" 0xNUMBER executes successfully if the number is indeed hexadecimal. Otherwise, it returns an error.
Hence, using printf ... >/dev/null 2>&1 && echo "$line" does not let printf print anything (redirects to /dev/null) but then prints the line if it was hexadecimal.
For your given file, it returns:
$ while IFS= read -r line; do printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"; done < file
234324fc234ba253069
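One possible caveat with the printf test: as far as I know, the %f conversion also accepts hexadecimal floating-point forms (for example 1.8p3 once 0x is prepended), so it may let through lines that are not plain hex strings. A stricter pure-shell sketch, using only a case pattern; the function name hex_lines_sh is hypothetical:

```shell
# hex_lines_sh: read lines on stdin, print those made only of hex digits.
hex_lines_sh() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '' | *[!0-9a-fA-F]*) ;;        # empty, or contains a non-hex character
            *) printf '%s\n' "$line" ;;    # every character is a hex digit
        esac
    done
}
```

Usage would be hex_lines_sh < file.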

Using egrep (or its modern equivalent, grep -E) you can restrict your regex to select only lines made of valid hex characters in either case, i.e. [a-fA-F0-9]:
grep -E '^[a-fA-F0-9]+$' file
234324fc234ba253069

Related

How to properly validate a part of the output of a command in BASH [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d' ' - d = delimiter, ' ' = quoted space character; a backslash-escaped space after -d would also work) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
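The variants above match potato: anywhere in the line, so a line such as "sweet potato: 1" would also be picked up. A minimal awk sketch that compares the key exactly, assuming every line has the form "key: value"; the function name values_for is made up for this example:

```shell
# values_for KEY [FILE...]: print the value of every "KEY: value" line.
# Reads stdin when no file is given.
values_for() {
    key=$1
    shift
    # -F': ' splits each line at colon+space
    # $1 == k is an exact comparison against the whole key field
    awk -F': ' -v k="$key" '$1 == k { print $2 }' "$@"
}
```

For the sample file, values_for potato file.txt should print 1234 and 5432.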
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to drop everything matched so far from the reported match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of the pattern "*: " from the front of $line, leaving only the value:
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
Or, if you want the key rather than the value, %% tells bash to delete the longest match of the pattern ": *" from the end of $line:
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is written as ":\ "; the backslash makes sure the space is taken literally as part of the pattern.
You can find more like these at the Linux Documentation Project.
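To see how the four removal operators differ, here is a small sketch on a made-up string that contains the separator twice. Inside double quotes the space in the pattern needs no backslash:

```shell
line='potato: 12: 34'              # sample value, invented for this demo

front_short=${line#*: }            # shortest "*: " removed from the front
front_long=${line##*: }            # longest  "*: " removed from the front
back_short=${line%: *}             # shortest ": *" removed from the end
back_long=${line%%: *}             # longest  ": *" removed from the end

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
# prints, one per line: "12: 34", "34", "potato: 12", "potato"
```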
Modern Bash supports regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done < file.txt
grep potato file | grep -o "[0-9].*"

print lines where the third character is a digit

For example, our bash script is named masodik, and there is a text.txt with these lines:
qwer
qw2qw
12345
qwert432
Then I run ./masodik text.txt and should get:
qw2qw
12345
I tried it many ways and I don't know why this is not working:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
$ grep -E '^.{2}[0-9]' text.txt
qw2qw
12345
In a script, it could look something like this:
#!/bin/sh
grep -E '^.{2}[0-9]' "$1"
To print lines whose third character is a digit:
grep '^..[0-9]' text.txt
^ matches the start of the line. The dot . matches any character. [0-9] matches any digit.
You can do it with awk quite easily as well:
awk '/^..[0-9]/' file
Result
With your input in file:
$ awk '/^..[0-9]/' file
qw2qw
12345
(sed works as well, sed -n '/^..[0-9]/p' file)
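The same pattern generalizes to any character position. A hedged sketch, where the function name nth_char_digit and its argument order are assumptions made for illustration:

```shell
# nth_char_digit N [FILE...]: print lines whose N-th character is a digit.
# Reads stdin when no file is given.
nth_char_digit() {
    n=$1
    shift
    # exactly N-1 arbitrary characters from the line start, then one digit
    grep -E "^.{$((n - 1))}[0-9]" "$@"
}
```

With the sample input, nth_char_digit 3 text.txt should print qw2qw and 12345.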
The problem with the code here:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
...is that the for syntax is wrong:
read u is treated as a word list. So the $u variable is never set, so $u stays empty.
The for loop will run twice -- the 1st time $i will be set to the string "read", the 2nd time $i will be set to the string "u". Since neither string contains a number, the grep returns nothing.
The code never reads text.txt.
See Sasha Khapyorsky's answer for actual working code.
If for some odd reason all external utilities (grep, awk, etc.) are forbidden, this pure POSIX code would work:
#!/bin/sh
while read u ; do
case "$u" in
[a-zA-Z0-9][a-zA-Z0-9][0-9]*) echo "$u" ;;
esac
done < "$1"
If Perl is installed on the system, the shell script could look like this (-n makes Perl loop over the input lines):
#!/bin/bash
perl -ne 'print if /^.{2}\d/' text.txt

sed command to check whether a date is palindromic

I have a file with dates in the format MM/DD/YYYY, called dates.txt:
02/02/2020
08/25/1998
03/02/2030
12/02/2021
06/19/1960
01/10/2010
03/07/2100
I need a single-line sed command that prints only the palindromic dates. For example, 02/02/2020 is palindromic while 08/25/2020 is not. The expected output is:
02/02/2020
03/02/2030
12/02/2021
What I have done so far is remove the / from the date format. How do I check whether that output reads the same from the start and from the end?
sed -E "s|([0-9]{2})/([0-9]{2})/([0-9]{4})|\3\2\1|" dates.txt
Here is what I get:
20200202
19982508
20300203
20210212
19601906
20101001
21000703
You can backreference in the pattern match:
sed -n '/\([0-9]\)\([0-9]\)\/\([0-9]\)\([0-9]\)\/\4\3\2\1/p'
Using extended regex and dots looks just nice:
sed -rn '/(.)(.)\/(.)(.)\/\4\3\2\1/p'
sed -rn '\#(.)(.)/(.)(.)/\4\3\2\1#p' # means the same
You may delete any line that does not match the M1M2/D1D2/D2D1M2M1 pattern. To check that, match and capture each month and day digit separately:
sed -E '/^([0-9])([0-9])\/([0-9])([0-9])\/\4\3\2\1$/!d' file > outfile
Or, with GNU sed, editing the file in place:
sed -i -E '/^([0-9])([0-9])\/([0-9])([0-9])\/\4\3\2\1$/!d' file
The ^ matches the start of the line and $ matches the end of the line.
The !d at the end tells sed to delete the lines that do not match this pattern.
Alternatively, when you have more complex cases, you may read the file line by line, swap the digits in days and months and concatenate them, and compare the value with the year part. You may perform more operations there if need be:
while IFS= read -r line; do
p1="$(sed -En 's,([0-9])([0-9])/([0-9])([0-9])/.*,\4\3\2\1,p' <<< "$line")";
p2="${line##*/}";
if [[ "$p1" == "$p2" ]]; then
echo "$line"
fi
done < file > outfile
The sed -En 's,([0-9])([0-9])/([0-9])([0-9])/.*,\4\3\2\1,p' part captures the first four digits and prints them in reverse order. The "${line##*/}" uses parameter expansion to remove as many chars as possible from the start till the last / (including it).
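A more general check, sketched under the assumption that a date counts as palindromic when its digits, with the slashes removed, read the same in both directions. It does not depend on the MM/DD/YYYY layout. The function names is_palindrome and palindromic_dates are hypothetical:

```shell
# is_palindrome STRING: succeed if STRING reads the same in both directions.
is_palindrome() {
    s=$1
    reversed=
    rest=$s
    while [ -n "$rest" ]; do
        last=${rest#"${rest%?}"}   # last character of what remains
        reversed=$reversed$last
        rest=${rest%?}             # drop that last character
    done
    [ "$s" = "$reversed" ]
}

# palindromic_dates: read dates on stdin, print the palindromic ones.
palindromic_dates() {
    while IFS= read -r d || [ -n "$d" ]; do
        digits=$(printf '%s' "$d" | tr -d '/')   # strip the slashes
        if [ -n "$digits" ] && is_palindrome "$digits"; then
            printf '%s\n' "$d"
        fi
    done
}
```

Usage would be palindromic_dates < dates.txt. It spawns tr once per line, so for large files the single sed command is the better choice.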

Extract first word in colon separated text file

How do I iterate through a file and print only the first word of each line? The lines are colon separated, for example:
root:01:02:toor
The file contains several lines. This is what I've done so far, but it doesn't work:
FILE=$1
k=1
while read line; do
echo $1 | awk -F ':'
((k++))
done < $FILE
I'm not good with bash scripting at all, so this is probably very trivial for one of you.
Edit: the variable k is there to count the lines.
Use cut:
cut -d: -f1 filename
-d specifies the delimiter
-f specifies the field(s) to keep
If you need to count the lines, just
count=$( wc -l < filename )
-l tells wc to count lines
awk -F: '{print $1}' FILENAME
That will print the first word when separated by colon. Is this what you are looking for?
To use a loop, you can do something like this:
$ cat test.txt
root:hello:1
user:bye:2
test.sh
#!/bin/bash
while IFS=':' read -r line || [[ -n $line ]]; do
echo "$line" | awk -F: '{print $1}'
done < test.txt
Example of reading line by line in bash: Read a file line by line assigning the value to a variable
Result:
$ ./test.sh
root
user
A solution using perl
%> perl -F: -ane 'print "$F[0]\n";' [file(s)]
change the "\n" to " " if you don't want a new line printed.
You can get the first word without any external commands in bash like so:
printf '%s' "${line%%:*}"
This takes the variable named line and deletes the longest suffix that matches the glob :*, that is, everything from the first colon onward (the greedy match is what %% selects, as opposed to a single %).
Though with this solution you do need to do the loop yourself. If this is the only thing you want to do with the variable the cut solution is better so you don't have to do the file iteration yourself.
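If you do want the loop, for example to count lines as in the question, a minimal sketch is to let read do the splitting. The function name first_fields and the "lines:" summary format are assumptions for illustration:

```shell
# first_fields: print the first colon-separated field of each stdin line,
# followed by the number of lines read.
first_fields() {
    count=0
    # IFS=: makes read split at colons; the remainder of the line lands in rest
    while IFS=: read -r first rest || [ -n "$first" ]; do
        printf '%s\n' "$first"
        count=$((count + 1))
    done
    printf 'lines: %d\n' "$count"
}
```

Usage would be first_fields < filename.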
