Sed: get lines beginning with some prefix - bash

I have a BlackBerry .jad file:
RIM-COD-URL-12: HelloWorld-12.cod
RIM-COD-Size: 68020
RIM-MIDlet-Icon-2-1: ____HOVER_ICON_res/icon/blackberry/icon-68.png,focused
RIM-COD-URL-11: HelloWorld-11.cod
RIM-MIDlet-Icon-Count-2: 1
RIM-COD-URL-10: HelloWorld-10.cod
RIM-MIDlet-Icon-Count-1: 1
MIDlet-Vendor: Vasia Pupkin
RIM-MIDlet-Icon-1-1: res\icon\blackberry\____HOVER_ICON_icon-68.png,focused
Manifest-Version: 1.0
RIM-MIDlet-Flags-1: 0
RIM-COD-SHA1-38: 9a c8 b3 35 72 de 34 5e 7a 0a 5b 9e c3 3a 65 4c 20 0f 8e 50
I just want to get the lines that begin with RIM-COD-.
Can you provide a solution using awk or sed?

Use sed -n and only print lines that match RIM-COD.
sed -n -e '/^RIM-COD-/p' yourfile.txt

Try this:
awk '/^RIM-COD/' file.txt
Or
grep "^RIM-COD" file.txt
Or
sed -n '/^RIM-COD/p' file.txt
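As a quick check, the three commands can be compared on a few lines taken from the question. This is only a sketch: the temporary file stands in for the real .jad file, and the variable names are made up for the illustration.

```shell
# Sample lines copied from the question; the temp file stands in for the real .jad file.
jad=$(mktemp)
cat > "$jad" <<'EOF'
RIM-COD-URL-12: HelloWorld-12.cod
RIM-COD-Size: 68020
RIM-MIDlet-Icon-Count-2: 1
MIDlet-Vendor: Vasia Pupkin
Manifest-Version: 1.0
RIM-COD-URL-11: HelloWorld-11.cod
EOF

# All three tools anchor the pattern at the start of the line with ^.
sed_out=$(sed -n '/^RIM-COD-/p' "$jad")
awk_out=$(awk '/^RIM-COD-/' "$jad")
grep_out=$(grep '^RIM-COD-' "$jad")

printf '%s\n' "$sed_out"
rm -f "$jad"
```

Without the ^ anchor, a line that merely contains RIM-COD- somewhere in the middle would be printed as well.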

Related

how to grep a hex data area

I have a hex file, I need to extract a range of it to a text file
From range:
To Range:
I need Output: AC:E4:B5:9A:53:1C
I tried many variations, but none of them meets the requirements. The output is: Binary file filehex matches
grep "["'\x9f\x87\x6f\x11'"-"'\x9f\x87\x70\x11'"]" filehex > test.txt
I hope someone can help me.
Use -a to force the text interpretation of the input.
Use -o to only output the matching part.
The expression you used doesn't make much sense. It matches any characters in the set \x9f, \x87, \x6f, and then the range \x11-\x9f, etc.
You are rather interested in something that starts with \x9f\x87\x6f\x11 and ends in \x9f\x87\x70\x11, with anything in between.
You can use cut to remove the leading and trailing 4 bytes.
grep -oa $'\x9f\x87\x6f\x11.*\x9f\x87\x70\x11' hexfile | cut -b5-21
If you know the length of the string will always be 17 bytes, you can use .\{17\} instead of .*.
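A minimal sketch of the fixed-length variant, run on a made-up sample rather than the real file. The delimiters are built with octal escapes so the snippet does not depend on a printf that understands \x, and LC_ALL=C is set on the assumption that grep and cut should treat every byte as a single character.

```shell
# Delimiters as raw bytes: 0x9f 0x87 0x6f 0x11 and 0x9f 0x87 0x70 0x11 (octal escapes).
start=$(printf '\237\207\157\021')
end=$(printf '\237\207\160\021')

# Hypothetical sample input: the wanted text wrapped in the two delimiters.
mac=$(printf 'BEFORE%sAC:E4:B5:9A:53:1C%sAFTER\n' "$start" "$end" |
  LC_ALL=C grep -oa "$start.\{17\}$end" |
  LC_ALL=C cut -b5-21)

echo "$mac"
```

The match is 25 bytes long (4 + 17 + 4), so bytes 5 to 21 are exactly the 17 bytes between the delimiters.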
OK, I've built a random binary $file
with your string placed at a location where the hd command splits it across two lines.
Note: regarding k314159's comment, I use hd to produce hexdump output similar to CentOS's hexdump tool.
One shot using sed:
hd $file |sed -e 'N;/ 9f \+\(|.*\n[0-9a-f]\+ \+\|\)87 \+\(|.*\n[0-9a-f]\+ \+\|\)6f \+\(|.*\n[0-9a-f]\+ \+\|\)11 /p;D;'
000161c0 96 7a b2 21 28 f1 b3 32 63 43 93 ff 50 a6 9f 87 |.z.!(..2cC..P...|
000161d0 6f 11 0d 7a a5 a9 81 9e 32 9d fb 71 27 6d 60 f2 |o..z....2..q'm`.|
0002c3a0
Explanation:
N appends the next line to the current buffer.
\(|.*\n[0-9a-f]\+ \+\|\) matches either a | followed by anything and a newline (\n), then immediately a hexadecimal number and spaces, or nothing.
p prints the current buffer (two lines).
D deletes up to the first newline in the current buffer and keeps the last line for the next sed loop.
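The same N / p / D sliding window is easier to follow on plain text. A small sketch with made-up input: it prints the two-line window only when the match spans the line break. Here -n is added so that nothing but the explicit p produces output.

```shell
# Slide a two-line window over the input; print it only when "two" ends
# one line and "three" starts the next.
pair=$(printf 'one\ntwo\nthree\nfour\n' | sed -n 'N;/two\nthree/p;D')
printf '%s\n' "$pair"
```

Each cycle holds two consecutive lines, so a pattern broken across a line boundary is still seen as a whole.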
The last hexadecimal number, 0002c3a0, corresponds to the size of my binary $file:
printf "%x\n" $(stat -c %s $file)
Using bash + grep:
printf -v var "\x9f\x87\x6f\x11"
IFS=: read -r offset _ < <(grep -abo "$var" $file)
hd $file | sed -ne "$((offset/16-1)),+4p"
000161a0 b7 8f 4a 4d ed 89 6c 0b 25 f9 e7 c9 8c 99 6e 23 |..JM..l.%.....n#|
000161b0 3c ba 80 ec 2e 32 dd f3 a4 a2 09 bd 74 bf 66 11 |<....2......t.f.|
000161c0 96 7a b2 21 28 f1 b3 32 63 43 93 ff 50 a6 9f 87 |.z.!(..2cC..P...|
000161d0 6f 11 0d 7a a5 a9 81 9e 32 9d fb 71 27 6d 60 f2 |o..z....2..q'm`.|
000161e0 15 86 c2 bd 11 d0 08 90 c4 84 b9 80 04 4e 17 f1 |.............N..|
Where you could read your string:
000161c0 9f 87 | ..|
000161d0 6f 11 |o. |
For testing, I've built my test file by:
dd if=/vmlinuz bs=90574 count=1 of=/tmp/testfile
printf '\x9f\x87\x6f\x11' >>/tmp/testfile
dd if=/vmlinuz bs=90574 count=1 >>/tmp/testfile
file=/tmp/testfile
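The addresses shown in this answer can be cross-checked with shell arithmetic. This sketch only redoes the calculation; it assumes /vmlinuz is at least 90574 bytes long and that hd prints every 16-byte line (no repeated lines squeezed into a *).

```shell
block=90574                               # bs= value used in both dd calls
pattern_offset=$block                     # the 4 marker bytes follow the first block
size=$((2 * block + 4))                   # two blocks plus the 4-byte marker

first_line=$((pattern_offset / 16 - 1))   # first hd line selected by the sed range
first_addr=$(( (first_line - 1) * 16 ))   # hd line n starts at byte (n-1)*16

summary=$(printf 'offset=%08x size=%08x first_addr=%08x' \
  "$pattern_offset" "$size" "$first_addr")
echo "$summary"
# prints: offset=000161ce size=0002c3a0 first_addr=000161a0
```

Offset 000161ce is byte 14 of the line starting at 000161c0, which is why the marker is split across two hd lines.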
Use grep to search the original binary file, not the hex dump. Extending choroba's answer, I think you may have problems with grep trying to interpret your search pattern as UTF-8 or some other encoding. You should temporarily set the environment variable LC_ALL=C so that grep treats each byte individually. Also, you can use the -P option to enable lookbehind and lookahead in your pattern. So your command becomes:
LC_ALL=C grep -oaP $'(?<=\x9f\x87\x6f\x11).*(?=\x9f\x87\x70\x11)' binary-file > test.txt
Proof that it works:
$ echo $'BEFORE\x9f\x87\x6f\x11AC:E4:B5:9A:53:1C\x9f\x87\x70\x11AFTER' | LC_ALL=C grep -oaP $'(?<=\x9f\x87\x6f\x11).*(?=\x9f\x87\x70\x11)'
AC:E4:B5:9A:53:1C
$

Attempting to replace two character string with /dev/random hex

Example string
CA DA 00 17 11 38 88 C5 03
Desired output
AB 3C 6C 8F DA 88 24 78 6C
Commands attempted
$ tr -dc 0-9A-F < /dev/urandom filename ## prints too many chars
awk '{gsub(length($1)==2,{printf "%02")}}' filename ## syntax doesn't work, unsure how to add hex
$ sed 's/[a-z0-9]\{2\}//g' filename ## only replaces digits, unsure how to add hex as a replacement
I ended up using vim to do a partial conversion for some level of randomization.
:s/\d\d/AA/g
Can anyone provide a working solution?
It would be nice to see solutions (and explanations) leveraging tr/awk/sed for knowledge sharing purposes.
Thanks.
Replacing each field with a random 2-digit hex number with awk is just:
$ awk -v seed="$RANDOM" 'BEGIN{srand(seed)} {for (i=1; i<=NF; i++) $i=sprintf("%02X",rand()*256)} 1' file
C7 A1 02 1A 4A 94 95 A0 1E
$ awk -v seed="$RANDOM" 'BEGIN{srand(seed)} {for (i=1; i<=NF; i++) $i=sprintf("%02X",rand()*256)} 1' file
1C 50 A9 D3 8B B0 24 9C 14
Hopefully it's very obvious what it's doing.
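For readers who prefer it spelled out, here is the same one-liner with comments. It is a sketch, not a different method: the only change is an explicit int() around the random value, and the variable names line and randomized are made up for the illustration.

```shell
line='CA DA 00 17 11 38 88 C5 03'   # example string from the question

randomized=$(printf '%s\n' "$line" | awk -v seed="$RANDOM" '
BEGIN { srand(seed) }                  # seed once so that each run differs
{
    for (i = 1; i <= NF; i++)          # visit every whitespace-separated field
        $i = sprintf("%02X", int(rand() * 256))   # 0..255 as two hex digits
}
1                                      # always-true pattern: print the rebuilt line
')
echo "$randomized"
```

Assigning to $i makes awk rebuild the line with single spaces between fields, which is why no explicit join is needed.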
Here is an idea for generating a random hex string (a MAC address?):
awk -v seed=$RANDOM '
BEGIN{
srand(seed);
split("0 1 2 3 4 5 6 7 8 9 A B C D E F",hex," ");
for (i=1; i<=6; i++)
printf "%s%s ",hex[int(rand()*16)+1],hex[int(rand()*16)+1];
print ""
}'
D8 D9 BA 00 6A C6

Grep not matching certain parts of man page

Grep doesn't seem to match certain strings from man output. It seems to be random in that I can't work out any rhyme or reason as to whether a string will match or not.
man sed | head -7:
SED(1) BSD General Commands Manual SED(1)
NAME
sed -- stream editor
SYNOPSIS
$ man sed | head -7 | grep sed # no match
$ man sed | head -7 | grep stream # match on "stream"
sed -- stream editor
$ man sed | head -7 | grep '\-\-' # match on "--"
sed -- stream editor
$ man sed | head -7 | grep NAME # no match
$ man sed | head -7 | grep SYNOPSIS # no match
This also happens when redirecting the output to a file and grepping that
$ man sed | head -7 > /tmp/sed.man
$ cat /tmp/sed.man | grep sed # no match
$ cat /tmp/sed.man | grep stream # match on "stream"
sed -- stream editor
$ grep sed /tmp/sed.man # no match
$ grep stream /tmp/sed.man # match on "stream"
sed -- stream editor
grep: grep (BSD grep) 2.5.1-FreeBSD
man: version 1.6c
macOS: 10.14.6 Beta
bash: GNU bash, version 5.0.7(1)-release (x86_64-apple-darwin18.5.0)
$ man sed | head -7 | hexdump -C
00000000 0a 53 45 44 28 31 29 20 20 20 20 20 20 20 20 20 |.SED(1) |
00000010 20 20 20 20 20 20 20 20 20 20 20 42 53 44 20 47 | BSD G|
00000020 65 6e 65 72 61 6c 20 43 6f 6d 6d 61 6e 64 73 20 |eneral Commands |
00000030 4d 61 6e 75 61 6c 20 20 20 20 20 20 20 20 20 20 |Manual |
00000040 20 20 20 20 20 20 20 20 20 53 45 44 28 31 29 0a | SED(1).|
00000050 0a 4e 08 4e 41 08 41 4d 08 4d 45 08 45 0a 20 20 |.N.NA.AM.ME.E. |
00000060 20 20 20 73 08 73 65 08 65 64 08 64 20 2d 2d 20 | s.se.ed.d -- |
00000070 73 74 72 65 61 6d 20 65 64 69 74 6f 72 0a 0a 53 |stream editor..S|
00000080 08 53 59 08 59 4e 08 4e 4f 08 4f 50 08 50 53 08 |.SY.YN.NO.OP.PS.|
00000090 53 49 08 49 53 08 53 0a |SI.IS.S.|
00000098
Googling is hard for this problem as any combination of "man" or "grep" doesn't mention my problem that strings (with no special characters) are not matching.
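Man pages are written in roff format (https://man.openbsd.org/roff). When rendered for a terminal, bold text is produced by overstriking: each character is emitted, followed by a backspace (^H, 0x08) and the same character again, so the plain string never appears in the output. To see this, do the following:
man sed > sed.man
vi sed.man
and you will see: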
SED(1) BSD General Commands Manual SED(1)
N^HNA^HAM^HME^HE
s^Hse^Hed^Hd -- stream editor
To convert a man page to plain text without the ^H sequences, have a look at http://www.schweikhardt.net/man_page_howto.html#q10
Create a Perl script called strip-headers with the following content:
#!/usr/bin/perl -wn
# make it slurp the whole file at once:
undef $/;
# delete first header:
s/^\n*.*\n+//;
# delete last footer:
s/\n+.*\n+$/\n/g;
# delete page breaks:
s/\n\n+[^ \t].*\n\n+(\S+).*\1\n\n+/\n/g;
# collapse two or more blank lines into a single one:
s/\n{3,}/\n\n/g;
# see what is left...
print;
Make the Perl script executable with chmod 750 strip-headers and run it with:
man sed | ./strip-headers | col -bx > sed.man
or
man sed | ./strip-headers | col -bx | head -7 | grep sed
macOS man doesn't support the --ascii flag, so I used col -bx to strip the annoying formatting from man for piping into other commands.
man sed | col -bx | grep SYNOPSIS
col -b: Do not output any backspaces, printing only the last character written to each column position.
col -x: Output multiple spaces instead of tabs.
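The effect can be reproduced without man at all. In the sketch below, the overstruck word is built by hand from the bytes seen in the hexdump (4e 08 4e 41 08 41 ...). The cleanup uses a plain sed substitution as a stand-in for col -b, so that the example does not depend on col being installed.

```shell
bs=$(printf '\b')                        # a literal backspace (0x08)
raw=$(printf 'N\bNA\bAM\bME\bE')         # "NAME" as overstruck bold text

# The raw bytes are N ^H N A ^H A ..., so the plain string NAME never occurs.
if printf '%s\n' "$raw" | grep -q NAME; then raw_match=yes; else raw_match=no; fi

# Dropping each "character + backspace" pair leaves only the visible text.
clean=$(printf '%s\n' "$raw" | sed "s/.$bs//g")
if printf '%s\n' "$clean" | grep -q NAME; then clean_match=yes; else clean_match=no; fi

echo "raw: $raw_match, cleaned: $clean_match"
```

This also explains why "stream" matched in the question: that word is not bold, so it contains no backspaces.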
Notes:
I've read that man is meant to detect whether you're piping to another command or into a file, etc, but that was not my experience. At least for man 1.6c, the default for macOS.
Solution using col: https://unix.stackexchange.com/a/15866
Thanks @Cyrus - I didn't know about hexdump
Thanks @Oliver Gaida - I didn't know cat and vi would display it differently

Grep from file fails but grep with individual lines from the file works

I am trying to extract lines from file genome.gff that contain a line from file suspicious.txt. suspicious.txt was derived from genome.gff and every line should match.
Using grep on a single line from suspicious.txt works as expected:
grep 'gene10002' genome.gff
NC_007082.3 Gnomon gene 1269632 1273520 . + . ID=gene10002;Dbxref=BEEBASE:GB54789,GeneID:409846;Name=bur;gbkey=Gene;gene=bur;gene_biotype=protein_coding
NC_007082.3 Gnomon mRNA 1269632 1273520 . + . ID=rna21310;Parent=gene10002;Dbxref=GeneID:409846,Genbank:XM_393336.5,BEEBASE:GB54789;Name=XM_393336.5;gbkey=mRNA;gene=bur;product=burgundy;transcript_id=XM_393336.5
But every variation on using grep from a file that I've been able to think of or find online produces no output or an empty file:
grep -f suspicious.txt genome.gff
grep -F -f suspicious.txt genome.gff
while read line; do grep "$line" genome.gff; done<suspicious.txt
while read line; do grep '$line' genome.gff; done<suspicious.txt
while read line; do grep "${line}" genome.gff; done<suspicious.txt
cat suspicious.txt | while read line; do grep '$line' genome.gff; done
cat suspicious.txt | while read line; do grep '$line' genome.gff >> suspicious.gff; done
cat suspicious.txt | while read line; do grep -e "${line}" genome.gff >> suspicious.gff; done
cat "$(cat suspicious_bee_geneIDs_test.txt)" | while read line; do grep -e "${line}" genome.gff >> suspicious.gff; done
Running it as a script also produces an empty file:
#!/bin/bash
SUSP=$1
GFF=$2
while read -r line; do
grep -e "${line}" $GFF >> suspicious_bee_genes.gff
done<$SUSP
This is what the files look like:
head genome.gff
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build Amel_4.5
#!genome-build-accession NCBI_Assembly:GCF_000002195.4
##sequence-region NC_007070.3 1 29893408
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7460
NC_007070.3 RefSeq region 1 29893408 . + . ID=id0;Dbxref=taxon:7460;Name=LG1;gbkey=Src;genome=chromosome;linkage-group=LG1;mol_type=genomic DNA;strain=DH4
NC_007070.3 Gnomon gene 181 211962 . - . ID=gene0;Dbxref=BEEBASE:GB42164,GeneID:726912;Name=cort;gbkey=Gene;gene=cort;gene_biotype=protein_coding
NC_007070.3 Gnomon mRNA 181 71559 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:726912,Genbank:XM_006557348.1,BEEBASE:GB42164;Name=XM_006557348.1;gbkey=mRNA;gene=cort;product=cortex%2C transcript variant X2;transcript_id=XM_006557348.1
wc -l genome.gff
457742
head suspicious.txt
gene10002
gene1001
gene1003
gene10038
gene10048
gene10088
gene10132
gene10134
gene10181
gene10209
wc -l suspicious.txt
928
Does anyone know what's going wrong here?
This can happen when the input file is in DOS format: each line will have a trailing CR character at the end, which will break the matching.
One way to check if this is the case is using hexdump, for example (just the first few lines):
$ hexdump -C suspicious.txt
00000000 67 65 6e 65 31 30 30 30 32 0d 0a 67 65 6e 65 31 |gene10002..gene1|
00000010 30 30 31 0d 0a 67 65 6e 65 31 30 30 33 0d 0a 67 |001..gene1003..g|
00000020 65 6e 65 31 30 30 33 38 0d 0a 67 65 6e 65 31 30 |ene10038..gene10|
In the ASCII representation at the right, notice the .. after each gene. These dots correspond to 0d and 0a. The 0d is the CR character.
Without the CR character, the output should look like this:
$ hexdump -C <(tr -d '\r' < suspicious.txt)
00000000 67 65 6e 65 31 30 30 30 32 0a 67 65 6e 65 31 30 |gene10002.gene10|
00000010 30 31 0a 67 65 6e 65 31 30 30 33 0a 67 65 6e 65 |01.gene1003.gene|
00000020 31 30 30 33 38 0a 67 65 6e 65 31 30 30 34 38 0a |10038.gene10048.|
Just one . after each gene, corresponding to 0a, and no 0d.
Another way to see the DOS line endings is in the vi editor. If you open the file with vi, the status line will show [dos], or you can run the ex command :set ff? to make it tell you the file format (the status line will say fileformat=dos).
You can remove the CR characters on the fly like this:
grep -f <(tr -d '\r' < suspicious.txt) genome.gff
Or you could remove them in vi by running the ex command :set ff=unix and then saving the file. Other command-line tools can remove DOS line endings too.
Another possibility is that instead of a trailing CR character, you might have trailing whitespace. The output of hexdump -C should make that perfectly clear. After the trailing whitespace characters are removed, the grep -f should work as expected.
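The failure and the fix can be reproduced on two tiny files. These are hypothetical miniature versions of suspicious.txt and genome.gff created in a temporary directory; the cleaned copy is written to a file instead of using process substitution, so the sketch also runs in a plain POSIX shell.

```shell
work=$(mktemp -d)

# Miniature stand-ins for the two files from the question.
printf 'gene10002\r\ngene1003\r\n' > "$work/suspicious.txt"    # DOS line endings
printf 'ID=gene10002;Name=bur\nID=gene0;Name=cort\n' > "$work/genome.gff"

# Each pattern is really "gene10002<CR>", which never occurs in genome.gff.
before=$(grep -c -f "$work/suspicious.txt" "$work/genome.gff" || true)

# Strip the CRs into a clean copy, then match again.
tr -d '\r' < "$work/suspicious.txt" > "$work/suspicious.unix.txt"
after=$(grep -c -f "$work/suspicious.unix.txt" "$work/genome.gff" || true)

echo "matches before: $before, after: $after"
rm -rf "$work"
```

The || true is there only because grep -c exits with status 1 when the count is zero.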

Counting chars from right - shell script

I have a temperature sensor connected to a *nix system and the typical output is something like:
pi@raspberrypi $ cat /sys/bus/w1/devices/28-00000202070c/w1_slave
c3 00 4b 46 7f ff 0d 10 12 : crc=12 YES
c3 00 4b 46 7f ff 0d 10 12 t=12187
The value comes without a decimal separator, but it is assumed to always carry 3 digits after the comma, so in this example it would be 12.187º.
I have implemented a filter that places a comma after the second character, and it works most of the time:
grep t= | awk '{print $10;}'| sed 's/t\=//g' | sed 's/\([0-9][0-9]\)\([0-9]\)/\1,\2/g'
However, during winter the temperature drops below 10º and my filter returns values like 95,32º (when it should be 9,532º).
Is there any way of counting characters from the right, so that I can always rely on the last 3 digits (and avoid this problem for temperatures below 10º)?
Thanks,
Joaoabs
awk can handle floating point operations:
awk '/t=/ { sub(/t=/,"",$NF); print $NF/1000}' /sys/bus/w1/devices/28-00000202070c/w1_slave
If I understand you correctly, what you want to do is:
sed 's/\([0-9][0-9][0-9]\)$/,\1/g'
$ in a regex means 'the end' so this searches for 3 digits right at the end of a string and replaces them with comma+the found digits.
(Note: This should be the last part of your pipe, with the beginning unchanged.)
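A short sketch of that substitution applied to two raw readings, the 12187 from the question and a hypothetical winter value of 9532:

```shell
# Insert a comma before the last three digits, whatever the total length.
warm=$(printf '12187\n' | sed 's/\([0-9][0-9][0-9]\)$/,\1/')
cold=$(printf '9532\n'  | sed 's/\([0-9][0-9][0-9]\)$/,\1/')

echo "$warm"   # 12,187
echo "$cold"   # 9,532
```

One caveat: a reading below 1000 (less than 1º) has nothing before its last three digits, so the result would start with a comma. Dividing by 1000 in awk does not have this problem.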
Using awk
awk -F= '/t=/ {print $NF/1000}' /sys/bus/w1/devices/28-00000202070c/w1_slave
12.187
Or store it to a variable:
temp=$(awk -F= '/t=/ {print $NF/1000}' /sys/bus/w1/devices/28-00000202070c/w1_slave)
echo "$temp"
12.187
Don't chain grep|awk|sed|sed...; try this:
...|awk -F't=' -v OFS='t=' 'NF>1{$2=sprintf("%'\''d",$2)}7'
test with your data:
kent$ echo "c3 00 4b 46 7f ff 0d 10 12 : crc=12 YES
c3 00 4b 46 7f ff 0d 10 12 t=12187"|awk -F't=' -v OFS='t=' 'NF>1{$2=sprintf("%'\''d",$2)}7'
c3 00 4b 46 7f ff 0d 10 12 : crc=12 YES
c3 00 4b 46 7f ff 0d 10 12 t=12,187
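awk's printf has a format flag for this requirement: %'d inserts the locale's thousands separator. The comma therefore only appears in a locale that defines one (the C locale does not).
If you just want the t= value: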
awk -F't=' -v OFS='t=' 'NF>1{printf "%'\''d\n",$2}'
with same input:
kent$ echo "c3 00 4b 46 7f ff 0d 10 12 : crc=12 YES
c3 00 4b 46 7f ff 0d 10 12 t=12187"|awk -F't=' -v OFS='t=' 'NF>1{printf "%'\''d\n",$2}'
12,187
