grep strings based on the length - bash

Is it possible to search strings based on the length in a specific file using grep?
I tried using awk, but it did not work:
awk '$0~"^s" && length($0)==31' strings.xml
If this is not possible with grep, can it be done with some other command-line tool?

You can use:
grep -E '^s.{30}$' strings.xml
The regexp matches s at the beginning of the line, followed by any 30 characters, then the end of the line. So it will match a line with exactly 31 characters beginning with s.
But the awk command is equivalent, so if it didn't work, neither will this.
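To check that equivalence on known input, here is a small sketch. The sample lines are made up for illustration and are not taken from the asker's strings.xml.

```shell
# Made-up sample: one 31-character line starting with "s",
# plus two lines that should not match.
sample=$(printf '%s\n' \
  's123456789012345678901234567890' \
  'short' \
  'x123456789012345678901234567890')

# The grep from the answer and the awk from the question, on the same input.
grep_out=$(printf '%s\n' "$sample" | grep -E '^s.{30}$')
awk_out=$(printf '%s\n' "$sample" | awk '$0 ~ "^s" && length($0) == 31')

printf 'grep: %s\n' "$grep_out"
printf 'awk:  %s\n' "$awk_out"
```

If both commands agree here but print nothing for the real file, the lines in that file are probably not exactly 31 characters long, for instance because of leading indentation or a trailing carriage return.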

By default awk splits fields on whitespace, so if you want to match the first field when it starts with s and is 31 characters long, you could use:
awk '$1 ~ /^s.{30}$/ {print}' strings.xml
Here ^s anchors an s at the start of the field, . matches any single character, {30} repeats it exactly 30 times, and $ anchors the end of the field. Note that some awk implementations (older gawk, some mawk builds) do not enable {n} interval expressions by default.
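The difference between testing the whole line and testing the first field shows up on indented input. The following is a sketch on a made-up line; it uses length($1) in place of the {30} interval so that it does not depend on interval support in the local awk.

```shell
# Hypothetical indented line whose first field is 31 characters long.
line='    s123456789012345678901234567890 trailing'

# Whole-line test: fails, because $0 starts with spaces and is longer than 31.
whole=$(printf '%s\n' "$line" | awk '$0 ~ /^s/ && length($0) == 31')

# First-field test: succeeds, because awk strips leading whitespace when splitting.
first=$(printf '%s\n' "$line" | awk '$1 ~ /^s/ && length($1) == 31 { print $1 }')

printf 'whole-line match: [%s]\n' "$whole"
printf 'first-field match: [%s]\n' "$first"
```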

Related

Getting module and version from a file in bash

I have a text file (text.txt) which contain the following lines:
|**tashtit.liba.version**|2001.01.2012072137|
|**tashtit.gimla.version**|2001.01.2012072156|
|**chaluka.version**|2001.01.2012080754|
|**analytics.version**|2001.01.2012072142|
|**yizumim.version**|2001.01.2012072222|
I would like the following output (text2.txt):
tashtit.liba-2001.01.2012072137
tashtit.gimla-2001.01.2012072156
chaluka-2001.01.2012080754
analytics-2001.01.2012072142
yizumim-2001.01.2012072222
How can I get that using bash and some regex?
The above is an example, and the answer should fit the following convention:
|**${module}.version**|$version|
sed 's/|\*\*\([a-z.]*\)\.version\*\*|\([0-9.]*\)|/\1-\2/1'
This matches a literal |** followed by a group of lowercase letters and periods terminated by a literal .version**|, then a second group of digits and periods terminated by |, and replaces the whole match with the first group, a hyphen, and the second group.
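As a quick check, the substitution can be run on two of the sample lines from the question. The variable names input and result are only for illustration.

```shell
# Two of the sample lines from the question.
input='|**tashtit.liba.version**|2001.01.2012072137|
|**chaluka.version**|2001.01.2012080754|'

# Basic regular expressions: \( \) delimit groups, \* is a literal asterisk.
result=$(printf '%s\n' "$input" |
  sed 's/|\*\*\([a-z.]*\)\.version\*\*|\([0-9.]*\)|/\1-\2/')

printf '%s\n' "$result"
```

Because [a-z.]* is greedy but must still leave room for .version, a dotted module name such as tashtit.liba is kept whole.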
sed -rn 's/(^\|\*\*)(.*)(\.version)(\*\*\|)(.*)(\|$)/\2-\5/p' file
Split each line into 6 groups based on regular expressions. Replace the line with the 2nd and 5th groups (the module name and the version), separated by "-", and print it.
Awk alternative:
awk -F'|' '{ gsub(/\*/, "", $2); sub(/\.version$/, "", $2); print $2 "-" $3 }' file
Set the field delimiter to |, strip the asterisks from the 2nd |-delimited field, and remove its trailing ".version". Then print the 2nd and 3rd |-delimited fields separated by "-".
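The same field splitting can also be sketched with shell builtins only, which is a different technique from the sed and awk answers above. The function name convert_line and its variables are hypothetical, chosen just for this example.

```shell
# Hypothetical helper: turn one "|**module.version**|version|" line
# into "module-version" using only read and parameter expansion.
convert_line() {
  # Split on "|"; the leading "|" yields an empty first field.
  IFS='|' read -r skip name version rest <<EOF
$1
EOF
  name=${name#\*\*}           # drop the leading **
  name=${name%.version\*\*}   # drop the trailing .version**
  printf '%s-%s\n' "$name" "$version"
}

convert_line '|**tashtit.liba.version**|2001.01.2012072137|'
convert_line '|**yizumim.version**|2001.01.2012072222|'
```

To convert a whole file, one could call it in a loop such as `while IFS= read -r line; do convert_line "$line"; done < text.txt > text2.txt`.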

unix command to extract digits after last alphabetical string

String:"gamma021AH00999NAK41"
The number of trailing digits may vary: it may be 3 digits, 4 digits, etc.
"NAK" in the given string can be any other string, but it contains only letters.
So my intention is to extract the trailing digits (for example 41 in the given string), i.e. everything after the last letter.
Thanks in advance
Using only shell builtins (no external commands like sed or awk, thus much faster if you're going to be repeating this over and over, f/e, once per line):
s=gamma021AH00999NAK41
result=${s##*[[:alpha:]]}
echo "$result"
${var##pattern} is a parameter expansion which removes the longest possible match for pattern from the front of the value of var before returning it. *[[:alpha:]], a wildcard followed by an alphabetic character, will thus remove everything up to and including the last letter (the K in your string), leaving only the trailing digits.
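Wrapped in a function, the expansion can be tried on strings with different numbers of trailing digits. The function name trailing_digits and the second test string are made up for illustration.

```shell
# Hypothetical helper: print whatever follows the last alphabetic character.
trailing_digits() {
  printf '%s\n' "${1##*[[:alpha:]]}"
}

trailing_digits 'gamma021AH00999NAK41'     # string from the question
trailing_digits 'gamma021AH00999XY1234'    # made-up variant with four digits
```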
You can replace all the alphabetic characters with, for example, "#" and then take the last field based on the "#" separator:
echo "gamma021AH00999NAK41" | sed "s/[a-zA-Z]/#/g" | awk -F'#' '{print $NF}'
NOTE: This won't work if your string contains anything other than alphanumeric characters.
EDIT: With awk only, no sed (thanks @CharlesDuffy):
echo "gamma021AH00999NAK41" | awk -F'[[:alpha:]]' '{print $NF}'
I see no mention of varying length, so this command will work:
echo "gamma021AH00999NAK41" | cut -b '19-'
Answer : 41

Want to find specific pattern, without knowing the words it contains in Unix

Using grep in bash
Sylvester,Stallone,+42 6944789099
Tommy, Lee Jones,+37 6923441223
Jean Claude,Van Damme,+44 6977654322
Jose Maria,de Santo Agostinho,+30 6936130089
Chuck, Norris, +30 6987543212
Chuck,Norris,+32 6944221234
Chuck1, Norris, +32 6944221234
I have this file, and I want to find the lines where the only whitespace is in the phone number. How do I grep for this pattern? For example, the grep result should yield
Sylvester,Stallone,+42 6944789099
Chuck,Norris,+32 6944221234
EDIT: I want to find the lines that contain exactly one space, and that space MUST be between the country code (e.g. +42) and the number itself (6944789000)
It is much easier with awk without using any regex:
awk 'NF==2' file
Sylvester,Stallone,+42 6944789099
Chuck,Norris,+32 6944221234
By default awk splits fields on whitespace.
With the condition NF==2 we print only rows that have exactly 2 fields, i.e. a single run of whitespace separating them.
If you specifically want to find lines with space between digits then use:
grep -P '^\S+\d+ \d+$' file
Sylvester,Stallone,+42 6944789099
Chuck,Norris,+32 6944221234
Or by using POSIX classes:
grep -E '^[^[:blank:]]+[[:digit:]]+ [[:digit:]]+$' file
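The POSIX-class pattern can be checked against the sample data from the question. The here-document below simply reproduces those seven lines; the variable name matches is only for illustration.

```shell
# Run the POSIX-class pattern over the sample lines from the question.
matches=$(grep -E '^[^[:blank:]]+[[:digit:]]+ [[:digit:]]+$' <<'EOF'
Sylvester,Stallone,+42 6944789099
Tommy, Lee Jones,+37 6923441223
Jean Claude,Van Damme,+44 6977654322
Jose Maria,de Santo Agostinho,+30 6936130089
Chuck, Norris, +30 6987543212
Chuck,Norris,+32 6944221234
Chuck1, Norris, +32 6944221234
EOF
)

printf '%s\n' "$matches"
```

The leading ^[^[:blank:]]+ is what rejects the other lines: it cannot cross a space, so any blank before the final "digits space digits" run prevents a match.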

Sed: get the first part of a line till "_Sxx_" where xx is 0-999

I have a lot of files (~9000) they are named like this:
Something_some_more_even_more_S0_other_stuff
Something_S2_other_stuff
Something_even_more_S13_other_stuff
Something_some_more_even_S999_other_stuff
As you see the length of the bit in front of the delimiter Sxx is not fixed.
Also the delimiter can range from S0 to S999 (not S01 or so).
The underscores are actually there.
So how to get the first part till Sxx?
Using sed:
sed 's/_S[0-9]\+_.*$//' file
Something_some_more_even_more
Something
Something_even_more
Something_some_more_even
This sed command matches from the first _S<digits>_ through the end of the line ($) and replaces the match with an empty string.
This awk will also work:
awk -F '_S[0-9]+_.*$' '{print $1}' file
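A quick check on the four names from the question. This sketch writes [0-9][0-9]* in place of [0-9]\+, on the assumption that the sed in use may not support the \+ extension.

```shell
# The four example names from the question.
names='Something_some_more_even_more_S0_other_stuff
Something_S2_other_stuff
Something_even_more_S13_other_stuff
Something_some_more_even_S999_other_stuff'

# Delete from the first _S<digits>_ to the end of the line.
prefixes=$(printf '%s\n' "$names" | sed 's/_S[0-9][0-9]*_.*$//')

printf '%s\n' "$prefixes"
```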

Explained shell statement

The following statement will remove line numbers in a txt file:
cat withLineNumbers.txt | sed 's/^.......//' >> withoutLineNumbers.txt
The input file is created with the following statement (this one I understand):
nl -ba input.txt >> withLineNumbers.txt
I know the functionality of cat and I know the output is written to the 'withoutLineNumbers.txt' file. But the part '| sed 's/^.......//'' is not really clear to me.
Thanks for your time.
That sed regular expression simply removes the first 7 characters from each line. The regular expression ^....... says "Any 7 characters at the beginning of the line." The sed argument s/^.......// substitutes the above regular expression with an empty string.
Refer to the sed(1) man page for more information.
That sed statement deletes the first 7 characters of each line; a dot "." matches any character. There is an even easier way to do this:
awk '{print $2}' withLineNumbers.txt
You just print the 2nd column using awk; no regex is needed. Note that this prints only the first word of each line.
If your data has spaces, use this instead (it collapses runs of whitespace into single spaces):
awk '{$1="";print substr($0,2)}' withLineNumbers.txt
sed is doing a search and replace. The 's' means substitute, the next character ('/') is the separator, the search expression is '^.......', and the replace expression is an empty string (i.e. everything between the last two slashes).
The search is a regular expression. The '^' means match start of line. Each '.' means match any character. So the search expression matches the first 7 characters of each line. This is then replaced with an empty string. So what sed is doing is removing the first 7 characters of each line.
A simpler way to achieve the same thing could be:
cut -b8- withLineNumbers.txt > withoutLineNumbers.txt
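The approaches above can be compared on a small made-up input. With its defaults, nl -ba produces a 6-character number column followed by a tab, which is where the 7 characters come from. The sample text and variable names are assumptions for illustration.

```shell
# Made-up two-line input; the second line contains a space.
original=$(printf '%s\n' 'alpha' 'beta gamma')

# Number every line: 6-wide number plus a tab = 7 leading characters.
numbered=$(printf '%s\n' "$original" | nl -ba)

sed_out=$(printf '%s\n' "$numbered" | sed 's/^.......//')   # drop first 7 characters
cut_out=$(printf '%s\n' "$numbered" | cut -b8-)             # keep byte 8 onward
awk_out=$(printf '%s\n' "$numbered" | awk '{print $2}')     # second field only

printf 'sed:\n%s\ncut:\n%s\nawk:\n%s\n' "$sed_out" "$cut_out" "$awk_out"
```

sed and cut restore the original text, while awk '{print $2}' loses everything after the first word on lines that contain spaces.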
