How to use a regular expression to get the following pattern? - bash

Hello, I have the following text:
some text,+
this field is another parameter
this is the final of the field
t10681374flp
t10681375flp
I would like to match the following two lines:
t10681374flp
t10681375flp
The rule is that these lines begin with 't' and end with 'p'.
I tried:
grep -e t*p testing
However, I got:
this field is another parameter
t10681374flp
t10681375flp
I would really appreciate help with this.

To match exactly those lines with grep and avoid matching unrelated lines, use:
grep "^t[0-9]*flp$" testing
This matches the following lines:
t10681374flp
t10681375flp
It does not match lines such as:
this field is another parameter
these dont grep
Hope this resolves it.
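To see why the original attempt returned an extra line, it may help to run both patterns against the sample text. The sketch below is only an illustration: it feeds the question's lines through a pipe instead of the file testing, and the variable names (sample, loose, strict) are made up for this example. In t*p, the t* means "zero or more t", so the pattern matches any line that contains a p.

```shell
# Sample text from the question, held in a variable instead of the file "testing"
sample=$(printf '%s\n' \
  'some text,+' \
  'this field is another parameter' \
  'this is the final of the field' \
  't10681374flp' \
  't10681375flp')

# Original attempt: t* means "zero or more t", so any line containing "p" matches
loose=$(printf '%s\n' "$sample" | grep -e 't*p')

# Anchored pattern: t, then only digits, then flp, spanning the whole line
strict=$(printf '%s\n' "$sample" | grep '^t[0-9]*flp$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Quoting the pattern also keeps the shell from expanding t*p as a filename glob before grep sees it.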

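The following should work:
grep '^t.*p$' testing
^ anchors the match at the beginning of the line, .* matches any sequence of characters (including none),
and $ anchors it at the end of the line. Quote the pattern so the shell does not expand * as a glob.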
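The two answers are not equivalent, which the following sketch tries to show. ^t.*p$ accepts any line that starts with t and ends with p, while ^t[0-9]*flp$ additionally requires digits followed by flp. The input reuses lines that appear in the answers above; the variable names are hypothetical.

```shell
# Three lines taken from the thread; only the first has the t<digits>flp shape
lines=$(printf '%s\n' \
  't10681374flp' \
  'these dont grep' \
  'this field is another parameter')

# Broad pattern: starts with t, ends with p, anything in between
broad=$(printf '%s\n' "$lines" | grep '^t.*p$')

# Narrow pattern: t, digits only, then flp
narrow=$(printf '%s\n' "$lines" | grep '^t[0-9]*flp$')

printf 'broad:\n%s\n\nnarrow:\n%s\n' "$broad" "$narrow"
```

Which one is preferable depends on whether lines like "these dont grep" can occur in the real input.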

Related

chef inspec output consists of error due to regex

When I execute the Chef InSpec command below, I get an error.
describe command ("cat sql.conf | grep 'log_filename'") do
its('stdout') {should match (/^'sql-(\d)+.log'/)}
end
The string I expect to match is sql-20201212.log. Please check.
This regex /^'sql-(\d)+.log'/ doesn't match this string sql-20201212.log. You can try it out on https://regexr.com/
There are a few problems with your regex:
' is in your regex but not in your string
. matches any character except line breaks; if you want to match only a literal dot, you need to escape it, e.g. \.
you probably don't need to have \d in a group (())
So the regex ^sql-\d+\.log$ would match the string sql-20201212.log. I also added $ to match the end of the string.
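The three points can be checked from a shell as well. This is a sketch using grep -E rather than InSpec's Ruby matcher, so it is an approximation: POSIX extended regular expressions have no \d, and [0-9] is used in its place. The helper name check is hypothetical.

```shell
# Hypothetical helper: prints "match" or "no match" for one candidate string
check() {
  if printf '%s\n' "$1" | grep -Eq '^sql-[0-9]+\.log$'; then
    echo "match"
  else
    echo "no match"
  fi
}

good=$(check 'sql-20201212.log')        # the expected file name
bad_dot=$(check 'sql-20201212xlog')     # rejected because \. only accepts a literal dot
bad_quote=$(check "'sql-20201212.log'") # rejected because the quotes are not in the pattern

echo "$good / $bad_dot / $bad_quote"
```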

How to match a pattern even if it is split by other characters, using a bash command (similar to grep)?

I'm trying to output all the lines of a file which contain a specific word/pattern even if it contains other characters between its letters.
Let's say we have a bunch of domain names and we want to filter out all those that contain "paypal" inside, I would like to have this kind of output :
pay-pal-secure.com
payppal.net
etc...
I was wondering whether this is possible with grep, or whether another tool can do it.
Many thanks!
Replace paypal with the regexp p.*a.*y.*p.*a.*l to allow any characters between the letters.
Update:
Use the extended regular expression p.{0,2}a.{0,2}y.{0,2}p.{0,2}a.{0,2}l to limit the gap between consecutive letters to at most two characters.
Example: grep -E 'p.{0,2}a.{0,2}y.{0,2}p.{0,2}a.{0,2}l' file
See: The Stack Overflow Regular Expressions FAQ
Alternatively you could use agrep (approximate grep):
$ agrep -By paypal file
agrep: 2 words match within 1 error
pay-pal-secure.com
payppal.net
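For a long word, typing the gapped pattern by hand is error-prone, so it can be generated. The sketch below builds the bounded pattern from a word with sed and compares it with the unbounded .* version. It assumes the word contains only letters (regex metacharacters would need escaping), and the last two domain names are invented test data, not from the question.

```shell
# Test input: the first two domains come from the question, the last two are made up
domains=$(printf '%s\n' \
  'pay-pal-secure.com' \
  'payppal.net' \
  'example.org' \
  'pay-------pal.org')

word='paypal'

# Append .{0,2} after every letter, then drop the trailing one
pattern=$(printf '%s\n' "$word" | sed 's/./&.{0,2}/g; s/\.{0,2}$//')

# Bounded gaps: at most two extra characters between letters
bounded=$(printf '%s\n' "$domains" | grep -E "$pattern")

# Unbounded gaps: any number of characters between letters
unbounded=$(printf '%s\n' "$domains" | grep 'p.*a.*y.*p.*a.*l')

printf 'pattern: %s\n\nbounded:\n%s\n\nunbounded:\n%s\n' "$pattern" "$bounded" "$unbounded"
```

The bounded form skips pay-------pal.org, while the .* form accepts it; which behaviour is wanted depends on how aggressive the filter should be.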

Extract a substring (value of an HTML node tag) in a bash/zsh script

I'm trying to extract a tag value of an HTML node that I already have in a variable.
I'm currently using Zsh but I'm trying to make it work in Bash as well.
The current variable has the value:
<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>
and I would like to get the value of data-count (in this case 0, but could be any length integer).
I have tried using cut, sed and variable expansion as explained in this question, but I haven't managed to adapt the regexes, or maybe it has to be done differently for Zsh.
There is no reason why sed would not work in this situation. For your specific case, I would do something like this:
sed 's/.*data-count="\([0-9]*\)".*/\1/g' file_name.txt
Basically, sed looks for a pattern containing data-count=, saves whatever matches inside the parentheses \(...\) into \1, and prints that in place of the match (which is the whole line, because of the leading and trailing .*).
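Since the question holds the tag in a variable rather than a file, the same substitution can be fed from that variable. A minimal sketch, assuming the variable is called html and the attribute value consists of digits only; using -n together with the p flag prints nothing when the attribute is missing, instead of echoing the whole line.

```shell
# The tag from the question, stored in a variable (the name "html" is an assumption)
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# -n suppresses default output; the trailing p prints only lines where the substitution happened
count=$(printf '%s\n' "$html" | sed -n 's/.*data-count="\([0-9]*\)".*/\1/p')

# Same command on a tag without the attribute yields an empty string
missing=$(printf '%s\n' '<span class="alter"/>' | sed -n 's/.*data-count="\([0-9]*\)".*/\1/p')

echo "count=$count missing=[$missing]"
```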
Could you please try the following?
awk 'match($0,/data-count=[^ ]*/){print substr($0,RSTART+12,RLENGTH-13)}' Input_file
Explanation: awk's match function matches the regex data-count=[^ ]*, i.e. everything from data-count= up to the next space. If a match is found, the built-in variables RSTART and RLENGTH are set. The substr call then prints the part of the line between the quotes: RSTART+12 skips the 12 characters of data-count=" and RLENGTH-13 drops those 12 characters plus the closing quote.
With sed, could you please try the following?
sed 's/.*data-count=\"\([^"]*\).*/\1/' Input_file
Explanation: The substitution captures everything after data-count=" up to the next double quote in the first group, \([^"]*\), and replaces the whole line with \1, the text captured by that group.
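As an alternative that needs no external command, plain parameter expansion should also work in both Bash and Zsh, since it only uses the POSIX forms ${var#pattern} and ${var%%pattern}. This is a sketch under the assumption that data-count is present and its value is wrapped in double quotes; the names html, rest and count are illustrative.

```shell
# The tag from the question (variable name is an assumption)
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# Remove the shortest prefix ending in data-count=" so the value comes first
rest=${html#*data-count=\"}

# Remove the longest suffix starting at a double quote, leaving only the value
count=${rest%%\"*}

echo "$count"
```

If the attribute is absent, the first expansion leaves the string unchanged, so a real script would want to check for data-count= before trusting the result.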
As was said before, to be on the safe side and handle any syntactically valid HTML tag, a parser is strongly advised. But if you know in advance what the general format of your HTML element will look like, the following hack might come in handy:
Assume that your variable is called "html"
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'
First adapt it a bit:
htmlx="tag ${html%??}"
This adds the string tag in front and removes the final />.
Now make an associative array:
declare -A fields
fields=( ${=$(tr = ' ' <<<$htmlx)} )
The tr turns each equal sign into a space, and ${=...} forces word splitting (this is Zsh syntax). You can now access the values of your attributes with, say,
echo $fields[data-count]
Note that the value still has the surrounding double quotes. You can easily remove them with
echo ${${fields[data-count]%?}#?}
Of course, once you do this hack, you have access to all attributes in the same way.
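The associative-array hack relies on Zsh-only syntax. A rough shell-neutral way to get at all attributes, sketched below, is to list them one per line with grep -o and strip the quotes. Note that -o is a GNU/BSD grep extension rather than POSIX, and the sketch assumes every attribute has the form name="value" with no double quotes or newlines inside the value; the variable names are hypothetical.

```shell
# The tag from the question (variable name as in the answer above)
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# -o prints each match on its own line; tr then removes the double quotes
attrs=$(printf '%s\n' "$html" | grep -o '[a-zA-Z-]*="[^"]*"' | tr -d '"')

# Pick one attribute out of the name=value list
data_count=$(printf '%s\n' "$attrs" | sed -n 's/^data-count=//p')

printf '%s\n' "$attrs"
echo "data-count is $data_count"
```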

Sed keep original indentation and camel-casing a variable

I have a simple sed script that dynamically replaces a bunch of lines in my application using a variable; the variable comes from a list of strings. My function works but does not keep the original indentation. The function deletes the line if it contains a certain string and replaces it with a completely new line; I could not do a substitution due to certain syntax restrictions.
How do I keep my original indentation when the line is replaced?
Can I capitalize my variable and remove the underscore on the fly? That is, the title is a capitalized version of variableName with the underscore removed. The list of items in the variable array is really long, so I am trying to do this in one shot.
Ex: I want report_type -> Report Type done mid-process.
Is there a better way to solve this with sed? Thanks, any input is much appreciated.
The sed function is as follows:
variableName=$1
sed -i "/name\=\"${variableName}\.name\" value\=model\.${variableName}\.name options\=\#lists\./c\\{\{\> \_dropdown title\=\"${variableName}\" required\=true name\=\"${variableName}\"\}\}" test
SAMPLE INPUT
{{> _select title="Report Type" required=true name="report_type.name" value=model.report_type.name options=#lists.report_type}}
SAMPLE EXPECTED OUTPUT
{{> _dropdown title="Report Type" required=true name="report_type" value=model.report_type.name}}
sample input variable
report_type
Try this:
sed -E "s/^(\s+).*name\=\"(report_type)\.name\" value\=model\.report_type\.name options\=\#lists\..*$/\1\{\{\> \_dropdown title\=\"\2\" required\=true name\=\"\2\"\}\}/;T;s/\"(\w+)_(\w+)\"/\"\u\1 \u\2\"/g" input.txt > output.txt
I used "report_type" instead of ${variableName} for testing as an sed one-liner.
Please change back to ${variableName}.
Then go back to using -i (in addition to -E, which is for extended regex).
I am not sure whether I can do it without extended regex, let me know if that is necessary.
use s/// to replace the precisely matched line
first capture group for the white space making up the indentation
second capture group for the variable name
stop if that did not replace anything, T;
another s///
look for something consisting only of word characters between "",
with a "_" between two parts,
seems safe enough because this step is only done on the already replaced line
replace by the two parts, separated by a space instead of "_"
\u uppercases the first character of each part (title case rather than camel case)
Note:
Doing this on your sample input creates two very similar lines.
I assume that is intentional. Otherwise please provide desired output.
Using GNU sed version 4.2.1.
Interesting line of output:
{{> _dropdown title="Report Type" required=true name="Report Type"}}
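In that output the name attribute is title-cased as well, whereas the question wants name="report_type". One possible way around this, sketched below, is to compute the title in a separate step and keep the variable name untouched. This is not the answerer's method: it swaps in awk for the title so that no GNU-only \u is needed, and it follows the replacement text of the question's own sed command (which omits value=...). The four-space indentation and the variable names title, line and result are assumptions.

```shell
variableName='report_type'

# Title: split on "_", uppercase the first letter of each part, rejoin with spaces
title=$(printf '%s\n' "$variableName" |
  awk -F_ '{ for (i = 1; i <= NF; i++) $i = toupper(substr($i, 1, 1)) substr($i, 2); print }')

# Sample input line from the question, with an assumed four-space indentation
line='    {{> _select title="Report Type" required=true name="report_type.name" value=model.report_type.name options=#lists.report_type}}'

# \1 carries the captured leading whitespace over into the replacement line
result=$(printf '%s\n' "$line" | sed -E \
  "s/^([[:space:]]*).*name=\"${variableName}\.name\" value=model\.${variableName}\.name options=#lists\..*\$/\1{{> _dropdown title=\"${title}\" required=true name=\"${variableName}\"}}/")

printf '%s\n' "$result"
```

The sketch assumes variableName contains nothing that is special to sed, such as /, & or regex metacharacters.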

How to remove all characters but dots and numbers

I need to clear all characters but numbers and dots in a file.
The numbers are formatted as follows:
$(24.50)
I'm using the following code to accomplish the task:
sed 's/[^0-9]*//'
It works, but the last parenthesis is not removed. After running the code I get:
24.50)
I should get:
24.50
Please help
I think you could use the following:
sed 's/[^0-9.]//g'
Your regular expression matches only the first instance of [^0-9]*, namely the $( at the beginning. To make sed replace all instances, put a g at the end. Also add . to the bracket expression, otherwise the dot would be removed as well:
sed 's/[^0-9.]*//g'
The g flag means "replace every match on the line". By default, sed replaces only the first match on each line and then stops.
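The difference is easy to check on the sample value. The sketch below runs the substitution without and with g; the tr line is an alternative not mentioned in the answers, added only for comparison, and the variable names are made up.

```shell
# Single quotes keep the shell from treating $( as a command substitution
input='$(24.50)'

# Without g: only the first run of unwanted characters, "$(", is removed
first_only=$(printf '%s\n' "$input" | sed 's/[^0-9.]*//')

# With g: every unwanted character on the line is removed
all=$(printf '%s\n' "$input" | sed 's/[^0-9.]//g')

# Alternative: delete everything except digits, dots and newlines with tr
tr_alt=$(printf '%s\n' "$input" | tr -cd '0-9.\n')

echo "first_only=$first_only all=$all tr_alt=$tr_alt"
```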

Resources