Extract some info from a complete sentence using preg_match


I want to extract specific information from a string using preg_match in PHP. Any ideas?
sample text = 2xMUA Matte Lipstick - Totally Nude[300]=[600]
From this I want to get:
product_name = MUA Matte Lipstick - Totally Nude
product_qty = 2
product_price = 300
product_subtotal = 600

You could capture your values in (named) capturing groups.
Try it like this:
(?<qty>\d+)x(?<name>[^[]+)\[(?<price>[^]]+)\]=\[(?<subtotal>[^]]+)\]
That would match:
(?<qty> # Named capturing group qty
\d+ # One or more digits
) # Close group
x # Match x
(?<name> # Named capturing group name
[^[]+ # Match not [ one or more times
) # Close group
\[ # Match [
(?<price> # Named capturing group price
[^]]+ # Match not ] one or more times
) # Close group
\]=\[ # Match ]=[
(?<subtotal> # Named capturing group subtotal
[^]]+ # Match not ] one or more times
) # Close group
\] # Match ]
Or without named capturing groups:
(\d+)x([^[]+)\[([^]]+)\]=\[([^]]+)\]

Related

Black is not honoring #fmt: skip

With the configuration below in pyproject.toml:
[tool.black]
# How many characters per line to allow.
line-length = 120
# When processing Jupyter Notebooks, add the given magic to the list of known
# python-magics (timeit, prun, capture, pypy, python3, python, time).
# Useful for formatting cells with custom python magics.
# python-cell-magics =
# Require a specific version of Black to be running
# (useful for unifying results across many environments e.g. with a pyproject.toml file).
# It can be either a major version number or an exact version.
# required-version =
# A regular expression that matches files and directories that should be
# included on recursive searches. An empty value means all files are included
# regardless of the name. Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# include = "(\.pyi?|\.ipynb)$"
# A regular expression that matches files and directories that should be
# excluded on recursive searches. An empty value means no paths are excluded.
# Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# exclude = "/(\.direnv|\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|venv|\.svn|\.ipynb_checkpoints|_build|buck-out|build|dist|__pypackages__)/"
# Like 'exclude', but adds additional files and directories on top of the excluded ones.
# (Useful if you simply want to add to the default).
# extend-exclude =
# Like 'exclude', but files and directories matching this regex will be excluded
# even when they are passed explicitly as arguments.
# force-exclude =
# The name of the file when passing it through stdin.
# Useful to make sure Black will respect 'force-exclude' option on some editors that rely on using stdin.
# stdin-filename =
# Number of parallel workers.
# Can be a number or a range.
# workers =
and this command line:
black --config "pyproject.toml" --target-version py39 --check --diff .
the following line of code is flagged:
ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # cspell: disable-line # fmt: skip
--- properties/datasets/models.py 2022-11-30 00:01:16.590743 +0000
+++ properties/datasets/models.py 2022-11-30 00:01:18.692767 +0000
@@ -746,11 +746,13 @@
calculate the mean value of all the dataset points
return: numerical value of this function when all variables are zero
rtype: float
"""
- ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # fmt:skip
+ ave_quantity = self.exec_math(
+ math_iterable["mean"], "mean", []
+ ) # execute the "mean" fxn on the dataset # fmt:skip
return getattr(ave_quantity, "magnitude", 0.0)
def serialize(self, flat=False):
return {
"type": "dataset",
would reformat properties/datasets/models.py
Oh no! 💥 💔 💥
1 file would be reformatted, 102 files would be left unchanged.
What am I missing here?
Using black v22.10.0
Also asked here --> https://github.com/psf/black/issues/451#issuecomment-1331478945

How to parse username, ID or whole part using Ruby Regex in this sentence?

I have a sentence like this:
Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?
What I want is to get #[Pratha](user:1) and #[John](user:3): either their names and ids, or just the raw text as quoted, so that I can split it and parse the name and id myself.
There is one complication: the names Pratha and John may include non-alphabetic characters such as ', ,, -, + and so on, but never [] or ().
What I tried so far:
c = ''
f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)
But no success.

You may use
/#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
Details
# - a # char
\[ - a [
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]* - 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char.
Sample Ruby code:
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
res = s.scan(rx)
p res
# => [["Pratha", "1"], ["John", "3"]]
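If hashes are more convenient than nested arrays, the same pattern can be given named groups. This is only a minimal sketch: the constant MENTION, the helper mentions, and the extra type group (which captures the text before the colon) are hypothetical names introduced here for illustration.

```ruby
# Same pattern as in the answer above, with named groups added.
# The "type" group (text before the colon) is an addition for illustration.
MENTION = /#\[(?<name>[^\]\[]*)\]\((?<type>[^()]*):(?<id>\d+)\)/

# Hypothetical helper: returns one hash per mention found in the text.
def mentions(text)
  # String#scan yields the captured groups in order: name, type, id.
  text.scan(MENTION).map do |name, type, id|
    { name: name, type: type, id: id.to_i }
  end
end

p mentions("Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?")
```

Because the name group only excludes [ and ], names containing ', ,, - or + are still matched.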

"Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
#⇒ ["#[Pratha](user:1)", "#[John](user:3)"]
Since the line does not come from user input, you can rely on the fact that each part you are interested in starts with # and ends with ).

You could use two capturing groups to get the names and the ids:
#\[([^]]+)]\([^:]+:([^)]+)\)
That will match:
# Match # literally
\[ Match [
([^]]+) 1st capturing group: match any character except ] one or more times, using a negated character class
] Match ]
\( Match (
[^:]+: Match any character except : one or more times, then match :
([^)]+) 2nd capturing group: match any character except ) one or more times
\) Match )

Ruby Regular Expression for Parcelify (Shopify)

I'm trying to write a Regex in Ruby for a shipping query.
If postcodes match MK1 - MK10, MK19, MK43, MK46 or MK77, then allow it.
If postcodes match NN1 - NN7, NN12, NN13, NN29 or NN77, then allow it.
If postcodes match MK11 - MK18 then don't allow it.
My trouble is that in the UK our postcodes are a bit funny where you can put MK1 1TS and MK11TS and they're considered the same. By not allowing MK11, MK11TY could be misread as MK11.
I've written a regex below, and so far it will disallow MK111TS and MK11\s1TS, and allow MK1\s1TS but not MK11TS. Any help would be greatly appreciated, I've only tested this for MK11 so far.
^((?!MK11\d).)*$&^((?!MK11\s\d).)*$|(MK(1 |2 |3 |4 |5 |6 |7 |8 |9 |10 ))|(MK19)|(MK43)|(MK46)|(MK77)|(NN1)|(NN2)|(NN3)|(NN4)|(NN5)|(NN6)|(NN7)|(NN12)|(NN13)|(NN29)|(NN77)
Thanks in advance.

r = /
(?: # begin non-capture group
MK # match characters
(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77) # match one of the choices
| # or
NN # match characters
(?:1|2|3|4|5|6|7|12|13|29|77) # match one of the choices
) # end non-capture group
(?![^\sA-Z]) # the next character, if any, must be whitespace or a capital letter
/ix # case-insensitive and free-spacing
# regex definition mode
This is conventionally written
r = /(?:MK(?:1|2|...|10|19|...|77)|NN(?:1|2|...|7|12|13|29|77))(?![^\sA-Z])/i
"MK4 abc def MK11MK19ghi NN6 jkl NN13 NN29NN77".scan(r)
# => ["MK4", "NN6", "NN13", "NN29", "NN77"]
"MK11" is not matched because "11" is not in the list. "MK19" is not matched because it is followed by a character that is neither a space nor a capital letter.
Alternatively, one could write
s = (['MK'].product(%w{1 2 3 4 5 6 7 8 9 10 19 43 46 77}).map(&:join) +
['NN'].product(%w{1 2 3 4 5 6 7 12 13 29 77}).map(&:join)).join('|')
# => "MK1|MK2|...|MK10|MK19|MK43|MK46|MK77|NN1|NN2|...|NN7|NN12|NN13|NN29|NN77"
r = /(?:#{s})(?![^\sA-Z])/i
#=> /(?:MK1|MK2|...|MK10|MK19|...|MK77|NN1|NN2|...|NN7|NN12|NN13|NN29|NN77)(?![^\sA-Z])/
If the remainder of the postal code is to be included in the regex, perhaps something like the following could be done.
suffixes = %w|ES AB CD EF|.join('|')
#=> "ES|AB|CD|EF"
Then replace (?![^\sA-Z]) and the closing /ix with the following.
\s? # optionally match a space
(?:#{suffixes}) # match a valid suffix in a non-capture group
(?!\S) # do not match a non-whitespace char (negative lookahead)
/ix # case-indifferent and free-spacing regex definition mode
Note the negative lookahead is satisfied if the suffix is at the end of the string.

Now I have written the following to match the postcodes format exactly:
#format: Area Code, Localities accepted, whitespace (or not), any digit, any single letter, any single letter
((MK|mk|Mk|mK)(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77)\s\d[A-Za-z][A-Za-z]) #with whitespace
|
((MK|mk|Mk|mK)(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77)\d[A-Za-z][A-Za-z]) #without whitespace
|
((NN|nn|Nn|nN)(?:1|2|3|4|5|6|7|12|13|29|77)\s\d[A-Za-z][A-Za-z]) #with whitespace
|
((NN|nn|Nn|nN)(?:1|2|3|4|5|6|7|12|13|29|77)\d[A-Za-z][A-Za-z]) #without whitespace
This works for my purposes. I got here using Cary's answer, which has been extremely helpful. Thank you; I have upvoted it.
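The four alternatives above could probably be folded into one pattern. The following is a minimal sketch, assuming the whole input is a single postcode (hence the \A and \z anchors, which the original pattern does not have); the constant ALLOWED_POSTCODE and the helper deliverable? are hypothetical names.

```ruby
# Sketch only: one anchored, case-insensitive pattern for the allowed districts.
ALLOWED_POSTCODE = /
  \A
  (?:
    MK(?:10|19|43|46|77|[1-9])   # allowed MK districts
    |
    NN(?:12|13|29|77|[1-7])      # allowed NN districts
  )
  \s?          # optional space between the two halves
  \d[A-Z]{2}   # a digit followed by two letters
  \z
/ix

# Hypothetical helper: true when the postcode falls in an allowed district.
def deliverable?(postcode)
  ALLOWED_POSTCODE.match?(postcode.strip)
end

puts deliverable?("MK1 1TS")   # allowed district MK1, with space
puts deliverable?("MK11TS")    # read as MK1 1TS, without space
puts deliverable?("MK11 1TS")  # district MK11 is not in the list
```

Requiring the trailing digit and two letters is what separates MK11TS (read as MK1 1TS) from MK11 1TS.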

Best way to capture multiple matches

The same text message contains a fixed part that occurs once (the id of the item) and multiple lines (several references and the dimensions of each part):
..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53
I want to parse and store IdCode, References, Array with dimensions.
When I apply REGEX.match(my_text), I get only a single occurrence each of REF and CMT:
REGEX = %r{
ID\/(?<IdCode> \d{10})\s
(REF\/(?<ReferenceCode> \w{3}\-\d{3}\-\d)\s)+
(CMT\/(?<Length> \d+)\-(?<Width> \d+)\-(?<Height> \d+)\s)+
}x
The result looks like this:
IdCode: "1100008273"
ReferenceCode: "D14-219-0"
Length: "37"
Width: "37"
Height: "20"
Is there a way to capture multiple occurrences without iterating?

Suppose your string were:
str = %w| dog
ID/11000082734
REF/D14-109-0
REF/D14-209-0
CMT/49-41-31
CMT/44-34-53
cat
ID/11000082735
REF/D14-109-1
REF/D14-209-1
CMT/49-41-32
CMT/44-34-54
pig |.join("\n")
#=> "dog\nID/11000082734\nREF/D14-109-0\nREF/D14-209-0\nCMT/49-41-31\nCMT/44-34-53\ncat\nID/11000082735\nREF/D14-109-1\nREF/D14-209-1\nCMT/49-41-32\nCMT/44-34-54\npig"
Then you could write:
r = /(ID\/\d{11}) # match string in capture group 1
\n # match newline
((?:REF\/[A-Z]\d{2}-\d{3}-\d\n)+) # match consecutive REF lines in capture group 2
((?:CMT\/\d{2}-\d{2}-\d{2}\n)+) # match consecutive CMT lines in capture group 3
/x # free-spacing regex definition mode
arr = str.scan(r)
#=> [["ID/11000082734", "REF/D14-109-0\nREF/D14-209-0\n",
# "CMT/49-41-31\nCMT/44-34-53\n"],
# ["ID/11000082735", "REF/D14-109-1\nREF/D14-209-1\n",
# "CMT/49-41-32\nCMT/44-34-54\n"]]
This extracts the desired information without iterating.
At this point it may be desirable to convert arr to a more convenient data structure. For example:
arr.map do |a,b,c|
{ :id => a[/\d+/],
:ref => b.split("\n").map { |s| s[4..-1] },
:cmt => c.scan(/(\d{2})-(\d{2})-(\d{2})/).map { |e|
[:length, :width, :height].zip(e.map(&:to_i)).to_h }
}
end
#=> [{ :id=>"11000082734",
# :ref=>["D14-109-0", "D14-209-0"],
# :cmt=>[{ :length=>49, :width=>41, :height=>31 },
# { :length=>44, :width=>34, :height=>53 }
# ]
# },
# { :id=>"11000082735",
# :ref=>["D14-109-1", "D14-209-1"],
# :cmt=>[{ :length=>49, :width=>41, :height=>32 },
# { :length=>44, :width=>34, :height=>54 }
# ]
# }
# ]

Try this
(?<IdCode>\d{10,})|REF\/(?<ReferenceCode>\w{3}\-\d{3}\-\d)|CMT\/(?<Length>\d+)\-(?<Width>\d+)\-(?<Height>\d+)
Explanation:
( … ): Capturing group
?: Once or none
\: Escapes a special character
|: Alternation / OR operand
+: One or more
Input
..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53
Output:
MATCH 1
IdCode [29-40] `11000082734`
MATCH 2
ReferenceCode [45-54] `D14-109-0`
MATCH 3
ReferenceCode [59-68] `D14-209-0`
MATCH 4
ReferenceCode [73-82] `D14-219-0`
MATCH 5
Length [87-89] `59`
Width [90-92] `40`
Height [93-95] `25`
MATCH 6
Length [100-102] `38`
Width [103-105] `25`
Height [106-108] `28`
MATCH 7
Length [113-115] `59`
Width [116-118] `40`
Height [119-121] `25`
MATCH 8
Length [126-128] `37`
Width [129-131] `37`
Height [132-134] `20`
MATCH 9
Length [139-141] `40`
Width [142-144] `40`
Height [145-147] `20`
MATCH 10
Length [152-154] `37`
Width [155-157] `37`
Height [158-160] `20`
MATCH 11
Length [165-167] `49`
Width [168-170] `41`
Height [171-173] `31`
MATCH 12
Length [178-180] `44`
Width [181-183] `34`
Height [184-186] `53`
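In Ruby, a result like the match list above can be collected with String#scan. The following is a minimal sketch, assuming one ID per message and one record per line; it uses one small pattern per line type instead of the single alternation, and the helper parse_message is a hypothetical name.

```ruby
# Hypothetical helper: collects the id, all references and all dimensions.
def parse_message(text)
  {
    # First (and assumed only) ID line; nil when no ID is present.
    id: text[/^ID\/(\d+)/, 1],
    # Every REF line, keeping only the code after the slash.
    refs: text.scan(/^REF\/(\S+)/).flatten,
    # Every CMT line, converted to a hash of integer dimensions.
    dims: text.scan(/^CMT\/(\d+)-(\d+)-(\d+)/).map { |d|
      %i[length width height].zip(d.map(&:to_i)).to_h
    }
  }
end

text = <<~TEXT
  ..some random text here..
  ID/11000082734
  REF/D14-109-0
  REF/D14-209-0
  CMT/59-40-25
  CMT/38-25-28
TEXT

p parse_message(text)
```

String#scan still walks the string internally, but no explicit loop over lines is needed in the calling code.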

Ruby replace array list

I have two strings:
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
I need a string pkgs that has the content of packages without exclusion like this:
pkgs="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
I tried the following code:
pkgs = packages.gsub!( /(?<!^|,)#{exclusion}(?!,|$)/, '\1')
which does not seem to be working. What would be the best working solution in this case?

packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
(packages.split - exclusion.split).join(" ") # => "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"

You need your variables to be arrays, not strings. Then you can just use the - operator to "subtract" the items in exclusion from packages:
packages = [ "linux-image-3.2.0-4-amd64",
"linux-libc-dev",
"linux-headers-3.2.0-4-amd64",
"linux-headers-3.2.0-4-common",
"dnsutils",
"mysql-server-5.5" ]
exclusion = [ "dnsutils", "mysql-server-5.5" ]
remaining = packages - exclusion
# => [ "linux-image-3.2.0-4-amd64",
# "linux-libc-dev",
# "linux-headers-3.2.0-4-amd64",
# "linux-headers-3.2.0-4-common" ]
If you then need the values in a single string, join them together with the join method:
remaining_str = remaining.join(" ")
# => "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"

If you want to keep it simple, you can always split these strings into arrays, and join the difference.
(packages.split - exclusion.split).join ' '
String#split splits on whitespace by default. This gives you two arrays; subtracting the second from the first removes every value that appears in both. You then join the resulting array with spaces.
Longer example:
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
one = packages.split
# >> ["linux-image-3.2.0-4-amd64", "linux-libc-dev", "linux-headers-3.2.0-4-amd64", "linux-headers-3.2.0-4-common", "dnsutils", "mysql-server-5.5"]
two = exclusion.split
# >> ["dnsutils", "mysql-server-5.5"]
difference = one - two
# >> ["linux-image-3.2.0-4-amd64", "linux-libc-dev", "linux-headers-3.2.0-4-amd64", "linux-headers-3.2.0-4-common"]
finished = difference.join ' '
# >> "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
