What is a regex to capture the total amount from strings? [closed] - ruby

Closed 9 years ago.
I need to parse the total amount from different files. The layout of each file is different so the lines I need to parse vary.
What should the regex be for capturing, from a string, a number that falls after "Total"?
It needs to be case insensitive and should consider the closest match after "Total". There can be anything before or after the word "Total", and I need the first number that comes after it.
For example:
from string "Service charges: 10 Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount: 100 Shipping: 10"
from string "Service charges: 10 Grand Total: 100 Shipping: 10"
from string "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"
The output should be 100 in all the above cases.

If all you're really asking about is a pattern match for various strings, look at using scan and grab the numeric strings:
[
"Service charges: 10 Total: 100 Shipping: 10",
"Service charges: 10 Total Amount: 100 Shipping: 10",
"Service charges: 10 Grand Total: 100 Shipping: 10",
"Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s.scan(/\d+/)[1] }
=> ["100", "100", "100", "100"]
This assumes you want the second number in each string.
If that order is going to change, which is unlikely because it looks like you're scanning invoices, then variations on the pattern and/or scan will work. This switches it up and uses a standard regex search based on the location of "Total", some possible intervening text, followed by ":" and the total value:
[
"Service charges: 10 Total: 100 Shipping: 10",
"Service charges: 10 Total Amount: 100 Shipping: 10",
"Service charges: 10 Grand Total: 100 Shipping: 10",
"Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1] }
=> ["100", "100", "100", "100"]
To get the integer values append to_i inside the map statement:
[
"Service charges: 10 Total: 100 Shipping: 10",
"Service charges: 10 Total Amount: 100 Shipping: 10",
"Service charges: 10 Grand Total: 100 Shipping: 10",
"Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
].map{ |s| s[/Total.*?: (\d+)/, 1].to_i }
=> [100, 100, 100, 100]
For your example strings, it's probably preferable to use case-sensitive patterns to match "Total" unless you have knowledge that you will encounter "total" in lower-case. And, in that case, you should show such an example.

I think you can do this:
/Total[^:]*:\s+([0-9]+)/i
Explanation:
Total search for "total"
[^:]* followed by anything or nothing until a colon ":" is found
:\s+ read over the colon and any following white space (maybe take * instead of +)
([0-9]+) read the numbers into a group for later retrieval -> 100
I am not sure how to indicate case insensitivity in the environment you use, but usually this is done with a flag, such as the trailing i shown above.
here is a fiddle as an example

# assuming you have all your files ready in an array
a = ["Service charges: 10 Total: 100 Shipping: 10", "Service charges: 10 Total Amount: 100 Shipping: 10", "Service charges: 10 Grand Total: 100 Shipping: 10", "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10"]
# we find every total with the following regexp
a.map {|s| s[/total[^\d]*(?<total>\d+)/i, 'total']}
#=> ["100", "100", "100", "100"]
The regexp is /total[^\d]*(?<total>\d+)/i. It looks for the word "total" and skips any following non-digit characters until it finds a number (which it returns in a named capture group). The i option makes it case insensitive.
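One thing the snippets above leave open is what happens when a line contains no total at all: s[/.../, 1] returns nil, and nil.to_i silently becomes 0. A minimal sketch of a helper that makes that case explicit follows. The name extract_total and the choice to return nil are assumptions for illustration, not part of any answer above.

```ruby
# Hypothetical helper (name chosen for illustration): returns the first
# integer that follows "total" (any case), or nil when no total is present.
def extract_total(line)
  match = line.match(/total\D*(\d+)/i)
  match ? match[1].to_i : nil
end

samples = [
  "Service charges: 10 Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount: 100 Shipping: 10",
  "Service charges: 10 Grand Total: 100 Shipping: 10",
  "Service charges: 10 Total Amount (Rs.): 100 Shipping: 10",
]

totals = samples.map { |line| extract_total(line) }
p totals                         # => [100, 100, 100, 100]
p extract_total("Shipping: 10")  # => nil
```

Note that a plain "total" match will also hit words such as "Subtotal"; if that matters for your files, a word boundary (\btotal) may be worth adding.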

Related

how to grab text after newline and concat each line to make a new one in a text file no clean of spaces, tabs

I have a text like this:
Print <javascript:PrintThis();>
www.example.com
Order Number: *912343454656548 * Date of Order: November 54 2043
------------------------------------------------------------------------
*Dicders Folcisad:
* STACKOVERFLOW
*dum FWEFaadasdd:* ‎[U+200E] ‎
STACK OVERFLOW
BLVD OF SOMEPLACENICE 434
SANTA MONICA, COUNTY
LOS ANGEKES, CALI 90210
(SW)
*Order Totals:*
Subtotal Usd$789.75
Shipping Usd$87.64
Duties & Taxes Usd$0.00 ‎
Rewards Credit Usd$0.00
*Order Total * *Usd$877.39 *
*Wordskccds:*
STACKOVERFLOW
FasntAsia
xxxx-xxxx-xxxx-
*test Method / Welcome Info *
易客满x京配个人行邮税- 运输 + 关税 & 税费 / ADHHX15892013504555636
*Order Number: 916212582744342X*
*#* *Item* *Price* *Qty.* *Discount* *Subtotal*
1
Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535
Usd$141.92 4 -Usd$85.16 Usd$482.52
2
Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg,
60 Stresss CXB-034251
Usd$192.24 1 -Usd$28.83 Usd$163.41
3
34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Tablesds XDF-38452
Usd$169.20 1 -Usd$25.38 Usd$143.82
*Extra Discounts:* Extra 15% discounts applied! Usd$139.37
*Stackoverflox Contact Information :*
*Web: *www.example.com
*Disclaimer:* something made, or service sold through this website,
have not been test by the sweden Spain norway and Dumrug
Advantage. They are not intended to treet, treat, forsee or
forshadow somw clover.
I'm trying to grab each line that starts with a number, then append the second line, and finally the third line, so that each item ends up on one line. Example of the desired output:
1 Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535 Usd$141.92 4 -Usd$85.16 Usd$482.52
2 Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg, 60 Stresss CXB-034251 Usd$192.24 1 -Usd$28.83 Usd$163.41 <- 1 line
3 34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Wedscsd XDF-38452 Usd$169.20 1 -Usd$25.38 Usd$143.82 <- 1 lines as first
As you may notice, the second item spans 3 lines after its number instead of 2, which makes it harder to grab.
Because of the newline and whitespace, the next command only grabs 1:
grep -E '1\s.+'
I have also tried combining patterns with alternation:
grep -E '1\s|[A-Z].+'
But that doesn't work either: grep starts matching similar patterns in other parts of the text.
awk '{$1=$1}1' #done already
tr -s "\t\r\n\v" #done already
tr -d "\t\b\r" #done already
I'm trying to write a script that takes an uncleaned FILE as an ARGUMENT, grabs the table, and selects each item number with its respective data. Sometimes an item has 4 lines, sometimes 3, so copy/paste doesn't work for me.
I think the last line to be joined is the line starting with "Usd". In that case you only need to change the formatting in
awk '
!orderfound && /^[0-9]/ {ordernr++; orderfound=1 }
orderfound { order[ordernr]=order[ordernr] " " $0 }
$1 ~ "Usd" { orderfound = 0 }
END {
for (i=1; i<=ordernr; i++) { print order[i] }
}' inputfile
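For comparison, the same idea can be sketched in Ruby: start collecting at a line that begins with a digit, keep appending lines, and close the item at the line that begins with "Usd". The method name join_items and the shortened sample text are assumptions for illustration. Unlike the awk version, which tests whether the first field contains "Usd", this sketch requires the line to start with it.

```ruby
# Hypothetical helper: joins each item's lines into a single line.
# An item starts at a line beginning with a digit and ends at the
# first following line that begins with "Usd".
def join_items(lines)
  items = []
  current = nil
  lines.each do |raw|
    line = raw.strip
    if current.nil?
      current = [line] if line.match?(/\A\d/)
    else
      current << line
    end
    if current && line.start_with?('Usd')
      items << current.join(' ')
      current = nil
    end
  end
  items
end

text = <<~'TEXT'
  *#* *Item* *Price* *Qty.* *Discount* *Subtotal*
  1
  Random's Bounty, Product, 500 mg
  Usd$141.92 4 -Usd$85.16 Usd$482.52
  2
  Random Product, Fast Forlang,
  60 Stresss CXB-034251
  Usd$192.24 1 -Usd$28.83 Usd$163.41
  *Extra Discounts:* Extra 15% discounts applied! Usd$139.37
TEXT

joined = join_items(text.lines)
puts joined
```

Because the item only closes on the "Usd" line, it does not matter whether the description wraps onto one or two lines.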

Spring Boot Actuator 'http.server.requests' metric MAX time

I have a Spring Boot application and I am using Spring Boot Actuator and Micrometer in order to track metrics about my application. I am specifically concerned about the 'http.server.requests' metric and the MAX statistic:
{
"name": "http.server.requests",
"measurements": [
{
"statistic": "COUNT",
"value": 2
},
{
"statistic": "TOTAL_TIME",
"value": 0.079653001
},
{
"statistic": "MAX",
"value": 0.032696019
}
],
"availableTags": [
{
"tag": "exception",
"values": [
"None"
]
},
{
"tag": "method",
"values": [
"GET"
]
},
{
"tag": "status",
"values": [
"200",
"400"
]
}
]
}
I suppose the MAX statistic is the maximum execution time of a request (since I have made two requests, it is the processing time of the longer one).
Whenever I filter the metric by any tag, for example localhost:9090/actuator/metrics?tag=status:200, I get:
{
"name": "http.server.requests",
"measurements": [
{
"statistic": "COUNT",
"value": 1
},
{
"statistic": "TOTAL_TIME",
"value": 0.029653001
},
{
"statistic": "MAX",
"value": 0.0
}
],
"availableTags": [
{
"tag": "exception",
"values": [
"None"
]
},
{
"tag": "method",
"values": [
"GET"
]
}
]
}
I am always getting 0.0 as the max time. What is the reason for this?
What does MAX represent (MAX Discussion)
MAX represents the maximum time taken to execute the endpoint.
Analysis for /user/asset/getAllAssets
COUNT  TOTAL_TIME  MAX
5      115         17
6      122         17   (execution time = 122 - 115 = 7)
7      131         17   (execution time = 131 - 122 = 9)
8      187         56   (execution time = 187 - 131 = 56)
9      204         56   (execution time = 204 - 187 = 17; MAX remains 56 from now on)
Will MAX be 0 if there are only a few requests (or a single request) to a particular endpoint?
No. The number of requests to a particular endpoint does not affect MAX (see the image from Spring Boot Admin).
When will MAX be 0?
A timer resets the value to 0. When the endpoint has not been called for some time, the timer sets MAX to 0. The observed timer value is approximately 2 to 2.5 minutes (120 to 150 seconds).
DistributionStatisticConfig has .expiry(Duration.ofMinutes(2)), which resets some measurements to 0 if no request has been made in the last 2 minutes (120 seconds).
The constructor public TimeWindowMax(Clock clock,...), the method private void rotate(), and the Clock interface were written for this purpose. You may see the implementation here
How did I determine the timer value?
I took 6 samples (executed the same endpoint 6 times) and, for each, measured the difference between the time the endpoint was called and the time MAX was set back to zero.
MAX property belongs to enum Statistic which is used by Measurement
(In Measurement we get COUNT, TOTAL_TIME, MAX)
public static final Statistic MAX
The maximum amount recorded. When this represents a time, it is
reported in the monitoring system's base unit of time.
Notes:
These are the cases for the metric of a particular endpoint (here /actuator/metrics/http.server.requests?tag=uri:/user/asset/getAllAssets).
For the general metric at actuator/metrics/http.server.requests:
MAX for an endpoint is set back to 0 by a timer. In my view, MAX for /http.server.requests behaves the same way as for a particular endpoint.
UPDATE
The documentation has been updated for MAX.
NOTE: Max for basic DistributionSummary implementations such as
CumulativeDistributionSummary, StepDistributionSummary is a time
window max (TimeWindowMax). It means that its value is the maximum
value during a time window. If the time window ends, it'll be reset to
0 and a new time window starts again. Time window size will be the
step size of the meter registry unless expiry in
DistributionStatisticConfig is set to other value explicitly.
You can see the individual metrics by using ?tag=uri:{endpoint_tag}, as listed in the response of the root /actuator/metrics/http.server.requests call. The measurement values are:
COUNT: Rate per second for calls.
TOTAL_TIME: The sum of the times recorded. Reported in the monitoring system's base unit of time
MAX: The maximum amount recorded. When this represents a time, it is reported in the monitoring system's base unit of time.
As given here, also here.
The discrepancies you are seeing are due to the presence of a timer: after some time, the current MAX value for any tagged metric can be reset to 0. Can you make some new calls to your endpoint and then immediately call /actuator/metrics/http.server.requests to see a non-zero MAX value for the given tag?
The idea is to report MAX for each smaller period. When you collect these metrics over time, you get a series of MAX values rather than a single value covering a long period.
You can see this in action in the Micrometer source code. There is a rotate() method that resets the MAX value to create the behaviour described above.
You can see that it is called on every poll() call, which is triggered periodically for metric gathering.
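To make the reset behaviour concrete, here is a deliberately simplified illustration of a time-window maximum, written in Ruby. It is not Micrometer's implementation (which is Java and differs in detail); the class name WindowedMax, its methods, and the injectable clock are all assumptions made for this sketch. It only shows why a MAX that was non-zero can read 0.0 once the window has expired without new requests.

```ruby
# Simplified illustration only; NOT Micrometer's actual code.
# Tracks the maximum recorded value within a single time window and
# resets it to 0.0 once the window has expired.
class WindowedMax
  def initialize(expiry_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @expiry = expiry_seconds
    @clock = clock
    @max = 0.0
    @window_start = @clock.call
  end

  # Record one observation (e.g. a request duration in seconds).
  def record(value)
    rotate
    @max = value if value > @max
  end

  # Read the current window's maximum.
  def poll
    rotate
    @max
  end

  private

  # Start a fresh window (and forget the old maximum) after expiry.
  def rotate
    now = @clock.call
    return if now - @window_start < @expiry

    @max = 0.0
    @window_start = now
  end
end

# Usage with a fake clock so the example does not need to sleep.
fake_now = 0.0
timer_max = WindowedMax.new(120, clock: -> { fake_now })
timer_max.record(0.032696019)
fake_now = 60.0
puts timer_max.poll   # still inside the 120 s window: prints the recorded value
fake_now = 130.0
puts timer_max.poll   # window expired with no new requests: prints 0.0
```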

How to find an expression in a text file and process all lines until the next occurrence of the expression and repeat until end of the file

I have a text file:
Some comment on the 1st line of the file.
processing date: 31.8.2016
amount: -1.23
currency: EUR
balance: 1234.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY
processing date: 30.8.2016
amount: -2.23
currency: EUR
balance: 12345.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 28.08.2016 Place: 123456789XY
processing date: 29.8.2016
amount: -3.23
currency: EUR
balance: 123456.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 27.08.2016 Place: 123456789XY
I need to process the file so I will have the values on the right side, 31.8.2016, -1.23, EUR, 1234.56, etc., stored in a MySQL database.
I only achieved returning either 1 occurrence of the line which contains a particular string or all the lines using find or find_all, but this is not sufficient as I somehow need to identify the block starting with "processing date:" and ending with "additional info:" and process the values there, then process next block, and next, until the end of the file.
Any hints how to achieve this?
I'd start with this:
File.foreach('data.txt', "\n\n") do |li|
next unless li[/^processing/]
puts "'#{li.strip}'"
end
If "data.txt" contains your content, foreach will read the file and return paragraphs, not lines, of text in li. Once you have those you can manipulate them as you need. This is very fast and efficient and doesn't have the scalability problems readlines or any read-based I/O could have.
This is the output:
'processing date: 31.8.2016
amount: -1.23
currency: EUR
balance: 1234.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'
'processing date: 30.8.2016
amount: -2.23
currency: EUR
balance: 12345.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 28.08.2016 Place: 123456789XY'
'processing date: 29.8.2016
amount: -3.23
currency: EUR
balance: 123456.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 2
additional info: Amount: 2.23 EUR 27.08.2016 Place: 123456789XY'
You can see by the wrapping ' that the file is being read in chunks or paragraphs delineated by "\n\n" then each chunk is stripped to remove trailing blanks.
See the foreach documentation for more information.
split(':', 2) is your friend:
'processing date: 31.8.2016'.split(':', 2) # => ["processing date", " 31.8.2016"]
'amount: -1.23'.split(':', 2) # => ["amount", " -1.23"]
'currency: EUR'.split(':', 2) # => ["currency", " EUR"]
'balance: 1234.56'.split(':', 2) # => ["balance", " 1234.56"]
'payer reference: /VS123456/SS0011223344/KS1212'.split(':', 2) # => ["payer reference", " /VS123456/SS0011223344/KS1212"]
'type of the transaction: Some type of the transaction 1'.split(':', 2) # => ["type of the transaction", " Some type of the transaction 1"]
'additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'.split(':', 2) # => ["additional info", " Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"]
From that you can do:
text = 'processing date: 31.8.2016
amount: -1.23
currency: EUR
balance: 1234.56
payer reference: /VS123456/SS0011223344/KS1212
type of the transaction: Some type of the transaction 1
additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY'
text.lines.map{ |li| li.split(':', 2).map(&:strip) }.to_h
# => {"processing date"=>"31.8.2016", "amount"=>"-1.23", "currency"=>"EUR", "balance"=>"1234.56", "payer reference"=>"/VS123456/SS0011223344/KS1212", "type of the transaction"=>"Some type of the transaction 1", "additional info"=>"Amount: 1.23 EUR 29.08.2016 Place: 123456789XY"}
There are a number of ways to continue parsing the information into more usable data but that's for you to figure out.
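Putting the two pieces together, a minimal sketch might read the file paragraph by paragraph and turn each block into a Hash, ready to be mapped onto database columns. The method names parse_block and parse_transactions and the shortened sample data are assumptions for illustration. The sketch also assumes blocks are separated by a blank line, as in the foreach example above.

```ruby
require 'tempfile'

# Parse one "key: value" paragraph into a Hash, splitting only on the
# first colon so values that contain colons stay intact.
def parse_block(block)
  block.lines
       .map { |line| line.split(':', 2).map(&:strip) }
       .select { |pair| pair.size == 2 }
       .to_h
end

# Read blank-line-separated blocks from a file and return an Array of
# Hashes, skipping anything that is not a transaction block.
def parse_transactions(path)
  File.foreach(path, "\n\n")
      .select { |chunk| chunk[/^processing date:/] }
      .map { |chunk| parse_block(chunk.strip) }
end

sample = <<~DATA
  Some comment on the 1st line of the file.

  processing date: 31.8.2016
  amount: -1.23
  currency: EUR
  additional info: Amount: 1.23 EUR 29.08.2016 Place: 123456789XY

  processing date: 30.8.2016
  amount: -2.23
  currency: EUR
  additional info: Amount: 2.23 EUR 28.08.2016 Place: 123456789XY
DATA

# Write the sample to a temporary file so the example is self-contained.
transactions = Tempfile.create('data') do |f|
  f.write(sample)
  f.flush
  parse_transactions(f.path)
end

transactions.each { |t| p t }
```

Each Hash can then be handed to whatever database layer you use; converting "31.8.2016" to a date and "-1.23" to a decimal is left out here.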

How to exchange correctly using the Money gem

I use the Money gem to work with different currencies. I'm seeing a strange behavior with the "JPY" currency.
I have the following rates:
config.add_rate('USD', 'EUR', 0.92)
config.add_rate('USD', 'JPY', 123.0)
Trying to exchange currencies, I get strange results:
10.to_money.exchange_to('EUR')
=> #<Money fractional:920 currency:EUR>
10.to_money.exchange_to('JPY')
=> #<Money fractional:1230 currency:JPY>
The "JPY" conversion should be #<Money fractional:123000 currency:JPY>. Any ideas on what's going on?
It really depends on the definition of the Currency. The code below shows that 10 USD is indeed equal to 1230 yen.
require "rails"
require "money-rails"
Money.add_rate('USD', 'EUR', 0.92)
Money.add_rate('USD', 'JPY', 123.0)
p 10.to_money.exchange_to('JPY') == Money.new(1230,"JPY")
#=> true
Your expectation of 123000 turns out to be incorrect once you inspect the JPY currency:
p Money.new(1230,"JPY").currency
#<Money::Currency id: jpy, priority: 6, symbol_first: true, thousands_separator: ,, html_entity: ¥, decimal_mark: ., name: Japanese Yen, symbol: ¥, subunit_to_unit: 1, exponent: 0.0, iso_code: JPY, iso_numeric: 392, subunit: , smallest_denomination: 1>
The important field to note in the Currency definition is subunit_to_unit: 1. As per the documentation:
:subunit_to_unit the proportion between the unit and the subunit
This means that for yen the fractional value is already in yen, so it is not multiplied by 100 as it is for USD or EUR.
p 10.to_money.exchange_to('EUR')
#=> #<Money fractional:920 currency:EUR>
p 10.to_money.exchange_to('JPY')
#=> #<Money fractional:1230 currency:JPY>
Below is Currency definition for EUR
#<Money::Currency id: eur, priority: 2, symbol_first: true, thousands_separator: ., html_entity: €, decimal_mark: ,, name: Euro, symbol: €, subunit_to_unit: 100, exponent: 2.0, iso_code: EUR, iso_numeric: 978, subunit: Cent, smallest_denomination: 1>
In the case of EUR, subunit_to_unit: 100 indicates that the fractional value is in cents (or equivalent).
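The arithmetic behind the two results can be reproduced without the gem. The following is a plain-Ruby sketch of the calculation only, not the Money gem's API: the constant and method names are assumptions for illustration, and the rates and subunit_to_unit values are the ones quoted above.

```ruby
# Plain-Ruby illustration of the arithmetic; not the Money gem's API.
# subunit_to_unit values as shown in the currency definitions above
# (USD is 100, i.e. cents).
SUBUNIT_TO_UNIT = { 'USD' => 100, 'EUR' => 100, 'JPY' => 1 }.freeze

# Rates from the question, kept as Rationals to avoid float rounding.
RATES_FROM_USD = {
  'EUR' => Rational('0.92'),
  'JPY' => Rational('123.0')
}.freeze

# Convert whole USD units into the target currency's fractional amount.
def usd_to_fractional(usd_units, target)
  target_units = usd_units * RATES_FROM_USD.fetch(target)
  (target_units * SUBUNIT_TO_UNIT.fetch(target)).round
end

p usd_to_fractional(10, 'EUR')  # => 920  (9.20 EUR expressed in cents)
p usd_to_fractional(10, 'JPY')  # => 1230 (1230 yen; yen has no subunit)
```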

symfony2 translating errormessages

I want to translate error messages inside validation.yml.
If I have a normal "NotBlank" rule, it works like following:
- NotBlank: { message: not.blank.firstname }
But what if there are some further rules like:
- NotBlank: { message: not.blank.username }
- Length:
    min: 7
    max: 50
    minMessage: "Your Username must be at least {{ limit }} characters length"
This works, but how should I handle the minMessage? I also want to give users a hint about the minimum length of the input.
You can do something like this:
- NotBlank: { message: not.blank.username }
- Length:
    min: 7
    max: 50
    minMessage: 'username.minLength'
    maxMessage: 'username.maxLength'
Your validators.LANG.yml:
username:
    minLength: "Your Username must be at least 7 characters long"
    maxLength: "Your Username must be at most 50 characters long"
