how to match this pattern using regex - ruby

I am new to Ruby and am trying to use regular expressions.
Basically, I want to read a file and check whether each line has the right format.
Requirements for a line to be in the correct format:
1: The line should start with from
2: There should be one space, and only one space is allowed, unless there is a comma
3: No consecutive commas
4: The from and to values are numbers
5: from and to must each be followed by a colon
Sample input:
from: z to: 2
from: 1 to: 3,4
from: 2 to: 3
from:3 to: 5
from: 4 to: 5
from: 4 to: 7
to: 7 from: 6
from: 7 to: 5
0: 7 to: 5
from: 24 to: 5
from: 7 to: ,,,5
from: 8 to: 5,,5
from: 9 to: ,5
If I have the correct regular expression, then the output should be:
from: 1 to: 3,4
from: 2 to: 3
from: 4 to: 5
from: 4 to: 7
from: 7 to: 5
from: 24 to: 5
So in this case, these are the invalid ones:
from: z to: 2 # because the from value is z, not a number
from:3 to: 5 # because there is no space after from:
to: 7 from: 6 # because it starts with to but is supposed to start with from
0: 7 to: 5 # starts with 0 instead of from
from: 7 to: ,,,5 # because there are consecutive commas
from: 8 to: 5,,5 # two consecutive commas
from: 9 to: ,5 # starts with a comma

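OK, the regex you want is something like this:
^from: \d+(?:,\d+)* to: \d+(?:,\d+)*$
This assumes that multiple numbers are permitted in the from: column as well. If not, you want this one:
^from: \d+ to: \d+(?:,\d+)*$
The ^ and $ anchors matter: without them, a line such as from: 8 to: 5,,5 would still match on its from: 8 to: 5 prefix.
To verify that the whole file is valid (assuming all it contains are lines like this one), you could use a function like this:
def validFile(filename)
  # File.foreach closes the file when done, unlike a bare File.open
  File.foreach(filename) do |line|
    # The pattern must cover the whole line, including the from: prefix
    return false unless line =~ /\Afrom: \d+(?:,\d+)* to: \d+(?:,\d+)*$/
  end
  true
end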
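As a quick check of the anchored pattern, here is a minimal sketch that filters the sample lines from the question in memory instead of reading a file. The helper name valid_line? is illustrative and not part of the answer above, and the sketch assumes that only the to: field may hold a comma-separated list.

```ruby
# Hypothetical helper: true when the whole line (minus its trailing newline)
# has the form "from: <number> to: <number>[,<number>...]".
# Assumption: only the to: field may hold a comma-separated list.
def valid_line?(line)
  line.chomp.match?(/\Afrom: \d+ to: \d+(?:,\d+)*\z/)
end

sample = <<~TEXT
  from: z to: 2
  from: 1 to: 3,4
  from: 2 to: 3
  from:3 to: 5
  from: 4 to: 5
  from: 4 to: 7
  to: 7 from: 6
  from: 7 to: 5
  0: 7 to: 5
  from: 24 to: 5
  from: 7 to: ,,,5
  from: 8 to: 5,,5
  from: 9 to: ,5
TEXT

# Print only the lines that pass the check.
puts sample.each_line.select { |line| valid_line?(line) }
```

Run against the sample, this should print the six lines listed as expected output in the question.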

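What you are looking for is called a negative lookahead. Specifically, \d+(?!,,) says: match one or more consecutive digits that are not followed by two commas. Note that the pattern is anchored only at the start and \d+ can backtrack, so it handles the sample data below but would still accept a line such as from: 8 to: 55,,5. Here is the whole thing: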
str = "from: z to: 2
from: 1 to: 3,4
from: 2 to: 3
from:3 to: 5
from: 4 to: 5
from: 4 to: 7
to: 7 from: 6
from: 7 to: 5
0: 7 to: 5
from: 24 to: 5
from: 7 to: ,,,5
from: 8 to: 5,,5
from: 9 to: ,5
"
str.each_line do |line|
  puts(line) if line =~ /\Afrom: \d+ to: \d+(?!,,)/
end
Output:
from: 1 to: 3,4
from: 2 to: 3
from: 4 to: 5
from: 4 to: 7
from: 7 to: 5
from: 24 to: 5
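To see where a lookahead-only check and a fully anchored pattern differ, the following sketch runs both on a few edge cases. The method names and the extra test lines are illustrative assumptions, not taken from the question's data.

```ruby
# Pattern from the answer above: the lookahead is tested right after whatever
# \d+ ends up matching, and \d+ is free to give back digits when it fails.
def lookahead_match?(line)
  line.match?(/\Afrom: \d+ to: \d+(?!,,)/)
end

# Stricter sketch: describe the entire line and anchor both ends.
def anchored_match?(line)
  line.chomp.match?(/\Afrom: \d+ to: \d+(?:,\d+)*\z/)
end

# Hypothetical edge cases, chosen to show the difference.
["from: 8 to: 5,,5", "from: 8 to: 55,,5", "from: 1 to: 3,x"].each do |line|
  puts "#{line.inspect}: lookahead=#{lookahead_match?(line)} anchored=#{anchored_match?(line)}"
end
```

In the second case \d+ first matches 55, the lookahead fails, and the engine retries with \d+ matching only the first 5, which is followed by another 5 rather than two commas. Describing the whole line avoids that kind of surprise.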

Related

Windows CMD batch 'if' statement string check works sometimes with case-sensitivity

I have a simple CMD 'if' check:
set param=%9
IF %param%=="true" (
...
param comes from Python, where it is a Boolean that can be True or False; its str-converted form also starts with an upper-case letter: "True" or "False".
The problem is that when I run this CMD script as a Windows batch file on my PC, it works despite the case difference and the absence of quotes, but on another PC it works only if I add quotes and write "True" with an upper-case T:
IF "%param%"=="True" (
The only difference I can find between my PC and the other one is that my Windows language is English and the other one uses German. Both run Windows 10.
What can cause the CMD to work differently?
So, why not use findstr instead of if?
findstr checks whether the argument %~9 matches the string True or False (case-sensitively by default), and the script acts on the result.
@echo off
echo/%~9|%__APPDIR__%Findstr "True False" >nul && (
if "%%~9"=="True" (set "_str=True") else set "_str=%%~9")
if not "%_str%"=="" echo/%_str% = %~9
Sample invocations with a ninth argument, followed by their results:
Q59717331.cmd 1 2 3 4 5 6 7 8 True
rem :: results ::
%~9 = True
Q59717331.cmd 1 2 3 4 5 6 7 8 False
rem :: results ::
%~9 = False
Q59717331.cmd 1 2 3 4 5 6 7 8 TRUE
rem :: results (no results/no match) ::
Q59717331.cmd 1 2 3 4 5 6 7 8 FALSE
rem :: results (no results/no match) ::
Q59717331.cmd 1 2 3 4 5 6 7 8 true
rem :: results (no results/no match) ::
Q59717331.cmd 1 2 3 4 5 6 7 8 false
rem :: results (no results/no match) ::

High & Low Numbers From A String (Ruby)

Good evening,
I'm trying to solve a problem on Codewars:
In this little assignment you are given a string of space separated numbers, and have to return the highest and lowest number.
Example:
high_and_low("1 2 3 4 5") # return "5 1"
high_and_low("1 2 -3 4 5") # return "5 -3"
high_and_low("1 9 3 4 -5") # return "9 -5"
Notes:
All numbers are valid Int32, no need to validate them.
There will always be at least one number in the input string.
Output string must be two numbers separated by a single space, and highest number is first.
I came up with the following solution; however, I cannot figure out why the method is only returning "542" and not "-214 542". I also tried using #at, #shift and #pop, with the same result.
Is there something I am missing? I hope someone can point me in the right direction. I would like to understand why this is happening.
def high_and_low(numbers)
  numberArray = numbers.split(/\s/).map(&:to_i).sort
  numberArray[-1]
  numberArray[0]
end
high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6")
EDIT
I also tried this and receive a failed test "Nil":
def high_and_low(numbers)
  numberArray = numbers.split(/\s/).map(&:to_i).sort
  puts "#{numberArray[-1]}" + " " + "#{numberArray[0]}"
end
When the return statement is omitted, a method returns only the value of the last expression in its body. To return both values as an Array, write:
def high_and_low(numbers)
  numberArray = numbers.split(/\s/).map(&:to_i).sort
  # An explicit return with two values returns them as one Array
  return numberArray[0], numberArray[-1]
end
p high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6")
# => [-214, 542]
Sorting is wasteful for big arrays, since it does O(n log n) work where a single pass is enough. Instead, use Enumerable#minmax:
numbers.split.map(&:to_i).minmax
# => [-214, 542]
Or use Enumerable#minmax_by if you would like the results to remain strings:
numbers.split.minmax_by(&:to_i)
# => ["-214", "542"]
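The kata asks for a single string with the highest number first, so the array still has to be formatted. A minimal sketch that combines minmax with string interpolation follows. It also explains the "Nil" failure in the question's edit: puts prints its argument but returns nil, so a method ending in puts returns nil.

```ruby
# Sketch: return "<highest> <lowest>" as one string, as the kata requires.
def high_and_low(numbers)
  low, high = numbers.split.map(&:to_i).minmax
  # The last expression is the method's return value; no puts needed here.
  "#{high} #{low}"
end

p high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6")
# => "542 -214"
```

Printing is left to the caller, so the method's return value stays usable by the test suite.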

How can I filter through my groups/clusters to keep only the ones with different column2 values?

I have a file which looks something like this:
1 Ape 5138150 5140933
1 Ape 4289 7147
1 Ape 2680951 2683603
1 Ape 1484200 1486662
1 Baboon 3706008 3708636
1 Baboon 11745108 11747790
1 Baboon 3823683 3826474

2 Dog 216795245 216796748
2 Dog 14408 15922

3 Elephant 18 691
3 Ape 1 824

4 Frog 823145 826431
4 Sloth 35088 37788
4 Snake 1071033 1074121

5 Tiger 997421 1003284
5 Tiger 125725 131553

6 Tiger 2951524 2953649
6 Lion 178820 180879
Each group (or cluster) is indicated by the number in the first column (e.g. all lines starting with 1 are in group 1), and different groups are separated by a blank line, as shown above. I'm interested in column 2. I want to keep all groups that have at least two different animals in column 2 and delete all groups that have only one animal (i.e. species-specific groups). So with this file, I want to get rid of groups 2 and 5 but keep the others:
1 Ape 5138150 5140933
1 Ape 4289 7147
1 Ape 2680951 2683603
1 Ape 1484200 1486662
1 Baboon 3706008 3708636
1 Baboon 11745108 11747790
1 Baboon 3823683 3826474

3 Elephant 18 691
3 Ape 1 824

4 Frog 823145 826431
4 Sloth 35088 37788
4 Snake 1071033 1074121

6 Tiger 2951524 2953649
6 Lion 178820 180879
Is there a quick/easy way to do this? My actual file has over 10,000 different groups, so doing it manually is not a (sensible) option. I have a feeling I should be able to do this with awk, but no luck so far.
With GNU awk for length(array):
$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    delete keys
    for (i=1; i<=NF; i++) {
        split($i,f," ")
        keys[f[2]]
    }
}
length(keys) > 1
$ awk -f tst.awk file
1 Ape 5138150 5140933
1 Ape 4289 7147
1 Ape 2680951 2683603
1 Ape 1484200 1486662
1 Baboon 3706008 3708636
1 Baboon 11745108 11747790
1 Baboon 3823683 3826474

3 Elephant 18 691
3 Ape 1 824

4 Frog 823145 826431
4 Sloth 35088 37788
4 Snake 1071033 1074121

6 Tiger 2951524 2953649
6 Lion 178820 180879
You can solve it with Python:
group = []
animals = set()
with open('data') as f:
    for l in f:
        line = l.strip()
        if line == '':
            # A blank line ends the current group
            if len(animals) > 1:
                for g in group:
                    print(g)
                print('')
            group = []
            animals = set()
            continue
        group.append(line)
        animals.add(line.split()[1])
# Flush the last group if the file does not end with a blank line
if len(animals) > 1:
    for g in group:
        print(g)
data is the name of your input file.
Explanation:
Iterate over every line of the file.
If the line is not a blank line, we add the line to the group so that we can print it later. We also add the second column to the set of distinct animals.
If it is a blank line, we check whether we had more than one animal in the group. In that case we print all the lines of the group. In any case, we reset the group and animals since we are starting a new group.
The lines outside of the loop are required to write the last group if it contains more than one animal and if the file does not end with a blank line.
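The same idea can be sketched in Ruby by splitting the text on blank lines and counting the distinct values in column 2. The method name mixed_groups is hypothetical, and the sketch assumes the whole file fits in memory and that groups are separated by blank lines as described in the question.

```ruby
# Hypothetical helper: returns the groups (as strings) whose second column
# contains at least two distinct values. Assumes blank-line-separated groups.
def mixed_groups(text)
  groups = text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
  groups.select do |group|
    animals = group.lines.map { |line| line.split[1] }
    animals.uniq.size > 1
  end
end

# A few rows taken from the question's sample file.
sample = <<~DATA
  1 Ape 4289 7147
  1 Baboon 3706008 3708636

  2 Dog 216795245 216796748
  2 Dog 14408 15922

  3 Elephant 18 691
  3 Ape 1 824
DATA

puts mixed_groups(sample).join("\n\n")
```

For a real file, the text could be loaded with File.read before calling the helper. Here group 2 is dropped because Dog is its only animal.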

How to match text in between two words(a..b) till the first occurence of the ending word(b)

Can anyone help me match the text between From and the first occurrence of Subject in the following set of lines?
Input
Random Line 1
Random Line 2
From: person@example.com
Date: 01-01-2011
To: friend@example.com
Subject: This is the subject line
Random Line 3
Random Line 4
Subject: This is subject
This is the end
Output
From: person@example.com
Date: 01-01-2011
To: friend@example.com
Subject: This is the subject line
I tried the following regular expression:
/(From:.*(?i)Subject:.*?)\n/m
The above regexp selects everything up to the last Subject.
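This works (see: http://rubular.com/r/Lw9rhfwVGt). The greedy .* in your pattern runs as far as it can, so it ends at the last Subject; the lazy .*? stops at the first one:
/(From.*?Subject.*?)\n/m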
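To make the greedy versus lazy difference concrete, here is a small sketch that runs both variants on a shortened version of the input. The method names are illustrative, and the patterns use [^\n]* to take the rest of the Subject line, which is a slight variation on the regex above rather than the answer's exact pattern.

```ruby
# Lazy .*? stops at the first "Subject:" after "From:".
def lazy_block(text)
  match = text.match(/From:.*?Subject:[^\n]*/m)
  match && match[0]
end

# Greedy .* runs to the end of the text, then backs up to the last "Subject:".
def greedy_block(text)
  match = text.match(/From:.*Subject:[^\n]*/m)
  match && match[0]
end

mail = <<~MAIL
  Random Line 1
  From: person@example.com
  Date: 01-01-2011
  To: friend@example.com
  Subject: This is the subject line
  Random Line 3
  Subject: This is subject
  This is the end
MAIL

puts lazy_block(mail)
puts "---"
puts greedy_block(mail)
```

In Ruby, the /m flag makes . match newlines as well, which is what lets either pattern span several lines.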

Suggestions for data extraction in Fortran

I use Fortran 90/95 with the IBM compiler. I am trying to extract the numerical values from a block of text and write them to a file. I am getting strange output that I cannot understand: every time I execute the program, it skips the loop between 'Beta' and 'END'. I am trying to read and store the values.
The number of lines inside the Alpha and Beta blocks is not fixed, so a simple 'do loop' is of no use to me. I tried a 'do while' loop and also 'if-else', but it still skips the 'Beta' part.
Alpha Singles Amplitudes
15 3 23 4 -0.186952
15 3 26 4 0.599918
15 3 31 4 0.105048
15 3 23 4 0.186952
Beta Singles Amplitudes
15 3 23 4 0.186952
15 3 26 4 -0.599918
15 3 31 4 -0.105048
15 3 23 4 -0.186952
END
The short code is:
program test_read
  implicit none
  integer::nop,a,b,c,d,e,i,j,k,l,m,ios
  double precision::r,t,rr
  character::dummy*300
  character*15::du1,du2,du3
  open (unit=10, file="1.txt", status='old',form='formatted')
100 read(10,'(a100)')dummy
  if (dummy(1:3)=='END') goto 200
  if(dummy(2:14)=='Alpha Singles') then
    i=0
160 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
    do while(du1.ne.' Bet')
      write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'AS',du1,b,du2,c,du3,d,du4,e,r
      goto 160
    end do
  elseif (dummy(2:14)=='Beta Singles') then
170 read(10,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
    if((du1=='END'))then
      stop
    else
      write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')'BS',du1,b,du2,c,du3,d,du4,e,r
      goto 170
    end if
  end if
  goto 100
200 print*,'This is the end'
end program test_read
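Your program never handles the Beta block because, by the time the do while loop exits, it has already read the line containing Beta. Control then goes to 100, which reads the line after the Beta header, so the elseif test never sees Beta Singles. Instead, read each line once into dummy, remember which block you are in, and parse the data lines from dummy itself. Try the following: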
character(len=2):: tag
read(10,'(a100)')dummy
do while (dummy(1:3).ne.'END')
  if (dummy(2:14)=='Alpha Singles') then
    tag = 'AS'
  else if (dummy(2:14)=='Beta Singles') then
    tag = 'BS'
  else
    read(dummy,'(a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')du1,b,du2,c,du3,d,du4,e,r
    write(*,'(a2,a4,i2,a6,i1,a4,i2,a6,i1,f12.6)')tag,du1,b,du2,c,du3,d,du4,e,r
  end if
  read(10, '(a100)') dummy
end do
print*,'This is the end'

Resources