Can this be done in one regex? - ruby

I need a regex to match a string that:
has only digits 0-9 and spaces
all digits must be same
should have at least 2 digits
should start and end with digits
Matches:
11
11111
1 1 1 1 1
1 1
11 1 1 1 1 1
1 1
1 1 1
No matches:
1 has only one digit
11111 has space at the end
11111 has space at beginning
12 digits are different
11: has other character
I know a regex for each of my requirements, but that way I would need 4 separate regex
tests. Can it be done in one regex?

Yes it can be done in one regex:
^(\d)(?:\1| )*\1$
Explanation:
^ - Start anchor
( - Start parenthesis for capturing
\d - A digit
) - End parenthesis for capturing
(?: - Start parenthesis for grouping only
\1 - Back reference referring to the digit capture before
| - Or
' ' - A literal space
) - End grouping parenthesis
* - zero or more of previous match
\1 - The digit captured before
$ - End anchor
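As a quick cross-check of the pattern above, here is a minimal sketch in Python rather than Ruby. The function name `same_digit_string` is made up for illustration. It uses `re.fullmatch` in place of the `^` and `$` anchors, because Python's `$` also matches just before a trailing newline.

```python
import re

# Same pattern as above, without the anchors: fullmatch() requires the
# whole string to match.
PATTERN = re.compile(r"(\d)(?:\1| )*\1")

def same_digit_string(text):
    """Return True if text is one repeated digit, optionally separated by spaces."""
    return PATTERN.fullmatch(text) is not None

for sample in ["11", "1 1 1", "1", "11111 ", " 11111", "12", "11:"]:
    print(repr(sample), same_digit_string(sample))
```

Only the first two samples are accepted; the others fail for the reasons listed in the question.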

Consider this program:
#!/usr/bin/perl -l
$_ = "3 33 3 3";
print /^(\d)[\1 ]*\1$/ ? 1 : 0;
print /^(\d)(?:\1| )*\1$/ ? 1 : 0;
It produces the output
0
1
The answer is obvious when you look at the compiled regexes:
perl -c -Mre=debug /tmp/a
Compiling REx "^(\d)[\1 ]*\1$"
synthetic stclass "ANYOF[0-9][{unicode_all}]".
Final program:
1: BOL (2)
2: OPEN1 (4)
4: DIGIT (5)
5: CLOSE1 (7)
7: STAR (19)
8: ANYOF[\1 ][] (0)
19: REF1 (21)
21: EOL (22)
22: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1
Compiling REx "^(\d)(?:\1| )*\1$"
synthetic stclass "ANYOF[0-9][{unicode_all}]".
Final program:
1: BOL (2)
2: OPEN1 (4)
4: DIGIT (5)
5: CLOSE1 (7)
7: CURLYX[1] {0,32767} (17)
9: BRANCH (12)
10: REF1 (16)
12: BRANCH (FAIL)
13: EXACT < > (16)
15: TAIL (16)
16: WHILEM[1/1] (0)
17: NOTHING (18)
18: REF1 (20)
20: EOL (21)
21: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1
/tmp/a syntax OK
Freeing REx: "^(\d)[\1 ]*\1$"
Freeing REx: "^(\d)(?:\1| )*\1$"
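Inside a character class, \1 is not a backreference: it is just an octal escape for the character with code 1. That is why the first regex compiles to ANYOF[\1 ] while the second one contains a real REF1.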

^(\d)( *\1)+$

/^(\d)(\1| )*\1$/

Related

Problem with if condition on a "random walk" script

I'm trying to make the coordinate "x" move randomly within the interval [-1,1]. However, my code works sometimes and sometimes it doesn't. I tried ShellCheck, but it says "no issues detected!". I'm new to conditionals; am I using them wrong?
I'm running this on the Windows Subsystem for Linux and editing it in nano. Since I have a script that will plot 200 of these "random walks", the code should work consistently, but I really don't understand why it doesn't.
Here's my code:
x=0
for num in {1..15}
do
    r=$RANDOM
    if [[ $r -lt 16383 ]]
    then
        p=1
    else
        p=-1
    fi
    if [[ $x -eq $p ]]
    then
        x=$(echo "$x-$p" | bc )
    else
        x=$(echo "$x+$p" | bc )
    fi
    echo "$num $x"
done
I expect something like this:
1 -1
2 0
3 1
4 0
5 1
6 0
7 1
8 0
9 1
10 0
11 -1
12 0
13 1
14 0
15 1
But the usual output is something like this:
1 1
2 0
3 -1
4 0
5 -1
6 0
7 -1
(standard_in) 1: syntax error
8
(standard_in) 1: syntax error
9
(standard_in) 1: syntax error
10
(standard_in) 1: syntax error
11
(standard_in) 1: syntax error
12
(standard_in) 1: syntax error
13
(standard_in) 1: syntax error
14
(standard_in) 1: syntax error
15
Always stopping after a -1.
You can do this with bash:
x=$(( x - p ))
or
(( x -= p ))
and you don't need bc.
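If you keep bc, replace x=$(echo "$x-$p" | bc ) with x=$(echo "$x-($p)" | bc ). When x and p are both -1, the original builds the expression -1--1, which bc rejects with a syntax error; the parentheses turn it into -1-(-1).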
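For comparison, here is a minimal sketch of the same walk using integer arithmetic only, written in Python so that it does not depend on bc. The function name `random_walk` and the `seed` parameter are assumptions added for illustration. It keeps the script's rule (subtract p when x equals p, otherwise add p), so x never leaves [-1, 1].

```python
import random

def random_walk(steps=15, seed=None):
    """Return the x position after each step of the walk."""
    rng = random.Random(seed)
    x = 0
    positions = []
    for _ in range(steps):
        # Mirrors the script: RANDOM is 0..32767, below 16383 means p=1.
        p = 1 if rng.randrange(32768) < 16383 else -1
        # Integer arithmetic, so a negative p needs no special handling.
        x = x - p if x == p else x + p
        positions.append(x)
    return positions

for num, x in enumerate(random_walk(), start=1):
    print(num, x)
```

Because the walk starts at 0, every second position is 0 and the ones in between are 1 or -1, which matches the expected output in the question.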
One-liner equivalents to the OP's 18-line random walk script, using bash arithmetic evaluation:
x=0; printf '%-5s\n' {1..15}\ $(( x=(RANDOM%2 ? 1 : -1) * (x==0) ))
x=0; printf '%-5s\n' {1..15}\ $(( x=( x ? 0 : (RANDOM%2 ? 1 : -1) ) ))
Sample output of either, (the 2nd column will vary between runs):
1 -1
2 0
3 -1
4 0
5 1
6 0
7 1
8 0
9 -1
10 0
11 1
12 0
13 -1
14 0
15 -1
How it works:
echo {1..15}\ $(( ...some code... )) prints the numbers 1 to 15, each followed by the result of the $(( ... )) code, which is evaluated 15 times. One flaw with this approach is that each of the resulting 15 pairs of numbers (e.g. 1 -1, 2 0, etc.) appears to bash as one string, so there are 15 strings rather than 30 separate numbers.
(RANDOM%2): the % is a modulo operator and here returns the remainder when divided by 2, which is either 0 or 1.
(x==0): $x can be one of three values (-1, 0 or 1). If the previous value of $x was -1 or 1, the only legal next value is 0, so a random number is needed only when the previous value of $x was 0.
The if logic is replaced with shortcuts of the form (expr?expr:expr); these use the same logic as the OP script.
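The claim that the ternary form follows the same logic as the original if/else can be checked exhaustively, since x has only three possible values and the random step p has two. A small sketch in Python (both function names are invented for this illustration; p stands for the value of RANDOM%2 ? 1 : -1):

```python
def step_original(x, p):
    """The if/else rule from the question's script."""
    return x - p if x == p else x + p

def step_ternary(x, p):
    """The one-liner's rule: return to 0, or take the random step from 0."""
    return 0 if x else p

for x in (-1, 0, 1):
    for p in (1, -1):
        print(x, p, step_original(x, p), step_ternary(x, p))
```

In all six combinations the last two columns agree.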

Select a value in text file

I have a table of more than 500,000 rows in a text file and I need to select the rows that meet a criterion. The column header is "Quantity" and the value I want is 12 (an integer).
I use the Windows command line.
There are four columns. In this example, you can see the value "12", but there are more than twelve values.
As far as I know, it is not possible to do that with cmd alone.
You can use the findstr command to find specified strings in a text file, but cmd cannot recognize columns.
According to the comments, you want to delete each line that has 12 as its second value (word). This can be done even with findstr's crippled regex support, which makes it a simple one-line command:
findstr /rvc:"^[^ ]* 12 " input.txt>output.txt
findstr switches: r: regex support, v: exclude matching lines, c: treat the quoted text as one search string (needed because of the spaces)
REGEX:
^: start of the line
[^ ]: any character ([]) that is not (^) a space
*: zero or more of those
" 12 ": space, 12, space
This finds any line that starts with zero or more non-space characters, followed by a space, 12 and another space (or, with /v, every line that does not).
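To see what the expression selects without running findstr, the same pattern can be tried in Python. This is only a sketch for checking the regex, not a replacement for the command, and `split_rows` is a hypothetical helper name. The sample rows are the ones from the cmd session recorded in this answer.

```python
import re

# Same expression as passed to findstr: non-spaces, space, 12, space.
SECOND_IS_12 = re.compile(r"^[^ ]* 12 ")

def split_rows(rows):
    """Return (matching, non_matching), like findstr /rc and findstr /rvc."""
    matching = [row for row in rows if SECOND_IS_12.search(row)]
    non_matching = [row for row in rows if not SECOND_IS_12.search(row)]
    return matching, non_matching

rows = [
    "123.54 12 1 5",
    "123.52 12 1 4",
    "12.52 12 1 3",
    "423.05 11 2 4",
    "41 10 1 6",
    "12 22 33 4",
    "411,26 5 12 4",
]
with_12, without_12 = split_rows(rows)
print(with_12)
print(without_12)
```

Rows that contain 12 in the first or third column end up in `without_12`, because the pattern only allows 12 directly after the first run of non-space characters.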
Record of my cmd session:
>type t.txt
123.54 12 1 5
123.52 12 1 4
12.52 12 1 3
423.05 11 2 4
41 10 1 6
12 22 33 4
411,26 5 12 4
>findstr /rvc:"^[^ ]* 12 " t.txt>output.txt
>type output.txt
423.05 11 2 4
41 10 1 6
12 22 33 4
411,26 5 12 4
>findstr /rc:"^[^ ]* 12 " t.txt>output.txt
>type output.txt
123.54 12 1 5
123.52 12 1 4
12.52 12 1 3
>

How to convert unary fractional digits to binary fractional digits (is there a way?)

I worked out how to convert unary integers to binary integers by grouping pairs, with or without a remainder.
An example is given here for the decimal value 9 (LW = left value, RW = right value):
     LW                  | RW
0a)  111111111           | ->
0b)  1(11)(11)(11)(11)   |
1a)  1 1 1 1             | 1        (with no remainder, the RW would be 0)
1b)  (1 1) (1 1)         |
2a)  1 1                 | 0 1
2b)  (1 1)               |
3a)  1                   | 0 0 1
4)                       | 1 0 0 1  (which is 9)
Can you help me find the equivalent procedure for converting fractional unary digits to binary ones? Is it possible?
Thanks in advance.
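The integer procedure described above (pair up the marks, record whether one is left over, repeat on the pairs) can be written out as a short sketch. Python is used here for illustration and the function name `unary_to_binary` is an assumption. It covers only the integer case from the example, not the fractional case being asked about.

```python
def unary_to_binary(unary):
    """Convert a string of '1' marks to binary digits by repeated pairing."""
    count = len(unary)      # number of marks on the left side
    bits = []               # right side, least significant bit first
    while count > 0:
        pairs, rest = divmod(count, 2)
        bits.append(str(rest))  # a leftover mark gives 1, none gives 0
        count = pairs           # each pair becomes one mark in the next round
    return "".join(reversed(bits)) or "0"

print(unary_to_binary("111111111"))  # 1001
```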

Merge all line starting with IF and ending with semicolon in shell script

I am stuck on the problem below. I have solved it with a while loop and that works fine, but I am not able to omit the sequence number that gets appended to the string on each iteration.
I have a file with this content:
1 SELECT abc from a ;
2 .IF activi <> 1
3 THEN
4 QUIT;
5 .IF ERROR <> 0 THEN QUIT ERROR;
6 SELECT
7 a,
8 b,
9 c
10 FROM xyz;
11 .IF ERROR <> 0
12 THEN
13 QUIT ERROR;
I want to edit the same file so that it contains the following output:
1 SELECT abc from a;
2 .IF activi <> 1 THEN QUIT ;
5 .IF ERROR <> 0 THEN QUIT ERROR;
6 SELECT
7 a,
8 b,
9 c
10 FROM xyz;
11 .IF ERROR <> 0 THEN QUIT ERROR;
With sed:
$ sed '/\.IF/{:a;/; *$/!{N;s/ *\n *[0-9]*//;ta}}' file
Output:
1 SELECT abc from a ;
2 .IF activi <> 1 THEN QUIT;
5 .IF ERROR <> 0 THEN QUIT ERROR;
6 SELECT
7 a,
8 b,
9 c
10 FROM xyz;
11 .IF ERROR <> 0 THEN QUIT ERROR;
Explanation:
sed '
/\.IF/ {                   # for lines containing ".IF"
    :a;                    # define label "a" for the upcoming loop
    /; *$/! {              # if the pattern space does not end with ";" (plus optional spaces)
        N;                 # append the next line to the pattern space
        s/ *\n *[0-9]*//;  # remove the newline and the next line's leading number
        ta;                # loop back to "a" if the substitution succeeded
    }
}' file
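The loop in the sed script can also be read as ordinary code. The following is a rough Python rendering of the same idea, offered as a sketch for understanding the control flow and not as a drop-in replacement; the function name `join_if_statements` is an assumption.

```python
import re

def join_if_statements(lines):
    """Join each '.IF' statement that spans several numbered lines."""
    result = []
    pending = iter(lines)
    for line in pending:
        if ".IF" in line:
            # Keep appending lines until the statement ends with ';'.
            while not re.search(r"; *$", line):
                following = next(pending, None)
                if following is None:
                    break  # ran out of input before a ';' was found
                # Drop the line break and the next line's leading number.
                line = re.sub(r" *\n *[0-9]*", "", line + "\n" + following)
        result.append(line)
    return result

sample = [
    "2 .IF activi <> 1",
    "3 THEN",
    "4 QUIT;",
    "5 .IF ERROR <> 0 THEN QUIT ERROR;",
]
print("\n".join(join_if_statements(sample)))
```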

Combining lines that are tab delimited

I have sorted all the lines in Proteins_num numerically. I now need to combine the lines with identical numbers so that the new information is appended to the upper line. Take, for instance, the lines with number 61:
Col 1 2 3 4 5 6 7 8 9 10 11
61 PTS... cyt 1bl.. 0,38 MONOMER homo-trimer FRUC... PER...Bac..
61 PTS... 3
becomes:
Col 1 2 3 4 5 6 7 8 9 10 11
61 PTS... cyt 1bl.. 0,38 MONOMER homo-trimer FRUC... PER...Bac.. 3
Sometimes information that is missing from some columns of the upper line is found in the lower one, so the order of joining must be precise.
Is that doable if there is information in both lines?
The file is here with 1021 lines
https://www.dropbox.com/s/yuu46crp7ql4z65/Proteins_num.txt?dl=0
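An awk/gawk solution could be:
gawk '
BEGIN { SEQ = "" }
# same number as the previous line: drop the number and append the rest
$1 == SEQ { $1 = ""; printf("%s\t", $0); next }
# new number: remember it and start a new output line
$1 != SEQ { SEQ = $1; printf("\n%s", $0) }
' Proteins_num.txt
where SEQ is the number at the beginning of the previous line. When the number changes, the script starts a new output line; when it does not, the rest of the line is appended to the current output line without a line break. The next statement is needed because clearing $1 would otherwise make the second rule fire for the same line. The file must already be sorted numerically.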
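The same grouping idea can be sketched in Python with `itertools.groupby`, again assuming the input is already sorted so that equal numbers are adjacent. The function name `merge_by_number` and the three sample lines are made up for illustration; they are not taken from Proteins_num.txt. Like the awk version, this simply appends the extra fields and does not try to place them in particular columns.

```python
from itertools import groupby

def merge_by_number(lines, sep="\t"):
    """Merge consecutive lines that share the same first field."""
    merged = []
    rows = [line.split(sep) for line in lines]
    for number, group in groupby(rows, key=lambda fields: fields[0]):
        combined = [number]
        for fields in group:
            combined.extend(fields[1:])  # append everything after the number
        merged.append(sep.join(combined))
    return merged

# Hypothetical tab-delimited sample, already sorted by the first field.
sample = ["61\tPTS\tcyt", "61\tPTS\t3", "62\tABC"]
for line in merge_by_number(sample):
    print(line)
```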
