oracle - regex to find parts of a version number

I have a field in a table for software-style version numbers, i.e., one to three numeric values separated by dots, but these are not decimal numbers. For example, the version after 3.9 might be 3.10, then 3.11, etc. I'm trying to build a query to find the different parts of the version number (major version, minor version, and build number), using REGEXP_SUBSTR in Oracle:
SELECT version,
to_number(regexp_substr(version, '[^.]+', 1)) major,
to_number(regexp_substr(version, '[^.]+', 2)) minor,
to_number(regexp_substr(version, '[^.]+', 3)) build
FROM mytable
However, I'm getting some weird behavior which tells me I don't have the regex quite right. For example, if my version number has two or three single-digit parts, the second digit comes up as both the "minor" and "build" numbers. Here are some samples of the results:
"3" -> 3/null/null (the intended behavior)
"3.0" -> 3/0/0
"3.0.1" -> 3/0/0
"33.0" -> 33/3/0
How can I change the regex (or the whole query) to more easily get the parts correctly, and nulls for parts that aren't present?
FYI, I adapted these expressions from this old question about sorting by version numbers: Find the greatest version

You're missing an argument to regexp_substr(); you're supplying the starting position for each search, not the occurrence you want to match.
At the moment it's doing:
major: start at the first character, find any non-period characters.
minor: start at the second character, find any non-period characters.
build: start at the third character, find any non-period characters.
which means for your values:
"3" - major starts at "3" and stops, and finds nothing for minor/build.
"3.0" - major starts at "3" and stops at the first period; minor starts at the second character, so ".0", skips the period because it has to find at least one non-period so moves on to the zero, and stops at the end; build starts at the third character, so "0", and stops at the end.
"3.0.1" - major starts at "3" and stops at the first period; minor starts at the second character, so ".0", skips the first period because it has to find at least one non-period so moves on to the zero, and stops at the second period; build starts at the third character, so "0", and also stops at second period.
"33.3" - major starts at first "3" and stops at the first period, giving 33; minor starts at the second character, so the second "3", and stops at the first period; build starts at the third character, so ".0", skips the period because it has to find at least one non-period so moves on to the zero, and stops at the end, giving 0.
You need to supply both of those arguments:
SELECT version,
to_number(regexp_substr(version, '[^.]+', 1, 1)) major,
to_number(regexp_substr(version, '[^.]+', 1, 2)) minor,
to_number(regexp_substr(version, '[^.]+', 1, 3)) build
FROM mytable
VERSION   MAJOR   MINOR   BUILD
3         3       null    null
3.0       3       0       null
3.0.1     3       0       1
33.0      33      0       null
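If it helps to see the two arguments in isolation, here is a small illustration of my own (run against dual, not your table); the third argument is where scanning starts, the fourth is which match is returned:
-- Hypothetical example, not part of the original query:
SELECT regexp_substr('33.10.7', '[^.]+', 1, 2) AS second_occurrence, -- '10': the second dot-free chunk
       regexp_substr('33.10.7', '[^.]+', 2, 1) AS from_position_2    -- '3': scanning starts at the second character, inside "33"
FROM dual;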

Related

Confusion regarding Delay inside an always block in Verilog

I am referring to the popular paper: Correct Methods For Adding Delays To Verilog Behavioral Models by Cummings
http://www.sunburst-design.com/papers/CummingsHDLCON1999_BehavioralDelays_Rev1_1.pdf
always @(a or b or ci)
#12 {co, sum} = a + b + ci;
From the paper,
[If the] a input changes at time 15 as shown in Figure 3, then if the a, b and ci inputs all change during the next 9ns, the outputs will be updated with the latest values of a, b and ci. This modeling style has just permitted the ci input to propagate a value to the sum and carry outputs after only 3ns instead of the required 12ns propagation delay.
Can anyone tell me why this is true?
We are using a blocking assignment here, so shouldn't the always block remain inactive from 15 ns to 27 ns because the current pass through the always block has not completed? But here it seems to remain active, meaning the always block gets triggered whenever a change is noticed.
It helps to reformat the code with a different layout that captures more accurately what is happening:
always
  @(a or b or ci)            // line 1
  #12                        // line 2
  {co, sum} = a + b + ci;    // line 3
The first line suspends the always process until it sees a change in one of the listed signals, which happens at time 15.
The second line suspends the process for 12 time units. During this time period, there are changes to a, b, and ci that are ignored.
The third line wakes up at time 27 (15+12) and uses the current values of a, b, and ci and makes a blocking assignment to {co, sum}.
Since this is an always block, after the assignment it loops back to the first line and waits for another change.
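For contrast, a style often used to model a pure transport delay (and, if I remember the Cummings paper correctly, the one it recommends) evaluates the inputs immediately and only delays the update, so the process never sleeps through input changes. A minimal sketch reusing the same signals:
// Sketch only: the right-hand side is evaluated as soon as an input changes,
// the non-blocking assignment to {co, sum} is scheduled 12 time units later,
// and the process loops straight back to the event control, so no change is missed.
always @(a or b or ci)
  {co, sum} <= #12 a + b + ci;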

Print lines around position in the file

I'm importing a big CSV file (5 GB) into BigQuery and got information about an error in the file and its position, specified as a byte offset from the start of the file (for example, 134683757). I'd like to look at the lines around this error position.
Some example lines of the file:
field1, field2, field3
abc, bcd, efg
...
dge, hfr, kdf,
dgj, "a""a", fbd # in this line is an invalid csv element and I get error, let's say on the position 134683757
skd, frd, lqw
...
asd, fij, fle
I need some command to show the lines around the error, like:
dge, hfr, kdf,
dgj, "a""a", fbd
skd, frd, lqw
I tried sed and awk but I didn't find any simple solution.
It was definitely not clear from the original version of the question that you only got a byte offset from the start of the file.
You need to get a better position from the software generating the error; the developer was lazy in reporting an unusable number. It is reasonable to request a line number (and preferably offset within the line), rather than (or as well as) the byte offset from the start.
Assuming that the number is a byte position in the file, that gets tricky. Most Unix utilities work with lines (of variable length). I'd be tempted to write some C code to do the job, but that might be beyond you (and no shame in that).
Failing that, your best bet is likely the dd command. If the number reported is 134683757, I'd guess that your lines are probably not more than 1 KiB each (adjust the numbers if they're bigger, or smaller), and then use:
dd if=big.csv of=extract.csv bs=1 skip=$((134683757 - 3 * 1024)) count=6144
echo >> extract.csv
You'd then look at extract.csv. The raw dd output probably won't have a newline at the end of the last line (the echo >>extract.csv fixes that). The output will probably start part way through a record and end part way through another record. However, you're likely to have the relevant information, as well as some irrelevant information. As I said, adjust the numbers to suit your exact situation.
The trickiest part is identifying exactly where the byte offset is in the file you get. With custom C code, that can be provided easily (more easily). With the output from dd, you have to do the calculation yourself.
awk -v offset=$((134683757 - 3 * 1024)) '
{ printf "%9d: %s\n", offset, $0; offset += length($0) + 1 }
' extract.csv
That takes the starting offset from the dd command and prefixes the (remnant of the) first line with that number and the data; it then adds the line's length to the offset, plus one for the newline that wasn't counted, and continues to the end of the file. That gives you the start offset for each line in the extracted data. You can see where your actual start was by looking at the offsets; you should be able to identify which record that was.
You could use a variant of this Awk script that reads the whole file line by line, tracks the offset (as well as the line numbers), and prints the data when it gets to the vicinity of the problem.
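For example, a sketch of that variant, assuming the reported offset 134683757 and three lines of context either side (the LC_ALL=C prefix makes length() count bytes rather than multi-byte characters, and the +1 per line assumes plain LF line endings):
LC_ALL=C awk -v target=134683757 -v ctx=3 '
{
    buf[NR] = $0
    # offset is the byte position (0-based) of the start of the current line
    if (!hit && offset <= target && target < offset + length($0) + 1) {
        hit = NR                      # the reported byte falls inside this line
        hitoff = offset
    }
    if (!hit) delete buf[NR - ctx]    # before the hit, keep only the last ctx lines
    offset += length($0) + 1          # +1 for the newline
    if (hit && NR >= hit + ctx) exit  # enough context after the hit; stop reading
}
END {
    if (!hit) { print "offset is past the end of the file" > "/dev/stderr"; exit 1 }
    printf "byte %d falls in line %d (line starts at byte %d)\n", target, hit, hitoff
    for (i = hit - ctx; i <= hit + ctx; i++)
        if (i in buf) printf "%6d: %s\n", i, buf[i]
}
' big.csv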
In times long past, I had to deal with data from 1/2 inch mag tapes (those big circular tapes you see in old movies) where the files generated on a mainframe seemed sanely formatted for the first few tens of megabytes, but then the format changed to some alternative format for a few megabytes, and then reverted to the original format once more. I never did find out why; I just learned how to deal with it. Trial and error!

In Asciidoc (and Asciidoctor), how do I format console output best?

In my adoc document, I want to show some output logging to the console.
Should I use [source], [source,shell] or nothing before the ----?
----
Solving started: time spent (67), best score (-20init/0hard/0soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
CH step (0), time spent (128), score (-18init/0hard/0soft), selected move count (15), picked move ([Math(101) {null -> Room A}, Math(101) {null -> MONDAY 08:30}]).
CH step (1), time spent (145), score (-16init/0hard/0soft), selected move count (15), picked move ([Physics(102) {null -> Room A}, Physics(102) {null -> MONDAY 09:30}]).
----
I'd argue it's not really source code (it's output), and I definitely don't want output that happens to contain shell-like syntax to be syntax-highlighted as shell (because it's not).
The [source] and plain ---- notations are identical in this case. I would use either (your preference), without the shell type specifier, to get plaintext.
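For example, both of these render the log as a plain listing block (log lines shortened here):
[source]
----
Solving started: time spent (67), best score (-20init/0hard/0soft), ...
CH step (0), time spent (128), score (-18init/0hard/0soft), ...
----
or, equivalently:
----
Solving started: time spent (67), best score (-20init/0hard/0soft), ...
CH step (0), time spent (128), score (-18init/0hard/0soft), ...
----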

`objdump` MIPS64 Instruction Encoding—Nonexistent Instruction?

I have a MIPS64 binary (readelf tells me it's release 2), and using a corresponding objdump I can see that the first instruction of __start is:
1200009a0: 03e00025 move zero,ra
I do not understand this. Looking at the ISA[note], the opcode (first six bits) is 000000₂, corresponding to the "special" block with function 100101₂ (last six bits): the or instruction (ref. pg. 413). In any case, we see that move is not an instruction anyway (ref. §3.2).
However, I notice that other instructions present in the file do exist in the ISA and are encoded correctly.
What's going on? Is this an error in objdump or something? How do I resolve it?
[note] Apparently MIPS64 comes in six releases. Releases 1–5 are mostly compatible, while Release 6 changes many things. I wasn't able to find a Release 2 specification, so I linked the Release 5 one. move doesn't appear in Releases 1, 5, or 6 at least, which are all the specifications I could find.

lua math.random first randomized number doesn't reroll

So I'm new to Lua and am writing a simple guess-the-number script, but I've found a weird quirk with math.random and I would like to understand what's happening here.
So I create a random seed with math.randomseed(os.time()), but when I go to get a random number, like this:
correctNum = math.random(10)
print(correctNum)
it always gets the same random number every time I run the script, unless I call math.random twice (irrespective of the arguments given):
random1 = math.random(10)
print(random1)
random2 = math.random(10)
print(random2)
in which case the first random number never changes between executions, but the second one does.
Just confused about how randomization works in Lua and would appreciate some help.
Thanks,
-Electroshockist
Here is the full working code:
math.randomseed(os.time())
random1 = math.random(10)
print(random1)
random2 = math.random(10)
print(random2)
repeat
  io.write "\nEnter your guess between 1 and 10: "
  guess = io.read()
  if tonumber(guess) ~= random2 then
    print("Try again!")
  end
  print()
until tonumber(guess) == random2
print("Correct!")
I guess you are running the script twice within the same second. The resolution of os.time() is one second, i.e. if you run the script twice within the same second, you start with the same seed.
os.time ([table])
Returns the current time when called without arguments, or a time representing the date and time specified by the given table. This table must have fields year, month, and day, and may have fields hour, min, sec, and isdst (for a description of these fields, see the os.date function).
The returned value is a number, whose meaning depends on your system. In POSIX, Windows, and some other systems, this number counts the number of seconds since some given start time (the "epoch"). In other systems, the meaning is not specified, and the number returned by time can be used only as an argument to date and difftime.
Furthermore, you are rolling a number between 1 and 10, so there is a 0.1 chance of getting the same number again (which is not that small).
For better methods to seed random numbers, take a look here: https://stackoverflow.com/a/31083615
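A sketch of one common workaround (my own suggestion, not from your code): seed once per run, then throw away the first few values, which on some platforms track the seed closely, and only then draw the number you actually use.
-- Sketch: seed from the clock once, then discard a few initial values.
-- On some platforms the first result of math.random() changes very little
-- when the seed only changes by a second or two.
math.randomseed(os.time())
for _ = 1, 3 do
  math.random()
end

local correctNum = math.random(10)  -- use this one for the game
print(correctNum)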
