MATCH AGAINST SQL Query on MariaDB not working as expected - full-text-search

I am working on a project where I am using MySQL's MATCH ... AGAINST. I've used it before in another project without any issue, and I'm reusing the same base code, but this time I'm seeing some odd behaviour. The only real difference is that I am using MariaDB instead of MySQL.
Below is how my table is defined:
CREATE TABLE `temp_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`LogID` int(11) NOT NULL,
`LogReceived` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Host` int(11) NOT NULL,
`LogName` varchar(100) NOT NULL,
`LogLine` text NOT NULL,
PRIMARY KEY (`id`),
KEY `IDXLogID` (`LogID`,`LogReceived`,`Host`),
FULLTEXT KEY `IDXLogLine` (`LogLine`)
) ENGINE=MyISAM AUTO_INCREMENT=5838772 DEFAULT CHARSET=latin1;
The column that I am doing the full-text search against contains, for one of the rows, the following:
19/06/2019 19:01:18: Took 0 seconds to read lines for log 'Log Agent'
I then run the query as follows (LogLine is the column with the full-text index):
SELECT * FROM log_agent.temp_logs WHERE MATCH(LogLine) AGAINST ('+Log' IN BOOLEAN MODE);
The above query returns no results, even though, as shown above, the column value contains Log. If I change +Log to +seconds it returns the row, so why does it find seconds but not Log? Likewise, if I change +Log to +Agent, rows are returned, so there doesn't seem to be any rhyme or reason to what it's doing.
I've tried removing IN BOOLEAN MODE, as I didn't need it previously, but it makes no difference.

There are three caveats to using MyISAM's FULLTEXT:
Words that occur in more than half the rows are not indexed.
Words shorter than ft_min_word_len characters (4 by default for MyISAM) are not indexed; this is why the 3-character word Log is not found while seconds and Agent are.
Words in the "stop word" list are not indexed.
When filtering on things that FULLTEXT prefers to ignore, this trick is useful:
WHERE MATCH(`LogLine`) AGAINST ('...' IN BOOLEAN MODE) -- things that FT can do
AND `LogLine` LIKE "..." -- (or NOT LIKE or RLIKE or NOT RLIKE, as needed)
This will be reasonably efficient because it will first do the FT test, which will find only a few rows. Then it will go to the extra effort on those rows.
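In this case the word-length caveat is the likely culprit, since Log is only three characters. A quick sketch against the question's own table to confirm that and to apply the trick above (the variable value shown depends on your server configuration):
-- Check the minimum indexed word length (defaults to 4, so 3-character
-- words such as 'Log' never make it into the FULLTEXT index):
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Work around it by letting FULLTEXT handle a long-enough word and LIKE
-- handle the short one:
SELECT *
FROM log_agent.temp_logs
WHERE MATCH(LogLine) AGAINST ('+seconds' IN BOOLEAN MODE)
  AND LogLine LIKE '%Log%';

-- Alternatively, lower ft_min_word_len in the server configuration and
-- rebuild the index, e.g. with REPAIR TABLE temp_logs QUICK (MyISAM only).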

Related

Updating big tables with string replace runs into ORA-30036: unable to extend segment
The table looks like this (with an index on ID):
CREATE TABLE "DB"."C_DATA"
(
"ID" VARCHAR2(32 CHAR) NOT NULL ENABLE,
"KEY" VARCHAR2(512 CHAR) NOT NULL ENABLE,
"VALUE" CLOB,
"UNIQUE_ID" VARCHAR2(512 CHAR),
"DT_CREATE" TIMESTAMP (6) DEFAULT sysdate,
CONSTRAINT "C_DATA_PK" PRIMARY KEY ("ID")
)
The VALUE column holds strings of varying length, from a few characters up to around 1000 characters.
I have to update the column and replace some characters, let's say replace commas with semicolons:
UPDATE DB.C_DATA tbl
SET VALUE = REPLACE(tbl.VALUE, ',', ';')
WHERE VALUE like '%,%';
With this I now have two problems:
1) When I run this I run into the “ORA-30036: unable to extend segment …”
2) For some values I have the feeling this does not work and not all characters are replaced correctly.
Add1: As soon as I test with the bigger dataset in the DB, it fails with ORA-30036 (not even to think about the full dataset of 50,000,000 rows).
Add2: If I run it on a smaller data set, I get the problem that for longer entries it seems not to do its job, as if the replace were simply ignored. (Is this related to the second problem?)
It is clear that the tablespace could be increased, but that is not really an option right now.
Would there be a way to tell the database to split this statement into smaller jobs and execute them one after another, so that the full set does not have to fit into the undo space?
Not having this executed/applied all at once would not be an issue (as long as it finishes in finite time).
If this is not doable within an Oracle SQL statement / PL/SQL, what (scriptable) approach would be possible?
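A minimal PL/SQL sketch of the kind of batched update the question describes, assuming it is acceptable to commit between passes; the 10,000-row batch size is an arbitrary example, and each pass rescans the table for rows that still contain a comma:
BEGIN
  LOOP
    -- Update at most one batch of rows that still contain a comma.
    UPDATE DB.C_DATA tbl
       SET VALUE = REPLACE(tbl.VALUE, ',', ';')
     WHERE VALUE LIKE '%,%'
       AND ROWNUM <= 10000;
    EXIT WHEN SQL%ROWCOUNT = 0;  -- nothing left to replace
    COMMIT;                      -- release the undo used by this batch
  END LOOP;
  COMMIT;
END;
/
For very large tables, Oracle's DBMS_PARALLEL_EXECUTE package is another way to chunk a single UPDATE, but the loop above is the simplest scriptable form of "smaller jobs executed one after another".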

Not able to perform UPDATE query with sysdate - ORACLE

I am trying to run the following, fairly simple, update statement in ORACLE.
UPDATE PROJECT_BUG_SNAPSHOTS
SET SNAPSHOT_DATESTAMP = sysdate,
SNAPSHOT_TYPE = P_SNAPSHOT_TYPE
WHERE PROJECT_ID = P_PROJECT_ID
AND BUG_NO = P_BUG_NO
AND BUG_STATUS = P_BUG_STATUS;
It complains of a unique constraint violation.
The PK comprises PROJECT_ID, BUG_NO, SNAPSHOT_DATESTAMP, and SNAPSHOT_TYPE.
The table structure is
PROJECT_ID NUMBER
SNAPSHOT_DATESTAMP DATE
SNAPSHOT_TYPE VARCHAR2(20 BYTE)
BUG_NO NUMBER
BUG_STATUS VARCHAR2(100 BYTE)
This is quite weird as sysdate should be different with each run and it should never hit the "unique constraint violation" error.
The primary key is a combination of PROJECT_ID, BUG_NO, SNAPSHOT_DATESTAMP, and SNAPSHOT_TYPE. This means you allow (and probably have!) several rows with the same project id, bug number and snapshot type, but from different dates. Your update statement will attempt to set all the snapshot dates of a given project, bug number and status to the same date (the current date), thus breaking the uniqueness and failing due to a constraint violation.
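A quick check, sketched with the same placeholder parameters the question uses: if more than one row matches the UPDATE's WHERE clause, all of those rows would end up with the identical key (PROJECT_ID, BUG_NO, sysdate, P_SNAPSHOT_TYPE), so the constraint must fire.
SELECT COUNT(*)
  FROM PROJECT_BUG_SNAPSHOTS
 WHERE PROJECT_ID = P_PROJECT_ID
   AND BUG_NO     = P_BUG_NO
   AND BUG_STATUS = P_BUG_STATUS;
-- Any result greater than 1 means the UPDATE is bound to violate the PK.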

DB2 duplicate key error when inserting, BUT working after select count(*)

I have an issue that is unknown to me, and I don't know the logic/cause behind it. When I try to insert a record in a table I get a DB2 error saying:
[SQL0803] Duplicate key value specified: A unique index or unique constraint *N in *N
exists over one or more columns of table TABLEXXX in SCHEMAYYY. The operation cannot
be performed because one or more values would have produced a duplicate key in
the unique index or constraint.
That is a fairly clear message to me. But judging by the records that are already in the table, inserting my new record should not produce a duplicate key. And when I do a SELECT COUNT(*) FROM SCHEMAYYY.TABLEXXX and then try to insert the record, it works flawlessly.
How can it be that when performing the SELECT COUNT(*) I can suddenly insert the records? Is there some sort of index associated with it which might give issues because it is out of sync? I didn't design the data model, so I don't have deep knowledge of the system yet.
The original DB2 SQL is:
-- Generate SQL
-- Version: V6R1M0 080215
-- Generated on: 19/12/12 10:28:39
-- Relational Database: S656C89D
-- Standards Option: DB2 for i
CREATE TABLE TZVDB.PRODUCTCOSTS (
ID INTEGER GENERATED BY DEFAULT AS IDENTITY (
START WITH 1 INCREMENT BY 1
MINVALUE 1 MAXVALUE 2147483647
NO CYCLE NO ORDER
CACHE 20 )
,
PRODUCT_ID INTEGER DEFAULT NULL ,
STARTPRICE DECIMAL(7, 2) DEFAULT NULL ,
FROMDATE TIMESTAMP DEFAULT NULL ,
TILLDATE TIMESTAMP DEFAULT NULL ,
CONSTRAINT TZVDB.PRODUCTCOSTS_PK PRIMARY KEY( ID ) ) ;
ALTER TABLE TZVDB.PRODUCTCOSTS
ADD CONSTRAINT TZVDB.PRODCSTS_PRDCT_FK
FOREIGN KEY( PRODUCT_ID )
REFERENCES TZVDB.PRODUCT ( ID )
ON DELETE RESTRICT
ON UPDATE NO ACTION;
I'd like to see the statements... but since this question is a year old... I won't hold my breath.
I'm thinking the problem may be the
GENERATED BY DEFAULT
And instead of passing NULL for the identity column, you're accidentally passing zero or some other duplicate value the first time around.
Either always pass NULL, pass a non-duplicate value, or switch to GENERATED ALWAYS.
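A sketch of the difference against the DDL above; the literal values are made up, and because of the foreign key the PRODUCT_ID used would have to exist in TZVDB.PRODUCT:
-- Lets DB2 generate the identity value by leaving ID out of the column list;
-- this cannot collide with an existing generated value:
INSERT INTO TZVDB.PRODUCTCOSTS (PRODUCT_ID, STARTPRICE, FROMDATE, TILLDATE)
  VALUES (7, 19.99, CURRENT TIMESTAMP, NULL);

-- Supplies an explicit ID; GENERATED BY DEFAULT accepts it, and the insert
-- fails with SQL0803 if a row with ID = 1 already exists:
INSERT INTO TZVDB.PRODUCTCOSTS (ID, PRODUCT_ID, STARTPRICE, FROMDATE, TILLDATE)
  VALUES (1, 7, 19.99, CURRENT TIMESTAMP, NULL);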
Look at preceding messages in the joblog for specifics as to what caused this. I don't understand how the INSERT can suddenly work after the COUNT(*). Please let us know what you find.
Since it shows *N (i.e. n/a) as the name of the index or constraint, this suggests to me that it is not a standard DB2 object, and therefore may be a "logical file" [LF] defined with DDS rather than SQL, with a key structure different from the one you were doing your COUNT(*) on.
Your shop may have better tools to view keys on dependent files, but the method below will work anywhere.
If your table might not be the actual "physical file", check this using Display File Description, DSPFD TZVDB.PRODUCTCOSTS, in a 5250 ("green screen") session.
Use the Display Database Relations command, DSPDBR TZVDB.PRODUCTCOSTS, to find what files are defined over your table. You can then DSPFD on each of these files to see the definition of the index key. Also check there that each of these indexes is maintained *IMMED, rather than *REBUILD or *DELAY. (A wild longshot guess as to a remotely possible cause of your strange anomaly.)
You will find the DB2 for i message finder in the IBM i 7.1 Information Center (or the equivalent for other releases).
Is it a paging issue? We seem to get -0803 on inserts occasionally when a row is being held for update and it locks a page that probably contains the index needed for the insert. This is only a guess, but it appears to me that is what is happening.
I know it is an old topic, but this is what Google showed me in the first place.
I had the same issue yesterday, causing me a lot of headache. I did the same as above, checked the table definitions, keys, existing items...
Then I found out the problem was with my INSERT statement. It was trying to insert two identical records at once, but as the constraint prevented the commit, I could not find anything in the database.
Advice: review your INSERT statement carefully! :)
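For illustration, the kind of statement that triggers SQL0803 even though the table itself looks clean, sketched against the table above with made-up values: both rows in the single INSERT carry the same key, so the whole statement fails and nothing is committed.
INSERT INTO TZVDB.PRODUCTCOSTS (ID, PRODUCT_ID, STARTPRICE, FROMDATE, TILLDATE)
  VALUES (10, 7, 19.99, CURRENT TIMESTAMP, NULL),
         (10, 7, 24.99, CURRENT TIMESTAMP, NULL);  -- duplicate ID within one statement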

H2 database update table execution takes too long when MVCC=true

I have a table with 18 columns, a primary key and 8 indexes. If I make a usual connection to the H2 DB (embedded mode) and update a non-indexed field of this table, it takes around 20 seconds in the H2 console to update 50,000 records. However, if I set MVCC=true in the connection string and then try to update the SAME 50,000 records, the update does not finish even after more than 30 minutes.
Schema below
CREATE TABLE
TEMP
(
SWITCHIPADDRESS VARCHAR(16),
ID BIGINT NOT NULL IDENTITY,
MACADDRESS VARCHAR(14),
USERID VARCHAR(32),
TIMESTMP TIMESTAMP NOT NULL,
LINKCOUNT INTEGER,
HASLINKTOSWITCH BOOLEAN,
LINKIPADDR VARCHAR(16),
IFINDEX INTEGER,
PORT INTEGER,
SLOT INTEGER,
VLANID INTEGER,
IFSPEED INTEGER,
IFADMINSTATUS INTEGER,
PORTDUPLEXMODE INTEGER,
UNP VARCHAR(32),
DOMAIN INTEGER,
DISPOSITION INTEGER,
PRIMARY KEY (ID)
)
Indexes
KEY `ForwardIdx` (`SwitchIPAddress`,`MACAddress`,`slot`,`port`),
KEY `ForwardSwIPIdx` (`SwitchIPAddress`),
KEY `ForwardMACIdx` (`MACAddress`),
KEY `ForwardSlotIdx` (`slot`),
KEY `ForwardPortIdx` (`port`),
KEY `ForwardVlanIdx` (`VlanID`),
KEY `UserIdIdx` (`UserId`),
KEY `UNPIdx` (`UNP`)
I can see in the trace log file that thousands of keys are first getting removed and then added back, which is probably what takes the time. But I wonder why key realignment would be required when what is being done is a simple update of a non-indexed field.
The problem remains even if I have just one index, which is used in the where clause.
Can someone please let me know how to speed this up and improve the update performance here? Is the deletion and then re-addition of keys by design?
Our application is multi-threaded and we are getting "Timeout error trying to lock table" issues, which is why I added MVCC=true to the connection string, and now I have run into this other problem.
MVCC mode is experimental. Also, we are going to replace it at some point with a different engine, so using it is really an "at your own risk" solution.
I suggest that you turn MVCC mode off, and try to fix your timeout problem some other way.
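One possible direction for the timeout itself, not part of the answer above but sketched here in case the default lock wait is simply too short for the competing threads (10000 ms is an arbitrary value, not a recommendation):
-- Wait up to 10 seconds for table locks instead of failing quickly:
SET LOCK_TIMEOUT 10000;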

Defining a Character Set for a column For oracle database tables

I am running following query in SQL*Plus
CREATE TABLE tbl_audit_trail (
id NUMBER(11) NOT NULL,
old_value varchar2(255) NOT NULL,
new_value varchar2(255) NOT NULL,
action varchar2(20) CHARACTER SET latin1 NOT NULL,
model varchar2(255) CHARACTER SET latin1 NOT NULL,
field varchar2(64) CHARACTER SET latin1 NOT NULL,
stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id NUMBER(11) NOT NULL,
model_id varchar2(65) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (id),
KEY idx_action (action)
);
I am getting following error:
action varchar2(20) CHARACTER SET latin1 NOT NULL,
*
ERROR at line 5:
ORA-00907: missing right parenthesis
Can you suggest what I am missing?
The simple answer is that, unlike MySQL, Oracle does not allow character sets to be defined at column (or table) level. Latin1 is not a valid Oracle character set either.
The character set is consistent across the database and will have been specified when you created the database. You can find your character set by querying NLS_DATABASE_PARAMETERS:
select value
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET'
The full list of possible character sets is available for 11g R2 and for 9i, or you can query V$NLS_VALID_VALUES.
It is possible to use the ALTER SESSION statement to set the NLS_LANGUAGE or the NLS_TERRITORY, but unfortunately you can't do this for the character set. I believe this is because altering the language changes how Oracle would display the stored data whereas changing the character set would change how Oracle stores the data.
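For example (the particular language and territory values are only illustrations):
-- These NLS settings can be changed per session:
ALTER SESSION SET NLS_LANGUAGE = 'FRENCH';
ALTER SESSION SET NLS_TERRITORY = 'FRANCE';
-- There is no equivalent for the character set: NLS_CHARACTERSET is fixed
-- when the database is created and cannot be altered per session.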
When displaying the data, you can of course specify the required character set in whichever client you're using.
Character set migration is not a trivial task and should not be done lightly.
On a slight side note, why are you trying to use Latin 1? It would be more normal to set up a new database in something like UTF-8 (known in Oracle as AL32UTF8; don't use the character set named UTF8) or UTF-16 so that you can store multi-byte data effectively. Even if you don't need it now, it's wise to try (no guarantees in life) to future-proof your database so that there is no need to migrate later.
If you're looking to specify differing character sets for different columns in a database, then the better option would be to determine whether this requirement is really necessary and to try to remove it. If it is definitely necessary1 then your best bet might be to use a character set that is a superset of all potential character sets. Then, have some sort of check constraint that limits the column to specific hex values. I would not recommend doing this at all; the potential for mistakes to creep in is massive and it's extremely complex. Furthermore, different character sets render different hex values differently. This, in turn, means that you need to enforce that a column is rendered in a specific character set, which is impossible as it falls outside the scope of the database.
1. I'd be interested to know the situation
According to the provided DDL statement, there is some need to use two character sets. Oracle implements this functionality differently from MySQL, using the N* data types such as NVARCHAR2 and NCHAR. Latin1 corresponds roughly to a Western European character set, which might be the database default. So you are able to have, for example, "Latin1" (WE**) for the regular columns and some Unicode character set (UTF8..) for the N* columns.
The NVARCHAR2 datatype was introduced by Oracle for databases that want to use Unicode for some columns while keeping another character set for the rest of the database (which uses VARCHAR2). The NVARCHAR2 is a Unicode-only datatype.
The reason you want to use NVARCHAR2 might be that your DB uses a non-Unicode character set and you still want to be able to store Unicode data for some columns.
Columns in your example would be able to store the same data; however, the byte storage will be different.
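A sketch of how the question's DDL could look as valid Oracle syntax along the lines of this answer: the MySQL-only CHARACTER SET and inline KEY clauses are removed, the columns that apparently needed a separate character set use NVARCHAR2, and the index is created separately. Whether those columns really need a second character set is the point raised in the first answer.
CREATE TABLE tbl_audit_trail (
  id        NUMBER(11) NOT NULL,
  old_value VARCHAR2(255) NOT NULL,
  new_value VARCHAR2(255) NOT NULL,
  action    NVARCHAR2(20) NOT NULL,
  model     NVARCHAR2(255) NOT NULL,
  field     NVARCHAR2(64) NOT NULL,
  stamp     TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
  user_id   NUMBER(11) NOT NULL,
  model_id  NVARCHAR2(65) NOT NULL,
  PRIMARY KEY (id)
);

CREATE INDEX idx_action ON tbl_audit_trail (action);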
