I have a query that is supposed to find occurrences of a special character in UTF-8.
This is my query:
SELECT key FROM workspace.table1 WHERE key CONTAINS '\u266D'
However, I get the error message "Invalid string literal".
How can I properly use UTF8 characters in my query?
BigQuery accepts UTF-8 encoded input, so you should be able to pass the UTF-8 character directly in your query, rather than using the escape code:
SELECT count(*) from publicdata:samples.wikipedia where
title contains '♭'
This returns 651.
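If you are on BigQuery standard SQL rather than legacy SQL, the \uXXXX escape itself is accepted in string literals; a minimal sketch, assuming the workspace.table1 and key names from the original query:
-- Standard SQL accepts \uXXXX escapes, so the escaped form and the
-- pasted character '♭' are equivalent here; STRPOS returns 0 when absent.
SELECT key
FROM workspace.table1
WHERE STRPOS(key, '\u266D') > 0;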
I need to update a value in one table which contains a special character.
Below is the update query I executed:
UPDATE TABLE_X
SET DISPLAY_NAME = 'AC¦', NATIVE_IDENTITY='AC¦'
WHERE ID='idNumber'
The special character "¦" is not getting stored in Oracle.
I have already tried the approaches below:
I checked the character set being used in Oracle with this query:
select * from nls_database_parameters where parameter='NLS_CHARACTERSET';
It has the "US7ASCII" character set.
I tried to see whether any other character set would help, using the query below:
SELECT CONVERT('¦ ', 'ASCII') FROM DUAL;
I have tried the following encodings:
WE8MSWIN1252
AL32UTF8
BINARY - this one gives the error "ORA-01482: unsupported character set"
Before changing the character set in the DB I wanted to try Oracle's CONVERT function, but the character sets mentioned above return either a block symbol or a question mark (�).
Any idea how I can store this special symbol in the DB?
Assuming the character in question is not part of the US7ASCII character set (it does not appear to be, unless you are willing to replace it with the ASCII vertical bar character |), you cannot validly store the character in a VARCHAR2 column in this database. You have a few options:
You can change the database character set to a character set that supports all the characters you want to represent
You can change the data type of the column to NVARCHAR2, assuming your national character set is UTF-16, which it normally would be (see the sketch after this list).
You can store a binary representation of the character in some character set you know in a RAW column and convert back from the binary representation in your application logic.
I would prefer changing the database character set but that is potentially a significant change.
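A minimal sketch of the NVARCHAR2 option, assuming the TABLE_X, DISPLAY_NAME, and ID names from the question and the usual AL16UTF16 national character set:
-- Convert the column to the national character set (the size is hypothetical):
ALTER TABLE TABLE_X MODIFY (DISPLAY_NAME NVARCHAR2(100));
-- N'...' marks the literal as a national-character-set string:
UPDATE TABLE_X
SET DISPLAY_NAME = N'AC¦'
WHERE ID = 'idNumber';
Note that the N'...' literal still travels through the client character set on its way to the server, so the client NLS configuration matters as well.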
I want to look for non-UTF-8 characters in my MySQL database. When users input addresses there are often stray characters, especially when a user copies directly from a PDF file into the input box.
I tried the query below, but it returns all the rows regardless of whether they contain non-UTF-8 characters. Is there a SQL query that would target only non-UTF-8 characters?
SELECT * FROM MyTable WHERE LENGTH(MyColumn) = CHAR_LENGTH(MyColumn)
This is my database table
table name: employees
emp_num(int)
birth_date(date)
first_name (varchar(15))
last_name (varchar(20))
gender (ENUM('M','F'))
address (varchar(50))
So what I did was
SELECT * FROM employees WHERE LENGTH(address) = CHAR_LENGTH(address)
I don't know if this is correct.
This image is from my database. See the weird Ÿ: that is what is coming out, along with other characters too.
Ÿ is a valid utf8 character (hex C5B8: 2 bytes, 1 character), and a valid latin1 character (hex 9F).
So, using utf8:
mysql> SELECT LENGTH('Ÿ'), CHAR_LENGTH('Ÿ');
+--------------+-------------------+
| LENGTH('Ÿ') | CHAR_LENGTH('Ÿ') |
+--------------+-------------------+
|            2 |                 1 |
+--------------+-------------------+
So your test with LENGTH vs CHAR_LENGTH tests something, but it is not a test for "non utf8" characters.
In fact, the only "non utf8" characters are Emoji and some Chinese characters that are in utf8mb4 but not in utf8.
But maybe that was not your intended question?
Since you have not provided (1) the charset of the columns, nor (2) the charset of the connection, nor (3) what the text should have said, there is a limit to what can be diagnosed.
What is the "input box"? Is it an HTML field? Does it have
<form accept-charset="UTF-8">
Use SELECT HEX(col) ... to show us what is currently in the table. Then see "Test the data" here for a preliminary analysis of what the character(s) might be.
Other
For searching for non-alphanum:
WHERE col RLIKE '[^a-zA-Z0-9_ ]'
would include rows that have something other than letters, digits, underscore, and space.
WHERE HEX(col) RLIKE '^(..)*[89ABCDEF]'
would check for any byte with the 8th bit on. That is, not entirely 7-bit ascii.
So, either specify your problem better, or learn about REGEXP. I suspect "utf8" is not the term to chase. The above RLIKEs will catch things in latin1, too.
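As a sketch against the employees table from the earlier question (column names as posted), the two checks look like this:
-- Rows whose address contains something other than letters, digits,
-- underscore, and space:
SELECT emp_num, address
FROM employees
WHERE address RLIKE '[^a-zA-Z0-9_ ]';
-- Rows whose address contains any byte with the 8th bit set,
-- i.e. not entirely 7-bit ASCII (HEX() emits uppercase digits):
SELECT emp_num, address, HEX(address)
FROM employees
WHERE HEX(address) RLIKE '^(..)*[89ABCDEF]';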
The Hive INSTR function works incorrectly on strings with UTF-8 characters. When an accented character is part of the string, INSTR returns an incorrect character position for subsequent characters. It seems to be counting bytes instead of characters.
With the accented character in the string, it returns 8:
select INSTR("Réservation:", 'a'); returns 8
Without the accented character, it returns 7:
select INSTR("Reservation:", 'a'); returns 7
Is there a fix for this, or an alternate function that I could use?
This is what I'm getting with Hive 1.1.0:
hive>select INSTR("Réservation:", 'a');
OK
7
So there are no issues with Hive here. If you still have a problem with INSTR, write your own UDF to achieve this. For writing a UDF, refer to the link below:
Click here for UDF
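If your Hive build does show the byte-counting behaviour, one hedged workaround in plain HiveQL, assuming the search target is a literal with no regex metacharacters, is to count the characters before the first match:
-- split() cuts the string at the first 'a'; length() counts characters
-- rather than bytes (sanity check: length('é') should return 1),
-- so adding 1 gives a character-based position:
SELECT length(split('Réservation:', 'a')[0]) + 1;
-- returns 7; caveat: when 'a' is absent this yields length+1, not 0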
This is my control file
FIELDS (
dummy1 filler terminated by "cid=",
address enclosed by "<address>" and "</address>"
...
The address column in the table is varchar(10).
If the address in the file is over 10 characters then SQL*Loader cannot load it.
How can I load the address, truncating it to 10 characters?
The documentation has a section on applying SQL operators to fields.
A wide variety of SQL operators can be applied to field data with the SQL string. This string can contain any combination of SQL expressions that are recognized by the Oracle database as valid for the VALUES clause of an INSERT statement. In general, any SQL function that returns a single value that is compatible with the target column's datatype can be used.
In this case you can use the substr() function on the value from the file:
...
dummy filler terminated by "cid=",
address enclosed by "<address>" and "</address>" "substr(:address, 1, 10)"
...
The quoted "substr(:address, 1, 10)" passes the initial value from the file through the function before inserting the resulting value, which is at most 10 characters, however long the original value in the file was. Note the colon before the field name in that function call.
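In other words, the control-file clause behaves as if the field were passed through substr() in the VALUES clause of an INSERT; a rough SQL equivalent, with my_table as a hypothetical table name:
-- :address holds the raw field value read from the data file:
INSERT INTO my_table (address)
VALUES (substr(:address, 1, 10));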
If your file is XML then you might be better off loading it as an external table and then using the built-in XML query tools to extract the data you want, rather than trying to parse it through delimited field definitions.
Using PL/SQL Developer, I'm able to insert French characters into my Oracle database without any error.
Querying:
SELECT * FROM nls_database_parameters WHERE parameter = 'NLS_NCHAR_CHARACTERSET';
Output: AL16UTF16
But when I retrieve the data using a SELECT statement, it comes back as junk characters. For example:
système gets converted to système, and so on.
Any suggestion/workaround will be appreciated.
The issue was due to different NLS_LANGUAGE values at the client and the server.
At server it was: AMERICAN
Use the following query to read the parameters:
SELECT * FROM nls_database_parameters
At client it was: AMERICAN_AMERICA.WE8MSWIN1252
In PL/SQL Developer, go to Help -> About, click the Additional Info button, and scroll down.
Another thing I observed while trying to fix the issue:
The characters were not converted to junk on the first update.
But when I retrieved them (containing the non-ASCII characters) and updated again, they were converted to junk characters.
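To compare the two sides, both of the standard Oracle NLS views can be queried; a sketch:
-- Database-wide settings (what the server was created with):
SELECT parameter, value FROM nls_database_parameters
WHERE parameter IN ('NLS_LANGUAGE', 'NLS_CHARACTERSET');
-- Session settings, which reflect the client's NLS_LANG:
SELECT parameter, value FROM nls_session_parameters
WHERE parameter = 'NLS_LANGUAGE';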