Replace unicode escape sequences in a column (UTF-8)

I am trying to normalize a column in my table, but I haven't found an easy way to do it.
I tried the NORMALIZE function and other solutions posted here, but nothing works.
I would like to turn escaped text from the table into readable text, e.g. "C\u00f3mo funciona" => "Cómo funciona".

The problem was that before saving the data, I ran json_encode, which escapes all the special characters before they are written. It turns out that BigQuery's JSON_QUERY and JSON_VALUE functions are all I need to recover and decode the special characters.
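A minimal sketch of that approach, assuming the escaped text lives in a column named description of a hypothetical table my_dataset.my_table: wrapping the raw value in double quotes turns it into a JSON string literal, which JSON_VALUE then decodes, \uXXXX escapes included.

SELECT
  description,
  JSON_VALUE('"' || description || '"') AS decoded_description  -- "C\u00f3mo funciona" => "Cómo funciona"
FROM my_dataset.my_table;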

Related

Problem when COPY INTO TABLE is performed with special characters in the files

I have a file with column data something like the sample below; as can be seen, there is a special character in the middle.
The original datatype was varchar(60). When COPY INTO TABLE is performed, it throws an error. I changed the collation to UTF-8 and it still does the same. Is there a way to solve this problem?
ABC COMPANY ▒ Sample Data
Thanks!
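The ▒ replacement character usually means the file is not actually in the encoding the loader expects (e.g. Windows-1252 rather than UTF-8). As a hedged sketch only, assuming Snowflake's COPY INTO with a hypothetical stage @my_stage and table my_table, declaring the file's real encoding at load time looks like this:

COPY INTO my_table
FROM @my_stage/data.csv
FILE_FORMAT = (TYPE = 'CSV' ENCODING = 'ISO-8859-1');  -- set to the file's actual encoding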

How to detect the data type in a column of a table in an ORACLE database (probably blob or clob)?

I have a table with a column in the format VARCHAR2(2000 CHAR). This column used to contain rows of separator-delimited numbers (ex: "3;3;780;1230;1;450.."). Now the situation has changed: some rows contain data in the old format, but some contain data like the following (ex: "BAAAABAAAAAgAAAAHAAAAAAAAAAAAAAAAQOUw6.."). Maybe it's a blob or clob. How can I check exactly? And how can I read it now? Sorry for my noob question :)
The bad news is you really can't. Your column is a VARCHAR2 so it's all character data. It seems like what you're really asking is "How do I tell if this value is a comma separated string or a binary value encoded as a string?" So the best you can do is make an educated guess. There's not enough information here to give a very good answer, but you can try things like:
If the value is numeric characters with separators (you say commas, but your example has semicolons), then treat it as such.
But what if the column value is "123"? Is that a single number or a short binary value?
If there are any letters in the value, you know it's not a separated list of numbers, so treat it as binary. But note that not all encoded binary values will contain letters.
Try decoding it as binary; if that fails, maybe it's actually the separated list. This one probably isn't a reliable test.
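A minimal sketch of that educated guess in SQL, assuming a hypothetical table MY_TABLE with the VARCHAR2 column MY_COL: rows made up only of digits and separators are classified as the old number-list format, everything else as probably-encoded binary.

SELECT my_col,
       CASE
         WHEN REGEXP_LIKE(my_col, '^[0-9;,.]+$') THEN 'separated number list'
         ELSE 'probably encoded binary'
       END AS guessed_format
FROM my_table;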

How to replace characters in Hive?

I have a string column description in a Hive table which may contain tab characters '\t'; these characters, however, mess up some views when connecting Hive to an external application.
Is there a simple way to get rid of all tab characters in that column? I could run a simple Python program to do it, but I want to find a better solution for this.
The regexp_replace UDF performs the task I needed. Below is the definition and usage from the Apache wiki.
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT):
This returns the string resulting from replacing all substrings in INITIAL_STRING
that match the Java regular expression syntax defined in PATTERN with instances of REPLACEMENT,
e.g.: regexp_replace("foobar", "oo|ar", "") returns fb
A custom SerDe might be a way to do it. Or you could use some kind of mediation process with regexp_replace:
create table tableB as
select
  columnA,
  regexp_replace(description, '\\t', '') as description
from tableA;
select translate(description, '\t', '') from myTable;  -- translate takes literal characters: use '\t' (a real tab), not the regex-style '\\t'
Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)
Char/varchar support added as of Hive 0.14.0
You can also use translate(). If the third argument is too short, the characters from the second argument that have no counterpart are deleted rather than replaced. Unlike regexp_replace(), you don't need to worry about regex special characters (see the sketch after the link below).
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
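A small usage sketch of that deletion behaviour (table-free SELECTs, just for illustration): translate() maps characters by position, and any character in the from-string without a counterpart in the to-string is removed.

select translate('a\tb\tc', '\t', '');  -- tabs have no counterpart, so they are deleted: 'abc'
select translate('foobar', 'oa', '0');  -- 'o' -> '0', 'a' deleted: 'f00br'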
There is no OOTB feature at this moment which allows this. One way to achieve it could be to write a custom InputFormat and/or SerDe that will do this for you. You might find this JIRA useful: https://issues.apache.org/jira/browse/HIVE-3751 (not directly related to your problem, though).

How to increase the maximum size of a CSV field in Magento, and where is this located?

I have one field when importing that can contain large data. It seems that CSV has an unofficial limitation of about 65000 (likely 65535*) characters, as both LibreOffice Calc and Magento truncate the data for that particular field. I have investigated well and I'm certain it is not because of a special character or quotes; the data is pretty straightforward, and the lines are similar in format to each other.
Question: How do I increase that size? Or at least, where should I look to find it?
*Note: I counted in LibreOffice Writer and it was about 65040, but with carriage return characters it could probably reach 65535.
I changed:
1) in table catalog_category_entity_text, the type of field "value" from "text" to "longtext"
2) in file app/code/core/Mage/ImportExport/Model/Import/Entity/Abstract.php
const DB_MAX_TEXT_LENGTH = 65536;
to
const DB_MAX_TEXT_LENGTH = 16777215;
and everything is OK.
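Step 1 as a SQL statement, a sketch assuming a Magento database without a table prefix (adjust the table name if yours uses one):

ALTER TABLE catalog_category_entity_text MODIFY COLUMN `value` LONGTEXT;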
You are right, there is a limitation in Magento: it creates text fields as TEXT in the MySQL database and, according to the MySQL docs, this kind of field supports a maximum of 65,535 bytes.
http://dev.mysql.com/doc/refman/5.0/es/storage-requirements.html
So you could change the column type in your Magento database to MEDIUMTEXT. I guess the correct place is the catalog_product_entity_text table, where you should modify the 'value' field type to match your needs. But please keep in mind this is dangerous: make a full backup before trying. And you may even need to play with core files... not recommended!
I'm having the same issue with 8 products out of a list of more than 400, and I think I'm not going to mess with the Magento core and database; we can shorten the description strings for those few products.
The CSV format itself couldn't care less. Since Microsoft Access allows Memo fields, which can contain quite a bit of data, I've exported 2-3k descriptions in CSV format to be imported into Magento quite successfully.
Your limitation comes either from a spreadsheet that has a per-cell or export limitation, or from the field you are trying to import into having a maximum character limit set in its table definition.
You can determine the latter by using phpMyAdmin to see what the maximum character setting is for that field.
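If you prefer plain SQL to phpMyAdmin, an equivalent check against MySQL's information_schema (using the table name suggested above):

SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_NAME = 'catalog_product_entity_text'
  AND COLUMN_NAME = 'value';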

How can I automatically .Trim() whitespace from my Linq queries?

How can I automatically .Trim() whitespace from the results of my Linq2SQL query?
It seems that if SQL has a varchar width of 255 my returned result for "abc" will have 252 chars of whitespace.
Are you using char(255) rather than varchar(255)?
If not, check the data in your database: you must be storing all those spaces in the column. LINQ to SQL only returns the column as a string; it does not pad it with spaces, and will only return the 252 spaces if they exist in your database. Are you storing all those spaces in the database? e.g. "abc______________"
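A quick way to confirm the padding comes from the column type rather than from LINQ to SQL, assuming SQL Server and a hypothetical table MyTable with the column Abc: LEN() ignores trailing spaces while DATALENGTH() does not, so a char(255) column shows the difference immediately.

SELECT Abc,
       DATALENGTH(Abc) AS bytes_stored,           -- 255 for a char(255) column
       LEN(Abc)        AS chars_without_padding   -- 3 for 'abc'
FROM MyTable;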
I'd first suggest you fix your database, but if you can't do that, then you can edit the generated code as Exoas suggests.
Try casting as a string:
(string)abc.Trim()
A quick and dirty way to make sure the fields are trimmed from your queries automatically is to modify the designer-generated getters for the fields you want trimmed so that they call the Trim method.
get
{
    // Trim the trailing padding before the value leaves the entity class.
    return this._sometext.Trim();
}
The downside is that if you change the mappings, the code will be regenerated and your edit will be lost.
