SQLAlchemy Oracle can't insert characters with accents - oracle

I've got a Flask-based project that uses an Oracle database and communicates through SQLAlchemy and the cx_Oracle driver. My problem is that I have a simple table with two String columns:
class Example(Base):
    __tablename__ = 'example'
    id = Column(Integer, primary_key=True)
    title = Column(String(255))
    description = Column(String(1024))
And when I try to save values with accents I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 5: ordinal not in range(128)
The reported character differs depending on the value of the text.
Here's an example of the values:
object = Example()
object.title = 'É its a character with accent'
object.description = 'Á another characters with accent'
db_session.add(object)
db_session.commit()
Do you have any idea what I can do to fix this? Some configuration?
Thanks :)
UPDATE:
As suggested I've tried 2 other ways:
class Example(Base):
    __tablename__ = 'example'
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))
    description = Column(Unicode(1024))
And
class Example(Base):
    __tablename__ = 'example'
    id = Column(Integer, primary_key=True)
    title = Column(String(255, convert_unicode=True))
    description = Column(String(1024, convert_unicode=True))
Still got the same error.

That is because the values you are using, especially the accented characters, are not in the ASCII table. Please try to declare the title property as:
title = Column(String(255, convert_unicode=True))
This may help; if not, declare it as Unicode instead of String.
For more information you can also check the documentation here:
http://docs.sqlalchemy.org/en/latest/core/type_basics.html
You should also make sure the optional encoding parameter of your create_engine() call is set to "utf-8" or "latin1", depending on the charset you need to accept. Of course, "utf-8" covers everything you are likely to need.
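A minimal sketch of that setup, assuming cx_Oracle and a SQLAlchemy 1.x engine (the connection string and the NLS_LANG value below are placeholders, not taken from the question):
import os
from sqlalchemy import create_engine

# Assumption: tell the Oracle client libraries to use UTF-8; without a
# sane NLS_LANG, cx_Oracle can fall back to ASCII and raise the
# UnicodeEncodeError above when it meets accented characters.
os.environ['NLS_LANG'] = '.AL32UTF8'

# encoding is the charset SQLAlchemy itself uses when it has to
# encode/decode Python strings (the DSN below is a placeholder).
engine = create_engine(
    'oracle+cx_oracle://user:password@host:1521/dbname',
    encoding='utf-8',
)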

Related

Fix tokenization to tensors with padding Huggingface

I'm trying to tokenize my dataset with the following preprocessing function. I've already downloaded the tokenizer with AutoTokenizer from the Spanish BERT checkpoint.
max_input_length = 280
max_target_length = 280
source_lang = "es"
target_lang = "en"
prefix = "translate spanish_to_women to spanish_to_men: "

def preprocess_function(examples):
    inputs = [prefix + ex for ex in examples["mujeres_tweet"]]
    targets = [ex for ex in examples["hombres_tweet"]]
    model_inputs = tokz(inputs,
                        padding=True,
                        truncation=True,
                        max_length=max_input_length,
                        return_tensors='pt')
    # Set up the tokenizer for targets
    with tokz.as_target_tokenizer():
        labels = tokz(targets,
                      padding=True,
                      truncation=True,
                      max_length=max_target_length,
                      return_tensors='pt')
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
And I get the following error when trying to pass my dataset object through the function.
I've already tried dropping the columns that contain strings. I've also seen that when I do not set return_tensors it does tokenize my dataset (but later on I have the same problem when trying to train my BERT model). Does anyone know what might be going on? *inserts crying face*
Also, I've tried tokenizing without return_tensors and then calling set_format, but that returns an empty dataset object. *inserts another crying face*
My Dataset looks like the following
And an example of the inputs
So I just do:
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)
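For what it's worth, a common workaround (a sketch only, not from this thread; it reuses tokz, prefix, the length constants, and raw_datasets from the question, and assumes a model variable for the collator) is to tokenize without padding or return_tensors inside map() and let a data collator build the padded PyTorch tensors per batch:
from transformers import DataCollatorForSeq2Seq

def preprocess_function(examples):
    inputs = [prefix + ex for ex in examples["mujeres_tweet"]]
    targets = [ex for ex in examples["hombres_tweet"]]
    # No padding/return_tensors here: map() then stores plain lists of
    # token ids, avoiding ragged-tensor errors between batches.
    model_inputs = tokz(inputs, truncation=True, max_length=max_input_length)
    with tokz.as_target_tokenizer():
        labels = tokz(targets, truncation=True, max_length=max_target_length)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)
# The collator pads input_ids and labels batch by batch at training
# time and hands the model rectangular PyTorch tensors.
data_collator = DataCollatorForSeq2Seq(tokz, model=model)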

Oracle XMLType - Loading from XML flat-file as chunks

Using Java 8 and Oracle 11g. This is about loading XML data from a flat file into an Oracle XMLType column. I can make it work with this code:
private String readAllBytesJava7(String filePath) {
    String content = "";
    try {
        content = new String(Files.readAllBytes(Paths.get(filePath)));
    } catch (IOException e) {
        log.error(e);
    }
    return content;
}
pstmt = oracleConnection.prepareStatement("update MYTABLE set XML_SOURCE = ? where TRANSACTION_NO = ?");
xmlFileAsString = this.readAllBytesJava7(fileTempLocation);
xmlType = XMLType.createXML(oracleConnection, xmlFileAsString);
pstmt.setObject(1,xmlType);
pstmt.setInt(2, ataSpecHeader.id);
pstmt.executeUpdate();
But as you might surmise, that only works for small XML files... Anything too large will cause a memory exception.
What I'd like to do is load the XML file in "chunks" as described here:
https://docs.oracle.com/cd/A97335_02/apps.102/a83724/oralob2.htm
and
https://community.oracle.com/thread/4721
Those posts show how to load a BLOB/CLOB column from a flat-file by "chunks". I can make it work if the column is blob/clob, but I couldn't adapt it for an XMLType column. Most of what I found online in regards to loading an XMLType column deals with using the oracle-directory object or using sql-loader, but I won't be able to use those as my solution. Is there any kind of post/example that someone knows of for how to load an XML file into an XMLType column as "chunks"?
Additional information:
I'm trying to take what I see in the posts for BLOB/CLOB and adapt it for XMLType. Here are the issues I'm facing:
sqlXml = oracleConnection.createSQLXML();
pstmt = oracleConnection.prepareStatement(
    "update MYTABLE set XML_SOURCE = XMLType.createXML('<e/>') " +
    "where TRANSACTION_NO = ?");
pstmt.setInt(1, ataSpecHeader.id);
pstmt.executeUpdate();
With BLOB/CLOB, you start out by setting the field to "empty" (so it isn't null). I'm not sure how to do this with XMLType; the closest I can get is to set it to some kind of XML, as shown above.
The next step is to select the blob/clob field and get the output stream on it. Something like what is shown here:
cmd = "SELECT XML_SOURCE FROM MYTABLE WHERE TRANSACTION_NO = ${ataSpecHeader.id} FOR UPDATE ";
stmt = oracleConnection.createStatement();
rset = stmt.executeQuery(cmd);
rset.next();
xmlType = ((OracleResultSet)rset).getOPAQUE(1);
//clob = ((OracleResultSet)rset).getCLOB(1);
//blob = ((OracleResultSet)rset).getBLOB(1);
clob = xmlType.getClobVal();
//sqlXml = rset.getSQLXML(1);
//outstream = sqlXml.setBinaryStream();
//outstream = blob.getBinaryOutputStream();
outstream = clob.getAsciiOutputStream();
//At this point, read the XML file in "chunks" and write it to the outstream object by doing: outstream.write
The commented-out lines show the different things I've tried. To restate: I can make it work fine if the field in the table is a BLOB or CLOB, but I'm not sure what to do if it's an XMLType. I'd like to get an output stream handle on the XMLType field so I can write to it, as I would if it were a BLOB or CLOB. Notice that for BLOB/CLOB the code selects the field with "for update" and then gets an output stream on it. For XMLType, I tried reading the field into the XMLType and SQLXML Java classes, but it won't work that way. I also tried getting the field first as XMLType/SQLXML and then casting it to BLOB/CLOB to get an output stream, but that won't work either. The truth is, I'm not sure what I'm supposed to do to be able to write to the XMLType field as a stream, in chunks.

How to write this domain to be 100% sure that we will get the right stock pack operation for each invoice line?

This post may be a bit more complex than usual.
We have created a new field for account.invoice.line: pack_operation. With this field, we can print the serial/lot number for each line on the PDF invoice (this part works well).
We have spent many hours trying to write the domain that selects the EXACT and ONLY stock pack operation for each invoice line.
In the code below, we used the domain [('id','=', 31)] for our PDF-printing tests.
How do we write this domain to be 100% sure that we will get the right stock pack operation for each invoice line?
I really need your help here... Too complex for my brain.
Our code:
class AccountInvoiceLine(models.Model):
    _inherit = "account.invoice.line"

    pack_operation = fields.Many2one(comodel_name='stock.pack.operation',
                                     compute='compute_stock_pack_operation_id')

    def compute_stock_pack_operation_id(self):
        stock_operation_obj = self.env['stock.pack.operation']
        stock_operation = stock_operation_obj.search([('id', '=', 31)])
        self.pack_operation = stock_operation[0]
EDIT #1
I know that you won't like my code, but this one seems to work. I welcome any comments and improvements.
class AccountInvoiceLine(models.Model):
    _inherit = "account.invoice.line"

    pack_operation = fields.Many2one(comodel_name='stock.pack.operation',
                                     compute='compute_stock_pack_operation_id')

    @api.one
    def compute_stock_pack_operation_id(self):
        stock_operation_obj = self.env['stock.pack.operation']
        all_picking_ids_for_this_invoice_line = []
        for saleorderline in self.sale_line_ids:
            for procurement in saleorderline.procurement_ids:
                for stockmove in procurement.move_ids:
                    if stockmove.picking_id.id not in all_picking_ids_for_this_invoice_line:
                        all_picking_ids_for_this_invoice_line.append(stockmove.picking_id.id)
        stock_operation = stock_operation_obj.search(
            ['&',
             ('picking_id', 'in', all_picking_ids_for_this_invoice_line),
             ('product_id', '=', self.product_id.id)]
        )
        self.pack_operation = stock_operation[0]
The pack_operation field is a computed field, which by default means that the field will not be saved to the database unless you set store=True when you define it.
So, what you can do here is change:
pack_operation = fields.Many2one(comodel_name='stock.pack.operation', compute='compute_stock_pack_operation_id')
to:
pack_operation = fields.Many2one(comodel_name='stock.pack.operation', compute='compute_stock_pack_operation_id', store=True)
And try running your query again.
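One caveat (my assumption, not something stated in the question): a stored computed field is only recomputed when Odoo knows its dependencies, so you would also declare them with @api.depends. A minimal sketch, where the dependency path is a guess based on the loops in your compute method:
@api.one
@api.depends('sale_line_ids.procurement_ids.move_ids')
def compute_stock_pack_operation_id(self):
    ...  # same body as in the EDIT #1 version above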

searching by special character on linq

I need a search in LINQ like this. For example, I will search on the Name column, and the user enters the word "Ca?an" into a textbox. The question mark is used as a special wildcard character in this situation.
It will search the Name column and find Canan, Calan, Cazan, etc.
I hope I have explained my problem correctly.
Can anyone give me an idea for this LINQ query? Thanks in advance...
You can use this regular expression (if you are using C#) to check for "Ca?an":
string d = "DDDDDDDDDDCasanDDDDDDDDDD";
Regex r = new Regex(@"Ca([a-zA-Z]{1})an");
string t = r.Match(d).Value;
Output will be:
"Casan"
If you have all your rows stored in a database, then do something like:
List<Person> list = new List<Person>(); // filled elsewhere
var res = list.Where(x => r.IsMatch(x.Name));
The output will be an IEnumerable of all the Persons whose Name contains a match for "Ca?an", where ? can be any letter.
You need to convert your search syntax into one an existing search engine understands; I'd suggest Regex. So the steps are:
Safely convert the entered search string into a Regex pattern
Perform the search in LINQ on the Name property
Solution:
1: Safely convert the search string by replacing '?' with the Regex version of the wildcard:
var userInput = "Ca?an";
var regexPattern = Regex.Escape(userInput).Replace(@"\?", ".");
2: Perform the search in LINQ (assuming itemList implements IEnumerable):
var results = itemList.Where(item => Regex.IsMatch(item.Name, regexPattern));
Hope this helps!

import 3 digit field with leading zero

ASP.NET, C#, MVC 3, Code First project
I'm trying to import data from an Excel spreadsheet. I've formatted all cells as Text.
A sample row in the Import worksheet is as follows.
Account Card ThreeCode Route
04562954830287127 32849321890233127 183 154839254
04562954830287128 32849321890233128 233
04562954830287129 32849321890233129 082
04562954830287130 32849321890233130 428
When I run in debug and drill down into the ds DataSet, the Account and Card columns are imported as strings, while the ThreeCode and Route columns are imported as doubles. The problem arises with the three-digit number starting with 0 (082) in data row 3: it gets imported as a System.DBNull and is empty. I need to be able to import three-digit codes with leading zeros.
Is there a way to force the import to be all strings, or another way to approach this problem? I've searched the web and haven't found a solution. This will run from a browser, so anything to do with the registry, DLL, or INI files on the local machine is not an option. The import code is below. Thank you in advance for any help.
public ActionResult ExcelToDS(string Path = @"C:\File.xls")
{
    string strConn = "Provider=Microsoft.Jet.OLEDB.4.0;" +
                     "Data Source=" + Path + ";" +
                     "Extended Properties=Excel 8.0;";
    OleDbConnection conn = new OleDbConnection(strConn);
    conn.Open();
    string strExcel = "select * from [Import$]";
    OleDbDataAdapter myCommand = new OleDbDataAdapter(strExcel, strConn);
    DataSet ds = new DataSet();
    myCommand.Fill(ds, "table1");
Ah yes, the joys of the Excel driver. What happens is that it determines each column's data type from the first few rows, and anything outside that format becomes null.
The solutions are to use a more robust third-party driver, usually costing something, or to set the registry key so that all of the rows are sampled rather than the default 8.
Check out the link here for TypeGuessRows
http://www.connectionstrings.com/excel
HKLM\Software\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
Set the value TypeGuessRows equal to zero
There doesn't appear to be a way to make this work consistently. The workaround I came up with is to add 1000 to the ThreeCode column in the Excel workbook. You are then able to import the data into a dataset, and when the data is read out you simply strip off the "1" prefix. Here is my extension method to do that.
public static string last3(this string instring)
{
    int len = instring.Length - 3;
    return instring.Substring(len, 3);
}
Which you can call in the code with:
card.ThreeCode = code.last3();
'card' and 'ThreeCode' are the object and field being populated; 'code' is the four-digit value from the dataset.
