I have a BPEL Web Service. when I set the output type to XML , it has no problem and the utf-8 characters are working well. but when I set the output type to json, the utf-8 parts of the result goes wrong :
{
name :'ارست',
Code:12544,
Country: 'China',
Adress : 'Sian Street'
}
any suggestion to solve this problem will be appreciated.
Related
I'm trying to inspect a CSV file and there are no findings being returned (I'm using the EMAIL_ADDRESS info type and the addresses I'm using are coming up with positive hits here: https://cloud.google.com/dlp/demo/#!/). I'm sending the CSV file into inspect_content with a byte_item as follows:
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
In looking at the supported file types, it looks like CSV/TSV files are inspected via Structured Parsing.
For CSV/TSV does that mean one can't just sent in the file, and needs to use the table attribute instead of byte_item as per https://cloud.google.com/dlp/docs/inspecting-structured-text?
What about for XSLX files for example? They're an unspecified file type so I tried with a configuration like so, but it still returned no findings:
byte_item: {
type: :BYTES_TYPE_UNSPECIFIED,
data: File.open('/xxxxx/dlptest.xlsx', 'rb').read
}
I'm able to do inspection and redaction with images and text fine, but having a bit of a problem with other file types. Any ideas/suggestions welcome! Thanks!
Edit: The contents of the CSV in question:
$ cat ~/Downloads/dlptest.csv
dylans#gmail.com,anotehu,steve#example.com
blah blah,anoteuh,
aonteuh,
$ file ~/Downloads/dlptest.csv
~/Downloads/dlptest.csv: ASCII text, with CRLF line terminators
The full request:
parent = "projects/xxxxxxxx/global"
inspect_config = {
info_types: [{name: "EMAIL_ADDRESS"}],
min_likelihood: :POSSIBLE,
limits: { max_findings_per_request: 0 },
include_quote: true
}
request = {
parent: parent,
inspect_config: inspect_config,
item: {
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
}
}
dlp = Google::Cloud::Dlp.dlp_service
response = dlp.inspect_content(request)
The CSV file I was testing with was something I created using Google Sheets and exported as a CSV, however, the file showed locally as a "text/plain; charset=us-ascii". I downloaded a CSV off the internet and it had a mime of "text/csv; charset=utf-8". This is the one that worked. So it looks like my issue was specifically due the file being an incorrect mime type.
xlsx is not yet supported. Coming soon. (Maybe that part of the question should be split out from the CSV debugging issue.)
I'm trying to return russian text:
#Produces(MediaType.TEXT_PLAIN + ";charset=utf-8")
return Response.status(200).entity("Русский текст.").build();
Before I started reading standalone-full.xml, I was getting normal Russian text, but when I changed this in standalone.bat:
set "SERVER_OPTS=--server-config=standalone-full.xml"
I'm getting something like this.
This don't help me:
I have a method in our software that pulls the text from a PDF, from a scan or text generated.
I usually try the GetTextFromPage() method first. If it doesn't return text, then I move onto OCR'ing the page.
I have a particular 6 page PDF with the first three pages being a scanned document, and the last two being a form.
On this PDF I'm getting an error that I can't figure out how to resolve.
'StandardEncoding' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name
at System.Globalization.EncodingTable.internalGetCodePageFromName(String name)
at System.Globalization.EncodingTable.GetCodePageFromName(String name)
at iText.IO.Util.IanaEncodings.GetEncodingEncoding(String name)
at iText.IO.Util.EncodingUtil.ConvertToBytes(Char[] chars, String encoding)
at iText.IO.Font.PdfEncodings.ConvertToBytes(String text, String encoding)
at iText.IO.Font.FontEncoding.FillNamedEncoding()
at iText.IO.Font.FontEncoding.CreateFontEncoding(String baseEncoding)
at iText.Kernel.Font.PdfType1Font..ctor(PdfDictionary fontDictionary)
at iText.Kernel.Font.PdfFontFactory.CreateFont(PdfDictionary fontDictionary)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.GetFont(PdfDictionary fontDict)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.SetTextFontOperator.Invoke(PdfCanvasProcessor processor, PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.InvokeOperator(PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessContent(Byte[] contentBytes, PdfResources resources)
at iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy, IDictionary`2 additionalContentOperators)
at iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(PdfPage page)
at EFR.OCR.OCR.ExtractTextFromPDF(FileInfo fileInfo, Int32 StartingPage, Int32 NumberOfPages) in P:\Cloud\Dropbox\EF Recovery\OCRTest\EFR.OCR\OCR.vb:line 113
I've processed many PDFs through my code, some text, some scans, some mixed together. Some had forms... This is the first time that I've had this error.
Here's a snippet of my code...
Using reader As New iText.Kernel.Pdf.PdfReader(fileInfo.FullName)
reader.SetUnethicalReading(True)
Using sourceDoc As New iText.Kernel.Pdf.PdfDocument(reader)
If NumberOfPages = 0 Then NumberOfPages = sourceDoc.GetNumberOfPages
For i As Integer = StartingPage To StartingPage + NumberOfPages - 1
Dim pageText As String = ""
Try
pageText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(sourceDoc.GetPage(i))
Catch ex As Exception
OCRLog.Log($"Error attempting to extract text from page {i}. {ex.ToString}")
End Try
If pageText = "" Then
'extract this page
Dim results As OCRResults = ExtractTextFromPDFImagePage(fileInfo.FullName, i)
pageText = results.Text
pageItems.Add(New OCRResults.PagesClass(results.Accuracy, True, pageText))
Else
pageItems.Add(New OCRResults.PagesClass(100, False, pageText))
End If
stringBuilder.Append(pageText)
Next
Return New OCRResults(stringBuilder.ToString, pageItems)
End Using
End Using
Any ideas?
There is an error in the PDF, just as indicated by the error text "'StandardEncoding' is not a supported encoding name.".
The fonts on the page you shared use the name StandardEncoding in their Encoding entries. This is not a valid name here. According to the specification ISO 32000-1 the only valid values here are MacRomanEncoding, MacExpertEncoding, and WinAnsiEncoding, see Table 111 – Entries in a Type 1 font dictionary – and Table 114 – Entries in an encoding dictionary.
Adobe Preflight also complains about these names when checking for syntax errors:
An unexpected value is associated with the key
Key: BaseEncoding
Value: /StandardEncoding
Type: CosName
Formal Representation: Encoding
Cos ID: 38
Traversal Path: ->Pages->Kids->[0]->Resources->Font->WARSP->Encoding
An unexpected value is associated with the key
Key: Encoding
Value: /StandardEncoding
Type: CosName
Formal Representation: Font.FontType1
Cos ID: 27
Traversal Path: ->Pages->Kids->[0]->Resources->Font->Arial,Bold
An unexpected value is associated with the key
Key: BaseEncoding
Value: /StandardEncoding
Type: CosName
Formal Representation: Encoding
Cos ID: 22
Traversal Path: ->Pages->Kids->[0]->Resources->Font->Arial->Encoding
An unexpected value is associated with the key
Key: BaseEncoding
Value: /StandardEncoding
Type: CosName
Formal Representation: Encoding
Cos ID: 19
Traversal Path: ->Pages->Kids->[0]->Resources->Font->ARROW->Encoding
(Excerpt from a preflight report for your shared PDF)
In spite of StandardEncoding not being a valid name here, the PDF specification knows a "Standard Encoding", see Annex D of ISO 32000-1. Most likely your document attempts to refer to that encoding at the locations outlined above.
If you need to extract text from the document in question, therefore, you may want to follow the recommendation of the error message:
For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
The Encoding class here is the one in System.Text.
To extract the text from your PDF, therefore, it should suffice to implement an EncodingProvider that for the name StandardEncoding provides an Encoding instance according to the information from the STD column of the table in Annex D.2 – Latin Character Set and Encodings – of ISO 32000-1.
I have problem when I trying to insert object to database.
Looks like it's convertion problem. Is there a way to fix this?
I, [2017-10-28T14:02:19.923386 #56398] INFO -- : [49eba256-de7f-48df-8d00-05148a6495d3] Completed 500 Internal Server Error in 286ms (ActiveRecord: 9.5ms)
F, [2017-10-28T14:02:19.925305 #56398] FATAL -- : [49eba256-de7f-48df-8d00-05148a6495d3]
F, [2017-10-28T14:02:19.925557 #56398] FATAL -- : [49eba256-de7f-48df-8d00-05148a6495d3] ActiveRecord::StatementInvalid (Encoding::UndefinedConversionError: U+0142 from UTF-8 to US-ASCII: INSERT INTO "RECIPE_INGREDIENTS" ("QUANTITY", "RECIPE_ID", "INGREDIENT_ID", "CREATED_AT", "UPDATED_AT", "ID") VALUES (:a1, :a2, :a3, :a4, :a5, :a6)):
F, [2017-10-28T14:02:19.925663 #56398] FATAL -- : [49eba256-de7f-48df-8d00-05148a6495d3]
This happens only when I using polish characters like ł, ą, ć
It rather looks as if your underlying database is configured with a US7ASCII characterset which doesn't support UTF8 characters, but your application is a UTF8 application. You'll likely need to work with the DBA team to get a database with AL32UTF8 or similar character set.
The WebAPI request has a POST method which expects Content body. I've tried to use both Parameters and Body options but I receive error responses - 'Invalid Request' with 400 Status code, etc.
JMeter request Sample Content Body:
{
"ParamA": 111,
"ParamB": "Char String",
"ParamC": "VarType"
}
OR
{ "ParamA": 111, "ParamB": "Char String", "ParamC": "VarType"}
Listener Request:
POST data:
--8vpH3B6WcV4f1La46_wccVi4c25lrLJaGcN--
Listener Response:
{"message":"The request is invalid.","modelState":{"value":["An error
has occurred."]}}
Any insight into viable options? Eventually, I'm planning on reading the Body string from a .csv file so I can parameterize the request. Reading from a .CSV file only reads the first line of the request body - for example: '{'
Any help would be greatly appreciated.
Best,
Ray
HTTP Request
Request
Uncheck in HTTP request the option:
Use multipart/form data for POST
Also check your CSV does not contain some data that contains the CSV separator which is '\t' by default.
Ensure it doesn't by changing separator to '|' for example if you're sure your JSON will never contain it.