What is the required file format for Google AutoML Datasets? - google-cloud-automl

Whenever I try to upload my dataset to the AutoML Natural Language Web UI, I get the error
Something is wrong, please try again.
The documentation is not very insightful about how my CSV file is supposed to look, so I tried to make a simple sample file just to make sure it works at all. It looks like this:
text,label
asdf,cat
asodlkao,dog
asdkasdsadksafask,cat
waewq23,cat
dads,cat
saiodjas,cat
skdoaskdoas,dog
hgfkgizk,dog
fzdrgbfd,cat
otiujrhzgf,cat
vchztzr,dog
aksodkasodks,dog
sderftz,dog
dsoakd,dog
qweqweqw,cat
asdqweqe,cat
dkawosdkaodk,dog
ewqeweq,cat
fdsffds,dog
bvcghh,cat
rthnghtd,dog
sdkosadkasodk,cat
sdjidghdfig,cat
kfodskdsof,dog
saodsadok,dog
ksaodksaod,dog
vncvb,cat
I chose this formatting according to Google's suggested syntax, but even with this formatting I still get the same error.
I've seen the question Format of the input dataset for Google AutoML Natural Language multi-label text classification, but according to the answers there my formatting should work, so I do not know why I get the error.

I've just copied the CSV file and uploaded it to my own project, and the dataset was created successfully. One problem is that an extra label, "label", was created - this is because a header row is not expected in the CSV file (this should probably get fixed).
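For reference, this is what the importer expects - plain text,label rows with no header line (a trimmed version of the sample above):
asdf,cat
asodlkao,dog
skdoaskdoas,dog
fzdrgbfd,cat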
Based on that, it seems the problem isn't the CSV file format. I would recommend checking whether your project is set up correctly. You can open a bug to get someone's help: either open a bug in the public issue tracker or send feedback using the UI (there is a 'Feedback' option in the menu at the top right of the page).

I have found the problem! As Michal K said, there was nothing wrong with the formatting. The real problem was that I had not been assigned the Storage Object Creator role, which is necessary because the data is uploaded to Cloud Storage first.
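If you want to confirm the permission is in place before retrying the upload, a minimal sketch with the google-cloud-storage Python client is to attempt creating an object (the project ID and bucket name below are placeholders). This exercises storage.objects.create, the permission granted by the Storage Object Creator role, and raises a 403 Forbidden error if the role is missing:

from google.cloud import storage

# Placeholders: substitute your own project ID and staging bucket.
client = storage.Client(project="my-project-id")
bucket = client.bucket("my-automl-staging-bucket")

# Creating an object requires storage.objects.create; a 403 error
# here means the Storage Object Creator role has not been granted.
blob = bucket.blob("automl-permission-check.txt")
blob.upload_from_string("permission check")
print("storage.objects.create is granted")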

Related

I'm trying to use Data Validation in Google Sheets to only accept a hex color – any help appreciated

I think the title says it all. I have a shared Google Sheet in which I'd like to restrict several columns to make sure the information is entered as a valid hex code.
I've been searching and I just can't seem to find anything, but this formula I found for Excel may be a starting point:
=AND(LEN(A2)<13,ISERROR(HEX2DEC(A2))=FALSE)
It does not seem to work for Sheets...
try:
=ISNUMBER(HEX2DEC(REGEXEXTRACT(A1, "#(.*)")))*(LEN(A1)<8)
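In case it helps to unpack it: REGEXEXTRACT strips the leading #, HEX2DEC only returns a number when the remainder is valid hexadecimal, and LEN(A1)<8 caps the entry at # plus six hex digits. You can apply it to a column as a 'Custom formula is' rule under Data > Data validation.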

Importing book names from goodreads.com into Google Sheets with ImportXML gives "Import Internal Error" sometimes

I have a formula that fetches names of books from goodreads.com:
=IMPORTXML("https://www.goodreads.com/book/show/" & gr_id; "//*[@id='bookTitle']")
where gr_id is a column containing ids of the books. For example when gr_id=23848607, it fetches from URL https://www.goodreads.com/book/show/23848607 and the result is "Warheart".
The formula worked fine some time ago. I did not change anything, but now I've noticed it has stopped working for some of the books (it is still working for others). Instead of the name of the book it now gives N/A with an "Import Internal Error" hint. The ids that do not work are:
48332548
35906922
How to make it work for all books?
There were many questions posted about "Import Internal Error" problems. I tried some solutions including copying the formula to a fresh sheet, but it did not work.
Update: I tried the following different XPath formulas instead of "//*[@id='bookTitle']".
"//h1[@id='bookTitle']"
"//h1"
Those different XPath formulas worked the same as the original XPath formula. They worked correctly for the same ids that the original one did and produced N/As for the same ids that the original one did.
Update: I just re-checked, and all my formulas worked correctly for all gr_ids (I had not changed anything since the time when they did not work). Maybe someone knows how to prevent them from breaking again in the future.
Update: I re-checked once again. Of all gr_ids, only this one was showing N/A now: 35906922. I created an example spreadsheet, because my working spreadsheet contains too many unrelated details, but the problem did not appear in the example spreadsheet. I went back to my working spreadsheet and reloaded it - and the problem disappeared in my working spreadsheet too. Then I added more test data to the example spreadsheet, and the following new example gr_ids showed N/A:
48213012
48213092
I tried to make a copy of the example spreadsheet to see if that fixes the problem. The behavior in the copied spreadsheet was identical to the original example spreadsheet - the problem occurred only with the two gr_ids specified above.
If you run a full IMPORTXML on those two IDs, you can see it won't return anything at all:
=IMPORTXML("https://www.goodreads.com/book/show/48213012-fathers-and-sons", "//*")
which means that Google Sheets can't reach the XML content for some reason (it could be something similar to https://stackoverflow.com/a/24891676/5632629).
Therefore, we can try to read the source code directly with IMPORTDATA, where we can find around 70 elements with the same information, so we pick one, isolate it, and remove the HTML tags. Then we just wrap the prior formula in IFERROR and force it to take a second look if it fails the first time. The result is like this:
=IFERROR(IMPORTXML("https://www.goodreads.com/book/show/"&A:A, "//*[@id='bookTitle']"),
REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(
IMPORTDATA("https://www.goodreads.com/book/show/"&A:A), 100, 1),
"select Col1 where Col1 contains '</title>'"), ">(.*) by"))
IMPORTXML() seems to be unreliable, and I decided not to use it, because I did not find an acceptable solution to my problem. Instead of using IMPORTXML() I exported my books from goodreads.com to a CSV file (goodreads.com has such a feature) and then imported the CSV file into my spreadsheet. This is not a perfect solution, because I need to re-import every time I need to update the books, but at least it works.

Flat file data validation

I am supposed to load some data that is received in flat files (CSV). The problem is that the supplier is generating a lot of junk data.
Before starting to develop anything new on my own, I would like to ask if there is something that could automate this process.
I have found an open-source tool called Flat File Checker. It can accept a bunch of validation rules, including regex, and it is exactly what I need - but the problem is that it does not actually validate anything.
Does anyone have a suggestion for something like this that actually works?
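If rolling your own turns out to be the only option, here is a minimal sketch of regex-per-column validation in Python (the file name, column names, and rules are made-up examples to adapt to the supplier's actual layout):

import csv
import re

# Hypothetical per-column rules; replace with the supplier's real columns.
RULES = {
    "id": re.compile(r"\d+"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "amount": re.compile(r"\d+(\.\d{1,2})?"),
}

with open("supplier.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, pattern in RULES.items():
            value = (row.get(column) or "").strip()
            if not pattern.fullmatch(value):
                print(f"line {line_no}: bad {column}: {value!r}")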

SSRS error on preview : "The size necessary to buffer the XML content exceeded the buffer quota" hides original error

I understand that there is definitely something wrong with my report (e.g. a columns mismatch) and I need to correct it, but what I see is a WCF error message that hides the actual problem, and it is exactly this hiding that irritates me much more than the original problem (the columns mismatch).
I guess we need to adjust the WCF 'buffer size' and then we will get the original problem message. But where is the config file?
A text search for "system.serviceModel" in C:\Program Files (x86)\Microsoft Visual Studio 10.0 doesn't bring up anything useful...
P.S. Since this is just a preview of the report, I do not think it is an SSRS configuration problem. The problem is localized somewhere in the DevStudio process or in DevStudio's internal web server process...
P.P.S. Please help me to improve the question. I see that responders don't understand what kind of help I need.
I have encountered multiple "flavors" of this bug in SSRS Preview. It seems the renderer for Preview mode is quite fragile.
There is a simple way to solve this. Ignore the error and attempt to upload the RDL file to your reporting server. The uploader will happily tell you exactly what is wrong with your file: which field has a problem and what that problem is. If there are multiple errors, you will be told each and every field and the error associated with each one.
I can create this bogus XML buffer error with any of the following:
Add a new Tablix, start to connect it to a dataset, then cancel out.
Copy/paste some text into a textbox from a MS Word document where one or more lines have a negative right indent (right column end is outside page margin).
Connect a dataset with a varchar(8000) returned value.
Please check whether any of your report items reference fields that are not in the existing dataset scope.
This indeed worked for me.
See the link below for more information:
http://connect.microsoft.com/SQLServer/feedback/details/742913/ssdt-reporting-services-designer-error
I have seen this error when adding a new field to an existing dataset by clicking "Refresh Fields".
The dataset source was a stored procedure. The result was that only a few of the original fields showed up in the dataset field list, and not the new field. If I tried to preview the report, I got the XML buffer error.
The workaround was to not refresh fields but to hit "Add new field" and type the new field name into the dataset properties by hand.
It worked fine after that.
I got this error again today.
I had created a table to hold data to replace two slow queries. I changed some names to clean up the process.
I think the error actually means that there are so many problems with my report that the buffer holding the various error messages isn't large enough, which leads to the error message:
The size necessary to buffer the XML content exceeded the buffer quota
Of course this should be an easy fix, but Microsoft has said that they will not fix it.
https://connect.microsoft.com/SQLServer/feedback/details/742913/ssdt-reporting-services-designer-error
EDIT: I've updated my answer based on having fixed the issue.
I experienced this problem after changing multiple stored procedures and updating the dataset names in the SSRS report. When I tried to run the preview, I got the exact same error.
As it turns out, after investigating the issue, the problem was that I had changed the Name property of my datasets.
There were several places in my report where formulas or expressions still used the old names of the datasets I had renamed. After I set the dataset Name properties back to what they were, the real errors, like missing fields, came back.
Note that I only changed the Name property back; the datasets were still correctly referring to my renamed stored procedures.
I had this problem when after copying and pasting a tablix, it changed CDbl in a formula to Microsoft.ReportingServices.RdlObjectModel.ExpressionParser.VBFunctions.CDbl. I opened up the XML and removed all instances of "Microsoft.ReportingServices.RdlObjectModel.ExpressionParser.VBFunctions." and the report then worked.
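If editing the XML by hand is error-prone, a minimal Python sketch does the same cleanup (report.rdl is a placeholder path; keep a backup before rewriting the file):

from pathlib import Path

# Placeholder path; point this at the actual .rdl and back it up first.
path = Path("report.rdl")
text = path.read_text(encoding="utf-8")

# Strip the fully-qualified prefix so expressions call plain CDbl again.
text = text.replace(
    "Microsoft.ReportingServices.RdlObjectModel.ExpressionParser.VBFunctions.", ""
)
path.write_text(text, encoding="utf-8")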
For a working report, when I tried to add a column it gave me this error. I edited the .rdl file using Notepad++, and after SSRS prompted me to reload the change from disk, it worked without issues.
I got this error after copying my Custom Code into Visual Studio to get syntax highlighting for better readability. Visual Studio added class definitions to the beginning and end of the file, and after editing the code I pasted it back into the report's Custom Code, then got this error. The fix was simply to remove the class definitions (Public Class Class1 and End Class) from the Custom Code. So check your Custom Code as well (if any).
I got this error after adding some new parameters to an existing report.
For some reason, when I created the parameters first and then modified the dataset to use the new parameters, I got the error; but when I modified the dataset first and then added the parameters, I did not get the error.
This seemed like very strange behavior to me, so I tested it by restoring the report from the repository and repeating the process three times with each method, and the behavior was identical every time.
I was also facing this problem. I solved it with Find and Replace:
Microsoft.VisualBasic.Interaction.iif ==> iif
Microsoft.ReportingServices.RdlObjectModel.ExpressionParser.VBFunctions.cdbl ==> cdbl
I hope this helps someone. Thanks.
Possible root causes:
A parameter name is incorrect (case/order).
A non-existent property is being accessed.
And many more...
Solutions to get the exact error message:
Deploy the SSRS report and find the error there (as "Kim Crosser" already suggested).
Temporarily remove the sections (SSRS/report content) you believe are error-free, to free space in the buffer so that you can see the actual error message. Later, add the removed sections back to the page.
I had the same error message, and it was entirely my own doing. It's a bit embarrassing, but if it helps someone out then great! I had accidentally copied my dataset query including a small sub-select statement, which I had been using to check parameter/variable values.
Another solution is to open the .rdl file in Report Builder 3.0 (as opposed to Visual Studio) and try to preview it. I found this gave me the details of the error, although if more than one error is present it only shows the first.
I had previously bound a TextBox to
Fields!FieldName
and fixed it with
Fields!FieldName.Value
With that said, and as the other answers posted here show, this error comes in different flavors. My issue was fixed once I included the ".Value" property on the field.

CSV import with user correction

I'm looking for general UI advice on importing a CSV file. The UI is done in ASP.NET MVC3.
When the user uploads the file, I need to validate it and allow them to manually correct any errors in the browser before I store it in the database. There are so many potential errors to check for that I'm really not sure of the best way to achieve this. Another constraint is that I only have a few days to implement this, so it can't be too complicated. I'm fine with regular expressions and programming, and I already have the posted file stream available, but I just can't think of a good and practical way to present this functionality to the user.
Hope someone can inspire me. Many thanks.
There are some suggestions here:
Reading a CSV file in .NET?
Of these, we chose to use Linq2CSV in our MVC projects.
http://www.codeproject.com/KB/linq/LINQtoCSV.aspx
It is fairly easy to use, and the validation is nice. You define a simple class that lays out the structure (columns) of the CSV file. It does basic validation, and if that passed, we sent the result through a validator that used DataAnnotation attributes to check more complex rules. We found it reliable, and we were able to add some features to it that we wanted.
If the file was pathologically bad, we'd fail the whole thing and present a single error message. If the file was reasonably sound, we would display the rows in error along with the error messages for each row, so users could see the problem in context. In our case this was a display grid only - we did not allow editing through the website - because the CSVs were being generated out of the client's data system, and we needed them to edit the source data in their system and regenerate the CSV. To do in-place editing, you would need to stage all the column values as strings so users can fix numbers that don't parse, etc.
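The overall flow described above - parse the rows, collect per-row errors, reject a pathologically bad file outright, otherwise echo the failing rows with their messages - looks roughly like this as a minimal Python sketch (the column checks and error threshold are placeholder assumptions):

import csv
import re

def validate_row(row):
    # Placeholder rules; a real project would mirror the CSV schema.
    errors = []
    if not row.get("Name", "").strip():
        errors.append("Name is required")
    if not re.fullmatch(r"\d+(\.\d+)?", row.get("Price", "")):
        errors.append(f"Price is not a number: {row.get('Price')!r}")
    return errors

def validate_file(path, max_error_ratio=0.5):
    rows, failures = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            rows.append(row)
            problems = validate_row(row)
            if problems:
                failures.append((line_no, row, problems))
    # A mostly-broken file is rejected with a single message.
    if rows and len(failures) / len(rows) > max_error_ratio:
        raise ValueError("file is too malformed to process")
    return failures  # render these rows and messages back to the user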
