CSV with Weka: how to add a comma as a value, not a separator - algorithm

I have a dataset that I must run machine learning algorithms on. But some elements of the dataset contain commas, and when I convert the CSV to ARFF those comma-containing values are not recognized.
Example:
a,b,c
asdasd'%sdas,1,5,4234
My elements are:
asdasd'%sdas 1,5 4234
But I could not handle a value that has a comma inside it.
I tried these:
a,b,c
asdasd'%sdas,1\,5,4234
a,b,c
asdasd'%sdas,"1,5",4234
How can I pass a comma-containing value while using Weka? I also wonder how to pass an element as a string that has special characters like "sdas&%',+". Is that possible, or is there something similar?

The following should work:
"asdasd'%sdas","1,5",4234
Quoting the field also lets you send strings that contain special characters, just like this.
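To see why the quoted form works, here is a small Python sketch using the standard csv module (not Weka itself); with QUOTE_MINIMAL, only fields that actually contain a comma or quote get quoted, which produces exactly the format shown above:

```python
import csv
import io

# Write rows where one field contains a comma and another contains
# special characters; QUOTE_MINIMAL quotes only the fields that need it.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow(["asdasd'%sdas", "1,5", "4234"])
writer.writerow(["sdas&%',+", "2,7", "99"])

text = buf.getvalue()
print(text)

# Reading it back shows the comma survives inside the quoted field.
rows = list(csv.reader(io.StringIO(text)))
print(rows[0])  # ["asdasd'%sdas", '1,5', '4234']
```

Any CSV reader that follows the usual quoting convention (including Weka's CSVLoader) treats the quoted "1,5" as a single value rather than two fields.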


extract and replace parameters in SQL query using M-language

This question is related to this earlier question; however, in that question I made some wrong assumptions.
I have a string that contains a SQL query, with or without one or more parameters, where each parameter has "&" (an ampersand) as a prefix.
Now I want to extract all the parameters and load them into a table in Excel where the user can enter a value for each parameter.
Then I need to use those values as replacements for the parameters in the SQL query so I can run it.
The problem I am facing is that extracting (and therefore also replacing) the parameter names is not that straightforward, because the parameters are not always surrounded by spaces (as I assumed in my previous question).
See the following examples:
Select * from TableA where ID=&id;
Select * from TableA where (ID<&ID1 and ID>=&ID2);
Select * from TableA where ID = &id ;
So, two parts to my question:
How can I extract all the parameters?
How can I replace all the parameters using another table where the replacements are defined (see also my previous question)?
A full solution for this would require getting into details of how your data is structured and would potentially be covering a lot of topics. Since you already covered one way to do a mass find/replace (which there are a variety of ways to accomplish in Power Query), I'll just show you my ugly solution to extracting the parameters.
List.Transform(
    List.Select(
        Text.Split([YOUR TEXT HERE], " "),
        each Text.Contains(_, "&")
    ),
    each List.Accumulate(
        {";", ")"},  // list of characters to clean
        "&" & Text.AfterDelimiter(_, "&"),
        (String, Remove) => Text.Replace(String, Remove, "")
    )
)
This is sort of convoluted, but here's my best explanation of what is going on.
The first key part is combining List.Select with Text.Split to extract all of the parameters from the string into a list. It's using a " " to separate the words in the list, and then filtering to words containing a "&", which in your second example means the list will contain "(ID<&ID1" and "ID>=&ID2);" at this point.
The second part is using Text.AfterDelimiter to extract the text that occurs after the "&" in our list of parameters, and List.Accumulate to "clean" any unwanted characters that would potentially be hanging on to the parameter. The list of characters you would want to clean has to be manually defined (I just put in ";" and ")" based on the sample data). We also manually re-append a "&" to the parameter, because Text.AfterDelimiter would have removed it.
The result of this is a List object of extracted parameters from any of the sample strings you provided. You can set up a query that takes a table of your SQL strings, applies this code in a custom column where [YOUR TEXT HERE] is the field containing your strings, then expand the resulting lists and remove duplicates to get a unique list of all the parameters in your SQL strings.
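For illustration only, the same extraction logic can be sketched in Python (not M): a regex that treats a parameter as "&" followed by word characters sidesteps the manual clean-up list entirely, because punctuation like ";" or ")" terminates the match on its own.

```python
import re

queries = [
    "Select * from TableA where ID=&id;",
    "Select * from TableA where (ID<&ID1 and ID>=&ID2);",
    "Select * from TableA where ID = &id ;",
]

# A parameter is '&' followed by word characters; ';' and ')' are not
# word characters, so they never end up attached to the name.
param_re = re.compile(r"&\w+")

params = sorted({p for q in queries for p in param_re.findall(q)})
print(params)  # ['&ID1', '&ID2', '&id']
```

In Power Query you could get the same effect without regex by splitting on a wider set of delimiters, but the word-character rule is the underlying idea.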

How to detect data type in column of table in ORACLE database (probably blob or clob)?

I have a table with a column in the format VARCHAR2(2000 CHAR). This column contained a row containing comma-separated numbers (ex: "3;3;780;1230;1;450.."). Now the situation has changed. Some rows contain data in the old format, but some contain the following data (ex: "BAAAABAAAAAgAAAAHAAAAAAAAAAAAAAAAQOUw6.."). Maybe it's blob or clob. How can I check exactly? And how can I read it now? Sorry for my noob question :)
The bad news is you really can't. Your column is a VARCHAR2 so it's all character data. It seems like what you're really asking is "How do I tell if this value is a comma separated string or a binary value encoded as a string?" So the best you can do is make an educated guess. There's not enough information here to give a very good answer, but you can try things like:
If the value is numeric characters with separators (you say commas but your example has semicolons) then treat it as such.
But what if the column value is "123", is that a single number or a short binary value?
If there are any letters in the value, you know it's not a separated list of numbers, so treat it as binary. But not all encoded binary values will contain letters.
Try decoding it as binary; if that fails, maybe it's actually the separated list. This probably isn't a good option.
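The first heuristic above can be sketched in a few lines of Python (a guess, not a reliable classifier, and the ";" separator is taken from the sample data):

```python
def looks_like_number_list(value, sep=";"):
    """Heuristic: every separator-delimited piece parses as a number."""
    parts = value.split(sep)
    if len(parts) < 2:
        return False  # a lone token like "123" stays ambiguous
    try:
        for p in parts:
            float(p)
        return True
    except ValueError:
        return False

print(looks_like_number_list("3;3;780;1230;1;450"))     # True
print(looks_like_number_list("BAAAABAAAAAgAAAAHAAAA"))  # False
```

You could run this per row in your application code (or express the same check in SQL with REGEXP_LIKE) to split the table into "old format" and "new format" rows before deciding how to decode the rest.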

Parsing semicolon separated key value pairs into CSV file

I have a piece of data that is composed of semicolon-separated key-value pairs (around 50 pairs) on the same line. Not every pair is present on every line.
Below is a sample of the data:
A=0.1; BB=2; CD=hi there; XZV=what's up; ...
A=-2; CD=hello; XZV=no; ...
I want to get a CSV file of this data, where the key becomes the field (column) name and the value becomes the row value of that particular line. Missing pairs should be replaced by default value or left blank.
In other words, I want my CSV to look like this:
A,BB,CD,XZV,....
0.1,2,"hi there","what's up",...
-2,0,"hello","no",...
The volume of my data is extremely large. What is the most efficient way to do this? A Bash solution would be highly appreciated.
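One way to sketch the transformation (in Python rather than Bash, and with the sample lines inlined instead of read from a file) is to parse each line into a dict, collect the union of keys as the header, and let csv.DictWriter fill the gaps:

```python
import csv
import io

# Sample lines in the semicolon key=value format (';' separates pairs,
# '=' separates key from value).
lines = [
    "A=0.1; BB=2; CD=hi there; XZV=what's up",
    "A=-2; CD=hello; XZV=no",
]

def parse(line):
    pairs = (p.split("=", 1) for p in line.split(";") if p.strip())
    return {k.strip(): v.strip() for k, v in pairs}

records = [parse(l) for l in lines]

# First pass collects the full set of field names; for a huge file you
# would stream the file twice instead of holding all records in memory.
fields = sorted({k for r in records for k in r})

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, restval="")
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

restval="" leaves missing pairs blank; pass a different restval to substitute a default value instead.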

sphinx query on NOT array

I can't find an example for my task.
I have an array of strings and I need to build a query that retrieves all rows that DON'T contain any string from the array.
Building a string like this:
(!11 !22 !33)
doesn't work (a query consisting of only NOT operators is rejected).
Thanks!
If you really do need to do that, you need to include a term that will match every row.
E.g. you could arrange for 99 to be present on every single row:
(99 !11 !22 !33)

List of names and their numbers needed to be sorted .TXT file

I have a list of names (never over 100 names) with a value for each of them, either 3 or 4 digits.
john2E=1023
mary2E=1045
fred2E=968
And so on... They're formatted exactly like that in the .txt file. I have Python and Excel, also willing to download whatever I need.
What I want to do is sort all the names according to their values in descending order, so the highest is on top. I've tried to use Excel by replacing the '2E=' with ',' so I could have name,value pairs, then importing the data so each is in a separate column, but I still couldn't sort them any way other than A to Z.
Help is much appreciated, I did take my time to look around before posting this.
Replace the "2E=" with a tab character so that the data is displayed in Excel in two columns. Then sort on the value column.
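Since you mentioned having Python, here is a short sketch that does the whole job without Excel; the sample entries are inlined, but you could read them with open("names.txt") instead (the filename is just an example):

```python
# Each line looks like "john2E=1023"; split on "2E=" and sort
# by the numeric value, highest first.
lines = ["john2E=1023", "mary2E=1045", "fred2E=968"]

def entry(line):
    name, value = line.split("2E=")
    return name, int(value)

ranked = sorted((entry(l) for l in lines), key=lambda nv: nv[1], reverse=True)
for name, value in ranked:
    print(f"{name}\t{value}")
# mary    1045
# john    1023
# fred    968
```

Converting the value with int() is what makes the sort numeric; Excel was sorting A to Z because it treated the column as text.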
