I would like to import a CSV file into Python with a FileChooser. Then, using rpy2, I can perform statistical analyses with R, which I know much better than Python. Below is a piece of my code:
import pygtk
pygtk.require("2.0")
import gtk
from rpy2.robjects.vectors import DataFrame
def get_open_filename(self):
    filename = None
    chooser = gtk.FileChooserDialog("Open File...", self.window,
                                    gtk.FILE_CHOOSER_ACTION_OPEN,
                                    (gtk.STOCK_CANCEL, gtk.RESPONSE_CANCEL,
                                     gtk.STOCK_OPEN, gtk.RESPONSE_OK))
    response = chooser.run()
    if response == gtk.RESPONSE_OK:
        don = DataFrame.from_csvfile(chooser.get_filename())
        print(don)
    chooser.destroy()
    return filename
When running the code, don is printed. But the question is: don contains two columns, X and Y, which I can't access to perform analyses. Thanks for your kind help.
Did you check the documentation about extracting elements from a DataFrame?
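For example, a minimal sketch with rpy2, assuming the CSV header really names the columns X and Y:
x = don.rx2('X')     # the column named "X", as an R vector
y = don.rx2('Y')     # likewise for "Y"
print(don.colnames)  # check which column names were actually parsed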
I'm using optuna.integration.lightgbm for training a LightGBM model.
The issue is there's a ton of output, and frankly I just want to disable it (or at least find a way to tone it down).
I have tried this and lots of other things, e.g.:
import optuna
import optuna.integration.lightgbm as lgb
from lightgbm import early_stopping, log_evaluation

optuna.logging.set_verbosity(optuna.logging.ERROR)  # Ignore outputs from Optuna when training

params = {
    "objective": "softmax",
    "metric": "multi_logloss",
    "boosting_type": "gbdt",
    "is_unbalance": True,
    "num_classes": 4,
    "num_boost_round": 10,
    "verbosity": -1,
}

model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    callbacks=[early_stopping(100, verbose=False), log_evaluation(0)],
)
but I still get "early_stopping" outputs, validation outputs from each round, and so on.
There's even a suggestion of using log_evaluation(), which I have also passed.
I can't think of more ways to (try to) ignore the output.
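The only remaining idea I have is the blunt one: redirect stdout around the call with contextlib.redirect_stdout from the standard library. This is just an untested sketch, and output written by LightGBM's native C++ side may bypass Python's stdout entirely:
import contextlib
import io

# Blunt workaround: discard anything written to Python's stdout during tuning.
# Messages emitted by LightGBM's native (C++) logger may still get through.
with contextlib.redirect_stdout(io.StringIO()):
    model = lgb.train(
        params,
        dtrain,
        valid_sets=[dtrain, dval],
        callbacks=[early_stopping(100, verbose=False), log_evaluation(0)],
    )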
I'm writing a program to generate text...
I need to remove the input from the generated text. How can I do this?
The code:
input_ids = tokenizer(context, return_tensors="pt").input_ids
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
strs = tokenizer.batch_decode(gen_tokens)[0]
Here strs contains the input I've given...
How to remove that?
The Transformers library does not provide a way to do it, but this is something you can easily achieve with one line of code:
strs = strs.replace(context,"")
This is actually what I'm doing behind my NLP Cloud API, as it uses Transformers under the hood.
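If the decoded text doesn't reproduce context byte-for-byte (tokenizers sometimes normalize whitespace or special characters), a more robust variant is to slice off the prompt tokens before decoding. A sketch, assuming a decoder-only model that echoes the prompt at the start of its output:
prompt_length = input_ids.shape[-1]          # number of tokens in the prompt
new_tokens = gen_tokens[:, prompt_length:]   # keep only the newly generated tokens
strs = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]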
I found a bug (I think) in version 2.13.4 of vtd-xml. In short, I have the following code snippet:
String test = "<catalog><description></description></catalog>";
VTDGen vg = new VTDGen();
vg.setDoc(test.getBytes("UTF-8"));
vg.parse(true);
VTDNav vn = vg.getNav();
// get nodes with no children, no text and no attributes
String xpath = "/catalog//*[not(child::node()) and not(child::text()) and count(@*)=0]";
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath(xpath);
// the block inside the while loop is never executed
while (ap.evalXPath() != -1) {
    System.out.println("current node " + vn.toRawString(vn.getCurrentIndex()));
}
and this doesn't work (it does not find any node, while it should find "description"). The code above works if I use a self-closing tag:
String test = "<catalog><description/></catalog>";
The point is that every XPath evaluator works with both versions of the XML. Sadly, I receive the XML from an external source, so I have no power over it...
Breaking down the XPath, I noticed that evaluating both
/catalog//*[not(child::node())]
and
/catalog//*[not(child::text())]
gives false as the result. As an additional check, I tried something like:
String xpath = "/catalog/description/text()";
ap.selectXPath(xpath);
if (ap.evalXPath() != -1)
    System.out.println(vn.toRawString(vn.getCurrentIndex()));
And this prints an empty string, so in some way VTD "thinks" the node has text, even if empty, whereas I expect no match. Any hint?
TL;DR
When I faced this issue, I was left mainly with three options (see below). I went for the second option: use XMLModifier to fix the VTDNav. At the bottom of my answer, you'll find an implementation of this option and a sample output.
The long story ...
I faced the same issue. Here are the three main options I first thought of (in order of difficulty):
1. Turn empty elements into self-closing tags in the XML source.
This option isn't always possible (as in the OP's case). Moreover, it may be difficult to "pre-process" the XML beforehand.
2. Use XMLModifier to fix the VTDNav.
Find the empty elements with an XPath expression, replace them with self-closing tags and rebuild the VTDNav.
2.bis Use XMLModifier#removeToken
A lower-level variant of the preceding solution would consist of looping over the tokens in the VTDNav and removing unnecessary tokens with XMLModifier#removeToken.
3. Patch the vtd-xml code directly.
Taking this path may require more effort and more time. IMO, the optimized vtd-xml code isn't easy to grasp at first sight.
Option 1 wasn't feasible in my case. I failed at implementing Option 2bis: the "unnecessary" tokens still remained. I didn't look at Option 3 because I didn't want to fix some (rather complex) third-party code.
I was left with Option 2. Here is an implementation:
Code
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;
import com.ximpleware.AutoPilot;
import com.ximpleware.NavException;
import com.ximpleware.VTDException;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XMLModifier;

@Test
public void turnEmptyElementsIntoSelfClosedTags() throws VTDException, IOException {
    // STEP 1 : Load XML into VTDNav
    // * Convert the initial xml code into a byte array
    String xml = "<root><empty-element></empty-element><self-closed/><empty-element2 foo='bar'></empty-element2></root>";
    byte[] ba = xml.getBytes(StandardCharsets.UTF_8);
    // * Build VTDNav and dump it to screen
    VTDGen vg = new VTDGen();
    vg.setDoc(ba);
    vg.parse(false); // Use `true' to activate namespace support
    VTDNav nav = vg.getNav();
    dump("BEFORE", nav);
    // STEP 2 : Prepare to fix the VTDNav
    // * Prepare an autopilot to find empty elements
    AutoPilot ap = new AutoPilot(nav);
    ap.selectXPath("//*[count(child::node())=1][text()='']");
    // * Prepare a simple regex matcher to create self closed tags
    Matcher elementReducer = Pattern.compile("^<(.+)></.+>$").matcher("");
    // STEP 3 : Fix the VTDNav
    // * Instantiate an XMLModifier on the VTDNav
    XMLModifier xm = new XMLModifier(nav);
    ByteArrayOutputStream baos = new ByteArrayOutputStream(); // baos will hold the elements to fix
    String utf8 = StandardCharsets.UTF_8.name();
    // * Find all empty elements and replace them
    while (ap.evalXPath() != -1) {
        nav.dumpFragment(baos);
        String emptyElementXml = baos.toString(utf8);
        String selfClosingTagXml = elementReducer.reset(emptyElementXml).replaceFirst("<$1/>");
        xm.remove();
        xm.insertAfterElement(selfClosingTagXml);
        baos.reset();
    }
    // * Rebuild VTDNav and dump it to screen
    nav = xm.outputAndReparse(); // You MUST call this method to save all your changes
    dump("AFTER", nav);
}

private void dump(String msg, VTDNav nav) throws NavException, IOException {
    System.out.print(msg + ":\n ");
    nav.dumpFragment(System.out);
    System.out.print("\n\n");
}
Output
BEFORE:
<root><empty-element></empty-element><self-closed/><empty-element2 foo='bar'></empty-element2></root>
AFTER:
<root><empty-element/><self-closed/><empty-element2 foo='bar'/></root>
After training an LDA model with gensim's Mallet wrapper, I converted the model to a gensim LdaModel via the malletmodel2ldamodel function provided with the wrapper. Before and after the conversion, the topic-word distributions are quite different: the converted model returns very rare topic-word distributions after conversion.
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=13, id2word=dictionary)
model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet)
model.save('ldamallet.gensim')
dictionary = gensim.corpora.Dictionary.load('dictionary.gensim')
corpus = pickle.load(open('corpus.pkl', 'rb'))
lda_mallet = gensim.models.wrappers.LdaMallet.load('ldamallet.gensim')
import pyLDAvis.gensim
lda_display = pyLDAvis.gensim.prepare(lda_mallet, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)
Here is the output from the original gensim implementation:
I can see there was a bug around this issue which was fixed in previous versions of gensim. I am using gensim==3.7.1.
Here is an alternative function to use instead of malletmodel2ldamodel (which has been reported to have bugs):
from gensim.models.ldamodel import LdaModel
import numpy

def ldaMalletConvertToldaGen(mallet_model):
    model_gensim = LdaModel(
        id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,
        alpha=mallet_model.alpha, eta=0, iterations=1000,
        gamma_threshold=0.001, dtype=numpy.float32)
    model_gensim.state.sstats[...] = mallet_model.wordtopics
    model_gensim.sync_state()
    return model_gensim

converted_model = ldaMalletConvertToldaGen(mallet_model)
I used it and it worked perfectly.
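A quick way to sanity-check the conversion (assuming the wrapped model is still available as ldamallet, as in the question) is to compare the top words per topic before and after:
# Compare topic-word distributions before and after the conversion
for before, after in zip(ldamallet.show_topics(num_topics=13, formatted=True),
                         converted_model.show_topics(num_topics=13, formatted=True)):
    print("mallet:", before)
    print("gensim:", after)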
OK, probably barking up the wrong tree with this one, but some guidance would be nice!
I currently have an app that exports data to a text file:
stream.open(file, FileMode.APPEND);
stream.writeUTFBytes(data1 + data2);
stream.close();
and then I use the following to import that data:
var textloader:URLLoader = URLLoader(event.target);
MyTextFile_txt.text = textloader.data;
Now, is there any way of sorting this information (for example, putting it in order of the data2 records)? I know sorting from a text file is probably a little difficult. Would there be a better way of exporting the file instead? Or, when importing the file, can I get it to import into a specific text box?
Dunno just throwing some ideas out.
Although not essential, you can use stream.readUTFBytes instead of URLLoader.
Regarding sorting the data, you can add all the loaded data into an array and then use sort() on the array.
e.g.
var someArray:Array = [];
for (var i:int = 0; i < loadedData.xmlNodeName.length; i++) {
    someArray.push(loadedData.xmlNodeName[i]);
}
someArray.sort();
http://help.adobe.com/en_US/ActionScript/3.0_ProgrammingAS3/WS5b3ccc516d4fbf351e63e3d118a9b90204-7fa4.html