Cannot suppress Stanford parser warnings - stanford-nlp

While using the TokenizerFactory of the Stanford parser I made sure to set the options to "untokenizable=noneDelete", but I still could not manage to get rid of the warnings. What could be the problem?
public static List<Tree> findHeadNounPhrases(List<String> unites)
{
    List<Tree> nps = new ArrayList<Tree>();
    for (String sentence : unites)
    {
        HeadFinder hf = new PennTreebankLanguagePack().headFinder();
        StringReader reader = new StringReader(sentence);
        TokenizerFactory<CoreLabel> tokenizerFactory =
                PTBTokenizer.factory(new CoreLabelTokenFactory(), "untokenizable=noneDelete");
        tokenizerFactory.setOptions("untokenizable=noneDelete");
        Tokenizer<CoreLabel> tok = tokenizerFactory.getTokenizer(reader);
        List<CoreLabel> rawWords2 = tok.tokenize();
        Tree tree = lp.apply(rawWords2);
        ...
    }
I am getting the following warnings:
Mar 10, 2016 11:13:51 AM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: ି (U+B3F, decimal: 2879)
Mar 10, 2016 11:13:51 AM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: ି (U+B3F, decimal: 2879)
Mar 10, 2016 11:13:56 AM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: (U+89, decimal: 137)
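If the goal is simply to silence these messages, one workaround worth trying is to raise the log level of the PTBLexer logger before tokenizing. This is only a sketch: it assumes the warnings are routed through java.util.logging under the class name shown in the output (the two-line WARNING format suggests they are).

import java.util.logging.Level;
import java.util.logging.Logger;

// Assumption: the warnings go through java.util.logging under the class name
// printed in the log output; raising that logger's level above WARNING hides them.
Logger ptbLogger = Logger.getLogger("edu.stanford.nlp.process.PTBLexer");
ptbLogger.setLevel(Level.SEVERE);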

PyTorch - NotImplementedError at model fit using MAC M1 MPS in Notebooks

I am getting the error:
NotImplementedError: The operator 'aten::hardswish_' is not currently implemented for the MPS device. It goes on to say to comment on https://github.com/pytorch/pytorch/issues/77764, but there is nothing there.
I have gone through the appropriate environment setup (conda activate 'en'), and torch.backends.mps.is_available() returns True.
Environment:
Python Platform: macOS-13.2-arm64-arm-64bit
PyTorch Version: 1.13.1
Python 3.9.13 (main, Aug 25 2022, 18:24:45)
[Clang 12.0.0 ]
GPU is NOT AVAILABLE
MPS is AVAILABLE
Target device is mps
Googling indicates that some operators are not currently supported by PyTorch on the MPS backend. If that's the case, should my implementation be different? Has anyone had experience with this, or will it simply not work? (It obviously works on CPU.)
Getting the error here:
Input In [9], in <cell line: 4>()
13 imgs = list(img.to(device) for img in imgs)
14 annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
---> 15 loss_dict = model(imgs, annotations)
16 losses = sum(loss for loss in loss_dict.values())
18 optimizer.zero_grad()
Setup:
import numpy as np
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    #model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
    model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=False)
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

for img_, annotations in tr_loader:
    list(img.to(device) for img in img_)
    annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
    #print(annotations)

for img_, annotations in val_loader:
    list(img.to(device) for img in img_)
    annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
    #print(annotations)

model = get_model_instance_segmentation(config.num_classes)

if torch.backends.mps.is_available():
    model.to(device)
    print(device)

# parameters
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(
    params, lr=config.lr, momentum=config.momentum, weight_decay=config.weight_decay
)

len_trloader = len(tr_loader)
len_valloader = len(val_loader)

train_loss = []
val_loss = []
min_valid_loss = np.inf

for epoch in range(config.num_epochs):
    print(f"Epoch: {epoch}/{config.num_epochs}")
    model.train()
    i = 0
    j = 0
    for imgs, annotations in tr_loader:
        i += 1
        imgs = list(img.to(device) for img in imgs)
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
        loss_dict = model(imgs, annotations)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
        train_loss.append(losses.tolist())
        print(f"Iteration_Train: {i}/{len_trloader}, Loss: {losses}")

    valid_loss = 0.0
    for imgs, annotations in val_loader:
        j += 1
        imgs = list(img.to(device) for img in imgs)
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
        loss_dict_val = model(imgs, annotations)
        losses_val = sum(loss_val for loss_val in loss_dict_val.values())
        val_loss.append(losses_val.tolist())
        print(f"Iteration_Val: {j}/{len_valloader}, Loss: {losses_val}")

Typoscript image change every 15 days

I would like to change an image every 15 days, i.e. twice a month: when the date is between the 1st and the 15th one image is shown, from the 16th to the 30th another, and so on, making 24 images over the year. I would like the TypoScript itself to manage the change of image.
I started with the following TypoScript:
lib.headerlogo1 = COA
lib.headerlogo1 {
  10 = LOAD_REGISTER
  10 {
    divSem.cObject = TEXT
    divSem.cObject {
      data = date:U
      strftime = %U
      current = 1
      setCurrent.data = date:U
      setCurrent.wrap = |/2
      prioriCalc = 1
    }
  }
  20 = FILES
  20 {
    references {
      data = levelmedia: -1, slide
    }
    renderObj = IMAGE
    renderObj {
      file.import.dataWrap = {file:current:storage}:{file:current:identifier}
      #file.import.listNum = 0
      altText.data = file:current:title
      # Displays the value of divSem correctly
      #stdWrap.insertData = 1
      #stdWrap.wrap = <div class="banner{register:divSem}">|</div>
    }
    # insertData = 1
    insertData = 1
    # THIS FAILS!
    begin = {register:divSem}
    maxItems = 1
  }
  30 = TEXT
  30 {
    stdWrap.insertData = 1
    stdWrap.wrap = <div class="{register:divSem}">|</div>
  }
}
The problem is that I cannot set the start value from the register with begin = {register:divSem} ... It always starts at 0! Do you have an idea? The display of the register in 30 = TEXT is correct.
Do you have a good idea how to modify the TypoScript?
I just found a solution: instead of begin = {register:divSem}, I did this:
begin.cObject = TEXT
begin.cObject {
  value = 0
  value.override.cObject = CASE
  value.override.cObject {
    key.data = register:divSem
    1 = TEXT
    1.value = 1
    2 = TEXT
    2.value = 2
    ...
    24 = TEXT
    24.value = 24
    default = TEXT
    default.value = 2
  }
}
Maybe there is something simpler; if you have an idea, I'm interested.
Best regards.
You found the important detail: you need a .cObject to fill dynamic data into that simple property.
But why so complicated, with a CASE that outputs the same value as the key?
So the simplest way would be:
begin.cObject = TEXT
begin.cObject.data = register:divSem
Maybe this would also work, like you do in .30:
begin = {register:divSem}
begin.insertData = 1
And a more direct way for your .30: instead of .insertData with a .wrap, use .dataWrap:
begin.stdWrap.dataWrap = {register:divSem}
and likewise:
30 = TEXT
30.dataWrap = <div class="{register:divSem}">|</div>

Different week no. returned using DateTimeFormatter YYYY-w & by DateTimeFormatter.ISO_WEEK_DATE Java8

I am trying to format a date given in milliseconds (sample value: 1451646394000) to get the week number.
Following is the code snippet:
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatter;

public class Test1 {

    public static void main(String[] args) {
        final String dateFormat = "YYYY-w";
        ZoneId zoneId_UTC = ZoneId.of("UTC");
        final long indexTimeStampMillis = 1463510726000L;
        LocalDateTime dateTime = LocalDateTime.ofInstant(Instant.ofEpochMilli(indexTimeStampMillis), zoneId_UTC);

        // Case 1:
        final DateTimeFormatter weekFormatter = DateTimeFormatter.ISO_WEEK_DATE;
        String output1 = dateTime.format(weekFormatter);
        System.out.println("Correct output of week number according to ISO Week numbers" + output1);

        // Case 2:
        final DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateFormat).withZone(zoneId_UTC)
                .withChronology(IsoChronology.INSTANCE);
        String output = dateTime.format(formatter);
        System.out.println("Week number I am getting in output " + output);
    }
}
Output on console:
Correct output of week number according to ISO Week numbers2016-W20-2
Week number I am getting in output 2016-21
NOTE:
Correct week number for the above date is 20 according to ISO 8601. The date in milliseconds converts to Tue, 17 May 2016 18:45:26 GMT.
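What is likely happening (a sketch of the reasoning, not a verified answer): the pattern letters Y and w in "YYYY-w" are resolved against the formatter's locale-specific week definition, and in locales such as en-US weeks start on Sunday with week 1 being the week that contains January 1, which places 2016-05-17 in week 21. DateTimeFormatter.ISO_WEEK_DATE always uses the ISO-8601 definition (Monday start, week 1 contains the first Thursday), which gives week 20. Reading the week number through WeekFields makes the difference explicit; the class below is just an illustrative example.

import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class WeekNumberDemo {
    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2016, 5, 17);
        // ISO-8601 week fields: Monday start, week 1 contains the first Thursday.
        int isoWeek = date.get(WeekFields.ISO.weekOfWeekBasedYear());           // 20
        // Locale-dependent week fields, e.g. en-US: Sunday start, minimal days 1.
        int usWeek = date.get(WeekFields.of(Locale.US).weekOfWeekBasedYear());  // 21
        System.out.println("ISO week: " + isoWeek + ", en-US week: " + usWeek);
    }
}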

boost log text_file_backend performance

I am using Boost.Log with the text_file_backend, but I am getting poor performance. Whether the sink is synchronous or asynchronous, throughput is low: in about 6 seconds it wrote only about 30M of data to the log file.
Below is my code snippet; can anyone help me?
typedef boost::log::sinks::asynchronous_sink<
        boost::log::sinks::text_file_backend> TextSink;

boost::log::sources::severity_logger_mt<LogSev> logger_;
boost::shared_ptr<TextSink> report_sink_;

// initialize report_sink
boost::shared_ptr<sinks::text_file_backend> report_backend =
        boost::make_shared<sinks::text_file_backend>(
                keywords::file_name = target + "/" + file_name
                        + ".report.log.%Y_%m_%d.%N",
                keywords::rotation_size = file_size,
                keywords::time_based_rotation =
                        sinks::file::rotation_at_time_point(0, 0, 0),
                keywords::auto_flush = false);

boost::shared_ptr<sinks::file::collector> report_collector = CreateCollector(
        target, max_use_size / 2, min_free_size);
report_backend->set_file_collector(report_collector);
report_backend->scan_for_files();

// add sink: report_sink
report_sink_ = boost::make_shared<TextSink>(report_backend);
report_sink_->set_formatter(
        expr::format("[%1%]" + sep + "[%2%]" + sep + "[%3%]" + sep + "%4%")
                % expr::format_date_time<boost::posix_time::ptime>(
                        "TimeStamp", "%Y-%m-%d %H:%M:%S.%f")
                % expr::attr<LogSev>("Severity")
                % expr::attr<attrs::current_thread_id::value_type>("ThreadID")
                % expr::message);
report_sink_->set_filter(expr::attr<LogSev>("Severity") >= report);

logging::core::get()->add_sink(report_sink_);
logging::add_common_attributes();

BOOST_LOG_SEV(logger_, info) << "blabal...";
I think one performance issue in your implementation is the timestamp: it needs a system call to get the time. I ran into the same problem, so I switched to the date library, which returns UTC time very quickly (also check the first answer of this question). However, if you want a timestamp in a specific timezone, the date library is slow; it is better to define your timezone offset yourself and add it to UTC.
See the example:
#include "date.h"
#define MY_TIME std::chrono::hours(4) + std::chrono::minutes(30)
string timestamp = date::format("%F %T", std::chrono::system_clock::now() +
MY_TIME);

Funnels and RethinkDB

I have some steps saved in RethinkDB like this:
{
date: "Sat Feb 06 2015 00:00:00 GMT+00:00",
step: 1
},
{
date: "Sat Feb 06 2015 11:11:11 GMT+00:00",
step: 3
},
{
date: "Sat Feb 06 2015 22:22:22 GMT+00:00",
step: 2
}
I'd like to count the number of steps done in order 1, 2, 3.
So for this example, I'd like to get something like this:
{
step_1: 1,
step_2: 1,
step_3: 0
}
I have seen a step 1, so step_1 is 1
I have seen a step 2, and have seen a step 1 before, so step_2 is 1
I have seen a step 3, but I haven't seen a step 2 before, so step_3 is 0
I've tried a lot of things but didn't find a way to do it, and actually I'm not really sure it's possible with RethinkDB.
Do any of you have an idea for me?
Thanks,
Adrien
As a preface: this really abuses the r.js term, and so it is always going to be a bit slow, which means this solution is not really great for production services. Unless someone comes up with a better answer (i.e. all ReQL), it would be better to do this mostly client-side, possibly using a group and min to pre-digest some of this.
A couple of us really think that there is an answer, and mine uses reduce with a JavaScript function for the main bit. Unfortunately, while working on this answer I found a bug that crashes our server, so for the moment my answer is slightly untested and not recommended, but the in-progress version might give you the start of an answer:
r.table('alpha').reduce(r.js('''(
function(left, right) {
  var returnValue = [];
  var sources = [left, right];
  for (var i in sources) {
    var source = sources[i];
    if (source.date) {
      if (!returnValue[source.step] || returnValue[source.step] > source.date) {
        returnValue[source.step] = source.date;
      }
    } else {
      for (var j in source) {
        if (!returnValue[j] || returnValue[j] > source[j]) {
          returnValue[j] = source[j];
        }
      }
    }
  }
  return returnValue;
})''')).map(r.js('''(
function (row) {
  var allValid = true;
  var lastDate = 0;
  var returnValue = [];
  for (var i in row) {
    var afterLast = row[i] && row[i] > lastDate && allValid;
    allValid = afterLast == true;
    returnValue.push(afterLast);
    lastDate = row[i];
  }
  return returnValue;
})''')).run()
