I'm experimenting with the language_id.txt dataset from the Google Prediction example. Right now I'm trying to update the model with the following method:
def update(label, data)
  input = @prediction.trainedmodels.update.request_schema.new
  input.label = label
  input.csv_instance = [data]
  result = @client.execute(
    :api_method => @prediction.trainedmodels.update,
    :parameters => {'id' => MODEL_ID},
    :headers => {'Content-Type' => 'application/json'},
    :body_object => input
  )
  assemble_json_body(result)
end
(This method is based on some Google sample code.)
My problem is that these updates have no effect. Here are the scores for "This is a test sentence.", regardless of how many updates I run:
{
  "response": {
    "kind": "prediction#output",
    "id": "mymodel",
    "selfLink": "https://www.googleapis.com/prediction/v1.5/trainedmodels/mymodel/predict",
    "outputLabel": "English",
    "outputMulti": [
      {
        "label": "English",
        "score": 0.420937
      },
      {
        "label": "French",
        "score": 0.273789
      },
      {
        "label": "Spanish",
        "score": 0.305274
      }
    ]
  },
  "status": "success"
}
Per the disclaimer at the bottom of "Creating a Sentiment Analysis Model", I have made sure to run at least 100 updates before expecting any changes. First, I tried using a single sentence and updating with it 1000 times. Second, I tried ~150 unique sentences drawn from Simple Wikipedia and updated once with each. Each update was "successful":
{"response":{"kind":"prediction#training","id":"mymodel","selfLink":"https://www.googleapis.com/prediction/v1.5/trainedmodels/mymodel"},"status":"success"}
but neither approach changed my results.
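For reference, the batch of updates was driven by a simple loop around the update method above (a minimal sketch; the sentence file name is illustrative):
# Illustrative only: run one update per labeled sentence.
sentences = File.readlines('simple_wikipedia_english.txt').map(&:strip)
sentences.each do |sentence|
  update('English', sentence)
end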
I've also tried using the APIs Explorer (Prediction, v1.5) and updating ~300 times that way. There's still no difference in my results. Those updates were also "successful".
200 OK
{
"kind": "prediction#training",
"id": "mymodel",
"selfLink": "https://www.googleapis.com/prediction/v1.5/trainedmodels/mymodel"
}
I am quite sure that the model is receiving these updates: get and analyze both show that the model has "numberInstances": "2024". Oddly, though, list shows "numberInstances": "406".
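(For completeness, this is roughly how I check the instance count; a sketch that reuses the same client objects and helper as the update method above.)
result = @client.execute(
  :api_method => @prediction.trainedmodels.get,
  :parameters => {'id' => MODEL_ID}
)
# The returned model metadata includes the numberInstances count quoted above.
puts assemble_json_body(result)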
At this point, I don't know what could be causing this issue.
2019 Update
As Jochem Schulenklopper noted in a comment, the API was shut down in April 2018.
Developers who choose to move to the Google Cloud Machine Learning Engine will have to recreate their existing Prediction API models.
Machine Learning API examples:
https://github.com/GoogleCloudPlatform/cloudml-samples
Related
Google Search API: as of this morning (7/30/2020), Google Custom Search with searchType=image returns no items[] array. I verified my CSE settings, which include image search = enabled and "search the whole web". Nothing has changed on my side, and it all worked for several years until this morning, so I'm trying to find out what has changed or happened.
The type of search I'm executing looks like the below, for example:
"request": [
{
"title": "Google Custom Search - ocean",
"totalResults": "360000",
"searchTerms": "ocean",
"count": 10,
"startIndex": 1,
"inputEncoding": "utf8",
"outputEncoding": "utf8",
"safe": "off",
"cx": "XXX My Own Instance xxx",
"searchType": "image",
"imgSize": "xxlarge"
}
It returns all the metadata (number of results, search time, etc.) but no items, so my usual query, which has worked up to now, can't use the fields=kind,items(title,link,snippet) parameter.
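For reference, my usual request is a plain GET against the Custom Search JSON API, roughly like this (key and cx values elided):
https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=MY_CX&q=ocean&searchType=image&imgSize=xxlarge&fields=kind,items(title,link,snippet)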
Eight hours later, I tried again to execute a query with searchType=image and it works, from my Angular application as well as a Python script and the REST testing tools.
I'm guessing Google had some downtime or a server malfunction and then fixed it.
I'd appreciate it if anyone can confirm the scenario I'm suspecting.
I was wondering what is better performance- and memory-wise: iterating over all objects in a collection and calling set/add_to_set, calling set/add_to_set directly on the Criteria, or using update_all with set/add_to_set.
# update_all
User.where(some_query).update_all(
  {
    '$addToSet': {
      :'some.field.value' => :value_to_add
    }
  }
)

# each do + add_to_set
User.where(some_query).each do |user|
  user.add_to_set(:'some.field.value' => :value_to_add)
end

# Criteria#add_to_set
User.where(some_query).add_to_set(
  :'some.field.value' => :value_to_add
)
Any input is appreciated. Thanks!
I started the MongoDB server with the verbose flag. Here's what I got.
Option 1. update_all applied on a selector
2017-04-25 COMMAND command production_v3.$cmd command: update { update: "products", updates: [ { q: { ... }, u: { $addToSet: { test_field: "value_to_add" } }, multi: true, upsert: false } ], ordered: true }
I removed some output so that it is easier to read. The flow is:
MongoID generates a single command with query and update specified.
MongoDB server gets the command. It goes through the collection and updates each match in, roughly, one go.
Note: you can verify this in the source code or take it on faith. Since MongoID, as per my terminology, generates the command to send in step 1, it does not check your models; e.g. if 'some.field.value' is not one of the fields on your User model, the command will still go through and persist on the DB.
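For example (a sketch; the field name here is deliberately not declared on the User model):
# This still issues a single update command and persists the value,
# even though 'not_a_real_field' is not defined on the User model.
User.where(some_query).update_all(
  '$addToSet' => { 'not_a_real_field' => :value_to_add }
)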
Option 2. each on a selector
I get a find command like the one below, followed by multiple getMore commands:
2017-04-25 COMMAND command production_v3.products command: find { find: "products", filter: { ... } } 0ms
I also get a huge number of update commands:
2017-04-25 COMMAND command production_v3.$cmd command: update { update: "products", updates: [ { q: { _id: ObjectId('52a6db196c3f4f422500f255') }, u: { $addToSet: { test_field: { $each: [ "value_to_add" ] } } }, multi: false, upsert: false } ], ordered: true } 0ms
The flow is radically different from the 1st option:
MongoID sends a simple query to the MongoDB server. Provided your collection is large enough and the query covers a material chunk of it, the following happens in a loop:
[loop] MongoDB responds with a subset of all matches and leaves the rest for the next iteration.
[loop] MongoID gets an array of matching items in Hash format. It parses each entry and initializes a User instance for it. That's an expensive operation!
[loop] For each User instance from the previous step, MongoID generates an update command and sends it to the server. Sockets are expensive too.
[loop] MongoDB gets the command, goes through the collection until the first match, and updates it. It is quick, but it adds up inside a loop.
[loop] MongoID parses the response and updates its User instance accordingly. Expensive and unnecessary.
Option 3. add_to_set applied on a selector
Under the hood it is equivalent to Option 1. Its CPU and memory overhead is immaterial for the purposes of this question.
Conclusion.
Option 2 is so much slower that there is no point in benchmarking. In the particular case I tried, it resulted in thousands of requests to MongoDB and thousands of User class initializations. Options 1 and 3 resulted in a single request to MongoDB and relied on MongoDB's highly optimized engine.
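If you want to verify this on your own data, a rough benchmark along these lines will do (a sketch reusing the query and field from the question):
require 'benchmark'

Benchmark.bm(22) do |bm|
  bm.report('update_all') do
    User.where(some_query).update_all('$addToSet' => { 'some.field.value' => :value_to_add })
  end
  bm.report('each + add_to_set') do
    User.where(some_query).each { |user| user.add_to_set('some.field.value' => :value_to_add) }
  end
  bm.report('Criteria#add_to_set') do
    User.where(some_query).add_to_set('some.field.value' => :value_to_add)
  end
end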
I recently inherited an ES instance and made sure I read an entire book on ES cover to cover before posting this; however, I'm afraid I'm unable to get even simple examples to work.
I have an index on our staging environment which exhibits behavior where every document is returned no matter what, whereas a similar index on our QA environment works as I would expect. For example, I am running the following query against http://staging:9200/people_alias/_search?explain:
{ "query" :
{ "filtered" :
{ "query" : { "match_all" : {} },
"filter" : { "term" : { "_id" : "34414405382" } } } } }
What I noticed on this staging environment is that the score of every document is 1 and it returns EVERY document in my index, no matter what value I specify. Using ?explain I see the following:
_explanation: {
  value: 1
  description: ConstantScore(*:*), product of:
  details: [
    { value: 1, description: boost },
    { value: 1, description: queryNorm }
  ]
}
On my QA environment, which correctly returns only one record, I observe for ?explain:
_explanation: {
  value: 1
  description: ConstantScore(cache(_uid:person#34414405382)), product of:
  details: [
    { value: 1, description: boost },
    { value: 1, description: queryNorm }
  ]
}
The mappings are almost identical on both indices. The only difference is that I removed the manual field-level boost values on some fields, since I read that field-level boosting is not recommended in favor of query-time boosting; however, this should not affect the behavior of filtering on the document ID (right?).
Is there any clue I can glean from the differences in the explain output, or should I post the index mappings? Are there any server-level settings I should consider checking? It doesn't matter what query I use on Staging: I can use match queries and exact-match lookups on other fields, and Staging just keeps returning every result with score 1.0.
I feel like I'm doing something glaringly and obviously wrong on my Staging environment. Could someone please explain the presence of ConstantScore, boost and queryNorm? From looking at examples in other literature, I thought I would see things like term frequency, etc.
EDIT: I am issuing the query from the Elasticsearch Head plugin.
In your HEAD plugin, you need to use POST in order to send the query in the payload; otherwise the _search endpoint is hit without any constraints.
In your browser, if you open the developer tools and look at the networking tab, you'll see that nothing is sent in the payload when using GET.
It's a common mistake. Some HTTP clients (like curl) do send a payload using GET, but others (like /head/) don't. Sense will warn you if you use GET instead of POST when sending a payload and will automatically force POST instead of GET.
So to sum it up, it's best to always use POST whenever you wish to send a payload to your servers, so that you don't have to worry about the behavior of the HTTP client you're using.
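For example, from Ruby you could send the body with an explicit POST (a minimal sketch using Net::HTTP and the staging URL from the question):
require 'net/http'
require 'json'
require 'uri'

# POST the query so that the body is actually transmitted to the _search endpoint.
uri = URI('http://staging:9200/people_alias/_search')
query = {
  query: {
    filtered: {
      query:  { match_all: {} },
      filter: { term: { _id: '34414405382' } }
    }
  }
}
response = Net::HTTP.post(uri, query.to_json, 'Content-Type' => 'application/json')
puts response.body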
Edit: I found the answer; see below for Logstash <= 2.0.
Plugin created for Logstash 2.0
For whoever is interested in this with Logstash 2.0 or above: I created a plugin that makes this dead simple.
The GEM is here:
https://rubygems.org/gems/logstash-filter-dateparts
Here is the documentation and source code:
https://github.com/mikebski/logstash-datepart-plugin
I've got a bunch of data in Logstash with @timestamp values spanning a couple of weeks. I have a duration field that is a number field, and I can do a date histogram. I would like to do a histogram over hour of day, rather than a linear histogram from x -> y dates. I would like the x axis to be 0 -> 23 instead of date x -> date y.
I think I can use the JSON Input advanced text input to add a field to the result set which is the hour of day of the @timestamp. The help text says:
"Any JSON formatted properties you add here will be merged with the elasticsearch aggregation definition for this section. For example shard_size on a terms aggregation."
This leads me to believe it can be done, but it does not give any examples.
Edited to add:
I have tried setting up an entry in the scripted fields based on the link below, but it will not work like the examples on their blog with 4.1. The following script gives an error when trying to add a field with format number and name test_day_of_week: Integer.parseInt("1234")
The problem looks like the scripting is not very robust. Oddly enough, I want to do exactly what they are doing in the examples (add fields for day of month, day of week, etc...). I can get the field to work if the script is doc['@timestamp'], but I cannot manipulate the timestamp.
The docs say Lucene expressions are allowed and show some trig and GCD examples for GIS type stuff, but nothing for date...
There is this update to the BLOG:
UPDATE: As a security precaution, starting with version 4.0.0-RC1,
Kibana scripted fields default to Lucene Expressions, not Groovy, as
the scripting language. Since Lucene Expressions only support
operations on numerical fields, the example below dealing with date
math does not work in Kibana 4.0.0-RC1+ versions.
There is no suggestion for how to actually do this now. I guess I could go off and enable the Groovy plugin...
Any ideas?
EDIT - THE SOLUTION:
I added a filter using Ruby to do this, and it was pretty simple:
Basically, in a ruby filter you can access event['field'] and you can create new ones. I use the Ruby Time methods to create new fields based on the @timestamp of the event.
ruby {
code => "ts = event['#timestamp']; event['weekday'] = ts.wday; event['hour'] = ts.hour; event['minute'] = ts.min; event['second'] = ts.sec; event['mday'] = ts.day; event['yday'] = ts.yday; event['month'] = ts.month;"
}
This no longer appears to work in Logstash 1.5.4: the Ruby date methods appear to be unavailable, which throws a "rubyexception" and the fields are not added to the Logstash events.
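If you hit that, one possible workaround (untested here, and assuming event['@timestamp'] now returns a LogStash::Timestamp rather than a plain Ruby Time) is to convert it first via .time:
ruby {
  # Assumption: event['@timestamp'] is a LogStash::Timestamp; .time returns the wrapped Ruby Time.
  code => "ts = event['@timestamp'].time; event['weekday'] = ts.wday; event['hour'] = ts.hour; event['minute'] = ts.min"
}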
I've spent some time searching for a way to recover the functionality we had with Groovy scripted fields, which are no longer available for dynamic scripting, to provide fields such as "hourofday", "dayofweek", et cetera. What I've done is add these as Groovy script files directly on the Elasticsearch nodes themselves, like so:
/etc/elasticsearch/scripts/
hourofday.groovy
dayofweek.groovy
weekofyear.groovy
... and so on.
Those script files contain a single line of Groovy, like so:
Integer.parseInt(new Date(doc["@timestamp"].value).format("d")) (dayofmonth)
Integer.parseInt(new Date(doc["@timestamp"].value).format("u")) (dayofweek)
To reference these in Kibana, first create a new search and save it, or choose one of your existing saved searches (take a copy of the existing JSON before you change it, just in case) on the "Settings -> Saved Objects -> Searches" page. Then modify the query to add "script_fields", so you get something like this:
{
"query" : {
...
},
"script_fields": {
"minuteofhour": {
"script_file": "minuteofhour"
},
"hourofday": {
"script_file": "hourofday"
},
"dayofweek": {
"script_file": "dayofweek"
},
"dayofmonth": {
"script_file": "dayofmonth"
},
"dayofyear": {
"script_file": "dayofyear"
},
"weekofmonth": {
"script_file": "weekofmonth"
},
"weekofyear": {
"script_file": "weekofyear"
},
"monthofyear": {
"script_file": "monthofyear"
}
}
}
As shown, the "script_fields" line should fall outside the "query" itself, or you will get an error. Also ensure the script files are available to all your Elasticsearch nodes.
I am using Codeigniter and Alex Bilbie's MongoDB library.
In the API I am developing, users can upload images and other users can comment on them.
I have chosen to include the comments as sub documents to the images.
Each comment contains:
Fullname (of author)
Comment
Created_at
In other words, the user's full name is "hard coded" into each comment, so if they later decide to change their name, I have a problem.
I read that I can use atomic updates to update all occurrences of the name (such as in comments), but how can I do this using Alex's library? Can I update all the places where the name is wrong?
UPDATE
This is what the image document looks like with the comments.
I think it is pretty strange that MongoDB encourages the use of subdocuments but then does not include a way to update multiple items in an array.
{
"_id": ObjectId("4e9ead773dc793dc01020000"),
"description": "An image",
"category": "accident",
"comments": [
{
"id": ObjectId("4e96bd063dc7937202000000"),
"fullname": "James Bond",
"comment": "This is a comment.",
"created_at": "2011-10-19 13:02:40"
}
],
"created_at": "2011-10-19 12:59:03"
}
Thankful for all help!
I am not familiar with CodeIgniter, but maybe the MongoDB shell syntax will help you:
db.comments.update( {"Fullname":"Andrew Orsich"},
{ $set : { Fullname: "New name"} }, false, true )
The last true flag indicates that you want to update multiple documents, so it is possible to update all comments in one update operation.
BTW: denormalizing (not 'hard coding') data is a usual practice in MongoDB and NoSQL in general. Also, operations that require updating a lot of documents usually run asynchronously. But it is up to you.
Update:
db.comments.update( {"comments.Fullname":"Andrew Orsich"},
{ $set : { "comments.$.Fullname": "New name"} }, false, true )
But the above query will only update the full name in the first matching comment of the nested array. If you need to affect more than one array element, you will need to use multiple update statements.