SeaweedFS: deleting a file succeeds but existing filers still hold it - go

We use SeaweedFS 1.78.
When I delete a file through a filer:
curl -X DELETE http://filer1:9889/dataset/qiantao/1.txt
It returns success.
I have 10 filers. After the delete, listing the directory on another filer still shows the file:
curl -H "Accept: application/json" "http://filer2:9889/dataset/qiantao/?pretty=y" |grep qiantao |grep txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15723 0 15723 0 0 1917k 0 --:--:-- --:--:-- --:--:-- 2193k
"FullPath": "/dataset/qiantao/1.txt",
If I start a new filer, it cannot find /dataset/qiantao/1.txt, which is the expected result.
But the existing filers still return it.
The file info from an existing filer is below.
curl -H "Accept: application/json" "http://filer1:9889/dataset/qiantao/?pretty=y&limit=1"
{
"Path": "/dataset/qiantao",
"Entries": [
{
"FullPath": "/dataset/qiantao/1.txt",
"Mtime": "2020-12-07T11:15:59+08:00",
"Crtime": "2020-12-07T11:15:59+08:00",
"Mode": 432,
"Uid": 0,
"Gid": 0,
"Mime": "text/plain",
"Replication": "010",
"Collection": "",
"TtlSec": 0,
"UserName": "",
"GroupNames": null,
"SymlinkTarget": "",
"Md5": null,
"Extended": null,
"chunks": [
{
"file_id": "4328,587fb084df9f9dbf",
"size": 2,
"mtime": 1607310959158810676,
"e_tag": "c7c83966",
"fid": {
"volume_id": 4328,
"file_key": 1484763268,
"cookie": 3751779775
}
}
]
}
],
"Limit": 1,
"LastFileName": "1.txt",
"ShouldDisplayLoadMore": true
}
The volume info is below.
{
"Id": 4328,
"Size": 31492542356,
"ReplicaPlacement": {
"SameRackCount": 0,
"DiffRackCount": 1,
"DiffDataCenterCount": 0
},
"Ttl": {
"Count": 0,
"Unit": 0
},
"Collection": "",
"Version": 3,
"FileCount": 111030,
"DeleteCount": 709,
"DeletedByteCount": 1628822733,
"ReadOnly": false,
"CompactRevision": 0,
"ModifiedAtSecond": 0,
"RemoteStorageName": "",
"RemoteStorageKey": ""
},
So I downloaded 4328.idx from the volume server and used see_idx to look up the key.
./see_idx -dir /Users/qiantao/Documents/seaweedfs -volumeId=4328 -v=4 |grep 587fb084
key:587fb084 offset:2802901546 size:57
key:587fb084 offset:3937021600 size:4294967295
It looks like key:587fb084 has been overwritten by a newer entry?
How can I fix this so that the existing filers no longer return the file?

4294967295 is a tombstone: it marks the entry as deleted.
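The see_idx output from the question can be read with a short script. This is a minimal sketch, assuming that later lines for the same key supersede earlier ones and that a size of 4294967295 (0xFFFFFFFF, the largest unsigned 32-bit value) is the tombstone described above. The function names are hypothetical helpers and not part of SeaweedFS.

```python
# Size value that see_idx printed for the deleted entry (2**32 - 1).
TOMBSTONE_SIZE = 0xFFFFFFFF  # 4294967295


def parse_idx_line(line):
    """Parse one 'key:... offset:... size:...' line into a dict."""
    fields = dict(part.split(":", 1) for part in line.split())
    return {
        "key": fields["key"],
        "offset": int(fields["offset"]),
        "size": int(fields["size"]),
    }


def latest_state(lines):
    """Map each key to 'deleted' or 'live', letting later lines win (assumption)."""
    state = {}
    for line in lines:
        entry = parse_idx_line(line)
        is_tombstone = entry["size"] == TOMBSTONE_SIZE
        state[entry["key"]] = "deleted" if is_tombstone else "live"
    return state


# The two lines printed by see_idx in the question.
sample = [
    "key:587fb084 offset:2802901546 size:57",
    "key:587fb084 offset:3937021600 size:4294967295",
]
print(latest_state(sample))
```

Under these assumptions the volume index already records the needle as deleted, which suggests the stale entry lives in the existing filers' metadata and not on the volume server.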

Related

How do I use the article alias feature?

I am trying to link to an article that has an alias associated with it. I have not been able to find any documentation on how this is done.
Here is what I have done so far:
Step 1: Created an Article by posting it to the /articles API:
curl -X 'POST' \
'https://mysubdomain.vanilladevelopment.com/api/v2/articles' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H 'x-transient-key: mykey' \
-d '{
"body": "demo",
"draftID": 0,
"format": "text",
"knowledgeCategoryID": 512,
"name": "Article Title"
}'
I received positive confirmation that an article was created:
{
"articleID": 987,
"articleRevisionID": 1348,
"knowledgeCategoryID": 512,
"breadcrumbs": [
{
"name": "Learners",
"url": "https://mysubdomain.vanilladevelopment.com/english/kb/documentation-learners"
}
],
"knowledgeBaseID": 4,
"name": "Article Title",
"body": "demo",
"outline": [],
"excerpt": "demo",
"seoDescription": null,
"seoName": null,
"slug": "987-article-title",
"sort": 14,
"score": 0,
"views": 0,
"url": "https://mysubdomain.vanilladevelopment.com/english/kb/articles/987-article-title",
"insertUserID": 9,
"dateInserted": "2022-08-19T13:34:33+00:00",
"updateUserID": 9,
"dateUpdated": "2022-08-19T13:34:33+00:00",
"status": "published",
"featured": false,
"dateFeatured": null,
"locale": "en",
"translationStatus": "up-to-date",
"foreignID": null
}
Step 2: Created an alias to the article using the articles/{id}/aliases API
curl -X 'PUT' \
'https://mysubdomain.vanilladevelopment.com/api/v2/articles/987/aliases' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H 'x-transient-key: mykey' \
-d '{
"aliases": [
"andy987"
]
}'
I received positive confirmation that the alias was created:
{
"articleID": 987,
"articleRevisionID": 1348,
"knowledgeCategoryID": 512,
"breadcrumbs": [
{
"name": "Learners",
"url": "https://mysubdomain.vanilladevelopment.com/english/kb/documentation-learners"
}
],
"knowledgeBaseID": 4,
"name": "Article Title",
"body": "demo",
"outline": [],
"excerpt": "demo",
"seoDescription": null,
"seoName": null,
"slug": "987-article-title",
"sort": 14,
"score": 0,
"views": 0,
"url": "https://mysubdomain.vanilladevelopment.com/english/kb/articles/987-article-title",
"insertUserID": 9,
"dateInserted": "2022-08-19T13:34:33+00:00",
"updateUserID": 9,
"dateUpdated": "2022-08-19T13:34:33+00:00",
"aliases": [
"andy987"
],
"status": "published",
"featured": false,
"dateFeatured": null,
"locale": "en",
"translationStatus": "up-to-date",
"foreignID": null
}
My question is: how do I refer to the alias (i.e. andy987) from within another article? I have tried:
https://mysubdomain.vanilladevelopment.com/english/kb/articles/andy987
https://mysubdomain.vanilladevelopment.com/english/kb/andy987
https://mysubdomain.vanilladevelopment.com/andy987
and none of these work.

_update_by_query fails to update all documents in ElasticSearch

I have over 30 million documents in Elasticsearch (version 6.3.3). I am trying to add a new field to all existing documents and set its value to 0.
For example: I want to add a start field, which does not yet exist in the Twitter document, and set its initial value to 0 in all 30 million documents.
In my case only about 4 million documents were updated. When I check the submitted task with the Tasks API, http://localhost:9200/_tasks/{taskId}, the result looks something like this:
{
"completed": false,
"task": {
"node": "Jsecb8kBSdKLC47Q28O6Pg",
"id": 5968304,
"type": "transport",
"action": "indices:data/write/update/byquery",
"status": {
"total": 34002005,
"updated": 3618000,
"created": 0,
"deleted": 0,
"batches": 3619,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1.0,
"throttled_until_millis": 0
},
"description": "update-by-query [Twitter][tweet] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.Twitter.start = 0;', options={}, params={}}",
"start_time_in_millis": 1574677050104,
"running_time_in_nanos": 466805438290,
"cancellable": true,
"headers": {}
}
}
The query I am executing against ES is something like:
curl -XPOST "http://localhost:9200/_update_by_query?wait_for_completion=false&conflicts=proceed" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.Twitter.start = 0;"
},
"query": {
"exists": {
"field": "Twitter"
}
}
}'
Any suggestions would be great, thanks
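One thing worth reading off the task status above is that it reports "completed": false, so the snapshot shows a task that was still in progress when it was taken. The following is a rough sketch of the arithmetic, assuming throughput stays constant for the remainder of the run. The function name is hypothetical, and the calculation does not explain why a task would stop early if it did.

```python
# Values copied from the task status shown in the question.
status = {
    "total": 34002005,
    "updated": 3618000,
    "created": 0,
    "deleted": 0,
    "noops": 0,
    "version_conflicts": 0,
}
running_time_in_nanos = 466805438290


def summarize(status, running_nanos):
    """Return (fraction processed, docs per second, estimated seconds left)."""
    processed = sum(
        status[key]
        for key in ("updated", "created", "deleted", "noops", "version_conflicts")
    )
    elapsed_s = running_nanos / 1e9
    rate = processed / elapsed_s
    remaining = status["total"] - processed
    # Assumes the observed rate holds for the rest of the run.
    return processed / status["total"], rate, remaining / rate


fraction, rate, eta_s = summarize(status, running_time_in_nanos)
print(f"{fraction:.1%} processed, {rate:.0f} docs/s, ~{eta_s / 60:.0f} min left")
```

Polling the task until "completed" becomes true, and only then comparing "updated" with "total", separates a slow task from one that actually failed.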

Rasa nlu tutorial doesn't work

Rasa NLU version (0.11.3):
Used backend / pipeline (spacy_sklearn):
Operating system (osx):
Issue: I tried to follow the tutorial: https://rasahq.github.io/rasa_nlu/tutorial.html?highlight=project#
Installed spaCy + sklearn
Created config_spacy.json
Downloaded the sample file and trained
I tested the greeting and goodbye intents and they work,
but when I test with this command:
curl -X POST localhost:5000/parse -d '{"q":"I am looking for Mexican food"}' | python -m json.tool
it returns:
{
"intent": {
"name": "None",
"confidence": 1.0
},
"entities": [],
"text": "yes"
}
Content of configuration file (if used & relevant):
{
"project": null,
"fixed_model_name": null,
"config": "config.json",
"data": null,
"emulate": null,
"language": "en",
"log_file": null,
"log_level": "INFO",
"mitie_file": "data/total_word_feature_extractor.dat",
"spacy_model_name": null,
"num_threads": 1,
"max_training_processes": 1,
"path": "/rasa_nlu/projects",
"port": 5000,
"token": null,
"cors_origins": [],
"max_number_of_ngrams": 7,
"pipeline": [],
"response_log": "/rasa_nlu/logs",
"storage": null,
"aws_endpoint_url": null,
"duckling_dimensions": null,
"duckling_http_url": null,
"ner_crf": {
"BILOU_flag": true,
"features": [
[
"low",
"title",
"upper",
"pos",
"pos2"
],
[
"bias",
"low",
"word3",
"word2",
"upper",
"title",
"digit",
"pos",
"pos2",
"pattern"
],
[
"low",
"title",
"upper",
"pos",
"pos2"
]
],
"max_iterations": 50,
"L1_c": 1,
"L2_c": 0.001
},
"intent_classifier_sklearn": {
"C": [
1,
2,
5,
10,
20,
100
],
"kernel": "linear"
}
}
Status:
{
"available_projects": {
"default": {
"status": "ready",
"available_models": [
"fallback"
]
}
}
}
In your config file the pipeline is set to [] but needs to be configured properly. The documentation for the pipeline configuration option can be found here. The available options are discussed here.
The pipeline can either be a pre-configured pipeline like: mitie, spacy_sklearn, or keyword. It can also be a custom pipeline like: ["nlp_spacy", "ner_crf", "ner_synonyms"]. I would recommend setting your pipeline to:
"pipeline": "spacy_sklearn"
Update your configuration file and restart the server. If the server is still running in a console window press Ctrl + c to stop it. Then re-enter the command you used to start it.
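The configuration change can also be applied programmatically. This is a minimal sketch, assuming the configuration is plain JSON as shown in the question. The helper name ensure_pipeline and the trimmed-down sample config are illustrative only and not part of Rasa NLU.

```python
import json

# Trimmed-down stand-in for the configuration shown in the question.
config_text = '{"language": "en", "pipeline": [], "port": 5000}'


def ensure_pipeline(config, default="spacy_sklearn"):
    """Return a copy of config with an empty or missing pipeline replaced."""
    fixed = dict(config)
    if not fixed.get("pipeline"):
        fixed["pipeline"] = default
    return fixed


config = ensure_pipeline(json.loads(config_text))
print(json.dumps(config, indent=2))
```

A pipeline that is already set, such as a custom list of components, is left untouched by this helper.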

GetStats duration and interval parameters, clarifying API documentation for Jelastic API

https://docs.jelastic.com/api/?class=environment.Control&member=GetStats
At the above link, the Jelastic API documentation for the GetStats method lists two parameters, duration and interval.
When querying the API I can't figure out how these two parameters interact with each other.
If I query with the request below, I would expect 100 records at a resolution of 1 minute:
/1.0/environment/control/rest/getstats?domain=[myDomiain]&session=[MySession]&duration=6000&interval=60&nodeid=[MyNode]
What I get back is 4 records, one per hour, so I'm unsure how the parameters work.
Should I be using GetSumStats?
My final question: what units are the cpu and mem stats in? MHz and bytes?
{
"iops_used": 0,
"duration": 3600,
"cpumhz": 3,
"start": "2016-05-03 08:00:00",
"disk": 2141,
"mem": 194840,
"cpu": 12254,
"capacity": 0,
"net": {
"in_int": 703019,
"out_int": 566947,
"in_ext": 46222,
"out_ext": 367209
}
},
{
"iops_used": 0,
"duration": 3600,
"cpumhz": 3,
"start": "2016-05-03 09:00:00",
"disk": 2141,
"mem": 171992,
"cpu": 10076,
"capacity": 0,
"net": {
"in_int": 156703,
"out_int": 314023,
"in_ext": 12627,
"out_ext": 13535
}
},
{
"iops_used": 0,
"duration": 3580,
"cpumhz": 3,
"start": "2016-05-03 10:00:00",
"disk": 2141,
"mem": 172400,
"cpu": 11198,
"capacity": 0,
"net": {
"in_int": 515521,
"out_int": 551317,
"in_ext": 10329,
"out_ext": 17161
}
},
{
"iops_used": 0,
"duration": 3601,
"cpumhz": 3,
"start": "2016-05-03 11:00:00",
"disk": 2141,
"mem": 172610,
"cpu": 10032,
"capacity": 0,
"net": {
"in_int": 153394,
"out_int": 310694,
"in_ext": 10285,
"out_ext": 11210
}
}
@dlearious, to use an interval of 60 you should set duration to 3600, because Jelastic keeps detailed data hourly.
Also, the minimal interval you can start from is 20.
Jelastic shows cpu in milliseconds and mem in bytes.
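The arithmetic behind the question and this answer can be sketched as follows. It assumes that one record is returned per interval within the requested duration, which is the asker's expectation and not something confirmed here. The function name is hypothetical.

```python
def expected_points(duration_s, interval_s):
    """Number of records if one record is returned per interval (assumption)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return duration_s // interval_s


# The request from the question: duration=6000, interval=60.
print(expected_points(6000, 60))
# The combination suggested in the answer: duration=3600, interval=60.
print(expected_points(3600, 60))
# The minimal interval mentioned in the answer, over one hour.
print(expected_points(3600, 20))
```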

Slow rails performance when POSTing specific JSON

I've been testing my app's performance and found that it takes 1 second from the time the data is posted until the first line of the action method executes. I'm testing this on an empty Rails 4 app (created using rails new app_name) with Ruby 1.9.3-p448. I've only added one controller:
class TestController < ApplicationController
  # Skip the CSRF token check so the JSON POST sent by curl is accepted
  skip_before_filter :verify_authenticity_token
  def testt
    render json: {new: true}
  end
end
and a route:
post "api/v1/tt" => "test#testt"
Here's the JSON that I'm posting:
{
"params": {
"updatedBy": "f092d32a-1e38-4f07-8b76-185393138d86",
"data": [
{
"typeName": "test",
"total": 995,
"timeOffset": 13,
"timestamp": 1404323549565,
"hidden": false,
"guid": "9fc91203-e558-43e1-b585-aefbd281c5f5",
"modificationDate": 1404316375054,
"deleted": false
},
{
"typeName": "test",
"total": 995,
"timeOffset": 13,
"timestamp": 1404323549565,
"hidden": false,
"guid": "9fc91203-e558-43e1-b585-aefbd281c5f5",
"modificationDate": 1404316375054,
"deleted": false
},
{
"typeName": "test",
"total": 995,
"timeOffset": 13,
"timestamp": 1404323549565,
"hidden": false,
"guid": "9fc91203-e558-43e1-b585-aefbd281c5f5",
"modificationDate": 1404316375054,
"deleted": false
},
{
"typeName": "test",
"total": 995,
"timeOffset": 13,
"timestamp": 1404323549565,
"hidden": false,
"guid": "9fc91203-e558-43e1-b585-aefbd281c5f5",
"modificationDate": 1404316375054,
"deleted": false
},
{
"typeName": "test",
"total": 995,
"timeOffset": 13,
"timestamp": 1404323549565,
"hidden": false,
"guid": "9fc91203-e558-43e1-b585-aefbd281c5f5",
"modificationDate": 1404316375054,
"deleted": false
}
]
}
}
using this command:
curl -H "Content-Type: application/json" -b cookies -c cookies --request POST "http://localhost/api/v1/tt" --data "@upload.json" -w "@timings-format.txt"
timings-format.txt contains:
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
----------\n
time_total: %{time_total}\n
When I run the command, I get something like this:
{"new":true}
time_namelookup: 0,001
time_connect: 0,001
time_appconnect: 0,000
time_pretransfer: 0,001
time_redirect: 0,000
time_starttransfer: 1,003
----------
time_total: 1,010
If I minify the JSON like this:
{"params":{"updatedBy":"f092d32a-1e38-4f07-8b76-185393138d86","data":[{"typeName":"test","total":995,"timeOffset":13,"timestamp":1404323549565,"hidden":false,"guid":"9fc91203-e558-43e1-b585-aefbd281c5f5","modificationDate":1404316375054,"deleted":false},{"typeName":"test","total":995,"timeOffset":13,"timestamp":1404323549565,"hidden":false,"guid":"9fc91203-e558-43e1-b585-aefbd281c5f5","modificationDate":1404316375054,"deleted":false},{"typeName":"test","total":995,"timeOffset":13,"timestamp":1404323549565,"hidden":false,"guid":"9fc91203-e558-43e1-b585-aefbd281c5f5","modificationDate":1404316375054,"deleted":false},{"typeName":"test","total":995,"timeOffset":13,"timestamp":1404323549565,"hidden":false,"guid":"9fc91203-e558-43e1-b585-aefbd281c5f5","modificationDate":1404316375054,"deleted":false},{"typeName":"test","total":995,"timeOffset":13,"timestamp":1404323549565,"hidden":false,"guid":"9fc91203-e558-43e1-b585-aefbd281c5f5","modificationDate":1404316375054,"deleted":false}]}}
and then run the command again, I get:
{"new":true}
time_namelookup: 0,001
time_connect: 0,002
time_appconnect: 0,000
time_pretransfer: 0,002
time_redirect: 0,000
time_starttransfer: 0,008
----------
time_total: 0,008
Does anyone have an idea of what is going on? I also have larger JSON that is minified but it still takes 1 second to execute empty action...
You are in development mode, and things are slower by nature there. You are also probably using WEBrick as the development server, which is slow too. Check a real deployment in production mode with nginx + unicorn or nginx + passenger. Perform several requests, as the app also needs to warm up.
I would also grab rvm and install ruby-2.1.2, the most rock-solid MRI Ruby release so far.
Turns out the problem was in cURL. After implementing the same thing in Ruby (just reading the file and POSTing the data), I get saner times (about 20-30 ms).
Here is detailed explanation:
https://stackoverflow.com/a/17390776/579843
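A frequently reported cause of a fixed delay of about one second with curl POSTs is the Expect: 100-continue handshake: curl releases of that era are reported to add this header once the request body exceeds 1024 bytes and then wait up to a second for the server's interim response. Whether that is what happened here is an assumption, but it would fit the timings above. The sketch below rebuilds the payload from the question and compares both encodings against that assumed threshold. The 4-space indentation of the pretty-printed variant is also an assumption, and the helper name is hypothetical.

```python
import json

# Assumed threshold: body size above which curl is reported to add
# "Expect: 100-continue" to a POST request.
EXPECT_THRESHOLD_BYTES = 1024

# One record from the payload in the question; it is repeated five times there.
record = {
    "typeName": "test",
    "total": 995,
    "timeOffset": 13,
    "timestamp": 1404323549565,
    "hidden": False,
    "guid": "9fc91203-e558-43e1-b585-aefbd281c5f5",
    "modificationDate": 1404316375054,
    "deleted": False,
}
payload = {
    "params": {
        "updatedBy": "f092d32a-1e38-4f07-8b76-185393138d86",
        "data": [record] * 5,
    }
}


def may_trigger_expect(body, threshold=EXPECT_THRESHOLD_BYTES):
    """True if the body is larger than the assumed Expect threshold."""
    return len(body) > threshold


pretty = json.dumps(payload, indent=4).encode("utf-8")
minified = json.dumps(payload, separators=(",", ":")).encode("utf-8")

for name, body in (("pretty", pretty), ("minified", minified)):
    print(name, len(body), "bytes, over threshold:", may_trigger_expect(body))
```

If this is the cause, passing -H "Expect:" to curl suppresses the header, and the pretty-printed request should then take about as long as the minified one.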
