Mongodb total rows limit

Mongodb total rows limit - ajax

I have some problem with a Restfull interface of mongoDB.
I have submitted this query --> http://127.0.0.1:28017/db/collection/?limit=0(I used limit = 0 because I want to find all my result with an ajax request),
and the result in terms of number of rows is "total_rows" : 38185.
But if in my shell if I execute db.collection.count() the result was 496519.
Why I have these difference? Is it possible to get the same result with an ajax request?
Thanks in advance for your help.

I'm sure that results were not impacted by numbers of rows nor directly MongoDB, but indicates be Webserver (at time created to tasks admin). It’s possibly to be the payload size of response break by Webserver something like HTTP error 413 (entity to larger).
In my tests i see entries in log as "[websvr] killcursors: found 1 of 1". This will kill opened cursor between the client (in the case web server) and MongoDB. Most drivers not need call OP_KILL_CURSORS because the MongoDB define a timeout of 10 minutes by default.
Go back for tests i conclude that size payload of response of web server (built-in MongoDB) is limited 38~40MB. Let me show my analyze.
I created a collections with 1,260,000 documents. In REST web interface make query that results total_rows: 379,677 (or avgObjSize * total_rows = 38MB).
db.manyrows.stats()
{
"ns" : "forum.manyrows",
"count" : 1260000,
"size" : 125101640,
"avgObjSize" : 99.28701587301587,
"storageSize" : 174735360,
"numExtents" : 12,
"nindexes" : 1,
"lastExtentSize" : 50798592,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 48753488,
"indexSizes" : {
"_id_" : 48753488
},
"ok" : 1
}
======
web output
{"total_rows" : 379677 , "query" : {} , "millis" : 6793}
Continuing... dropped/removed some documents of collection to fit 38MB. Do new query results in all documents thats results 379642 of 379642 or 38MB.
> db.manyrows.stats()
{
"ns" : "forum.manyrows",
"count" : 379678,
"size" : 38172128,
"avgObjSize" : 100.53816128403543,
"storageSize" : 174735360,
"numExtents" : 12,
"nindexes" : 1,
"lastExtentSize" : 50798592,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 12329408,
"indexSizes" : {
"_id_" : 12329408
},
"ok" : 1
}
===
web output
{"total_rows" : 379678 , "query" : {} , "millis" : 27325}
New samples with other collections: Results 39MB with (“avgObjSize": 3440.35 * "total_rows": 11395 = 39MB)
> db.messages.stats()
{
"ns" : "enron.messages",
"count" : 120477,
"size" : 414484160,
"avgObjSize" : 3440.3592386928626,
"storageSize" : 518516736,
"numExtents" : 14,
"nindexes" : 2,
"lastExtentSize" : 140619776,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 436434880,
"indexSizes" : {
"_id_" : 3924480,
"body_text" : 432510400
},
"ok" : 1
}
=== web output:
{
"total_rows" : 11395 ,
"query" : {} ,
"millis" : 2956
}
You can try make query with a microframework like Bottle.

Related

Differentiating _delete_by_query tasks in a multi-tenant index

Scenario:
I have an index with a bunch of multi-tenant data in Elasticsearch 6.x. This data is frequently deleted (via _delete_by_query) and populated by the tenants.
When issuing a _delete_by_query request with wait_for_completion=false, supplying a query JSON to delete a tenants' data, I am able to see generic task information via the _tasks API. Problem is, with a large number of tenants, it is not actively clear who is deleting data at any given time.
My question is this:
Is there a way I can view the query for which the _delete_by_query task is operating on? Or can I attach an additional param to the URL that is cached in the task to differentiate them?
Side note: looking at the docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/tasks.html I see there is a description field in the _tasks API response that has the query as a String, however, I do not see that level of detail in my description field:
"description" : "delete-by-query [myindex]"
Thanks in advance

One way to identify queries is to add the X-Opaque-Id HTTP header to your queries:
For instance, when deleting all tenant data for (e.g.) User 3, you can issue the following command:
curl -XPOST -H 'X-Opaque-Id: 3' -H 'Content-type: application/json' http://localhost:9200/my-index/_delete_by_query?wait_for_completion=false -d '{"query":{"term":{"user": 3}}}'
You then get a task ID, and when checking the related task document, you'll be able to identify which task is/was deleting which tenant data thanks to the headers section which contains your HTTP header:
"_source" : {
"completed" : true,
"task" : {
"node" : "DB0GKYZrTt6wuo7d8B8p_w",
"id" : 20314843,
"type" : "transport",
"action" : "indices:data/write/delete/byquery",
"status" : {
"total" : 3,
"updated" : 0,
"created" : 0,
"deleted" : 3,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0
},
"description" : "delete-by-query [deletes]",
"start_time_in_millis" : 1570075424296,
"running_time_in_nanos" : 4020566,
"cancellable" : true,
"headers" : {
"X-Opaque-Id" : "3" <--- user 3
}
},

Firebase Realtime Database: Do I need extra container for my queries?

I am developing an app, where I retrieve data from a firebase realtime database.
One the one hand, I got my objects. There will be around 10000 entries when it is finished. A user can select for each property like "Blütenfarbe" (flower color) 1 (or more) characteristics, where he will then get the result plants, on which these constraints are true. Each property has 2-10 characteristics.
Is querying here powerful enough, to get fast results ? If not, my thought would be to also setup container for each characteristic and put every ID in that, when it is a characteristic of that plant.
This is my first project, so any tip for better structure is welcome. I don't want to create this database and realize afterwards, that it is not well enough structured.
Thanks for your help :)
{
"Pflanzen" : {
"Objekt" : {
"00001" : {
"Belaubung" : "Sommergrün",
"Blütenfarbe" : "Gelb",
"Blütezeit" : "Februar",
"Breite" : 20,
"Duftend" : "Ja",
"Frosthärte" : "Ja",
"Fruchtschmuck" : "Nein",
"Herbstfärbung" : "Gelb",
"Höhe" : 20,
"Pflanzengruppe" : "Laubgehölze",
"Standort" : "Sonnig",
"Umfang" : 10
},
"00002" : {
"Belaubung" : "Sommergrün",
"Blütenfarbe" : "Gelb",
"Blütezeit" : "März",
"Breite" : 25,
"Duftend" : "Nein",
"Frosthärte" : "Ja",
"Fruchtschmuck" : "Nein",
"Herbstfärbung" : "Ja",
"Höhe" : 10,
"Pflanzengruppe" : "Nadelgehölze",
"Standort" : "Schatten",
"Umfang" : 10
},
"Eigenschaften" : {
"Belaubung" : {
"Sommergrün" : [ "00001", "00002" ],
"Wintergrün" : ["..."]
},
"Blütenfarbe" : {
"Braun": ["00002"],
"Blau" : [ "00001" ]
},
}
}
}
}

RepositoryRestResource returns results in a different format than RepositoryRestController

I am facing an issue with using different Spring Controllers.
We are using a standard Spring PagingAndSortingRepository annotated with RepositoryRestResource to serve search responses . One of the methods is
Page<Custodian> findByShortNameContainingIgnoreCase(#Param("shortName") String shortName, Pageable p);
It returns all entities of Custodian that satisfy the conditions grouped in Pages.
The result looks like this:
{
"_embedded" : {
"custodians" : [ {
"shortName" : "fr-custodian",
"name" : "french custodian",
"contact" : "Francoir",
"_links" : {
"self" : {
"href" : "http://127.0.0.1:9000/api/custodians/10004"
},
"custodian" : {
"href" : "http://127.0.0.1:9000/api/custodians/10004"
}
}
} ]
},
"_links" : {
"self" : {
"href" : "http://127.0.0.1:9000/api/custodians/search/findByShortNameContainingIgnoreCase?shortName=fr&page=0&size=3&sort=shortName,asc"
}
},
"page" : {
"size" : 3,
"totalElements" : 1,
"totalPages" : 1,
"number" : 0
}
}
This is the format our frontend expects.
However, we need another query that results in a pretty long function (and thus URL) because it takes multiple parameters.
To be specific, it globally searches for a string in Custodian. So every parameter has the same value.
In order to shorten the URL we created a RepositoryRestController annotated with ResponseBody and implemented a function that takes only one parameter, calls the long URL internally and re-returns the result (a Page).
#RequestMapping(value = "/custodian", method = RequestMethod.GET)
public Page<Custodian> search(#RequestParam(value = "keyWord") String keyWord, Pageable p) {
return repo.LONGURL(keyWord, keyWord, p);
}
Unfortunately, Spring doesn't apply the same format to the result of our function.
It looks like this:
{
"content" : [ {
"id" : 10004,
"shortName" : "fr-custodian",
"name" : "french custodian",
"contact" : "Francoir",
} ],
"pageable" : {
"sort" : {
"sorted" : true,
"unsorted" : false
},
"offset" : 0,
"pageSize" : 3,
"pageNumber" : 0,
"unpaged" : false,
"paged" : true
},
"totalElements" : 3,
"totalPages" : 1,
"last" : true,
"size" : 3,
"number" : 0,
"sort" : {
"sorted" : true,
"unsorted" : false
},
"numberOfElements" : 3,
"first" : true
}
How do you get Spring to deliver the same format in our custom method?

Elasticsearch query builder `IN` query

I am trying to replicate a in SQL query in elastic search... something similar to select * from products where id in ('123, '345');
I have something like the below -
Set<String> searchIds = new HashSet<String>();
searchIds.add("123");
searchIds.add("456");
response = client.prepareSearch("id_index")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termsQuery("product_id", searchIds))
.setFrom(0).setSize(50).setExplain(false)
.execute().actionGet();
hits = response.getHits();
hits.forEach((h) -> {
h.getSource().entrySet().stream().forEach((e)->{
System.out.println(e.getKey()+" : "+String.valueOf(e.getValue()).replace("\n", "").replace("\t", ""));
});
});
System.out.println("Response : "+response.toString());
The response I am getting from the last System.out.println is as follows,
Response : {
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
I know for sure there are two products with product id's 123 and 345 as I was able to see them via Kibana. But the above code returns no hits. Am I replicating the SQL IN query in the right way?

Slow (high latency) upon queries with sort mongoDB

I am currently migrating from couchdb to mongodb, still learning this stuff and have a problem anytime I do some queries with sorting in my webpage.
I am using codeigniter as the framework and using alexbilbie lib for mongodb php library.
So, here is my problem:
I intend to do queries from sensors in a room that updated each second (thus, already saved thousands docs in collection) and to get each latest sensor value I use this query from model:
function mongoGetDocLatestDoc($sensorid){
$doc = $this->mongo_db->where(array('SensorId'=>$sensorid))->limit(1)->order_by(array('_id'=>'DESC'))->get('my_mongo');
return $doc;
}
if I called this with my controller, it took a lot of time to process the query and even worse if I change the sort by timestamp. and it is double the latency each time I called this again for the second sensor, let alone I have more than 10 sensors that need this query in the same page. Am I doing it wrong or there is some more efficient way to get the latest data from collection?
edit:
#Sammaye : I tried making an index based on your suggestion and here is the explain generated after I executed the query:
"cursor" : "BtreeCursor timestamp_desc",
"nscanned" : 326678,
"nscannedObjects" : 326678,
"n" : 50,
"millis" : 4402,
"nYields" : 7,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
}
as per comparison, this explain the first query without using index (that executed faster) :
"cursor" : "BasicCursor",
"nscanned" : 385517,
"nscannedObjects" : 385517,
"n" : 50,
"millis" : 1138,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}

Okay thank you all for your great response :) Here is my finding after trying with indexes:
Update the mongodb to the latest version. I found this slightly improve my query. I am updating from default 2.0.3 that provided as default in ubuntu 12.04.02 to mongodb version 2.4.3
Build indexes/compound indexes based exactly on how you mostly needed in your query. as an example, in my question, my query based on SensorId and _id: DESCENDING so the best strategy to optimize my query would be something like :
db.your_collection.ensureIndex({ "SourceId" : 1, "_id" : -1 },{ "name" : "sourceid_idx", "background" : true });
or in other case if I needed it based on timestamp:
db.your_collection.ensureIndex({ "SourceId" : 1, "timestamp" : -1 },{ "name" : "source_idx", "background" : true });
I found a very good explanation about mongodb indexes here
Hope this will help other people that stumbled upon similar problem...

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Mongodb total rows limit - ajax

Related

Differentiating _delete_by_query tasks in a multi-tenant index

Firebase Realtime Database: Do I need extra container for my queries?

RepositoryRestResource returns results in a different format than RepositoryRestController

Elasticsearch query builder `IN` query

Slow (high latency) upon queries with sort mongoDB

Categories

Resources