I am working on a log management system where users can upload logs from a file. I have an 'event' collection where I store all events from all sources (each source can have a different log format, and a single collection can hold e.g. 10,000,000 records - 5,000,000 for 'source1' and 5,000,000 for 'source2'). I want to provide a filter option to the user (filtering is per source, so the user can filter data by level, date, etc.), and for better read performance I want to create indexes, including compound indexes. Before uploading logs to the system, the user decides which filter queries they want to run later, so I can end up with many different queries. The problem is that one collection can contain many different sources, which means many different filter queries, but MongoDB only allows 64 indexes per collection.
So what is the best solution if I want good read performance and still want to let the user filter logs (the user decides how they want to filter the data before the logs are uploaded to the system)? I was thinking of creating a new collection for each source, since that way I would never reach the 64-index limit per collection.
Sample indexes:
db.events.ensureIndex({"source_id": 1, "timestamp" : 1})
db.events.ensureIndex({"source_id": 1, "timestamp" : 1, "level": 1})
db.events.ensureIndex({"source_id": 1, "diagnostic_context": 1})
db.events.ensureIndex({"source_id": 1, "timestamp" : 1, "statusCode": 1})
db.events.ensureIndex({"source_id": 1, "host" : 1})
Event collection sample:
{ _id: ObjectId("507f1f77bcf86cd799439011"),
timestamp: ISODate("2012-09-27T03:42:10Z"),
thread: "[http-8080-3]",
level: "INFO",
diagnostic_context: "User 99999",
message: "existing customer saving"},
source_id: "source1"
{ _id: ObjectId("507f1f77bcf86cd799439012"),
host: "144.18.39.44",
timestamp: ISODate("2012-09-01T03:42:10Z"),
request: "GET /resources.html HTTP/1.1",
statusCode: 200
bytes_sent: 3458,
url: "http://www.aivosto.com/",
agent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
source_id: "source2"
}
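For context, here is a rough sketch of the kind of per-source filter query those compound indexes are meant to serve. It is only an illustration using pymongo; the connection string, database name, and filter values are assumptions, not part of the original setup.

from datetime import datetime
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
events = client["logdb"]["events"]                 # assumed database/collection names

# One of the compound indexes from above: source_id + timestamp + level.
events.create_index([("source_id", ASCENDING),
                     ("timestamp", ASCENDING),
                     ("level", ASCENDING)])

# A per-source filter that this index can satisfy:
cursor = events.find({
    "source_id": "source1",
    "timestamp": {"$gte": datetime(2012, 9, 1), "$lt": datetime(2012, 10, 1)},
    "level": "INFO",
})
for event in cursor:
    print(event["timestamp"], event["message"])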
I have been building a simple application that uses Redis as a cache to store data for a game in which each user has a score; after a user completes a task, that user's score is updated.
My problem is that when a user completes a task, their score is updated by replacing the previous value in Redis with the new one (in my case the entire room object is replaced, even though nothing in the room has changed except the score of one player inside it).
The thing is, if multiple users complete a task at the same time, they each send a new record to Redis simultaneously, and only the last write is kept.
For example:
In the redis cache this is the starting value: { roomId: "...", score:[{ "player1": 0 }, { "player2": 0 }] }
Player 1 completes a task and sends:
{ roomId: "...", score:[{ "player1": 1 }, { "player2": 0 }] }
At the same time Player 2 completes a task and sends:
{ roomId: "...", score:[{ "player1": 0 }, { "player2": 1 }] }
In the Redis cache the value received from Player 1 is saved first, say, and then the value from Player 2, which means that the new value in the cache will be:
{ roomId: "...", score:[{ "player1": 0 }, { "player2": 1 }] }
This is wrong, because the correct value would be { roomId: "...", score:[{ "player1": 1 }, { "player2": 1 }] }, where both changes are present.
At the moment I am also using a pub/sub system to keep track of changes, so that they are reflected to every server and to each user connected to those servers.
What can I do to fix this? For reference, consider the following image as the architecture of the system:
The issue appears to be that you are interleaving one read/write set of operations with others, which leads to stale data being used while updating keys. Fortunately, the fix is (relatively) easy: combine your read/write chunk of operations into a single atomic unit, using either a Lua script, a transaction, or, even easier, a single RedisJSON command.
Here is an example using RedisJSON. First, prepare the JSON key/document that will hold all the scores for the room, using the JSON.SET command:
> JSON.SET room:foo $ '{ "roomId": "foo", "score": [] }'
OK
After that, use the JSON.ARRAPPEND command once you need to append an item to the score array:
> JSON.ARRAPPEND room:foo $.score '{ "player1": 123 }'
1
...
> JSON.ARRAPPEND room:foo $.score '{ "player2": 456 }'
2
Getting back the whole JSON document is as easy as running:
> JSON.GET room:foo
"{\"roomId\":\"foo\",\"score\":[{\"player1\":123},{\"player2\":456}]}"
I am using the API below and listing 200 files per page.
https://slack.com/api/files.list?count=200&page={{pageNumber}}
I have 60,000 files in my Slack account. On the first API call I received 200 files, with a pagination response like the one below.
"paging": {
"count": 200,
"total": 60000,
"page": 1,
"pages": 300
}
We continue fetching files by increasing the page number in the API query parameter: 2, 3, 4, and so on.
https://slack.com/api/files.list?count=200&page=2
"paging": {
"count": 200,
"total": 60000,
"page": 2,
"pages": 300
}
When we reach page number 101, the page parameter in the paging response becomes 1, with the warning max_page_limit. Can't we list all files in the same paginated fashion, or does the Slack files.list API only allow listing files up to page 100? We didn't find anything in the Slack documentation for this case. Any help regarding this issue will be much appreciated.
https://slack.com/api/files.list?count=200&page=101
"paging": {
"count": 200,
"total": 60000,
"page": 1,
"pages": 300,
"warnings": [
"max_page_limit"
]
}
Here is the reply I got from the Slack forum:
There is indeed a page limit of 100 pages on files.list. I've contacted the documentation team to add this detail to the documentation for the method. You should be able to get your 60,000 files with a higher count of 600, though.
There are other ways to narrow down the expected number of results. For example, you could specify a time period for the file creation date using the ts_from and ts_to arguments and make batches of calls within specified time periods, or batch your searches by channel by passing the channel argument. These techniques should always allow you to keep a batch within 100,000 files, since 1,000 is the maximum accepted count.
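As a rough illustration of the first suggestion, here is a sketch that pages through files.list with a larger count while staying under the 100-page cap. It assumes the requests library and a token with the files:read scope stored in the SLACK_TOKEN environment variable; adapt it to your client of choice.

import os
import requests

TOKEN = os.environ["SLACK_TOKEN"]  # assumed environment variable
URL = "https://slack.com/api/files.list"

def list_all_files(count=600):
    # With count=600, 100 pages cover up to 60,000 files.
    files, page = [], 1
    while True:
        resp = requests.get(
            URL,
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"count": count, "page": page},
        ).json()
        if not resp.get("ok"):
            raise RuntimeError(resp.get("error"))
        files.extend(resp["files"])
        paging = resp["paging"]
        if page >= min(paging["pages"], 100):  # stop before hitting max_page_limit
            break
        page += 1
    return files

print(len(list_all_files()))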
I am using the Mindmeld blueprint application (kwik_e_mart) to understand how the Question Answerer retrieves data from the relevant knowledge base data file (I am new to Mindmeld, OOP and Elasticsearch).
See code snippet below:
from mindmeld.components import QuestionAnswerer
config = {"model_type": "keyword"}
qa = QuestionAnswerer(app_path='kwik_e_mart', config=config)
qa.load_kb(app_namespace='kwik_e_mart', index_name='stores',
data_file='kwik_e_mart/data/stores.json', app_path='kwik_e_mart', config=config, clean = True)
Output - Loading Elasticsearch index stores: 100%|██████████| 25/25 [00:00<00:00, 495.28it/s]
Output - Loaded 25 documents
Although Elasticsearch is able to load all 25 documents (see output above), I am unable to retrieve any data at a list index greater than 9.
stores = qa.get(index='stores')
stores[0]
Output: - {'address': '23 Elm Street, Suite 800, Springfield, OR, 97077',
'store_name': '23 Elm Street',
'open_time': '7:00',
'location': {'lon': -123.022029, 'lat': 44.046236},
'phone_number': '541-555-1100',
'id': '1',
'close_time': '19:00',
'_score': 1.0}
However, stores[10] gives an error:
`stores[10]`
Output: - IndexError Traceback (most recent call last)
<ipython-input-12-08132a2cd460> in <module>
----> 1 stores[10]
IndexError: list index out of range
I am not sure why documents at an index higher than 9 are unreachable. My understanding is that the Elasticsearch index is still pointing to the remote blueprint data (http/middmeld/blueprint...) and not to the local folder.
I am not sure how to resolve this. Any help is much appreciated.
By default, the get() method only returns 10 records per search - so only stores[0] through stores[9] will be valid.
You can add the size= option to your get() to increase the number of records it returns:
stores = qa.get(index='stores', size=25)
See the bottom of this section for more info.
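As a quick check (illustrative only, reusing the objects from the question), fetching with a larger size makes the later indices reachable:

stores = qa.get(index='stores', size=25)
print(len(stores))               # 25, so indices 0 through 24 are now valid
print(stores[10]['store_name'])  # no longer raises IndexError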
Is it possible to disable your cache system?
I get an error when I have a different object in my Edit page.
For example, I have this as my list in my API:
domain.com/api/products
list = [
{id: 1 , value: 'foo'},
{id: 2 , value: 'bar'},
]
and this for my single object:
domain.com/api/products/1
item = {id: 1 , value: 'foo' , user: 'baz'}
It causes an error in the Edit page, since your system uses the old data from the list before the REST API responds, and the list data does not include the user field.
So I want to disable the cache system, if that's possible, and just load the API result each time.
I have imported test results according to https://docs.sonarqube.org/display/SONAR/Generic+Test+Data into SonarQube 6.2.
I can look at the detailed test results in SonarQube by navigating to the test file and then clicking the "Show Measures" menu. The opened page shows me the correct total number of tests, 293, of which 31 failed. However, the test result details section only shows 100 test results.
This page seems to get its data through a request like: http://localhost:9000/api/tests/list?testFileId=AVpC5Jod-2ky3xCh908m
with a result of:
{
paging: {
pageIndex: 1,
pageSize: 100,
total: 293
},
tests: [
{
id: "AVpDK1X_-2ky3xCh91QQ",
name: "GuiButton:Type Checks->disabledBackgroundColor",
fileId: "AVpC5Jod-2ky3xCh908m",
fileKey: "org.sonarqube:Scripting-Tests-Publishing:dummytests/ScriptingEngine.Objects.GuiButtonTest.js",
fileName: "dummytests/ScriptingEngine.Objects.GuiButtonTest.js",
status: "OK",
durationInMs: 8
...
}
From this I gather that the page size is set to 100 in the backend. Is there a way to increase it so that I can see all test results?
You can certainly call the web service with a larger page size parameter value, but you cannot change the page size requested by the UI.
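For example, something along the following lines should page through the full result set. It is only a sketch: it assumes the requests library, the standard p (page) and ps (page size) parameters of the SonarQube web API, and credentials with access to the project; the file id is taken from the question.

import requests

BASE_URL = "http://localhost:9000"        # from the question
AUTH = ("admin", "admin")                 # assumed credentials; use a token in practice
TEST_FILE_ID = "AVpC5Jod-2ky3xCh908m"     # file id from the question

def fetch_all_tests(page_size=500):
    # Page through api/tests/list using the web API's p/ps parameters.
    tests, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL + "/api/tests/list",
            params={"testFileId": TEST_FILE_ID, "p": page, "ps": page_size},
            auth=AUTH,
        ).json()
        tests.extend(resp["tests"])
        if page * page_size >= resp["paging"]["total"]:
            break
        page += 1
    return tests

print(len(fetch_all_tests()))  # should report 293 for the file in the question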