Global secondary index: Number of projected attributes in all indexes exceeds limit of 20 - ruby

I'm trying to create a GSI on a table with 30 columns (using the Ruby SDK). I use projection_type: 'ALL', but I still get the following exception:
Aws::DynamoDB::Errors::ValidationException: One or more parameter values were invalid: Number of projected attributes in all indexes exceeds limit of 20, number of projected attributes:30
As far as I read, this should only happen when using the INCLUDE projection_type:
This limit does not apply for secondary indexes with a ProjectionType of KEYS_ONLY or ALL.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-secondary-indexes
The create statement looks something like:
connection.update_table({
  table_name: "my-table", # required
  attribute_definitions: [
    {
      attribute_name: "indexDate",
      attribute_type: "S",
    },
    {
      attribute_name: "createdAt",
      attribute_type: "S",
    },
  ],
  global_secondary_index_updates: [
    {
      create: {
        index_name: "my-new-index", # required
        key_schema: [
          {
            attribute_name: "indexDate",
            key_type: "HASH",
          },
          {
            attribute_name: "createdAt",
            key_type: "RANGE",
          },
        ],
        projection: { # required
          projection_type: "ALL"
        },
        provisioned_throughput: { # required
          read_capacity_units: 10, # required
          write_capacity_units: 300, # required
        }
      }
    }
  ]
})

It turned out that the projected-attribute limit applies across all GSIs on the table. I had another index that caused this one to fail. After I deleted that one, it worked.
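To check how close a table is to the limit before adding an index, the projected attributes can be summed over every index. The following is a minimal sketch, not part of the original answer. It assumes that the limit counts the non_key_attributes of INCLUDE projections, as the quoted documentation suggests, and that the input is a plain Hash shaped like the table member of a describe_table response. The helper name and the sample data are hypothetical.

```ruby
# Sum the user-specified projected attributes across all secondary indexes.
# `table` is assumed to be a plain Hash shaped like the `table` member of a
# describe_table response (for example `resp.table.to_h`).
def projected_attribute_count(table)
  indexes = Array(table[:global_secondary_indexes]) +
            Array(table[:local_secondary_indexes])
  indexes.sum do |index|
    projection = index[:projection] || {}
    # Assumption: only INCLUDE projections count, since only they list attributes.
    next 0 unless projection[:projection_type] == "INCLUDE"
    Array(projection[:non_key_attributes]).size
  end
end

# Hypothetical table description: an existing INCLUDE index plus a new ALL index.
table = {
  global_secondary_indexes: [
    { index_name: "my-old-index",
      projection: { projection_type: "INCLUDE",
                    non_key_attributes: %w[a b c] } },
    { index_name: "my-new-index",
      projection: { projection_type: "ALL" } }
  ]
}

puts projected_attribute_count(table) # prints 3
```

The count covers the whole table, so an existing index can use up the budget even when the new index itself uses ALL.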


Sorting ElasticSearch query by multiple fields

I have some data that I'm trying to sort in a very specific order.
I've looked over a few questions here on SO and Elasticsearch sort on multiple queries was pretty helpful. From what I can tell I'm getting the data back in the correct order but it's not always the same data and appears to be very random as to what is returned from the query.
My question is, how do I get my data sorted correctly and get the expected data each time?
Example Data
[
{
id: 00,
...
current_outage: {
device_id: 00,
....
},
forecasted_outages: [
{
device_id: 00
}
]
},
{
id: 01,
...
current_outage: {
device_id: 01,
....
},
forecasted_outages: []
},
{
id: 02,
...
current_outage: null,
forecasted_outages: [
{
device_id: 02
}
]
},
{
id: 03,
...
current_outage: null,
forecasted_outages: []
},
]
Current Query
bool: {
  should: [
    {
      constant_score: {
        boost: 6,
        filter: {
          nested: {
            path: 'current_outage',
            query: {
              exists: {
                field: 'current_outage'
              }
            }
          }
        }
      }
    },
    {
      nested: {
        path: 'forecasted_outages',
        query: {
          exists: {
            field: 'forecasted_outages'
          }
        }
      }
    }
  ]
}
Just to reiterate, the above query returns the data in the format/sorted method I expect but it does NOT return the data that I expect each time. The returned data is very random as far as I can tell.
Sort Criteria:
First: Data with both current_outage and one or more forecasted_outages
Second: Data with only current_outage
Third: Data with only forecasted_outages
Edit
The data returning can be anything from zero to thousands of results depending on a user. The user has an option to paginate the data or return all of their relevant data.
Edit 2
The data returned will be anywhere from zero to 1,000 hits.
If the number of search hits is larger than the result size (10 by default) and all documents have the same score (which can happen in your case because you use a constant score), the data returned can differ from run to run, which makes it look random.
The reason is that the search results are merged from different shards until the hit count reaches 10, and the rest of the results are ignored. So each run can return different results depending on how the shard results were merged.
Increasing the result size to include all the search results can give the same data on every run.
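The effect can be illustrated without Elasticsearch. The sketch below is a plain Ruby simulation, not Elasticsearch code: it assumes that tied hits keep whatever order they arrive in, which stands in for the order in which shard results happen to be merged. The top_n helper and the sample hits are hypothetical.

```ruby
# Pick the ids of the top n hits by score. Ties keep their arrival order,
# which stands in for the order in which shard results are merged.
def top_n(hits, n)
  hits.each_with_index
      .sort_by { |hit, position| [-hit[:score], position] }
      .first(n)
      .map { |hit, _position| hit[:id] }
end

# 25 matching documents that all have the same constant score.
hits = (1..25).map { |id| { id: id, score: 6.0 } }

run_a = top_n(hits, 10)         # one merge order
run_b = top_n(hits.reverse, 10) # another merge order

puts run_a == run_b # false: the first page differs between runs
puts top_n(hits, 25).sort == top_n(hits.reverse, 25).sort # true: same documents
```

With a result size that covers every hit, both runs contain the same documents, which is the behaviour described above.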
UPDATE
Changing the shard count to 1 might help. Note that number_of_shards can only be set when an index is created, so an existing index has to be recreated with the new setting and its data reindexed.
PUT /twitter
{
  "settings" : {
    "index" : {
      "number_of_shards" : 1
    }
  }
}

How to access celldata objects in sheets api

I'm working on a Google Sheets integration project where I'd like to add formatted text (bold, italic) to cells. The formatting needs to apply to only part of the cell (e.g. only some of the text in the cell is bold). I can see that this can be done through the CellData object, documented in the Sheets API here:
CellData
But I can't work out how to get an instance of these objects. I'm using the Sheets service to successfully get Spreadsheet, Sheet and ValueRange objects, but I can't work out how to get through to the cell data objects themselves to use these methods.
You want to retrieve the formats when part of a cell's value has several formats.
You want to put a value with several formats into a cell.
I understand your question as above. If my understanding is correct, how about these samples?
1. Retrieve value
When part of a cell's value has several formats, the script for retrieving the value together with its formats is as follows.
Sample script:
This sample script retrieves the value from the cell "A1" of "Sheet1".
spreadsheet_id = '### spreadsheet ID ###'
ranges = ['Sheet1!A1']
fields = 'sheets(data(rowData(values(textFormatRuns,userEnteredValue))))'
response = service.get_spreadsheet(spreadsheet_id, ranges: ranges, fields: fields)
Result:
{
"sheets": [
{
"data": [
{
"rowData": [
{
"values": [
{
"userEnteredValue": {
"stringValue": "abcdefg"
},
"textFormatRuns": [
{
"format": {}
},
{
"format": {
"fontSize": 24,
"foregroundColor": {
"red": 1
},
"bold": true
},
"startIndex": 2
},
{
"format": {},
"startIndex": 5
}
]
}
]
}
]
}
]
}
]
}
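To see which characters each run covers, the text can be cut at the startIndex values. This is a minimal sketch that works on the JSON shown above after it has been parsed into plain Ruby hashes. It assumes that a run without startIndex begins at index 0 and that each run extends to the start of the next one. The helper name is hypothetical and is not part of the Sheets client library.

```ruby
# Split `text` into [substring, format] pairs using textFormatRuns-style hashes.
# Assumption: a run without "startIndex" starts at index 0.
def split_by_format_runs(text, runs)
  starts = runs.map { |run| run.fetch("startIndex", 0) }
  runs.each_with_index.map do |run, i|
    # A run ends where the next one starts, or at the end of the text.
    stop = starts[i + 1] || text.length
    [text[starts[i]...stop], run["format"]]
  end
end

runs = [
  { "format" => {} },
  { "format" => { "fontSize" => 24, "foregroundColor" => { "red" => 1 }, "bold" => true },
    "startIndex" => 2 },
  { "format" => {}, "startIndex" => 5 }
]

split_by_format_runs("abcdefg", runs).each do |segment, format|
  puts "#{segment}: #{format.empty? ? 'default format' : format.keys.join(', ')}"
end
```

For the result above this yields the segments "ab", "cde" and "fg", with only "cde" carrying the bold, red, 24 pt format.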
2. Put value
When a value with several formats is put into a cell, the script is as follows.
Sample script:
This sample script puts the value into the cell "B1" of "Sheet1". As a sample, update_cells is used for this situation.
spreadsheet_id = '### spreadsheet ID ###'
requests = {requests: [
  update_cells: {
    fields: 'userEnteredValue,textFormatRuns',
    range: {sheet_id: 0, start_row_index: 0, end_row_index: 1, start_column_index: 1, end_column_index: 2},
    rows: [{values: [{
      user_entered_value: {string_value: 'abcdefg'},
      text_format_runs: [{format: {}}, {format: {font_size: 24, foreground_color: {red: 1}, bold: true}, start_index: 2}, {format: {}, start_index: 5}]
    }]}]
  }
]}
response = service.batch_update_spreadsheet(spreadsheet_id, requests, {})
About sheet_id: 0, if you want to use another sheet, please change it.
Note:
These sample scripts suppose that your environment can use the Sheets API.
These are simple samples, so please modify them to fit your situation.
References:
spreadsheets.get
spreadsheets.batchUpdate
textFormatRuns
updateCells

Filter with complex key not work (using startkey and endkey)

I create a view with Map function:
function(doc) {
  if (doc.market == "m_warehouse") {
    emit([doc.logTime,doc.dbName,doc.tableName], 1);
  }
}
I want to filter the data with multi-keys:
_design/select_data/_view/new-view/?limit=10&skip=0&include_docs=false&reduce=false&descending=true&startkey=["2018-06-19T09:16:47,527","stage"]&endkey=["2018-06-19T09:16:43,717","stage"]
but I still got:
{
"total_rows": 248133,
"offset": 248129,
"rows": [
{
"id": "01CGBPYVXVD88FPDVR3NP50VJW",
"key": [
"2018-06-19T09:16:47,527",
"ods",
"o_ad_dsp_pvlog_realtime"
],
"value": 1
},
{
"id": "01CGBQ6JMEBR8KBMB8T7Q7CZY3",
"key": [
"2018-06-19T09:16:44,824",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
},
{
"id": "01CGBQ4BKT8S2VDMT2RGH1FQ71",
"key": [
"2018-06-19T09:16:44,707",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
},
{
"id": "01CGBQ18CBHQX3F28649YH66B9",
"key": [
"2018-06-19T09:16:43,717",
"stage",
"s_ad_ztc_realpv_base_indirect"
],
"value": 1
}
]
}
The key "ods" should not be in the results.
What did I do wrong?
Your query is not a multi-key query; it is a startkey/endkey range query.
If you want results for a given dbName within a time range, you need to change the emit to [doc.dbName,doc.logTime,doc.tableName].
Then you query with startkey=["stage","2018-06-19T09:16:43,717"]&endkey=["stage","2018-06-19T09:16:47,527"]
(By the way, check the order of your timestamps: in your example the start key is larger than the end key, which only works in combination with descending=true.)
As you have chosen a full date/time stamp as the first level of your key, down to millisecond precision, there are unlikely to be any repeating values in the first level of your compound key. If you indexed just the date, say, as the first key, your data would be grouped by date, dbName and table name in a more predictable way,
e.g.
["2018-06-19","ods","o_ad_dsp_pvlog_realtime"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
With this key structure, the hierarchical grouping of keys works in your favour i.e. all the data from "2018-06-19" is together in the index, with all the data matching ["2018-06-19","stage"] adjacent to each other.
If you need to get to millisecond precision, you could index the data as follows:
function(doc) {
  if (doc.market == "m_warehouse") {
    emit([doc.dbName,doc.logTime], 1);
  }
}
This would create an index organised by dbName, with a secondary sort on time. You can then extract the data for a specified dbName between two timestamps.
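The difference between the two key layouts can be checked with a small simulation. This sketch is plain Ruby, not CouchDB: it assumes that Ruby's element-by-element array comparison is a close enough stand-in for CouchDB's view collation for these simple ASCII keys. The in_range? helper is hypothetical.

```ruby
# True when `key` lies in the inclusive range [low, high]. Arrays are compared
# element by element, and a shorter array sorts before a longer one that
# shares its prefix.
def in_range?(key, low, high)
  (low <=> key) <= 0 && (key <=> high) <= 0
end

rows = [
  { logTime: "2018-06-19T09:16:47,527", dbName: "ods",   tableName: "o_ad_dsp_pvlog_realtime" },
  { logTime: "2018-06-19T09:16:44,824", dbName: "stage", tableName: "s_ad_ztc_realpv_base_indirect" },
  { logTime: "2018-06-19T09:16:44,707", dbName: "stage", tableName: "s_ad_ztc_realpv_base_indirect" },
  { logTime: "2018-06-19T09:16:43,717", dbName: "stage", tableName: "s_ad_ztc_realpv_base_indirect" }
]

# Original layout: time first, so the range is decided by the timestamp alone.
by_time = rows.map { |r| [r[:logTime], r[:dbName], r[:tableName]] }
time_hits = by_time.select do |key|
  in_range?(key, ["2018-06-19T09:16:43,717", "stage"], ["2018-06-19T09:16:47,527", "stage"])
end

# Suggested layout: dbName first, then time.
by_db = rows.map { |r| [r[:dbName], r[:logTime]] }
db_hits = by_db.select do |key|
  in_range?(key, ["stage", "2018-06-19T09:16:43,717"], ["stage", "2018-06-19T09:16:47,527"])
end

puts time_hits.any? { |key| key[1] == "ods" } # true: "ods" falls inside the range
puts db_hits.any? { |key| key[0] == "ods" }   # false: "ods" sorts before "stage"
```

With the time-first key, the "ods" row shares its timestamp with the upper bound and "ods" sorts before "stage", so it lands inside the range. With dbName first, every "ods" key sorts before the lower bound.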

how to groupBY using spring data

Hi, I'm using Spring Data in my project and I'm trying to group by two fields. Here's the query:
@Query("SELECT obj FROM Agence obj GROUP BY obj.secteur.nomSecteur, obj.nomAgence")
Iterable<Agence> getSecteurAgenceByPc();
But it doesn't work for me. What I want is this result:
-Safi
  -CTM
    CZC1448YZN
    2UA13817KT
-Rabat
  -CTM
    CZC1349G1B
    2UA0490SVR
  -Agdal
    G3M4NOJ
-Essaouira
  -CTM
    CZC1221B85
  -Gare Routiere Municipale
    CZC145YL3
What I get is
{
"status": 0,
"data":
[
{
"secteur": "Safi",
"agence": "CTM"
},
{
"secteur": "Safi",
"agence": "Dep"
},
{
"secteur": "Rabat",
"agence": "Agdal"
},
{
"secteur": "Rabat",
"agence": "CTM"
},
{
"secteur": "Essaouira",
"agence": "CTM"
},
{
"secteur": "Essaouira",
"agence": "Gare Routiere Municipale"
}
]
}
What you want is not possible with JPQL.
What does GROUP BY do?
It combines all rows that are identical in the columns of the GROUP BY clause into one row. Since it combines multiple rows into one, data in other columns can only be present in some combined fashion. For example, you can include MIN/MAX or AVG values, but never the original values.
Also, the result will always be a table, never a tree.
Also note: there is no duplicated data. Every combination of secteur and agence appears exactly once.
If you want a tree structure, you have to write some Java code for that.
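The grouping step itself is short. The sketch below shows the idea in Ruby for brevity; in Java the same shape can be built with nested Collectors.groupingBy calls. It assumes the query returns flat, ungrouped rows, and the :pc field name for the third level is hypothetical, since the question does not name it.

```ruby
# Turn flat rows into a secteur => agence => [pc, ...] tree.
# The :secteur, :agence and :pc keys are assumptions for illustration.
def build_tree(rows)
  rows.group_by { |row| row[:secteur] }.transform_values do |in_secteur|
    in_secteur.group_by { |row| row[:agence] }
              .transform_values { |in_agence| in_agence.map { |row| row[:pc] } }
  end
end

# Flat rows as a plain SELECT without GROUP BY might return them.
rows = [
  { secteur: "Safi",  agence: "CTM",   pc: "CZC1448YZN" },
  { secteur: "Safi",  agence: "CTM",   pc: "2UA13817KT" },
  { secteur: "Rabat", agence: "CTM",   pc: "CZC1349G1B" },
  { secteur: "Rabat", agence: "CTM",   pc: "2UA0490SVR" },
  { secteur: "Rabat", agence: "Agdal", pc: "G3M4NOJ" }
]

tree = build_tree(rows)

# Print the tree in the layout asked for in the question.
tree.each do |secteur, agences|
  puts "-#{secteur}"
  agences.each do |agence, pcs|
    puts "  -#{agence}"
    pcs.each { |pc| puts "    #{pc}" }
  end
end
```

The query only has to deliver the rows; the nesting is done in application code.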

How can I add heterogeneous data to Elasticsearch?

I am trying to add heterogenous data (i.e. of different "types") to Elasticsearch. Each (top-level) object contains a user's settings for an application. A simplified example is:
{
  'name':'test',
  'settings': [
    {
      'key':'color',
      'value':'blue'
    },
    {
      'key':'isTestingMode',
      'value':true
    },
    {
      'key':'visibleColumns',
      'value': [
        'column1',
        'column3',
        'column4',
      ]
    },
    ...
  ...
}
When I try to add this, the POST fails with a MapperParsingException. Searching around, it seems like this is because the 'value' field has different types.
Is there any way to just store arbitrary data like this?
This is not possible.
Mapping is defined per field, and mapping is not array-aware.
This means that you can keep settings.value as a string or as an array, but not both.
An easy tweak would be to define every value as an array:
{
  'name':'test',
  'settings': [
    {
      'key':'color',
      'value': [ 'blue' ]
    },
    {
      'key':'isTestingMode',
      'value': [ true ]
    },
    {
      'key':'visibleColumns',
      'value': [
        'column1',
        'column3',
        'column4',
      ]
    },
    ...
  ...
}
If that is not acceptable, another idea would be to apply a source transform that performs this normalization on the settings.value field before it is indexed. This way, the source is kept as it is and you still get what you want.
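The same normalization can also be done on the client before the document is sent. This is a minimal sketch in plain Ruby and is not an Elasticsearch feature: it wraps every non-array value in an array and leaves the caller's document untouched. The helper name is hypothetical, and the sketch only unifies the array shape; it does not convert the element types.

```ruby
# Return a copy of `doc` in which every settings value is an array.
def normalize_settings(doc)
  settings = doc.fetch("settings", []).map do |setting|
    value = setting["value"]
    # Wrap scalars; leave existing arrays as they are.
    setting.merge("value" => value.is_a?(Array) ? value : [value])
  end
  doc.merge("settings" => settings)
end

doc = {
  "name" => "test",
  "settings" => [
    { "key" => "color", "value" => "blue" },
    { "key" => "isTestingMode", "value" => true },
    { "key" => "visibleColumns", "value" => %w[column1 column3 column4] }
  ]
}

normalize_settings(doc)["settings"].each do |setting|
  puts "#{setting['key']}: #{setting['value'].inspect}"
end
```

Because merge returns new hashes, the original document stays available if the unmodified form is needed elsewhere.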
