SonarQube Component Tree response data

I'm having some trouble understanding some of the data in the response from the SonarQube GET api/measures/component_tree API.
Some metrics have a value attribute while others don't. I've figured out that the value displayed in the UI is the "value" unless it does not exist, in which case the value at the earliest period is used. The other periods are then essentially deltas between measurements. Would anyone be able to provide some details about what the response values actually mean? Unfortunately, the API documentation that SonarQube provides doesn't give any detail about the response data. Specifically, I'm wondering when a value attribute would and would not be present, what the index means since not all metrics have the same indexes (i.e. some go 1-4, others have just 3 and 4), and what the period data represents.
{
  "metric": "new_lines_to_cover",
  "periods": [
    { "index": 1, "value": "572" },
    { "index": 2, "value": "572" },
    { "index": 3, "value": "8206" },
    { "index": 4, "value": "186574" }
  ]
},
{
  "metric": "duplicated_lines",
  "value": "80819",
  "periods": [
    { "index": 1, "value": "-158" },
    { "index": 2, "value": "-158" },
    { "index": 3, "value": "-10544" },
    { "index": 4, "value": "-6871" }
  ]
},
{
  "metric": "new_line_coverage",
  "periods": [
    { "index": 3, "value": "3.9900249376558605" },
    { "index": 4, "value": "17.221615720524017" }
  ]
},

The heuristic is very close to the truth:
if the metric starts with "new_", it's a metric that computes new elements over a period of time. Starting with SonarQube 6.3, only the leak period is supported
otherwise, the "value" represents the raw value.
For example, to compute the number of issues:
violations computes the total number of issues
new_violations computes the number of new issues on the leak period
To know more about the leak period concept in SonarQube, please check this article.
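For instance, to fetch both the raw total and the leak-period count of issues in one call, you could request both metrics at once (a sketch only; my_project is a placeholder key, and on older SonarQube versions the component parameter is named baseComponentKey):
GET api/measures/component_tree?component=my_project&metricKeys=violations,new_violations
In the response, the violations entries should carry a top-level value, while the new_violations entries carry only periods, per the heuristic above.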

Related

Indexing In ElasticSearch For Auditing

There is a microservice-based architecture wherein each service has a different type of entity. For example:
Service-1:
{
  "entity_type": "SKU",
  "sku": "123",
  "ext_sku": "201",
  "store": "1",
  "product": "abc",
  "timestamp": 1564484862000
}
Service-2:
{
  "entity_type": "PRODUCT",
  "product": "abc",
  "parent": "xyz",
  "description": "curd",
  "unit_of_measure": "gm",
  "quantity": "200",
  "timestamp": 1564484863000
}
Service-3:
{
  "entity_type": "PRICE",
  "meta": {
    "store": "1",
    "sku": "123"
  },
  "price": "200",
  "currency": "INR",
  "timestamp": 1564484962000
}
Service-4:
{
  "entity_type": "INVENTORY",
  "meta": {
    "store": "1",
    "sku": "123"
  },
  "in_stock": true,
  "inventory": 10,
  "timestamp": 1564484864000
}
I want to write an Audit Service backed by Elasticsearch, which will ingest all these entities and index them based on entity_type, store, sku, and timestamp.
Will Elasticsearch be a good choice here? Also, how will the indexing work? For example, if I search for store=1, it should return all the different entities that have store as 1. Secondly, will I be able to get all the entities between two timestamps?
Will ES and Kibana (to visualize) be good choices here?
Yes. Your use case is pretty much exactly what is described in the docs under filter context:
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
Does this timestamp fall into the range 2015 to 2016?
Is the status field set to published?
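As a sketch of what such a filtered search could look like (the audit index name is an assumption, and store would need to be a keyword/not_analyzed field for an exact term match):
POST /audit/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "store": "1" } },
        { "range": { "timestamp": { "gte": 1564484862000, "lte": 1564484962000 } } }
      ]
    }
  }
}
Both clauses run in filter context, so no scores are calculated and the filters are cacheable; the range clause covers the "between two timestamps" requirement directly.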

Is this expected Query Performance from CosmosDB for "between" queries on an integer property

I have a Cosmos DB collection (SQL API) that I've populated with documents representing CIDR network ranges.
The relevant part of each document is
{
  "Network": "31.216.102.0/23",
  "IPRangeStart": 534275584,
  "IPRangeEnd": 534276095,
Each CIDR block has its start and end IP addresses converted to uint and stored in the IPRangeStart and IPRangeEnd properties.
When I run a query to search for a specific entry by its start range, it works as expected and is quite fast.
SELECT top 1 * FROM c WHERE c.IPRangeStart = 532361216
Request Charge: 3.02 RUs
However, when I introduce a between query using the <= / >= operators, it gets VERY expensive.
SELECT top 1 * FROM c WHERE c.IPRangeStart <= 534275590 AND c.IPRangeEnd >= 534275590
Request Charge: 1647.99 RUs
I've reviewed the index setup on the collection
I've also applied two additional integer range indices on the collection for the two specific properties in question, though there doesn't appear to be a way to check the progress of these indices being applied/created in the background.
Is there something obvious that I might be missing?
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Hash", "dataType": "String", "precision": 3 }
      ]
    },
    {
      "path": "/IPRangeStart/?",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Hash", "dataType": "String", "precision": 3 }
      ]
    },
    {
      "path": "/IPRangeEnd/?",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Hash", "dataType": "String", "precision": 3 }
      ]
    }
  ],
  "excludedPaths": []
}
Think I solved it. The problem stemmed from the fact that I had a greater-than query on one property and a less-than query on a different property.
It appears that Cosmos was merging the full set of documents that satisfied each independent filter clause.
Since the largest CIDR range in the set was a /18 (16k address block), I was able to get it working by bounding both properties on both sides:
SELECT top 1 * FROM c
WHERE c.IPRangeStart <= 534275590
AND c.IPRangeStart >= 534275590 - 32768
AND c.IPRangeEnd >= 534275590
AND c.IPRangeEnd <= 534275590 + 32768

Elastic query group by

I've started the process of learning Elasticsearch and I was wondering if somebody could help me shortcut the process by providing some examples of how I would build a couple of queries.
Here's my example schema...
PUT /sales/_mapping/sale
{
  "sale": {
    "properties": {
      "productCode": { "type": "string" },
      "productTitle": { "type": "string" },
      "quantity": { "type": "integer" },
      "unitPrice": { "type": "double" }
    }
  }
}
POST /sales/sale/1
{"productCode": "A", "productTitle": "Widget", "quantity": 5, "unitPrice": 5.50}
POST /sales/sale/2
{"productCode": "B", "productTitle": "Gizmo", "quantity": 10, "unitPrice": 1.10}
POST /sales/sale/3
{"productCode": "C", "productTitle": "Spanner", "quantity": 5, "unitPrice": 9.00}
POST /sales/sale/4
{"productCode": "A", "productTitle": "Widget", "quantity": 15, "unitPrice": 5.40}
POST /sales/sale/5
{"productCode": "B", "productTitle": "Gizmo", "quantity": 20, "unitPrice": 1.00}
POST /sales/sale/6
{"productCode": "B", "productTitle": "Gizmo", "quantity": 30, "unitPrice": 0.90}
POST /sales/sale/7
{"productCode": "B", "productTitle": "Gizmo", "quantity": 40, "unitPrice": 0.80}
POST /sales/sale/8
{"productCode": "C", "productTitle": "Spanner", "quantity": 100, "unitPrice": 7.50}
POST /sales/sale/9
{"productCode": "C", "productTitle": "Spanner", "quantity": 200, "unitPrice": 5.50}
What query would I need to generate the following results?
a). Show the number of documents grouped by product code
Product code   Title     Count
A              Widget    2
B              Gizmo     4
C              Spanner   3
b). Show the total units sold by product code, i.e.
Product code   Title     Total units sold
A              Widget    20
B              Gizmo     100
C              Spanner   305
TIA
You can accomplish that using aggregations, in particular Terms Aggregations. And it can be done in just one run, by including them within your query structure; in order to instruct ES to generate analytic data based on aggregations, you need to include the aggregations object (or aggs), and specify within it the type of aggregations you would like ES to run on your data.
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_product": {
      "terms": {
        "field": "productCode"
      },
      "aggs": {
        "units_sold": {
          "sum": {
            "field": "quantity"
          }
        }
      }
    }
  }
}
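To run it, you could POST that body to the search endpoint of the sales index from the example above:
POST /sales/_search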
By running that query, besides the hits resulting from your search (in this case we are doing a match_all), an additional object will be included within the response object, holding the corresponding resulting aggregations. For example:
{
  ...
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [ ... ]
  },
  "aggregations": {
    "group_by_product": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "b",
          "doc_count": 4,
          "units_sold": {
            "value": 100
          }
        },
        {
          "key": "c",
          "doc_count": 3,
          "units_sold": {
            "value": 305
          }
        },
        {
          "key": "a",
          "doc_count": 2,
          "units_sold": {
            "value": 20
          }
        }
      ]
    }
  }
}
I omitted some details from the response object for brevity, and to highlight the important part, which is within the aggregations object. You can see how the aggregated data consists of different buckets, each representing one of the distinct product codes found within your documents (identified by the key field); doc_count has the number of occurrences per product code, and the units_sold object holds the total sum of units sold for each product code.
One important thing to take into consideration is that in order to perform aggregations on string or text fields, you need to enable the fielddata setting within your field mapping, as that setting is disabled by default on all text-based fields. To update the mapping of, for example, the productCode field, you just need to make a PUT request to the corresponding mapping type within the index, for example:
PUT http://localhost:9200/sales/sale/_mapping
{
  "properties": {
    "productCode": {
      "type": "string",
      "fielddata": true
    }
  }
}
(more info about the fielddata setting)

Elasticsearch groovy script not working as expected

My partial mapping of an index "listing" in Elasticsearch 2.5 (I know I have to upgrade to a newer version and start using Painless; let's keep that aside for this question):
"name": { "type": "string" },
"los": {
"type": "nested",
"dynamic": "strict",
"properties": {
"start": { "type": "date", "format": "yyyy-MM" },
"max": { "type": "integer" },
"min": { "type": "integer" }
}
}
I have only one document in my storage and that is as follows:
{
  "name": "foobar",
  "los": [
    { "max": 12, "start": "2018-02", "min": 1 },
    { "max": 8, "start": "2018-03", "min": 3 },
    { "max": 10, "start": "2018-04", "min": 2 },
    { "max": 12, "start": "2018-05", "min": 1 }
  ]
}
I have a Groovy script in my Elasticsearch query as follows:
los_map = [doc['los.start'], doc['los.max'], doc['los.min']].transpose()
return los_map.size()
This Groovy script ALWAYS returns 0, which is not possible, as I have one document, as mentioned above (even if I add multiple documents, it still returns 0), and the los field is guaranteed to be present in every doc with multiple objects in it. So it seems the transpose I am doing is not working correctly?
I also tried changing the line los_map = [doc['los.start'], doc['los.max'], doc['los.min']].transpose() to los_map = [doc['los'].start, doc['los'].max, doc['los'].min].transpose(), but then I get this error: "No field found for [los] in mapping with types [listing]"
Does anyone have any idea how to get the transpose to work?
By the way, if you are curious, my complete script is as follows:
losMinMap = [:]
losMaxMap = [:]
los_map = [doc['los.start'], doc['los.max'], doc['los.min']].transpose()
los_map.each { st, mx, mn ->
    losMinMap[st] = mn
    losMaxMap[st] = mx
}
return los_map['2018-05']
Thank you in advance.

elasticsearch - nested document sort/score with typical e-commerce data

I am trying to move our e-commerce search system to Elasticsearch. We have a bunch of products and each product can have multiple offers (sold by merchants). Roughly, the format of the document is
{
  "productId": 1234,
  "title": "Apple Macbook Pro",
  "description": "Macbook Pro ModelNo:ABC 2.4GHz 8GB RAM",
  "offers": [
    {
      "offer_id": "123",
      "offer_seller": "ebay",
      "offer_price": 900,
      "condition": "refurb",
      "times_bought": 25
    },
    {
      "offer_id": "124",
      "offer_seller": "amazon",
      "offer_price": 1200,
      "condition": "new",
      "times_bought": 35
    },
    {
      "offer_id": "125",
      "offer_seller": "bestbuy",
      "offer_price": 1400,
      "condition": "new",
      "times_bought": 10
    }
  ]
}
{
  "productId": 1235,
  "title": "Apple Macbook Air",
  "description": "Macbook Air ModelNo:ABC 1.2GHz 4GB RAM",
  "offers": [
    {
      "offer_id": "123",
      "offer_seller": "ebay",
      "offer_price": 600,
      "condition": "refurb",
      "times_bought": 50
    },
    {
      "offer_id": "124",
      "offer_seller": "amazon",
      "offer_price": 999,
      "condition": "new",
      "times_bought": 55
    },
    {
      "offer_id": "125",
      "offer_seller": "bestbuy",
      "offer_price": 1100,
      "condition": "new",
      "times_bought": 20
    }
  ]
}
Some more facts:
The offers are updated at a higher rate than products.
There are 50 offers on average for every product.
Here is the query I have
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "multi_match": {
          "query": "macbook",
          "fields": [
            "title^10",
            "description^5"
          ]
        }
      },
      "script_score": {
        "params": {
          "param1": 2,
          "param2": 3.1
        },
        "script": "_score * doc['offers.times_bought'].value / pow(param1, param2)"
      }
    }
  }
}
My Questions
1. I went with the nested type for offers as I would like to use the offer_price to sort the products. I read that parent/child doesn't support sorting, but the fact that every update to an offer will reindex the whole product makes me wonder whether parent/child is a better choice.
2. I would like to show the best (1 or 2) offers for each product that is returned. Is there a way to sort the nested documents for each returned result or should I do that myself?
3. If I want to store 'times_bought' outside the index, as it gets updated more frequently than anything else in the index, how do I plug it into ranking? Can I extend Elasticsearch's scoring classes and modify them to use this external data structure?
Any suggestions/recommendation would be appreciated.
How about:
Doing the 'best offer' aggregation in your indexing code, which would mean you store it in a KV-store (Redis, Couchbase, whatever) and repopulate it every time an offer for a particular product changes. You probably have the data available to do this anyhow.
That way you just index _price to refer to the price of the best offer.
This correctly returns the products in the right order.
Lastly (after ES has returned the products in order), you do a call with the product id(s) to your KV-store to fetch the entire top (1 or 2) offers for each of the returned products; see the sketch below.
This combination of ES and a KV-store may seem more trouble than it's worth, but trust me it works wonders to keep complexity down in the end.
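As a rough sketch of the shape this implies (the best_offer_price field and products index names are assumptions, not the poster's actual schema), each product document carries the pre-computed price of its best offer, and the query sorts on it:
{
  "productId": 1234,
  "title": "Apple Macbook Pro",
  "description": "Macbook Pro ModelNo:ABC 2.4GHz 8GB RAM",
  "best_offer_price": 900
}
POST /products/_search
{
  "query": {
    "multi_match": {
      "query": "macbook",
      "fields": ["title^10", "description^5"]
    }
  },
  "sort": [
    { "_score": "desc" },
    { "best_offer_price": "asc" }
  ]
}
Sorting by _score first keeps relevance primary with the best offer price as a tie-breaker; swap the order if price should dominate.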
That answers 1 and 2.
As for 3:
You could probably model that as parent/child, which would allow for indexing at separate intervals, with a performance trade-off, but I'm not really sure to be honest.
hth a bit
