What does "Lines should have sufficient coverage by tests sonarqube" mean? - sonarqube

I analyzed a project with SonarQube 6.3, and it gave me the error:
32 more lines of code need to be covered by tests to reach the minimum
threshold of 65.0% lines coverage
It's related to the rule:
Lines should have sufficient coverage by tests
I would like to know whether this rule covers all the types of tests I write, or only a specific kind, or whether it means that SonarQube could not reach those lines to analyze them.
I'm asking because I don't have any tests at all, so this message seems to imply that SonarQube recognized tests covering some other lines, which is not the case. How could that happen?

Starting with 6.2, SonarQube can recognize "executable lines" in code files whether or not any tests touch those files. The feature must also be supported and fed by your analyzer. I'm guessing you're using an analyzer version that does provide that data, and that's where the calculation of missing coverage on files untouched by unit tests comes from.
Note that before this functionality was added, the situation looked something like this:
+--------------+---------------+------------+
| File         | Covered lines | Coverage % |
+--------------+---------------+------------+
| 100LineFile  | 75            | 75         |
+--------------+---------------+------------+
| Total        | 75            | 75         |
+--------------+---------------+------------+
AND
+--------------+---------------+------------+
| File         | Covered lines | Coverage % |
+--------------+---------------+------------+
| 100LineFile  | 75            | 75         |
| 100LineFile2 | 0             | -          |
+--------------+---------------+------------+
| Total        | 75            | 75         |
+--------------+---------------+------------+
Files that weren't touched by any unit tests were simply omitted from the calculation, giving a falsely rosy picture of overall coverage. Now it looks like this:
+--------------+---------------+------------+
| File         | Covered lines | Coverage % |
+--------------+---------------+------------+
| 100LineFile  | 75            | 75         |
| 100LineFile2 | 0             | 0          |
+--------------+---------------+------------+
| Total        | 75            | 37.5       |
+--------------+---------------+------------+
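To make the arithmetic explicit: overall line coverage is total covered executable lines divided by total executable lines, so counting the fully uncovered second file halves the percentage. A minimal sketch of that calculation, using only the example numbers from the tables above:

```python
# Overall line coverage = covered executable lines / total executable lines.
files = {
    "100LineFile":  {"executable": 100, "covered": 75},
    "100LineFile2": {"executable": 100, "covered": 0},  # untested file, now counted
}

covered = sum(f["covered"] for f in files.values())
executable = sum(f["executable"] for f in files.values())
print(f"Overall coverage: {100 * covered / executable:.1f}%")  # -> 37.5%
```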

Converting Raw Data to Event Log

I do research in the field of Health-PM and am facing unstructured big data that needs a preprocessing phase to convert it into a suitable event log.
From searching around, I understand that no ProM plug-in, stand-alone code, or script has been developed specifically for this task, except Celonis, which claims to have developed an event log converter. I'm also writing an event log generator for my specific case study.
I just want to know: is there any business solution, case study, or article on this topic that has investigated this issue?
Thanks.
Soureh
What exactly do you mean by unstructured? Is this a badly structured table like the example you provided, or is it data that is not structured at all (e.g., a hard disk full of files)?
In the first situation, Celonis indeed provides an option to extract events from tables using Vertica SQL. In their free SNAP environment you can learn how to do that.
In the latter, I guess that at least semi-structured data is needed to extract events at scale; otherwise your script has no clue where to look.
Good question! Many process mining papers mention that most existing information systems are PAISs (process-aware information systems) and hence suitable for process mining. This is true, BUT it does not mean you can get the data out of the box!
What's the solution? You may transform the existing data (typically from a relational database of your business solution, e.g., an ERP or HIS system) into an event log that process mining can understand.
It works like this: you look into the table containing, e.g., patient registration data. You need the patient ID of this table and the timestamp of registration for each ID. You create an empty table for your event log, typically called "Activity_Table". You consider giving a name to each activity depending on the business context. In our example "Patient Registration" would be a sound name. You insert all the patient IDs with their respective timestamp into the Activity_Table followed by the same activity name for all rows, i.e., "Patient Registration". The result looks like this:
|Patient-ID | Activity | timestamp |
|:----------|:--------------------:| -------------------:|
| 111 |"Patient Registration"| 2021.06.01 14:33:49 |
| 112 |"Patient Registration"| 2021.06.18 10:03:21 |
| 113 |"Patient Registration"| 2021.07.01 01:20:00 |
| ... | | |
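For illustration, here is a minimal pandas sketch of the extraction step just described; the source table and its column names (registration, patient_id, registered_at) are assumptions standing in for whatever the real database uses:

```python
import pandas as pd

# Hypothetical source table from the hospital database.
registration = pd.DataFrame({
    "patient_id": [111, 112, 113],
    "registered_at": ["2021.06.01 14:33:49", "2021.06.18 10:03:21", "2021.07.01 01:20:00"],
})

# One mini activity table: (case ID, activity name, timestamp).
patient_registration = pd.DataFrame({
    "Patient-ID": registration["patient_id"],
    "Activity": "Patient Registration",
    "timestamp": pd.to_datetime(registration["registered_at"], format="%Y.%m.%d %H:%M:%S"),
})
```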
Congrats! You have an event log with one activity. The rest is just the same: you create the same kind of table for every important action that has a timestamp in your database, e.g., "Diagnose finished", "lab test requested", "treatment A finished".
|Patient-ID | Activity | timestamp |
|:----------|:-----------------:| -------------------:|
| 111 |"Diagnose finished"| 2021.06.21 18:03:19 |
| 112 |"Diagnose finished"| 2021.07.02 01:22:00 |
| 113 |"Diagnose finished"| 2021.07.01 01:20:00 |
| ... | | |
Then you UNION all these mini tables and sort the result by Patient-ID and then by timestamp:
|Patient-ID | Activity | timestamp |
|:----------|:--------------------:| -------------------:|
| 111 |"Patient Registration"| 2021.06.01 14:33:49 |
| 111 |"Diagnose finished" | 2021.06.21 18:03:19 |
| 112 |"Patient Registration"| 2021.06.18 10:03:21 |
| 112 |"Diagnose finished" | 2021.07.02 01:22:00 |
| 113 |"Patient Registration"| 2021.07.01 01:20:00 |
| 113 |"Diagnose finished" | 2021.07.01 01:20:00 |
| ... | | |
Notice that the last two rows have the same timestamp. This is very common when working with real data. To handle it, we need an extra column, here called "Order", which helps the process mining algorithm understand the "normal" order of activities that share a timestamp, according to the nature of the underlying business. In this case we know that registration happens before diagnosis, so we assign a lower value (e.g., 1) to all "Patient Registration" activities. The table might look like this:
|Patient-ID | Activity | timestamp |Order |
|:----------|:--------------------:|:-------------------:| ----:|
| 111 |"Patient Registration"| 2021.06.01 14:33:49 | 1 |
| 111 |"Diagnose finished" | 2021.06.21 18:03:19 | 2 |
| 112 |"Patient Registration"| 2021.06.18 10:03:21 | 1 |
| 112 |"Diagnose finished" | 2021.07.02 01:22:00 | 2 |
| 113 |"Patient Registration"| 2021.07.01 01:20:00 | 1 |
| 113 |"Diagnose finished" | 2021.07.01 01:20:00 | 2 |
| ... | | | |
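Continuing the sketch above (the equivalent of the SQL UNION the answer describes): build the other mini tables the same way, concatenate them, add the tie-breaking order value per activity, and sort.

```python
import pandas as pd  # continues the earlier sketch; `patient_registration` is defined there

# A second mini table, built the same way as patient_registration.
diagnose_finished = pd.DataFrame({
    "Patient-ID": [111, 112, 113],
    "Activity": "Diagnose finished",
    "timestamp": pd.to_datetime(
        ["2021.06.21 18:03:19", "2021.07.02 01:22:00", "2021.07.01 01:20:00"],
        format="%Y.%m.%d %H:%M:%S",
    ),
})

# UNION of all the mini tables.
event_log = pd.concat([patient_registration, diagnose_finished], ignore_index=True)

# Tie-breaker for events sharing a timestamp: registration comes before diagnosis.
order_of = {"Patient Registration": 1, "Diagnose finished": 2}
event_log["Order"] = event_log["Activity"].map(order_of)

# Sort by case, then timestamp, then the business order.
event_log = event_log.sort_values(["Patient-ID", "timestamp", "Order"]).reset_index(drop=True)
print(event_log)
```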
Now you have an event log that process mining algorithms understand!
Side note:
There have been many attempts to automate the event log extraction process. The work of Eduardo González López de Murillas is really interesting if you want to follow this topic. I can also recommend this open-access paper by González López de Murillas et al. (2018):
"Connecting databases with process mining: a meta model and toolset" (https://link.springer.com/article/10.1007/s10270-018-0664-7)

How to do First Pass Yield analysis using Elasticsearch?

I'm starting to explore using Elasticsearch to help analyze engineering data produced in a manufacturing facility. One of the key metrics we analyze is the First Pass Yield (FPY) of any given process. So imagine I had some test data like the following:
+------+---------+-----------+-----------+
| Item | Process | Pass/Fail | Timestamp |
+------+---------+-----------+-----------+
| A    | 1       | Fail      | 1         | <-- First-pass failure
| A    | 1       | Pass      | 2         |
| A    | 2       | Pass      | 3         |
| A    | 3       | Fail      | 4         | <-- First-pass failure
| A    | 3       | Fail      | 5         |
| A    | 3       | Pass      | 6         |
| A    | 4       | Pass      | 7         |
+------+---------+-----------+-----------+
What I'd like to get out of this is the ability to query this index/type and determine what the first pass yield is by process. So conceptually I want to count the following in some time period using a set of filters:
How many unique items went through a given process step
How many of those items passed on their first attempt at a process
With a traditional RDBMS I can do this easily with subqueries to pull out and combine these counts. I'm very new to ES, so I'm not sure how to query the process data to count how many items failed the first time they went through a given process.
My real end goal is to include this on a Kibana dashboard so my customers can quickly analyze the FPY data for different processes over various time periods. I'm not there yet, but I believe Kibana will let me use a raw JSON query if that's what this requires.
Is this possible with Elasticsearch, or am I trying to use the wrong tool for the job here?
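One possible approach, sketched for illustration: add an attempt counter per item/process at index time, then a filter plus cardinality aggregation yields both counts per process. The index name (manufacturing-tests) and the field names (item, process, result, attempt) are assumptions, and the fields are assumed to be mapped as keyword/numeric types:

```python
import requests

ES_URL = "http://localhost:9200"   # assumed local cluster
INDEX = "manufacturing-tests"      # hypothetical index name

# FPY per process = unique items that passed on attempt 1 / unique items seen.
# Assumes each document carries an "attempt" counter added at index time.
query = {
    "size": 0,
    "aggs": {
        "by_process": {
            "terms": {"field": "process"},
            "aggs": {
                "items_total": {"cardinality": {"field": "item"}},
                "first_attempt_passes": {
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {"attempt": 1}},
                                {"term": {"result": "Pass"}},
                            ]
                        }
                    },
                    "aggs": {"items_passed": {"cardinality": {"field": "item"}}},
                },
            },
        }
    },
}

resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=query).json()
for bucket in resp["aggregations"]["by_process"]["buckets"]:
    total = bucket["items_total"]["value"]
    passed = bucket["first_attempt_passes"]["items_passed"]["value"]
    print(f"Process {bucket['key']}: FPY = {passed / total:.1%}")
```

A range filter on the timestamp field in the query (or the Kibana time picker) would restrict this to a given time period.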

Google Compute Engine snapshots not displaying actual space used

If I take a snapshot of a persistent disk, then try to get information about the snapshot in gcutil, the data is always incomplete. I need to see this data since snapshots are differential:
server$ gcutil getsnapshot snapshot-3
+----------------------+-----------------------------------+
| name | snapshot-3 |
| description | |
| creation-time | 2014-07-30T06:52:56.223-07:00 |
| status | READY |
| disk-size-gb | 200 |
| storage-bytes | |
| storage-bytes-status | |
| source-disk | us-central1-a/disks/app-db-1-data |
+----------------------+-----------------------------------+
Is there a way to determine how much space this snapshot is actually occupying? gcutil and the web UI are the only resources I know of, and neither displays this information.
Unfortunately it's a known bug; the Google developers are aware of it and working on a fix.
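For reference, a minimal sketch of checking what the Compute Engine API itself returns for these fields (they may come back empty as well while the bug persists). It assumes the google-api-python-client library, Application Default Credentials, and placeholder project/snapshot names:

```python
from googleapiclient import discovery

# Uses Application Default Credentials (e.g., `gcloud auth application-default login`).
compute = discovery.build("compute", "v1")

snap = compute.snapshots().get(
    project="my-project",   # placeholder project ID
    snapshot="snapshot-3",
).execute()

# storageBytes is the differential space the snapshot actually occupies;
# storageBytesStatus says whether that figure is UPDATING or UP_TO_DATE.
print(snap.get("storageBytes"), snap.get("storageBytesStatus"))
```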

magento compilation mode vs apc

Magento has a compilation mode in which you can compile all files of a Magento installation in order to create a single include path to increase performance. http://alanstorm.com/magento_compiler_path http://www.magentocommerce.com/wiki/modules_reference/english/mage_compiler/process/index
In my current shop setup, I have already configured apc to be used as an opcode cache, and am leveraging its performance gains. http://www.aitoc.com/en/blog/apc_speeds_up_Magento.html
My questions are:
1) Is there any advantage of using APC over Magento's compilation mode, or vice versa? I have a dedicated server for Magento and am looking for maximum performance gains.
2) Would it be useful to use both of these together? Why, or why not?
These do different things, so using both together is fine. APC will usually give a greater performance gain than simply enabling compilation, but doing both gives you the best of both worlds.
Just remember that when compilation is enabled you need to disable it before making any code changes or updating/installing modules, then recompile afterwards.
As @JohnBoy has already said in his answer, both can be used in conjunction.
Beyond that, another concern was whether using APC would make compilation redundant.
So I verified the scenario with some siege load tests, and overall there is a definite improvement.
Here are the test results:
siege --concurrent=50 --internet --file=urls.txt --verbose --benchmark --reps=30 --log=compilation.log
+-------------+---------------------+-------+---------------+-----------------+---------------+------------------+-------------------+------------+------+--------+
| Compilation | Date & Time         | Trans | Elap Time (s) | Data Trans (MB) | Resp Time (s) | Trans Rate (t/s) | Throughput (MB/s) | Concurrent | OKAY | Failed |
+-------------+---------------------+-------+---------------+-----------------+---------------+------------------+-------------------+------------+------+--------+
| No          | 2013-09-26 12:27:23 | 600   | 202.37        | 6               | 9.79          | 2.96             | 0.03              | 29.01      | 600  | 0      |
| Yes         | 2013-09-26 12:34:05 | 600   | 199.78        | 6               | 9.73          | 3.00             | 0.03              | 29.24      | 600  | 0      |
| No          | 2013-09-26 12:59:42 | 1496  | 510.40        | 17              | 9.97          | 2.93             | 0.03              | 29.23      | 1496 | 4      |
| Yes         | 2013-09-26 12:46:05 | 1500  | 491.98        | 17              | 9.59          | 3.05             | 0.03              | 29.24      | 1500 | 0      |
+-------------+---------------------+-------+---------------+-----------------+---------------+------------------+-------------------+------------+------+--------+
There was a certain amount of variance; however, the good thing was that there was always some improvement, however minuscule it might be.
So we can use both.
The only extra overhead here is disabling and recompiling after module changes.

How to get all projects on a dashboard in multiple columns?

We have a TV displaying our Sonar stats for all our projects, but now that we have 20+ projects, it doesn't all fit on the screen. We would like our dashboard to look like this (so all the projects fit on one screen):
+----------+----------+ +----------+----------+
| Name | Coverage | | Name | Coverage |
+----------+----------+ +----------+----------+
| Project1 | 45 | | Project5 | 18 |
| Project2 | 15 | | Project6 | 22 |
| Project3 | 45 | | Project7 | 45 |
| Project4 | 15 | | Project8 | 22 |
+----------+----------+ +----------+----------+
Is this possible? Right now we are using the widget "Measure Filter as List", so that we don't have to hard-code the project names into a widget. As new projects get added to Sonar, we don't have to manually add them to any dashboard... they should automatically get added.
Thanks!
This is currently not possible. But you can implement your own widget that displays the list of projects using a "fluid" layout in order to meet your needs.
See our sample plugin to learn how to write your own plugin that adds a widget in SonarQube.
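As a lighter-weight alternative to writing a plugin (not what the answer above suggests, just an option), the page for the TV could also be generated from SonarQube's web API. A rough sketch, assuming a recent SonarQube version with the api/components/search and api/measures/component endpoints, a placeholder server URL, and a token with browse permission:

```python
import requests

SONAR_URL = "https://sonar.example.com"  # placeholder server URL
TOKEN = "..."                            # placeholder user token (sent as basic-auth user)
auth = (TOKEN, "")

# List all projects (paginated; only the first page is fetched here for brevity).
projects = requests.get(
    f"{SONAR_URL}/api/components/search",
    params={"qualifiers": "TRK", "ps": 500},
    auth=auth,
).json()["components"]

# Fetch line coverage for each project.
rows = []
for p in projects:
    measures = requests.get(
        f"{SONAR_URL}/api/measures/component",
        params={"component": p["key"], "metricKeys": "coverage"},
        auth=auth,
    ).json()["component"]["measures"]
    rows.append((p["name"], measures[0]["value"] if measures else "-"))

# Print the projects in two side-by-side columns so 20+ fit on one screen.
half = (len(rows) + 1) // 2
for left, right in zip(rows[:half], rows[half:] + [("", "")]):
    print(f"{left[0]:<25} {left[1]:>8}    {right[0]:<25} {right[1]:>8}")
```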
