How to analyse a data set using bash?

I have got a data set:
16
18
1
24
7
13
13
24
17
15
10
16
2
16
11
21
15
24
6
13
2
2
21
16
8
16
11
0
19
Background: These values represent the number of hours a device operates on each day. I have about a month's worth of data, shown above, and the company has asked me to 'analyse' it, i.e. see whether any patterns are emerging. Ideally I would like to write a bash script that does this for me, but I don't know how.
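Before looking for patterns, it can help to summarise the numbers. The Python sketch below (Python rather than bash, since Q4 asks about other languages) computes basic statistics and a weekly profile. The 7-day cycle is an assumption: it treats the values as consecutive days with none missing and takes the first value as position 0. `summarize` and `weekly_profile` are hypothetical helper names.

```python
from statistics import mean, median, pstdev

# Daily operating hours copied from the question, in the order given.
HOURS = [16, 18, 1, 24, 7, 13, 13, 24, 17, 15, 10, 16, 2, 16, 11,
         21, 15, 24, 6, 13, 2, 2, 21, 16, 8, 16, 11, 0, 19]

def summarize(hours):
    """Basic descriptive statistics for the whole period."""
    return {
        "days": len(hours),
        "mean": mean(hours),
        "median": median(hours),
        "min": min(hours),
        "max": max(hours),
        "stdev": pstdev(hours),  # population standard deviation
    }

def weekly_profile(hours):
    """Mean hours for each position in an assumed 7-day cycle.

    Position 0 is the first day of data; assumes consecutive days, none missing.
    """
    return [mean(hours[day::7]) for day in range(7)]

print(summarize(HOURS))
print(weekly_profile(HOURS))
```

For this data the mean works out to exactly 13 hours and the median to 15 hours. With only about four weeks of data, each weekday position averages just four or five values, so any weekly difference is a hint to follow up rather than a finding.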
Research: I have looked at various questions on analysing data. Using a counter could work; other suggestions include machine learning, but that may be too difficult in bash.
My Approach: I'm thinking of having two categories: (a) easy working (<=12 hrs) and (b) hard working (>12 hrs). The code would then encode each day as a or b, giving a sequence such as aaaabbababaababba, and from this the software should be able to recognize whether a pattern is emerging or already present.
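The a/b approach can be sketched in a few lines of Python. This is a minimal sketch assuming the 12-hour threshold from the question: it encodes each day as `a` or `b`, run-length encodes the result to expose streaks, and uses a `Counter` (the counter idea from the research note) to count how often each 3-day window occurs. The function names are illustrative, not from any library.

```python
from collections import Counter
from itertools import groupby

# Daily operating hours copied from the question, in the order given.
HOURS = [16, 18, 1, 24, 7, 13, 13, 24, 17, 15, 10, 16, 2, 16, 11,
         21, 15, 24, 6, 13, 2, 2, 21, 16, 8, 16, 11, 0, 19]

def encode(hours, threshold=12):
    """'a' = easy day (<= threshold hours), 'b' = hard day (> threshold hours)."""
    return "".join("a" if h <= threshold else "b" for h in hours)

def runs(sequence):
    """Run-length encoding, e.g. 'aabbb' -> [('a', 2), ('b', 3)]."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

def ngram_counts(sequence, n):
    """How often each length-n window occurs; repeated windows hint at a pattern."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

days = encode(HOURS)
print(days)
print(runs(days))
print(ngram_counts(days, 3).most_common(3))
```

For the data above, `encode(HOURS)` gives `bbababbbbbabababbbabaabbabaab`: 11 easy days and 18 hard days, with a longest hard streak of 5 days. Whether a repeated window counts as a "pattern" is a judgement call; with only 29 days, counts of 2 or 3 can easily arise by chance.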
Questions: (Q1) How can I make software 'realize' that a pattern is emerging? (Q2) Can such software be written in bash? (Q3) Is there a better approach than mine? (Q4) What would be a better language to write such code in?

Related

How do you work out how many bits are needed for the opcode?

There are 16 bits per word, and the instruction set consists of 17 different operations.
I know that 5 bits are needed for the opcode, but I have no idea why. Why are 5 bits needed?
You need 17 different values to represent the 17 different opcodes. For example, with 1 bit you can represent two (2^1) different values: 0 and 1. The minimum number of bits that can represent 17 different values is 5, because 2^4 is only 16 but 2^5 is 32 (which is >= 17). These calculations are based on a fundamental counting principle called the rule of product.
There are 16 bits per word
This piece of information is irrelevant.
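The counting argument amounts to finding the smallest b with 2^b >= 17. In Python this falls out of `int.bit_length()`: the opcodes can be numbered 0 to 16, and the largest of them, 16, needs 5 binary digits. `opcode_bits` is a hypothetical helper name.

```python
def opcode_bits(num_operations: int) -> int:
    """Smallest b with 2**b >= num_operations.

    Opcodes are numbered 0 .. num_operations - 1, so the answer is the number
    of binary digits needed to write the largest one.
    """
    if num_operations < 1:
        raise ValueError("need at least one operation")
    return (num_operations - 1).bit_length()

print(opcode_bits(16))  # 4
print(opcode_bits(17))  # 5
```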

Lossless compression of an ordered series of 29 digits (each 0 to 4 on a Likert scale)

I have a survey with 29 questions, each with a 5-point Likert scale (0=None of the time; 4=Most of the time). I'd like to compress the total set of responses to a small number of alpha or alphanumeric characters, adding a check digit to the end.
So, the set of responses 00101244231023110242231421211 would get turned into something like A2CR7HW4. This output would be part of a printout that a non-techie user would enter on a website as a shortcut to entering the entire string. I'd want to avoid ambiguous characters, such as 0,O,D,I,l,5,S, leaving me with 21 or 22 characters to use (uppercase only). Alternatively, I could just stick with capital alpha only and use all 26 characters.
I'm thinking of converting each pair of digits to a letter (5^2=25, so the whole alphabet is adequate). That would reduce the sequence to 15 characters, which is still longish to type without errors.
Any other suggestions on how to minimize the length of the output?
EDIT: BTW, for context, the survey asks 29 questions about mental health symptoms, generating a predictive risk for 4 psychiatric conditions. I need a code that represents all responses.
If the five answers are all equally likely, then the best you can do is ceiling(29 * log(5) / log(n)) symbols, where n is the number of symbols in your alphabet. (The base of the logarithm doesn't matter, so long as they're both the same.)
So for your 22 symbols, the best you can do is 16. For 26 symbols, the best is 15, as you described for 25. If you use 49 characters (e.g. some subset of the upper and lower case characters and the digits), you can get down to 12. The best you'll be able to do with printable ASCII characters would be 11, using 70 of the 94 characters.
The only way to make it smaller would be if the responses are not all equally likely and are heavily skewed. Though if that's the case, then there's probably something wrong with the survey.
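The ceiling formula can be checked with exact integer arithmetic instead of floating-point logarithms, which matters when the ratio lands close to an integer (for 49 symbols, 29 * log(5) / log(49) is about 11.99). This sketch assumes 29 questions with 5 possible answers each; `min_code_length` is a hypothetical name.

```python
def min_code_length(questions: int, answers_per_question: int, alphabet_size: int) -> int:
    """Smallest k with alphabet_size**k >= answers_per_question**questions."""
    needed = answers_per_question ** questions  # number of distinct response sets
    length, capacity = 0, 1
    while capacity < needed:
        capacity *= alphabet_size
        length += 1
    return length

for n in (22, 25, 26, 49, 70):
    print(n, min_code_length(29, 5, n))
# 22 16
# 25 15
# 26 15
# 49 12
# 70 11
```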
First, choose a set of permissible characters, i.e.
characters = "ABC..."
Then, prefix the input-digits with a 1 and interpret it as a quinary number:
100101244231023110242231421211
Now, convert this quinary number to a number in base-"strlen(characters)", i.e. base26 if 26 characters are to be used:
02 23 18 12 10 24 04 19 00 15 14 20 00 03 17
Then, use these numbers as index in "characters", and you have your encoding:
CVSMKWETAPOUADR
For decoding, just reverse the steps.
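The prefix-1 and base-conversion steps can be sketched in Python, whose `int(text, 5)` parses a quinary string directly. This is a minimal sketch assuming the plain 26-letter alphabet from the worked example; `encode` and `decode` are hypothetical names, and no check digit is included. The leading 1 is what keeps leading zeros in the answers from being lost.

```python
# Assumed alphabet: the 26 uppercase letters, as in the worked example.
CHARACTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(answers: str, characters: str = CHARACTERS) -> str:
    """Prefix a 1, read as base 5, rewrite in base len(characters)."""
    value = int("1" + answers, 5)  # raises ValueError for digits outside 0..4
    base = len(characters)
    out = []
    while value:
        value, remainder = divmod(value, base)
        out.append(characters[remainder])
    return "".join(reversed(out))

def decode(code: str, characters: str = CHARACTERS) -> str:
    """Reverse of encode: base len(characters) -> base 5, then drop the leading 1."""
    base = len(characters)
    value = 0
    for ch in code:
        value = value * base + characters.index(ch)
    digits = []
    while value:
        value, remainder = divmod(value, 5)
        digits.append(str(remainder))
    return "".join(reversed(digits))[1:]

code = encode("00101244231023110242231421211")
print(code, decode(code))
```

A reduced alphabet that avoids ambiguous characters can be passed as the `characters` argument; for a 22-letter alphabet the code comes out at 16 symbols, matching the ceiling formula.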
Are you doing this in a specific language?
If you want to be really thrifty about it you might want to consider encoding the data at bit level.
Since there are only 5 possible answers per question you could do this with only 3 bits:
000
001
010
011
100
Your end result would be a string of bits: at 3 bits per answer, that is a total of 87 bits, or just under 11 bytes (10.875).
EDIT - misread the question slightly, there are 5 possible answers not 4, my mistake.
The only problem now is that for 4 of your 5 answers you're wasting a bit. I wouldn't say you'll benefit much from going to this much trouble, but it's worth considering.
EDIT:
I've been playing about with it and it's difficult to work out a mechanism that allows you to use both 2 and 3 bit values.
Since your output would be an 87-bit binary value, you'd need to be able to distinguish between 2-bit and 3-bit values when converting back to the original values.
If you're working with a larger number of values there are some methods you could use, like having a reserved bit for each value that tags its type and gives it some meaning. But when working with so few bits, it's hard to shave anything off.
Your output at 87 bits could be padded out to 128 bits, which would give you four 32-bit values if you wanted to simplify it. This 128-bit value would be like a unique fingerprint representing a specific set of answers. There are many ways you can represent 128 bits.
But in the end, working at bit level is about as good as it gets when it comes to actual compression and encoding of data... if you can express 5 unique values in less than 3 bits, I'd be suitably impressed.
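The fixed 3-bits-per-answer scheme can be sketched with integer bit shifts in Python. This is a minimal sketch: `pack_3bit` and `unpack_3bit` are hypothetical names, and the answers are assumed to be a string of digits 0 to 4. Because every answer takes exactly 3 bits, decoding needs only the answer count, with none of the 2-bit/3-bit bookkeeping.

```python
def pack_3bit(answers: str) -> int:
    """Pack each answer (0..4) into 3 bits, first answer in the highest bits."""
    value = 0
    for ch in answers:
        digit = int(ch)
        if not 0 <= digit <= 4:
            raise ValueError(f"answer out of range: {ch!r}")
        value = (value << 3) | digit  # append 3 bits for this answer
    return value

def unpack_3bit(value: int, count: int) -> str:
    """Inverse of pack_3bit; count is the number of answers packed."""
    digits = []
    for _ in range(count):
        digits.append(str(value & 0b111))  # lowest 3 bits = last remaining answer
        value >>= 3
    return "".join(reversed(digits))

answers = "00101244231023110242231421211"
packed = pack_3bit(answers)
print(format(packed, "087b"))                       # 87-character bit string, zero-padded
print(unpack_3bit(packed, len(answers)) == answers)  # True
```

For comparison, the ceiling formula from the first answer with a 2-symbol alphabet (bits) gives ceiling(29 * log(5) / log(2)) = 68 bits.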

How to improve the quality of the `concorde` TSP solver? Am I misusing it?

I'm trying to run the concorde TSP solver on a file in the following format:
NAME : p5
COMMENT : Nada
TYPE : TSP
DIMENSION : 20
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
0 0.329733 0.67714
1 0.823944 0.035369
2 0.002488 0.866692
3 0.241964 0.671822
4 0.98876 0.134457
5 0.879147 0.457779
6 0.021017 0.271951
7 0.221737 0.367143
8 0.549802 0.523319
9 0.363839 0.22359
10 0.696631 0.495935
11 0.279072 0.100501
12 0.660156 0.860675
13 0.251769 0.029172
14 0.32112 0.207704
15 0.821433 0.507387
16 0.095411 0.953448
17 0.115897 0.269363
18 0.704484 0.411328
19 0.705198 0.795917
Since I couldn't find a guide about the format, I just modified a sample file I downloaded. I am running the following command:
concorde myFile.tsp
It quickly (~45 ms) outputs the solution as a .sol file, which looks something like this:
20
0 10 19 8 12 15 5 4 18 1
9 17 6 11 7 13 14 3 2 16
When I graph this tour, it is, by visual inspection, far from an optimal solution. Thus:
Am I doing something wrong with the file format or command?
If not, considering how fast it computed the solution, can I prompt it to spend more time looking for better solutions?
EUC_2D is the rounded L2 norm. That is, the distance between two points is taken to be their Euclidean distance rounded to the nearest integer. Your points are all going to be at distances 0 or 1 to one another and Concorde is going to generate a daft tour like the one you drew.
Scale your problem up until the rounding stops making a difference.
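Following the advice to scale the problem up, a small Python sketch can rewrite the coordinates before the file is handed to concorde. The factor of 1,000,000 is an arbitrary assumption (any factor large enough that rounding stops mattering should do), `write_tsp` is a hypothetical helper, and `euc_2d` reimplements the rounded Euclidean distance described above so you can see the effect. Nodes are numbered from 0, as in the question's file.

```python
import math

SCALE = 1_000_000  # assumed factor; large enough that rounding no longer dominates

def euc_2d(p, q):
    """Euclidean distance rounded to the nearest integer (EUC_2D)."""
    return int(math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) + 0.5)

def write_tsp(name, points, scale=SCALE):
    """Return TSPLIB text with every coordinate multiplied by `scale`."""
    lines = [
        f"NAME : {name}",
        "TYPE : TSP",
        f"DIMENSION : {len(points)}",
        "EDGE_WEIGHT_TYPE : EUC_2D",
        "NODE_COORD_SECTION",
    ]
    for i, (x, y) in enumerate(points):  # numbered from 0, as in the question's file
        lines.append(f"{i} {x * scale:.0f} {y * scale:.0f}")
    lines.append("EOF")  # end-of-data keyword used by TSPLIB files
    return "\n".join(lines) + "\n"

points = [(0.329733, 0.67714), (0.823944, 0.035369), (0.002488, 0.866692)]
print([euc_2d(points[0], q) for q in points[1:]])  # [1, 0]
scaled = [(x * SCALE, y * SCALE) for x, y in points]
print([euc_2d(scaled[0], q) for q in scaled[1:]])  # now distinguishable
print(write_tsp("p5", points))
```

Write the returned text to a .tsp file and run concorde on it as before; the reported tour length is then in scaled units, so divide by the factor to compare it with the original coordinates.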

How to print the calculation steps in a game like 24point? [duplicate]

This question already has answers here:
Writing an algorithm to decide whether a target number can be reached with a set of other numbers and specific operators?
Here's the problem:
Given 4 numbers, I need to find a calculation that results in 24. The only operations I can use are addition, subtraction, multiplication and division. How can I print the calculation?
Ex:
Input: 4,7,8,8
Output: (7-(8/8))*4=24.
(The following is an expansion on an idea suggested by Sayakiss)
One option would be enumerating all possible combinations of numbers and arithmetic operations performed on them.
If you have 4 numbers, there are only 24 different ways to write them in a list (the following example is for numbers 4, 7, 8, 9; I changed the last number in your example to make them all different):
4 7 8 9
4 7 9 8
4 8 7 9
4 8 9 7
...
9 8 7 4
If some numbers are identical, some of the above lists will appear twice (not a problem).
For each of the above orderings, there are 64 different ways to insert an arithmetic operation between the numbers:
4+7+8+9
4+7+8-9
4+7+8*9
4+7+8/9
4+7-8+9
...
4/7/8/9
For each of the above sequences, there are 5 ways to place parentheses:
((4-7)-8)-9
(4-7)-(8-9)
(4-(7-8))-9
4-((7-8)-9)
4-(7-(8-9))
When you combine all 3 "aspects" mentioned above, you get 24 * 64 * 5 = 7680 expressions; evaluate each one and check whether its value is 24 (or whatever number you need it to be).
It may be convenient to generate the expressions in tree form to simplify evaluation (this depends on the programming language you want to use; e.g. C/C++ has no eval function). For example, the expression 4*((7-8)+9) may be represented by the following tree:
    *
   / \
  4   +
     / \
    -   9
   / \
  7   8
Some notes:
You may want to tweak the choice of arithmetic operations to allow for expressions like 47+88 - not sure whether the rules of your game permit that.
Many of the evaluated expressions may be annoyingly verbose, like ((4+7)+8)+8 and 4+(7+(8+8)) (which are also examined twice, with the order of the 8's switched); you could prevent that by inserting some dedicated checks into your algorithm.
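The enumeration described above (24 orderings x 64 operator choices x 5 parenthesizations) can be sketched in Python. This is a minimal sketch, not the only way to do it: instead of building explicit trees, it pairs each of the five bracket shapes with a small evaluator, and it uses `fractions.Fraction` so that division is exact (floating point can miss solutions such as 8/(3-8/3)). `solve`, `apply` and `SHAPES` are hypothetical names, and only the first expression found is returned.

```python
from fractions import Fraction
from itertools import permutations, product

def apply(op, x, y):
    """Apply one operator; None marks (and propagates) a division by zero."""
    if x is None or y is None:
        return None
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "*":
        return x * y
    return x / y if y != 0 else None  # op == "/"

# The five ways to parenthesize a o1 b o2 c o3 d: (display template, evaluator).
SHAPES = [
    ("(({a}{o1}{b}){o2}{c}){o3}{d}",
     lambda a, b, c, d, o1, o2, o3: apply(o3, apply(o2, apply(o1, a, b), c), d)),
    ("({a}{o1}{b}){o2}({c}{o3}{d})",
     lambda a, b, c, d, o1, o2, o3: apply(o2, apply(o1, a, b), apply(o3, c, d))),
    ("({a}{o1}({b}{o2}{c})){o3}{d}",
     lambda a, b, c, d, o1, o2, o3: apply(o3, apply(o1, a, apply(o2, b, c)), d)),
    ("{a}{o1}(({b}{o2}{c}){o3}{d})",
     lambda a, b, c, d, o1, o2, o3: apply(o1, a, apply(o3, apply(o2, b, c), d))),
    ("{a}{o1}({b}{o2}({c}{o3}{d}))",
     lambda a, b, c, d, o1, o2, o3: apply(o1, a, apply(o2, b, apply(o3, c, d)))),
]

def solve(numbers, target=24):
    """Try every ordering, operator choice and shape; return the first hit or None."""
    for order in permutations(numbers):
        exact = [Fraction(n) for n in order]  # exact arithmetic, no rounding error
        for ops in product("+-*/", repeat=3):
            for template, evaluate in SHAPES:
                if evaluate(*exact, *ops) == target:
                    a, b, c, d = order
                    o1, o2, o3 = ops
                    return template.format(a=a, b=b, c=c, d=d,
                                           o1=o1, o2=o2, o3=o3) + f"={target}"
    return None

print(solve([4, 7, 8, 8]))
```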

Creating a B-tree

I am reviewing for my exam tomorrow and am stuck on a question. I have to draw a valid B-tree where M = 4 and L = 3 containing the values 1-25. The problem is that I can't get my tree to look like the answer. The answer tree looks like this:
                 9 14 22
      /        |         |         \
    4 7        12      17 20       24
  /  |  \    /   \    /  |  \    /   \
 1   4   7   9  12  14  17  20  22  24
 2   5   8  10  13  15  18  21  23  25
 3   6      11      16  19  21
Sorry if this is difficult to read. Perhaps I copied the answer wrong, but can anyone confirm whether this is the correct answer? If so, how was this answer reached?
Looks like you're talking about a B+ Tree rather than a BTree, and there is a small typo: you have key 21 duplicated in the leaf [20,21,21]. As you say, the Order is 4.
The answer is a valid B+ Tree, but not the one you'd get by adding values 1-25 in sequence. Did the question give a specific order in which the keys were to be added, or was the question to try and determine that for yourself? Other than a lengthy trial and error process I'm not sure how you'd determine the sequence, but you can try it out by using the demo page here:
http://goneill.co.nz/btree-demo.php
If you want to try various sequences of insert you'd do better to download the offline version and edit the Hardcoded() function:
http://goneill.co.nz/btree.php
It's all in JavaScript which might not be useful to you though.
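One way to confirm an answer like this is to test the B+ tree rules mechanically. The Python sketch below assumes the textbook convention the notation suggests: M is the maximum number of children of an internal node, L the maximum number of keys in a leaf, non-root nodes must be at least half full, and keys to the right of a separator are >= it while keys to its left are smaller. `check` is a hypothetical helper, and it does not verify that all leaves sit at the same depth.

```python
from math import ceil

# A leaf is a list of keys; an internal node is a (keys, children) tuple.
# Assumption: M = max children per internal node, L = max keys per leaf.

def check(node, M, L, lo=None, hi=None, is_root=True):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if isinstance(node, list):  # leaf
        keys, children = node, []
        if len(keys) > L:
            problems.append(f"leaf {keys} has more than L={L} keys")
        if not is_root and len(keys) < ceil(L / 2):
            problems.append(f"leaf {keys} has fewer than {ceil(L / 2)} keys")
    else:  # internal node
        keys, children = node
        if len(children) != len(keys) + 1:
            problems.append(f"node {keys} needs {len(keys) + 1} children")
        if len(children) > M:
            problems.append(f"node {keys} has more than M={M} children")
        if not is_root and len(children) < ceil(M / 2):
            problems.append(f"node {keys} has fewer than {ceil(M / 2)} children")
    if any(a >= b for a, b in zip(keys, keys[1:])):
        problems.append(f"keys {keys} are not strictly increasing")
    if any((lo is not None and k < lo) or (hi is not None and k >= hi) for k in keys):
        problems.append(f"keys {keys} fall outside [{lo}, {hi})")
    if len(children) == len(keys) + 1:
        bounds = [lo, *keys, hi]  # child i may only hold keys in [bounds[i], bounds[i + 1])
        for i, child in enumerate(children):
            problems += check(child, M, L, bounds[i], bounds[i + 1], is_root=False)
    return problems

# The tree from the question, with the duplicated 21 removed.
TREE = ([9, 14, 22], [
    ([4, 7], [[1, 2, 3], [4, 5, 6], [7, 8]]),
    ([12], [[9, 10, 11], [12, 13]]),
    ([17, 20], [[14, 15, 16], [17, 18, 19], [20, 21]]),
    ([24], [[22, 23], [24, 25]]),
])

print(check(TREE, M=4, L=3))  # []
```

With the duplicate removed the tree passes; keeping the leaf as [20, 21, 21] produces a single complaint about that leaf's keys.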
