How to convert multiple record updates into a periodic snapshot in Kusto

I have a mechanism that is posting an update to Azure Data Explorer each time a record changes at source. So the data end up as a series of versions of the record in ADX. I would like to turn it into a daily snapshot with the most recent version being used at the snapshot time. I have managed to do something close with
let visits = datatable(id:guid, timestamp:datetime, category:string, start:datetime, end:datetime, row:int)
[
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-01T01:02:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 1,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-02T02:02:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 2,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-02T02:05:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 3,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-04T04:05:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 4,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-05T07:05:03), "SRU", datetime(2021-09-30T01:02:03), datetime(2021-10-01T07:00:00), 5
];
let binsize = 1d;
let min_date_time = toscalar(visits | summarize startofday(min(timestamp)));
let max_date_time = toscalar(visits | summarize endofday(max(timestamp)));
//
range hour from min_date_time to max_date_time step binsize
| join kind=leftouter (
visits
| summarize arg_max(timestamp, *) by id, bin(timestamp, binsize)
| extend hour = bin(timestamp, binsize)
) on hour
| project-away hour1
This gives the following:
hour                        | id                                   | timestamp                   | timestamp1                  | category | start                       | end                         | row
2021-10-01 00:00:00.0000000 | b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-01 00:00:00.0000000 | 2021-10-01 01:02:03.0000000 | SRU      | 2021-09-30 01:02:03.0000000 |                             | 1
2021-10-02 00:00:00.0000000 | b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-02 00:00:00.0000000 | 2021-10-02 02:05:03.0000000 | SRU      | 2021-09-30 01:02:03.0000000 |                             | 3
2021-10-03 00:00:00.0000000 |                                      |                             |                             |          |                             |                             |
2021-10-04 00:00:00.0000000 | b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-04 00:00:00.0000000 | 2021-10-04 04:05:03.0000000 | SRU      | 2021-09-30 01:02:03.0000000 |                             | 4
2021-10-05 00:00:00.0000000 | b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-05 00:00:00.0000000 | 2021-10-05 07:05:03.0000000 | SRU      | 2021-09-30 01:02:03.0000000 | 2021-10-01 07:00:00.0000000 | 5
This is sort of right: it correctly picks the latest value (row 3) for 2021-10-02, but it doesn't carry row 3 forward into the following time period, so we get blanks.
I'm stumped on the last part.
If it helps, the next part of the puzzle is to aggregate, grouped on category, resulting in something like:
day        | category | total | started | ended
2021-10-01 | SRU      | 1     | 1       | 0
2021-10-02 | SRU      | 1     | 1       | 0
2021-10-03 | SRU      | 1     | 1       | 0
2021-10-04 | SRU      | 1     | 1       | 0
2021-10-05 | SRU      | 1     | 0       | 1

Here is a solution for the first table; uncomment the last line to get the second table:
let visits = datatable(id:guid, timestamp:datetime, category:string, start:datetime, end:datetime, row:int)
[
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-01T01:02:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 1,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-02T02:02:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 2,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-02T02:05:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 3,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-04T04:05:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 4,
"b5ce180e-ce11-4936-b3f1-c817a261622e", datetime(2021-10-05T07:05:03), "SRU", datetime(2021-09-30T01:02:03), datetime(2021-10-01T07:00:00), 5,
"8acaffa4-3ab8-479c-8f13-191c016bff70", datetime(2021-10-01T01:02:03), "SRU", datetime(2021-09-30T01:02:03), datetime(null), 6,
"8acaffa4-3ab8-479c-8f13-191c016bff70", datetime(2021-10-02T02:02:03), "SRU", datetime(2021-09-30T01:02:03), datetime(2021-10-02T07:00:00), 7
];
let binsize = 1d;
let StartDate = datetime(2021-10-01);
let EndDate = datetime(2021-10-06);
visits
| summarize arg_max(timestamp, *) by ['id'], Day = bin(timestamp,1d)
| partition hint.strategy=native by ['id']
(
make-series timestamp = take_any(tolong(timestamp)) default=long(null),
start = take_any(tolong(start)) default=long(null),
end = take_any(tolong(end)) default=long(null),
row = take_any(row) default=long(null),
total = count() default=long(null),
started = countif(isnull(end)) default=long(null),
ended = countif(isnotnull(end))
on Day from StartDate to EndDate step 1d by category, ['id']
| extend timestamp = series_fill_forward(timestamp),
start = series_fill_forward(start),
end = series_fill_forward(end),
row = series_fill_forward(row),
total = series_fill_forward(total),
started=series_fill_forward(started),
ended=series_fill_forward(ended)
)
| mv-expand timestamp to typeof(long), start to typeof(long), end to typeof(long), Day to typeof(datetime), row to typeof(int), total to typeof(int), started to typeof(int), ended to typeof(int)
| extend timestamp = todatetime(timestamp), start=todatetime(start), end=todatetime(end)
| project-reorder id, Day, timestamp, start, end, row, category
//| summarize Total = sum(total), sum(started), sum(ended) by Day
id                                   | Day                         | timestamp                   | start                       | end                         | row | category | total | started | ended
b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-01 00:00:00.0000000 | 2021-10-01 01:02:03.0000000 | 2021-09-30 01:02:03.0000000 |                             | 1   | SRU      | 1     | 1       | 0
8acaffa4-3ab8-479c-8f13-191c016bff70 | 2021-10-01 00:00:00.0000000 | 2021-10-01 01:02:03.0000000 | 2021-09-30 01:02:03.0000000 |                             | 6   | SRU      | 1     | 1       | 0
8acaffa4-3ab8-479c-8f13-191c016bff70 | 2021-10-02 00:00:00.0000000 | 2021-10-02 02:02:03.0000000 | 2021-09-30 01:02:03.0000000 | 2021-10-02 07:00:00.0000000 | 7   | SRU      | 1     | 0       | 1
b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-02 00:00:00.0000000 | 2021-10-02 02:05:03.0000000 | 2021-09-30 01:02:03.0000000 |                             | 3   | SRU      | 1     | 1       | 0
8acaffa4-3ab8-479c-8f13-191c016bff70 | 2021-10-03 00:00:00.0000000 | 2021-10-02 02:02:03.0000000 | 2021-09-30 01:02:03.0000000 | 2021-10-02 07:00:00.0000000 | 7   | SRU      | 1     | 0       | 1
b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-03 00:00:00.0000000 | 2021-10-02 02:05:03.0000000 | 2021-09-30 01:02:03.0000000 |                             | 3   | SRU      | 1     | 1       | 0
8acaffa4-3ab8-479c-8f13-191c016bff70 | 2021-10-04 00:00:00.0000000 | 2021-10-02 02:02:03.0000000 | 2021-09-30 01:02:03.0000000 | 2021-10-02 07:00:00.0000000 | 7   | SRU      | 1     | 0       | 1
b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-04 00:00:00.0000000 | 2021-10-04 04:05:03.0000000 | 2021-09-30 01:02:03.0000000 |                             | 4   | SRU      | 1     | 1       | 0
8acaffa4-3ab8-479c-8f13-191c016bff70 | 2021-10-05 00:00:00.0000000 | 2021-10-02 02:02:03.0000000 | 2021-09-30 01:02:03.0000000 | 2021-10-02 07:00:00.0000000 | 7   | SRU      | 1     | 0       | 1
b5ce180e-ce11-4936-b3f1-c817a261622e | 2021-10-05 00:00:00.0000000 | 2021-10-05 07:05:03.0000000 | 2021-09-30 01:02:03.0000000 | 2021-10-01 07:00:00.0000000 | 5   | SRU      | 1     | 0       | 1
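To produce the second table from the question (grouped on category as well as day), the commented-out line can be extended along these lines (a sketch, reusing the column names from the query above):
| summarize total = sum(total), started = sum(started), ended = sum(ended) by Day, category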

Deleting a section of a file

I couldn't think of a bash solution that could delete a section of a file, so I am posting it here looking for help.
I have a file that looks like this:
track type=wiggle_0 name= description=
variableStep chrom=chr1
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr2
10203 3
10213 4
10223 5
10233 5
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr3
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
track type=wiggle_0 name= description=
variableStep chrom=chrM
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chrX
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
I want to delete/remove the section
track type=wiggle_0 name= description=
variableStep chrom=chrM
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
It should be possible using a combination of grep and cut, but I can't figure it out.
Just to be clear: I want to delete the block containing chrM.
Thank you in advance for any solutions.
Note: this is what I am doing:
$ cat tst.awk
/^track/ { track=$0 ORS; next }
/chrom/ { f=(/chrM/ ? 1 : 0) }
!f { print track $0; track="" }
and I get the error:
bash: !f: event not found
Based on one possible interpretation of your requirements (that you want to delete the block containing chrM), this will work using any awk in any shell on any UNIX box:
$ cat tst.awk
/^track/ { track=$0 ORS; next }        # remember the track header line, print it later
/chrom/  { f=(/chrM/ ? 1 : 0) }        # flag the block if this chrom line mentions chrM
!f       { print track $0; track="" }  # unless flagged, print the saved header (if any) plus this line
$ awk -f tst.awk file
track type=wiggle_0 name= description=
variableStep chrom=chr1
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr2
10203 3
10213 4
10223 5
10233 5
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr3
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
track type=wiggle_0 name= description=
variableStep chrom=chrX
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
You can use GNU awk (gawk accepts a regular expression as the record separator RS) like this:
kw='track type=wiggle_0 name= description='
awk -v kw="$kw" -v RS="$kw[[:space:]]*" -v ORS= 'NR>1 && !/^variableStep chrom=chrM/{print kw "\n" $0}' file
Output:
track type=wiggle_0 name= description=
variableStep chrom=chr1
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr2
10203 3
10213 4
10223 5
10233 5
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr3
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
track type=wiggle_0 name= description=
variableStep chrom=chrX
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
Unified awk solution:
awk '/^track type=wiggle_0 name= description=/{ if (f) f=0; t=$0; n=NR+1; next }
n && NR==n{
if (/variableStep chrom=chrM/) { f=1; next }
else { print t; f=t=n=0 }
}
f{ next }1' file
The output:
track type=wiggle_0 name= description=
variableStep chrom=chr1
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr2
10203 3
10213 4
10223 5
10233 5
10263 3
10366 6
10376 10
track type=wiggle_0 name= description=
variableStep chrom=chr3
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
track type=wiggle_0 name= description=
variableStep chrom=chrX
10203 3
10213 4
10223 5
10233 5
10243 4
10253 3
10263 3
10366 6
10376 10
You can use sed:
sed -n '/variableStep chrom=chrM/,/10376 10/!p' file | uniq
1) sed -n '/str1/,/str2/!p' file
2) -n suppresses automatic printing (silent mode)
3) /str1/,/str2/ selects the section between str1 and str2; the ! negates the address, so p prints everything in the file except that section (inside single quotes in bash the ! needs no escaping)
4) uniq removes the extra "track type=wiggle_0 name= description=" line: the track header of the removed block is kept, so that line would otherwise appear twice in a row

How can we combine two files based on a condition in awk command?

I have two text files with space-separated values. I want to combine them based on a key column from each file and write the result to another file.
location.txt
1 21.5 23
2 24.5 20
3 19.5 19
4 22.5 15
5 24.5 12
6 19.5 12
data.txt, which has millions of rows, but I will give a few simple entries here:
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742
What I am trying to do is combine these two files on the key value of column 1 from location.txt and column 4 from data.txt, keeping all the data from data.txt plus columns 2 and 3 from location.txt, in the format below:
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397 21.5 23
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964 24.5 20
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742 19.5 19
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742 22.5 15
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964 24.5 12
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742 19.5 12
I'm using this awk command:
awk -F' ' "NR==FNR{label[$1]=$1;x[$1]=$2;y[$1]=$3;next}; ($2==label[$2]){print $0 "," x[$2] y[$3]}" location.txt data.txt > result.txt
But I'm not getting the output I expected. Can anyone help me fix this?
Also, can we get the result file in CSV format, with the spaces replaced by commas?
In awk:
$ awk '
NR==FNR { # process location.txt
a[$1]=$2 OFS $3 # hash using $1 as key
next # next record
}
$4 in a { # process data.txt
print $0,a[$4] # output record and related location
}' location.txt data.txt # mind the file order
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397 21.5 23
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964 24.5 20
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742 19.5 19
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742 22.5 15
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964 24.5 12
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742 19.5 12
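For the CSV follow-up in the question, a small variation of the same script should work (a sketch; result.csv is just an assumed output name). Setting OFS to a comma and forcing awk to rebuild the record with $1=$1 replaces every space separator with a comma:
awk -v OFS=',' '
NR==FNR { a[$1]=$2 OFS $3; next }    # location.txt: key -> "x,y"
$4 in a { $1=$1; print $0, a[$4] }   # rebuild the record with commas, append the location
' location.txt data.txt > result.csv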
With bash and join
join -1 1 -2 4 <(sort -k1,1 -n location.txt) <(sort -k4,4 -n data.txt) -o 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,1.2,1.3
Output:
2004-03-31 03:38:15.757551 2 1 122.153 -3.91901 11.04 2.03397 21.5 23
2004-02-28 00:59:16.02785 3 2 19.9884 37.0933 45.08 2.69964 24.5 20
2004-02-28 01:03:16.33393 11 3 19.3024 38.4629 45.08 2.68742 19.5 19
2004-02-28 01:06:16.013453 17 4 19.1652 38.8039 45.08 2.68742 22.5 15
2004-02-28 01:06:46.778088 18 5 19.175 38.8379 45.08 2.69964 24.5 12
2004-02-28 01:08:45.992524 22 6 19.1456 38.9401 45.08 2.68742 19.5 12
See: man join

Print all the permutation of vector

Let's design a clock. The clock has the format (hours:minutes). Let's use bits to represent it.
For example,
10:15 is represented as 1010:1111.
Question: given n, the total number of bits that are 1, print all the possible clock times. (E.g., in the 10:15 example n = 6, and I am asking you to print every other configuration that also has six 1-bits.)
My attempt: I store the hour in a 5-element bit vector (max 24) and the minutes in a 6-element bit vector (max 60). Then I split n into two numbers, n - i and i, for i in [0, n].
The former (n - i) is the number of 1-bits to set in the hour vector, and the latter (i) is the number of 1-bits to set in the minute vector. Then next_permutation gives the next ordering of each vector (given that it has the same number of 1s).
However, this is more like a brute force solution. I am wondering if you guys have better algorithms in mind?
#include <algorithm>  // next_permutation
#include <cmath>      // pow
#include <iostream>
#include <vector>
using namespace std;
class Solution5 {
public:
void print_clock (int n) {
if (n == 0) {cout<<"0"<<endl; return;}
vector<vector<int>> res;
vector<int> hour (5, 0);
vector<int> min (6, 0);
for (int i=0; i< n; i++)
rec_print(n - i, i, res, hour, min);
cout<<endl<<"completed"<<endl;
}
void rec_print (int h, int m, vector<vector<int>> & res, vector<int> hour, vector<int> min) {
if (h > 5 || m > 6) return;
int z = hour.size() -1;
while (h-- > 0) hour[z--] = 1;
z = min.size() -1;
//vector<int> hour = {0,0,1,1,1};
while (m-- > 0) min[z--] = 1;
//while(next_permutation(hour.begin(), hour.end()) )
// cout<<"he";
while (get_vector(h, hour, 24) ) {
vector<int> tmp = min;
while (get_vector(m, tmp, 60)) {
cout<<"hour : ";
for (int i=0 ; i<hour.size(); i++)
cout<<hour[i];
cout<<endl<<"miniutes : ";
for (int i=0; i<min.size(); i++)
cout<<min[i];
cout<<endl<<"---------------"<<endl;
if(next_permutation(tmp.begin(), tmp.end()) == 0 ) break;
}
if (next_permutation(hour.begin(), hour.end())== 0) break;
}
//cout<<endl<<"completed"<<endl;
}
bool get_vector (int n, vector<int> & tmp, int maxi) {
int sum = 0;
for (int i = tmp.size() - 1; i >=0; i--) {
sum += tmp[i] * pow(2,tmp.size() -1 - i);
if (sum > maxi)
return false;
}
return true;
}
};
Using a function that counts the number of ones in the binary representation of an integer (see e.g. this question) you could iterate over the numbers 0 - 59, measure their number of ones (aka pop count or Hamming weight) and store them in a table like:
0: [0]
1: [1,2,4,8,16,32]
2: [3,5,6,9,10,12,17,18,20,24,33,34,36,40,48]
...
5: [31,47,55,59]
Then you iterate over the numbers 0 - 23, get their pop count c, and combine them with the numbers in the table row n-c, e.g.:
0: c=0, n-c=6 -> none
1: c=1, n-c=5 -> 1:31, 1:47, 1:55, 1:59
...
23: c=4, n-c=2 -> 23:03, 23:05 ... 23:40, 23:48
This gives you the times in chronological order. I think the code will be both shorter and easier to understand than what you currently have.
Here's a quick code example (pardon my amateur C++). You can use any popcount function, like the one built-in to the GNU compiler, or convert the integers to a bitset and use the bitset::count function.
#include <iostream>
#include <vector>

int popcount(int i) {
    return __builtin_popcount(i); // GNU compiler built-in
}

void pop_count_times(int count) {
    // Bucket the minutes 0-59 by how many 1-bits they have (0..5).
    std::vector<std::vector<int>> minutes(6);
    for (int m = 0; m < 60; m++) {
        minutes[popcount(m)].push_back(m);
    }
    // For each hour, the minutes must supply the remaining 1-bits.
    for (int h = 0; h < 24; h++) {
        int ones = count - popcount(h);
        if (ones < 0 || ones > 5) continue;
        for (int m = 0; m < (int)minutes[ones].size(); m++) {
            std::cout << h << ':' << minutes[ones][m] << ' ';
        }
    }
}
If you want to pad the numbers with leading zeros, you should add cout.fill('0') and cout.width(2) before printing each integer. In that case, the output for count=6 is:
01:31 01:47 01:55 01:59 02:31 02:47 02:55 02:59 03:15 03:23 03:27 03:29 03:30 03:39 03:43 03:45 03:46 03:51 03:53 03:54 03:57 03:58 04:31 04:47 04:55 04:59 05:15 05:23 05:27 05:29 05:30 05:39 05:43 05:45 05:46 05:51 05:53 05:54 05:57 05:58 06:15 06:23 06:27 06:29 06:30 06:39 06:43 06:45 06:46 06:51 06:53 06:54 06:57 06:58 07:07 07:11 07:13 07:14 07:19 07:21 07:22 07:25 07:26 07:28 07:35 07:37 07:38 07:41 07:42 07:44 07:49 07:50 07:52 07:56 08:31 08:47 08:55 08:59 09:15 09:23 09:27 09:29 09:30 09:39 09:43 09:45 09:46 09:51 09:53 09:54 09:57 09:58 10:15 10:23 10:27 10:29 10:30 10:39 10:43 10:45 10:46 10:51 10:53 10:54 10:57 10:58 11:07 11:11 11:13 11:14 11:19 11:21 11:22 11:25 11:26 11:28 11:35 11:37 11:38 11:41 11:42 11:44 11:49 11:50 11:52 11:56 12:15 12:23 12:27 12:29 12:30 12:39 12:43 12:45 12:46 12:51 12:53 12:54 12:57 12:58 13:07 13:11 13:13 13:14 13:19 13:21 13:22 13:25 13:26 13:28 13:35 13:37 13:38 13:41 13:42 13:44 13:49 13:50 13:52 13:56 14:07 14:11 14:13 14:14 14:19 14:21 14:22 14:25 14:26 14:28 14:35 14:37 14:38 14:41 14:42 14:44 14:49 14:50 14:52 14:56 15:03 15:05 15:06 15:09 15:10 15:12 15:17 15:18 15:20 15:24 15:33 15:34 15:36 15:40 15:48 16:31 16:47 16:55 16:59 17:15 17:23 17:27 17:29 17:30 17:39 17:43 17:45 17:46 17:51 17:53 17:54 17:57 17:58 18:15 18:23 18:27 18:29 18:30 18:39 18:43 18:45 18:46 18:51 18:53 18:54 18:57 18:58 19:07 19:11 19:13 19:14 19:19 19:21 19:22 19:25 19:26 19:28 19:35 19:37 19:38 19:41 19:42 19:44 19:49 19:50 19:52 19:56 20:15 20:23 20:27 20:29 20:30 20:39 20:43 20:45 20:46 20:51 20:53 20:54 20:57 20:58 21:07 21:11 21:13 21:14 21:19 21:21 21:22 21:25 21:26 21:28 21:35 21:37 21:38 21:41 21:42 21:44 21:49 21:50 21:52 21:56 22:07 22:11 22:13 22:14 22:19 22:21 22:22 22:25 22:26 22:28 22:35 22:37 22:38 22:41 22:42 22:44 22:49 22:50 22:52 22:56 23:03 23:05 23:06 23:09 23:10 23:12 23:17 23:18 23:20 23:24 23:33 23:34 23:36 23:40 23:48
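For reference, one way to do that padding is with <iomanip> (a sketch, equivalent to the fill()/width() calls mentioned above; note that setw must be repeated for every value, while setfill sticks once set):
#include <iomanip>
// inside the inner loop of pop_count_times:
std::cout << std::setw(2) << std::setfill('0') << h << ':'
          << std::setw(2) << minutes[ones][m] << ' ';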

How can I sort the output from an awk script?

When I run the check script, I get the output below:
-rw-rw-r-- 1 noper sbcprd 9175 Aug 6 03:36 opLogDir
-rw-rw-r-- 1 soper sbcprd 9104 Aug 6 03:04 opLogDir
-rw-rw-r-- 1 moper sbcprd 9561 Aug 6 02:18 opLogDir
-rw-rw-r-- 1 woper sbcprd 9561 Aug 6 05:06 opLogDir
-rw-rw-r-- 1 boper sbcprd 9834 Aug 6 03:34 opLogDir
-rw-rw-r-- 1 xoper sbcprd 9873 Aug 6 00:50 opLogDir
-rw-rw-r-- 1 doper sbcprd 9479 Aug 6 04:12 opLogDir
Now I can select data from it and sort using:
check | awk '{print $3,$8,$6,$7}'| sort
and get the output below:
boper 03:34 Aug 6
doper 04:12 Aug 6
moper 02:18 Aug 6
noper 03:36 Aug 6
soper 03:04 Aug 6
woper 05:06 Aug 6
xoper 00:50 Aug 6
It sorts the output by column #1.
How can I sort the output by the time (column #2)?
Use the sort command: sort -k 2
Although, given this looks like ls output, you could probably change check to have ls -lt to sort by timestamp instead.
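Applied to the pipeline above, that would look like this (a sketch; -k2,2 restricts the sort key to just the time column):
check | awk '{print $3,$8,$6,$7}' | sort -k2,2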
If you need something more extensive (e.g. that includes date and time based sorting) then it's harder - you'll need to use something that can parse the timestamp into a unix time.
E.g.:
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;
my $year = Time::Piece->localtime -> year;
for ( <DATA> ) {
my ( $mon, $day, $time ) = (split)[5,6,7];
my $timestamp = Time::Piece->strptime("$mon $day $time $year", '%b %d %H:%M %Y');
print $timestamp -> epoch,"\n";
}
__DATA__
-rw-rw-r-- 1 noper sbcprd 9175 Aug 6 03:36 opLogDir
-rw-rw-r-- 1 soper sbcprd 9104 Aug 6 03:04 opLogDir
-rw-rw-r-- 1 moper sbcprd 9561 Aug 6 02:18 opLogDir
-rw-rw-r-- 1 woper sbcprd 9561 Aug 6 05:06 opLogDir
-rw-rw-r-- 1 boper sbcprd 9834 Aug 6 03:34 opLogDir
-rw-rw-r-- 1 xoper sbcprd 9873 Aug 6 00:50 opLogDir
-rw-rw-r-- 1 doper sbcprd 9479 Aug 6 04:12 opLogDir
Or with the sorting logic built int:
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;
my $year = Time::Piece->localtime -> year;
sub sort_by_timestamp {
my ( $amon, $aday, $atime ) = (split( " ", $a ))[5,6,7];
my ( $bmon, $bday, $btime ) = (split( " ", $b ))[5,6,7];
my $at = Time::Piece->strptime("$amon $aday $atime $year", '%b %d %H:%M %Y');
my $bt = Time::Piece->strptime("$bmon $bday $btime $year", '%b %d %H:%M %Y');
return $at <=> $bt;
}
print sort { sort_by_timestamp } <DATA>;
__DATA__
-rw-rw-r-- 1 noper sbcprd 9175 Aug 6 03:36 opLogDir
-rw-rw-r-- 1 soper sbcprd 9104 Aug 6 03:04 opLogDir
-rw-rw-r-- 1 moper sbcprd 9561 Aug 6 02:18 opLogDir
-rw-rw-r-- 1 woper sbcprd 9561 Aug 6 05:06 opLogDir
-rw-rw-r-- 1 boper sbcprd 9834 Aug 6 03:34 opLogDir
-rw-rw-r-- 1 xoper sbcprd 9873 Aug 6 00:50 opLogDir
-rw-rw-r-- 1 doper sbcprd 9479 Aug 6 04:12 opLogDir
Note - for obvious reasons, this won't work very well when you span a year.
If you don't know the -k option, you can still do it by putting $8 in the first place, so the default sort key is the time:
check | awk '{print $8,$3,$6,$7}' | sort

killing series of processes

When I enter the command ps -ef | grep sharatds, I get a list of processes.
sharatds 13164 13163 0 20:53 pts/2 00:00:00 [bt.C.256] <defunct>
sharatds 13165 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13199 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13233 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13267 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13301 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13335 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13369 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13403 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13437 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13471 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13505 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13539 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13573 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13607 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13641 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13675 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13709 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13743 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13777 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13811 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13845 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13879 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
sharatds 13913 13163 0 20:53 pts/2 00:00:00 [rsh] <defunct>
I want to kill all the processes which have the last column as defunct .
Can anybody help me with a script ?
This will do:
ps -ef | grep sharatds | awk '{print $2}' | xargs kill
I usually do something like this:
kill $(ps -ef |grep sharatds|awk '{print $2}')
Edit: Wait! Those are defunct processes. They are already dead, and cannot be killed further! The parent process will have to run wait() to read their statuses so that they can be cleaned up and removed from the process table.
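Following up on that edit: since the zombies themselves cannot be signalled, the practical route is their parent. A sketch (the [s]haratds pattern keeps grep from matching its own process; 13163 is the parent PID visible in the listing above):
# list the parent PID(s) of the defunct entries (PPID is column 3 of ps -ef)
ps -ef | grep '[s]haratds' | grep '<defunct>' | awk '{print $3}' | sort -u
# if the parent cannot be made to wait(), killing it lets init adopt and reap the zombies
kill 13163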
