I have a large log file containing lines for a particular task as follows:
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
I want to count the number of unique "My task"s logged, which in this case should be 3.
I have used these two commands which, in my opinion, should give the same and correct results:
grep 'My task :' | uniq | wc -l
grep -E 'My task :' | sort --unique | grep -cE 'My task :'
The two commands give the same results on the small test files I create but different results on the large log file on the server. I cannot understand why. To be exact, the first command gives a count of ~33k while the second one gives ~15k. Which of the two commands, if any, is correct? And what should I ideally be doing?
It probably happens because uniq only removes consecutive identical lines. Say, if your file looks like this:
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
results will be different:
$ grep 'My task :' FILE | uniq | wc -l
15
$ grep -E 'My task :' FILE | sort --unique | wc -l
3
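So to count distinct task numbers, deduplicate after sorting (or let sort do both), for example:
$ grep 'My task :' FILE | sort -u | wc -l
3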
The problem: uniq vs. sort -u
sort | uniq and sort -u should be equivalent in the simple case, but they behave differently if you use the -k option to define only certain fields of the input line as sort keys. In that case, sort -u will suppress lines which have the same key even if other parts of the line differ, whereas uniq will only suppress lines that are exactly identical.
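For example, with a small file where only the first field is used as the sort key (GNU sort shown; which of the two key-5 lines -u keeps can depend on the implementation):
$ cat jobs
5 alice
5 bob
7 carol
$ sort -k1,1 -u jobs
5 alice
7 carol
$ sort -k1,1 jobs | uniq
5 alice
5 bob
7 carol
sort -u keeps one line per key value, while uniq keeps both "5" lines because the whole lines differ.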
**Difference between the two commands**
1st command, grep 'My task :' | uniq | wc -l: it counts the lines left after uniq removes only consecutive duplicates, so identical 'My task' lines that are not adjacent are counted again.
2nd command, grep -E 'My task :' | sort --unique | grep -cE 'My task :': it counts the lines left after sort --unique removes all duplicates, i.e. the number of distinct 'My task' lines.
The discrepancy between the two therefore depends on how the lines are ordered in your log file.
To answer which one to use: when using grep with -E as you did, make a precise pattern and then count the lines; you don't need multiple commands to count the unique values.
awk '/My task/ {a[$NF]++;c+=a[$NF]==1?1:0} END {print c}' file
3
/My task/ : does the line contain My task? If yes:
a[$NF]++ : use the number (the last field) as a key in array a. The first time a value is found the counter becomes 1, the second time 2, the third time 3, etc.
c+=a[$NF]==1?1:0 : if a[number] is 1 (value found for the first time), increment c by 1, else add 0.
{print c} : print the number of unique numbers from variable c.
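The same count can also be written with the common !seen[key]++ idiom, again assuming the task number is the last field of the line:
awk '/My task :/ && !seen[$NF]++ {c++} END {print c+0}' file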
I'm getting some values with a jq command like this:
curl xxxxxx | jq -r '.[] | ["\(.job.Name), \(.atrib.data)"] | @tsv' | column -t -s ","
It gives me:
AAAA PENDING
ZZZ FAILED BAD
What I want is to get a first field with a sequential number (1 ....) like this:
1 AAA PENDING
2 ZZZ FAILED BAD
......
Do you know if it's possible? Thanks!
One way would be to start your pipeline with:
range(0;length) as $i | .[$i]
You can then use $i in the remainder of the program.
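Assuming the rest of the filter stays the same as in the question, the full command might look something like this (untested against the real data):
curl xxxxxx | jq -r 'range(0;length) as $i | .[$i] | ["\($i+1), \(.job.Name), \(.atrib.data)"] | @tsv' | column -t -s ","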
I have two files, one with about 100 root domains, and a second file with URLs only. Now I have to filter that URL list to get a third file which contains only the URLs whose domains are in the list.
Example of URL list:
| URL |
| ------------------------------|
| http://github.com/name |
| http://stackoverflow.com/name2|
| http://stackoverflow.com/name3|
| http://www.linkedin.com/name3 |
Example of word list:
github.com
youtube.com
facebook.com
Result:
| http://github.com/name |
My goal is to keep the whole row whenever the URL contains a specific word. This is what I tried:
for i in $(cat domains.csv);
do grep "$i" urls.csv >> filtered.csv ;
done
The result is strange: I get some of the links, but not all of the ones that contain root domains from the first file. Then I tried to do the same thing with Python and saw that bash wasn't doing what I wanted; I got a better result with the Python script, but writing a Python script takes more time than running bash commands.
How should I accomplish this with bash?
Using grep:
grep -F -f domains.csv url.csv
Test Results:
$ cat wordlist
github.com
youtube.com
facebook.com
$ cat urllist
| URL |
| ------------------------------|
| http://github.com/name |
| http://stackoverflow.com/name2|
| http://stackoverflow.com/name3|
| http://www.linkedin.com/name3 |
$ grep -F -f wordlist urllist
| http://github.com/name |
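Note that -F matches each domain as a fixed string anywhere in the line, so a domain that happens to appear in a path or query string would also match. If that matters, one option (a sketch, assuming every URL has a path after the host, as in the example) is to anchor each domain to the host part:
grep -E -f <(sed 's|.*|://(www\\.)?&/|' domains.csv) url.csv
The dots in the domains are left unescaped here; with -E they match any character, which is usually harmless for a domain list.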
I'm writing scenarios for QA engineers, and now I face a problem with step encapsulation.
This is my scenario:
When I open connection
And device are ready to receive files
I send to device file with params:
| name | ololo |
| type | txt |
| size | 123 |
Each of these steps is important for the people who will use my steps.
And I need to automate these steps and repeat them 100 times.
So I decided to create a new step which runs them 100 times.
The first variant was to create a step with the other steps inside, like:
Then I open connection, check device are ready and send file with params 100 times:
| name | ololo |
| type | txt |
| size | 123 |
But this version is not appropriate, because:
people who will use it won't understand which steps execute inside,
and sometimes step names like this are too long.
The second variant was to create a step with the other steps in a parameter table:
I execute following steps 100 times:
| When I open connection |
| And device are ready to receive files |
| I send to device file |
It would be easy to understand for people who will use my steps and scenarios.
But I also have some steps with parameters,
and I need to create something like a two-tier table:
I execute following steps 100 times:
| When I open connection |
| And device are ready to receive files |
| I send to device file with params: |
| | name | ololo | |
| | type | txt | |
| | size | 123 | |
This is the best variant for my situation,
but of course Cucumber can't parse it without errors (it's not valid Cucumber syntax).
How can I fix the last example of a step?
Does Cucumber have some instruments which could help me?
Can you suggest your own kind of solution?
Has someone had similar problems?
I decided to change the "|" symbols to "/" in the nested parameter tables.
It's not perfect, but it works.
These are the scenario steps:
I execute following steps 100 times:
| I open connection |
| device are ready to receive files |
| I send to device file with params: |
| / name / ololo / |
| / type / txt / |
| / size / 123 / |
This is the step definition:
And /^I execute following steps (.*) times:$/ do |number, table|
  # each table row has a single cell: either a plain step name
  # or a "/"-delimited parameter row belonging to the previous step
  data = table.raw.map{ |raw| raw.last }
  number.to_i.times do
    params = []
    step_name = ''
    data.each_with_index do |line, index|
      next_is_not_param = data[index+1].nil? || ( data[index+1] && !data[index+1].include?('/') )
      if !line.include?('/')
        step_name = line
        #p step_name if next_is_not_param
        step step_name if next_is_not_param
      else
        # collect "/" rows, convert them back to "|" and build a table for the step
        params += [line.gsub('/','|')]
        if next_is_not_param
          step_table = Cucumber::Ast::Table.parse( params.join("\n"), nil, nil )
          #p step_name
          #p step_table
          step step_name, step_table
          params = []
        end
      end
    end
    #p '---------------------------------------------------------'
  end
end
I have a file that includes the following lines:
2 | blah | blah
1 | blah | blah
3 | blah
2 | blah | blah
1
1 | high | five
3 | five
I want to extract only the lines that have 3 columns (3 fields, 2 separators...).
I want to pipe it to the following commands:
| sort -nbsk1 | cut -d "|" -f1 | uniq -d
So in the end I will get only:
2
1
Any suggestions?
It's part of a homework assignment; we are not allowed to use awk/sed and some other commands (grep/tr and what's written above can be used).
Thanks
since you said grep is allowed:
grep -E '^([^|]*\|){2}[^|]*$' file
grep '.*|.*|.*' will select lines with at least three fields and two separators; the -E pattern above matches exactly three.
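Putting it together with the pipeline from the question (the output order comes from the numeric sort; each extracted field keeps the trailing space that precedes the separator):
$ grep -E '^([^|]*\|){2}[^|]*$' file | sort -nbsk1 | cut -d "|" -f1 | uniq -d
1
2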
newtext.csv looks like this:
Record 1
---------
line 1 line 2 Sample Number: 123456789 (line no. 3) | | | | | Time In: 2012-05-29T10:21:06Z (line no. 21) | | | Time Out: 2012-05-29T13:07:46Z (line no. 30)
Record 2
----------
line 1 line 2 Sample Number: 363214563 (line no. 3) | | | | | Time In: 2012-05-29T10:21:06Z (line no. 21) | | | Time Out: 2012-05-29T13:07:46Z (line no. 30)
Record 3
---------
line 1 line 2 Sample Number: 987654321 (line no. 3) | | | | | Time In: 2012-05-29T10:21:06Z (line no. 21) | | | Time Out: 2012-05-29T13:07:46Z (line no. 30)
Assume there are 100 such records in newtext.csv. Now, for an entered input string I need its parameters, something like below:
Input
Enter the search String :
123456789
Output
Sample Number is, Sample Number: 123456789
Connected Time is, Time In: 2012-05-29T10:21:06Z
Disconnected Time is, Time Out: 2012-05-29T13:07:46Z
This is exactly what I need. Can you please help me with shell scripting for the above format?
OK, the input and the desired output are kinda weird, but it's still not difficult to get what you want. Try the following:
var=123456789
awk -v "var=$var" --exec /dev/stdin newtext.csv <<'EOF'
($7 == var) {
    printf("Sample Number is, Sample Number: %s\n", $7);
    printf("Connected Time is, Time In: %s\n", $18);
    printf("Disconnected Time is, Time Out: %s\n", $27);
}
EOF
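If your awk doesn't have gawk's --exec option, the same program can be passed inline instead (a sketch relying on the same field positions as above):
var=123456789
awk -v var="$var" '$7 == var {
    printf("Sample Number is, Sample Number: %s\n", $7);
    printf("Connected Time is, Time In: %s\n", $18);
    printf("Disconnected Time is, Time Out: %s\n", $27);
}' newtext.csv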