Parse two values from a file - bash

Part of my file looks like so:
STATUS REPORT FOR JOB: Job_logging
Generated: 2014-03-14 07:05:03
Job start time=2014-03-13 06:37:49
Job end time=2014-03-13 06:37:51
Job elapsed time=00:00:02
Job status=1 (Finished OK)
Stage: Oracle_Connector_0, 1 rows input
Stage start time=2014-03-13 06:37:51, end time=2014-03-13 06:37:51, elapsed=00:00:00
Link: DSLink2, 1 rows
Stage: Peek_3, 1 rows input
Stage start time=2014-03-13 06:37:51, end time=2014-03-13 06:37:51, elapsed=00:00:00
Status code = 0
Link: DSLink2, 1 rows
I need to extract the values of Job start time and Job end time.
So I need 2014-03-13 06:37:49 and 2014-03-13 06:37:51 saved into two separate variables: v1 and v2.
How do I do that using bash?
I've already spent about an hour playing with string concatenation and sed, but still have nothing.
A little help, please?

Using awk, both values can be found with a single command:
awk -F 'Job (start|end) time=' 'NF>1{print $2}' file
2014-03-13 06:37:49
2014-03-13 06:37:51
To read both values into variables (IFS is set only for the read command, so the shell's global IFS stays untouched):
IFS=';' read -r v1 v2 < <(awk -F 'Job (start|end) time=' 'NF>1{printf "%s;", $2}' file)

You can use grep for this:
$ grep -Po '(?<=Job start time=).*' file
2014-03-13 06:37:49
$ grep -Po '(?<=Job end time=).*' file
2014-03-13 06:37:51
This uses a look-behind, so only the text that follows Job start time= (or Job end time=) on the line is printed.
To store the result in a variable, use:
$ var=$(grep -Po '(?<=Job end time=).*' file)
$ echo "$var"
2014-03-13 06:37:51

This is simple and easy to read:
grep "Job start time" test.txt | cut -d"=" -f2
grep "Job end time" test.txt | cut -d"=" -f2
This searches for the lines containing your specific string, uses = as the delimiter, and takes the field to the right of it.
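The same split-on-= idea can also be sketched in the shell itself, without spawning grep or cut. This is only a minimal sketch: the temporary file and its shortened contents are assumptions made so the example is self-contained, and it relies on the two lines starting exactly with the labels shown in the question.

```shell
# Shortened copy of the report from the question, written to a temporary file.
report=$(mktemp)
cat > "$report" <<'EOF'
STATUS REPORT FOR JOB: Job_logging
Generated: 2014-03-14 07:05:03
Job start time=2014-03-13 06:37:49
Job end time=2014-03-13 06:37:51
Job elapsed time=00:00:02
EOF

# Read the file line by line; ${line#*=} drops everything up to the first "=".
while IFS= read -r line; do
  case $line in
    "Job start time="*) v1=${line#*=} ;;
    "Job end time="*)   v2=${line#*=} ;;
  esac
done < "$report"
rm -f "$report"

printf 'v1=%s\nv2=%s\n' "$v1" "$v2"
```

Because the loop reads from a redirection rather than a pipeline, v1 and v2 are still set after the loop ends.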

With sed:
v1=$(sed -rn 's/.*Job start time=(.*)/\1/p' yourfile)
v2=$(sed -rn 's/.*Job end time=(.*)/\1/p' yourfile)

eval `sed -n 's/^ *Job start time=\(.*\)/v1="\1"/p
s/^ *Job end time=\(.*\)/v2="\1"/p' YourFile`
Tested on AIX with bash (so no GNU sed).

Related

Alternating output in bash for loop from two grep

I'm trying to search through files and extract two pieces of relevant information every time they appear in the file. The code I currently have:
#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
reads=$(grep $str1 $file | cut -d ':' -f 3)
samples=$(grep $str2 $file | cut -d '/' -f 8)
echo $samples $reads >> reads.txt
done
It handles each file as a whole (the files have varying numbers of instances of these phrases) and gives me one output row per file:
PopA_15.fq 1081264
PopA_16.fq PopA_17.fq 1008416 554791
PopA_18.fq PopA_20.fq PopA_21.fq 604610 531227 595129
...
I want it to match each instance (i.e. the 1st instances of both greps next to each other):
PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
...
How do I do this? Thank you
Assuming your Input_file matches the sample shown, with an even number of columns on each line (the first half PopA names, the second half the numeric values), the following awk may help.
awk '{for(i=1;i<=(NF/2);i++){print $i,$((NF/2)+i)}}' Input_file
Output will be as follows.
PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
If you want to pass the output of a command to awk instead, pipe it in (your_command | awk '...'); there is then no need to give Input_file to the awk command above.
This is what ended up working for me...any tips for more efficient code are definitely welcome
#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
reads=$(grep $str1 $file | cut -d ':' -f 3)
samples=$(grep $str2 $file | cut -d '/' -f 8)
paste <(echo "$samples" | column -t) <(echo "$reads" | column -t) >> reads.txt
done
This provides the desired output described above.
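A short sketch of why the first version flattened everything onto one row while paste pairs the values up. The two variables below are filled in by hand with values from the expected output (an assumption standing in for the real grep results), and a temporary file replaces the process substitution so the snippet also runs in a plain POSIX shell.

```shell
# Hand-filled stand-ins for the multi-line results of the two grep | cut pipelines.
samples='PopA_16.fq
PopA_17.fq'
reads='1008416
554791'

# Unquoted expansion: word splitting turns every newline into a single space.
flat=$(echo $samples $reads)

# paste joins line N of its first input with line N of its second input.
tmp=$(mktemp)
printf '%s\n' "$reads" > "$tmp"
paired=$(printf '%s\n' "$samples" | paste -d ' ' - "$tmp")
rm -f "$tmp"

printf 'flat:\n%s\npaired:\n%s\n' "$flat" "$paired"
```

The first result is the single row seen in the question; the second has one sample and its read count per line.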

Extracting a part of lines matching a pattern

I have a configuration file and need to parse out some values using bash
Ex. Inside config.txt
some_var= Not_needed
tests= spec1.rb spec2.rb spec3.rb
some_other_var= Also_not_needed
Basically I just need to get "spec1.rb spec2.rb spec3.rb" WITHOUT all the other lines and "tests=" removed from the line.
I have this and it works, but I'm hoping there's a much more simple way to do this.
while read run_line; do
if [[ $run_line =~ ^tests=* ]]; then
echo "FOUND"
all_selected_specs=`echo ${run_line} | sed 's/^tests= /''/'`
fi
done <${config_file}
echo "${all_selected_specs}"
all_selected_specs=$(awk -F '= ' '$1=="tests" {print $2}' "$config_file")
Using a field separator of "= ", look for lines where the first field is tests and print the second field.
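A self-contained sketch of that awk call run against the sample config from the question, followed by one possible way to loop over the individual spec names. The temporary file and the count variable are assumptions added for illustration.

```shell
# Sample config from the question, written to a temporary file.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
some_var= Not_needed
tests= spec1.rb spec2.rb spec3.rb
some_other_var= Also_not_needed
EOF

# With "= " as the separator, $1 is the key and $2 is the rest of the line.
all_selected_specs=$(awk -F '= ' '$1 == "tests" { print $2 }' "$config_file")
rm -f "$config_file"

# Split the space-separated list into positional parameters and walk over them.
set -- $all_selected_specs
count=$#
for spec in "$@"; do
  printf 'spec: %s\n' "$spec"
done
```

Note that a value which itself contains "= " would be cut short, since awk would then see more than two fields.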
This should work too
grep "^tests" ${config_file} | sed -e "s/^tests= //"
How about grep and cut?
all_selected_specs=$(grep "^tests=" "$config_file" | cut -d= -f2-)
try:
all_selected_specs=$(awk '/^tests/{sub(/.*= /,"");print}' Input_file)
This searches for lines that start with tests, removes everything up to and including "= " from the line, and prints what is left, i.e. the spec values. Finally, $(awk ...) saves that output in the variable.

Grepping a specific string from a file in script

I have the following file (a sample file named 2015_09_22_processedPartnumList.txt, located in /a/b/c/itemreport):
DataLoader_trace_2015_09_22_02_01_32.0956.log:INFO: 2015-09-22
Data Processing Starts : 12345678
I just want to get all the ids from the above file, i.e. 12345678 .... (each id on a separate line, not comma separated), into a file named /a/b/c/d/ids_`date +%d_%m_%Y_%H_%M_%S`.log
I have written the following script, but the file I am getting is empty, and no exception or error is shown, so it is very difficult for me to identify what went wrong. Please tell me what is wrong in the script.
LOGDIR=/a/b/logdir
tr=`date +%p`
echo $tr
if [ $tr = "PM" ];
then
date=`date +%Y-%m-%d`
echo "considering today's date for grepping logs"
else
date=`date -d '1 day ago' +%Y-%m-%d`
echo "considering yesterday's date for grepping logs as job run is delayed"
fi
ITEM_FILE=/a/b/c/d/ids_`date +%d_%m_%Y_%H_%M_%S`.log
After implementing grep with PCRE, I am getting this, and no ids are being copied into the new file.
If your grep supports PCRE, you can do:
grep -Po '.*:\s\K\d+$' /a/b/c/itemreport/2015_09_22_processedPartnumList.txt \
>/apps/feeds/out/catalog/ItemPartnumbers_"$(date '+%d_%m_%Y_%H_%M_%S')".log
.*:\s matches up to the whitespace character after :, and \K discards everything matched so far
\d+$ matches our desired portion, i.e. the digits up to the end of the line
Example:
% grep -Po '.*:\s\K\d+$' 2015_09_22_processedPartnumList.txt \
>ItemPartnumbers_"$(date '+%d_%m_%Y_%H_%M_%S')".log
% cat ItemPartnumbers_09_11_2015_11_30_49.log
13982787
14011550
13984790
13984791
14176509
14902623
14924193
14924194
13982787
46795670
46795671
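Where grep lacks -P (for example on systems without GNU grep), the same extraction can be sketched with plain sed. The file below is an assumption: it reuses the two lines from the question and adds one more id line borrowed from the example output above.

```shell
# Sample input: lines from the question plus one extra id for illustration.
report=$(mktemp)
cat > "$report" <<'EOF'
DataLoader_trace_2015_09_22_02_01_32.0956.log:INFO: 2015-09-22
Data Processing Starts : 12345678
Data Processing Starts : 13982787
EOF

# Keep only lines that end in ": <digits>" and print just the digits.
ids=$(sed -n 's/.*:[[:space:]]\([0-9][0-9]*\)$/\1/p' "$report")
rm -f "$report"

printf '%s\n' "$ids"
```

The first line does not match because the text after its last ": " is a date rather than a pure run of digits, so only the two ids are printed.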
That's not a very good solution, but it works. Note that tr -d INFO deletes every I, N, F and O character, not just the word INFO.
cat your\ file | cut -d ':' -f2-2 | tr -d INFO

KSH: Loop performance

I need to process a file with approximately 120k lines that has the following format using ksh:
"[UserId=USER1]";"Client=001";"Locked_Status=0";"TYPE=A";"Last_Logon=00000000";"Valid_To=99991231";"Password_Change=20120131";"Last_Password_Change=29990"
"[UserId=USER2]";"Client=000";"Locked_Status=0";"TYPE=A";"Last_Logon=20141020";"Valid_To=00000000";"Password_Change=20140620";"Last_Password_Change=9501"
"[UserId=USER3]";"Client=002";"Locked_Status=0";"TYPE=A";"Last_Logon=00000000";"Valid_To=99991231";"Password_Change=20140304";"Last_Password_Change=9817"
The output should be something like:
[UserId=USER1]
Client=001
Locked_Status=0
TYPE=A
Last_Logon=00000000
Valid_To=99991231
Password_Change=20120131
Last_Password_Change=29985
[UserId=USER2]
Client=000
Locked_Status=0
TYPE=A
Last_Logon=20141020
Valid_To=00000000
Password_Change=20140620
Last_Password_Change=9496
[UserId=User3]
Client=002
Locked_Status=0
TYPE=A
Last_Logon=00000000
Valid_To=99991231
Password_Change=20140304
Last_Password_Change=9812
I initially used the following code to process the file:
for a in $(<$1)
do
a=$(echo $a|sed -e 's/;/ /g' -e 's/"//g')
for b in $a
do
print $b
done
done
It was taking around 3hrs to process 120k lines.
Then I tried to improve the code by changing it to the following:
for a in $(<$1)
do
printf "\n$(echo $a|sed -e 's/"//g' -e 's/;/\\n/g')"
done
That brought the processing time down to 2hrs, which is still too long for 120k lines.
Finally I tried this code, which processed the 120k lines in 3secs!
perl -ne '
chomp;
s/\"//g;
s/;/\n/g;
print;
' <$1
Is there any way I can improve the code in KSH to achieve similar performance? I believe that I must be missing something in my KSH code... Help me to find out please.
Thanks in advance
How about:
tr ';' '\n' < file | tr -d '"'
Your code is assigning every whitespace-delimited word to variable "a" in turn, and thus invoking sed once for each word in the file. Clearly that is a lot of accumulated overhead from spawning all those processes. The idiom to iterate over the lines of a file is:
while IFS= read -r line; do ...; done < file
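As a rough sketch of how that idiom could replace the per-line sed call, the loop below splits each record on ; and strips the quotes with parameter expansion only, so no external process is started inside the loop. The shortened two-record input file is an assumption for illustration, and no timing is claimed for it.

```shell
# Shortened sample input in the format from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
"[UserId=USER1]";"Client=001";"Locked_Status=0"
"[UserId=USER2]";"Client=000";"Locked_Status=0"
EOF

output=$(
  while IFS= read -r line; do
    old_ifs=$IFS
    IFS=';'
    set -f              # disable globbing: [UserId=...] looks like a pattern
    set -- $line        # split the record into fields on ';'
    set +f
    IFS=$old_ifs
    for field in "$@"; do
      field=${field#\"} # strip the leading double quote
      field=${field%\"} # strip the trailing double quote
      printf '%s\n' "$field"
    done
  done < "$input"
)
rm -f "$input"

printf '%s\n' "$output"
```

The tr pipeline above is still the simpler choice when the whole file can be transformed in one go; a loop like this is mainly useful when each field needs further handling in the shell.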
Your suggestion worked perfectly
Host> wc -l /tmp/MyTest
114449 /tmp/MyTest
Host> time tr ';' '\n' < /tmp/MyTest | tr -d '"' > /tmp/zuza.out
real 0m1.04s
user 0m1.06s
sys 0m0.08s
Host> time perl -ne '
chomp;
s/\"//g;
s/;/\n/g;
print "\n$_";
' </tmp/MyTest > /tmp/zuza
real 0m1.30s
user 0m0.60s
sys 0m0.08s

How to use sed to extract a string [duplicate]

I need to extract a number from the output of a command: cmd. The output is type: 1000
So my question is how to execute the command, store its output in a variable and extract 1000 in a shell script. Also how do you store the extracted string in a variable?
This question has been answered in pieces here before, it would be something like this:
line=$(sed -n '2p' myfile)
echo "$line"
if echo "$line" | grep -q 'type: 1000'; then
echo "It's there!";
fi;
Store output of sed into a variable
String contains in Bash
EDIT: sed is very limited, you would need to use bash, perl or awk for what you need.
This is a typical use case for grep:
output=$(cmd | grep -o '[0-9]\+')
You can write the output of a command or even a pipeline of commands into a shell variable using so called command substitution:
variable=$(cmd);
In the comments it emerged that the output of cmd contains more lines than just type: 1000. In that case I would suggest sed, which here prints the number from the first matching line and then quits:
output=$(cmd | sed -n '/type: [0-9]/{s/.*type: \([0-9]\+\).*/\1/p;q;}')
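If cmd really prints nothing but the single line type: 1000, command substitution plus a parameter expansion may already be enough. In this minimal sketch, cmd is a hypothetical stand-in function, since the real command is not shown in the question.

```shell
# Hypothetical stand-in for the real command from the question.
cmd() {
  echo 'type: 1000'
}

output=$(cmd)            # capture the command's output
number=${output#type: }  # drop the literal "type: " prefix

echo "$number"
```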
You tagged your question as sed but your question description does not restrict other tools, so here's a solution using awk.
output=`cmd | awk -F':' '/type: [0-9]+/{print $2}'`
Alternatively, you can use the newer $( ) syntax. Some find it preferable, and it can be conveniently nested without the need for escaping backticks.
output=$(cmd | awk -F':' '/type: [0-9]+/{print $2}')
If the output is rigidly restricted to "type: " followed by a number, you can just use cut.
var=$(echo 'type: 1000' | cut -f 2 -d ' ')
Obviously you'll have to pipe the output of your command to cut, I'm using echo as a demo.
In addition, I'd use grep and then cut if the string you are searching for is more complex. If we assume there can be all kinds of numbers in the text, but only one occurrence of "type: " followed by a number, you can use the command:
>> var=$(echo "hello 12 type: 1000 foo 1001" | grep -oE "type: [0-9]+" | cut -f 2 -d ' ')
>> echo $var
1000
You can use the | operator to send the output of one command to another, like so:
printf ' 1\n 2\n 3\n' | grep "2"
This sends the three lines " 1", " 2" and " 3" to the grep command, which prints the line containing 2. It sounds like you might want to do something like:
cmd | grep "type"
Here is a plain sed solution that uses a regular expression to find the number in your string:
cmd | sed 's/^.*type: \([0-9]\+\)/\1/g'
^ anchors the match at the start of the line
.* matches any sequence of characters (including none)
\([0-9]\+\) captures one or more digits
\1 refers to the text captured by the first group (the only one in this case) and is used as the replacement for everything that matched
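Putting the pieces together, here is a minimal sketch that stores the extracted number in a variable. The cmd function and its three output lines are hypothetical stand-ins for the real command. Unlike the one-liner above, it uses -n with the p flag so that lines without a match are not printed, and [0-9][0-9]* instead of the GNU-specific \+.

```shell
# Hypothetical stand-in for the real command; the extra lines are made up.
cmd() {
  printf 'name: demo\ntype: 1000\nsize: 42\n'
}

# -n suppresses default output; p prints only lines where the substitution happened.
number=$(cmd | sed -n 's/^.*type: \([0-9][0-9]*\).*$/\1/p')

echo "$number"
```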

Resources