Unix - Bash - How to split a file according to specific rules

I have thousands of files on Unix that I need to split into two parts, according to the following rules:
1) Find the first occurrence of the string ' JOB ' in the file.
2) Find the first line after the occurrence found in point 1) which doesn't end with a comma ','.
3) Split the file after the line found in point 2).
Below is a sample file; this one should be split after the line ending with the string 'DUMMY'.
//*%OPC SCAN
//*%OPC FETCH MEMBER=$BUDGET1,PHASE=SETUP
// TESTJOB JOB USER=TESTUSER,MSGLEVEL=5,
// CLASS=H,PRIORITY=10,
// PARAM=DUMMY
//*
//STEP1 EXEC DB2OPROC
//...
How can I achieve this?
Thanks

You can use sed for this task:
$ cat data1
//*%OPC SCAN
//*%OPC FETCH MEMBER=$BUDGET1,PHASE=SETUP
// TESTJOB JOB USER=TESTUSER,MSGLEVEL=5,
// CLASS=H,PRIORITY=10,
// PARAM=DUMMY
//*
//STEP1 EXEC DB2OPROC
//...
$ sed -n '0,/JOB/ p;/JOB/,/[^,]$/ p' data1 | uniq > part1
$ sed '0,/JOB/ d;0,/[^,]$/ d' data1 > part2
$ cat part1
//*%OPC SCAN
//*%OPC FETCH MEMBER=$BUDGET1,PHASE=SETUP
// TESTJOB JOB USER=TESTUSER,MSGLEVEL=5,
// CLASS=H,PRIORITY=10,
// PARAM=DUMMY
$ cat part2
//*
//STEP1 EXEC DB2OPROC
//...
$
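The first sed prints from the top through the JOB line, and again from the JOB line through the first line not ending in a comma (uniq removes the doubly printed JOB line); the second deletes exactly that region. Since the question mentions thousands of files, a minimal sketch that wraps the same two commands in a loop (the *.jcl glob and the .part1/.part2 output names are assumptions; note that the 0,/regex/ address form is a GNU sed extension):
for f in *.jcl; do                                        # hypothetical glob; adjust to your files
  sed -n '0,/JOB/ p;/JOB/,/[^,]$/ p' "$f" | uniq > "$f.part1"
  sed '0,/JOB/ d;0,/[^,]$/ d' "$f" > "$f.part2"
done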

My solution is:
1) find all the files to be checked;
2) grep each file for the specified pattern with -n to get the matching line number;
3) split the matching file with head or tail using the line number from step 2.
What's more, grep can handle regular expressions, such as grep -n "^.*JOB.*[^,]$" filename. A sketch of the approach is below.
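A minimal sketch of those three steps for a single file (data1, part1 and part2 are example names, and error handling is omitted; wrap it in a find loop for many files). Like the sed answer, it lets the JOB line itself end the first part if it doesn't end with a comma:
f=data1
jobline=$(grep -n ' JOB ' "$f" | head -n 1 | cut -d: -f1)          # step 1: first ' JOB ' line
rel=$(tail -n +"$jobline" "$f" | grep -n '[^,]$' | head -n 1 | cut -d: -f1)
splitline=$((jobline + rel - 1))                                   # step 2: first non-comma line from there
head -n "$splitline" "$f" > part1                                  # step 3: split after that line
tail -n +"$((splitline + 1))" "$f" > part2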

You can do this in a couple of steps using awk/sed:
line=$(awk '/JOB/,/[^,]$/ {x=NR} END {print x}' filename)
next=$((line + 1))
sed -ne "1,$line p" filename > part_1
sed -ne "$next,\$ p" filename > part_2
where filename is the name of your file. This will create two files: part_1 and part_2.
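As a design note, the line-number lookup and both sed passes can also be collapsed into a single awk pass. A sketch (not the answer above; same filename placeholder and part_1/part_2 names):
awk '
BEGIN { out = "part_1" }             # start writing to the first part
{ print > out }                      # copy the current line to the active part
!done && /JOB/ { found = 1 }         # remember that the JOB line has been seen
found && !done && /[^,]$/ {          # first line from there not ending in a comma
  done = 1                           # ...marks the end of the first part
  out = "part_2"                     # every following line goes to the second part
}' filename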

Related

replace string with exact match in bash script

I have repeated content, as given below, in a file. These are the only unique values:
CHECKSUM="Y"
CHECKSUM="N"
CHECKSUM="U"
CHECKSUM="
I want to replace the empty field with "Null" and need the output as:
CHECKSUM="Y"
CHECKSUM="N"
CHECKSUM="U"
CHECKSUM="Null"
What I can think of is:
#First find the matching content
cat file.txt | egrep 'CHECKSUM="Y"|CHECKSUM="N"|CHECKSUM="U"' > file_contain.txt
# Find the content where given string are not there
cat file.txt | egrep -v 'CHECKSUM="Y"|CHECKSUM="N"|CHECKSUM="U"' > file_donot_contain.txt
# Replace the string in content not found file
sed -i 's/CHECKSUM="/CHECKSUM="Null"/g' file_donot_contain.txt
# Merge the files
cat file_contain.txt file_donot_contain.txt > output.txt
But I find this is not an efficient way of doing it. Any other suggestions?
To achieve this you need to mark that this is the end of the line, not just part of it, using $ (and optionally ^ to mark the start of the line too):
sed -i 's/^CHECKSUM="$/CHECKSUM="Null"/' file.txt
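A quick way to check the pattern without touching file.txt (a sketch on two sample lines):
$ printf 'CHECKSUM="Y"\nCHECKSUM="\n' | sed 's/^CHECKSUM="$/CHECKSUM="Null"/'
CHECKSUM="Y"
CHECKSUM="Null"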

Reading multiple lines using read line do

OK, I'm an absolute noob at this (I only started trying to code a few weeks ago for my job), so please go easy on me.
I'm on an AIX system.
I have file1, file2 and file3, and they all contain one column of data (text or numerical).
file1
VBDSBQ_KFGP_SAPECC_PRGX_ACCNT_WKLY
VBDSBQ_KFGP_SAPECC_PRGX_ADDRM_WKLY
VBDSBQ_KFGP_SAPECC_PRGX_COND_WKLY
VBDSBQ_KFGP_SAPECC_PRGX_CUSTM_WKLY
VBDSBQ_KFGP_SAPECC_PRGX_EPOS_DLY
VBDSBQ_KFGP_SAPECC_PRGX_INVV_WKLY
file2
MCMILS03
HGAHJK05
KARNEK93
MORROT32
LAWFOK12
LEMORK82
file3
8970597895
0923875
89760684
37960473
526238495
146407
There will be exactly the same number of lines in each of these files.
I have another file called "dummy_file", which is what I want to pull out, replace parts of, and pop into a new file.
WORKSTATION#JOB_NAME
SCRIPTNAME "^TWSSCRIPTS^SCRIPT"
STREAMLOGON "^TWSUSER^"
-job JOB_NAME -user USER_ID -i JOB_ID
RECOVERY STOP
There are only 3 strings I care about in this file that I want replaced, and they will always be the same for the dummy files I use in future:
JOB_NAME
JOB_ID
USER_ID
There are 2 entries for JOB_NAME and only 1 for each of the others. What I want is to take the raw file, replace both JOB_NAME entries with line 1 from file1, replace USER_ID with line 1 from file2, replace JOB_ID with line 1 from file3, and then throw the result into a new file.
I want to repeat the process for all the lines in file1, file2 and file3, so the next block will have its entries replaced by line 2 from the three files, the one after that by line 3, and so on.
The raw file and the expected output are below:
WORKSTATION#JOB_NAME
SCRIPTNAME "^TWSSCRIPTS^SCRIPT"
STREAMLOGON "^TWSUSER^"
-job JOB_NAME -user USER_ID -i JOB_ID
RECOVERY STOP
WORKSTATION#VBDSBQ_KFGP_SAPECC_PRGX_ACCNT_WKLY
SCRIPTNAME "^TWSSCRIPTS^SCRIPT"
STREAMLOGON "^TWSUSER^"
-job VBDSBQ_KFGP_SAPECC_PRGX_ACCNT_WKLY -user MCMILS03 -i 8970597895
RECOVERY STOP
This is as far as I got (again, I know it's rough):
file="/dir/dir/dir/file1"
while IFS= read -r line
do
cat dummy_file | sed "s/JOB_NAME/$file1/" | sed "s/JOB_ID/$file2/" | sed "s/USER_ID/$file3" #####this is where i get stuck as i dont know how to reference file2 and file3##### >>new_file.txt
done
You really don't want a do/while loop in the shell. Just do:
awk '/^WORKSTATION/{                 # at the start of each template block
getline jobname < "file1";           # read the next line from each data file
getline user_id < "file2";
getline job_id < "file3"
}
{
gsub("JOB_NAME", jobname);           # replace the placeholders on every line
gsub("USER_ID", user_id);
gsub("JOB_ID", job_id)
}1' dummy_file                       # the bare 1 prints each (modified) line
This might work for you (GNU parallel and sed):
parallel -q sed 's/JOB_NAME/{1}/;s/USER_ID/{2}/;s/JOB_ID/{3}/' templateFile >newFile :::: file1 ::::+ file2 ::::+ file3
This creates newFile by filling in a copy of templateFile for each triple of corresponding lines in file1, file2 and file3.
N.B. the ::::+ operator links the lines of file1, file2 and file3 together, pairing them line by line, rather than generating the default Cartesian product of all combinations.
Using GNU awk (ARGIND and 2d arrays):
$ gawk '
NR==FNR { # store the template file
t=t (t==""?"":ORS) $0 # to t var
next
}
{
a[FNR][ARGIND]=$0 # store each data file's records in a 2d array
}
END { # in the end
for(i=1;i<=FNR;i++) { # for each record stored from the data files
t_out=t # make a working copy of the template
gsub(/JOB_NAME/,a[i][2],t_out) # replace with data
gsub(/USER_ID/,a[i][3],t_out)
gsub(/JOB_ID/,a[i][4],t_out)
print t_out # output
}
}' template file1 file2 file3
Output:
WORKSTATION#VBDSBQ_KFGP_SAPECC_PRGX_ACCNT_WKLY
SCRIPTNAME "^TWSSCRIPTS^SCRIPT"
STREAMLOGON "^TWSUSER^"
-job VBDSBQ_KFGP_SAPECC_PRGX_ACCNT_WKLY -user MCMILS03 -i 8970597895
RECOVERY STOP
...
Bash variant
#!/bin/bash
exec 5<file1 # create file descriptor for file with job names
exec 6<file2 # create file descriptor for file with user ids
exec 7<file3 # create file descriptor for file with job ids
dummy=$(cat dummy_file) # load the template text
output () { # create output by inserting new values in a copy of dummy var
out=${dummy//JOB_NAME/$JOB_NAME}
out=${out//USER_ID/$USER_ID}
out=${out//JOB_ID/$JOB_ID}
printf "\n$out\n"
}
while read -u5 JOB_NAME; do # this will read from all files and print output
read -u6 USER_ID
read -u7 JOB_ID
output
done
From the read help:
$ help read
...
-u fd read from file descriptor FD instead of the standard input
...
And a variant with paste
#!/bin/bash
dummy=$(cat dummy_file)
while read JOB_NAME USER_ID JOB_ID; do
out=${dummy//JOB_NAME/$JOB_NAME}
out=${out//USER_ID/$USER_ID}
out=${out//JOB_ID/$JOB_ID}
printf "\n$out\n"
done < <(paste file1 file2 file3)
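For reference, paste joins the three files line by line (tab-separated by default), which is why a single read can pick up all three fields at once. The first joined line from the question's data would be:
$ paste file1 file2 file3 | head -n 1
VBDSBQ_KFGP_SAPECC_PRGX_ACCNT_WKLY	MCMILS03	8970597895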

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far, but I get no output. What I really want is to get a list of the txt files, take each name up to the extension, replace every "|" with a newline, and put the new file in another directory as [old-name].ics. Help! Thanks in advance.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
No need to use xargs, especially with echo, which risks the filenames being word-split and having globbing applied to them, so it could well do the wrong thing.
Then we use sed's s command to substitute | with \n; the g flag makes the replacement global. We redirect that into the other directory you want and use bash's parameter expansion to strip the .txt suffix from the end.
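Since the question started out with tr, the same loop works with tr instead of sed (a sketch; it assumes other_directory already exists):
for i in *.txt; do
    tr '|' '\n' < "$i" > other_directory/"${i%.txt}".ics   # translate every | into a newline
done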
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file

Count lines following a pattern from file

For example, I have a file test.json that contains a series of lines containing:
header{...}
body{...}
other text (as others)
empty lines
I want to run a script that returns the following:
Counted started on : test.json
- headers : 4
- body : 5
- <others>
Counted finished : <time elapsed>
What I've got so far is this:
count_file() {
echo "Counted started on : $1"
#TODO loop
cat $1 | grep header | wc -l
cat $1 | grep body | wc -l
#others
echo "Counted finished : " #TODO timeElapsed
}
Perl on Command Line
perl -E '$match=$ARGV[1];open(Input, "<", $ARGV[0]);while(<Input>){ ++$n if /$match/g } say $match," ",$n;' your-file your-pattern
For me
perl -E '$match=$ARGV[1];open(Input, "<", $ARGV[0]);while(<Input>){ ++$n if /$match/g } say $match," ",$n;' parsing_command_line.pl my
It counts how many occurrences of the pattern my there are in my script parsing_command_line.pl.
output
my 3
For you
perl -E '$match=$ARGV[1];open(Input, "<", $ARGV[0]);while(<Input>){ ++$n if /$match/g } say $match," ",$n;' test.json headers
NOTE
Write the whole command on one line at your command prompt.
The first argument is your file.
The second is your pattern.
This is not a complete solution, since you have to enter each of your patterns one by one.
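To avoid retyping it for every pattern, you could loop the one-liner (a sketch re-using the command above with the question's file and patterns):
for p in header body; do    # the patterns from the question
    perl -E '$match=$ARGV[1];open(Input, "<", $ARGV[0]);while(<Input>){ ++$n if /$match/g } say $match," ",$n;' test.json "$p"
done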
You can capture the result of a command in a variable, like:
result=`cat $1 | grep header | wc -l`
and then print the result:
echo "# headers : $b"
` is the eval operator that let replace the whole expression by the output of the command inside.
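Putting those pieces together, a sketch of the finished function (grep -c replaces the grep | wc -l pipelines, and the SECONDS-based timing is an assumption about how you want the elapsed time measured):
count_file() {
    local start=$SECONDS                          # bash's builtin seconds counter
    echo "Counted started on : $1"
    echo " - headers : $(grep -c header "$1")"    # grep -c counts matching lines directly
    echo " - body : $(grep -c body "$1")"
    echo "Counted finished : $((SECONDS - start))s"
}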

How to append a line after a search result?

So I grep for something in some file:
grep "import" test.txt | tail -1
In test.txt there is
import-one
import-two
import-three
some other stuff in the file
This will return the last search result:
import-three
Now how do I add some text after import-three but before "some other stuff in the file"? Basically I want to append a line, not at the end of the file, but after a search result.
I understand that you want some text after each search result, which would mean after every matching line. So try
grep "import" test.txt | sed '/$/ a\Line to be added'
You can try something like this with sed
sed '/import-three/ a\
Line to be added' t
Test:
$ sed '/import-three/ a\
> Line to be added' t
import-one
import-two
import-three
Line to be added
some other stuff in the file
One way, assuming that you cannot distinguish between the different "import" sentences: it reverses the file with tac, then finds the first match (import-three) with sed, inserts a line just before it (i\), and reverses the file again.
The :a ; n ; ba part is a loop that avoids processing the /import/ match again.
The command is written across several lines because the sed insert command has very particular syntax:
$ tac infile | sed '/import/ { i\
"some text"
:a
n
ba }
' | tac -
It yields:
import-one
import-two
import-three
"some text"
some other stuff in the file
Using ed:
ed test.txt <<END
$
?^import
a
inserted text
.
w
q
END
Meaning: go to the end of the file, search backwards for the first line beginning with import, append the new lines below it (the insertion ends with a line containing only "."), save and quit.
