I want to add a header at the start of each output file. I use the following code, but it does not add the header. Can you please help me find where I am going wrong?
start_MERGE_JobRec()
{
FindBatchNumber
export TEMP_SP_FORMAT="Temp_${file_indicator}_SP_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_SS_FORMAT="Temp_${file_indicator}_SS_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_SG_FORMAT="Temp_${file_indicator}_SG_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_GS_FORMAT="Temp_${file_indicator}_GS_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export SP_OUTPUT_FILE="RTBCON_${file_indicator}_SP_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export SS_OUTPUT_FILE="RTBCON_${file_indicator}_SS_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export SG_OUTPUT_FILE="RTBCON_${file_indicator}_SG_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export GS_OUTPUT_FILE="RTBCON_${file_indicator}_GS_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
#---------------------------------------------------
# Add header at the start for each file
#---------------------------------------------------
awk '{print "recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason"$0}' >> $SP_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SP_FORMAT}" -exec cat {} + >> $SP_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SS_FORMAT}" -exec cat {} + >> $SS_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SG_FORMAT}" -exec cat {} + >> $SG_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_GS_FORMAT}" -exec cat {} + >> $GS_OUTPUT_FILE
}
I use awk to add the header but it's not working.
Awk requires input before it will print anything; with no file arguments, your awk command just sits there waiting to read standard input.
A common way to force Awk to print something even when there is no input is to put the print statement in a BEGIN block:
awk 'BEGIN { print "something" }' /dev/null
but if you want to prepend a header to all the output files, I don't see why you are using Awk here at all, let alone printing the header in front of every output line. Are you looking for this, instead?
echo 'recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason' |
tee "$SS_OUTPUT_FILE" "$SG_OUTPUT_FILE" "$GS_OUTPUT_FILE" >"$SP_OUTPUT_FILE"
Notice also how we generally always quote shell variables unless we specifically want the shell to perform word splitting and wildcard expansion on their values, and avoid upper case for private variables.
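For example, here is the difference quoting makes (a quick sketch; the pattern is made up):
pattern='Temp_*_SP_*.txt'
echo $pattern     # unquoted: the shell word-splits and expands the wildcard against the current directory
echo "$pattern"   # quoted: prints the pattern itself, literally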
There also does not seem to be any reason to export your variables -- neither Awk nor find pays any attention to them, and there are no other processes here. The purpose of export is to make a variable visible to the environment of subprocesses. You might want to declare them as local, though.
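A quick sketch of that distinction (the names here are invented for illustration):
demo() {
    local scratch='only visible inside demo'    # gone when the function returns
    export SHARED='visible to child processes'
    awk 'BEGIN { print ENVIRON["SHARED"] }'     # a child process sees the exported one
}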
Perhaps break out a second function to avoid all this code repetition, anyway?
merge_individual_job() {
echo 'recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason'
find . -maxdepth 1 -type f -name "$1" -exec cat {} +
}
start_MERGE_JobRec()
{
FindBatchNumber
local id
for id in SP SS SG GS; do
merge_individual_job \
"Temp_${file_indicator}_${id}_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt" \
>"RTBCON_${file_indicator}_${id}_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
done
}
If FindBatchNumber sets the variable file_indicator, a more idiomatic and less error-prone approach is to have it just echo it, and have the caller assign it:
file_indicator=$(FindBatchNumber)
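A sketch of what that refactoring might look like (the body of FindBatchNumber is hypothetical, since it is not shown in the question):
FindBatchNumber() {
    # ... whatever logic computes the value ...
    echo "$computed_value"    # print it, instead of assigning a global variable
}
file_indicator=$(FindBatchNumber)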
I am iterating over files like so:
find $directory -type f -exec codesign {} \;
Now the problem here is that files higher up in the hierarchy are signed first.
Is there a way to iterate over a directory tree and handle the deepest files first?
So that
/My/path/to/app/bin
is handled before
/My/path/mainbin
Yes, just use -depth:
-depth
The primary shall always evaluate as true; it shall cause descent of the directory hierarchy to be done so that all entries in a directory are acted on before the directory itself. If a -depth primary is not specified, all entries in a directory shall be acted on after the directory itself. If any -depth primary is specified, it shall apply to the entire expression even if the -depth primary would not normally be evaluated.
For example:
$ mkdir -p top/a/b/c/d/e/f/g/h
$ find top -print
top
top/a
top/a/b
top/a/b/c
top/a/b/c/d
top/a/b/c/d/e
top/a/b/c/d/e/f
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f/g/h
$ find top -depth -print
top/a/b/c/d/e/f/g/h
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f
top/a/b/c/d/e
top/a/b/c/d
top/a/b/c
top/a/b
top/a
top
Note that at a particular level, ordering is still arbitrary.
Using GNU utilities and the decorate-sort-undecorate pattern (aka Schwartzian transform):
find . -type f -printf '%d %p\0' |
sort -znr |
sed -z 's/[0-9]* //' |
xargs -0 -I# echo codesign #
Drop the echo if the output looks ok.
Using find's -depth option (as in my other answer), or a naive sort (as in some other answers), only ensures that the sub-directories of a directory are processed before the directory itself, not that the overall deepest level is processed first.
For example:
$ mkdir -p top/a/b/d/f/h top/a/c/e/g
$ find top -depth -print
top/a/c/e/g
top/a/c/e
top/a/c
top/a/b/d/f/h
top/a/b/d/f
top/a/b/d
top/a/b
top/a
top
For overall deepest level to be processed first, the ordering should be something like:
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
To determine this ordering, the entire list must be known, and then the number of levels (i.e. the number of / separators) of each path counted to enable ranking.
A simple-ish Perl script (assigned to a shell function for this example) to do this ordering is:
$ dsort(){
perl -ne '
BEGIN { $/ = "\0" } # null-delimited i/o
$fname[$.] = $_;
$depth[$.] = tr|/||;
END {
print
map { $fname[$_] }
sort { $depth[$b] <=> $depth[$a] }
keys @fname
}
'
}
Then:
$ find top -print0 | dsort | xargs -0 -I# echo #
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
How about sorting the output of find in descending order:
while IFS= read -d "" -r f; do
codesign "$f"
done < <(find "$directory" -type f -print0 | sort -zr)
<(command ...) is a process substitution, which feeds the output of the command to the read command in the while loop via the redirection.
The -print0, sort -z and read -d "" combination uses a null character as the filename delimiter, which protects filenames that include special characters such as whitespace.
I don't know if there is a native way in find, but you can pipe its output into a loop and process it line by line this way:
find . | while IFS= read -r file; do echo "filename: $file"; done
In your case, if you are happy just reversing the output of find, you can go with something like:
find "$directory" -type f | tac | while IFS= read -r file; do codesign "$file"; done
I am wondering if there is a way to search all the files from a certain directory, including subdirectories, using a find command on AIX 6.x, calling an external command (e.g. hlcat) to convert each one into a readable format, and then piping through grep to find a pattern, instead of using loops in the shell?
e.g. find . -type f -name "*.hl7" -exec hlcat {} | grep -l "pattern" \;
The above command would not work and I have to use a while loop to display the content and search for the pattern as follows:
find . -type f -name "*.hl7" -print | while IFS= read -r file; do
hlcat "$file" | grep -l "pattern"
done
At the same time, these HL7 files have been renamed with round brackets in their names, which prevents them from being opened without double quotes around the file name.
e.g. hlcat (patient) filename.hl7 will fail to open.
hlcat "(patient) filename.hl7" will work.
In short, I am looking for a clean, concise one-liner around the find command to view and search the content of these HL7 files with round-bracket names.
Many thanks,
George
P.S. HL7 raw data is made up of one continuous line and is not readable unless it is converted into a workable reading format using tools such as hlcat.
Update: The easy way
find . -type f -name '*.hl7' -exec grep -iEl 'Barry|Jolene' {} +
note: You may get some false positives though. See below for a targeted search.
Searching for a first name in a bunch of HL7v2 files:
1. Looking into the HL7v2 file format
Example of HL7v2 PID segment:
PID|||56782445^^^UAReg^PI||KLEINSAMPLE^BARRY^Q^JR||19620910|M|||
PID Segment decomposition:
Seq | NAME               | HHIC USE           | LEN
----+--------------------+--------------------+----
0   | PID keyword        | Segment Type       | 3
3   | Patient ID         | Medical Record Num | 250
5   | Patient Name       | Last^First^Middle  | 250
7   | Date/Time Of Birth | YYYYMMDD           | 26
8   | Sex                | F, M, or U         | 1
2. Writing targeted searches
With grep (AIX):
find . -type f -name '*.hl7' -exec grep -iEl '^PID\|([^|]*\|){4}[^^|]*\^(Barry|Jolene)\^' {} +
With awk:
find . -type f -name '*.hl7' -exec awk -v firstname='^(Barry|Jolene)$' '
BEGIN { FS="|" }
FNR == 1 { if( found ) print filename; found = 0; filename = FILENAME }
$1 == "PID" { split($6, name, "^"); if (toupper(name[2]) ~ toupper(firstname)) { found = 1 } }
END { if ( found ) print filename }
' {} +
remark: The good part about this awk solution is that you pass the first-name regexp as an argument, so it is easily extendable, for example to search on the last name as well.
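For instance, a sketch of that extension with a last-name pattern added alongside (the lastname variable and its value are invented for illustration):
find . -type f -name '*.hl7' -exec awk -v firstname='^(Barry|Jolene)$' -v lastname='^KLEINSAMPLE$' '
BEGIN { FS = "|" }
FNR == 1 { if (found) print filename; found = 0; filename = FILENAME }
$1 == "PID" {
    split($6, name, "^")    # PID-5 is Last^First^Middle
    if (toupper(name[1]) ~ toupper(lastname) && toupper(name[2]) ~ toupper(firstname)) found = 1
}
END { if (found) print filename }
' {} +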
I am trying to create a script that will, within a for loop, identify pairs of files in a directory and then perform a function on each pair. The paired files are named such as FILENAME_1.fastq and FILENAME_2.fastq and there are multiple pairs within the directory. Here are some actual filenames in case it matters for regex functions:
WT1_0min-SRR9929263_1.fastq
WT1_0min-SRR9929263_2.fastq
WT1_20min-SRR9929265_1.fastq
WT1_20min-SRR9929265_2.fastq
WT3_20min-SRR12062597_1.fastq
WT3_20min-SRR12062597_2.fastq
Is it possible to do this without feeding any information about the filename, other than that it has a pair? I am absolutely awful with regex and name-search functions, but below is my latest failed attempt.
cd ~/Directory
for file in *.fastq
do
sample=`basename ${file}` #I think needs a modification to subtract the _1 or _2 and then a search function to find the paired files
myfunction \
-1 ${sample}_1.fastq \
-2 ${sample}_2.fastq
done
Thanks for any help. Been stuck for 2 days x_x
UPDATE
Please see this new post for answers on how to adapt the xargs answer for use with a for loop.
To account for the scenario where not every file is fully paired, try:
find . -maxdepth 1 -type f -not -name ".*" |
gawk 'BEGIN { FS = "_" }
      { $0 = gensub(/^.+\/([^\/]+)$/, "\\1", "1") }   # strip the leading path, like basename
      { inL[$1 $2][substr($3, 1, 1)] = $0 }           # index by filename prefix, then by pair member 1 or 2
      END { OFS = ORS = "\0"
            for (pfx in inL)
                if ((1 in inL[pfx]) && (2 in inL[pfx]) && length(inL[pfx]) == 2)
                    print inL[pfx][1], inL[pfx][2] }' |
parallel -0 -N 2 -j 1 myfunction -1 '{1}' -2 '{2}'
GNU parallel allows shell functions to be exported. This version of the code uses gawk to handle both the basename and the print0 duty. It also ensures that ONLY files with an exact 1+2 pairing are passed along, in case some have only one of the two, or there are files with even a "_3.fastq", should you need to expand into such a realm.
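Since myfunction is a shell function, it has to be exported before parallel can see it; a minimal sketch (the function body is a placeholder):
myfunction() { echo "processing pair: $1 $2"; }    # placeholder body
export -f myfunction                               # bash: expose the function to parallel's sub-shells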
Use find and xargs, and replace echo with the command of your choice:
find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq
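With the question's files, that pipeline should print pairs along these lines (a sketch of the expected output; ordering may differ):
$ find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq
WT1_0min-SRR9929263_1.fastq WT1_0min-SRR9929263_2.fastq
WT1_20min-SRR9929265_1.fastq WT1_20min-SRR9929265_2.fastq
WT3_20min-SRR12062597_1.fastq WT3_20min-SRR12062597_2.fastq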
In bioinformatics we use this all the time when we have a paired-end run:
parallel --plus echo {} {/_1.fastq/_2.fastq} ::: *_1.fastq
or:
parallel echo {} {=s/_1.fastq/_2.fastq/=} ::: *_1.fastq
I need a command that will help me accomplish what I am trying to do. At the moment, I am looking through all the ".html" files in a given directory and checking which of them contain the string "jacketprice".
Is there a way to do this? Also, for the second (but separate) command, I will need a way to replace every instance of "jacketprice" with "coatprice", all in one command or script. If this is feasible, feel free to let me know. Thanks.
find . -name "*.html" -exec grep -l jacketprice {} \;
for i in $(find . -name "*.html")
do
sed -i "s/jacketprice/coatprice/g" "$i"
done
As for the second question,
find . -name "*.html" -exec sed -i "s/jacketprice/coatprice/g" {} \;
Use recursive grep to search through your files:
grep -r --include="*.html" jacketprice /my/dir
Alternatively turn on bash's globstar feature (if you haven't already), which allows you to use **/ to match directories and sub-directories.
$ shopt -s globstar
$ cd /my/dir
$ grep jacketprice **/*.html
$ sed -i 's/jacketprice/coatprice/g' **/*.html
Depending on whether you want this recursively or not, perl is a good option:
Find, non-recursive:
perl -nwe 'print "Found $_ in file $ARGV\n" if /jacketprice/' *.html
Will print the line where the match is found, followed by the file name. Can get a bit verbose.
Replace, non-recursive:
perl -pi.bak -we 's/jacketprice/coatprice/g' *.html
Will store original with .bak extension tacked on.
Find, recursive:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir") }
say $ARGV if /jacketprice/'
It will print the file name for each match. Somewhat less verbose might be:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir") }
$found{$ARGV}++ if /jacketprice/; END { say for keys %found }'
Replace, recursive:
perl -MFile::Find -pi.bak -we '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir") }
s/jacketprice/coatprice/g'
Note: In all recursive versions, /dir is the base directory you wish to search. Also, if your perl version is less than 5.10, say can be replaced with print followed by a newline, e.g. print "$_\n" for keys %found.
I have thousands of mp3s inside a complex folder structure which resides within a single folder. I would like to move all the mp3s into a single directory with no subfolders. I can think of a variety of ways of doing this using the find command but one problem will be duplicate file names. I don't want to replace files since I often have multiple versions of a same song. Auto-rename would be best. I don't really care how the files are renamed.
Does anyone know a simple and safe way of doing this?
You could change an a/b/c.mp3 path into a - b - c.mp3 when copying. Here's a solution in Bash:
find srcdir -name '*.mp3' -printf '%P\n' |
while IFS= read -r i; do
j="${i//\// - }"
cp -v "srcdir/$i" "dstdir/$j"
done
And in a shell without ${//} substitution:
find srcdir -name '*.mp3' -printf '%P\n' |
sed -e 'p;s:/: - :g' |
while IFS= read -r i; do
read -r j
cp -v "srcdir/$i" "dstdir/$j"
done
For a different scheme, GNU's cp and mv can make numbered backups instead of overwriting -- see -b/--backup[=CONTROL] in the man pages.
find srcdir -name '*.mp3' -exec cp -v --backup=numbered {} dstdir/ \;
In bash, something along these lines (move_to_dir being the target directory):
find . -name '*.mp3' | while IFS= read -r i; do
    new_name=$(basename "$i")
    x=0
    while [ -e "move_to_dir/$new_name" ]; do
        x=$((x + 1))
        new_name="$(basename "$i" .mp3)_$x.mp3"
    done
    mv "$i" "move_to_dir/$new_name"
done
#!/bin/bash
NEW_DIR=/tmp/new/
IFS="
"; for a in `find . -type f `
do
echo "$a"
new_name=$(basename "$a")
while test -e "$NEW_DIR/$new_name"
do
new_name="${new_name}_"
done
cp "$a" "$NEW_DIR/$new_name"
done
I'd tend to do this in a simple script rather than try to fit in in a single command line.
For instance, in python, it would be relatively trivial to do a walk() through the directory, copying each mp3 file found to a different directory with an automatically incremented number.
If you want to get fancier, you could have a dictionary of existing file names, and simply append a number to the duplicates. (the index of the dictionary being the file name, and the value being the number of files found so far, which would become your suffix)
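Here is a rough bash rendering of that same dictionary idea, using an associative array instead of Python (srcdir and dstdir are placeholders):
declare -A seen    # maps a base filename to how many times it has been seen
find srcdir -type f -name '*.mp3' | while IFS= read -r f; do
    base=$(basename "$f")
    n=${seen[$base]:-0}
    if [ "$n" -gt 0 ]; then
        cp "$f" "dstdir/${base%.mp3}_$n.mp3"    # duplicate: append the running count
    else
        cp "$f" "dstdir/$base"
    fi
    seen[$base]=$((n + 1))
done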
find /path/to/mp3s -name '*.mp3' -exec mv {} /path/to/target/dir \;
At the risk of many downvotes, a perl script could be written in short time to accomplish this.
Pseudocode:
while (-e $filename) {
    $filename .= "1";
}
In python (to actually move the files, set debug = False):
import os, re

from_dir = "/from/dir"
to_dir = "/target/dir"
re_ext = r"\.mp3"
debug = True

for d, subdirs, names in os.walk(from_dir):
    for fn in names:
        if not re.match(r".*(%s)$" % re_ext, fn, re.I):
            continue
        from_fn = os.path.join(d, fn)
        target_fn = os.path.join(to_dir, fn)
        if debug:
            print("MOVE", from_fn, "TO", target_fn)
        elif os.path.exists(target_fn):
            print("DO NOT MOVE - FILE EXISTS", from_fn)
        else:
            os.rename(from_fn, target_fn)
Since you don't care how the duplicate files are named, utilize the 'backup' option on move:
find /path/to/mp3s -name '*.mp3' -exec mv --backup=numbered {} /path/to/target/dir \;
Will get you:
song.mp3
song.mp3.~1~
song.mp3.~2~