I'm trying to use awk to create sub-directories inside a directory that is always the last line of each block of file1 (blocks are separated by an empty line), if the number on line 2 of each record in file2 (always the first 6 digits, in the format xx-xxxx) is found in $2 of file1.
The directory will already have been created in /path/to/directory. In the example below, Directory2_2 already exists in /path/to/directory, and since 19-0003, 19-0004 and 19-0005 are found in $2 of file1, sub-directories for those entries are created in Directory2_2.
file1
xxxx_006 19-0000_Lname-yyyy-zzzzz
xxxx_007 19-0001_Lname-yyyy-zzzzz
Directory1_1
xxxx_008 19-0003_Lname-yyyy-zzzzz
xxxx_009 19-0004_Lname-yyyy-zzzzz
xxxx_020 19-0005_Lname-yyyy-zzzzz
Directory2_2
file2
xxxx
19-0003-xxx-xxx-xxx_000-111
yyyy
xxxx
19-0004-xxx-xxx-xxx_000-111
yyyy
xxxx
19-0005-xxx-xxx-xxx_000-111
yyyy
awk in bash for loop
for f in $(awk { print cut -d'_' -f1 }' file2); do
[[ "$f" == $2 ]] && mkdir -p "$f" /path/to/directory
done
desired output
Directory2_2
19-0003-xxx-xxx-xxx_000-111
19-0004-xxx-xxx-xxx_000-111
19-0005-xxx-xxx-xxx_000-111
If the directory names don't contain spaces (below, file1 is processed in paragraph mode and file2 in line mode):
awk 'NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next }
{ k = substr($0, 1, 7) }
k in a { cmd = sprintf("mkdir -p %s/%s", a[k], $0); print(cmd); }
' RS= file1 RS='\n' file2
#mkdir -p Directory2_2/19-0003-xxx-xxx-xxx_000-111
#mkdir -p Directory2_2/19-0004-xxx-xxx-xxx_000-111
#mkdir -p Directory2_2/19-0005-xxx-xxx-xxx_000-111
Change print(cmd) to system(cmd) to actually run the command.
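For example, here is a hedged sketch of that system() variant, assuming you run it from inside /path/to/directory (where Directory2_2 already exists), so the relative paths land in the right place:

awk 'NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next }
     { k = substr($0, 1, 7) }
     k in a { cmd = sprintf("mkdir -p %s/%s", a[k], $0); system(cmd) }
' RS= file1 RS='\n' file2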
Note: if the directory names contain spaces, you might need to set FS='\n' in order to use $NF for the base directory in file1:
awk 'NR==FNR { for(i=1; i<NF; i++) a[substr($i,index($i," ")+1,7)] = $NF; next }
{ k = substr($0, 1, 7) }
k in a { cmd = sprintf("mkdir -p \"%s\"/\"%s\"", a[k], $0); print(cmd); }
' FS='\n' RS= file1 RS='\n' file2
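With the sample file1 and file2 shown above, this variant prints:

#mkdir -p "Directory2_2"/"19-0003-xxx-xxx-xxx_000-111"
#mkdir -p "Directory2_2"/"19-0004-xxx-xxx-xxx_000-111"
#mkdir -p "Directory2_2"/"19-0005-xxx-xxx-xxx_000-111"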
Could you please try the following.
awk -v path_val="/your_path/" '
FNR==NR{
  if($0 ~ /^[0-9]+/){
    a[substr($0,1,7)]=$0
  }
  next
}
/^Directory/{
  if(count==value){
    print "Directory " $0 " all elements are present." ORS "Going to write shell script code now..."
    print $0 ORS val
    print "*************************************************"
    print "if [[ -d " path_val $0 " ]]" ORS "then" ORS\
          " cd " path_val $0 ORS " mkdir " val ORS\
          " if [[ $? -eq 0 ]]" ORS " then" ORS \
          " echo " s1 "Directories named "\
          val s1 " created successfully in path " path_val\
          "." s1 ORS " else" ORS " echo " s1\
          "kindly check from your end once seems directories not created." s1\
          ORS " fi" ORS "else" ORS " echo " s1\
          "Please check seems base directory " path_val " NOT present itself."\
          s1 ORS "fi"
  }
  count=val=value=""
}
($3 in a){
  val=(val?val OFS a[$3]:a[$3])
  count++
}
/^xxx/{
  value++
}' Input_file2 FS="[ _]" Input_file1
Explanation of what the code does:
1- The code has a variable named path_val (here /your_path/), which is your BASE path where the directories will be created.
2- It checks whether all the lines that come before a Directory_... line (paragraph by paragraph) in Input_file1 are present in Input_file2. If they are, it prints those lines along with the directory name, and it also writes bash code to the console (code which checks your base directory path and then creates the matched directories inside the base directory). As of now it simply prints that code; you could either redirect it to a .ksh file (as an output file) and run it, or add | bash at the end of this command. I haven't tested that, so I leave it up to the OP.
The following will be the output:
Directory Directory2_2 all elements are present.
Going to write shell script code now...
Directory2_2
19-0003-xxx-xxx-xxx_000-111 19-0004-xxx-xxx-xxx_000-111 19-0005-xxx-xxx-xxx_000-111
*************************************************
if [[ -d /your_path/Directory2_2 ]]
then
cd /your_path/Directory2_2
mkdir 19-0003-xxx-xxx-xxx_000-111 19-0004-xxx-xxx-xxx_000-111 19-0005-xxx-xxx-xxx_000-111
if [[ $? -eq 0 ]]
then
echo Directories named 19-0003-xxx-xxx-xxx_000-111 19-0004-xxx-xxx-xxx_000-111 19-0005-xxx-xxx-xxx_000-111 created successfully in path /your_path/.
else
echo kindly check from your end once seems directories not created.
fi
else
echo Please check seems base directory /your_path/ NOT present itself.
fi
PS: As mentioned above, take the final shell code (which should create the directories on your system) either into an output file or run it by adding | bash etc. at the end of the awk command. I haven't tested it, so please DO NOT run the code without testing. The statements are very simple; you can go through them, and you should test this in a test directory/test environment only.
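As a hedged sketch of the save-to-a-file route (assuming the awk program above is saved in make_dirs.awk, a hypothetical file name): note that the output also contains the informational lines and the ***** separator before the shell code, so review and trim the file before executing it.

awk -v path_val="/your_path/" -f make_dirs.awk Input_file2 FS="[ _]" Input_file1 > create_dirs.ksh
# inspect create_dirs.ksh and remove the non-shell informational lines above "if [[ -d ...", then:
bash create_dirs.ksh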
I am trying to append multiple csv files into one.
How can I enhance the below script so that an additional column is added? Let's call it "tag". The values in the tag column should be the filename from which the record has been appended.
flag=0
for f in $@/*.csv;
do
k=$(wc -l<"$f" )
if [ $flag -eq 0 ];
then
head -n $k "$f" > out.csv
flag=1
else
tail -n +2 "$f" >> out.csv
fi
done
Using @Shawn's approach below I am getting this:
$ cat TEST1/a.csv
h1,h2,h3
a,b,c
d,e,f
$ cat TEST1/b.csv
h1,h2,h3
1,2,3
4,5,6
$ awk 'NR == 1 { print $0 ",tag"; next }
FNR == 1 { next }
{ print $0 "," FILENAME }' TEST1/a.csv TEST1/b.csv
,tag2,h3
,TEST1/a.csv
,TEST1/a.csv
,TEST1/b.csv
,TEST1/b.csv
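That overwritten-looking output is usually a sign that the .csv files have Windows (CRLF) line endings: the trailing \r sends the cursor back to the start of the line, so the appended ",tag" / ",filename" text visually overwrites the beginning of each row on the terminal. If that is the case here, a hedged tweak is to strip the carriage return first:

awk '{ sub(/\r$/, "") }                  # drop a trailing carriage return, if any
     NR == 1  { print $0 ",tag"; next }
     FNR == 1 { next }
     { print $0 "," FILENAME }' TEST1/a.csv TEST1/b.csv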
Something like this using awk:
$ cat a.csv
header1,header2,header3
a,b,c
d,e,f
$ cat b.csv
header1,header2,header3
1,2,3
4,5,6
$ awk 'NR == 1 { print $0 ",tag"; next }
FNR == 1 { next }
{ print $0 "," FILENAME }' a.csv b.csv
header1,header2,header3,tag
a,b,c,a.csv
d,e,f,a.csv
1,2,3,b.csv
4,5,6,b.csv
This treats the first line of the first file as a header line to print out, skips the first lines of all further files, and prints the remaining lines of all files, appending a column with the current filename to each one.
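To produce the OP's original out.csv with this approach, the same command just needs a redirection (a sketch using the sample files above):

awk 'NR == 1 { print $0 ",tag"; next }
     FNR == 1 { next }
     { print $0 "," FILENAME }' a.csv b.csv > out.csv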
I have 3 files with below data
$cat File1.txt
Apple,May
Orange,June
Mango,July
$cat File2.txt
Apple,Jan
Grapes,June
$cat File3.txt
Apple,March
Mango,Feb
Banana,Dec
I require the below output file.
$Output_file.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
The requirement here is to take the first column, find the values of column 1 that are common across the files, and join their second columns with "|". If a column 1 value is not common (appears in only one file), it should be printed to the output file as-is.
I have tried putting this in a while loop, but it takes time as the file sizes increase. I wanted a simple solution using a shell script.
This should work :
#!/bin/bash
for FRUIT in $( cat "$@" | cut -d "," -f 1 | sort | uniq )
do
echo -ne "${FRUIT},"
awk -F "," "\$1 == \"$FRUIT\" {printf(\"%s|\",\$2)}" "$#" | sed 's/.$/\'$'\n/'
done
Run it as :
$ ./script.sh File1.txt File2.txt File3.txt
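With the three sample files, the output would be (fruits come out in sorted order because of sort | uniq):

Apple,May|Jan|March
Banana,Dec
Grapes,June
Mango,July|Feb
Orange,June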
A purely native-bash solution (calling no external tools, and thus limited only by the performance constraints of bash itself) might look like:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4 or newer required" >&2; exit 1;; esac
declare -A items=( )
for file in "$#"; do
while IFS=, read -r key value; do
items[$key]+="|$value"
done <"$file"
done
for key in "${!items[#]}"; do
value=${items[$key]}
printf '%s,%s\n' "$key" "${value#'|'}"
done
...called as ./yourscript File1.txt File2.txt File3.txt
This is fairly easily done with a single awk command:
awk 'BEGIN{FS=OFS=","} {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i in a) print i, a[i]}' File{1,2,3}.txt
Orange,June
Banana,Dec
Apple,May|Jan|March
Grapes,June
Mango,July|Feb
If you want output in the same order as strings appear in original files then use this awk:
awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
{a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' File{1,2,3}.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
I'm trying to run a command on every .log file in a folder, but I can't work out how to change the output file for every file in the folder.
#!/bin/bash
N="0"
for i in "*.log"
do
echo "Processing $f file..."
cat $i | grep "test" | awk '/show/ {a = $1} !/show/{print a,$6}' > "log$N.txt"
done
How can I increment the counter for log$N.txt?
It's bad practice to write shell loops just to process text (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). This is all you need:
awk '
FNR==1 { print "Processing", FILENAME; a=""; close(out); out="log" fileNr++ ".txt" }
!/test/ { next }
/show/ { a = $1; next }
{ print a, $6 > out }
' *.log
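For example, with two hypothetical logs a.log and b.log (both containing lines with "test"), a run would look roughly like this, writing one numbered output file per input file:

$ awk 'FNR==1 { print "Processing", FILENAME; a=""; close(out); out="log" fileNr++ ".txt" }
       !/test/ { next }
       /show/  { a = $1; next }
       { print a, $6 > out }' *.log
Processing a.log
Processing b.log
$ ls log*.txt
log0.txt  log1.txt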
#!/bin/bash
N="0"
for i in *.log
do
echo "Processing $f file..."
cat $i | grep "test" | awk '/show/ {a = $1} !/show/{print a,$6}' > "log$N.txt"
N=$((N+1))
done
You need to increment the variable 'N' on every iteration and remove the double quotes in the for loop.
I am currently working on a bash script which requires me to display words from a file separated by a tilde (~) character using grep or sed. In my script I have to display different fields for each word. For example I will need to echo "Word 1: " followed by the first word in the file, on separate lines in the script. This is what I have so far:
#!/bin/bash
echo "Word 1: " >> print.txt
echo "Word 2: " >> print.txt
echo "Word 3: " >> print.txt
etc.
I need to read the words from another file that contains just words separated by ~ and does not have any spaces. This will be included after the "Word 1: " and before the append operator. I have been looking around and it looks like I need to do something with the grep -o command.
Thanks for your help.
You may want to use awk.
echo "Word 1: " `awk -F '~' '{print $1}' print.txt`
echo "Word 2: " `awk -F '~' '{print $2}' print.txt`
echo "Word 3: " `awk -F '~' '{print $3}' print.txt`
If you want it to loop over all the items in the tilde-separated list, however many there are, you'll use a loop in awk.
awk -F '~' '{ for(i = 1; i <= NF; i++) {print "Word " i ": " $i;} }' print.txt
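For example, if the tilde-separated file (print.txt here, as in the answer) contains word~anotherword~etc, the loop prints:

Word 1: word
Word 2: anotherword
Word 3: etc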
What you are currently doing with the >> is actually appending to the end of that file. If you executed what you have, your print.txt would end up looking like
word~anotherword~etc
Word 1:
Word 2:
Word 3:
EDIT based on clarification in the comments:
Ah, I see. Okay then: this awk loop will output the list, and all you need to do is redirect that to the print file. I understand now why you were using that. It will look like this:
awk -F '~' '{ for(i = 1; i <= NF; i++) { print "Word " i ": " $i; } }' words.txt > print.txt
Hi, I am new to using awk and I am trying to edit a specific field based on input.
e.g. the delimiter is :
item.txt file
comics:stan lee:5:1:2
comics:stan lim:1:2:3
I want to use awk to display the entire line, and I want to use sed to edit the 3rd field based on the user's input. How should I proceed from here?
echo: "Enter Field 1: "
read field_one
echo: "Enter field 2: "
read field_two
#need awk statement
echo "Enter New Field 3: "
read field_three_new
sed -i "/^$field_one:$field_two:/ s/$field_three_new/$field_three/" item.txt || tee item.txt
Do not use sed to edit the line; it will end badly if the user enters something that sed interprets as a special character (such as /, {, }, \ and so on). Try your script and enter / for one of the variables; sed will complain about syntax errors.
You can do both with awk:
echo "Enter Field 1: "
read field_one
echo "Enter field 2: "
read field_two
awk -F : -v f1="$field_one" -v f2="$field_two" '$1 == f1 && $2 == f2' item.txt
echo "Enter New Field 3: "
read field_three_new
# If you have GNU awk 4.1.0 or later:
awk -i inplace -F : -v f1="$field_one" -v f2="$field_two" -v f3="$field_three_new" 'BEGIN { OFS=FS } $1 == f1 && $2 == f2 { $3 = f3 } 1' item.txt
# otherwise:
# the && between the commands are to short-circuit in case of an error; if one of
# the commands fails, the others should not be executed anymore. Kudos to
# Ed Morton for catching that.
tempfile=$(mktemp) &&
awk -F : -v f1="$field_one" -v f2="$field_two" -v f3="$field_three_new" 'BEGIN { OFS=FS } $1 == f1 && $2 == f2 { $3 = f3 } 1' item.txt > "$tempfile" &&
mv "$tempfile" item.txt
# or, keeping a backup (this is my preferred method)
cp item.txt item.txt~ &&
awk -F : -v f1="$field_one" -v f2="$field_two" -v f3="$field_three_new" 'BEGIN { OFS=FS } $1 == f1 && $2 == f2 { $3 = f3 } 1' item.txt~ > item.txt
The advantage is that the awk variables are set to the shell variables in a context in which no code interpretation takes place, so special characters are handled gracefully.
The awk code is fairly straightforward. The command line option -F : sets the field separator to :, so $1 will be the first colon-delimited field (e.g. comics), $2 the second (e.g. stan lee) and so forth. -v varname=value sets an awk variable named varname to the value value; the variable can later be used in the script.
In the first call, then,
$1 == f1 && $2 == f2
selects those lines in which field 1 is equal to f1 (which is set to the first user input in the shell script), and field 2 is set to f2 (which is set to the second).
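For example, with the item.txt from the question, entering comics and stan lee would print:

comics:stan lee:5:1:2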
In the second call,
BEGIN { # once right at the start (before the first line)
OFS = FS # set the output field separator to the field separator.
} # Now both are :
$1 == f1 && $2 == f2 { # if the current line matches the condition from before
$3 = f3 # replace the third field with the third user input
}
1 # then select all lines for printing. This works because
# by convention, 1 means true (as do all non-zero values)
The OFS = FS part is necessary because assigning to a field makes awk rebuild $0 with the output field separator placed between the fields; the default print action then prints that rebuilt line. It does not matter for the unchanged lines, which are repeated verbatim.
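For example, if the user then entered 9 as the new field 3 for comics / stan lee, the rewritten item.txt would look like this:

comics:stan lee:9:1:2
comics:stan lim:1:2:3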