Shell script to fill the target defined in an XML file - shell

Suppose there is a file.txt in which the text below is written:
ABC
EFG
XYZ
In another XML file, there is a target named compile defined with an empty body.
<project>
<compile>
.
.
.
start //from here till EOF
shell
script
xyz
</compile>
</project>
I need a shell script that fills in the content between the tags of the defined target. After executing the script, the file should look as shown below under Output. This should be done for the entire content of file.txt.
Output:
<!-- ...preceding portions of input document... -->
<project>
<compile>
componentName="ABC"
componentName="EFG"
componentName="XYZ"
start
shell
script
xyz
</compile>
</project>
<!-- ...remaining portions of input document... -->

Use a proper XML parser. XMLStarlet is one tool fit for the job:
#!/bin/bash
# ^^^^- important, not /bin/sh
# read input file into an array
IFS=$'\n' read -r -d '' -a pieces <file.txt
# assemble target text based on expanding that array
printf -v text 'componentName="%s"\n' "${pieces[@]}"
# Read input, changing all elements named "compile" in the default namespace
# ...to contain our target text.
xmlstarlet ed -u '//compile' -v "$text" <in.xml >out.xml
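The array-read and printf steps can be sanity-checked on their own before involving xmlstarlet; note that read -d '' exits nonzero when it hits end-of-input, which matters under set -e:

```shell
cd "$(mktemp -d)"                 # scratch directory

# Recreate the sample input from the question
printf '%s\n' ABC EFG XYZ >file.txt

# Read all lines into an array: -d '' reads to end-of-input,
# IFS=$'\n' splits on newlines only; read returns nonzero at EOF
# when the NUL delimiter is never found, hence the || true
IFS=$'\n' read -r -d '' -a pieces <file.txt || true

# printf repeats its format once per remaining argument, building
# one componentName="..." line per array element
printf -v text 'componentName="%s"\n' "${pieces[@]}"
printf '%s' "$text"
# componentName="ABC"
# componentName="EFG"
# componentName="XYZ"
```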

You can do what you are attempting (to some degree) with sed and a while read -r loop. For example, you can fill a temporary file with the contents of your xml file from line 1 to the <targettag> with
sed -n "1, /^${ttag}$/p" "$xfn" > "$ofn" ## fill output to ttag
(where xfn is your xml file name and ofn is your output file name)
You can then read all values from your text file and prepend componentName=" and append " with:
while read -r line; do ## read each line in ifn and concatenate
printf "%s%s\"\n" "$cmptag" "$line" >> "$ofn"
done <"$ifn"
(where ifn is your input file name)
And finally, you can write the closing tag to end of your xml file to your output file with:
sed -n "/^${ttag/</<[\/]}$/, \${p}" "$xfn" >> "$ofn"
(using parameter expansion with substring replacement to add the closing '/' to the beginning of <targettag>).
Putting it all together, you could do something like:
#!/bin/bash
ifn="f1"
xfn="f2.xml"
ofn="f3.xml"
ttag="${1:-<targettag>}" ## set target tag
cmptag="componentName=\"" ## set string to prepend
sed -n "1, /^${ttag}$/p" "$xfn" > "$ofn" ## fill output to ttag
while read -r line; do ## read each line in ifn and concatenate
printf "%s%s\"\n" "$cmptag" "$line" >> "$ofn"
done <"$ifn"
## fill output from closing tag to end
sed -n "/^${ttag/</<[\/]}$/, \${p}" "$xfn" >> "$ofn"
Input Files
$ cat f1
ABC
EFG
XYZ
$ cat f2.xml
<someschema>
<targettag>
</targettag>
</someschema>
Example Use/Output
$ fillxml.sh
$ cat f3.xml
<someschema>
<targettag>
componentName="ABC"
componentName="EFG"
componentName="XYZ"
</targettag>
</someschema>
(you can adjust the indentation to fit your needs)
Addition After Changes to Question
The changes needed to write everything from start to the end after adding the componentName="..." tags are simple. However, the ordinariness of the word start exemplifies why the answer by Charles encourages you to use an XML tool rather than a simple script. Why? If the word 'start' occurs anywhere else in your .xml file before your intended start, the script will fail, writing everything from that first occurrence of start to the end.
That said, if this is a simple one-off conversion and start doesn't otherwise occur, the changes to the script to accomplish your desired output are easy:
#!/bin/bash
ifn="f1"
xfn="another.xml"
ofn="f3.xml"
ttag="${1:-<compile>}" ## set target tag
cmptag="componentName=\"" ## set string to prepend
sed -n "1, /^${ttag}$/p" "$xfn" > "$ofn" ## fill output to ttag
## read each line in ifn and concatenate
while read -r line || [ -n "$line" ]; do
printf "%s%s\"\n" "$cmptag" "$line" >> "$ofn"
done <"$ifn"
## fill output from 'start' to end
sed -n "/^start/, \${p}" "$xfn" >> "$ofn"
Input Files
$ cat f1
ABC
EFG
XYZ
$ cat another.xml
<project>
<compile>
start
shell
script
xyz
</compile>
</project>
Example Use/Output
$ cat f3.xml
<project>
<compile>
componentName="ABC"
componentName="EFG"
componentName="XYZ"
start
shell
script
xyz
</compile>
</project>
Look it over and let me know if you have questions.

Related

looping with grep over several files

I have multiple files /text-1.txt, /text-2.txt ... /text-20.txt
and what I want to do is to grep for two patterns and stitch them into one file.
For example:
I have
grep "Int_dogs" /text-1.txt > /text-1-dogs.txt
grep "Int_cats" /text-1.txt> /text-1-cats.txt
cat /text-1-dogs.txt /text-1-cats.txt > /text-1-output.txt
I want to repeat this for all 20 files above. Is there an efficient way in bash/awk, etc. to do this ?
#!/bin/bash
# ^^^^- the [[ ]] tests below are bashisms, so not /bin/sh
count=1
next () {
[[ "${count}" -lt 21 ]] && main
[[ "${count}" -eq 21 ]] && exit 0
}
main () {
file="text-${count}"
grep "Int_dogs" "${file}.txt" > "${file}-dogs.txt"
grep "Int_cats" "${file}.txt" > "${file}-cats.txt"
cat "${file}-dogs.txt" "${file}-cats.txt" > "${file}-output.txt"
count=$((count+1))
next
}
next
grep has some features you seem not to be aware of:
grep can be launched on lists of files, but the output will be different:
For a single file, the output contains only the matching lines, as in this example:
cat text-1.txt
I have a cat.
I have a dog.
I have a canary.
grep "cat" text-1.txt
I have a cat.
For multiple files, the filename is also shown in the output. Let's add another text file:
cat text-2.txt
I don't have a dog.
I don't have a cat.
I don't have a canary.
grep "cat" text-*.txt
text-1.txt:I have a cat.
text-2.txt:I don't have a cat.
grep can be extended to search for multiple patterns in files, using the -E switch. The patterns need to be separated using a pipe symbol:
grep -E "cat|dog" text-1.txt
I have a cat.
I have a dog.
(Summarizing the previous two points, plus the fact that grep -E is equivalent to egrep):
egrep "cat|dog" text-*.txt
text-1.txt:I have a cat.
text-1.txt:I have a dog.
text-2.txt:I don't have a dog.
text-2.txt:I don't have a cat.
So, in order to redirect this to an output file, you can simply say:
egrep "cat|dog" text-*.txt >text-1-output.txt
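One caveat: with multiple input files, the filename prefix shown above also ends up in the redirected output file. If text-1-output.txt should contain only the matching lines themselves, grep's -h option suppresses the prefix; a quick sketch with the sample files:

```shell
cd "$(mktemp -d)"                 # scratch directory

# Recreate the two sample files from above
printf '%s\n' 'I have a cat.' 'I have a dog.' 'I have a canary.' >text-1.txt
printf '%s\n' "I don't have a dog." "I don't have a cat." >text-2.txt

# -h: never print filenames, even with multiple input files
grep -hE "cat|dog" text-*.txt
# I have a cat.
# I have a dog.
# I don't have a dog.
# I don't have a cat.
```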
Assuming you're using bash.
Try this:
for i in $(seq 1 20) ;do rm -f text-${i}-output.txt ; grep -E "Int_dogs|Int_cats" text-${i}.txt >> text-${i}-output.txt ;done
Details
This one-line script does the following:
Original files are intended to have the following name order/syntax:
text-<INTEGER_NUMBER>.txt - Example: text-1.txt, text-2.txt, ... text-100.txt.
Creates a loop from 1 to <N>, where <N> is the number of files you want to process.
Warning: the rm -f text-${i}-output.txt command runs first to remove any existing output file, ensuring that only a freshly created output file remains at the end of the process.
grep -E "Int_dogs|Int_cats" text-${i}.txt matches both strings in the original file, and >> text-${i}-output.txt redirects all matching lines to an output file numbered after the original. Example: if the integer in the original filename is 5 (text-5.txt), then text-5-output.txt will be created and will contain the matching lines (if any).

how to print an array on the same line

I am reading a file which contains the following config as an array:
$ cat ./FILENAME
*.one.dev.arr.name.com
*.one.dev.brr.name.com
*.one.dev.sic.name.com
*.one.dev.sid.name.com
*.one.dev.xyz.name.com
*.one.dev.yza.name.com
The array is read with:
IFS=$'\n' read -d '' -r -a FILENAME < ./FILENAME
I need the format to be of the following format:
'{*.one.dev.arr.name.com,*.one.dev.brr.name.com,*.one.dev.sic.name.com,*.one.dev.sid.name.com,*.one.dev.xyz.name.com,*.one.dev.yza.name.com}'
I've tried using printf; the tricky part is the wildcard (*) at the start of each name.
Just output what you want to output: { and }, with the lines joined by ,:
echo "{$(paste -sd, FILENAME)}"
If you want with an array, you can just:
echo "{$(IFS=, ; echo "${array[*]}")}"
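A quick check of the array variant; the double quotes around both ${array[*]} and the outer command substitution are what keep the leading * from being glob-expanded:

```shell
# Hypothetical two-element array with the leading wildcards from the question
array=('*.one.dev.arr.name.com' '*.one.dev.brr.name.com')

# IFS=, inside the subshell makes "${array[*]}" join elements with commas;
# the IFS change does not leak out of the $( ... )
echo "{$(IFS=, ; echo "${array[*]}")}"
# {*.one.dev.arr.name.com,*.one.dev.brr.name.com}
```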

split large text file into chunks by lines containing a specific character

I am trying to chunk a large text file (~27 Gb) into a series of smaller files, where the break points are defined by a subheader each of which contains the same symbol (in this case '#').
So the following large file:
#auniquestring
dataline1
dataline2
...
dataline33456
#aseconduniquestring
dataline33458
dataline33459
...
dataline124589
#athirdunqiuestring
dataline124591
dataline124592
...
...becomes:
1st file:
#auniquestring
dataline1
dataline2
...
dataline33456
2nd file:
#aseconduniquestring
dataline33458
dataline33459
...
dataline124589
3rd file:
#athirdunqiuestring
dataline124591
dataline124592
...
etc
I've tried things like sed -n '/#/,/#/p' myfile, but it outputs everything at once and misses the contents of every other subheader. Any help would be much appreciated.
Using awk (NOTICE IT WILL CREATE FILES NAMED file[0-9]+.txt):
$ awk '
BEGIN {
file="file0.txt" # just in case
}
/^#/ { # when record starts with #
close(file) # close previous file
file=sprintf("file%d.txt",++f) # generate next filename
}
{
print > file # output to generated filename
}' file
Sample output:
$ cat file1.txt
#auniquestring
dataline1
dataline2
...
dataline33456
Modern Bash versions can match against regular expressions.
#! /bin/bash
n=1
while read -r line; do
if [[ $line =~ ^# ]]; then
exec >file$((n++))
fi
printf "%s\n" "$line"
done
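A quick check of the bash version in a scratch directory; as written, any lines before the first # header go to the script's original stdout, and the subshell below keeps the exec redirections from leaking into the calling shell:

```shell
cd "$(mktemp -d)"                 # scratch directory
printf '%s\n' '#a' one two '#b' three >big.txt

n=1
(                                 # subshell: exec redirections stay local
while read -r line; do
  if [[ $line =~ ^# ]]; then
    exec >file$((n++))            # stdout now points at the next chunk file
  fi
  printf "%s\n" "$line"
done <big.txt
)
cat file1
# #a
# one
# two
```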

Cat content of files to .txt files with common pattern name in bash

I have a series of .dat files and a series of .txt files that have a common matching pattern. I want to cat the content of the .dat files into each respective .txt file with the matching pattern in the file name, in a loop. Example files are:
xfile_pr_WRF_mergetime_regionA.nc.dat
xfile_pr_GFDL_mergetime_regionA.nc.dat
xfile_pr_RCA_mergetime_regionA.nc.dat
#
yfile_pr_WRF_mergetime_regionA.nc.dat
yfile_pr_GFDL_mergetime_regionA.nc.dat
yfile_pr_RCA_mergetime_regionA.nc.dat
#
pr_WRF_mergetime_regionA_final.txt
pr_GFDL_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
What I have tried so far is the following (I am trying to cat the content of all files starting with "xfile" into the respective model .txt file):
#
find -name 'xfile*' | sed 's/_mergetime_.*//' | sort -u | while read -r pattern
do
echo "${pattern}"*
cat "${pattern}"* >> "${pattern}".txt
done
Let me make some assumptions:
All filenames contain _mergetime_* substring.
The pattern is the portion such as pr_GFDL and is essential to identify the file.
Then would you try the following:
declare -A map # create an associative array
for f in xfile_*.dat; do # loop over xfile_* files
pattern=${f%_mergetime_*} # remove _mergetime_* substring to extract pattern
pattern=${pattern#xfile_} # remove xfile_ prefix
map[$pattern]=$f # associate the pattern with the filename
done
for f in *.txt; do # loop over *.txt files
pattern=${f%_mergetime_*} # extract the pattern
[[ -f ${map[$pattern]} ]] && cat "${map[$pattern]}" >> "$f"
done
If I understood you correctly, you want the following:
- xfile_pr_WRF_mergetime_regionA.nc.dat
- yfile_pr_WRF_mergetime_regionA.nc.dat
----> pr_WRF_mergetime_regionA_final.txt
- xfile_pr_GFDL_mergetime_regionA.nc.dat
- yfile_pr_GFDL_mergetime_regionA.nc.dat
----> pr_GFDL_mergetime_regionA_final.txt
- xfile_pr_RCA_mergetime_regionA.nc.dat
- yfile_pr_RCA_mergetime_regionA.nc.dat
----> pr_RCA_mergetime_regionA_final.txt
So here's what you want to do in the script:
Get all .nc.dat files in the directory
Extract the pr_TYPE_mergetime_region part from the filename
Append the _final.txt part to the output file
Then append the cat output to that file
So I ended up with the following code:
find *.dat | while read -r pattern
do
output=$(echo "$pattern" | sed -e 's![^(pr)]*!!' -e 's!\.nc\.dat!!')
cat "$pattern" >> "${output}_final.txt"
done
And here are the files I ended up with:
pr_GFDL_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
pr_WRF_mergetime_regionA_final.txt
Kindly let me know in the comments if I misunderstood anything or missed anything.
Seems like what you're asking for:
concatxy.sh:
#!/usr/bin/env bash
# do not return the pattern if no file matches
shopt -s nullglob
# Iterate all xfiles
for xfile in "xfile_pr_"*".nc.dat"; do
# Regex to extract the common filename part
[[ "$xfile" =~ ^xfile_(.*)\.nc\.dat$ ]]
# Compose the matching yfile name
yfile="yfile_${BASH_REMATCH[1]}.nc.dat"
# Compose the output text file name
txtfile="${BASH_REMATCH[1]}_final.txt"
# Perform the concatenation of xfile and yfile into the .txt file
cat "$xfile" "$yfile" >"$txtfile"
done
Creating populated test files:
preptest.sh:
#!/usr/bin/env bash
# Populating test files
echo "Content of xfile_pr_WRF_mergetime_regionA.nc.dat" >xfile_pr_WRF_mergetime_regionA.nc.dat
echo "Content of xfile_pr_GFDL_mergetime_regionA.nc.dat" >xfile_pr_GFDL_mergetime_regionA.nc.dat
echo "Content of xfile_pr_RCA_mergetime_regionA.nc.dat" >xfile_pr_RCA_mergetime_regionA.nc.dat
#
echo "Content of yfile_pr_WRF_mergetime_regionA.nc.dat" > yfile_pr_WRF_mergetime_regionA.nc.dat
echo "Content of yfile_pr_GFDL_mergetime_regionA.nc.dat" >yfile_pr_GFDL_mergetime_regionA.nc.dat
echo "Content of yfile_pr_RCA_mergetime_regionA.nc.dat" >yfile_pr_RCA_mergetime_regionA.nc.dat
#
#pr_WRF_mergetime_regionA_final.txt
#pr_GFDL_mergetime_regionA_final.txt
#pr_RCA_mergetime_regionA_final.txt
Running test
$ bash ./preptest.sh
$ bash ./concatxy.sh
$ ls -tr1
concatxy.sh
preptest.sh
yfile_pr_WRF_mergetime_regionA.nc.dat
yfile_pr_RCA_mergetime_regionA.nc.dat
yfile_pr_GFDL_mergetime_regionA.nc.dat
xfile_pr_WRF_mergetime_regionA.nc.dat
xfile_pr_RCA_mergetime_regionA.nc.dat
xfile_pr_GFDL_mergetime_regionA.nc.dat
pr_GFDL_mergetime_regionA_final.txt
pr_WRF_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
$ cat pr_GFDL_mergetime_regionA_final.txt
Content of xfile_pr_GFDL_mergetime_regionA.nc.dat
Content of yfile_pr_GFDL_mergetime_regionA.nc.dat
$ cat pr_WRF_mergetime_regionA_final.txt
Content of xfile_pr_WRF_mergetime_regionA.nc.dat
Content of yfile_pr_WRF_mergetime_regionA.nc.dat
$ cat pr_RCA_mergetime_regionA_final.txt
Content of xfile_pr_RCA_mergetime_regionA.nc.dat
Content of yfile_pr_RCA_mergetime_regionA.nc.dat

how to add a word at the end of a line with ^ without a line break?

I would like to add a string (example: "1565555555") at the end of a particular line in my file.
My .txt file before:
mystrinsdsfssffdfdg
mystrdsfdsfdfffding
mystrsfdsdfsffdfing
mystrdsfdfsdfsffing
Here is my script:
for file in mydirectory/*txt; do
filename=`basename "$file"`
# read each line
while IFS= read -r line
do
old="$IFS"
IFS="^"
set $line
IFS="$old"
count=1
id="2656556655"
sed "s/$line/&^$id/" -i $file #my problem
((count++))
done < "$file"
done
Currently, my result:
mystrinsdsfssffdfdg
^2656556655
mystrdsfdsfdfffding
^2656556655
mystrsfdsdfsffdfing
^2656556655
mystrdsfdfsdfsffing
^2656556655
Expected result :
mystrinsdsfssffdfdg^2656556655
mystrdsfdsfdfffding^2656556655
mystrsfdsdfsffdfing^2656556655
mystrdsfdfsdfsffing^2656556655
Assuming the objective is to append a string (^2656556655) on the end of every line in a given file ...
One sample file:
$ cat mystring.txt
mystrinsdsfssffdfdg
mystrdsfdsfdfffding
mystrsfdsdfsffdfing
mystrdsfdfsdfsffing
One sed solution that appends to the end of every line in the file:
$ sed 's/$/^2656556655/g' mystring.txt
mystrinsdsfssffdfdg^2656556655
mystrdsfdsfdfffding^2656556655
mystrsfdsdfsffdfing^2656556655
mystrdsfdfsdfsffing^2656556655
One benefit of this method is that it replaces (a) the inner loop and a separate sed call for every line of the file with (b) a single sed call making a single pass over the input file. The net result is a noticeable speed-up in the time it takes to process a given file.
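To update the file in place instead of writing to stdout, the same substitution can be given to sed's -i option; the attached-suffix form -i.bak keeps a backup of the original and is accepted by both GNU and BSD sed:

```shell
cd "$(mktemp -d)"                 # scratch directory
printf '%s\n' mystrinsdsfssffdfdg mystrdsfdsfdfffding >mystring.txt

# Edit in place, saving the original as mystring.txt.bak
sed -i.bak 's/$/^2656556655/' mystring.txt
cat mystring.txt
# mystrinsdsfssffdfdg^2656556655
# mystrdsfdsfdfffding^2656556655
```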
