Shell script file takes partial path from parameter file - shell

I have a parameter file(parameter.txt) which contain like below:
SASH=/home/ec2-user/installers
installer=/home/hadoop/path1
And My shell script(temp_pull.sh) is like below:
EPATH=`cat $1|grep 'SASH' -w| cut -d'=' -f2`
echo $EPATH
${EPATH}/data-integration/kitchen.sh -file="$KJBPATH/hadoop/temp/maxtem/temp_pull.kjb"
When I run my temp_pull.sh like below:
./temp_pull.sh parameter.txt
$EPATH gives me correct path, but 3rd line of code takes only partial path.
Error code pasted below:
/home/ec2-user/installers-->out put of 2nd line
/data-integration/kitchen.sh: No such file or directory**2-user/installer** -->out put of 3rd line

There is no need to manually parse the values of the file, because it already contains data in the format variables are defined: var=value.
Hence, if the file is safe enough, you can source the file so that SASH value will be available just by saying $SASH.
Then, you can use the following:
source "$1" # source the file given as first parameter
"$SASH"/data-integration/kitchen.sh -file="$KJBPATH/hadoop/temp/maxtem/temp_pull.kjb"

The problem is file which we were using was copied from windows to UNIX.So delimiter issue are the root cause.
By using dos2unix paramfile.txt we are able to fix the isue.
command:
dos2unix paramfile.txt
This will convert all the delemeter of windows to unix format.

Related

Delete a string in a file using bash script

We have a file which has unformatted xml content in a single line
<sample:text>Report</sample:text><sample:user name="11111111" guid="163g673"/><sample:user name="22222222" guid="aknen1763y82bjkj18"/><sample:user name="33333333" guid="q3k4nn5k2nk53n6"/><sample:user name="44444444" guid="34bkj3b5kjbkq"/><sample:user name="55555555" guid="k4n5k34nlk6n711kjnk5253"/><sample:user name="66666666" guid="1n4k14nknl1n4lb1"/>
If we find a particular string suppose "22222222", i want to remove the entire string that surrounds the matched string. In our case the entire portion around 22222222 i.e., <sample:user name="22222222" guid="aknen1763y82bjkj18"/> should be removed and the file has to be saved.
How can we do it? Please help
You can do it using sed utility by invoking it like this:
sed -i file -e 's/<[^<]*"22222222"[^>]*>//'

Implementation issue in cascading while reading data from hdfs

Suppose I have these files in hdfs directory
500/Customer/part-001
500/Customer/part-002
500/Customer/part-003
Can it be possible to check from which part file the tuple is coming?
Note:I have researched but got nothing.
your question is not very clear.
Let's say your output is in following layout and the delimiter is ';'
id;name;age
1;Jordan;22
2;Nathan;33
and so on
You could use awk or grep or both to get the record
for example, if you want to search for the record Nathan, try the file command
grep -r "Nathan" part*
above command will search for the string "Nathan" and if the string is present in any part file then the first entry (word) in output will be the name of the file.
if you don't want the file name you could use
grep -hr "Nathan" part*
Please be more clear when questioning.
I got answer how to get from which part file tuple file are coming.I solved my problem using code below.
String fileName = flowProcess.getProperty("cascading.source.path").toString();
Thanks,

Modify text file based on file's name, repeat for all files in folder

I have a folder with several files named : something_1001.txt; something_1002.txt; something_1003.txt; etc.
Inside the files there is some text. Of course each file has a different text but the structure is always the same: some lines identified with the string ">TEXT", which are the ones I am interested in.
So my goal is :
for each file in the folder, read the file's name and extract the number between "_" and ".txt"
modify all the lines in this particular file that contain the string ">TEXT" in order to make it ">{NUMBER}_TEXT"
For example : file "something_1001.txt"; change all the lines containing ">TEXT" by ">1001_TEXT"; move on to file "something_1002.txt" change all the lines containing ">TEXT" by ">1002_TEXT"; etc.
Here is the code I wrote so far :
for i in /folder/*.txt
NAME=`echo $i | grep -oP '(?<=something_/).*(?=\.txt)'`
do
sed -i -e 's/>TEXT/>${NAME}_TEXT/g' /folder/something_${NAME}.txt
done
I created a small bash script to run the code but it's not working. There seems to be syntax errors and a loop error, but I can't figure out where.
Any help would be most welcome !
There are two problems here. One is that your loop syntax is wrong; the other is that you are using single quotes around the sed script, which prevents the shell from interpolating your variable.
The grep can be avoided, anyway; the shell has good built-in facilities for extracting the base name of a file.
for i in /folder/*.txt
do
base=${i#/folder/something_}
sed -i -e "s/>TEXT/>${base%.txt}_TEXT/" "$i"
done
The shell's ${var#prefix} and ${var%suffix} variable manipulation facility produces the value of $var with the prefix and suffix trimmed off, respectively.
As an aside, avoid uppercase variable names, because those are reserved for system use, and take care to double-quote any variable whose contents may include shell metacharacters.

Replace last line of XML file

Looking for help creating a script that will replace the last line of an XML file with a tag. I have a few hundred files so I'm looking for something that will process them in a loop. I've managed to rename the files sequentially like this:
posts1.xml
posts2.xml
posts3.xml
etc...
to make it easier to loop through. But I have no idea how to write a script to do this. I'm open to using either Linux or Windows (but i would guess that Linux is better for this kind of task).
So if you want to append a line to every file:
sed -i '$a<YOUR_SHINY_NEW_TAG>' *xml
To replace the last line:
sed -i '$s/.*/<YOUR_SHINY_NEW_TAG>/' *xml
But do note, sed is not the ideal tool to modify xml.
XMLStarlet is a command-line toolkit for performing XML parsing and manipulations. Note that as an XML-aware toolkit, it'll respect XML structure, character encoding and entity substitution.
Check out the ed command to see how to modify documents. You can wrap this in a standard bash loop.
e.g. in a doc consisting of a chain of <elem>s, you can add a following <added>5</added>:
mkdir new
for x in *.xml; do
xmlstarlet ed -a "//elem[count(//elem)]" -t elem -n added -v 5 $x > new/$x
done
Linux way using sed:
To edit the last line of the file in place, you can use sed:
sed -i '$s_pattern_replacement_' filename
To change the whole line to "replacement" use $s_.*_replacement_. Be sure to escape any _'s in replacement with a \.
To loop over files, just use for:
for f in /path/posts*.xml; do sed -i '$s_.*_replacement_' $f; done
This, however, is a dirty way as it's not aware of the XML structure, whereas the XML structure is not affected by newlines. You have to be sure the last line of the files contains exactly what you expect it to.
It makes little to no difference whether you're on Linux, Windows or MacOS
The question is what language do you want to use?
The following is an example in c# (not optimized, but read it as speudocode):
string rootDirectory = #"c:\myfiles";
var files = Directory.GetFiles(rootDirectory, "*.xml");
foreach (var file in files)
{
var lines = File.ReadAllLines(file);
lines[lines.Length - 1] = "whatever you want here";
File.WriteAllLines(file, lines);
}
You can compile this and run it on Windows, Linux, etc..
Or you could do the same in Python.
Of course this method does not actually parse the XML,
but you just wanted to replace the last line right?

ls command in UNIX

I have to ls command to get the details of certain types of files. The file name has a specific format. The first two words followed by the date on which the file was generated
e.g.:
Report_execution_032916.pdf
Report_execution_033016.pdf
Word summary can also come in place of report.
e.g.:
Summary_execution_032916.pdf
Hence in my shell script I put these line of codes
DATE=`date +%m%d%y`
Model=Report
file=`ls ${Model}_execution_*${DATE}_*.pdf`
But the value of Model always gets resolved to 'REPORT' and hence I get:
ls: cannot access REPORT_execution_*032916_*.pdf: No such file or directory
I am stuck at how the resolution of Model is happening here.
I can't reproduce the exact code here. Hence I have changed some variable names. Initially I had used the variable name type instead of Model. But Model is the on which I use in my actual code
You've changed your script to use Model=Report and ${Model} and you've said you have typeset -u Model in your script. The -u option to the typeset command (instead of declare — they're synonyms) means "convert the strings assigned to all upper-case".
-u When the variable is assigned a value, all lower-case characters are converted to upper-case. The lower-case attribute is disabled.
That explains the upper-case REPORT in the variable expansion. You can demonstrate by writing:
Model=Report
echo "Model=[${Model}]"
It would echo Model=[REPORT] because of the typeset -u Model.
Don't use the -u option if you don't want it.
You should probably fix your glob expression too:
file=$(ls ${Model}_execution_*${DATE}*.pdf)
Using $(…) instead of backticks is generally a good idea.
And, as a general point, learn how to Debug a Bash Script and always provide an MCVE (How to create a Minimal, Complete, and Verifiable Example?) so that we can see what your problem is more easily.
Some things to look at:
type is usually a reserved word, though it won't break your script, I suggest you to change that variable name to something else.
You are missing an $ before {DATE}, and you have an extra _ after it. If the date is the last part of the name, then there's no point in having an * at the end either. The file definition should be:
file=`ls ${type}_execution_*${DATE}.pdf`
Try debugging your code by parts: instead of doing an ls, do an echo of each variable, see what comes out, and trace the problem back to its origin.
As #DevSolar pointed out you may have problems parsing the output of ls.
As a workaround
ls | grep `date +%m%d%y` | grep "_execution_" | grep -E 'Report|Summary'
filters the ls output afterwards.
touch 'Summary_execution_032916.pdf'
DATE=`date +%m%d%y`
Model=Summary
file=`ls ${Model}_execution_*${DATE}*.pdf`
worked just fine on
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Part of question:
But the value of Model always gets resolved to 'REPORT' and hence I get:
This is due to the fact that in your script you have exported Model=Report
Part of question:
ls: cannot access REPORT_execution_*032916_*.pdf: No such file or directory
No such file our directory issue is due to the additional "_" and additional "*"s that you have put in your 3rd line.
Remove it and the error will be gone. Though, Model will still resolve to Report
Original 3rd line :
file=`ls ${Model}_execution_*${DATE}_*.pdf`
Change it to
file=`ls ${Model}_execution_${DATE}.pdf`
Above change will resolve the could not found issue.
Part of question
I am stuck at how the resolution of Model is happening here.
I am not sure what you are trying to achieve, but if you are trying to populate the file parameter with file name with anything_exection_someDate.pdf, then you can write your script as
DATE=`date +%m%d%y`
file=`ls *_execution_${DATE}.pdf`
If you echo the value of file you will get
Report_execution_032916.pdf Summary_execution_032916.pdf
as the answer
There were some other scripts which were invoked before the control reaches the line of codes which I mentioned in the question. In one such script there is a code
typeset -u Model
This sets the value of the variable model always to uppercase which was the reason this error was thrown
ls: cannot access REPORT_execution_032916_.pdf: No such file or directory
I am sorry that
i couldn't provide a minimal,complete and verifiable code

Resources