Shell script to parse through a file (csv) and process line by line [duplicate]

This question already has answers here:
Bash: Parse CSV with quotes, commas and newlines
(10 answers)
Closed 2 years ago.
Hi, I need a shell script to parse through the csv file, line by line and then field by field.
The file will look like this:
X1,X2,X3,X4
Y1,Y2,Y3,Y4
I need to extract each of these fields: X1, X2, and so on.
I wrote a script, but it fails when a record spans more than one line.

Here's how I would do it.
First I set the IFS environment variable to tell read that "," is the field separator.
export IFS=","
Given the file "test" containing the data you provided, I can use the following code:
cat test | while read a b c d; do echo "$a:$b:$c:$d"; done
To quickly recap what is going on above: cat test | reads the file and pipes it into the while loop. while runs the code between do and done for as long as read returns true. read reads a line from standard input and splits it into the variables ("a", "b", "c" and "d") according to the value of $IFS. Finally, echo displays the variables we read.
Which gives me the following output
X1:X2:X3:X4
Y1:Y2:Y3:Y4
BTW, the BASH manual is always good reading. You'll learn something new every time you read it.
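Aside: you can also scope IFS to the read call itself, so the separator change does not leak into the rest of the shell session, and -r keeps backslashes in the data literal. A minimal sketch:
while IFS=, read -r a b c d; do
echo "$a:$b:$c:$d"
done < test
Note that plain IFS splitting still cannot handle quoted fields containing commas or newlines; for those cases, see the duplicate question linked above.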

Since eykanal mentioned awk and sed, I thought I'd show how you could use them.
awk -F, 'BEGIN{OFS="\n"}{$1=$1; print}' inputfile
or
sed 's/,/\n/g' inputfile
Then a shell script could process their output:
awk_or_sed_cmd | while read -r field
do
do_something "$field"
done
Of course, you could do the processing within the AWK script:
awk -F, '{for (i=1;i<=NF;i++) do_something($i)}' inputfile
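Here do_something stands for an awk function you would define yourself. A minimal sketch, with a placeholder function body just to make it runnable:
awk -F, '
function do_something(field) { print "processing: " field }
{ for (i = 1; i <= NF; i++) do_something($i) }
' inputfile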

Create the script with vi filename.sh:
#!/bin/sh
echo "INPUT PATTERN"
cat > test    # type the input data, then press Ctrl-D to save it
while IFS=, read -r a b c d
do
echo "pattern shown as $a:$b:$c:$d"
done < test
exit 0
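A session might then look like this (the two data lines are typed by hand, followed by Ctrl-D):
$ sh filename.sh
INPUT PATTERN
X1,X2,X3,X4
Y1,Y2,Y3,Y4
pattern shown as X1:X2:X3:X4
pattern shown as Y1:Y2:Y3:Y4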

Related

Passing input to sed, and sed info to a string

I have a list of files (~1000), one filename per line, in a text file named 'files.txt'.
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed "${1}q;d" files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I use this input script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
"So that it reads the n'th line of my files.txt and feeds that string to my created .C macro."
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .c macro.
Your script grabs the Nth line with a sed one-liner. There are numerous ways to get the Nth line from a text file; the simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed passes -n to avoid printing by default, then prints only line $i. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those would take more memory and perform worse. Stick with the one-liners if you can. :)
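For completeness, an array version might look like this (a sketch assuming bash 4+ for mapfile; as said, it holds the whole file in memory):
mapfile -t lines < files.txt    # lines[0] is line 1 of the file
b=${lines[i-1]}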
To take any of these one-liners and put it into your script, you already have the command substitution notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
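Putting it together, the whole MakeMacro.sh could look something like this (a sketch; the here-document mirrors the one in the question):
#!/bin/sh
# usage: ./MakeMacro.sh N -- writes MyMacro_N.C from line N of files.txt
if expr "$1" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${1}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
cat > "MyMacro_${1}.C" <<EOF
myFile = new TFile("/MYPATHNAME/$b");
EOF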
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh "$i"; done
(Here n stands for the number of lines in files.txt.)
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
i=0
while read -r file
do
((i++))
cat > "MyMacro_${i}.C" <<EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: the EOF delimiter must be unquoted so that $file is expanded, and if you indent the here-document with <<-, the indentation (including on the EOF line) must be tabs.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.

How to process lines read from standard input in a UNIX shell script?

I am stuck on this problem:
I wrote a shell script that gets a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as input to another operation in the script; however, I don't know how to store the file's name in a variable.
The script takes a file from stdin and then runs an awk operation on that same file. Say I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
And I found a way to write it like this:
Min=-1
while read line; do
n=$(echo "$line" | awk -F"$delim" '{print NF}')
if [ $Min -eq -1 ] || [ $n -lt $Min ]; then
Min=$n
fi
done
but it takes a very long time to process; the awk calls seem to take most of the time.
So how can I improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So writing cat /dev/stdin will give you all of the input from your file, and you can avoid using the input filename at all.
Now the answer to the question :) Recursively follow the links, beginning at /dev/stdin, and you will get the filename. Bash code:
r(){
l=`readlink "$1"`
if [ $? -ne 0 ]
then
echo "$1"
else
r "$l"
fi
}
filename=`r /dev/stdin`
echo "$filename"
UPD:
In Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
UPD2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument, it's going to be the wrapping shell's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is that it writes the tmpfile line by line, reopening the file to append every time.
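Concretely, the whole script could reduce to something like this (a sketch; $delim stands for the field-separator variable from the question):
#!/bin/sh
# awk inherits this script's stdin, so ./script < filename still works:
# no read, no temporary file, one awk process for the whole input
awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }'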
Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently: pipe the file into the script, so that the standard output of cat filename becomes the standard input of your script (in this case, of the awk command).
For example, with a file Names.data and a script showNames.sh, execute as follows:
cat Names.data | ./showNames.sh
Contents of filename Names.data
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of script showNames.sh
#!/bin/bash
#whatever awk commands you need
awk "{ print }"
Well, I finally found this way to solve my problem, although it takes several seconds.
grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
Just append each line to a temporary file, so that after reading from stdin the tmpfile has the same contents as the input file.

How to find and replace text in a shell script

I am a newbie to shell scripting. I am trying to find and replace text using a shell script.
I have a text file in which every line holds two strings separated by ":", like this:
lorem:ipsum
dola:meru
etc.
My script takes 2 parameters when run.
The script should check whether the first parameter is found in the file; if it is not found, the script should add it to the text file.
If the first parameter is found, the script should replace the second string on that line with the second parameter.
for example
The text file has data like this
lorem:ipsum
dola:meru
caby:cemu
I am running my script with 2 parameters, like this:
./script.sh lorem meru
So when I run the script, it should check whether the first parameter is found in the file; if it is found, the script should replace the second string.
i.e
I ran the script like this
./script.sh lorem meru
so in the file
lorem:ipsum
dola:meru
caby:cemu
after running the script, in the line
lorem:ipsum
should get replaced to
lorem:meru
Here is what I have tried:
#!/bin/sh
#
FILE_PATH=/home/script
FILE_NAME=$FILE_PATH/new.txt
echo $1
echo $2
if grep -q "^$1:" "$FILE_NAME"; then    # is $1 already present as a first field?
sed -i "s/^$1:.*/$1:$2/" "$FILE_NAME"   # found: replace the second field
else
echo "$1:$2" >> "$FILE_NAME"            # not found: append a new line
fi
Using sed might be simpler: on lines beginning with "$pattern:", replace everything from the colon onward with ":$word".
$ cat inputfile
lorem:ipsum
dola:meru
caby:cemu
$ pattern="lorem"
$ word="meru"
$ sed "/^$pattern:/ s/:.*/:$word/" inputfile
lorem:meru
dola:meru
caby:cemu
Try this line in your script (the trailing 7 is simply a true condition, so awk applies the default action and prints every line):
awk -F: -v OFS=":" -v s="$1" -v r="$2" '$1==s{$2=r}7' file > newFile
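For instance, script.sh could wrap that line like so (a sketch; file and newFile stand in for your actual file names):
#!/bin/sh
# $1 = string to match in the first field, $2 = replacement second field
awk -F: -v OFS=":" -v s="$1" -v r="$2" '$1==s{$2=r}7' file > newFile
mv newFile file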
You can try this as well,
input file : new.txt
lorem:ipsum
dola:meru
caby:cemu
Script file: script.sh
#!/bin/sh
sed -i "s/$1:.*/$1:$2/" new.txt
If you run the script as you wished, ./script.sh lorem meru
Output: new.txt will contain
lorem:meru
dola:meru
caby:cemu
Explanation:
sed is a powerful text-processing tool; you can manipulate text however you wish with the options it provides.
A brief explanation of the code:
sed -i replaces the contents of the original file with the new changes (without -i, sed just prints the modified contents without touching the original file).
"s/old:.*/old:new/"
"s" is used for substitution. The syntax is "s/old/new/".
Here ".*" matches everything that comes after the ":".
So by executing this you will get the desired output.

How to combine two text files into one with a specific format with shell script?

I'm trying to combine two files into one file with a specific format; the files contain the following:
fileA.txt:
2
1
1
...
fileB.txt:
0023412322
1241231132
1234411177
...
So the output should be:
fileC.txt:
A B
2 0023412322
1 1241231132
1 1234411177
...
where A and B are the column names and form the first line of the output file.
The script has to run on Solaris. I'm having trouble with awk, and I am not allowed to change or install anything on the system. I have a solution using a loop, but it is not very efficient: the script takes too long with large files. So, aside from awk and loops, any suggestions?
I never got an awk version to work, so all I have is the loop:
echo "A B" > fileC.txt
i=echo 1
for line in cat fileA.txt
do
VAR=`sed -n "$i"',1p' fileB.txt`
echo "$line $VAR" >> fileC.txt
echo "$VAR" >> file"$line".txt #Another neccesary command for the script
i=`expr $i + 1`
done
What changes should I make?
paste is a very handy program that does almost exactly what you want, short of printing out the filenames or writing to a file. I wrote this simple shell script to add the filenames:
echo -e "$1\t$2" # print the filenames separated by a tab
paste "$1" "$2"
You can run this by using chmod to make it executable, then running ./myscript file1 file2 (assuming that you name the script myscript). If you want to write to a third file, you can do ./myscript file1 file2 > file3.
Note that as written, the contents of each file are separated by tabs. If you want them to instead be separated by spaces, you can use this script instead:
echo "$1 $2"
paste -d" " $1 $2

reading a file line by line from a shell script

I am facing a problem in a bash shell script. The script is supposed to execute another shell script (./script here), with the output redirected to a file (tmp). Then that file should be read line by line; for each line, the same script (./script) should be executed with the line as argument, and the results stored in a file (tmp1). Eventually these results should be appended to the first file (tmp).
I am pasting my script below:
./script $1 $2 > tmp
cat tmp | while read a
do
./script $a $2 >> tmp1
done
cat tmp1 | while read line
do
./script $line $2 >> tmp
done
I get the following error when I execute the script: "./script: line 11: syntax error: unexpected end of file"
Can anyone please help me out with this?
Thanks a lot in advance.
The script file has DOS/Windows line endings. Try this command, then run your script:
dos2unix ./script
You're probably editing the file using a Windows editor and that's adding \r (0x0d) to the end of each line. This can be removed by using dos2unix.
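To confirm that this is the problem, you can make the carriage returns visible (a quick check; a trailing ^M on each line means DOS line endings):
cat -v ./script | head -n 3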
The shell splits on whitespace normally. You can make it split on newlines by writing, towards the top,
IFS='
'
However, I think what you're trying to do might not be appropriate for shell, and might be better served by Perl or Ruby.
Lose all the cats! They are unnecessary. And I suppose summ_tmp is an existing file?
#!/bin/bash
set -x
./wiki $1 $2 > tmp
while read -r a
do
./wiki $a $2 >> tmp1
done < summ_tmp
while read -r line
do
./wiki $line $2 >> tmp
done < tmp1
With what you are doing, you might want to refactor your "./script" to eliminate unnecessary steps. If it's not too long, show what your "./script" does. Show your desired output, and give examples of relevant input files where possible.
Put set -x in your script (wiki and ./script) to help you debug.
Alternatively you could use xargs - this executes a command on every line in a file, which is exactly what you want.
You can replace
cat tmp1 | while read line
do
./script $line $2 >> tmp
done
with
xargs <tmp1 -n1 -IXXX ./script XXX $2 >>tmp
-n1 means read one line at a time from the input,
-IXXX means substitute XXX with the line that was read in - the default is to append it to the end of the command line.
