how to map one csv file content to second csv file and write it another csv using unix - shell

After writing some unix scripts I am able to manage to get data from different xml files to csv format and now I got stuck with the following problem
file1.csv : contains
1,5,6,7,8
2,3,4,5,9
1,6,10,11,12
1,5,11,12
file2.csv : contains
1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo
2,Franky
3,Sam
4,Sumo
5,Mango,Tuna,Webby
6,Tuna,Webby,Through
7,Through,Sam,Sumo
8,Nothing
9,Sam,Sumo
10,Sumo,Mango,Tuna
11,Mango,Tuna,Webby,Through
12,Mango,Tuna,Webby,Through,Franky
output I want is
1,5,6,7,8
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Tuna,Webby,Through
Through,Sam,Sumo
Nothing
Common word:None
2,3,4,5,9
Franky
Sam
Sumo
Mango,Tuna,Webby
Sam, Sumo
Common Word:None
1,6,10,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Tuna,Webby,Through
Sumo,Mango,Tuna
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Tuna
1,5,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Mango,Tuna,Webby
I apprecaite any help.
Thanks
I got some solution but not complete
##!/bin/bash
count=1
count_2=1
for i in `cat file1.csv`
do
echo $i > $count.txt
cat $count.txt | tr "," "\n" > $count_2.txt
count=`expr $count + 1`
count_2=`expr $count_2 + 1`
done;
#this code will create separte files for each line in file1.csv,
bash file3_search.sh
##########################
file3_search.sh
================
##!/bin/bash
cat file2.csv | sed '/^$/d' | sed 's/[ ]*$//' > trim.txt
dos2unix -q 1.txt 1.txt
dos2unix 2.txt 2.txt
dos2unix 3.txt 3.txt
echo "1st Combination results"
for i in `cat 1.txt`
do
cat trim.txt | egrep -w $i
done > Combination1.txt;
echo "2nd Combination results"
for i in `cat 2.txt`
do
cat trim.txt | egrep -w $i
done > Combination2.txt;
echo "3rd Combination results"
for i in `cat 3.txt`
do
cat trim.txt | egrep -w $i
done > Combination3.txt;
Guys I am not good at programming (I am software tester) please someone can re-factor my code and also please tell me how to get the common word in those Combination.txt file

IMHO it works:
for line in $(cat 1.csv) ; do
echo $line ;
grepline=`echo $line | sed 's/ \+//g;s/,/,|/g;s/^\(.*\)$/^(\1,)/'`;
egrep $grepline 2.csv
egrep $grepline 2.csv | \
awk -F "," '
{ for (i=2;i<=NF;i++)
{s[$i]+=1}
}
END { for (key in s)
{if (s[key]==NR) { tp+=key "," }
}
if (tp!="") {print "Common word(s): " gensub(/,$/,"","g",tp)}
else {print "Common word: None"}}'
echo
done
HTH

Here's an answer for you. It depends on associative array capabilities of bash version 4:
IFS=,
declare -a words
# read and store the words in file2
while read line; do
set -- $line
n=$1
shift
words[$n]="$*"
done < file2.csv
# read file1 and process
while read line; do
echo "$line"
set -- $line
indexes=( "$#" )
NF=${#indexes[#]}
declare -A common
for (( i=0; i<$NF; i++)); do
echo "${words[${indexes[$i]}]}"
set -- ${words[${indexes[$i]}]}
for word; do
common[$word]=$(( ${common[$word]} + 1))
done
done
printf "Common words: "
n=0
for word in "${!common[#]}"; do
if [[ ${common[$word]} -eq $NF ]]; then
printf "%s " $word
(( n++ ))
fi
done
[[ $n -eq 0 ]] && printf "None"
unset common
printf "\n\n"
done < file1.csv

Related

Bash - Extract Matching String from GZIP Files Is Running Very Slow

Complete novice in Bash. Trying to iterate thru 1000 gzip files, may be GNU parallel is the solution??
#!/bin/bash
ctr=0
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
for f in "$dir"/*.gz; do
gunzip -c $f | while read line;
do
str=`echo $line | cut -d"|" -f1`
if [ "$str" == "H" ]; then
if [ $ctr -gt 0 ]; then
echo "$f,$sym,$ctr" >> $1
fi
ctr=0
sym=`echo $line | cut -d"|" -f3`
echo $sym
else
ctr=$((ctr+1))
fi
done
done
Any help to speed the process will be greatly appreciated !!!
#!/bin/bash
ctr=0
export ctr
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
export dir
doit() {
f="$1"
gunzip -c $f | while read line;
do
str=`echo $line | cut -d"|" -f1`
if [ "$str" == "H" ]; then
if [ $ctr -gt 0 ]; then
echo "$f,$sym,$ctr"
fi
ctr=0
sym=`echo $line | cut -d"|" -f3`
echo $sym >&2
else
ctr=$((ctr+1))
fi
done
}
export -f doit
parallel doit ::: *gz 2>&1 > $1
The Bash while read loop is probably your main bottleneck here. Calling multiple external processes for simple field splitting will exacerbate the problem. Briefly,
while IFS="|" read -r first second third rest; do ...
leverages the shell's built-in field splitting functionality, but you probably want to convert the whole thing to a simple Awk script anyway.
echo "file_name,symbol,record_count" > "$1"
for f in "/data/myfolder"/*.gz; do
gunzip -c "$f" |
awk -F "\|" -v f="$f" -v OFS="," '
/H/ { if(ctr) print f, sym, ctr
ctr=0; sym=$3;
print sym >"/dev/stderr"
next }
{ ++ctr }'
done >>"$1"
This vaguely assumes that printing the lone sym is just for diagnostics. It should hopefully not be hard to see how this can be refactored if this is an incorrect assumption.

Need to add space at the end of each line using Unix shell script

I need to add space at the end of each line except the header lines.Below is the example of my file:
13120000005000002100000000000000000000081D000
231200000000000 000 00XY018710V000000000
231200000000000 000 00XY018710V000000000
13120000012000007000000000000000000000081D000
231200000000000 000 00XY057119V000000000
So 1st & 4th line(starting with 131200 ) is my header line...Except my header I want 7-8spaces at the end of each line.
Please find the code that I am currently using:
find_list=`find *.dat -type f`
Filename='*.dat'
filename='xyz'
for file in $find_list
do
sed -i -e 's/\r$/ /' "$file"
n=1
loopcounterpre=""
newfile=$(echo "$filename" | sed -e 's/\.[^.]*$//')".dat"
while read line
do
if [[ $line != *[[:space:]]* ]]
then
rowdetail=$line
loopcounter=$( echo "$rowdetail" | cut -b 1-6)
if [[ "$loopcounterpre" == "$loopcounter" ]]
then
loopcounterpre=$loopcounter
#Increases the counter for in the order of 001,002 and so on until the Pay entity is changed
n=$((n+1))
#Resets the Counter to 1 when the pay entity changes
else
loopcounterpre=$loopcounter
n=1
fi
printf -v m "%03d" $n
llen=$(echo ${#rowdetail})
rowdetailT=$(echo "$rowdetail" | cut -b 1-$((llen-3)))
ip=$rowdetailT$m
echo "$ip" >> $newfile
else
rowdetail=$line
echo "$rowdetail" >> $newfile
fi
done < $file
bye
EOF
done
The entire script can be replaced with one line of GNU sed:
sed -is '/^131200\|^1351000/!s/$/ /' $(find *.dat -type f)
Using awk:
$ awk '{print $0 ($0~/^(131200|1351000)/?"":" ")}' file
print current record $0 and if it starts with $0~/^(131200|1351000)/ print "" else : print " ".

Reverse the words but keep the order Bash

I have a file with lines. I want to reverse the words, but keep them in same order.
For example: "Test this word"
Result: "tseT siht drow"
I'm using MAC, so awk doesn't seem to work.
What I got for now
input=FILE_PATH
while IFS= read -r line || [[ -n $line ]]
do
echo $line | rev
done < "$input"
Here is a solution that completely avoids awk
#!/bin/bash
input=./data
while read -r line ; do
for word in $line ; do
output=`echo $word | rev`
printf "%s " $output
done
printf "\n"
done < "$input"
In case xargs works on mac:
echo "Test this word" | xargs -n 1 | rev | xargs
Inside your read loop, you can just iterate over the words of your string and pass them to rev
line="Test this word"
for word in "$line"; do
echo -n " $word" | rev
done
echo # Add final newline
output
tseT siht drow
You are actually in fairly good shape with bash. You can use string-indexes and string-length and C-style for loops to loop over the characters in each word building a reversed string to output. You can control formatting in a number of ways to handle spaces between words, but a simple flag first=1 is about as easy as anything else. You can do the following with your read,
#!/bin/bash
while read -r line || [[ -n $line ]]; do ## read line
first=1 ## flag to control space
a=( $( echo $line ) ) ## put line in array
for i in "${a[#]}"; do ## for each word
tmp= ## clear temp
len=${#i} ## get length
for ((j = 0; j < len; j++)); do ## loop length times
tmp="${tmp}${i:$((len-j-1)):1}" ## add char len - j to tmp
done
if [ "$first" -eq '1' ]; then ## if first word
printf "$tmp"; first=0; ## output w/o space
else
printf " $tmp" ## output w/space
fi
done
echo "" ## output newline
done
Example Input
$ cat dat/lines2rev.txt
my dog has fleas
the cat has none
Example Use/Output
$ bash revlines.sh <dat/lines2rev.txt
ym god sah saelf
eht tac sah enon
Look things over and let me know if you have questions.
Using rev and awk
Consider this as the sample input file:
$ cat file
Test this word
Keep the order
Try:
$ rev <file | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
tseT siht drow
peeK eht redro
(This uses awk but, because it uses no advanced awk features, it should work on MacOS.)
Using in a script
If you need to put the above in a script, then create a file like:
$ cat script
#!/bin/bash
input="/Users/Anastasiia/Desktop/Tasks/test.txt"
rev <"$input" | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
And, run the file:
$ bash script
tseT siht drow
peeK eht redro
Using bash
while read -a arr
do
x=" "
for ((i=0; i<${#arr}; i++))
do
((i == ${#arr}-1)) && x=$'\n'
printf "%s%s" $(rev <<<"${arr[i]}") "$x"
done
done <file
Applying the above to our same test file:
$ while read -a arr; do x=" "; for ((i=0; i<${#arr}; i++)); do ((i == ${#arr}-1)) && x=$'\n'; printf "%s%s" $(rev <<<"${arr[i]}") "$x"; done; done <file
tseT siht drow
peeK eht redro

copy text in another file and append different strings shell script

file=$2
isHeader=$true
while read -r line;
do
if [ $isHeader ]
then
sed "1i$line",\"BATCH_ID\"\n >> $file
else
sed "$line,1"\a >> $file
fi
isHeader=$false
done < $1
echo $file
In the first line I want to append a string and to the others lines I want to append the same string for the rest of the lines. I tried this but it doesn't work. I don't have any ideas, can somebody help me please?
Not entirely clear to me what you want to do, but if you simply want to append text at the end of each line, use echo in place of sed:
file=$2
isHeader=1
while read -r line;
do
if [ $isHeader ]
then
#sed "1i$line",\"BATCH_ID\"\n >> $file
echo "${line},\"BATCH_ID\"\n" > $file
else
#sed "$line,1"\a >> $file
echo "${line},1\a" >> $file
fi
isHeader=0
done < $1
cat $file
The accepted answer is slightly wrong because echo...\a produces a bell. Also, awk or sed support regular expressions and are 10x faster at line-by-line processing. Here it is in awk:
#! /bin/sh
script='NR == 1 { print $0 ",\"BATCH_ID\"" }
NR > 1 { print $0 ",1" }'
awk "$script" $1 > $2
In sed it's even simpler:
sed '1 s/$/,"BATCH_ID"/; 2,$ s/$/,1/' $1 > $2
To convince yourself of the speed, try this yourself:
$ time seq 100000 | while read f; do echo ${f}foo; done > /dev/null
real 0m2.068s
user 0m1.708s
sys 0m0.364s
$ time seq 100000 | sed 's/$/foo/' > /dev/null
real 0m0.166s
user 0m0.156s
sys 0m0.017s

bash, adding string after a line

I'm trying to put together a bash script that will search a bunch of files and if it finds a particular string in a file, it will add a new line on the line after that string and then move on to the next file.
#! /bin/bash
echo "Creating variables"
SEARCHDIR=testfile
LINENUM=1
find $SEARCHDIR* -type f -name *.xml | while read i; do
echo "Checking $i"
ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
echo "found $i"
cat $i | while read LINE; do
((LINENUM=LINENUM+1))
if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
echo "editing $i"
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
fi
done
fi
LINENUM=1
done
the bit I'm having trouble with is
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
if I just use $i at the end, it will output the content to the screen, if I use $i > $i then it will just erase the file and if I use $i >> $i it will get stuck in a loop until the disk fills up.
any suggestions?
Unfortunately awk dosen't have an in-place replacement option, similar to sed's -i, so you can create a temp file and then remove it:
awk '{commands}' file > tmpfile && mv tmpfile file
or if you have GNU awk 4.1.0 or newer, the -i inplace is added, so you can do:
awk -i inplace '{commands}' file
to modify the original
#cat $i | while read LINE; do
# ((LINENUM=LINENUM+1))
# if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
# echo "editing $i"
# awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
# fi
# done
# replaced by
sed -i 's/STRING_TO_SEARCH_FOR/&\n/g' ${i}
or use awk in place of sed
also
# ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
# if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
#by
if [ $( grep -c 'STRING_TO_SEARCH_FOR' ${i} ) -gt 0 ]; then
# if file are huge, if not directly used sed on it, it will be faster (but no echo about finding the file)
If you can, maybe use a temporary file?
~$ awk ... $i > tmpfile
~$ mv tmpfile $i
Or simply awk ... $i > tmpfile && mv tmpfile $i
Note that, you can use mktemp to create this temporary file.
Otherwise, with sed you can insert a line right after a match:
~$ cat f
auie
nrst
abcd
efgh
1234
~$ sed '/abcd/{a\
new_line
}' f
auie
nrst
abcd
new_line
efgh
1234
The command search if the line matches /abcd/, if so, it will append (a\) the line new_line.
And since sed as the -i to replace inline, you can do:
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
echo "found $i"
echo "editing $i"
sed -i "/STRING_TO_SEARCH_FOR/{a
\new line to insert
}" $i
fi

Resources