Writing to CSV using bash sed not working as expected - bash

I'm having some trouble getting this code to work, and I have no idea why it's failing. Maybe one of you gurus can lend me a hand.
To begin with I have two CSV files structured as such:
Book1.csv:
Desc,asset,asset name,something,waiver,waiver name,init date,wrong date,blah,blah,target
akdhfa,2014,adskf,kadsfjh,123-4567,none,none,none,none,none,BOOP
Book2.csv
Desc,asset,asset name,something,waiver,waiver name,init date,wrong date,blah,blah,target
akdhfa,2014,adskf,kadsfjh,123-4567,none,none,none,none,none
(Note the missing "BOOP" in the second file.)
What I want is to scan Book1.csv for column 11. If it's there, find the matching row in Book2.csv based on asset and waiver. Then simply append the target to that line.
Here's what I've tried so far:
#!/bin/bash
oldIFS=IFS
IFS=$'\n'
HOME=($(cat Book1.csv))
for i in "${HOME[@]}"
do
target=`echo $i | cut -d "," -f 11`
asset=`echo $i | cut -d "," -f 2`
waiv=`echo $i | cut -d "," -f 5`
if [ "$target" != "target" ]
then
sed -i '/*${asset}*${waiv}*/s/$/,${target}/' Book2.csv
fi
done
IFS=oldIFS
Everything seems to be working except for the sed command. Any suggestions?

You are using
sed -i '/*${asset}*${waiv}*/s/$/,${target}/' Book2.csv
which means that the variables are not expanded (the ' quotes "hide" them).
Also, the * needs something in front of it; you probably meant .* (otherwise you are asking for "any number of repeats of the last character in asset", and so on).
Just change it to
sed -i "/.*${asset}.*${waiv}.*/s/$/,${target}/" Book2.csv
Now the variables will be replaced with their value before the sed command runs, and the quantifier (*) should work properly, as it has something to quantify (.)...
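To see the difference the quoting makes, here is a minimal sketch. It assumes a scratch copy of Book2.csv built from the sample row in the question and hard-coded values for asset, waiv and target (all illustrative), and it prints to stdout instead of using -i so nothing is modified in place.

```shell
# Scratch directory so no real files are touched (sample data from the question).
workdir=$(mktemp -d)
printf '%s\n' \
    'Desc,asset,asset name,something,waiver,waiver name,init date,wrong date,blah,blah,target' \
    'akdhfa,2014,adskf,kadsfjh,123-4567,none,none,none,none,none' > "$workdir/Book2.csv"

# Illustrative values; in the original script they come from Book1.csv.
asset=2014
waiv=123-4567
target=BOOP

# Single quotes: sed sees the literal text ${asset}, so no line matches.
single=$(sed '/.*${asset}.*${waiv}.*/s/$/,${target}/' "$workdir/Book2.csv" | tail -n 1)

# Double quotes: the shell substitutes the values before sed runs.
# The end-of-line anchor is written \$ so the shell leaves it alone.
double=$(sed "/.*${asset}.*${waiv}.*/s/\$/,${target}/" "$workdir/Book2.csv" | tail -n 1)

printf 'single quotes: %s\n' "$single"
printf 'double quotes: %s\n' "$double"
rm -r "$workdir"
```

Only the double-quoted variant should append the target to the data row. Once the output looks right, adding -i back applies the change to the file.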

You're using single quotes, which inhibit variable expansion. Change to double quotes.
This awk might be tidier:
awk -F, -v OFS=, '
NR == FNR {boop[$2,$5] = $11; next}
NF != 11 {$11 = boop[$2,$5]}
{print}
' Book1.csv Book2.csv > tmpfile && mv tmpfile Book2.csv
awk has no -i option like sed's, hence the temporary file. (GNU awk 4.1 and later can edit in place with -i inplace.)

Related

Linux command echo files names until char

Here is my code
cd /bin/
echo *xyz?2* | cut -f 1 -d '.'
Please, how can I change this command to display the file names without their extensions?
Thanks.
Dump the filenames into an array and then use parameter expansion:
$ arr=(xyz?2*); echo "${arr[*]%.*}"
xyz32281 xyz32406 xyz32459 xyz3252 xyz7214 xyz8286
Assuming your filenames don't have any whitespace or glob characters.
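If whitespace in the names is a concern, a loop that applies the same parameter expansion to one name at a time avoids the problem. This is only a sketch: the three file names are made up for the demonstration, and it runs in a scratch directory rather than /bin.

```shell
# Scratch directory with hypothetical file names, one of them containing spaces.
workdir=$(mktemp -d)
touch "$workdir/xyz32281.txt" "$workdir/xyz3252.log" "$workdir/xyz 2 copy.dat"

names=$(
    for f in "$workdir"/*xyz?2*; do
        base=${f##*/}                 # drop the directory part
        printf '%s\n' "${base%.*}"    # drop the last .extension, if any
    done
)
printf '%s\n' "$names"
rm -r "$workdir"
```

Because each name is handled as a single quoted word, spaces survive; newlines in file names would still confuse anything that reads the output line by line.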
You can just use printf '%s\n' instead of echo in your command:
printf '%s\n' *xyz?2* | cut -f 1 -d '.'
xyz32281
xyz32406
xyz32459
xyz3252
xyz7214
xyz8286
If you must use echo, then use awk like this:
echo *xyz?2* | awk '{for(i=1; i<=NF; i++) print (split($i, a, /\./)==2 ? a[1] : $i)}'
xyz32281
xyz32406
xyz32459
xyz3252
xyz7214
xyz8286
This awk command iterates over each filename matched by the glob pattern and splits it on dots. If the name contains exactly one dot, the part before it is printed; otherwise the full filename is printed.
Your problem is that echo *xyz?2* prints all the filenames on one line. If the filenames contain no spaces or newlines, you can fix this by putting each name on its own line and joining them again when finished.
echo *xyz?2* | tr ' ' '\n' | cut -f 1 -d '.' | tr '\n' ' '| sed '$s/ $/\n/'
You can do this a lot more easily with sed:
echo *xyz?2* | sed 's/[.][^. ]*//g'

Script returned '/usr/bin/awk: Argument list too long' in using -v in awk command

Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but not when I tried to get data from two or more files, like this:
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error.
/usr/bin/awk: Argument list too long
From what I found, the error is caused not by the number of files but by the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.
You could use an environment variable to pass the data to awk. In awk the environment variables are accessible via an array ENVIRON.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As @hek2mgl alludes in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
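Here is a small end-to-end sketch of that approach. The three CSV files and their contents are invented for the demonstration, and the ids are piped in on standard input (named - on the awk command line) instead of through <(...), which behaves the same way here but also works in shells without process substitution.

```shell
# Hypothetical sample data: two id files and one input file, each with a header row.
workdir=$(mktemp -d)
printf '%s\n' 'id,name' '2,a' '9,b'      > "$workdir/file1.csv"
printf '%s\n' 'id,name' '3,c'            > "$workdir/file2.csv"
printf '%s\n' 'col,id' 'x,2' 'y,5' 'z,3' > "$workdir/input.csv"

result=$(
    cut -d, -f1 "$workdir/file1.csv" "$workdir/file2.csv" |
    awk -F, '
        # First input (stdin): remember every id as an array key.
        FNR == NR { ids[$1]; next }
        # Second input: skip its header, then do an exact lookup on field 2.
        FNR > 1   { print $0 "," (($2 in ids) ? "true" : "false") }
    ' - "$workdir/input.csv"
)
printf '%s\n' "$result"
rm -r "$workdir"
```

The ids never appear on the awk command line, so their number no longer matters for the argument-length limit. A side effect worth knowing: `$2 in ids` is an exact match, whereas index() also matches substrings (id 2 inside 12, for example).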
There are 2 problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific; in other awks you need to leave a space between -v and data=. So if you aren't running gawk, I don't know what your awk will make of that statement, but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind by leaving them unquoted. If $input_file contains globbing chars or spaces, leaving it unquoted will cause it to be expanded into potentially multiple files/args.
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}' "$input_file" >> "$output_file"

I want to re-arrange a file in an order in shell

I have a file test.txt like the one below, with spaces between the records:
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
Now I want to rearrange it as below into an output.txt:
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service/2.2
I've tried:
final_str=''
COMPOSITES=''
# Re-arranging the composites and preparing the composite property file
while read line; do
partition_val="$(echo $line | cut -d ',' -f 2)"
composite_temp1_val="$(echo $line | cut -d ',' -f 1)"
composite_val="$(echo $composite_temp1_val | cut -d '[' -f 1)"
version_temp1_val="$(echo $composite_temp1_val | cut -d '[' -f 2)"
version_val="$(echo $version_temp1_val | cut -d ']' -f 1)"
final_str="$partition_val/$composite_val/$version_val,"
COMPOSITES=$COMPOSITES$final_str
done <./temp/test.txt
We start with the file:
$ cat test.txt
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
We can rearrange that file as follows:
$ awk -F, -v RS=" " 'BEGIN{printf "COMPOSITES=";} {gsub(/[[]/, "/"); gsub(/[]]/, ""); if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1;}' test.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service2/2.2
The same command split over multiple lines is:
awk -F, -v RS=" " '
BEGIN{
printf "COMPOSITES=";
}
{
gsub(/[[]/, "/")
gsub(/[]]/, "")
if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1
}
' test.txt
Here's what I came up with.
awk -F '[],[]' -v RS=" " 'BEGIN{printf("COMPOSITES=")}/../{printf("%s/%s/%s,",$4,$1,$2);}' test.txt
Broken out for easier reading:
awk -F '[],[]' -v RS=" " '
BEGIN {
printf("COMPOSITES=");
}
/../ {
printf("%s/%s/%s,",$4,$1,$2);
}' test.txt
More detailed explanation of the script:
-F '[],[]' - use commas or square brackets as field separators
-v RS=" " - use just the space as a record separator
'BEGIN{printf("COMPOSITES=")} - starts your line
/../ - run the following code on any record that has at least two characters. This skips the near-empty final record left over when the line ends with a space.
printf("%s/%s/%s,",$4,$1,$2); - print the elements using a printf() format string that matches the output you specified.
As concise as this is, the format string does leave a trailing comma at the end of the line. If this is a problem, it can be avoided with a bit of extra code.
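One possible version of that "bit of extra code" is to print the separator before each item instead of after it, using a variable that is empty the first time round. A sketch, assuming a shortened copy of the sample line (the variable name sep is my own):

```shell
# Scratch copy of a shortened sample line from the question.
workdir=$(mktemp -d)
printf '%s\n' 'service[1.1],parttion, service[1.2],parttion, service2[2.2],parttion,' \
    > "$workdir/test.txt"

result=$(awk -F '[],[]' -v RS=' ' '
    BEGIN { printf "COMPOSITES=" }
    # sep is empty for the first record and a comma afterwards,
    # so no comma is ever printed after the last item.
    /../  { printf "%s%s/%s/%s", sep, $4, $1, $2; sep = "," }
    END   { print "" }
' "$workdir/test.txt")
printf '%s\n' "$result"
rm -r "$workdir"
```

The END block only adds the final newline, so the output is a complete line that can be redirected straight into output.txt.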
You could also do this in sed, if you like writing code in line noise.
sed -e 's:\([^[]*\).\([^]]*\).,\([^,]*\), :\3/\1/\2,:g;s/^/COMPOSITES=/;s/,$//' test.txt
Finally, if you want to avoid external tools like sed and awk, you can do this in bash alone:
a=($(<test.txt))
echo -n "COMPOSITES="
for i in "${a[@]}"; do
i="${i%,}"
t="${i%]*}"
printf "%s/%s/%s," "${i#*,}" "${i%[*}" "${t#*[}"
done
echo ""
This slurps the contents of test.txt into an array, which means your input data must be separated by whitespace, per your example. It then adds the prefix, then steps through the array, using Parameter Expansion to massage the data into the fields you need. The last line (echo "") is helpful for testing; you may want to eliminate it in practice.
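The parameter expansions are easier to follow on a single record, one step at a time. In this sketch the variable names other than i and t are mine, and the brackets are backslash-escaped to make it explicit that they are literal characters:

```shell
record='service[1.1],parttion,'

i=${record%,}        # drop the trailing comma      -> service[1.1],parttion
partition=${i#*,}    # drop everything up to the ,  -> parttion
name=${i%%\[*}       # drop from the [ to the end   -> service
t=${i%]*}            # drop from the ] to the end   -> service[1.1
version=${t#*\[}     # drop everything up to the [  -> 1.1

result="$partition/$name/$version"
printf '%s\n' "$result"
```

Here # and % remove the shortest matching prefix or suffix, while ## and %% remove the longest.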

copy files from mount point listed in a csv

I need to move over 100,000 images from one server to another via a mount point. I have a .csv listing them and I'm looking to script it.
The CSV looks like this:
"images1\002_0001\thumb",53717902.jpg,/www/images/002_0001/thumb/
"images1\002_0001\thumb",53717901.jpg,/www/images/002_0001/thumb/
"images1\002_0001\thumb",53717900.jpg,/www/images/002_0001/thumb/
Comma-separated, we have the source, the file name, and the destination.
I was thinking of using awk to store each as a variable:
SOURCE=`awk -F ',' '{ print $1 }' test.csv`
IMGNAME=`awk -F ',' '{ print $2 }' test.csv`
DEST=`awk -F ',' '{ print $3 }' test.csv`
This is where I'm getting stuck, my loop:
while read line
do
cp $SOURCE${IMGNAME} $DEST
done <test.csv
This has copied the first name it finds into all the directories.
You could use what you have and move the variable declaration into the loop referencing $line, or you could use IFS, as suggested below.
while IFS=, read -r src filename dest
do
cp $src${filename} $dest
done <test.csv
There are many ways to do it; here are some examples.
If the directory names contain no spaces, you can even do it directly from the shell:
sed -E 's/"/cp /; s/",/\// ; s/,/ /;s/\\/\//g' test.csv | /bin/bash
It is better to check the generated commands before you run them. You are talking about a lot of files...
sed -E 's/"/cp /; s/",/\// ; s/,/ /;s/\\/\//g' test.csv | less
Directory names may contain spaces, like My Windows Like Dir Name. In that case you need double quotes (which may be why the CSV has them in the first place).
You can do it using only awk (again from the shell):
awk -F',' '{gsub(/"/, "", $1); gsub(/\\/, "/", $1); print "cp \""$1"/" $2"\" \"" $3"\""}' test.csv | /bin/bash
or, equivalently:
awk -F',' '{gsub(/"/, "", $1); gsub(/\\/, "/", $1); printf ("cp \"%s/%s\" \"%s\"\n",$1,$2,$3)}' test.csv | /bin/bash
Always check the output in advance by leaving off the final pipe | /bin/bash, perhaps replacing it with | head -n 10 to see only the first 10 lines.
The script can be written:
while IFS=, read -r SOURCE IMGNAME DEST
do
SOURCE=${SOURCE//\\/\/} # Change Windows-style "\" to Unix-style "/"
SOURCE=${SOURCE//\"/} # Remove the double quotes around the path
cp "${SOURCE}/${IMGNAME}" "$DEST" # Quote again so spaces survive
done <test.csv
Note: I think you need to change the Windows-style "\" to the Unix-style "/", hence the substitution rules.
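For a file list this long, it may be worth converting the paths in a single sed pass and then checking that each source exists before copying. The following is a rough sketch under stated assumptions: the directory layout, the dest/ target and the missing.jpg row are invented for the demonstration, and the paths in the CSV are taken to be relative to the mount point you cd into.

```shell
# Scratch tree standing in for the mount point (all names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/images1/002_0001/thumb" "$workdir/dest/002_0001/thumb"
touch "$workdir/images1/002_0001/thumb/53717902.jpg"
printf '%s\n' \
    '"images1\002_0001\thumb",53717902.jpg,dest/002_0001/thumb/' \
    '"images1\002_0001\thumb",missing.jpg,dest/002_0001/thumb/' > "$workdir/test.csv"

(
    cd "$workdir" || exit 1
    # One sed pass strips the quotes and turns "\" into "/" for every row.
    sed 's/"//g; s/\\/\//g' test.csv |
    while IFS=, read -r src filename dest; do
        if [ -f "$src/$filename" ]; then
            cp -- "$src/$filename" "$dest"
        else
            # Report rows whose source file is not there instead of failing.
            printf 'skipped (not found): %s/%s\n' "$src" "$filename" >&2
        fi
    done
)

copied=$(ls "$workdir/dest/002_0001/thumb")
printf 'copied: %s\n' "$copied"
rm -r "$workdir"
```

Swapping cp for echo cp gives a dry run that lists the commands without executing them. Note that this simple comma split would break on file or directory names that themselves contain commas.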

awk parse filename and add result to the end of each line

I have a number of files with similar names, like
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out
etc.
I need to get the number before .csv (1 or 2) from the file name and append it to every line in the file, separated by a TAB.
I have written this code. It finds the number I need, but I do not know how to put this number into the file. There is also a space in the filename, and my script breaks because of it.
I am also not sure how to pass a list of files to the script. For now I am working with only one file.
My code:
#!/bin/sh
string="DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out"
out=$(echo $string | awk 'BEGIN {FS="_"};{print substr ($7,0,1)}')
awk ' { print $0"\t$out" } ' $string
for file in *
do
sfx=$(echo "$file" | sed 's/.*_\(.*\).csv.*/\1/')
sed -i "s/$/\t$sfx/" "$file"
done
Using sed:
$ sed 's/.*_\(.*\).csv.*/&\t\1/' file
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out 1
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out 2
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out 1
To do this for many files:
sed 's/.*_\(.*\).csv.*/&\t\1/' file1 file2 file3
OR
sed 's/.*_\(.*\).csv.*/&\t\1/' file*
To save the changes in the same file (if you have GNU sed):
sed -i 's/.*\(.\).csv.*/&\t\1/' file
Untested, but this should do what you want (extract the number before .csv and append that number to the end of every line in the .out file)
awk 'FNR==1 { split(FILENAME, field, /[_.]/) }
{ print $0"\t"field[7] > (FILENAME"_aaaa") }' *.out
for file in *_aaaa; do mv "$file" "${file/_aaaa}"; done
If I understood correctly, you want to append the number from the filename to every line in that file - this should do it:
#!/bin/bash
while [[ 0 < $# ]]; do
num=$(echo "$1" | sed -r 's/.*_([0-9]+)\.csv.*/\1/' )
#awk "{ print \$0\"\t${num}\"; }" < "$1" > "$1.new"
#sed -r "s/$/\t$num/" < "$1" > "$1.new"
#sed -ri "s/$/\t$num/" "$1"
shift
done
Run the script and give it names of the files you want to process. $# is the number of command line arguments for the script which is decremented at the end of the loop by shift, which drops the first argument, and shifts the other ones. Extract the number from the filename and pick one of the three commented lines to do the appending: awk gives you more flexibility, first sed creates new files, second sed processes them in-place (in case you are running GNU sed, that is).
Instead of awk, you may want to go with sed or coreutils.
Grab number from filename, with grep for variety:
num=$(<<<filename grep -Eo '[^_]+\.csv' | cut -d. -f1)
<<<filename is equivalent to echo filename.
With sed
Append num to each line with GNU sed:
sed "s/\$/\t$num/" filename
Use the -i switch to modify filename in-place.
With paste
You also need to know the length of the file for this method:
len=$(<filename wc -l)
Combine filename and num with paste:
paste filename <(seq $len | while read; do echo $num; done)
Complete example
for filename in DWH_Export*; do
num=$(echo "$filename" | grep -Eo '[^_]+\.csv' | cut -d. -f1)
sed -i "s/\$/\t$num/" "$filename"
done
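If GNU sed is not available, or the \t escape is a worry, the same loop can be sketched with shell parameter expansion plus awk and a temporary file. The sample file and its two rows are invented here; only the naming pattern comes from the question.

```shell
# Scratch file named like the ones in the question, space included.
workdir=$(mktemp -d)
f="$workdir/DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out"
printf '%s\n' 'row one' 'row two' > "$f"

for file in "$workdir"/DWH_Export*.out; do
    name=${file##*/}         # file name without the directory
    num=${name%%.csv*}       # cut at the first ".csv" -> DWH_..._v1_2
    num=${num##*_}           # keep what follows the last "_" -> 2
    # awk writes a real tab; the temp file replaces the original on success.
    awk -v n="$num" '{ print $0 "\t" n }' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
done

result=$(cat "$f")
printf '%s\n' "$result"
rm -r "$workdir"
```

Quoting "$file" everywhere is what keeps the space in the name from splitting it into two arguments.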
