parsing many variables into another file containing many rows - bash

I have an issue while parsing many variables, extracted with the cut command, into another file that contains many rows. I need to append the variables to the end of each row, in sequence.
EX: file 100.txt contains 1000 rows, each with 3 fields: A,B,C
Another file called pin.txt contains 1000 rows with a single field, e.g. 2222
I need to take the values one by one and insert each at the end of the corresponding row in the 100.txt file.
while IFS= read -r line; do
sed -i "/:[0-9]*$/ ! s%$%,$line%" "100.txt"
done < pin.txt
What I have got:
1,2,3,2222,3333
1,2,3,2222,3333
What I expected:
1,2,3,2222
1,2,3,3333

If both files have the same number of lines, paste is your friend:
paste -d, 100.txt pin.txt > tmp.txt
mv -f tmp.txt 100.txt

Here is how I would do it, using a while read loop without sed:
while IFS= read -r file1 <&3; do
IFS= read -r file2
printf '%s,%s\n' "$file1" "$file2"
done 3<100.txt < pin.txt
Or using mapfile (bash 4+ only):
mapfile -t file1 < 100.txt
mapfile -t file2 < pin.txt
for i in "${!file1[@]}"; do
printf '%s,%s\n' "${file1[$i]}" "${file2[$i]}"
done
Of course, those shell loops would be very slow on large files.
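If the files are large, a single awk pass is another way to do the same join without the per-line shell overhead. A minimal sketch, under the same assumption that both files have matching line counts:
awk -v OFS=, 'NR==FNR {pin[FNR]=$0; next} {print $0, pin[FNR]}' pin.txt 100.txt > tmp.txt
mv -f tmp.txt 100.txt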

Assign content of a file to an array in bash

I have a file that contains parts of file names, either one per line or separated by spaces. Let's take the following example:
cat file.txt
1
4
500
The actual file names are file_1.dat, file_2.dat, file_3.dat, file_4.dat, file_500.dat, and so on.
I want to combine only those files whose names (or part of the names) are stored in file.txt.
To do so I am doing the following:
## read the file and assign to an array
array=()
while IFS= read -r line; do
array+=($line)
done < file.txt
## combine the contents of the files
for file in ${array[@]}
do
cat "file_$file.dat"
done > output.dat
Now, what I don't like in this solution is the assignment of the array: I have to run a loop just for this.
I tried to use
mapfile -t array < <(cat file.txt)
I also tried,
array=( $(cat file2.txt) )
The array that is needed finally is
array=(1 4 500)
In some of the answers on this platform, I have seen that doing it the above way (the last option) might be harmful. I wanted some clarification on what to do for such assignments.
My question is: in this situation, what is the best (safe and fast) way to assign the content of a file to an array?
array=( $(cat file2.txt) )
does not necessarily put each line in the array. It puts each word resulting from word-splitting and globbing into the array.
Consider this file
1
2 3
*
mapfile -t array < file.txt will create an array with the elements 1, 2 3, and *.
array=( $(cat file.txt) ) will create an array with the elements 1, 2, and 3, along with an element for each file name in the current directory.
Using mapfile is both safer and makes your intent of storing one line per element clearer.
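As a minimal sketch of the recommended assignment (assuming file.txt holds one fragment per line):
mapfile -t array < file.txt
printf '%s\n' "${array[@]}"    # prints exactly one array element per line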
However, there is no need for an array at all. You can process each file as you pull a line from your input file.
while IFS= read -r line; do
cat "file_$line.dat"
done < file.txt > output.dat
If you don’t want to deduplicate the file name fragments:
readarray -t lines < file.txt
declare -a fragments
for line in "${lines[@]}"; do
fragments+=($line)
done
names=("${fragments[@]/#/file_}")
names=("${names[@]/%/.dat}")
cat "${names[@]}"
If you do want to deduplicate the file name fragments:
readarray -t lines < file.txt
declare -Ai set_of_fragments
for line in "${lines[@]}"; do
for fragment in $line; do
((++set_of_fragments["${fragment}"]))
done
done
fragments=("${!set_of_fragments[@]}")
names=("${fragments[@]/#/file_}")
names=("${names[@]/%/.dat}")
cat "${names[@]}"

choosing column names in a .csv file

I'm really new to bash programming. I want to write the results of two variables into a .csv file. I use this command:
while IFS= read -r line; do
ip=$(dig +short $line)
echo "${line}, ${ip}" >> file.csv
done < domains
It works fine. It creates two columns in file.csv and writes the result of $line in the first column and the result of $ip in the second column.
I wanted to know if there is a way to choose a name for these columns. For example
column1 : $line & column2:$ip
In CSV files the column names are simply the contents of the first row, so if file.csv already contains data you can prepend a header row like this:
echo "Line,Ip" > file.csv.tmp # Add columns in new temporary file
cat file.csv.tmp >> file.csv # Append all the data of the original file
rm file.csv # Remove the original file
mv file.cvs.tmp file.csv # Rename the temporary file
Or you can also simply use this other method:
echo "Line,Ip
$(cat file.csv)" > file.csv
I hope it helps.
As helen pointed out in the comments, if the file should be overwritten with every run then you can simply add echo "Line,Ip" > file.csv before the loop.
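For example, a minimal version of the original loop with a header row (assuming file.csv is recreated on each run):
echo "Line,Ip" > file.csv
while IFS= read -r line; do
    ip=$(dig +short "$line")
    echo "${line},${ip}" >> file.csv
done < domains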

Replacing the duplicate uuids across multiple files

I am trying to replace the duplicate UUIDs from multiple files in a directory. Even the same file can have duplicate UUIDs.
I am using Unix utilities to solve this.
So far I have used grep, cut, sort and uniq to find all the duplicate UUIDs across the folder and store them in a file (say duplicate_uuids).
Then I tried sed to replace the UUIDs by looping through the file.
filename="$1"
re="*.java"
while read line; do
uuid=$(uuidgen)
sed -i'.original' -e "s/$line/$uuid/g" *.java
done < "$filename"
As you would expect, I ended up replacing every occurrence of each duplicate UUID with the same new UUID, so it is still duplicated throughout the files!
Is there any sed trick that can work for me?
There are a bunch of ways this can likely be done. Taking a multi-command approach using a function might give you greater flexibility if you want to customize things later, for example:
#!/bin/bash
checkdupes() {
files="$*"
for f in $files; do
filename="$f"
printf "Searching File: %s\n" "${filename}"
while read -r line; do
arr=( $(grep -n "${line}" "${filename}" | awk 'BEGIN { FS = ":" } ; {print $1" "}') )
for i in "${arr[#]:1}"; do
sed -i '' ''"${i}"'s/'"${line}"'/'"$(uuidgen)"'/g' "${filename}"
printf "Replaced UUID [%s] at line %s, first found on line %s\n" "${line}" "${i}" "${arr[0]}"
done
done< <( sort "${filename}" | uniq -d )
done
}
checkdupes /path/to/*.java
So what this series of commands does is first find the duplicated lines (if any) in whatever file you choose, via sort and uniq -d. It then uses grep and awk to build an array of the line numbers on which each duplicate is found. Looping through that array (while skipping the first value) replaces each of the remaining occurrences with a freshly generated UUID, re-saving the file each time.
Using a duplicate list file:
If you want to use a file with a list of dupes to search other files and replace the UUID in each of them that match it's just a matter of changing two lines:
Replace:
for i in "${arr[#]:1}"; do
With:
for i in "${arr[#]}"; do
Replace:
done< <( sort "${filename}" | uniq -d )
With:
done< <( cat /path/to/dupes_list )
NOTE: If you don't want to overwrite the file, then remove the -i '' at the beginning of the sed command.
This worked for me:
#!/bin/bash
duplicate_uuid=$1
# store file names in array
find . -name "*.java" > file_names
IFS=$'\n' read -d '' -r -a file_list < file_names
# store file duplicate uuids from file to array
IFS=$'\n' read -d '' -r -a dup_uuids < $duplicate_uuid
# loop through all files
for file in "${file_list[#]}"
do
echo "$file"
# Loop through all repeated uuids
for old_uuid in "${dup_uuids[@]}"
do
START=1
# Get the number of times uuid present in this file
END=$(grep -c $old_uuid $file)
if (( $END > 0 )) ; then
echo " Replacing $old_uuid"
fi
# Loop through them one by one and change the uuid
for (( c=$START; c<=$END; c++ ))
do
uuid=$(uuidgen)
echo " [$c of $END] with $uuid"
sed -i '.original' -e "1,/$old_uuid/s/$old_uuid/$uuid/" $file
done
done
rm $file.original
done
rm file_names

Remove lines partially matching other lines in a file

I have the following lines in input.txt:
client_citic_plat_fix44;CITICHK;interbridge_ulnet_se_eqx
client_citic_plat_fix44;CITICHK;interbridge_ulnet_se_eqx;CITICHK;interbridge_hk_eqx
client_dkp_crd;DELIVERTOCOMPID;DESTINATION
client_dkp_crd;NORD;interbridge_fr
client_dkp_crd;NORD;interbridge_fr;broker_nordea_2
client_dkp_crd;AVIA;interbridge_fr
client_dkp_crd;AVIA;interbridge_fr;interbridge_ld
client_dkp_crd;SEBAP;interbridge_fr
client_dkp_crd;SEBAP;interbridge_fr;broker_seb_ss_thl
client_epf_crd;DELIVERTOCOMPID;DESTINATION
I need some bash (awk/sed) script to remove the lines that are partially similar to others. Desired output should be:
client_citic_plat_fix44;CITICHK;interbridge_ulnet_se_eqx;CITICHK;interbridge_hk_eqx
client_dkp_crd;DELIVERTOCOMPID;DESTINATION
client_dkp_crd;NORD;interbridge_fr;broker_nordea_2
client_dkp_crd;AVIA;interbridge_fr;interbridge_ld
client_dkp_crd;SEBAP;interbridge_fr;broker_seb_ss_thl
client_epf_crd;DELIVERTOCOMPID;DESTINATION
Columns 1, 2 and 3 are always the same between the two compared lines, and I always want to remove the shorter one.
Thanks!
Here's a solution using grep and sed:
#!/bin/bash
file="filepath"
while IFS= read -r line;do
(($(grep -c "$line" "$file")>1)) && sed -i "/^$line$/d" "$file"
done <"$file"
Note: This will replace your file.
To keep your original file untouched and write the output to another file, you can do this:
#!/bin/bash
infile="infilepath"
outfile="outfilepath"
cp "$infile" "$outfile"
while IFS= read -r line;do
(($(grep -c "$line" "$infile")>1)) && sed -i "/^$line$/d" "$outfile"
done <"$infile"

how to read a file from line x to the end of the file in bash

I would like to know how I can read each line of a csv file, from the second line to the end of the file, in a bash script.
I know how to read a file in bash:
while read line
do
echo -e "$line\n"
done < file.csv
But, I want to read the file starting from the second line to the end of the file. How can I achieve this?
tail -n +2 file.csv
From the man page:
-n, --lines=N
output the last N lines, instead of the last 10
...
If the first character of N (the number of bytes or lines) is a '+',
print beginning with the Nth item from the start of each file,
otherwise, print the last N items in the file.
In English this means that:
tail -n 100 prints the last 100 lines
tail -n +100 prints all lines starting from line 100
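So for the question's file, a minimal example is to pipe tail into the original read loop:
tail -n +2 file.csv | while IFS= read -r line; do
    echo "$line"
done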
Simple solution with sed:
sed -n '2,$p' <thefile
where 2 is the number of the line you wish to start reading from.
Or else (pure bash)...
{ for ((i=1;i--;));do read;done;while read line;do echo $line;done } < file.csv
Better written:
linesToSkip=1
{
for ((i=$linesToSkip;i--;)) ;do
read
done
while read line ;do
echo $line
done
} < file.csv
This works even if linesToSkip == 0 or linesToSkip is greater than the number of lines in file.csv.
Edit:
Changed () to {}, as gniourf_gniourf encouraged me to consider: the first syntax generates a subshell, while {} doesn't.
Of course, for skipping only one line (as in the original question's title), the loop for ((i=1;i--;));do read;done can simply be replaced by a single read:
{ read;while read line;do echo $line;done } < file.csv
There are many solutions to this. One of my favorites is:
(head -1 > /dev/null; whatever_you_want_to_do) < file.txt
You can also use tail to skip the lines you want:
tail -n +2 file.txt | whatever_you_want_to_do
Depending on what you want to do with your lines: if you want to store each selected line in an array, the best choice is definitely the builtin mapfile:
numberoflinestoskip=1
mapfile -s $numberoflinestoskip -t linesarray < file
will store each line of the file named file, starting from line 2, in the array linesarray.
help mapfile for more info.
If you don't want to store each line in an array, well, there are other very good answers.
As F. Hauri suggests in a comment, this is only applicable if you need to store the whole file in memory.
Otherwise, your best bet is:
{
read; # Just a scratch read to get rid (pun!) of the first line
while read line; do
echo "$line"
done
} < file.csv
Notice: there's no subshell involved/needed.
This will work
i=1
while read line
do
test $i -eq 1 && ((i=i+1)) && continue
echo -e "$line\n"
done < file.csv
I would just use a counter variable.
#!/bin/bash
i=0
while read line
do
if [ $i != 0 ]; then
echo -e $line
fi
i=$((i+1))
done < "file.csv"
UPDATE: The above checks the $i variable on every line of the csv, so if you have a very large csv file with millions of lines it will eat a significant number of CPU cycles, which is no good for Mother Nature.
The following one-liner can be used to delete the very first line of the CSV file using sed and then feed the remaining lines into a while loop.
sed 1d file.csv | while read d; do echo $d; done
