Splitting CSV file into text files - bash

I have a CSV file of the form:
1,frog
2,truck
3,truck
4,deer
5,automobile
and so on, for about 50 000 entries. I want to create 50 000 separate .txt files named with the number before the comma and containing the word after the comma, like so:
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile
and so on.
This is the script I've written so far, but it does not work properly:
#!/bin/bash
folder=/home/data/cifar10
for file in $(find "$folder" -type f -iname "*.csv")
do
    name=$(basename "$file" .txt)
    while read -r tag line; do
        printf '%s\n' "$line" >"$tag".txt
    done <"$file"
    rm "$file"
done

The issue is in your inner loop:
while read -r tag line; do
    printf '%s\n' "$line" > "$tag".txt
done < "$file"
By default, read splits on whitespace, so the whole line ends up in tag and line stays empty. Set IFS to , for the read command so that tag and line are split at the comma:
while IFS=, read -r tag line; do
    printf '%s\n' "$line" > "$tag".txt
done < "$file"
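To see the effect of the separator, here is a minimal sketch that runs both variants against a few sample rows. The scratch directory, the file name sample.csv and the helper variables default_tag and default_line are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Illustration only: work in a throwaway directory with made-up sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1,frog' '2,truck' '3,truck' > sample.csv

# Default IFS (whitespace): the line contains no spaces, so everything lands in "tag".
read -r tag line < sample.csv
default_tag=$tag
default_line=$line
printf 'default IFS: tag=[%s] line=[%s]\n' "$default_tag" "$default_line"

# IFS=, applies to this read only: "tag" gets the number, "line" gets the word.
while IFS=, read -r tag line; do
    printf '%s\n' "$line" > "$tag.txt"
done < sample.csv

ls
```

Because the assignment is written as a prefix to read, IFS keeps its usual value for the rest of the script.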
With Bash 4.0+, you can use shopt -s globstar instead of find. Unlike for file in $(find ...), a glob is not subject to word splitting or further globbing, so file names containing whitespace or glob characters are handled correctly:
shopt -s globstar nullglob
for file in /home/data/cifar10/**/*.csv; do
    while IFS=, read -r tag line; do
        printf '%s\n' "$line" > "$tag".txt
    done < "$file"
done
Note that the variable set by name=$(basename "$file" .txt) is never used in your code.
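The behaviour of globstar and nullglob can be checked on a small scratch tree. This is only a sketch: the directory layout and the file names below are made up, and it needs Bash 4.0 or later.

```shell
#!/usr/bin/env bash
# Requires Bash 4.0 or later for globstar.
shopt -s globstar nullglob

# Illustration only: a scratch tree with spaces in a directory name and a file name.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir/deeper"
touch "$workdir/top.csv" "$workdir/sub dir/deeper/my data.csv" "$workdir/sub dir/notes.txt"

found=()
for file in "$workdir"/**/*.csv; do
    found+=("$file")          # each match stays one word, spaces included
done
printf 'matched: %s\n' "${found[@]}"

# With nullglob, a pattern without matches expands to nothing, so the loop body never runs.
none=()
for file in "$workdir"/**/*.tsv; do
    none+=("$file")
done
printf 'tsv matches: %d\n' "${#none[@]}"
```

Without nullglob, the second loop would run once with the unexpanded pattern itself as the value of file.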

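An awk alternative. Each output file is closed right after it is written, so awk does not hit its open-file limit with 50 000 outputs:
awk -F, '{out = $1 ".txt"; print $2 > out; close(out)}' file.csv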

awk 'BEGIN{FS=","} {print $1".txt contains: "$2}' file
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile

Related

Append out from reading lines in a txt file

I have a test.txt file with the following contents
100001
100003
100007
100008
100009
I am trying to loop through the text file and append each one with .xml.
Ex:
100001.xml
100003.xml
100007.xml
100008.xml
100009.xml
I have tried different variations of
while read p; do
    echo "$p.xml"
done < test.txt
But it prints out weird like this
.xml01
.xml03
.xml07
.xml08
.xml09
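The garbled output means test.txt has Windows (CRLF) line endings: the carriage return at the end of each line moves the cursor back to the first column, so .xml overwrites the start of the number. The following commands append .xml to each line while removing the carriage return, if present.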
With sed and bash:
#!/bin/bash
sed -E $'s/\r?$/.xml/' test.txt
With awk:
awk -v suffix='.xml' '{sub(/\r?$/,suffix)}1' test.txt
Using it in a bash loop:
#!/bin/bash
while IFS='' read -r filename
do
    printf '%q\n' "$filename"
done < <(
    awk -v suffix='.xml' '{sub(/\r?$/,suffix)}1' test.txt
)
Or doing the whole thing in pure shell:
while IFS='' read -r filename
do
    # $'\r' is a literal carriage return; a plain \r in the pattern would match the letter r
    fullname="${filename%$'\r'}.xml"
    printf '%s\n' "$fullname"
done < test.txt
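To reproduce the problem and check the fix without touching the real data, the following sketch builds a small file with CRLF line endings in a scratch directory. The directory and the variable names first and names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: build a small file with Windows (CRLF) line endings.
workdir=$(mktemp -d)
printf '100001\r\n100003\r\n' > "$workdir/test.txt"

# %q quotes the value so that the otherwise invisible carriage return shows up.
IFS= read -r first < "$workdir/test.txt"
printf 'raw first line: %q\n' "$first"

# Strip one trailing carriage return, if present, then append the suffix.
names=()
while IFS= read -r line; do
    names+=("${line%$'\r'}.xml")
done < "$workdir/test.txt"
printf '%s\n' "${names[@]}"
```

The same loop leaves files with plain LF line endings unchanged, because the % expansion only removes the carriage return when it is actually there.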

Rename files matching pattern in a loop - Bash

I have been trying to rename some specific files based on a table, but with no success. It either renames all files or gives an error.
The directory contains hundreds of files named with long barcodes, and I want to rename only the files containing the pattern _1_.
Example
barcode_1_barcode_SL484171.fastq.gz barcode_2_barcode_SL484171.fastq.gz barcode_1_barcode_SL484370.fastq.gz barcode_2_barcode_SL484370.fastq.gz
mytable.txt
oldname newname
barcode_1_barcode_SL484171 Description1
barcode_2_barcode_SL484171 Description1
barcode_1_barcode_SL484370 Description2
barcode_2_barcode_SL484370 Description2
Desired output:
Description1.R1.fastq.gz Description2.R1.fastq.gz
As you can see in the table there are two files per description but I only want to rename the ones with the _1_ pattern.
Code I have tried:
for i in *_1_*.fastq.gz; do read oldname newname; mv "$oldname" "$newname".R1.fastq.gz; done < mytable.txt
for i in $(grep '_1_' mytable.txt); do read -r oldname newname; mv ${oldname} ${newname}.R1.fastq.gz; done < mytable.txt
for i in $(grep '_1_' mytable.txt); do oldname=$(cut -f1 $i);newname=$(cut -f2 $i); ln -s ${oldname} ${newname}.R1.fastq.gz; done
while read -r oldname newname
do
if [[ $oldname =~ "_1_" ]]
then
mv $oldname $newname
fi
done < mytable.txt
Something like this (the script assumes the table is saved as table.txt and is space-delimited).
#!/usr/bin/env bash
while IFS= read -r files; do ##: Loop through the files found by `find`, filtered by the first column of table.txt
    while read -ru9 old_name prefix; do ##: Loop through the rows of table.txt that match 'barcode_1_barcode.*'
        if [[ $files == *"$old_name"* ]]; then ##: If the file name contains the first field of the row (space-delimited)
            old_filename="${files%.fastq.gz}" ##: The file name without the .fastq.gz extension
            extension="${files#"$old_filename"}" ##: The .fastq.gz extension without the file name
            # mv -v "$files" "$prefix.R1${extension}"
            printf '%s %s %s ==> %s\n' mv -v "$files" "$prefix.R1${extension}" ##: Dry run: print the rename instead of performing it
        fi
    done 9< <(grep 'barcode_1_barcode.*' table.txt)
done < <(find . -name 'barcode_1_barcode*.gz' | grep -f <(cut -d' ' -f1 table.txt) ) ##: Keep only the files listed in the first column of table.txt
Output from the OP's sample data/files.
renamed './barcode_1_barcode_SL484370.fastq.gz' -> 'Description2.R1.fastq.gz'
renamed './barcode_1_barcode_SL484171.fastq.gz' -> 'Description1.R1.fastq.gz'
If you're satisfied with the output, either move the # from the front of mv to the front of printf, or delete the printf line and remove the # from mv, so that mv actually renames the files.
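A simpler variant drives the loop from the table alone, without find or grep. This is a sketch under stated assumptions, not the answer's method: it assumes the table is whitespace-delimited with one oldname newname pair per line, that each file is named oldname plus .fastq.gz, and it recreates the sample data in a scratch directory so it can be run safely.

```shell
#!/usr/bin/env bash
# Illustration only: recreate the sample files and table in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch barcode_1_barcode_SL484171.fastq.gz barcode_2_barcode_SL484171.fastq.gz \
      barcode_1_barcode_SL484370.fastq.gz barcode_2_barcode_SL484370.fastq.gz
printf '%s\n' \
    'oldname newname' \
    'barcode_1_barcode_SL484171 Description1' \
    'barcode_2_barcode_SL484171 Description1' \
    'barcode_1_barcode_SL484370 Description2' \
    'barcode_2_barcode_SL484370 Description2' > mytable.txt

while read -r oldname newname; do
    [[ $oldname == *_1_* ]] || continue     # skip the header row and the _2_ rows
    src=$oldname.fastq.gz
    [[ -e $src ]] || continue               # skip rows without a matching file
    mv -- "$src" "$newname.R1.fastq.gz"
done < mytable.txt

ls
```

Putting echo in front of mv turns the loop into a dry run, which is worth doing before pointing it at the real directory.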

Bash, deleting specific row from file

I have a file containing file names and the paths to those files.
I want to delete the rows whose files no longer exist.
file.txt (For now all existing files):
file1;~/Documents/test/123
file2;~/Documents/test/456
file3;~/Test
file4;~/Files/678
Now if I delete any of the given files (file2 and file4, for example) and run my script, I want it to test whether the file in each row exists and remove the row if it does not.
file.txt(after removing file2, file4):
file1;~/Documents/test/123
file3;~/Test
What I have so far (it does not run at all):
#!/bin/sh
backup=`cat file.txt`
rm -f file.txt
touch file.txt
while read -r line
do
dir=`echo "$line" | awk -F';' '{print $2}'`
file=`echo "$line" | awk -F';' '{print $1}'`
if [ -f "$dir"/"$file" ];then
echo "$line" >> file.txt
fi
done << "$backup"
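Here's one way. It takes the first field as the file name and the second as its directory, as in your script, and expands a leading ~ by hand because the shell does not expand it in data read from a file:
tmp=$(mktemp)
while IFS=';' read -r name dir; do
    # Using if ... fi (not test && cmd) keeps the loop's exit status at 0, so mv still runs
    if [ -f "${dir/#\~/$HOME}/$name" ]; then
        printf '%s;%s\n' "$name" "$dir"
    fi
done < file.txt > "$tmp" && mv "$tmp" file.txt
or if you don't want a temp file for some reason:
tmp=()
while IFS=';' read -r name dir; do
    if [ -f "${dir/#\~/$HOME}/$name" ]; then
        tmp+=( "$name;$dir" )
    fi
done < file.txt &&
printf '%s\n' "${tmp[@]}" > file.txt
Both are untested but should be very close if not exactly correct.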
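One subtlety when a command is chained after such a loop with &&: the exit status of a while loop is that of the last command run in its body. If the body ends in test && command and the test fails for the last row, the loop reports failure and the chained command is skipped. A minimal sketch with made-up rows, where keep and drop stand in for rows whose file does or does not exist:

```shell
#!/usr/bin/env bash
# Illustration only: "keep"/"drop" stand in for rows whose file exists or not.
rows=$'keep\ndrop'

# Form 1: test && command. The last row fails the test, so the loop returns 1.
while read -r row; do
    [ "$row" = keep ] && printf '%s\n' "$row"
done <<< "$rows" > /dev/null
status_and=$?

# Form 2: if ... fi. An if whose condition is false returns 0, and so does the loop.
while read -r row; do
    if [ "$row" = keep ]; then
        printf '%s\n' "$row"
    fi
done <<< "$rows" > /dev/null
status_if=$?

printf 'test && cmd: %d, if/fi: %d\n' "$status_and" "$status_if"
```

With the first form, a following && mv "$tmp" file.txt would be skipped whenever the last row of file.txt refers to a missing file.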
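If I understand, this should do it. It treats the second field as the path to test.
touch file.txt file2.txt
while IFS= read -r i; do
    fp=$(printf '%s\n' "$i" | cut -d ';' -f2)
    # A leading ~ read from a file is not expanded by the shell, so expand it by hand
    if [ -e "${fp/#\~/$HOME}" ]; then
        echo "$i" >> file2.txt
    fi
done < file.txt
mv file2.txt file.txt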
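The paths in file.txt start with ~, which matters for any of these approaches: the shell expands a tilde only when it appears unquoted in the command text, not when it arrives as data from a file. A small sketch of expanding it by hand follows; the variable names entry, expanded, other and unchanged are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: a path exactly as it would be read from file.txt.
entry='~/Documents/test/123'

# ${var/#pattern/replacement} replaces the pattern only at the start of the value.
# The backslash keeps the tilde in the pattern literal.
expanded=${entry/#\~/$HOME}
printf '%s -> %s\n' "$entry" "$expanded"

# A tilde elsewhere in the path is left alone.
other='/tmp/a~b'
unchanged=${other/#\~/$HOME}
printf '%s -> %s\n' "$other" "$unchanged"
```

This handles only the plain ~/ form; paths such as ~otheruser/ would need separate treatment.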

Bash script to remove lines containing any of a list of words

I have a large config file that I use to define variables for a script to pull from it, each defined on a single line. It looks something like this:
var val
foo bar
foo1 bar1
foo2 bar2
I have gathered a list of out-of-date variables that I want to remove from the list. I could go through it manually, but I would like to do it with a script, which would be at least more stimulating. The file that contains the values may contain multiple instances. The idea is to find the value, and if it's found, remove the entire line.
Does anyone know if this is possible? I know sed does this but I do not know how to make it use a file input.
#!/bin/bash
shopt -s extglob
REMOVE=(foo1 foo2)
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
while read -r LINE; do
    read A B <<< "$LINE"
    [[ $A != $PATTERN ]] && echo "$LINE"
done < input_file.txt > output_file.txt
Or (Use with a copy first)
#!/bin/bash
shopt -s extglob
FILE=$1 REMOVE=("${@:2}")
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
SAVE=()
while read -r LINE; do
    read A B <<< "$LINE"
    [[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"
printf '%s\n' "${SAVE[@]}" > "$FILE"
Running with
bash script.sh your_config_file pattern1 pattern2 ...
Or
#!/bin/bash
shopt -s extglob
FILE=$1 PATTERNS_FILE=$2
readarray -t REMOVE < "$PATTERNS_FILE"
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
SAVE=()
while read -r LINE; do
    read A B <<< "$LINE"
    [[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"
printf '%s\n' "${SAVE[@]}" > "$FILE"
Running with
bash script.sh your_config_file patterns_file
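The pattern-building step in these scripts joins the array into an extended glob of the form @(a|b). The sketch below shows the same idea on the sample words, using a command substitution in place of the eval form; that substitution is an alternative chosen here for illustration, not what the scripts above use.

```shell
#!/usr/bin/env bash
shopt -s extglob
REMOVE=(foo1 foo2)

# "${REMOVE[*]}" joins the elements with the first character of IFS.
# Setting IFS inside the command substitution keeps the change local to that subshell.
PATTERN=$(IFS='|'; printf '%s' "@(${REMOVE[*]})")
printf 'pattern: %s\n' "$PATTERN"

# Unquoted on the right-hand side of [[ ... == ... ]], the string is used as a pattern.
matched=()
for word in foo foo1 foo2 foo12; do
    if [[ $word == $PATTERN ]]; then
        matched+=("$word")
    fi
done
printf 'matched: %s\n' "${matched[*]}"
```

Because the pattern must match the whole word, foo and foo12 are not removed, which is what makes this safer than a plain substring search.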
Here's one with sed. Add the words to the array, then run
./script target_filename
(assuming you put the following in a file called script). It is not very efficient, since sed rewrites the file once per word; it would probably be more efficient to concatenate the words into a single regex, as bbonev did.
#!/bin/bash
declare -a array=("foo1" "foo2")
for i in "${array[@]}";
do
    sed -i "/^${i}\s.*/d" "$1"
done
It's actually even simpler using file input
If you have a word file
word1
word2
word3
.....
then the following will do the job
#!/bin/bash
while IFS= read -r i;
do
    sed -i "/^${i}\s.*/d" "$2"
done < "$1"
usage:
./script wordlist target_file
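The idea of joining the words into one regex, so that sed reads the file only once, could look like the sketch below. It is an illustration under assumptions: the words contain no regex metacharacters, the file names config.txt, wordlist.txt and config.new are made up, and the result goes to a new file because the -i option behaves differently in GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Illustration only: sample config and word list in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'var val' 'foo bar' 'foo1 bar1' 'foo2 bar2' > config.txt
printf '%s\n' 'foo1' 'foo2' > wordlist.txt

# Join the words with | to form a single alternation: foo1|foo2
alternation=$(paste -sd'|' wordlist.txt)

# Delete every line whose first field is one of the words, in one pass.
sed -E "/^(${alternation})[[:space:]]/d" config.txt > config.new
cat config.new
```

Requiring whitespace after the word keeps foo from matching foo1, so only exact variable names are removed.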

Shell script to get one to one map and rename the filename

I have 2 files, both sorted numerically. I need help with a shell script that reads these 2 files, does a 1:1 mapping, and renames the files with the mapped case number.
For example:
cat case.txt
10_80
10_90
cat files.txt
A BCD_x 1.pdf
A BCD_x 2.pdf
ls pdf_dir
A BCD_x 1.pdf A BCD_x 2.pdf
Read these 2 text files and rename the PDF files in pdf_dir:
A BCD_x 1.pdf as A BCD_10_80.pdf
A BCD_x 2.pdf as A BCD_10_90.pdf
Use paste to create the "mapping", then shell facilities to do the renaming.
shopt -s extglob
while IFS=$'\t' read -r file replacement; do
    echo mv "$file" "${file/x +([0-9])/$replacement}"
done < <(paste files.txt case.txt)
Remove "echo" when you're satisfied with the output.
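To try the paste approach end to end, the sketch below recreates the sample files in a scratch directory and performs the renames there. The scratch directory is an assumption for illustration, and unlike the dry run above it really calls mv, on the throwaway copies only.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Illustration only: recreate the sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '10_80' '10_90' > case.txt
printf '%s\n' 'A BCD_x 1.pdf' 'A BCD_x 2.pdf' > files.txt
mkdir pdf_dir
touch 'pdf_dir/A BCD_x 1.pdf' 'pdf_dir/A BCD_x 2.pdf'

# paste joins line N of files.txt and line N of case.txt with a tab.
paste files.txt case.txt

# IFS is a tab only, so the spaces inside the file name stay part of "file".
while IFS=$'\t' read -r file replacement; do
    mv -- "pdf_dir/$file" "pdf_dir/${file/x +([0-9])/$replacement}"
done < <(paste files.txt case.txt)

ls pdf_dir
```

The pairing relies purely on line order, so both files need the same number of lines in matching order.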
Using awk:
awk 'FNR==NR{a[FNR]=$0;next}
     {f=$0; sub(/_x [0-9]+/, "_" a[FNR]); system("mv \"" f "\" \"" $0 "\"")}' case.txt files.txt
Using a normal array and sed substitution.
Removing echo before mv makes the script actually rename the files.
You can change /path/to/pdf_dir/ to the path of your directory.
#!/bin/bash
i=0
while IFS= read -r line
do
    arr[i]="$line"
    ((i=i+1))
done < files.txt
i=0
while IFS= read -r case
do
    # Replace "x <number>" with the case number, e.g. "A BCD_x 1.pdf" -> "A BCD_10_80.pdf"
    newFile=$(echo "${arr[i]}" | sed "s/x [0-9]*/$case/")
    echo mv /path/to/pdf_dir/"${arr[i]}" /path/to/pdf_dir/"$newFile"
    ((i=i+1))
done < case.txt
If you have Bash 4.0+, this could help:
#!/bin/bash
declare -A MAP
I=0
IFS=''
while read -r CASE; do
    (( ++I ))
    MAP["A BCD_x ${I}.pdf"]="A BCD_${CASE}.pdf"
done < case.txt
while read -r FILE; do
    __=${MAP[$FILE]}
    [[ -n $__ ]] && echo mv "$FILE" "$__" ## Remove echo when things seem right already.
done < files.txt
Note: Make sure the script is saved in UNIX file format (LF line endings).
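Whether a script or input file has Windows line endings can be checked before running anything. The following is a minimal sketch; the helper name has_cr and the sample file names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: one file with LF endings, one with CRLF endings.
workdir=$(mktemp -d)
printf 'echo hello\n'   > "$workdir/unix.sh"
printf 'echo hello\r\n' > "$workdir/dos.sh"

# Hypothetical helper: succeeds if the file contains at least one carriage return.
has_cr() {
    grep -q $'\r' "$1"
}

for f in "$workdir/unix.sh" "$workdir/dos.sh"; do
    if has_cr "$f"; then
        printf '%s: contains carriage returns\n' "${f##*/}"
    else
        printf '%s: UNIX line endings\n' "${f##*/}"
    fi
done

# Convert by deleting every carriage return, writing the result to a new file.
tr -d '\r' < "$workdir/dos.sh" > "$workdir/dos_fixed.sh"
```

Where the dos2unix tool is installed it does the same conversion in place; tr is used here because it is available on practically every system.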
