Building a csv file from multiple files - bash

I have a folder containing multiple txt files, each with one or several lines. Each file is named after an email address and contains other email address(es) inside.
For example, I have 3 files in my folder:
distribution-list1#example.com.txt
distribution-list2#example.com.txt
distribution-list3#example.com.txt
Content of each file:
cat distribution-list1#example.com.txt
john#example.com
aurel#example.com
cat distribution-list2#example.com.txt
doe#example.com
cat distribution-list3#example.com.txt
jack#example.com
gilbert#example.com
jane#example.com
I would like to build only one file containing those data:
distribution-list1#example.com;john#example.com
distribution-list1#example.com;aurel#example.com
distribution-list2#example.com;doe#example.com
distribution-list3#example.com;jack#example.com
distribution-list3#example.com;gilbert#example.com
distribution-list3#example.com;jane#example.com

lists_merge.sh
#!/usr/bin/env bash
shopt -s nullglob
for fname in *.txt; do
    while IFS= read -r line; do
        printf '%s;%s\n' "$fname" "$line"
    done < "$fname"
done
output
$ ./lists_merge.sh
distribution-list1#example.com.txt;john#example.com
distribution-list1#example.com.txt;aurel#example.com
distribution-list2#example.com.txt;doe#example.com
distribution-list3#example.com.txt;jack#example.com
distribution-list3#example.com.txt;gilbert#example.com
distribution-list3#example.com.txt;jane#example.com
Note: the script is assumed to be in the same directory as the distribution-list text files, and no other text files are assumed to be in that directory.
Reference: the Bash manual's description of the nullglob shell option.

You can use sed:
for emailfile in *.txt; do
email=${emailfile%.txt}
sed "s:^:$email;:" "$emailfile"
done
This will fail if an email ID has a colon (:), but I doubt you'd have such an example.
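For readers who want to try the sed approach without touching real data, here is a hypothetical, self-contained sandbox run (the list names are made up, and everything happens in a temporary directory):

```shell
#!/usr/bin/env bash
# Sandbox demo of the sed-prefix approach; all file names are invented.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'john#example.com\naurel#example.com\n' > 'list1#example.com.txt'
printf 'doe#example.com\n'                     > 'list2#example.com.txt'

for emailfile in *.txt; do
    email=${emailfile%.txt}             # list name without the .txt suffix
    sed "s:^:$email;:" "$emailfile"     # prefix every line with "name;"
done > merged.csv

cat merged.csv
# list1#example.com;john#example.com
# list1#example.com;aurel#example.com
# list2#example.com;doe#example.com

cd / && rm -rf "$dir"
```

Using `:` as the sed delimiter keeps the `/` in paths and the `;` in the replacement from needing escapes.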

Related

Bash; How to combine multiple files into one file

I have multiple files in one directory, and I want to combine them into a single file using Bash. The output needs to contain each file name followed by its contents. An example would be
$ cat File 1
store
$ cat File 2
bank
$ cat File 3
car
Desired output is in a single file named master
$ cat master
File 1
store
File 2
bank
File 3
car
for FILE in "File 1" "File 2" "File 3"; do
echo "$FILE"
cat "$FILE"
done > master
What you have asked for is what cat is meant for; it's short for concatenate, because it concatenates the contents of files together.
But it doesn't inject the filenames into the output. If you want the filenames there, your best bet is probably a loop:
for f in "File 1" "File 2" "File 3"; do
printf '%s\n' "$f"
cat "$f"
done > master
This will do the job:
for f in "File "{1..3}; do
    echo "$f" >> master
    cat "$f" >> master
done
(Quoting "File " inside the brace expansion produces "File 1", "File 2", "File 3"; quoting "$f" keeps the space in the name intact.)
With gnu sed
sed -s '1F' *
'-s'
'--separate'
By default, 'sed' will consider the files specified on the command
line as a single continuous long stream. This GNU 'sed' extension
allows the user to consider them as separate files: range addresses
(such as '/abc/,/def/') are not allowed to span several files, line
numbers are relative to the start of each file, '$' refers to the
last line of each file, and files invoked from the 'R' commands are
rewound at the start of each file.
'F'
Print out the file name of the current input file (with a trailing
newline).
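To see the loop-with-filenames idea end to end, here is a disposable sketch (the file names and contents are taken from the example above, run in a temporary directory):

```shell
#!/usr/bin/env bash
# Sandbox demo: concatenate files, injecting each file name before its contents.
set -e
dir=$(mktemp -d)
cd "$dir"

echo store > 'File 1'
echo bank  > 'File 2'

for f in 'File 1' 'File 2'; do
    printf '%s\n' "$f"   # file name first
    cat "$f"             # then its contents
done > master

cat master
# File 1
# store
# File 2
# bank

cd / && rm -rf "$dir"
```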

bash to update filename in directory based on partial match to another

I am trying to use bash to rename/update the filename of a text file in /home/cmccabe/Desktop/percent based on a partial match of digits with another text file, /home/cmccabe/Desktop/analysis.txt. The match will always be in line 3, 4, or 5 of that file. I am not able to do this, but hopefully the bash below is a start. Thank you :).
text file in /home/cmccabe/Desktop/percent - there could be a maximum of 3 files in this directory
00-0000_fbn1_20xcoverage.txt
text file in /home/cmccabe/Desktop/analysis.txt
status: complete
id names:
00-0000_Last-First
01-0101_LastN-FirstN
02-0202_La-Fi
desired result in /home/cmccabe/Desktop/percent
00-0000_Last-First_fbn1_20xcoverage.txt
bash
for filename in /home/cmccabe/Desktop/percent/*.txt; do echo mv \"$filename\" \"${filename//[0-9]-[0-9]/}\"; done < /home/cmccabe/Desktop/analysis.txt
Using process substitution with a while loop, you can run the script below from inside /home/cmccabe/Desktop/percent:
#!/bin/bash
# ^^^^ needed for associative array
# declare the associative array
declare -A mapArray
# Read the file from the 3rd line of the file and create a hash-map
# as mapArray[00-0000]=00-0000_Last-First and so on.
while IFS= read -r line; do
    mapArray["${line%%_*}"]="$line"
done < <(tail -n +3 /home/cmccabe/Desktop/analysis.txt)
# Once the hash-map is constructed, rename the text file accordingly.
# echo the file and the name to be renamed before invoking the 'mv'
# command
for file in *.txt; do
    echo "$file" "${mapArray["${file%%_*}"]}_${file#*_}"
    # mv "$file" "${mapArray["${file%%_*}"]}_${file#*_}"
done
This is another similar bash approach:
while IFS="_" read -r id newname; do
    #echo "id=$id - newname=$newname" #for cross check
    oldfilename=$(find . -name "${id}*.txt" -printf %f)
    [ -n "$oldfilename" ] && echo mv \"$oldfilename\" \"${id}_${newname}_${oldfilename#*_}\"
done < <(tail -n+3 analysis)
We read the analysis file and split each line (e.g. 00-0000_Last-First) into two fields, using _ as the delimiter:
id=00-0000
newname=Last-First
Then, using this id read from the "analysis" file, we check (with find) whether a file starting with the same id exists.
If such a file exists, its filename is returned in the variable $oldfilename.
If this variable is not empty, we do the mv.
tail -n+3 is used to skip the first two lines of the analysis file.
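Both answers can be exercised safely in a throwaway directory. The sketch below follows the associative-array version; the id, name, and file are the ones from the question, but everything else is invented:

```shell
#!/usr/bin/env bash
# Sandbox demo of the hash-map rename; run entirely in a temp directory.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'status: complete\nid names:\n00-0000_Last-First\n' > analysis.txt
touch 00-0000_fbn1_20xcoverage.txt

declare -A mapArray
while IFS= read -r line; do
    mapArray["${line%%_*}"]="$line"    # key "00-0000" -> "00-0000_Last-First"
done < <(tail -n +3 analysis.txt)

# Only match files that contain an underscore, so analysis.txt is skipped.
for file in *_*.txt; do
    mv "$file" "${mapArray["${file%%_*}"]}_${file#*_}"
done

ls    # 00-0000_Last-First_fbn1_20xcoverage.txt  analysis.txt
cd / && rm -rf "$dir"
```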

Remove specific words from a text file in bash

I want to remove specific words from a txt file in bash.
Here is my current script:
echo "Sequenzia Import Tag Sidecar Processor v0.2"
echo "=============================================================="
rootfol=$(pwd)
echo "Selecting files from current folder........"
images=$(ls *.jpg *.jpeg *.png *.gif)
echo "Converting sidecar files to folders........"
for file in $images
do
split -l 8 "$file.txt" tags-
for block in tags-*
do
foldername=$(cat "$rootfol/$block" | tr '\r\n' ' ')
FOO_NO_EXTERNAL_SPACE="$(echo -e "${foldername}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
mkdir "$FOO_NO_EXTERNAL_SPACE" > /dev/null
cd "$FOO_NO_EXTERNAL_SPACE"
done
mv "$rootfol/$file" "$file"
cd "$rootfol"
rm tags-* $file.txt
done
echo "DONE! Move files to import folder"
What it does is read the txt file named the same as an image and create folders that are interpreted as tags during an import into a Sequenzia image board (based on myimoutobooru) (https://code.acr.moe/kazari/sequenzia).
What I want to do is remove specific words (actually, symbol combinations) from the sidecar file so that they do not cause issues with the import process.
Combinations like ">_<" and ":o" are what I want to remove from the file.
Given my current script, what can I add that allows me to do this with a list of illegal words?
Before the split -l 8 "$file.txt" tags- line, I suggest you clean up $file.txt using something like:
sed -f sedscript <"$file.txt" >tempfile
sedscript is a file that you create beforehand containing all your unwanted strings, e.g.
s/>_<//g
s/:o//g
You'd change your split command to use tempfile.
Experimenting with stdin/stdout on my PC suggests that multiple substitutions in a sed script are executed in a single pass over the input file. Therefore, if the file is large, this approach avoids reading it multiple times.
Another variant of this approach is:
sed -e 's/>_<//g' -e 's/:o//g' <infile >outfile
Repeat the
-e 's/xxx//g'
option as many times as required. (The expressions must be quoted here; unquoted, the > in >_< would be taken as a shell redirection.)
You can also create a file which lists your illegal strings and iterate through its lines, removing each one from your input in turn.
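A quick sandbox check of the sed-script idea (the tag contents and file names are invented for the demo):

```shell
#!/usr/bin/env bash
# Sandbox demo: strip unwanted symbol combinations via a sed script file.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'cute >_< cat\nsmile :o face\n' > tags.txt

# One substitution per unwanted string; keeping them in a file means the
# shell never has to quote characters like '>' or ':'.
printf 's/>_<//g\ns/:o//g\n' > sedscript

sed -f sedscript < tags.txt > tempfile
cat tempfile
cd / && rm -rf "$dir"
```

The split command in the original script would then read tempfile instead of "$file.txt".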

Renames numbered files using names from list in other file

I have a folder where there are books and I have a file with the real name of each file. I renamed them in a way that I can easily see if they are ordered, say "00.pdf", "01.pdf" and so on.
I want to know if there is a way, using the shell, to match each line of a file, say "names", with each file: that is, match line i of the file with the book in position i in sort order.
<name-of-the-book-in-the-1-line> -> <book-in-the-1-position>
<name-of-the-book-in-the-2-line> -> <book-in-the-2-position>
.
.
.
<name-of-the-book-in-the-i-line> -> <book-in-the-i-position>
.
.
.
I'm doing this in Windows, using Total Commander, but I want to do it in Ubuntu, so I don't have to reboot.
I know about mv and rename, but I'm not as good as I want with regular expressions...
renamer.sh:
#!/bin/bash
for i in $(ls -v | grep -Ev '(renamer.sh|names.txt)'); do
    read name
    mv "$i" "$name.pdf"
    echo "$i" renamed to "$name.pdf"
done < names.txt
names.txt: (line count must be the exact equal to numbered files count)
name of first book
second-great-book
...
explanation:
ls -v returns naturally sorted file list
grep excludes this script name and input file to not be renamed
we cycle through found file names, read value from file and rename the target files by this value
For testing purposes, you can comment out the mv command:
#mv "$i" "$name.pdf"
And now, simply run the script:
bash renamer.sh
This loops through names.txt, creates a filename based on a counter (padding to two digits with printf, assigning to a variable using -v), then renames using mv. ((++i)) increases the counter for the next filename.
#!/bin/bash
i=0
while IFS= read -r line; do
printf -v fname "%02d.pdf" "$i"
mv -- "$fname" "$line.pdf"
((++i))
done < names.txt
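Here is a disposable run of the counter approach (the book names are invented, and .pdf is appended to the new name, as in the other answer):

```shell
#!/usr/bin/env bash
# Sandbox demo of the counter-based rename; book names are invented.
set -e
dir=$(mktemp -d)
cd "$dir"

touch 00.pdf 01.pdf
printf 'name of first book\nsecond-great-book\n' > names.txt

i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"   # 00.pdf, 01.pdf, ...
    mv -- "$fname" "$line.pdf"
    ((++i))
done < names.txt

ls
cd / && rm -rf "$dir"
```

The -- guards mv against a book name that begins with a dash.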

Add one text to multiple files using bash

I have many files with the extension .com, so the files are named 001.com, 002.com, 003.com, and so on.
And I have another file called headname which contains the following information:
abc=chd
dha=djj
cjas=FILENAME.chk
dhdh=hsd
I need to put the information of the file headname inside (and at the begin of) the files 001.com, 002.com, 003.com and so on... But FILENAME needs to be the filename of the file that will receive the headname information (without the .com extension).
So the output need to be:
For the 001.com:
abc=chd
dha=djj
cjas=001.chk
dhdh=hsd
For the 002.com:
abc=chd
dha=djj
cjas=002.chk
dhdh=hsd
For the 003.com:
abc=chd
dha=djj
cjas=003.chk
dhdh=hsd
And so on...
set -e
for f in *.com
do
cat <(sed "s#^cjas=FILENAME.chk\$#cjas=${f%.com}.chk#" headname) "$f" > "$f.new"
mv -f "$f.new" "$f"
done
Explanation:
for f in *.com -- this loops over all file names ending with .com.
sed is a program that can be used to replace text.
s#...#...# is the substitute command.
${f%.com} is the file name without the .com suffix.
cat <(...) "$f" -- this merges the new head with the body of the .com file.
The output of cat is stored into a file named 123.com.new -- mv -f "$f.new" "$f" is used to rename 123.com.new to 123.com.
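A self-contained sandbox run of this approach (header lines and file contents are invented for the demo):

```shell
#!/usr/bin/env bash
# Sandbox demo: prepend a per-file header with FILENAME substituted.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'abc=chd\ncjas=FILENAME.chk\n' > headname
printf 'body of 001\n' > 001.com

for f in *.com; do
    cat <(sed "s#^cjas=FILENAME.chk\$#cjas=${f%.com}.chk#" headname) "$f" > "$f.new"
    mv -f "$f.new" "$f"
done

cat 001.com
# abc=chd
# cjas=001.chk
# body of 001

cd / && rm -rf "$dir"
```

Writing to "$f.new" and renaming afterwards means the original file is never read and written at the same time.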
Something like this should work:
head=$(<headname) # read head file into variable
head=${head//$'\n'/\\n} # replace literal newlines with "\n" for sed
for f in *.com; do # loop over all *.com files
# make a backup copy of the file (named 001.com.bak etc).
# insert the contents of $head with FILENAME replaced by the
# part of the filename before ".com" at the beginning of the file
sed -i.bak "1i${head/FILENAME/${f%.com}}" "$f"
done
