Script to append files with same part of the name - shell

I have a bunch of files (> 1000) whose content is columns of numbers separated by spaces. I would like to reduce the number of files by appending the content of groups of them into one file.
All the files start with *time_* followed by a NUMBER, and then the rest of the filename (*pow_....txt*). For example: *time_0.6pow_0.1-173.txt*
I would like to append the files with the same NUMBER into a single file, and do it with a script since I have ~70 different NUMBERs.
I have found
cat time_0.6pow_*.txt > time_0.6.txt
It works, but I would like a script that does this for all the possible NUMBERs.
Regards

You can do it like this:
for fName in time_*pow_*.txt; do
s="${fName#time_}"
cat "$fName" >> time_"${s%%pow*}".txt
done
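For reference, the two parameter expansions in the loop do the following (a quick demo using the example filename from the question):

```shell
fName=time_0.6pow_0.1-173.txt
s="${fName#time_}"    # strip the leading "time_"       -> 0.6pow_0.1-173.txt
echo "${s%%pow*}"     # strip everything from "pow" on  -> 0.6
```

So each file's content ends up appended to `time_NUMBER.txt`.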

Related

how to merge multiple text files using bash and preserving column order

I'm new to bash. I have a folder with many text files, among them a group named namefile-0, namefile-1, ... namefile-100. I need to merge these files into one new file. The format of each of these files is: a header and 3 columns of data.
It is very important that the format of the new file is:
3 * 100 columns of data respecting the order of the columns (123123123...).
I don't mind if the header is also repeated or not.
I'm also willing, in case it is necessary, to place all these files in a folder in which no other files are present.
I've tried to do something like this:
for i in {1..100}
do
paste `echo "namefile$i"` >> `echo "b"`
done
which prints only the first file into b.
I've also tried to do this:
STR=""
for i in {1..100}
do
STR=$STR"namefile"$i" "
done
paste $STR > b
which prints everything but does not preserve the order of the columns.
You need to mention what delimiter separates the columns in your file.
Assuming the columns are separated by a single space,
paste -d' ' namefile-* > newfile
Other conditions, like the existence of other similar files or directories in the working directory, stripping of headers, etc., can also be tackled, but more information needs to be provided in the question.
paste namefile-{0..100} >combined
paste namefile* > new_file_name
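Note that the brace expansion `namefile-{0..100}` expands in the order written, while a bare glob like `namefile*` sorts lexically, which breaks numeric order once the numbers reach two digits. A quick illustration of the lexical ordering:

```shell
# Lexical sorting puts namefile-10 before namefile-2, which is
# why a plain glob does not preserve the numeric file order.
printf '%s\n' namefile-1 namefile-2 namefile-10 | sort
```

So prefer the brace-expansion form when the column order matters.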

Split files into groups based on part of their name and zip them

I have a folder containing several files using naming convention like this (4 parts divided by underscore):
Part1_Part2_Part3_Part4.csv
for example:
AAA_XXX_AAA001_20991231.csv
AAA_XXX_AAA001_20991131.csv
AAA_XXX_AAA002_20991031.csv
AAA_XXX_AAA002_20990931.csv
BBB_XXX_BBB001_20991231.csv
BBB_XXX_BBB001_20991131.csv
BBB_XXX_BBB002_20991031.csv
BBB_XXX_BBB002_20990931.csv
I need to create a shell script, which groups them based on substrings Part1 and Part3 and creates a zip archive using naming convention like this:
Part1_Part3.zip
For example zip file called
"AAA_AAA001.zip" should contain files:
AAA_XXX_AAA001_20991231.csv
AAA_XXX_AAA001_20991131.csv
"AAA_AAA002.zip" should contain files:
AAA_XXX_AAA002_20991231.csv
AAA_XXX_AAA002_20991131.csv
The same applies to "BBB_BBB001.zip" and "BBB_BBB002.zip".
The structure of the files is fixed. Part1 is always at start and Part3 is always following the second underscore. The number of characters can vary though.
I'm completely new to shell scripting. I've spent several hours trying to work with strings, zipping files, etc., but I'm not able to use this knowledge to come up with a complete solution like this.
Any help is very much appreciated.
First get all Part1-Part3 combinations and make that a unique list.
You can add your own options to zip; I think you want something like:
ls *_*_*.csv | cut -d_ -f1,3 | sort -u | while IFS=_ read -r part1 part3; do
    echo "zipfile ${part1}_${part3}.zip"
    echo "Files ${part1}_*_${part3}_*.csv"
    zip "${part1}_${part3}.zip" ${part1}_*_${part3}_*.csv
done
Edit: Corrected typo as suggested in the comment.
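If you'd rather avoid parsing `ls` output, the Part1/Part3 extraction can also be done with shell parameter expansion. A sketch of just the grouping step, shown with `echo` standing in for the `zip` call (zip appends to an existing archive, so calling it once per file accumulates the group archives):

```shell
#!/bin/sh
# Extract Part1 and Part3 from each name with parameter expansion.
for f in AAA_XXX_AAA001_20991231.csv BBB_XXX_BBB002_20990931.csv; do
    part1=${f%%_*}       # everything before the first "_"
    rest=${f#*_*_}       # drop the leading "Part1_Part2_"
    part3=${rest%%_*}    # everything before the next "_"
    echo "${part1}_${part3}.zip <- $f"
done
```

In the real script the loop would run over `*_*_*_*.csv` and the `echo` would become `zip "${part1}_${part3}.zip" "$f"`.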

replace $1 variable in file with 1-10000

I want to create 1000s of copies of this one file.
All I need to replace in the file is one variable:
kitename = $1
But I want to do that 1000s of times to create 1000s of different files.
I'm sure it involves a loop.
people answering people is more effective than google search!
thx
I'm not really sure what you are asking here, but the following will create 1000 files named filename.n, each containing the single line "kitename = n", for n = 1 to 1000:
for i in {1..1000}
do
echo "kitename = $i" > filename.$i
done
If you have MySQL installed, it comes with a lovely command-line utility called "replace", which replaces strings in place across any number of files. Too few people know about this, given that it exists on most Linux boxes. The syntax is easy:
replace SEARCH_STRING REPLACEMENT -- targetfiles*
If you MUST use sed for this... that's okay too :) The syntax is similar:
sed -i.bak s/SEARCH_STRING/REPLACEMENT/g targetfile.txt
So if you're just using numbers, you'd use something like:
for a in {1..1000}
do
cp inputFile.html outputFile-$a.html
replace kitename $a -- outputFile-$a.html
done
This will produce a bunch of files "outputFile-1.html" through "outputFile-1000.html", with the word "kitename" replaced by the relevant number, inside the file.
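If `replace` isn't available, the same loop can be written with plain sed, treating the literal `$1` in the template as the placeholder. A self-contained sketch with a throwaway template file (the `[$]` bracket expression makes the dollar sign literal to sed):

```shell
#!/bin/sh
# Self-contained sketch: sed substitutes the literal "$1" placeholder.
tmp=$(mktemp -d)
printf 'kitename = $1\n' > "$tmp/template"
for a in 1 2 3; do
    sed "s/[\$]1/$a/g" "$tmp/template" > "$tmp/outputFile-$a.html"
done
cat "$tmp/outputFile-2.html"    # -> kitename = 2
rm -rf "$tmp"
```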
But if you want to read your names from a file rather than generate them by magic, you might want something more like this (we're not using `for a in $(cat file)` since that splits on words, and I'm assuming you may have multi-word replacement strings):
cat kitenames.txt | while read -r a
do
cp inputFile.html "outputFile-$a.html"
replace kitename "$a" -- "outputFile-$a.html"
done
This will produce a bunch of files like "outputFile-red kite.html" and "outputFile-kite with no string.html", which have the word "kitename" replaced by the relevant name, inside the file.

Split text file into multiple files

I have a large text file containing 1000 abstracts, with an empty line between each abstract. I want to split this file into 1000 text files.
My file looks like
16503654 Three-dimensional structure of neuropeptide k bound to dodecylphosphocholine micelles. Neuropeptide K (NPK), an N-terminally extended form of neurokinin A (NKA), represents the most potent and longest lasting vasodepressor and cardiomodulatory tachykinin reported thus far.
16504520 Computer-aided analysis of the interactions of glutamine synthetase with its inhibitors. Mechanism of inhibition of glutamine synthetase (EC 6.3.1.2; GS) by phosphinothricin and its analogues was studied in some detail using molecular modeling methods.
You can use split and set the number of lines per output file to 2. Each output file will then have one abstract line and one empty line.
split -l 2 file
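A quick self-contained check of that, using a two-abstract sample (split's default output names are xaa, xab, ...):

```shell
#!/bin/sh
# Each abstract is one line followed by a blank line, so -l 2
# puts exactly one abstract (plus its blank line) per output file.
tmp=$(mktemp -d) && cd "$tmp"
printf '111 first abstract\n\n222 second abstract\n\n' > file
split -l 2 file
cat xaa    # first abstract plus its trailing blank line
```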
Something like this:
awk 'NF{print > $1;close($1);}' file
This will create 1000 files, each named after its abstract number. The awk code writes each record to a file whose name is taken from the first field ($1), then closes it; this is done only for lines with at least one field (NF).
You could always use the csplit command. This is a file splitter but based on a regex.
something along the lines of :
csplit -ks -f /tmp/files INPUTFILENAMEGOESHERE '/^$/' '{*}'
It is untested and may need a little tweaking though.

method for merging two files, opinion needed

Problem: I have two folders (a Delta folder, where files get updated, and an Original folder, where the original files live). Every time a file is updated in the Delta folder, I need to merge the file from the Original folder with the updated file from the Delta folder.
Note: the file names in the Delta folder and the Original folder match, but the content of the files may differ. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now I need to merge Delta_Folder/1.properties with Original_Folder/1.properties, so that my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for is:
Find all *.properties files in the Delta folder and save the list to a temp file (delta-files.txt).
Find all *.properties files in the Original folder and save the list to a temp file (original-files.txt).
Get the list of files present in both folders and loop over them.
For each file, read each line from the Delta folder's property file (delta-line="account.org.com.email=New-Email") and split the line on the delimiter "=" into two string variables
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email;)
Read each line from the Original folder's property file (orig-line="account.org.com.email=Old-Email") and split it the same way
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email;)
If delta-line-string1 == orig-line-string1, then update $orig-line with $delta-line,
i.e.:
if account.org.com.email == account.org.com.email, then replace
account.org.com.email=Old-Email in Original_Folder/1.properties with
account.org.com.email=New-Email
Once the loop finishes all lines in a file, it moves on to the next file, and continues until all the common files are done.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it works fine, but it takes a long time (~4 minutes) per file, because for every line it goes through three loops: splitting the line, finding the variable in the other file, and replacing the line.
Wondering if there is any way I can reduce the loops so that the script executes faster.
With paste and awk:
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1:
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output:
$ paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command, if sorting is not important:
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
I think your two main options are:
Completely reimplement this in a more featureful language, like perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape &, /, and \ in the replacement text.
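A self-contained sketch of that idea, building the sed script from the delta lines and applying it once per original file (assuming keys contain no sed metacharacters other than dots, and using `|` as the delimiter so values may contain `/`; throwaway sample files stand in for the real folders):

```shell
#!/bin/sh
# Build one sed instruction per delta line, then run sed once.
tmp=$(mktemp -d)
printf 'account.org.com.email=New-Email\naccount.value.range=True\n' > "$tmp/delta"
printf 'account.org.com.email=Old-Email\naccount.value.range=False\nrange.list.type=String\n' > "$tmp/orig"

# One s|^key=.*$|key=value| instruction per delta line
while IFS='=' read -r key value; do
    printf 's|^%s=.*$|%s=%s|\n' "$key" "$key" "$value"
done < "$tmp/delta" > "$tmp/script.sed"

sed -f "$tmp/script.sed" "$tmp/orig"
rm -rf "$tmp"
```

For the real folders you would add `-i` to edit the original file in place.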
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.
