Combine text files with the same prefix into one file - bash

I am trying to use bash to merge/combine all text files in a directory with the same prefix into one text file. Thank you :).
directory
111.txt
aaa
aaa
222_1.txt
bbb
222_2.txt
ccc
ccc
333_1.txt
aaa
333_2.txt
ccc
ccc
333_3.txt
bbb
desired
111.txt
aaa
aaa
222.txt
bbb
ccc
ccc
333.txt
aaa
ccc
ccc
bbb
my bash attempt
for file in $(ls | cut -d"_" -f1); do
    cat "${file}"_* > "${file}"
done

This is a good use of an associative array as a set. Iterate over the file names, trimming the trailing _* from each name before adding it to the associative array. Then you can iterate over the array's keys, treating each one as a filename prefix.
# IMPORTANT: Assumes there are no suffix-less file names that contain a _
declare -A prefixes
for f in *; do
    prefixes[${f%_*}]=        # trim the trailing _suffix; only the key matters
done
for f in "${!prefixes[@]}"; do
    [ -f "$f" ] && continue   # 111.txt has no _ suffix, so its key is the full name; nothing to do
    cat "$f"_* > "$f".txt
done
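Run against the sample layout, this should produce the desired files while leaving the originals in place (a hypothetical session, deduced from the code and the question's data):
$ ls
111.txt  222.txt  222_1.txt  222_2.txt  333.txt  333_1.txt  333_2.txt  333_3.txt
$ cat 333.txt
aaa
ccc
ccc
bbb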

Build a test environment just like yours (note that the names here come out as 111._2.txt etc., with an extra dot):
mkdir -p tmp/test
cd !$
touch {111,222,333}.{txt,_2.txt,_3.txt}
cat > 111.txt
aaa
aaa
and so on for the other files. Then you know how to iterate over the numeric prefixes:
for i in $( seq 1 3 ) ; do echo $i* ; done
111._2.txt 111._3.txt 111.txt
222._2.txt 222._3.txt 222.txt
333._2.txt 333._3.txt 333.txt
So you can build your resulting files; here is the mechanism that answers your need:
for i in $( seq 1 9 ) ; do cat $i* >> new.$i.txt ; done
And finally:
ls -l new.[1-3]*
-rw-r--r-- 1 francois francois 34 Aug 4 14:04 new.1.txt
-rw-r--r-- 1 francois francois 34 Aug 4 14:04 new.2.txt
-rw-r--r-- 1 francois francois 34 Aug 4 14:04 new.3.txt
All 3* contents are in new.3.txt here, for example.
You only have to set the desired destination file for the content and, if needed (though not asked in the initial question), sort the data alphabetically or numerically, etc.

Related

Selecting values based on parameter in bash

I am writing my first bash script and I can't find the solution to my problem. Let's say I am calling ls -l and I want to save the names of certain files to a variable.
Output of ls -l:
-rw-r--r-- 1 user1 user1 125 Apr 19 2021 aaa
drwxrwxr-x 5 user2 user2 4096 Sep 7 15:54 bbbb
drwxr-xr-x 4 user3 user3 4096 Mar 16 2021 cccc
drwxr-xr-x 7 user1 user1 4096 May 18 15:32 asdf
To parse the output I use the following command:
ls -l | while read a b c d e f g h j; do echo $c $j; done
Which results in:
user1 aaa
user2 bbbb
user3 cccc
user1 asdf
Now the step I can't figure out is how to filter based on the values of j. Let's say we have an array values=(aaa cccc). How could I extend my command so that it prints the users only if the value of j is a value in the array values?
Final result should be:
user1 aaa
user3 cccc
How could I extend my command, so that it prints out the users only if the value of j is a value in the array values?
For each line, check if the second column is in the array; see "Check if a Bash array contains a value":
containsElement () {
    local e match="$1"
    shift
    for e; do [[ "$e" == "$match" ]] && return 0; done
    return 1
}
result="user1 aaa
user2 bbbb
user3 cccc
user1 asdf"
values=(aaa cccc)
printf "%s\n" "$result" |
    while IFS=' ' read -r user file; do
        if containsElement "$file" "${values[@]}"; then
            printf "%s %s\n" "$user" "$file"
        fi
    done
A cruder solution would be to match anchored patterns built from the array (the printf emits one pattern per value, e.g. " aaa$"):
... | grep -f <(printf ' %s$\n' "${values[@]}")
It would probably be simpler if your array was an associative one (supported by recent versions of bash):
declare -A values=([aaa]=1 [cccc]=1)
ls -l | while read a b c d e f g h j; do [ -v values[$j] ] && echo "$c $j"; done
If your bash version supports associative arrays but not yet the [ -v var ] test, you can replace it by [ -n "${values[$j]+ok}" ].
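For example, the one-liner above would then read (a sketch, same logic with the more portable test):
declare -A values=([aaa]=1 [cccc]=1)
ls -l | while read a b c d e f g h j; do [ -n "${values[$j]+ok}" ] && echo "$c $j"; done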
But as explained in the comments, parsing ls output is strongly discouraged. In your case any file name with spaces or tabs, or, even worse, newline characters, would break all of this.
If what you want is the owner of each file, use stat. (If a single call to stat per file is too big a bottleneck compared to one call to ls, then you shouldn't be using bash in the first place: use a language which provides direct access to the underlying system calls to retrieve the owner.)
for v in "${values[@]}"; do
    user=$(stat ... "$v") # See man stat for the correct invocation
    echo "$user $v"
done
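For instance, with GNU coreutils this might look like the following (a sketch; BSD/macOS stat spells it stat -f %Su instead):
for v in "${values[@]}"; do
    user=$(stat -c %U -- "$v")   # GNU stat: %U prints the owner's user name
    echo "$user $v"
done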

How to properly read in a text file as a command line argument in shell script

The title of my question is very similar to other posts', but I haven't found anything on my specific example. I have to read in a text file as "$1", then put the values into an array line by line. Example:
myscript.sh /path/to/file
My question is would this approach work?
1 #!/bin/bash
2 file="$1"
3 readarray array < file
Would this code treat "/path/to/file" as "$1" and place that path into the variable "file"? And if that part works correctly, I believe line 3 should properly put the lines into an array, correct?
This is the contents of the text file:
$ head short-rockyou.txt
290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
...
I hope this is enough information to help.
Very close. :)
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0 needed" >&2; exit 1;; esac
file="$1"
readarray -t array <"$file"
declare -p array >&2 # print the array to stderr for demonstration/proof-of-concept
Note the use of the -t argument to readarray (to discard trailing newlines), and the use of $file rather than just file.
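To then walk the lines, for example:
for line in "${array[@]}"; do
    printf '%s\n' "$line"   # each array element is one line of the file
done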
I use the following for placing the lines of a file in an array:
IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<filename))'
This gets all the lines, and you can later work with them.
Edit: Explanations on the procedure above:
IFS=$'\r\n': stands for "internal field separator". It is used by the shell to determine how to do word splitting, i.e. how to recognize word boundaries.
GLOBIGNORE='*': From the bash's manual page: A colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
command eval: The addition of command eval allows for the expression to be kept in the present execution environment
array=...: Simply the definition.
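A quick way to see the GLOBIGNORE effect in isolation (a minimal demo: with every match ignored, and nullglob unset, the pattern is left unexpanded rather than being replaced by file names):
$ GLOBIGNORE='*'
$ echo /*/*/*/*/*/*
/*/*/*/*/*/*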
There are different threads on Stackoverflow and Stackexchange with more details on this:
https://unix.stackexchange.com/questions/184863/what-is-the-meaning-of-ifs-n-in-bash-scripting
https://unix.stackexchange.com/questions/105465/how-does-globignore-work
Read lines from a file into a Bash array
Then I just loop around the array like this:
for (( b = 0; b < ${#array[@]}; b++ )); do
    # Do something with "${array[b]}"
done
This could be a matter of opinion; please wait for more comments.
Edit: Use case with empty lines and globs
After the comments yesterday, I finally had time to test the suggestions (empty lines, lines with globs).
In both cases the array works fine in conjunction with awk. In the following example I attempt to print only column 2 into a new text file:
IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<'$1'))'
for (( b = 0; b < ${#array[@]}; b++ )); do
    echo "${array[b]}" | awk -F "/| " '{print $2}' >> column2.txt
done
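For what it's worth, the array isn't strictly required for this particular task; awk can read the file directly (a simpler equivalent under the same field-separator assumption):
awk -F "/| " '{print $2}' "$1" > column2.txt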
Starting with the following text file:
290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
20901 rockyou
20553 12345678
16648 abc123
/*/*/*/*/*/*
20901 rockyou
20553 12345678
16648 abc123
Note the glob line (and any empty lines) in the input.
The result of the execution is the following:
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
rockyou
12345678
abc123
*
rockyou
12345678
abc123
Clear evidence that the array is working as expected.
Execution example:
adama@galactica:~$ ./processing.sh test.txt
adama@galactica:~$ cat column2.txt
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
rockyou
12345678
abc123
*
rockyou
12345678
abc123
Should we wish to remove empty lines (as it doesn't make sense to me to have them in the output), we can do it in awk by changing the following line:
echo "${array[b]}" | awk -F "/| " '{print $2}' >> column2.txt
adding /./
echo "${array[b]}" | awk -F "/| " '/./ {print $2}' >> column2.txt
End Result:
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
rockyou
12345678
abc123
*
rockyou
12345678
abc123
Should you wish to apply it to the whole file (not column by column) you can take a look at the following thread:
AWK remove blank lines
Edit: Security concern on rm:
I actually went ahead and placed $(rm -rf ~) in the test file to test what would happen on a virtual machine:
Test.txt contents now:
290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
$(rm -rf ~)
20901 rockyou
20553 12345678
16648 abc123
/*/*/*/*/*/*
20901 rockyou
20553 12345678
16648 abc123
Execution:
adama@galactica:~$ ./processing.sh test.txt
adama@galactica:~$ ll
total 28
drwxr-xr-x 3 adama adama 4096 Dec 1 22:41 ./
drwxr-xr-x 3 root root 4096 Dec 1 19:27 ../
drwx------ 2 adama adama 4096 Dec 1 22:38 .cache/
-rw-rw-r-- 1 adama adama 144 Dec 1 22:41 column2.txt
-rwxr-xr-x 1 adama adama 182 Dec 1 22:41 processing.sh*
-rw-r--r-- 1 adama adama 286 Dec 1 22:39 test.txt
-rw------- 1 adama adama 1545 Dec 1 22:39 .viminfo
adama@galactica:~$ cat column2.txt
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
-rf
rockyou
12345678
abc123
*
rockyou
12345678
abc123
No effect on the system.
Note: I am using Ubuntu 18.04 x64 LTS on a VM. Best not to test the security issue as root.
Edit: set -f necessity:
adama@galactica:~$ ./processing.sh a
adama@galactica:~$ cat column2.txt
[a]
adama@galactica:~$
Works perfectly without set -f
BR

Passing script to array

I have a bash script which runs as follows:
./script.sh var1 var2 var3.... varN varN+1
What I need to do is take the first 2 variables and the last 2 variables and insert them into a file. The variables between the last 2 and the first 2 should be passed as a whole string to another file. How can this be done in bash?
Of course I could define a special variable with the "read var" directive and then input this whole string from the keyboard, but my objective is to pass them from the script input.
argc=$#
argv=("$@")
first_two="${argv[@]:0:2}"
last_two="${argv[@]:$argc-2:2}"
others="${argv[@]:2:$argc-4}"
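The question also asks to write these out to files; a minimal sketch (the file names here are hypothetical):
printf '%s\n' "$first_two" "$last_two" > ends.txt   # first two and last two arguments
printf '%s\n' "$others" > middle.txt                # everything in between, as one string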
#!/bin/bash
# first two
echo "${@:1:2}"
# last two
echo "${@:(-2):2}"
# middle
echo "${@:3:(($# - 4))}"
A sample run:
./script aaa bbb ccc ddd eee fff gggg hhhh
aaa bbb
gggg hhhh
ccc ddd eee fff

Batch rename files in sequence

I am having trouble renaming image sequences in the shell.
I have about 300 files following the pattern myimage_001.jpg and I would like to transform that to myimage.0001.jpg where the numbers increment with each file.
This is what I have tried with no success (the -n flag being there to show the result before actually applying it):
rename -n 's/_/./g' *.jpg
Try this command :
rename _ . *.jpg
Example:
> touch myimage_001.jpg
-rw-r--r-- 1 oracle oinstall 0 Mar 17 10:55 myimage_001.jpg
> rename _ . *.jpg
> ll
-rw-r--r-- 1 oracle oinstall 0 Mar 17 10:55 myimage.001.jpg
With an extra 0:
> touch myimage_001.jpg
-rw-r--r-- 1 oracle oinstall 0 Mar 17 10:55 myimage_001.jpg
> rename _ .0 *.jpg
> ll
-rw-r--r-- 1 oracle oinstall 0 Mar 17 10:55 myimage.0001.jpg
The syntax is simple:
rename 'old' 'new' 'data-source'
(Note: this is the util-linux rename; the Perl rename used in the question takes an s/// expression instead.)
Works fine for me. Note, however, that this does not add the additional leading zero shown in your question; was that a typo?
$ find
.
./myimage_001.jpg
./myimage_007.jpg
./myimage_006.jpg
./myimage_002.jpg
./myimage_004.jpg
./myimage_009.jpg
./myimage_008.jpg
./myimage_003.jpg
./myimage_005.jpg
$ rename -n 's/_/./g' *.jpg
myimage_001.jpg renamed as myimage.001.jpg
myimage_002.jpg renamed as myimage.002.jpg
myimage_003.jpg renamed as myimage.003.jpg
myimage_004.jpg renamed as myimage.004.jpg
myimage_005.jpg renamed as myimage.005.jpg
myimage_006.jpg renamed as myimage.006.jpg
myimage_007.jpg renamed as myimage.007.jpg
myimage_008.jpg renamed as myimage.008.jpg
myimage_009.jpg renamed as myimage.009.jpg
$ rename 's/_/./g' *.jpg
$ find
.
./myimage.008.jpg
./myimage.007.jpg
./myimage.001.jpg
./myimage.003.jpg
./myimage.006.jpg
./myimage.005.jpg
./myimage.002.jpg
./myimage.009.jpg
./myimage.004.jpg
You can try something like:
for file in *.jpg; do
    name="${file%_*}"   # part before the last _
    num="${file#*_}"    # digits after the _
    num="${num%.*}"     # strip the extension
    ext="${file#*.}"    # the extension
    # 10#$num forces base 10, so 008 and 009 aren't rejected as invalid octal
    mv "$file" "$(printf "%s.%04d.%s" "$name" "$((10#$num))" "$ext")"
done
This gives:
$ ls
myimage_001.jpg myimage_002.jpg
$ for file in *.jpg; do
    name="${file%_*}"
    num="${file#*_}"
    num="${num%.*}"
    ext="${file#*.}"
    mv "$file" "$(printf "%s.%04d.%s" "$name" "$((10#$num))" "$ext")"
done
$ ls
myimage.0001.jpg myimage.0002.jpg
Another alternative:
$ touch a.txt b.txt c.txt d.txt e.txt f.txt
$ ls
a.txt b.txt c.txt d.txt e.txt f.txt
We can use ls combined with sed and xargs to achieve your goal: sed's p command prints each original name, the s/// substitution produces the new name on the next line, and xargs -n2 feeds each old/new pair to mv.
$ ls | sed -e "p;s/\.txt$/\.sql/"|xargs -n2 mv
$ ls
a.sql b.sql c.sql d.sql e.sql f.sql
See http://nixtip.wordpress.com/2010/10/20/using-xargs-to-rename-multiple-files/ for detailed information.
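For what it's worth, a glob-based loop avoids parsing ls entirely (a minimal sketch for the same .txt-to-.sql rename):
for f in *.txt; do
    mv -- "$f" "${f%.txt}.sql"   # strip the .txt suffix and append .sql
done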

Unix code wanted: copy a template file and replace strings in the copies

I have 2 files:
File_1.txt:
John
Mary
Harry
Bill
File_2.txt:
My name is ID, and I am on line NR of file 1.
I want to create four files that look like this:
Output_file_1.txt:
My name is John, and I am on line 1 of file 1.
Output_file_2.txt:
My name is Mary, and I am on line 2 of file 1.
Output_file_3.txt:
My name is Harry, and I am on line 3 of file 1.
Output_file_4.txt:
My name is Bill, and I am on line 4 of file 1.
Normally I would use the following sed command to do this:
for q in John Mary Harry Bill
do
sed 's/ID/'${q}'/g' File_2.txt > Output_file.txt
done
But that would only replace ID with the name, and not include the line number from File_1.txt. Unfortunately, my bash skills don't go much further than that... Any tips or suggestions for a command that includes both file 1 and 2? I do need to include file 1, because the actual files are much larger than in this example, but I'm thinking I can figure out the rest of the code if I know how to do it with this hopefully simpler example... Many thanks in advance!
How about:
n=1
while read q
do
    sed -e 's/ID/'${q}'/g' -e "s/NR/$n/" File_2.txt > Output_file_${n}.txt
    ((n++))
done < File_1.txt
See the Advanced Bash Scripting Guide on redirecting input to code blocks, and maybe the section on double parentheses for further reading.
How about awk, instead?
[ghoti@pc ~]$ cat file1
John
Mary
[ghoti@pc ~]$ cat file2
Harry
Bill
[ghoti@pc ~]$ cat merge.txt
My name is %s, and I am on the line %s of file '%s'.
[ghoti@pc ~]$ cat doit.awk
#!/usr/bin/awk -f
BEGIN {
    while (getline line < "merge.txt") {
        fmt = fmt line "\n";
    }
}
{
    file = "Output_File_" NR ".txt";
    printf(fmt, $1, FNR, FILENAME) > file;
}
[ghoti@pc ~]$ ./doit.awk file1 file2
[ghoti@pc ~]$ grep . Output_File*txt
Output_File_1.txt:My name is John, and I am on the line 1 of file 'file1'.
Output_File_2.txt:My name is Mary, and I am on the line 2 of file 'file1'.
Output_File_3.txt:My name is Harry, and I am on the line 1 of file 'file2'.
Output_File_4.txt:My name is Bill, and I am on the line 2 of file 'file2'.
[ghoti@pc ~]$
If you really want your filenames to be numbered, we can do that too.
What's going on here?
The awk script BEGINs by reading in your merge.txt file and appending it to the variable "fmt", line by line (separated by newlines). This makes fmt a printf-compatible format string.
Then, for every line in your input files (specified on the command line), an output file is selected (NR is the current record count spanning all files). The printf() function replaces each %s in the fmt variable with one of its arguments. Output is redirected to the appropriate file.
The grep just shows you all the files' contents with their filenames.
This might work for you (sed '=' prefixes each input line with its line number; the second sed then builds one echo command per name and pipes it all to bash):
sed '=' File_1.txt |
sed '1{x;s/^/'"$(<File_2.txt)"'/;x};N;s/\n/ /;G;s/^\(\S*\) \(\S*\)\n\(.*\)ID\(.*\)NR\(.*\)/echo "\3\2\4\1\5" >Output_file_\1.txt/' |
bash
TXR:
$ txr merge.txr
My name is John, and I am on the line 1 of file1.
My name is Mary, and I am on the line 2 of file1.
My name is Harry, and I am on the line 3 of file1.
My name is Bill, and I am on the line 4 of file1.
merge.txr:
@(bind count @(range 1))
@(load "file2.txt")
@(next "file1.txt")
@(collect)
@name
@(template name @(pop count) "file1")
@(end)
file2.txt:
@(define template (ID NR FILE))
@(output)
My name is @ID, and I am on the line @NR of @FILE.
@(end)
@(end)
Read the names into an array.
Get the array length.
Iterate over the array.
Test preparation:
echo "John
Mary
Harry
Bill
" > names
Names and numbers:
name=($(<names))
max=$((${#name[*]}-1))
for i in $(seq 0 $max) ; do echo $i":"${name[i]}; done
with template:
for i in $(seq 0 $max) ; do echo "My name is ID, and I am on the line NR of file 1." | sed "s/ID/${name[i]}/g;s/NR/$((i+1))/g"; done
My name is John, and I am on the line 1 of file 1.
My name is Mary, and I am on the line 2 of file 1.
My name is Harry, and I am on the line 3 of file 1.
My name is Bill, and I am on the line 4 of file 1.
A little modification is needed in your script. That's it.
pearl.306> cat temp.sh
#!/bin/ksh
count=1
cat file1 | while read line
do
    sed -e "s/ID/${line}/g" -e "s/NR/${count}/g" File_2.txt > Output_file_${count}.txt
    count=$(($count+1))
done
pearl.307>
pearl.303> temp.sh
pearl.304> ls -l Out*
-rw-rw-r-- 1 nobody nobody 59 Mar 29 18:54 Output_file_1.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_2.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_3.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_4.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_5.txt
pearl.305> cat Out*
My name is linenumber11, and I am on the line 1 of file 1.
My name is linenumber2, and I am on the line 2 of file 1.
My name is linenumber1, and I am on the line 3 of file 1.
My name is linenumber4, and I am on the line 4 of file 1.
My name is linenumber6, and I am on the line 5 of file 1.
pearl.306>
