How can I merge files before moving them? - bash

I have some files (a few million) and I keep the file list in files.txt, like this:
/home/user/1.txt
/home/user/2.txt
/home/user/3.txt
/home/user/4.txt
/home/user/5.txt
I need to move them all, but before moving them I must also merge them.
I can move them like this:
#!/bin/bash
# read one path per line; quoting keeps names with spaces intact
while IFS= read -r file; do
  mv "$file" /home/user/hop/
done < files.txt
I can merge all of them with cat *, but I need to merge them in pairs, like this:
1.txt and 2.txt --> merged into 1.txt, then moved.
3.txt and 4.txt --> merged into 3.txt, then moved.
5.txt --> left as 5.txt, then moved.
But the merging must happen before the move, in /home/user/, not in /home/user/hop/.
How can I do this?

You can use cat file1 file2 file3 file4 file5 file6 > out.txt after you have moved them; the files are concatenated in the order you list them, so this also lets you set the merge order.
It also works for binary files.

You can use this script:
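i=0
while IFS= read -r f; do
  if ((++i % 2)); then
    # odd line: remember the first file of the pair
    p="$f"
  else
    # even line: append it to the first file, delete it, move the merged result
    cat "$f" >> "$p" && rm "$f"
    mv "$p" /home/user/hop/
    unset p
  fi
done < files.txt
# odd number of files: the last one has no partner, so move it unchanged
[[ -n $p ]] && mv "$p" /home/user/hop/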
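The same pairing can also be written with a different idiom: instead of counting lines, each loop iteration reads two lines from the list, so the second read picks up the partner file. This is a sketch assuming bash; merge_pairs is a hypothetical helper name, the list is assumed to hold one newline-terminated path per line, and the demo runs in a throwaway mktemp directory rather than in /home/user.

```shell
# Hypothetical helper: merge_pairs LIST DEST
# Assumes LIST holds one newline-terminated path per line and DEST already exists.
merge_pairs() {
  local list=$1 dest=$2 first second
  while IFS= read -r first; do
    # The loop's stdin is the list, so this read consumes the *next* line (the partner)
    if IFS= read -r second; then
      # append the partner to the first file, then delete the partner
      cat -- "$second" >> "$first" && rm -- "$second"
    fi
    # move the (possibly merged) first file; an unpaired last file is moved as-is
    mv -- "$first" "$dest"/
  done < "$list"
}

# Demo in a scratch directory (file names mirror the question)
tmp=$(mktemp -d)
mkdir "$tmp/hop"
for n in 1 2 3 4 5; do
  echo "line $n" > "$tmp/$n.txt"
  echo "$tmp/$n.txt"
done > "$tmp/files.txt"
merge_pairs "$tmp/files.txt" "$tmp/hop"
ls "$tmp/hop"          # 1.txt, 3.txt and 5.txt
cat "$tmp/hop/1.txt"   # "line 1" followed by "line 2"
```

Reading the partner inside the loop avoids the counter and the unset bookkeeping, because an odd-length list simply makes the inner read fail on the last iteration.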

Related

How to add the beginning of a file to another file using a loop

I have files 1.txt, 2.txt, 3.txt and 1-bis.txt, 2-bis.txt, 3-bis.txt
cat 1.txt
#ok
#5
6
5
cat 2.txt
#not ok
#56
13
56
cat 3.txt
#nothing
#
cat 1-bis.txt
5
4
cat 2-bis.txt
32
24
cat 3-bis.txt
(empty)
I would like to add the lines starting with # (from the non-bis files) at the beginning of the bis files, in order to get:
cat 1-bis.txt
#ok
#5
5
4
cat 2-bis.txt
#not ok
#56
32
24
cat 3-bis.txt
#nothing
#
I was thinking of using grep -P "#" to select the lines with # (or maybe sed -n), but I don't know how to loop over the files to solve this problem.
Thank you very much for your help.
You can use this solution:
for f in *-bis.txt; do
{ grep '^#' "${f//-bis}"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
done
If you only want the # lines at the beginning of each file (everything up to the first line that does not start with #), replace:
grep '^#' "${f//-bis}"
with:
awk '!/^#/{exit}1' "${f//-bis}"
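To see the difference between the two filters, here is a small bash demo on a made-up sample in which a # line appears after the first data line: grep '^#' keeps that late line, while the awk version stops at the first line that does not start with #.

```shell
# Made-up sample: two leading '#' lines, then data, then another '#' line
sample=$(printf '#ok\n#5\n6\n#late\n5\n')

# grep keeps every line that starts with '#', wherever it appears in the file
from_grep=$(printf '%s\n' "$sample" | grep '^#')

# awk prints lines until the first one that does not start with '#', then exits
from_awk=$(printf '%s\n' "$sample" | awk '!/^#/{exit}1')

printf 'grep:\n%s\nawk:\n%s\n' "$from_grep" "$from_awk"
```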
You can loop over the ?.txt files and use parameter expansion to derive the corresponding -bis filename:
for file in ?.txt ; do
bis=${file%.txt}-bis.txt
grep '^#' "$file" > tmp
cat "$bis" >> tmp
mv tmp "$bis"
done
You don't need grep -P; plain grep is enough. Just add ^ so that only the octothorpes at the beginning of a line are matched.
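A quick bash check of the anchoring point on made-up sample text. It also shows that sed -n '/^#/p' (the question's other idea) selects the same lines as grep '^#', since -n suppresses the default output and p prints only the matching lines.

```shell
# Made-up sample: one comment line, one line with '#' in the middle, one plain line
sample=$(printf '#comment\nvalue # trailing\n42\n')

any=$(printf '%s\n' "$sample" | grep -c '#')          # lines containing '#' anywhere
anchored=$(printf '%s\n' "$sample" | grep -c '^#')    # lines starting with '#'
via_sed=$(printf '%s\n' "$sample" | sed -n '/^#/p')   # same selection with sed -n

echo "$any $anchored"   # 2 1
echo "$via_sed"         # #comment
```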

Check whether the contents of the files are the same

I have many .txt files (namely 1.txt, 2.txt, 3.txt, etc.) saved in a directory, and I want to check whether the files in the directory have the same content.
Every file should be compared with every other file: if the content is the same, print yes; if not, print no.
For Example:
1.txt
a
b
c
2.txt
a
b
c
3.txt
1
2
3
Expected output when comparing 1.txt and 2.txt:
1.txt 2.txt yes
Expected output when comparing 1.txt and 3.txt:
1.txt 3.txt no
Expected output when comparing 2.txt and 3.txt:
2.txt 3.txt no
I tried this script:
#!/bin/sh
for file in /home/nir/dat/*.txt
do
echo $file
diff $file $file+1
done
But the problem is that it does not give the expected output. Please suggest a better solution, thanks.
Something like this in bash:
for i in *
do
  for j in *
  do
    if [[ "$i" < "$j" ]]
    then
      if cmp -s "$i" "$j"
      then
        echo "$i $j equal"
      else
        echo "$i $j differ"
      fi
    fi
  done
done
Output:
1.txt 2.txt equal
1.txt 3.txt differ
2.txt 3.txt differ
One idea using an array of the filenames, and borrowing jamesbrown's cmp solution:
# load list of files into array flist[]
flist=(*)
# iterate through all combinations; '${#flist[@]}' ==> number of elements in array
for ((i=0; i<${#flist[@]}; i++))
do
  for ((j=i+1; j<${#flist[@]}; j++))
  do
    # default status = "no" (ie, are files the same?)
    status=no
    # cmp -s exits with 0 (which the shell treats as true) when the files are identical,
    # so the follow-on assignment (status=yes) is executed only in that case
    cmp -s "${flist[${i}]}" "${flist[${j}]}" && status=yes
    echo "${flist[${i}]} ${flist[${j}]} ${status}"
  done
done
For the 3 files listed in the question this generates:
1.txt 2.txt yes
1.txt 3.txt no
2.txt 3.txt no
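Both answers treat every non-zero cmp status as "different". cmp -s exits with 0 when the files are identical, 1 when they differ, and a value greater than 1 on trouble such as a missing or unreadable file. The sketch below keeps the question's yes/no output but reports errors separately; compare_pair is a hypothetical helper name, and the demo recreates the question's three files in a scratch directory.

```shell
# Hypothetical helper: print "A B yes" or "A B no"; report cmp trouble on stderr
compare_pair() {
  cmp -s -- "$1" "$2"
  case $? in
    0) echo "$1 $2 yes" ;;                   # identical contents
    1) echo "$1 $2 no" ;;                    # contents differ
    *) echo "$1 $2 error" >&2; return 2 ;;   # e.g. missing or unreadable file
  esac
}

# Demo: the three files from the question, in a throwaway directory
work=$(mktemp -d) && cd "$work" || exit 1
printf 'a\nb\nc\n' > 1.txt
printf 'a\nb\nc\n' > 2.txt
printf '1\n2\n3\n' > 3.txt
compare_pair 1.txt 2.txt   # 1.txt 2.txt yes
compare_pair 1.txt 3.txt   # 1.txt 3.txt no
```

It can be dropped into either nested loop in place of the inline cmp call.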

Execute a program over all pairs of files in a directory using a bash script

I have a directory with a bunch of files. I need to create a bash script that I can submit with qsub and that runs a program over all pairs of files:
for $file1, $file2 in all_pairs
do
/path/program -i $file1 $file2 -o $file1.$file2.result
done
So I could do:
qsub script.sh
to get:
file1.file2.result
file1.file3.result
file2.file3.result
for a directory with:
file1
file2
file3
The following is probably the easiest.
If the pair a-b is different from b-a (ordered pairs; note that each file is also paired with itself):
set -- file1 file2 file3 file4 ...
for f1; do
  for f2; do
    /path/program -i "$f1" "$f2" -o "$f1.$f2.result"
  done
done
If the pair a-b is the same as b-a (each unordered pair once, no self-pairs):
set -- file1 file2 file3 file4 ...
for f1; do
  shift
  for f2; do
    /path/program -i "$f1" "$f2" -o "$f1.$f2.result"
  done
done
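The shift trick also works inside a function, whose positional parameters belong to that call, so the script's own "$@" is left alone. A sketch with a hypothetical all_pairs helper that prints the unordered pairs instead of running the program:

```shell
# Hypothetical helper: print every unordered pair of its arguments, one per line
all_pairs() {
  local f1 f2
  for f1; do          # the word list ("$@") is expanded once, before the loop starts
    shift             # drop f1 from "$@", so the inner loop only sees later files
    for f2; do
      printf '%s %s\n' "$f1" "$f2"
    done
  done
}

all_pairs file1 file2 file3
# file1 file2
# file1 file3
# file2 file3
```

To run the real job, replace the printf line with the /path/program call from the question.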
You can do it as in every other programming language:
files=(file1 file2 file3) # or use a glob to list the files automatically, for instance files=(*)
max="${#files[@]}"
for ((i=0; i<max; i++)); do
  for ((j=i+1; j<max; j++)); do
    echo -i "${files[i]}" "${files[j]}" -o "${files[i]}.${files[j]}.result"
  done
done
Replace echo with /path/program when you are happy with the result.

Bash: pass the file names that are not the i-th element of a loop

In a simple processing of files, where you want to do something on every file in a directory, you do something like this:
for i in file1 file2 file3 file5
do
echo "Processing $i"
done
What I want to do here is pass $i as well as the non-$i files as arguments to a command. Let's say my directory contains 4 files (file1, file2, file3, file5). In the first iteration of the loop, when file1 is being processed, I want to pass the rest of the files (file2, file3, file5) to the -b argument of the command.
For example, the first iteration of the loop in bash should look something like this:
FILES=/path/to/directory
for i in $FILES
do
bedtools intersect -a $i -b file2 file3 file5
done
In the second iteration, since file2 is in $i, the rest of the files will be passed to the -b argument:
for i in $FILES
do
bedtools intersect -a $i -b file1 file3 file5
done
And so on for all the files in the directory. In short: pass the current file to the -a argument and the rest of the files to the -b argument.
It would be great if somebody could help me with this. Thank you.
You can just use a numeric loop and take slices out of the array:
shopt -s nullglob
files=( path/to/directory/* )
for (( i = 0; i < ${#files[@]}; ++i )); do
  file=${files[i]}
  others=( "${files[@]:0:i}" "${files[@]:i+1}" )
  bedtools intersect -a "$file" -b "${others[@]}"
done
This loops through the indices of the array files and slices out the parts before and after the current index i to get the others.
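A bash sketch of just the slicing step, using the question's file names as plain strings so it can be tried without bedtools or real files; it prints the command each iteration would run.

```shell
files=(file1 file2 file3 file5)   # names from the question; the files need not exist
for (( i = 0; i < ${#files[@]}; ++i )); do
  # "${files[@]:0:i}" -> the i elements before index i (nothing when i is 0)
  # "${files[@]:i+1}" -> every element after index i (no length = to the end)
  others=( "${files[@]:0:i}" "${files[@]:i+1}" )
  echo "bedtools intersect -a ${files[i]} -b ${others[*]}"
done
# bedtools intersect -a file1 -b file2 file3 file5
# bedtools intersect -a file2 -b file1 file3 file5
# bedtools intersect -a file3 -b file1 file2 file5
# bedtools intersect -a file5 -b file1 file2 file3
```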
You can also try it like this (-type f keeps find from listing the directory itself):
op=$(find /path/to/directory -type f ! -iname ".*")
temp=$op
for i in $op
do
  rfile=${temp//$i/}
  rfile=$(echo $rfile | tr '\n' ' ')
  bedtools intersect -a $i -b $rfile
done
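Another way is to remove the current file from the array, build the command from what is left, and append the file back at the end (this version only echoes the commands):
count=0; files=(*)
for i in "${files[@]}"; do
  unset 'files[count]'
  echo "bedtools intersect -a $i -b ${files[*]}"
  files+=("$i")
  ((count++))
done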

Splitting a CSV file into text files

I have a CSV file of the form:
1,frog
2,truck
3,truck
4,deer
5,automobile
and so on, for about 50 000 entries. I want to create 50 000 separate .txt files named with the number before the comma and containing the word after the comma, like so:
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile
and so on.
This is the script I've written so far, but it does not work properly:
#!/bin/bash
folder=/home/data/cifar10
for file in $(find "$folder" -type f -iname "*.csv")
do
name=$(basename "$file" .txt)
while read -r tag line; do
printf '%s\n' "$line" >"$tag".txt
done <"$file"
rm "$file"
done
The issue is in your inner loop:
while read -r tag line; do
printf '%s\n' "$line" > "$tag".txt
done < "$file"
You need to set IFS to a comma so that tag and line are split correctly:
while IFS=, read -r tag line; do
printf '%s\n' "$line" > "$tag".txt
done < "$file"
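A small bash illustration of why the original loop misbehaves and what IFS=, changes, using here-strings instead of a CSV file. With the default IFS, a line such as 1,frog is a single field, so all of it lands in tag and line stays empty; with IFS=, the first field goes into tag and the rest of the line, including any further commas, goes into line. The IFS=, prefix applies only to that one read command.

```shell
read -r tag line <<< '1,frog'              # default IFS: ',' is not a separator
echo "[$tag] [$line]"                      # [1,frog] []

IFS=, read -r tag line <<< '1,frog'        # split on ',' for this read only
echo "[$tag] [$line]"                      # [1] [frog]

IFS=, read -r tag line <<< '6,cargo,ship'  # made-up row with an extra comma
echo "[$tag] [$line]"                      # [6] [cargo,ship]
```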
With Bash 4.0+ you can use shopt -s globstar instead of find. Unlike looping over $(find ...), this is immune to word splitting and globbing:
shopt -s globstar nullglob
for file in /home/data/cifar10/**/*.csv; do
while IFS=, read -r tag line; do
printf '%s\n' "$line" > "$tag".txt
done < "$file"
done
Note that the name variable set by name=$(basename "$file" .txt) is never used in your code.
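A self-contained bash sketch of the globstar behaviour, using a scratch directory instead of /home/data/cifar10 (the layout and file names are made up): **/ also matches zero directory levels, so top-level CSV files are included, and a name containing a space stays a single loop item.

```shell
shopt -s globstar nullglob               # globstar needs bash 4.0+
root=$(mktemp -d)
mkdir -p "$root/a/b"
touch "$root/top.csv" "$root/a/b/deep name.csv" "$root/a/notes.txt"

# '**/' matches zero or more directory levels, so both CSV files are found;
# each match stays one word even though one name contains a space
for file in "$root"/**/*.csv; do
  printf '%s\n' "${file#"$root"/}"       # print the path relative to $root
done
# prints top.csv and a/b/deep name.csv, one per line
```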
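An awk alternative (some awk implementations limit how many output files can be open at once, so each file is closed right after it is written):
awk -F, '{ out = $1 ".txt"; print $2 > out; close(out) }' file.csv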
If you only want to preview the mapping (this prints it and does not create any files):
awk 'BEGIN{FS=","} {print $1".txt contains: "$2}' file
1.txt contains: frog
2.txt contains: truck
3.txt contains: truck
4.txt contains: deer
5.txt contains: automobile
