text file multiply bash linux

For example, I have a text file with 5 lines:
one
two
three
four
five
and I want to make a script that produces a 2000-line file containing repetitions of the file above.
It would look like:
one
two
three
four
five
one
two
three
four
five
one
two
three
four
five
... and so on, repeated until the file has been written n times

Testing showed this to be about 100 times faster than the next best approach given so far.
#!/bin/bash
IN="${1}"    # file to repeat
OUT="${2}"   # file to create

# Print the input file name 2000 times and let xargs hand the names
# to as few cat invocations as possible, all writing to one stream.
for i in {1..2000}; do
    echo "${IN}"
done | xargs cat > "${OUT}"
The reason this is so much faster is that it doesn't repeatedly open, seek to the end of, append to, and close the output file. It opens the output file once and streams the data to it in a single large, continuous write. It also invokes cat as few times as possible; it may even invoke cat only once, depending on the system's maximum command line length and the length of the input file name.
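If you are curious how many cat invocations to expect, the limit involved can be checked with getconf (a quick aside, not part of the original answer):
getconf ARG_MAX   # maximum combined size of command-line arguments and environment, in bytes
# With a typical limit of a couple of megabytes and a short input file name,
# all 2000 repetitions of the name usually fit into a single cat invocation.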

If you need to repeat it 2000 times:
for i in {1..2000}; do cat "FILE"; done > NEW_FILE

Do you need 2000 lines or 2000 copies of the original file?
If the first:
infile='/path/to/inputfile'
outfile='/path/to/outputfile'

len=$(wc -l < "$infile")
for ((i=1; i<=2000/len+1; i++))   # one extra pass so we never fall short of 2000 lines
do
    cat "$infile"
done > "$outfile.tmp"   # you can use mktemp or tempfile if you want
head -n 2000 "$outfile.tmp" > "$outfile"
rm "$outfile.tmp"
If the second:
for i in {1..2000}; do cat "$infile"; done > "$outfile"
For a small input file (avoids the overhead of forking cat 2000 times):
file=$(<"$infile"); for i in {1..2000}; do echo "$file"; done > "$outfile"

Does it need to be a script? If you just want to generate it quickly, you can open the file in vim, cut the lines (press Esc, then 5dd to cut 5 lines) and then paste them n times (press Esc, then n p to paste n times).
Edit: if you absolutely need a script and efficiency is not a problem, you can do this "dirty" trick:
i=0
n=5
while (( $i < $n )); do
    cat original_file >> new_file
    let i+=1
done

file_size() {
    cat -- "$@" | wc -l
}

mult_file() {
    local \
        max_lines="$1" \
        iter_size \
        iters \
        i
    shift 1
    iter_size="$(file_size "$@")"
    let iters=max_lines/iter_size+1
    (for ((i=0; i<iters; ++i)); do
        cat -- "$@"
    done) |
        head --lines="$max_lines"
}

mult_file "$@"
So you would call it like script.sh LINES FILE1 FILE2 FILE3 >REPEAT_FILE.
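For the 2000-line case from the question, the call would look roughly like this (file names are placeholders):
./script.sh 2000 five_lines.txt > repeated.txt
wc -l repeated.txt   # should report 2000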

No process in the loop, no pipes:
infile='5.txt'
outfile='2000.txt'

n=$(( 2000 / $(wc -l < "$infile") ))   # repetitions
> "$outfile"                           # empty output file
IFS=''
content=$(cat "$infile")               # file content as string
for (( CNTR=0; CNTR<n; CNTR+=1 )); do
    echo "$content" >> "$outfile"
done

Related

paste a list of files in a range in bash

I have a list of files named "LL1.txt" to "LL1180.txt", but I am only interested in files 1 to 50, which I want to paste into one file.
I tried using:
seq 1 50
for n in $(seq $1 $2); do
f="LL$n.txt"
if [ -e $f ]; then
paste "$f" > LL$1_$2.txt
fi
done
but it did not work.
for n in `seq $start $stop` ; do
    if [ -e "LL$n.txt" ] ; then
        cat "LL$n.txt" >> output_file
    fi
done
or, if you enjoy the harder way:
cat > output_file <<< `cat LL{1..50}.txt`
You need to give all the filenames as arguments to paste so it will combine them.
paste LL{1..50}.txt > LL1_50.txt
Note that you can't use variables in the braced range. See Using a variable in brace expansion range fed to a for loop if you need workarounds.
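If the bounds do need to be variables, one common workaround (assuming GNU seq) is to generate the names with seq -f instead of brace expansion:
start=1; stop=50
paste $(seq -f 'LL%g.txt' "$start" "$stop") > "LL${start}_${stop}.txt"   # relies on file names without spaces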

How to add grouping mechanism inside for loop in bash

I have a for loop that loops through a list of files, and inside the for loop a script is called that takes the file name as input.
Something like
for file in $(cat list_of_files) ; do
script $file
done
the file list_of_files has files like
file1
file2
file3
...
so with each iteration, one file is processed.
I have to design something that loops through all the files and groups them into groups of 3, so that in one loop iteration the script is called 3 times (not one by one), then the next 3 are handled in the second iteration, and so on.
something like,
for file in $(cat list_of_files) ; do
# do somekind of grouping here
call one more loop to run script.sh 3 times, so something like
for i=1 to 3 and then next iteration from 4 to 6 and so on..
script.sh $file1
script.sh $file2
script.sh $file3
done
I am currently struggling with how to do this grouping and cannot think of an efficient way.
Change for ... in to while read
for file in $(cat list_of_files)
This style of loop is subtly dangerous and/or incorrect. It won't work right on file names with spaces, asterisks, or other special characters. As a general rule, avoid for x in $(...) loops. For more details, see:
Bash Pitfalls: for f in $(ls *.mp3).
A safer alternative is to use while read along with process substitution, like so:
while IFS= read -r file; do
...
done < <(cat list_of_files)
It's ugly, I'll admit, but it will handle special characters safely. It won't split apart file names with spaces and it won't expand * globs. For more details on what this is doing, see:
Unix.SE: Understanding “IFS= read -r line”.
You can then remove the Useless Use of Cat and use a simple redirection instead:
while IFS= read -r file; do
...
done < list_of_files
Read 3 at a time
So far these changes haven't answered your core question, how to group files 3 at a time. The switch to read has actually served a second purpose. It makes grouping easy. The trick is to call read multiple times per iteration. This is an easy change with while read; it's not so easy with for ... in.
Here's what that looks like:
while IFS= read -r file1 &&
      IFS= read -r file2 &&
      IFS= read -r file3
do
    script.sh "$file1"
    script.sh "$file2"
    script.sh "$file3"
done < list_of_files
This calls read three times, and once all three succeed it proceeds to the loop body.
It will work great if you always have a multiple of 3 items to process. If not, it will mess up at the end and skip the last file or two. If that's an issue we can update it to try to handle that case.
while IFS= read -r file1; do
    IFS= read -r file2
    IFS= read -r file3
    script.sh "$file1"
    [[ -n $file2 ]] && script.sh "$file2"
    [[ -n $file3 ]] && script.sh "$file3"
done < list_of_files
Run the scripts in parallel
If I understand your question right, you also want to run the scripts at the same time rather than sequentially, one after the other. If so, the way to do that is to append &, which will cause them to run in the background. Then call wait to block until they have all finished before proceeding.
while IFS= read -r file1; do
    IFS= read -r file2
    IFS= read -r file3
    script.sh "$file1" &
    [[ -n $file2 ]] && script.sh "$file2" &
    [[ -n $file3 ]] && script.sh "$file3" &
    wait
done < list_of_files
How about
xargs -d $'\n' -L 1 -P 3 script.sh <list_of_files
-P 3 runs 3 processes in parallel. Each of those gets the input of one line (due to -L 1), and the -d option ensures that spaces in an input line are not treated as separate arguments.
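If script.sh can accept several file names per call, a variant worth trying (still GNU xargs) passes three names per invocation instead of one:
xargs -d $'\n' -n 3 -P 3 script.sh < list_of_files
# -n 3: at most three file names per script.sh invocation
# -P 3: up to three invocations running in parallel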
You can use bash arrays to store the filenames until you get 3 of them:
#!/bin/bash

files=()
while IFS= read -r f; do
    files+=( "$f" )
    (( ${#files[@]} < 3 )) && continue
    script.sh "${files[0]}"
    script.sh "${files[1]}"
    script.sh "${files[2]}"
    files=()
done < list_of_files
However, I think that John Kugelman's answer is simpler, hence better: it uses fewer bash-specific features, so it can be more easily converted to a POSIX version.
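To illustrate that last point, a rough POSIX sh version of the read-three-at-a-time loop could look like this (a sketch, not from the original answers):
#!/bin/sh
while IFS= read -r file1; do
    IFS= read -r file2 || file2=
    IFS= read -r file3 || file3=
    script.sh "$file1" &
    [ -n "$file2" ] && script.sh "$file2" &
    [ -n "$file3" ] && script.sh "$file3" &
    wait
done < list_of_files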
You should not mix scripting languages if you don't absolutely have to.
You can start with this:
from os import listdir
from os.path import isfile, join

PATH_FILES = "/yourfolder"

def yourFunction(file_name):
    file_path = PATH_FILES + "/" + file_name
    print(file_path)  # or do something else
    print(file_path)  # or do something else
    print(file_path)  # or do something else

file_names = [f for f in listdir(PATH_FILES) if isfile(join(PATH_FILES, f))]
for file_name in file_names:
    yourFunction(file_name)
If mapfile (aka readarray) is available/acceptable (bash 4+ is required).
Assuming script.sh can accept multiple inputs:
#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[*]} == 3 )); do
    script.sh "${files[@]}"
done < list_of_files
Otherwise, loop through the array named files:
#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[*]} == 3 )); do
    for file in "${files[@]}"; do
        script.sh "$file"
    done
done < list_of_files
The body after the do only executes when a full group of 3 lines has been read. If there are not enough lines left at the end of the file to make a group of 3 and you still want them processed, just remove the
&& (( ${#files[*]} == 3 ))
from the script.
Or call the script manually one-by-one for each entry (again assuming the file's line count is a multiple of 3):
#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[*]} == 3 )); do
    script.sh "${files[0]}"
    script.sh "${files[1]}"
    script.sh "${files[2]}"
done < list_of_files

How to compare 2 files word by word and storing the different words in result output file

Suppose there are two files:
File1.txt
My name is Anamika.
File2.txt
My name is Anamitra.
I want result file storing:
Result.txt
Anamika
Anamitra
I use PuTTY, so I can't use wdiff. Is there any other alternative?
Not my greatest script, but it works. Others might come up with something more elegant.
#!/bin/bash

if [ $# != 2 ]
then
    echo "Arguments: file1 file2"
    exit 1
fi

file1=$1
file2=$2

# Do this for both files
for F in $file1 $file2
do
    if [ ! -f $F ]
    then
        echo "ERROR: $F does not exist."
        exit 2
    else
        # Create a temporary file with every word from the file
        for w in $(cat $F)
        do
            echo $w >> ${F}.tmp
        done
    fi
done

# Compare the temporary files, since they are now 1 word per line
# The egrep keeps only the lines of diff output that start with > or <
# The awk keeps only the word (i.e. removes < or >)
# The sed removes any character that is not alphanumeric
# (it removes a . at the end, for example)
diff ${file1}.tmp ${file2}.tmp | egrep -E "<|>" | awk '{print $2}' | sed 's/[^a-zA-Z0-9]//g' > Result.txt

# Cleanup!
rm -f ${file1}.tmp ${file2}.tmp
This uses a trick with the for loop: when you use for to loop over the contents of a file like this, it loops over each word, NOT each line, as beginners in bash tend to believe. Here that behaviour is actually useful, since it transforms the files into one word per line.
Ex: file content == This is a sentence.
After the for loop is done, the temporary file will contain:
This
is
a
sentence.
Then it is trivial to run diff on the files.
One last detail: your sample output did not include a . at the end, hence the sed command to keep only alphanumeric characters.
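As a quick illustration of that word-by-word behaviour (demo.txt is just a throwaway example name):
printf 'This is a sentence.\n' > demo.txt
for w in $(cat demo.txt); do echo "$w"; done
# prints This / is / a / sentence. -- each word on its own line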

KSH Shell script - Process file by blocks of lines

I am trying to write a bash script in a KSH environment that iterates through a source text file and processes it in blocks of lines.
So far I have come up with this code, although it seems to run indefinitely, since the tail command does not return 0 lines when asked to retrieve lines beyond those in the source text file:
i=1
while [[ `wc -l /path/to/block.file | awk -F' ' '{print $1}'` -gt $((i * 1000)) ]]
do
    lc=$((i * 1000))
    DA=ProcessingResult_$i.csv
    head -$lc /path/to/source.file | tail -1000 > /path/to/block.file
    cd /path/to/processing/batch
    ./process.sh #This will process /path/to/block.file
    mv /output/directory/ProcessingResult.csv /output/directory/$DA
    i=$((i + 1))
done
Before launching the above script I perform a manual 'first injection': head -$lc /path/to/source.file | tail -1000 > /path/to/temp.source.file
Any idea on how to get the script to stop after processing the last lines from the source file?
Thanks in advance to you all
If you do not want to create so many temporary files up front before beginning to process each block, you could try the solution below. It can save a lot of space when processing huge files.
#!/usr/bin/ksh

range=$1
file=$2

b=0; e=0; seq=1
while true
do
    b=$((e+1)); e=$((range*seq))
    sed -n ${b},${e}p $file > ${file}.temp
    [ $(wc -l ${file}.temp | cut -d " " -f 1) -eq 0 ] && break
    ## process the ${file}.temp as per your need ##
    ((seq++))
done
The above code generates only one temporary file at a time.
You could pass the range (block size) and the filename as command line args to the script.
example: extractblock.sh 1000 inputfile.txt
Have a look at man split:
NAME
split - split a file into pieces
SYNOPSIS
split [OPTION]... [INPUT [PREFIX]]
-l, --lines=NUMBER
put NUMBER lines per output file
For example
split -l 1000 source.file
Or, to extract the 3rd chunk for example (here 1000 is not the number of lines; it is the number of chunks, i.e. a chunk is 1/1000 of source.file):
split -nl/3/1000 source.file
A note on the condition:
[[ `wc -l /path/to/block.file | awk -F' ' '{print $1}'` -gt $((i * 1000)) ]]
Maybe it should be source.file instead of block.file. The check is also quite inefficient on a big file, because it re-reads the file (to count its lines) on every iteration; the number of lines can be stored in a variable once, and feeding wc from standard input avoids the need for awk:
nb_lines=$(wc -l < /path/to/source.file)
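Putting the two remarks together, a minimal sketch of the revised loop head might be (paths are the question's placeholders; the loop body stays as in the question):
nb_lines=$(wc -l < /path/to/source.file)   # count the lines once, outside the loop
i=1
while [[ $nb_lines -gt $((i * 1000)) ]]
do
    # ... same body as in the question ...
    i=$((i + 1))
done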
With Nahuel's recommendation I was able to build the script like this:
i=1
cd /path/to/sourcefile/
split source.file -l 1000 SF

for sf in /path/to/sourcefile/SF*
do
    DA=ProcessingResult_$i.csv
    cd /path/to/sourcefile/
    cat $sf > /path/to/block.file
    rm $sf
    cd /path/to/processing/batch
    ./process.sh #This will process /path/to/block.file
    mv /output/directory/ProcessingResult.csv /output/directory/$DA
    i=$((i + 1))
done
This worked great

how to read file from line x to the end of a file in bash

I would like to know how I can read each line of a csv file, from the second line to the end of the file, in a bash script.
I know how to read a file in bash:
while read line
do
echo -e "$line\n"
done < file.csv
But, I want to read the file starting from the second line to the end of the file. How can I achieve this?
tail -n +2 file.csv
From the man page:
-n, --lines=N
output the last N lines, instead of the last 10
...
If the first character of N (the number of bytes or lines) is a '+',
print beginning with the Nth item from the start of each file, other-
wise, print the last N items in the file.
In English this means that:
tail -n 100 prints the last 100 lines
tail -n +100 prints all lines starting from line 100
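Combined with the question's loop, that gives, for example (a small sketch, not from the original answer):
tail -n +2 file.csv | while IFS= read -r line
do
    echo "$line"
done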
Simple solution with sed:
sed -n '2,$p' <thefile
where 2 is the number of the line you wish to read from.
Or else (pure bash)...
{ for ((i=1;i--;));do read;done;while read line;do echo $line;done } < file.csv
Better written:
linesToSkip=1
{
    for ((i=$linesToSkip;i--;)); do
        read
    done
    while read line; do
        echo $line
    done
} < file.csv
This works even if linesToSkip == 0 or linesToSkip > the number of lines in file.csv.
Edit:
Changed () to {} as gniourf_gniourf encouraged me to consider: the first syntax creates a subshell, while {} doesn't.
Of course, for skipping only one line (as in the original question's title), the loop for ((i=1;i--;));do read;done can simply be replaced by read:
{ read;while read line;do echo $line;done } < file.csv
There are many solutions to this. One of my favorites is:
(head -1 > /dev/null; whatever_you_want_to_do) < file.txt   # head -1 discards the first line
You can also use tail to skip the lines you want:
tail -n +2 file.txt | whatever_you_want_to_do
Depending on what you want to do with your lines: if you want to store each selected line in an array, the best choice is definitely the builtin mapfile:
numberoflinestoskip=1
mapfile -s $numberoflinestoskip -t linesarray < file
will store each line of file file, starting from line 2, in the array linesarray.
help mapfile for more info.
If you don't want to store each line in an array, well, there are other very good answers.
As F. Hauri suggests in a comment, this is only applicable if you need to store the whole file in memory.
Otherwise, your best bet is:
{
    read # Just a scratch read to get rid (pun!) of the first line
    while read line; do
        echo "$line"
    done
} < file.csv
Notice: there's no subshell involved/needed.
This will work
i=1
while read line
do
    test $i -eq 1 && ((i=i+1)) && continue
    echo -e "$line\n"
done < file.csv
I would just use a counter variable.
#!/bin/bash
i=0
while read line
do
    if [ $i != 0 ]; then
        echo -e "$line"
    fi
    i=$((i+1))
done < "file.csv"
UPDATE: The above checks the $i variable on every line of the CSV. So if you have a very large CSV file with millions of lines, it will eat a significant number of CPU cycles, which is no good for Mother Nature.
The following one-liner can be used to delete the very first line of the CSV file using sed and then feed the remaining lines into a while loop:
sed 1d file.csv | while read d; do echo $d; done
