How to save files' names containing spaces to an array in bash? - bash

Let's say that there are 3 files in a directory: file1, file 2, file 3.
I must use ls to get the file names and save them to an array. The problem is that, for example, bash sees the name file 2 as two separate elements: file and 2, while file1 is properly interpreted.
I understand that bash uses spaces as separators of array elements.
Is it possible to temporarily change the separator in the script, for example to double quotes?
That would solve the problem, because then it would simply be a matter of using ls with the -Q option.
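One possible approach, sketched under the assumption that bash 4 or later is available and that no file name contains a newline: let the mapfile builtin split the ls output on newlines only, so no separator has to be changed. The scratch directory and the variable names files and globbed are illustrative.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the example is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "file1" "file 2" "file 3"

# mapfile (bash 4+) makes one array element per line of input,
# so spaces inside a name do not split it.
mapfile -t files < <(ls)

# If ls is not mandatory, a glob fills the array without parsing any output.
globbed=(*)

printf '%s entries read from ls:\n' "${#files[@]}"
printf '  [%s]\n' "${files[@]}"
```

When using the array later, quote the expansion as "${files[@]}" so each name stays a single word.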

Related

how to merge multiple text files using bash and preserving column order

I'm new to bash. I have a folder containing many text files, among them a group named namefile-0, namefile-1, ..., namefile-100. I need to merge all of these files into a new file. Each of these files consists of a header and 3 columns of data.
It is very important that the format of the new file is:
3 * 100 columns of data respecting the order of the columns (123123123...).
I don't mind if the header is also repeated or not.
I'm also willing, in case it was necessary, to place all these files in a folder in which no other files are present.
I've tried to do something like this:
for i in {1..100}
do
paste `echo "namefile$i"` >> `echo "b"
done
which prints only the first file into b.
I've also tried to do this:
STR=""
for i in {1..100}
do
STR=$STR"namefile"$i" "
done
paste $STR > b
which prints everything but does not preserve the order of the columns.
You need to mention what delimiter separates the columns in your files.
Assuming the columns are separated by a single space,
paste -d' ' namefile-* > newfile
Other conditions, such as the existence of other similar files or directories in the working directory, or the stripping of headers, can also be handled, but more information needs to be provided in the question.
paste namefile-{0..100} >combined
paste namefile* > new_file_name
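One difference between these suggestions is worth illustrating: a glob such as namefile-* expands in text-sorted order, so namefile-10 comes before namefile-2, while brace expansion produces the names in numeric order. A small sketch with made-up sample data (12 one-column files in a scratch directory, not the asker's real files):

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: each file holds one column naming its own index.
for i in {0..11}; do
    printf 'c%s\n' "$i" > "namefile-$i"
done

# Glob expansion is sorted as text: namefile-10 and namefile-11 precede namefile-2.
glob_order=$(paste namefile-*)

# Brace expansion generates the names in numeric order.
brace_order=$(paste namefile-{0..11})

printf 'glob:  %s\nbrace: %s\n' "$glob_order" "$brace_order"
```

If column order matters, the brace form (or zero-padded file names) is the safer choice.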

Single file contain files name and scores | text processing

I have a folder called files that has 100 files, each containing a single value, such as 0.974323.
This is my code to generate those files and store the single value inside:
DIR="/home/XX/folder"
INPUT_DIR="/home/XX/folder/eval"
OUTPUT_DIR="/home/XX/folder/files"
for i in $INPUT_DIR/*
do
groovy $DIR/calculate.groovy $i > $OUTPUT_DIR/${i##*/}_rates.txt
done
That will generate 100 files inside /home/XX/folder/files, but what I want is one single file in which each line has two tab-separated columns containing the score and the name of the file (which is i).
the score \t name of the file
So, the output will be:
0.9363728 \t resultFile.txt
0.37229 \t outFile.txt
And so on, any help with that please?
Assuming your Groovy program outputs just the score, try something like
#!/bin/sh
# ^ use a valid shebang
# Don't use uppercase for variables
dir="/home/XX/folder"
input_dir="/home/XX/folder/eval"
output_dir="/home/XX/folder/files"
# Always use double quotes around file names
for i in "$input_dir"/*
do
groovy "$dir/calculate.groovy" "$i" |
sed "s%\$%\t$i%"   # append a tab and the file name, giving: score \t name
done >"$output_dir"/tabbed_file.txt
The sed script assumes that the file names do not contain percent signs, and that your sed recognizes \t as a tab (some variants will think it's just a regular t with a gratuitous backslash; replace it with a literal tab, or try ctrl-v tab to enter a literal tab at the prompt in many shells).
A much better fix is probably to change your Groovy program so that it accepts an arbitrary number of files as command-line arguments, and includes the file name in the output (perhaps as an option).
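An alternative sketch that avoids sed entirely, so neither percent signs in file names nor a sed's handling of \t matter. It assumes, as above, that the program prints only the score. The function score_for is a hypothetical stand-in for the groovy call, and the scratch paths replace the real directories.

```shell
#!/usr/bin/env bash
# score_for is a hypothetical stand-in for: groovy "$dir/calculate.groovy" "$1"
score_for() {
    cat "$1"
}

input_dir=$(mktemp -d)   # assumed scratch directory in place of /home/XX/folder/eval
output_file=$(mktemp)    # assumed output location
printf '0.9363728\n' > "$input_dir/resultFile.txt"
printf '0.37229\n'   > "$input_dir/outFile.txt"

for i in "$input_dir"/*
do
    # printf turns \t in its format string into a real tab;
    # ${i##*/} strips the directory part of the path.
    printf '%s\t%s\n' "$(score_for "$i")" "${i##*/}"
done > "$output_file"

cat "$output_file"
```

Each output line has the score first and the bare file name second, which is the layout the question asks for.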

Modify text file based on file's name, repeat for all files in folder

I have a folder with several files named : something_1001.txt; something_1002.txt; something_1003.txt; etc.
Inside the files there is some text. Of course each file has a different text but the structure is always the same: some lines identified with the string ">TEXT", which are the ones I am interested in.
So my goal is :
for each file in the folder, read the file's name and extract the number between "_" and ".txt"
modify all the lines in this particular file that contain the string ">TEXT" in order to make it ">{NUMBER}_TEXT"
For example : file "something_1001.txt"; change all the lines containing ">TEXT" by ">1001_TEXT"; move on to file "something_1002.txt" change all the lines containing ">TEXT" by ">1002_TEXT"; etc.
Here is the code I wrote so far :
for i in /folder/*.txt
NAME=`echo $i | grep -oP '(?<=something_/).*(?=\.txt)'`
do
sed -i -e 's/>TEXT/>${NAME}_TEXT/g' /folder/something_${NAME}.txt
done
I created a small bash script to run the code but it's not working. There seems to be syntax errors and a loop error, but I can't figure out where.
Any help would be most welcome !
There are two problems here. One is that your loop syntax is wrong; the other is that you are using single quotes around the sed script, which prevents the shell from interpolating your variable.
The grep can be avoided, anyway; the shell has good built-in facilities for extracting the base name of a file.
for i in /folder/*.txt
do
base=${i#/folder/something_}
sed -i -e "s/>TEXT/>${base%.txt}_TEXT/" "$i"
done
The shell's ${var#prefix} and ${var%suffix} variable manipulation facility produces the value of $var with the prefix and suffix trimmed off, respectively.
As an aside, avoid uppercase variable names, because those are reserved for system use, and take care to double-quote any variable whose contents may include shell metacharacters.
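To make the two expansions concrete, here is a small sketch on a made-up sample file in a scratch directory. It writes through a temporary file instead of using sed -i, only because the -i option behaves differently between sed implementations; that substitution is a choice made for this example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '>TEXT\nACGT\n>TEXT\n' > "$workdir/something_1001.txt"

for i in "$workdir"/something_*.txt
do
    base=${i##*/something_}   # drop the longest prefix ending in "/something_" -> 1001.txt
    number=${base%.txt}       # drop the ".txt" suffix -> 1001
    # Rewrite via a temporary file, then replace the original.
    sed "s/>TEXT/>${number}_TEXT/" "$i" > "$i.tmp" && mv "$i.tmp" "$i"
done

cat "$workdir/something_1001.txt"
```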

Reading data from file to execute Shell Script

I have a file, 'testfiles', that contains a list of files.
For example:
Tc1
Tc2
I call the above file in a script:
test=`cat testfiles`
for ts in $test
do
feed.sh $ts >>results
done
This script runs fine when there is only 1 test file in 'testfiles', but when there are multiple files, it fails with 'file not found'.
Let me know if this is the correct approach.
You'll have to read the files one by one. Since you are taking testfiles='Tc1 Tc2', cat is searching for a file named 'Tc1 Tc2', which does not exist. So use the cut command with " " as the delimiter and read the files one by one in a loop, or you can use the sed command to separate the file names.
Your approach should work if the filenames have no spaces or other tricky characters. An approach that handles spaces in file names successfully is:
while IFS= read -r ts
do
feed.sh "$ts" >>results
done <testfiles
If your file names have newline characters in them, then the above won't work and you would need to create testfiles with the names separated by a null character in place of a newline.
Let's consider the original code. When bash substitutes for $test in the for statement, all the file names appear on the same line and bash will perform word splitting which will make a mess of any file names containing white space. The same happens on the line feed.sh $ts. Since $ts is not quoted, it will also undergo word splitting.
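The difference can be seen in a small sketch. The function feed is a hypothetical stand-in for feed.sh that only reports the argument it received, and the list files are created in a scratch directory with made-up names.

```shell
#!/usr/bin/env bash
# feed is a hypothetical stand-in for feed.sh.
feed() {
    printf 'got [%s]\n' "$1"
}

workdir=$(mktemp -d)
printf '%s\n' 'Tc1' 'Tc 2' > "$workdir/testfiles"

# Unquoted expansion: "Tc 2" is split into two words, so feed runs three times.
split_out=$(for ts in $(cat "$workdir/testfiles"); do feed "$ts"; done)

# Line-by-line read: each line reaches feed intact, so feed runs twice.
read_out=$(while IFS= read -r ts; do feed "$ts"; done < "$workdir/testfiles")

# NUL-separated list: names containing newlines also survive.
printf '%s\0' 'Tc1' $'Tc\n3' > "$workdir/testfiles0"
nul_count=0
while IFS= read -r -d '' ts; do
    nul_count=$((nul_count + 1))
done < "$workdir/testfiles0"

printf '%s\n---\n%s\n---\n%s names read from the NUL-separated list\n' \
    "$split_out" "$read_out" "$nul_count"
```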

Shell script to iterate through files with only one '4' in the file name

I am trying to iterate through files in the same directory with only one 4 in them.
Here is what I have so far. The problem with my current script is that files with any number of 4's get selected, not files with only one 4.
for i in *4*.cpp;
do
...
Sort of like [!4] but for any number of non 4 characters.
*http://www.tuxfiles.org/linuxhelp/wildcards.html
I want to iterate through file names such as me4.cpp, 4.cpp, and hi4hi.cpp
I want to ignore file names such as lala.cpp, 44.cpp, 4hi4.cpp
Thank you!
Figured it out. I tried [!4]* on a whim.
Oops turned out I didn't. That is interpreted as ([!4]) then (*)
The grep style regex you need is:
^[^4]*4[^4]*$
A bunch of not-4's after the start of the line, a 4, and another bunch of not-4's to the end of the line.
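As a quick check, the regex can be run over the sample names from the question. This sketch adds \.cpp before the end anchor so that only .cpp names pass; that addition is an assumption based on the asker's loop.

```shell
#!/usr/bin/env bash
# Sample names taken from the question.
names='me4.cpp
4.cpp
hi4hi.cpp
lala.cpp
44.cpp
4hi4.cpp'

# Keep only the names that contain exactly one 4 and end in .cpp.
matches=$(printf '%s\n' "$names" | grep -E '^[^4]*4[^4]*\.cpp$')
printf '%s\n' "$matches"
```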
In pure shell, consider using a case statement:
for file in *4*.cpp
do
case "$file" in
(*4*4*) : Ignore;;
(*) : Process;;
esac
done
That looks for names containing 4's, and then ignores those containing 2 or more 4's.
How about using find
find ./ -regex "<regular expression>"
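A sketch of what that could look like, assuming a find that supports -regex and -maxdepth (GNU and BSD find do; neither option is in POSIX). Note that -regex is matched against the whole path, so the pattern has to account for the leading ./ part. The sample files are created in a scratch directory.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch me4.cpp 4.cpp hi4hi.cpp lala.cpp 44.cpp 4hi4.cpp

# -regex matches the whole path (./name), not just the file name,
# so the pattern starts with \./ and excludes / as well as 4.
found=$(find . -maxdepth 1 -type f -regex '\./[^4/]*4[^4/]*\.cpp' | LC_ALL=C sort)
printf '%s\n' "$found"
```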
Assuming bash:
shopt -s extglob
for file in *([^4])4*([^4]).cpp; ...
where *([^4]) means zero or more characters that are not "4"
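A runnable sketch of the extglob variant on the sample names from the question, created in a scratch directory. The shopt line has to run before bash parses the line containing the pattern, which is why it sits on its own line at the top. The nullglob option and the selected array are additions for this example.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch me4.cpp 4.cpp hi4hi.cpp lala.cpp 44.cpp 4hi4.cpp

selected=()
# Zero or more non-4 characters, one 4, zero or more non-4 characters, then .cpp
for file in *([^4])4*([^4]).cpp; do
    selected+=("$file")
done

printf '%s\n' "${selected[@]}"
```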
