Single file containing file names and scores | text processing - shell

I have a folder called files that has 100 files, each of which contains a single value, such as: 0.974323
This is my code to generate those files and store the single value inside:
DIR="/home/XX/folder"
INPUT_DIR="/home/XX/folder/eval"
OUTPUT_DIR="/home/XX/folder/files"
for i in $INPUT_DIR/*
do
groovy $DIR/calculate.groovy $i > $OUTPUT_DIR/${i##*/}_rates.txt
done
That generates 100 files inside /home/XX/folder/files, but what I want is one single file in which each line has two tab-separated columns: the score and the name of the file (which is i).
the score \t name of the file
So, the output will be:
0.9363728 \t resultFile.txt
0.37229 \t outFile.txt
And so on, any help with that please?

Assuming your Groovy program outputs just the score, try something like
#!/bin/sh
# ^ use a valid shebang
# Don't use uppercase for variables
dir="/home/XX/folder"
input_dir="/home/XX/folder/eval"
output_dir="/home/XX/folder/files"
# Always use double quotes around file names
for i in "$input_dir"/*
do
groovy "$dir/calculate.groovy" "$i" |
sed "s%\$%\t$i%"
done >"$output_dir"/tabbed_file.txt
The sed script assumes that the file names do not contain percent signs, and that your sed recognizes \t as a tab (some variants will think it's just a regular t with a gratuitous backslash; replace it with a literal tab, or try ctrl-v tab to enter a literal tab at the prompt in many shells).
A much better fix is probably to change your Groovy program so that it accepts an arbitrary number of files as command-line arguments, and includes the file name in the output (perhaps as an option).
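If changing the Groovy program is not an option, the sed step can also be avoided by capturing the score and printing both columns with printf, which sidesteps the percent-sign and \t caveats. The following is only a sketch, assuming the scoring command prints nothing but the score. The function name tabulate_scores and its arguments are made up for illustration. It prints the score first and the bare file name second, as in the sample output in the question.

```shell
#!/bin/sh
# tabulate_scores INPUT_DIR COMMAND [ARGS...]
# Runs COMMAND on every file in INPUT_DIR and prints "score<TAB>file name".
# (Hypothetical helper; COMMAND stands in for: groovy "$dir/calculate.groovy")
tabulate_scores() (
    input_dir=$1
    shift
    for i in "$input_dir"/*
    do
        [ -f "$i" ] || continue      # skip directories and an unmatched glob
        score=$("$@" "$i")           # capture the single value printed by COMMAND
        printf '%s\t%s\n' "$score" "${i##*/}"
    done
)
```

It might be called as tabulate_scores "$input_dir" groovy "$dir/calculate.groovy" >"$output_dir/tabbed_file.txt". The body runs in a subshell (parentheses instead of braces), so its variables do not leak into the calling script.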

Related

sed command for inserting text inside single quote

Suppose there's a text file with the following line:
export MYSQL_ADMIN=''
I want to insert text inside that single quote using the sed command, so that it changes to something like this for example:
export MYSQL_ADMIN='abc1'
What is the appropriate sed command for that in Linux?
I tried
sed -i -e ''/MYSQL_ADMIN/s/''/'abc1'/g"
but it didn't work.
Something like sed -i "s;export MYSQL_ADMIN=.*;export MYSQL_ADMIN='abc1';" /path/to/file.ext
-i modify file in place
s means substitute,
First block is what you are matching, as a regular expression - the .* matches everything to the end of the line, which ensures you don't keep any text on that line after the substitution - and second block is what you are replacing that match with.
After each run of sed, check that there was no error and look at what changed in the file.
To get the single quotes to print you may have to write "'" around the value, like "'"abc1"'"
It is important to understand that although
I want to insert text inside that single quote using the sed command
is a perfectly good characterization of the effect you want to achieve, it does not map directly onto operations from sed's repertoire. With sed, the appropriate tool for most line modifications is the s command, which substitutes specified text for one or more matches to a specified regular expression. That would be the most natural thing to use for your case.
Additionally, it is important with sed to understand how and when to bind commands to specific lines. If you don't do that for a given command then it is applied to all lines. Sometimes that's fine, but other times it will produce unwanted results.
I tried
sed -i -e ''/MYSQL_ADMIN/s/''/'abc1'/g"
but it didn't work.
The two leading single quotes in that sed expression match each other, leaving the trailing double quote unmatched. Also, you do not specify the name of the file to modify. This variation would at least be valid shell syntax, and it would have the desired effect on the specified line appearing in file my_script:
sed -i -e "/MYSQL_ADMIN/s/''/'abc1'/g" my_script
That might also make other, unwanted changes, however.
You need to make some assumptions about the content of the file in order to do such a thing at all. The above depends on the text MYSQL_ADMIN and '' appearing on the same line only in the line(s) you want to modify. That may turn out to hold, but it seems unnecessarily risky. An assumption more likely to hold in general would be that there will be only one assignment to variable MYSQL_ADMIN, or that it is acceptable to modify all such assignments that assign a single-quote-delimited empty value.
Going with the latter, one might end up with this:
sed -i -e "s/\<MYSQL_ADMIN=''\(\s\|$\)/MYSQL_ADMIN='abc1'\1/g" my_script
The pattern \<MYSQL_ADMIN=''\(\s\|$\) improves on your plain MYSQL_ADMIN in these significant ways:
the \< causes it to match only immediately after a word boundary -- start of line, whitespace, or punctuation. This prevents substitutions for other variables whose names happen to end with MYSQL_ADMIN. If you prefer, it would be even stronger to instead anchor the match to the beginning of the line with ^.
including the ='' in the pattern distinguishes between MYSQL_ADMIN and variables whose names contain that as an initial substring. It also ensures that the '' that gets replaced, if any, goes with the variable and does not merely appear somewhere else on the line.
the \(\s\|$\) both matches and captures either a whitespace character or the empty string at the end of a line. This distinguishes between assignments of an empty value and assignments of values that are merely prefixed by '' (which is valid if the file is a shell script). Having included it in the match, the capture allows the matched text, if any, to be preserved in the output (via the \1 in the replacement).
Because that matches the whole assignment, a complete assignment must appear in the replacement, too. On the other hand, this means that (probably) you can apply the command to every line, as shown, with no particular loss of efficiency relative to the previous command.
Even that might produce changes you didn't want, however, such as in comment lines or quoted text.
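For the file in the question, where the assignment occupies a whole line, anchoring the pattern at both ends is the narrowest form of this idea. This is a sketch under two assumptions: the line is exactly export MYSQL_ADMIN='', and the new value contains no /, & or backslash. The name set_mysql_admin is hypothetical, and the function writes to standard output instead of using -i so the result can be inspected before the file is overwritten.

```shell
#!/bin/sh
# set_mysql_admin FILE VALUE
# Prints FILE with the empty MYSQL_ADMIN assignment filled in with VALUE.
set_mysql_admin() {
    # ^ and $ restrict the match to lines that are exactly the empty assignment.
    # Double quotes let the shell expand $2 while the single quotes stay literal;
    # \$ keeps the end-of-line anchor from being read as a shell variable.
    sed "s/^export MYSQL_ADMIN=''\$/export MYSQL_ADMIN='$2'/" "$1"
}
```

With this anchoring, comment lines and variables such as OLD_MYSQL_ADMIN are left alone, at the cost of also skipping assignments that are indented or followed by other text.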

Changing special characters into space

I have a variable which contains several special characters. Now I want to change these into spaces.
If I do it one by one, the change works fine
e.g. txt=$(printf "$txt" | sed 's/\xE2/ /g')
Now to change them all at once I inserted all special characters into a file like this :
\xE1
\xE2
\xC3
...
But when I try to replace them all with this loop, nothing happens:
while IFS=: read -r special
do
txt=$(printf "txt" | sed 's/$special/ /g')
done <"/home/u555/specialchar.txt"
What is wrong with this loop ?
sed (or any other external utility or a loop) is not needed for this job. You can use builtin bash parameter expansions:
var=${var//[$'\xe1\xe2\xc3']/ }
will do the job at once.
Update after the comment "But what if you need to change a lot of special characters (approx 50)? I want to keep them in a file, so if I have to add one I don't need to change the program, only the file"
One method is to define a variable, say, spchars, as spchars=$'\xe1\xe2\xc3' within a file named, say, special_characters and source that file into your script:
. special_characters
var=${var//[$spchars]/ }
then you won't need to modify your script, only the parameter spchars in the file special_characters.
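As for what is wrong with the loop in the question, it has two separate problems: inside single quotes $special is never expanded, so sed looks for the literal text $special, and printf "txt" prints the word txt rather than the variable. A corrected version of that loop might look like the sketch below. It assumes GNU sed (for the \xHH escapes) and a list file with one escape per line, as in the question. The name replace_special is made up for illustration.

```shell
#!/bin/sh
# replace_special LISTFILE TEXT
# Replaces every pattern listed in LISTFILE (one per line, e.g. \xE2)
# with a space and prints the result. Assumes GNU sed for \xHH.
replace_special() (
    list=$1
    txt=$2
    while IFS= read -r special
    do
        # Skip empty lines so sed never sees an empty pattern
        [ -n "$special" ] || continue
        # Double quotes let the shell expand $special; printf '%s' keeps
        # the text from being interpreted as a format string.
        # LC_ALL=C makes sed treat the input as plain bytes.
        txt=$(printf '%s' "$txt" | LC_ALL=C sed "s/$special/ /g")
    done <"$list"
    printf '%s\n' "$txt"
)
```

It could be used as txt=$(replace_special /home/u555/specialchar.txt "$txt"). This starts one sed process per listed character, so the parameter expansion shown above remains the cheaper option when bash is available.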

How to replace or comment double quotes in a CSV file when the doublequotes are preceded by a number

Ubuntu 16.04
GNU bash, version 4.3.48
I have some csv files that are not being parsed correctly because of "" that are placed inside the fields to represent inches.
In our csv file, columns with multiple values must be separated by commas and then the column must be wrapped with double quotes like so:
"one","two","three, three, three, three","four","five"
Example of the foreign ""
... star","Radio data system,Radio: AM/FM 8"" Diagonal Color Touch Screen,Single Slot CD/MP3 Player, Nicer","Siera ...
... star","Rear Wheelhouse Liners,Thin Profile LED Fog Lamps,4.2"" Diagonal Color Display Driver Info Center,Chevrolet Connected Access","Chevrolet ...
I know I can use sed to replace the "" quotes like so
sed -i 's/""/inch/g' filename.csv
But this causes issues when a column does not contain information, like so:
... star","Program. Exp. 10/01/2018","","All Star Edition,LT Plus Package, somemore ...","Felix ...
So I am looking for a way to replace double quotes when they are preceded by a number.
Do it this way:
line1='... star","Radio data system,Radio: AM/FM 8"" Diagonal Color Touch Screen,Single Slot CD/MP3 Player, Nicer","Siera ...'
line2='... star","Rear Wheelhouse Liners,Thin Profile LED Fog Lamps,4.2"" Diagonal Color Display Driver Info Center,Chevrolet Connected Access","Chevrolet ...'
line3='... star","Program. Exp. 10/01/2018","","All Star Edition,LT Plus Package, somemore ...","Felix ...'
echo "$line1" | sed 's/\([0-9]\)""/\1inch/g'
echo "$line2" | sed 's/\([0-9]\)""/\1inch/g'
echo "$line3" | sed 's/\([0-9]\)""/\1inch/g'
\([0-9]\): any digit from 0 to 9. The parentheses are there because we need to keep that digit in the replacement.
\1inch: \1 is replaced by the digit captured in the matching part; "inch", well, that is obvious ;-)
Not sure if you want to keep one ", but that would be done with one simple modification: echo "$line3" | sed 's/\([0-9]\)""/\1inch"/g'
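A quick way to check that the empty field survives is to run the substitution on a short line that contains both cases. The line below is a made-up sample, not taken from the real CSV, and inch_marks is a hypothetical name for a filter wrapping the same sed command.

```shell
#!/bin/sh
# Replace "" with inch only when a digit comes right before it.
# (inch_marks is a hypothetical name; it filters standard input.)
inch_marks() {
    sed 's/\([0-9]\)""/\1inch/g'
}

# Made-up sample: two inch marks and one empty field
printf '%s\n' '"AM/FM 8"" Diagonal","","4.2"" Display"' | inch_marks
```

The empty field "" in the middle is preceded by a comma rather than a digit, so it is left untouched, while 8"" and 4.2"" become 8inch and 4.2inch.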
You don't have to (and should not!) replace or remove those embedded quotes. The second quote is there to escape the double quote inside your field.
Taking your first example:
"one","two","three, three, three, three","four","five"
Say we want to insert "test" inside the third field, including those quotes:
"one","two","three, "test", three, three, three","four","five"
This would be a problem for the parser. Therefore those quotes have to be escaped with another quote:
"one","two","three, ""test"", three, three, three","four","five"
See RFC 4180 for more details on the format.
So in your csv file the data is correct (the quotes are properly escaped):
,"Radio data system,Radio: AM/FM 8"" Diagonal",
All you have to do is tell the parser that the fields are quoted and (optionally) embedded quotes are escaped with another quote (some systems use \ to escape those quotes).
Removing or replacing those pairs of quotes before parsing could introduce all kinds of problems and errors.

Modify text file based on file's name, repeat for all files in folder

I have a folder with several files named : something_1001.txt; something_1002.txt; something_1003.txt; etc.
Inside the files there is some text. Of course each file has a different text but the structure is always the same: some lines identified with the string ">TEXT", which are the ones I am interested in.
So my goal is :
for each file in the folder, read the file's name and extract the number between "_" and ".txt"
modify all the lines in this particular file that contain the string ">TEXT" in order to make it ">{NUMBER}_TEXT"
For example : file "something_1001.txt"; change all the lines containing ">TEXT" by ">1001_TEXT"; move on to file "something_1002.txt" change all the lines containing ">TEXT" by ">1002_TEXT"; etc.
Here is the code I wrote so far :
for i in /folder/*.txt
NAME=`echo $i | grep -oP '(?<=something_/).*(?=\.txt)'`
do
sed -i -e 's/>TEXT/>${NAME}_TEXT/g' /folder/something_${NAME}.txt
done
I created a small bash script to run the code but it's not working. There seems to be syntax errors and a loop error, but I can't figure out where.
Any help would be most welcome !
There are two problems here. One is that your loop syntax is wrong; the other is that you are using single quotes around the sed script, which prevents the shell from interpolating your variable.
The grep can be avoided, anyway; the shell has good built-in facilities for extracting the base name of a file.
for i in /folder/*.txt
do
base=${i#/folder/something_}
sed -i -e "s/>TEXT/>${base%.txt}_TEXT/" "$i"
done
The shell's ${var#prefix} and ${var%suffix} variable manipulation facility produces the value of $var with the prefix and suffix trimmed off, respectively.
As an aside, avoid uppercase variable names, because those are reserved for system use, and take care to double-quote any variable whose contents may include shell metacharacters.
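The two trimming steps can be tried in isolation. The sketch below wraps them in a small function; number_from_name is a hypothetical name, and the extra ${1##*/} step (dropping the directory part) is an addition so the function does not depend on the files living in /folder.

```shell
#!/bin/sh
# number_from_name PATH
# Prints the part of the file name between "something_" and ".txt".
# (Hypothetical helper mirroring the loop body above.)
number_from_name() {
    base=${1##*/}                  # drop any leading directories
    base=${base#something_}        # trim the fixed prefix
    printf '%s\n' "${base%.txt}"   # trim the suffix
}

number_from_name /folder/something_1001.txt
```

A single # or % removes the shortest match of the pattern, while the doubled forms ## and %% remove the longest, which is why ##*/ strips everything up to the last slash.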

Reading data from file to execute Shell Script

I have a file named 'testfiles' that contains a list of files
Ex-
Tc1
Tc2
I call the above file in a script:
test=`cat testfiles`
for ts in $test
do
feed.sh $ts >>results
done
This script runs fine when there is only 1 test file in 'testfiles', but when there are multiple files, it fails with 'file not found'
Let me know if this is the correct approach
You'll have to read the files one by one. Since you are taking testfiles='Tc1 Tc2', cat is searching for a file named 'Tc1 Tc2', which does not exist. So use the cut command with " " as the delimiter and read the files one by one in a loop, or you can use the sed command to separate the file names.
Your approach should work if the filenames have no spaces or other tricky characters. An approach that handles spaces in file names successfully is:
while IFS= read -r ts
do
feed.sh "$ts" >>results
done <testfiles
If your file names have newline characters in them, then the above won't work and you would need to create testfiles with the names separated by a null character in place of a newline.
Let's consider the original code. When bash substitutes for $test in the for statement, all the file names appear on the same line and bash will perform word splitting which will make a mess of any file names containing white space. The same happens on the line feed.sh $ts. Since $ts is not quoted, it will also undergo word splitting.
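The effect of word splitting is easy to see with a helper that only counts its arguments. This is an illustrative sketch; count_args and the file name 'Tc 1' are made up for the demonstration.

```shell
#!/bin/sh
# count_args prints how many arguments it received
count_args() { printf '%s\n' "$#"; }

name='Tc 1'          # hypothetical file name containing a space
count_args $name     # unquoted: the shell splits it into two words
count_args "$name"   # quoted: passed through as a single argument
```

The first call reports 2 and the second reports 1, which is the difference between feed.sh $ts and feed.sh "$ts" when a name contains white space.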

Resources