Changing special characters into space - bash

I have a variable which contains several special characters, and I want to replace them with spaces.
If I do it one character at a time, the change works fine,
e.g. txt=$(printf "$txt" | sed 's/\xE2/ /g')
To change them all at once, I put all the special characters into a file like this:
\xE1
\xE2
\xC3
...
But when I try to replace them with the following loop, nothing happens:
while IFS=: read -r special
do
txt=$(printf "txt" | sed 's/$special/ /g')
done <"/home/u555/specialchar.txt"
What is wrong with this loop ?

sed (or any other external utility or a loop) is not needed for this job. You can use builtin bash parameter expansions:
var=${var//[$'\xe1\xe2\xc3']/ }
will do the job at once.
Update after the comment "But what if you need to change a lot of special characters (approx. 50)? I want to keep them in a file, so if I need to add one I don't have to change the program, only the file."
One method is to define a variable, say, spchars, as spchars=$'\xe1\xe2\xc3' within a file named, say, special_characters and source that file into your script:
. special_characters
var=${var//[$spchars]/ }
then you won't need to modify your script, only the parameter spchars in the file special_characters.
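If the file should keep the one-escape-per-line layout from the question, a minimal sketch is to decode each line with printf '%b' and collect the resulting bytes into a single bracket expression. The temporary file (a stand-in for specialchar.txt), the sample string, and the LC_ALL=C setting (so that every byte counts as one character) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build the character class from a file listing one \xHH escape per line.
export LC_ALL=C                          # assumption: treat the data as raw bytes

list=$(mktemp)                           # stand-in for /home/u555/specialchar.txt
printf '%s\n' '\xE1' '\xE2' '\xC3' > "$list"

spchars=
while IFS= read -r special
do
    [ -n "$special" ] || continue        # skip empty lines
    spchars+=$(printf '%b' "$special")   # the 4 characters \xE1 become one byte
done < "$list"
rm -f "$list"

txt=$'a\xe1b\xe2c\xc3d'                  # made-up sample input
txt=${txt//[$spchars]/ }                 # every listed byte becomes a space
printf '%s\n' "$txt"
```

This also shows why the loop in the question changed nothing: inside single quotes the shell never expands $special, so sed searched for the literal text, and printf "txt" printed the word txt instead of the variable's contents.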

Related

Single file contain files name and scores | text processing

I have a folder called files that has 100 files, each containing a single value, such as: 0.974323
This is my code to generate those files and store the single value inside:
DIR="/home/XX/folder"
INPUT_DIR="/home/XX/folder/eval"
OUTPUT_DIR="/home/XX/folder/files"
for i in $INPUT_DIR/*
do
groovy $DIR/calculate.groovy $i > $OUTPUT_DIR/${i##*/}_rates.txt
done
That will generate 100 files inside /home/XX/folder/files, but what I want is one single file in which each line has two columns separated by a tab: the score and the name of the file (which is i).
the score \t name of the file
So, the output will be:
0.9363728 \t resultFile.txt
0.37229 \t outFile.txt
And so on, any help with that please?
Assuming your Groovy program outputs just the score, try something like
#!/bin/sh
# ^ use a valid shebang
# Don't use uppercase for variables
dir="/home/XX/folder"
input_dir="/home/XX/folder/eval"
output_dir="/home/XX/folder/files"
# Always use double quotes around file names
for i in "$input_dir"/*
do
groovy "$dir/calculate.groovy" "$i" |
sed "s%\$%\t$i%"
done >"$output_dir"/tabbed_file.txt
The sed script assumes that the file names do not contain percent signs, and that your sed recognizes \t as a tab (some variants will think it's just a regular t with a gratuitous backslash; replace it with a literal tab, or try ctrl-v tab to enter a literal tab at the prompt in many shells).
A much better fix is probably to change your Groovy program so that it accepts an arbitrary number of files as command-line arguments, and includes the file name in the output (perhaps as an option).
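Where sed's handling of \t is in doubt, printf can assemble each line instead. The following is a rough sketch only: score_of is a hypothetical stand-in for the groovy call, the directories are temporary ones created for the demonstration, and the name column uses the base name of each file, as in the question's sample output.

```shell
#!/bin/sh
# Sketch: one "score<TAB>name" line per input file, built with printf.
work=$(mktemp -d)                        # assumption: throwaway demo directories
input_dir=$work/eval
mkdir -p "$input_dir"
echo 0.9363728 > "$input_dir/resultFile.txt"
echo 0.37229   > "$input_dir/outFile.txt"

# Hypothetical stand-in for: groovy "$dir/calculate.groovy" "$1"
score_of() { cat "$1"; }

for i in "$input_dir"/*
do
    # printf understands \t in its format string, whatever sed variant is installed
    printf '%s\t%s\n' "$(score_of "$i")" "${i##*/}"
done > "$work/tabbed_file.txt"

cat "$work/tabbed_file.txt"
```

Because the file name is passed as an argument instead of being spliced into a sed script, percent signs or other special characters in the name need no extra care.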

Replacing string with variable, output to variably named files

I have the template file 12A-r.inp. From it I want to prepare files named 16A-r.inp, 20A-r.inp, and 24A-r.inp, and change some parameters in those files according to their names. For example, I want to replace the string "12A" everywhere in 12A-r.inp with 16A in 16A-r.inp, and with 20A in 20A-r.inp. I have written the code below for this:
for ((i=12;i<=24;i=i+4))
do
cat 12A-r.inp >> $i\A-r.inp
done
for ((i=12;i<=24;i=i+4))
do
sed -i "s/12A/${i}/g" $i\A-r.inp
done
But the problem is 12A gets replaced by ${i}, not with strings like 16A, 20A etc.
Observations:
The loop for ((i=12;i<=24;i=i+4)) counts 12, 16, 20, 24. There's no need to start at 12, since the template is already correct. Worse, when i=12, cat 12A-r.inp >> $i\A-r.inp tries to append the template to itself: GNU cat refuses with an "input file is output file" error, and where the append does go through, the template is doubled, so every file created afterwards is twice as long as the original.
The \ in $i\A-r.inp is unnecessary, since A is not a special character.
The cat is unnecessary, sed without -i can do it all.
In sed, s/12A/${i}/g would replace the string "12A", with whatever number $i is, without the "A", unless the variable includes that letter.
The for loop uses a bashism to enumerate i... in this instance there's a simpler equivalent bashism, (see below).
Suggested revision:
for i in {16..24..4}A
do
sed "s/12A/${i}/g" 12A-r.inp > "${i}-r.inp"
done
How it works:
$i is set to 16A,20A, and 24A.
sed repeatedly reads in the template, replaces 12A with $i,
prints everything to STDOUT...
which is redirected to the appropriately named file.
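For shells without stepped brace expansion (it needs bash 4 or later), the same idea can be sketched with an explicit list. The template contents and the temporary directory here are made up for illustration.

```shell
#!/bin/sh
# Sketch: generate 16A-r.inp, 20A-r.inp and 24A-r.inp from the 12A template.
work=$(mktemp -d)                        # assumption: throwaway demo directory
cd "$work" || exit 1
printf 'title 12A run\nbox 12A 12A\n' > 12A-r.inp   # made-up template content

for n in 16 20 24
do
    i=${n}A
    # double quotes let the shell expand $i before sed sees the script
    sed "s/12A/$i/g" 12A-r.inp > "$i-r.inp"
done

cat 20A-r.inp
```

The template itself is only ever read, so it stays intact no matter how often the loop is run.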

how to edit url string with sed

My Linux repository file contains a link that until now used http with a port number to point to its repository.
baseurl=http://host.domain.com:123/folder1/folder2
I now need a way to change that URL to use https with no port, or with a different port.
I also need to be able to change the server name, for example from host.domain.com to host2.domain.com.
My idea was to use sed to match from the start of http up to the first / that comes after the //, capturing whatever is in between, which lets me change the server name, the port, or the http/https scheme.
I'm now using this code (echo is only there for the example).
The example shows two cases: in the first, a link with http and port 123 is converted to https; in the second, it goes the other way around.
In both cases I use the same sed expression, to keep it generic.
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
OR
WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
Is that the correct way to do it?
sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:
echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'
result:
https://host.domain.com/folder1/folder2
(BTW you don't have to escape slashes because you are using an alternate separator character.)
The key is [^/]*, which matches anything but slashes, so the match stops at the first slash. (It is still greedy; it simply cannot cross a slash.)
You used /.*/, and .* can match slashes, which is not what you wanted (greedy by default).
My approach also differs in that the expression does not include the trailing slash, so it is not removed from the final output.
Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:
$ echo 'http://host.domain.com:123/folder1/folder2' |
sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2
$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2
If that isn't what you need then edit your question to clarify your requirements and in particular explain why:
You want to use hard-coded URLs, and
You need 1 script to do both transformations.
and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).
wrt what you had:
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
The main issues are:
Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See Is it possible to escape regex metacharacters reliably with sed. If you just let the port number expand then you don't have to deal with any of that.
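The escaping concern in the last point can be sketched as follows, assuming | stays the delimiter of the s command. The function name replace_base is hypothetical, and only the characters that are special in a sed replacement (&, backslash, and the delimiter) are escaped; a value containing a newline is not handled.

```shell
#!/bin/sh
# Sketch: swap scheme://host[:port] for an arbitrary base URL.
replace_base() {
    # $1 = original URL, $2 = wanted scheme://host[:port] (hypothetical helper)
    # Escape backslash, & and | so sed treats the replacement text literally.
    escaped=$(printf '%s\n' "$2" | sed 's/[&|\\]/\\&/g')
    # The single-quoted parts stay fixed; only the escaped value is spliced in.
    printf '%s\n' "$1" | sed 's|^https\{0,1\}://[^/]*|'"$escaped"'|'
}

replace_base 'http://host.domain.com:123/folder1/folder2' 'https://host2.domain.com'
replace_base 'https://host.domain.com/folder1/folder2' 'http://host.domain.com:123'
```

The two calls cover both directions from the question with one expression: the regex accepts http or https and stops at the first slash after the host, so the path is left alone.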

Modify text file based on file's name, repeat for all files in folder

I have a folder with several files named : something_1001.txt; something_1002.txt; something_1003.txt; etc.
Inside the files there is some text. Of course each file has a different text but the structure is always the same: some lines identified with the string ">TEXT", which are the ones I am interested in.
So my goal is :
for each file in the folder, read the file's name and extract the number between "_" and ".txt"
modify all the lines in this particular file that contain the string ">TEXT" in order to make it ">{NUMBER}_TEXT"
For example : file "something_1001.txt"; change all the lines containing ">TEXT" by ">1001_TEXT"; move on to file "something_1002.txt" change all the lines containing ">TEXT" by ">1002_TEXT"; etc.
Here is the code I wrote so far :
for i in /folder/*.txt
NAME=`echo $i | grep -oP '(?<=something_/).*(?=\.txt)'`
do
sed -i -e 's/>TEXT/>${NAME}_TEXT/g' /folder/something_${NAME}.txt
done
I created a small bash script to run the code but it's not working. There seems to be syntax errors and a loop error, but I can't figure out where.
Any help would be most welcome !
There are two problems here. One is that your loop syntax is wrong; the other is that you are using single quotes around the sed script, which prevents the shell from interpolating your variable.
The grep can be avoided, anyway; the shell has good built-in facilities for extracting the base name of a file.
for i in /folder/*.txt
do
base=${i#/folder/something_}
sed -i -e "s/>TEXT/>${base%.txt}_TEXT/" "$i"
done
The shell's ${var#prefix} and ${var%suffix} variable manipulation facility produces the value of $var with the prefix and suffix trimmed off, respectively.
As an aside, avoid uppercase variable names, because those are reserved for system use, and take care to double-quote any variable whose contents may include shell metacharacters.
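The two expansions can be tried on a single name before running the loop on real files. This is a small sketch in which the path and the sample line are made up to match the pattern in the question.

```shell
#!/bin/sh
# Sketch: extract the number from a name such as something_1001.txt.
i=/folder/something_1001.txt             # hypothetical path, not a real file

base=${i#/folder/something_}             # strip the leading part  -> 1001.txt
number=${base%.txt}                      # strip the trailing part -> 1001

# The same substitution the loop performs, applied to one sample line
printf '%s\n' '>TEXT sample header' | sed "s/>TEXT/>${number}_TEXT/"
```

Piping a sample line through sed without -i is a cheap way to check the script before letting it modify files in place.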

Reading data from file to execute Shell Script

I have a file 'testfiles' that contains a list of files,
e.g.
Tc1
Tc2
I read that file in a script like this:
test=`cat testfiles`
for ts in $test
do
feed.sh $ts >>results
done
This script runs fine when there is only one test file in 'testfiles', but when there are multiple files, it fails with 'file not found'.
Let me know if this is the correct approach.
You'll have to read the files one by one. Since you end up with testfiles='Tc1 Tc2', cat is searching for a file named 'Tc1 Tc2', which does not exist. Use the cut command with " " as the delimiter and read the files one by one in a loop, or use sed to separate the file names.
Your approach should work if the filenames have no spaces or other tricky characters. An approach that handles spaces in file names successfully is:
while IFS= read -r ts
do
feed.sh "$ts" >>results
done <testfiles
If your file names have newline characters in them, then the above won't work and you would need to create testfiles with the names separated by a null character in place of a newline.
Let's consider the original code. When bash substitutes for $test in the for statement, all the file names appear on the same line and bash will perform word splitting which will make a mess of any file names containing white space. The same happens on the line feed.sh $ts. Since $ts is not quoted, it will also undergo word splitting.
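The difference between the two loops can be made visible with a short sketch. The list file is a temporary stand-in for testfiles, and the second name contains a space purely to trigger the problem.

```shell
#!/bin/sh
# Sketch: compare word splitting with line-by-line reading.
list=$(mktemp)                           # stand-in for the testfiles list
printf '%s\n' 'Tc1' 'Tc 2' > "$list"     # second name contains a space (made up)

split_count=0
for ts in $(cat "$list")                 # unquoted: split on spaces as well as newlines
do
    split_count=$((split_count + 1))
done

line_count=0
while IFS= read -r ts                    # one whole line per iteration
do
    line_count=$((line_count + 1))
done < "$list"
rm -f "$list"

echo "for loop saw $split_count names, while loop saw $line_count"
```

The for loop counts three words for two lines, because "Tc 2" is split at the space, while the while loop sees each line as one name.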
