replace first line of files by another made by changing path from unix to windows - bash

I am trying to write a bash script that will:
loop over some files: OK
check if the first line matches this pattern (#!f:\test\python.exe): OK
create a new path by changing the unix style to windows style: KO
Precisely,
From: \c\tata\development\tools\virtualenvs\test2\Scripts\python.exe
I want to get: c:\tata\development\tools\virtualenvs\test2\Scripts\python.exe
insert the new line by appending #! and the new path: KO
Below is my script, but I'm really stuck!
for f in $WORKON_HOME/$env_name/$VIRTUALENVWRAPPER_ENV_BIN_DIR/*.py
do
    echo "----"
    echo file=$f >&2
    FIRSTLINE=`head -n 1 $f`
    echo firstline=$FIRSTLINE >&2
    unix_path=$WORKON_HOME/$env_name/$VIRTUALENVWRAPPER_ENV_BIN_DIR/python.exe
    new_path=`echo $unix_path | awk '{gsub("/","\\\")}1'`
    echo new_path=$new_path >&2
    # I need to change the new_path by removing the first \ and adding : after the first letter => \c -> c:
    new_line="#!"$new_path
    echo new_line=$new_line >&2
    case "$FIRSTLINE" in
        \#!*python.exe* )
            # Rewrite first line
            sed -i '1s,.*,'"$new_line"',' $f
    esac
done
Output:
file=/c/tata/development/tools/virtualenvs/test2/Scripts/pip-script.py
firstline=#!f:\test\python.exe
new_path=\c\tata\development\tools\virtualenvs\test2\Scripts\python.exe
new_line=#!\c\tata\development\tools\virtualenvs\test2\Scripts\python.exe
Line that is written in the file (some weird characters are written, and I do not know why):
#!tatadevelopment oolsirtualenvs est2Scriptspython.exe
Line I am expecting:
#!c:\tata\development\tools\virtualenvs\test2\Scripts\python.exe

sed is interpreting the backslashes and the characters following them as escape sequences, so you're getting, e.g., a tab character. You need to escape the backslashes:
sed -i "1s,.*,${new_line//\\/\\\\}," "$f"

Using sed in order to change a specific character in a specific line

I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
In a script I am trying to edit this file. The letters A, B and C are unique in the file and there is only one per line.
I want to replace the first e of each line with a number, which can be:
1 in the line beginning with an A,
2 in the line beginning with a B,
3 in the line beginning with a C,
and I'd like to loop this in order to get this kind of result:
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random integers between 0 and 9. I really need to start by replacing 1,2,3 in the first execution of my loop, then 5,4,6, then 1,5,3, and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that doing sed -i "/A/ s/$/ezzz/" /tmp/myfile will add ezzz at the end of the A line, so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have in a variable (let's call it number_of_e_per_line) the number of e's in each of the A, B and C lines.
Thank you for your time.
Just apply the s command to the line that matches A:
sed "
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
" /tmp/myfile
The s command by default replaces the first occurrence. You can, for example, use s/e/$1/2 to replace the second occurrence, or s/e/$1/g (g for "global") to replace all occurrences.
0,/e/ specifies a range of lines - it selects lines from the first line up to and including the first line that matches /e/.
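For instance, a quick illustration of the occurrence flag and of the 0,/re/ address range (the latter is a GNU sed extension):
$ printf 'Azzzezzzezzze\n' | sed 's/e/X/2'
AzzzezzzXzzze
$ printf 'Az\nBz\nCz\n' | sed '0,/B/s/z/Z/'
AZ
BZ
Cz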
sed is not part of Bash. It is a separate (crude) programming language and is a very standard command. See https://www.grymoire.com/Unix/Sed.html .
Continuing from the comment: sed is a poor choice here unless all your files can only have 3 lines. The reason is that sed processes each line separately and has no way to keep a running count of the occurrences of 'e' across lines.
Instead, wrapping sed in a script and keeping track of the replacements lets you handle any file, no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash

[ -z "$1" ] && {    ## validate one filename argument was provided
    printf "error: filename argument required.\nusage: %s filename\n" "$0" >&2
    exit 1
}
[ -s "$1" ] || {    ## validate file exists and is non-empty
    printf "error: file not found or empty '%s'.\n" "$1" >&2
    exit 1
}

declare -i n=1      ## occurrence counter initialized to 1

## loop reading each line
while read -r line || [ -n "$line" ]; do
    [[ $line =~ e ]] || continue    ## line contains an 'e', or get next
    sed "s/e/1/$n" <<< "$line"      ## substitute the n'th occurrence of 'e'
    ((n++))                         ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your file is larger than the snippet posted. If you have lines beginning with 'A' - 'Z', you don't want to have to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 (or more), you don't want to have to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice: you really have no way to make this a generic task with sed alone. The downside of using a script is that it too becomes a poor choice as the number of records you need to process increases (over 100,000 or so, simply due to efficiency).
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since the script makes multiple calls to sed, you need to redirect its output to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...

Making a script in Debian which would create a new file from a names file with a different order of the names

The existing names in the "names" file are in the form lastname1,firstname1 ; lastname2,firstname2.
In the new file they should be as shown below.
Create a script that outputs a list of existing users (from the "names" file) in the form:
firstname1.lastname1
firstname2.lastname2
etc.
and saves the result to a file called "cat list".
This kind of command line should work for you:
awk -F ',' '{print $2"."$1}' source_file >> "cat list"
First, awk reverses the order of the two fields and joins them with a '.'.
Then ">>" redirects the full output to a file called "cat list", as requested.
I don't think I have the most efficient solution here but it works and outputs the different stages of translation to help illustrate the process:
#!/bin/sh
echo "lastname1,firstname1 ; lastname2,firstname2" >testfile
echo "original file:"
cat testfile
echo "\n"
# first replace semi-colon with newline
tr ';' '\n' <testfile >testfile_n
echo "after first translation:"
cat testfile_n
echo "\n"
# also remove extra spaces
tr -d '[:blank:]' <testfile_n >testfile_n_s
echo "after second translation:"
cat testfile_n_s
echo "\n"
# now swap name order using sed and use periods instead of commas
sed -E 's/([a-zA-Z0-9]*),([a-zA-Z0-9]*)/\2\.\1/g' testfile_n_s >"cat list"
echo "after third iteration:"
cat "cat list"
echo "\n"
The script above will save a file called 'cat list' and output something similar to:
original file:
lastname1,firstname1 ; lastname2,firstname2
after first translation:
lastname1,firstname1
lastname2,firstname2
after second translation:
lastname1,firstname1
lastname2,firstname2
after third iteration:
firstname1.lastname1
firstname2.lastname2

Trying to comment a source file

I am trying to comment lines of source so that something like
LANG = 'ENG';
becomes
// LANG = 'ENG';
There are over a thousand lines in the source file and 'ENG' is not unique but the entire line IS.
I gave up on wildcarding the spaces and just tried the entire existing line 'as-is', but no joy.
Something like (commented shell)--
#!/bin/bash
#if [ -n "$5" ] ; then
#if [ "$5" == "ENG" ] ; then
sed -i "s/' LANG = '\''ENG'\''/\/\/' LANG'
= '\''ENG'\''/" vc.pas > vc.out
#fi
#fi
So it reduces down to a single line. No joy whatever I try.
TIA !
Howie
This works for me, passing any whitespace through to the output:
$ echo "LANG='ENG';" | sed "s#^\(\s*LANG\s*=\s*'ENG'\s*;\)#// \1#"
// LANG='ENG';
$ echo " LANG = 'ENG' ; " | sed "s#^\(\s*LANG\s*=\s*'ENG'\s*;\)#// \1#"
// LANG = 'ENG' ;
Technically it needs double backslashes inside the double-quoted string, but because none of the sequences form a valid escape sequence, bash doesn't mind.
With GNU sed, use
sed -i "s,.*LANG *= *'ENG';.*,//&," vc.pas
where
-i - enables in-place file modification
s - substitution command
, - delimiter
.*LANG *= *'ENG';.* - text containing LANG = 'ENG'; with any amount of spaces around =
//& - replaces the matched line with // and the line itself
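You can preview the same expression without -i (so nothing is modified yet) to see what & expands to:
$ echo "LANG = 'ENG';" | sed "s,.*LANG *= *'ENG';.*,//&,"
//LANG = 'ENG';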
Use a regular expression for the line selection, followed by a substitution. Put the replacement commands in a file:
$ cat dt.sed
/^LANG[[:space:]]*=[[:space:]]*'[^']*'/s;^;// ;
$
Then run sed(1) with that script:
$ echo "LANG='FOO'" | sed -f dt.sed
// LANG='FOO'
This works on a Fedora 34 system, but should work on any Linux.

How to check if a file is complete and has reached EOF?

My collaborator was processing a large batch of files, but some of the output files seem to have been interrupted before they were completed. It seems that these incomplete files do not have the end-of-file character (EOF). I would like to write a script to loop through all of these files and check whether the EOF character is there for every one of the ~500 files. Can you give me any idea of how to do this? Which command can I use to know if a file has an EOF character at the end?
I am not sure if there is supposed to be a special character at the end of the files when they are complete, but normal files look like this:
my_user$ tail CHSA0011.fastq
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
#HS40_15367:8:1106:6878:29640/2
TGATCCATCGTGATGTCTTATTTAAGGGGAACGTGTGGGCTATTTAGGCTTTATGACCCTGAAGTAGGAACCAGA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
#HS40_15367:8:1202:14585:48098/1
TGATCCATCGTGATGTCTTATTTAAGGGGAACGTGTGGGCTATTTAGGCTTTATGACCCTGAAGTAGGAACCAGA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
my_user$
But when I do tail on these interrupted files, they look like:
my_user$ tail IST-MES1.fastq
#HS19_13305:3:1115:13001:3380/2
GTGGAGACGAGGTTTCACCATGTTGGCCAGGCTGGTCTCGAGCTCCTGACCTCAAGTGATCCGTCTGCCTTGGCC
+
#B#FFFFFHHHHFHHIJJJJJIIJJJJJJJIJJJJGIIJJGIIGIIJJJJFDHHIJFHGIGHIHHHFFFFFFEEE
#HS19_13305:3:1106:5551:75750/2
CGAGGTTTCACCATGTTGGCCAGGCTGGTCTCGAGCTCCTGACCTCAAGTGATCCGTCTGCCTTGGCCCCCCAAA
+
CCCFFADFHHHHHJJIJJJJJJJJJJJJEGGIJGGHIIJIIIIIIJJJJDEGGIJJJGIIIJJIJJJHHHFDDDD
#HS19_13305:3:2110:17731:73616/2
CGAGGTTTCACCATGTTGGCCAGGCTGmy_user$
As you can see, in normal files my_user$ is displayed one line below the end of the file. But in these interrupted ones my_user$ is next to the end of the files. Maybe it just because the file does not end with a line breaker \n ?
I am sorry if the question is a bit confusing,
cheers,
Guillermo
Yes, the difference is because in the first case the file ends with \n (new line).
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
my_user$
In this case it doesn't have a newline, so the next thing printed is your prompt (actually your PS1):
CGAGGTTTCACCATGTTGGCCAGGCTGmy_user$
You can try this:
echo "CCCFFADFHHHHH" # <--- implicitly includes newline at the end
echo -n "CCCFFADFHHHHH" # <--- does not include newline at the end
There are actually two line-ending characters, \r and \n, and different standards depending on your OS. I will assume you are working on Linux, where only \n is used. So in this example the newline character is 0x0a (decimal 10) in the ASCII table.
If you want to know the last char of each file, you can do:
echo -n "CCCFFADFHHHHH" > uglyfile.txt
echo "CCCFFADFHHHHH" > nicefile.txt
for file in *.txt; do
    echo -n "$file ends with: 0x"
    tail -c 1 "$file" | xxd -p
done
If you want to know which files end with a char that is not a newline, you can do:
echo -n "CCCFFADFHHHHH" > uglyfile.txt
echo "CCCFFADFHHHHH" > nicefile.txt
for file in *.txt; do
    lastchar_hex=$(tail -c 1 "$file" | xxd -p)
    if [[ $lastchar_hex != '0a' ]]; then
        echo "File $file does not end with newline"
    fi
done
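Applied to your situation, a minimal sketch of the same check, assuming the ~500 output files all end in .fastq and sit in the current directory (adjust the glob as needed):
for file in *.fastq; do
    if [ "$(tail -c 1 "$file" | xxd -p)" != '0a' ]; then
        echo "File $file does not end with a newline (possibly truncated)"
    fi
done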

How can I accept unquoted strings containing backslashes?

I want a command to convert Windows filenames to Unix filenames (simply replacing backslashes with forward slashes), but without quoting the argument with "" because that's a chore when copy-pasting.
It works in the other direction (u2w) with the input quoted and without, but not for w2u.
machine:~/glebbb> w2u "a\b\c"
a/b/c
machine:~/glebbb> w2u a\b\c
abc
How can I make it work? I tried every form of escaping, echo -E, printf etc, nothing seems to work!
function w2u {
    if [ -z "$1" ] ; then
        echo "w2u: must provide path to convert!"
        return 1
    else
        printf "\n%s\n\n" "$1" | sed -e 's#\\#\/#g'
        return 0
    fi
}
If you're copy-pasting and the path is contained in the X clipboard, you can use xclip:
xclip -o | sed -e 's#\\#\/#g'
If you've got a ton of file paths to convert, you can process the whole file instead:
sed ... < file
will produce a new stream with the backslashes changed to slashes.
Otherwise I can't think of any way to avoid quoting or escaping the argument to w2u and still have the backslashes reach the function: the shell strips them before w2u ever sees them.
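If copy-paste is the main use case, one workable compromise (a sketch built on the xclip approach above, not tested on your setup) is to wrap it in a function, so the path never passes through the shell as an argument at all:
function w2u_clip {
    # read the path straight from the X clipboard; the shell never parses it,
    # so the backslashes survive untouched
    xclip -o | sed -e 's#\\#/#g'
}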
