How to get $this->translate('Content') => Content in a Textfile - bash

I am looking for a shell script that scans a directory and all its subdirectories for .php and .phtml files. Within these files, I am looking for $this->translate('') statements (also $this->view->translate('')) and I want to save the content of these statements in a text file.
The problem is that there are several different types of these statements:
$this->translate('Single Quotes') // I need: Single Quotes
$this->translate("Double Quotes") // I need: Double Quotes
$this->translate('Single quotes with %1$s placeholders', $xy) // I need: Single quotes with %1$s placeholders
$this->translate("Double quotes with %1\$s", $xy) // I need: Double quotes with %1$s
$this->view->translate('With view') // I need: With view
$this->view->translate("With view 2") // I need: With view 2
$this->translate('Single Quotes with "Doubles"') // I need: Single Quotes with "Doubles"
$this->translate("Double Quotes with 'Singles'") // I need: Double Quotes with 'Singles'
I have already written a script, and someone from starmind.com sent me the following lines:
echo -n > give_me_your_favorite_outfile_name.txt
for i in `find . -iname '*php' `
do
echo -n "Processing $i ..."
# echo " +++++++ from $i ++++++++" >> give_me_your_favorite_outfile_name.txt
cat $i | sed -n -e '/->translate(*/p' | sed -e 's/\(.*->translate(.\)\([a-z A-Z \d092\d039\d034]*\)\(.*\)/\2/g' | sed -e 's/\(.*\)\(\d039\)/\1/g' | sed -e 's/\(.*\)\(\d034\)/\1/g' >> give_me_your_favorite_outfile_name.txt
echo " done"
done
for i in `find . -iname '*phtml' `
do
echo -n "Processing $i ..."
# echo " +++++++ from $i ++++++++" >> give_me_your_favorite_outfile_name.txt
cat $i | sed -n -e '/->translate(*/p' | sed -e 's/\(.*->translate(.\)\([a-z A-Z \d092\d039\d034]*\)\(.*\)/\2/g' | sed -e 's/\(.*\)\(\d039\)/\1/g' | sed -e 's/\(.*\)\(\d034\)/\1/g' >> give_me_your_favorite_outfile_name.txt
echo " done"
done
Unfortunately, it does not cover all the above cases, especially the quotes-within-quotes cases. As I am not a shell expert at all and need this script for a verification process, I would be very happy to get help from you.
Important: it has to be written in shell. A PHP version already exists.

find /path -type f \( -name "*.php" -o -name "*.phtml" \) -print0 | while IFS= read -r -d $'\0' file
do
while IFS= read -r line
do
case "$line" in
*'$this->translate'* | *'$this->view->translate'* )
# Keep only the text between "translate(" and the first ")"
line="${line#*this*translate(}"
line="${line%%)*}"
case ${line:0:1} in
\$) s=${line:0};;
*) s=${line:1:${#line}-2};;
esac
case "$s" in
*[\"\'],* )
s=${s/\\/}
echo "${s%%[\"\'],*}";;
* ) echo "$s";;
esac
esac
done < "$file"
done

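This does it using sed in a Bash while loop and demonstrates another way to do the find, for variety's sake. (The \| alternation and the \o047 escape for a single quote rely on GNU find and GNU sed; they are not portable to the BSD versions on macOS.)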
find . -iregex ".*\.php\|.*\.phtml" |
while IFS= read -r f
do
sed -n '/[\"\o047]/ {s/$this->\(view->\|\)translate([\"\o047]\(.*\)[\"\o047].*)/\2/; s.\\..;p}' "$f"
done > outputfile.txt
Edit:
To take care of other text on the line, change the sed command to this:
sed -n '/[\"\o047]/ {s/.*$this->\(view->\|\)translate([\"\o047]\(.*\)[\"\o047].*).*/\2/; s.\\..;p}' "$f"
(Just add a .* at the beginning and end of the search string.)
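For comparison, here is a sketch that handles both quote styles, including quotes of the other kind nested inside the literal and backslash-escaped characters. It matches a single-quoted and a double-quoted literal as two alternatives of one extended regex (sed -E, accepted by GNU and BSD sed). It assumes at most one translate call per line; TRANSLATE_SED, extract_translations and scan_translations are made-up names. An escaped $, ', " or \ is reduced to the bare character, which gives the outputs listed in the question, but this is not a full PHP string parser.

```shell
# Sketch (hypothetical names): print the string literal passed to
# $this->translate(...) or $this->view->translate(...), one per line.
# Assumes at most one translate call per line.

# sed program (ERE). In the first s command, group 2 is the body of a
# '...' literal and group 4 the body of a "..." literal; only one is set.
# Lines without a match are deleted; on a match, a backslash in front of
# $, ', " or \ is dropped (so %1\$s becomes %1$s).
IFS= read -r -d '' TRANSLATE_SED <<'EOF' || true
s/.*->translate\(('(([^'\\]|\\.)*)'|"(([^"\\]|\\.)*)").*/\2\4/
t found
d
:found
s/\\([$'"\\])/\1/g
EOF

# Read the given files (or stdin) and print the extracted strings.
extract_translations() {
  sed -E "$TRANSLATE_SED" "$@"
}

# Scan DIR recursively for .php and .phtml files and write the strings to OUTFILE.
# -exec ... {} + passes the file names directly, so spaces in names are safe.
scan_translations() {
  find "$1" -type f \( -name '*.php' -o -name '*.phtml' \) \
    -exec sed -E "$TRANSLATE_SED" {} + > "$2"
}
```

Usage would be something like scan_translations . translations.txt. Keeping the sed program in a quoted here-document avoids having to escape the mix of ' and " characters for the shell.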

Related

Why can't the shell for expression parse an xargs parameter correctly?

I have a blacklist that stores a list of tag IDs, e.g. 1-3,7-9, which actually represents 1,2,3,7,8,9. I can expand it with the following shell code:
for i in {1..3,7..9}; do for j in {$i}; do echo -n "$j,"; done; done
1,2,3,7,8,9
But first I have to convert - to ..:
echo -n "1-3,7-9" | sed 's/-/../g'
1..3,7..9
then pass it to the for expression as a parameter:
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I # for i in {#}; do for j in {$i}; do echo -n "$j,"; done; done
zsh: parse error near `do'
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I # echo #
1..3,7..9
But the for expression cannot parse it correctly. Why is that?
Because you didn't do anything to stop the outermost shell from picking up the special keywords and characters (do, for, $, etc.) that you mean to be run by xargs.
xargs isn't a shell built-in; it takes the command line it should run for each element on stdin from its own arguments. Just like with any other program, if you want ; or any other sequence that is special to the shell inside an argument, you need to escape it somehow. And even then, for is a shell keyword rather than a program, so xargs cannot execute it directly; it has to be handed to a shell.
It seems to me that what you really want here is to invoke a command (your nested for loops) in a subshell for each input element.
I've come up with this; it gets past the parse error:
echo -n "1-3,7-9" \
| sed 's/-/../g' \
| xargs -I # \
bash -c "for i in {#}; do for j in {\$i}; do echo -n \"\$j,\"; done; done;"
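which gives the following. Note that bash performs brace expansion before parameter expansion, so the inner {$i} is left as the literal text {1..3}: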
{1..3},{7..9},
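A variant of the same idea that keeps the data out of the code: split the list into one range per line and let xargs pass each range to bash -c as a positional argument ($1), then loop with arithmetic instead of brace expansion, which sidesteps the expansion-order problem. This is a sketch; expand_via_xargs is a made-up name, and it keeps the trailing comma of the other answers.

```shell
# Hypothetical helper: expand "1-3,7-9" to "1,2,3,7,8,9," via xargs + bash -c.
expand_via_xargs() {
  # One range per line, then one bash invocation per range; the range
  # arrives as $1 instead of being pasted into the script text.
  printf '%s\n' "$1" | tr ',' '\n' |
    xargs -I{} bash -c '
      lo=${1%-*} hi=${1#*-}   # "7-9" gives lo=7 hi=9; "11" gives lo=hi=11
      for ((j = lo; j <= hi; j++)); do printf "%s," "$j"; done
    ' _ {}
}

expand_via_xargs "1-3,7-9"   # prints 1,2,3,7,8,9,
```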
You could use the following shell code to achieve this:
# macOS (BSD sed): the newline in the replacement needs special treatment
echo "1-3,7-9" | sed -e 's/-/../g' -e $'s/,/\\\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,%
#Linux
echo "1-3,7-9" | sed -e 's/-/../g' -e 's/,/\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,
But this approach is a little complicated; awk may be more intuitive:
# awk
echo "1-3,7-9,11,13-17" | awk '{n=split($0,a,","); for(i=1;i<=n;i++){m=split(a[i],a2,"-");for(j=a2[1];j<=a2[m];j++){print j}}}' | tr '\n' ','
1,2,3,7,8,9,11,13,14,15,16,17,%
echo -n "1-3,7-9" | perl -ne 's/-/../g;$,=",";print eval $_'
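If no external tools are wanted at all, the expansion can also be done with bash builtins only: split the spec on commas via IFS and loop with an arithmetic for. A sketch; expand_ranges is a made-up name, and unlike the loops above it emits no trailing comma.

```shell
# Hypothetical helper: expand a spec such as "1-3,7-9,11" using only bash builtins.
expand_ranges() {
  local spec=$1 part i
  local -a out=()
  local IFS=,                 # split the spec on commas and join the result with commas
  for part in $spec; do
    if [[ $part == *-* ]]; then
      for ((i = ${part%-*}; i <= ${part#*-}; i++)); do out+=("$i"); done
    else
      out+=("$part")
    fi
  done
  printf '%s\n' "${out[*]}"
}

expand_ranges "1-3,7-9,11,13-17"   # prints 1,2,3,7,8,9,11,13,14,15,16,17
```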

Looping over filtered find and performing an operation

I have a garbage dump of a bunch of WordPress files and I'm trying to convert them all to Markdown.
The script I wrote is:
htmlDocs=($(find . -print | grep -i '.*[.]html'))
for html in "${htmlDocs[@]}"
do
P_MD=${html}.markdown
echo "${html} \> ${P_MD}"
pandoc --ignore-args -r html -w markdown < "${html}" | awk 'NR > 130' | sed '/<div class="site-info">/,$d' > "${P_MD}"
done
As far as I understand, the first line should be making an array of all html files in all subdirectories, then the for loop has a line to create a variable with the Markdown name (followed by a debugging echo), then the actual pandoc command to do the conversion.
One at a time, this command works.
However, when I try to execute it, OS X gives me:
$ ./pandoc_convert.command
./pandoc_convert.command: line 1: : No such file or directory
./pandoc_convert.command: line 1: : No such file or directory
o_0
Help?
There may be several reasons why the script fails, but one is that the way you create the array is incorrect:
htmlDocs=($(find . -print | grep -i '.*[.]html'))
Arrays are assigned in the form NAME=(VALUE1 VALUE2 ...), where NAME is the name of the variable and VALUE1, VALUE2, and so on are the elements. Because $(...) is unquoted here, its output is split into fields on the characters in $IFS (the internal field separator), which by default are space, tab and newline. So if find returns a file name containing spaces, each word becomes a separate item in the array.
Another issue is that the unquoted expansion is also subject to globbing, i.e. file name generation, so a name containing special characters such as * is expanded against the files in the current directory:
mkdir dir.html
touch \ *.html
touch a\ b\ c.html
a=($(find . -print | grep -i '.*[.]html'))
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
Output
>>>./a<<<
>>>b<<<
>>>c.html<<<
>>>./<<<
>>>a b c.html<<<
>>>dir.html<<<
>>> *.html<<<
>>>./dir.html<<<
I know two ways to fix this behavior: 1) temporarily disable globbing (and split on newlines only), and 2) use the mapfile command.
Disabling Globbing
# Disable globbing; remember whether it was enabled so it can be restored
[[ "$-" == *f* ]] || globbing_disabled=1
set -f
# Split the command output on newlines only, then restore IFS
old_ifs=$IFS
IFS=$'\n'
a=($(find . -print | grep -i '.*[.]html'))
IFS=$old_ifs
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
# Restore globbing
test -n "$globbing_disabled" && set +f
Output
>>>./ .html<<<
>>>./a b c.html<<<
>>>./ *.html<<<
>>>./dir.html<<<
Using mapfile
The mapfile builtin was introduced in Bash 4 (macOS still ships Bash 3.2 as /bin/bash). It reads lines from the standard input into an indexed array:
mapfile -t a < <(find . -print | grep -i '.*[.]html')
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
The find Options
The find command selects all types of nodes, including directories. You should use the -type option, e.g. -type f for files.
If you want to filter the result set with a regular expression, use the -regex option, or -iregex for case-insensitive matching (the expression must match the whole path):
mapfile -t a < <(find . -type f -iregex '.*\.html$')
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
Output
>>>./ .html<<<
>>>./a b c.html<<<
>>>./ *.html<<<
echo vs. printf
Finally, don't use echo in new software; its handling of options such as -n and of backslash escapes varies between shells and systems. Use printf instead:
mapfile -t a < <(find . -type f -iregex '.*\.html$')
for html in "${a[@]}"; do printf '>>>%s<<<\n' "$html"; done
Alternative Approach
However, I would rather pipe find into a while read loop:
find . -type f -iregex '.*\.html$' | while IFS= read -r line
do
printf '>>>%s<<<\n' "$line"
done
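In this example, the read command reads a line from the standard input and stores it in the line variable. Because the loop is on the right-hand side of a pipe, bash runs it in a subshell, so variables set inside the loop are not visible after it.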
Although I like the mapfile feature, I find the code with the pipe more clear.
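For Bash versions without mapfile (such as 3.2), a sketch of a loop that fills the array from NUL-separated find -print0 output, so spaces, * and even newlines in names survive. collect_html, show_conversions and html_docs are made-up names, and the pandoc step from the question is only indicated by a comment.

```shell
# Hypothetical helper: collect regular *.html files under DIR into the global
# array html_docs, one element per file (no word splitting, no globbing).
collect_html() {
  local f
  html_docs=()
  while IFS= read -r -d '' f; do
    html_docs+=("$f")
  done < <(find "$1" -type f -iname '*.html' -print0)
}

# Print the planned conversion for each file found under DIR.
show_conversions() {
  collect_html "$1"
  local html
  for html in "${html_docs[@]}"; do
    # The question's pandoc | awk | sed pipeline would go here, writing "${html}.markdown"
    printf '%s -> %s\n' "$html" "${html}.markdown"
  done
}
```

Usage would be show_conversions . from the top of the WordPress dump; -type f keeps directories such as dir.html out of the list.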
Try adding the bash shebang and setting IFS so that spaces in folder and file names are handled:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
htmlDocs=($(find . -print | grep -i '.*[.]html'))
for html in "${htmlDocs[@]}"
do
P_MD=${html}.markdown
echo "${html} \> ${P_MD}"
pandoc --ignore-args -r html -w markdown < "${html}" | awk 'NR > 130' | sed '/<div class="site-info">/,$d' > "${P_MD}"
done
IFS=$SAVEIFS

Shell script: print the path of every subdirectory

My Script:
cd /var/www/try/
sort -u
files="$(find -L "/var/www/try/" -type d)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while read file; do
echo $file >> filename.csv
done
Output:
/var/www/try
/var/www/try/cat
Output should be:
var-www-try
var-www-try-cat
Second case:
any character except /, as some of my folder names contain /, e.g.
"tv/dvd"
Output generated:
/var/www/try/cat/tv/dvd
Output should be:
var-www-try-cat-tv/dvd-
Have a look at sed, which is a popular tool for string replacing tasks.
>> x=$(echo '/var/www/try/cat/tv/' | sed 's/\//-/g')
>> echo $x
-var-www-try-cat-tv-
>> x=${x:1}
>> echo $x
var-www-try-cat-tv-
edit:
In response to your second case:
as some fo my folder name contain /
Maybe there's a misunderstanding here, but you cannot have a / in a file name: on Unix-like systems / is the path separator and is not allowed inside a name. See "Is it possible to use "/" in a filename?"
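The same replacement can also be done without sed, using bash parameter expansion: ${var#/} drops the leading slash and ${var//\//-} replaces every remaining slash. A sketch; path_to_name is a made-up name.

```shell
# Hypothetical helper: turn a path such as /var/www/try/cat into var-www-try-cat.
path_to_name() {
  local p=${1#/}              # drop the leading slash
  printf '%s\n' "${p//\//-}"  # replace every remaining / with -
}

path_to_name /var/www/try/cat   # prints var-www-try-cat
```

Applied to the question's listing, this could look like find -L /var/www/try -type d | while IFS= read -r dir; do path_to_name "$dir"; done > filename.csv.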

extract characters from filename of newest file

I am writing a bash script where I need to check a directory for existing files and look at the last 4 digits of the first segment of the file name, to set the counter when adding new files to the directory.
Naming structure:
yymmddHNAZXLCOM0001.835
I need to put the portion (0001 in the example) into a CTR variable, so that the next file the script puts into the directory will be
yymmddHNAZXLCOM0002.835
and so on.
What would be the easiest and shortest way to do this?
You can do this with sed:
filename="yymmddHNAZXLCOM0001.835"
first_part=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\1/')
counter=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\2/')
suffix=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\3/')
# 10# forces base 10 (bash would otherwise reject a counter like 0008 as invalid octal)
echo "$first_part$(printf "%04u" $((10#$counter + 1))).$suffix"
=> "yymmddHNAZXLCOM0002.835"
All three sed calls use the same regular expression. The only thing that changes is the group selected to return. There's probably a way to do all of that in one call, but my sed-fu is rusty.
Alternate version, using a Bash array:
filename="yymmddHNAZXLCOM0001.835"
ary=($(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\1 \2 \3/'))
# 10# forces base 10 (bash would otherwise reject a counter like 0008 as invalid octal)
echo "${ary[0]}$(printf "%04u" $((10#${ary[1]} + 1))).${ary[2]}"
=> "yymmddHNAZXLCOM0002.835"
Note: This version assumes that the filename does not have spaces in it.
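It can also be done in a single step without sed, using bash parameter expansion to split the name into prefix, 4-digit counter and extension. A sketch; next_name is a made-up name, and it assumes the counter is the last 4 characters before the first dot.

```shell
# Hypothetical helper: given yymmddHNAZXLCOM0001.835, print the name with the
# 4-digit counter before the first dot incremented, keeping the zero padding.
next_name() {
  local name=$1
  local stem=${name%%.*}       # yymmddHNAZXLCOM0001
  local ext=${name#"$stem"}    # .835
  local ctr=${stem: -4}        # 0001
  # 10# makes bash read the counter as decimal (0008 would otherwise be invalid octal)
  printf '%s%04d%s\n' "${stem%????}" $((10#$ctr + 1)) "$ext"
}

next_name yymmddHNAZXLCOM0001.835   # prints yymmddHNAZXLCOM0002.835
```

Once the counter reaches 9999, the result has five digits, so the scheme itself would need to change at that point.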
Try this...
current=`echo yymmddHNAZXLCOM0001.835 | cut -d . -f 1 | rev | cut -c 1-4 | rev`
next=`echo $current | awk '{printf("%04i",$0+1)}'`
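f() {
# Use the last name: if all files follow the naming scheme, the sorted glob puts the highest counter last
local last="${@: -1}"
if [[ $last =~ (.*)([[:digit:]]{4})(\.[^.]*)$ ]]; then
local -a ctr=("${BASH_REMATCH[@]:1}")
# 10# avoids octal parsing of e.g. 0008; printf restores the zero padding
touch "${ctr[0]}$(printf '%04d' $((10#${ctr[1]} + 1)))${ctr[2]}"
# ...
else
echo 'no matches'
fi
}
shopt -s nullglob
f *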

Escape single quotes in long directory name then pass it to xargs [Bash 3.2.48]

In my directory I have subfolders, and I want to list all directories like this:
- ./subfolder
- ./subfolder/subsubfolder1
- ./subfolder/subsubfolder2
- ./subfolder/subsubfolder2/subsubsubfolder
The structure I want to list is:
./fol'der/subfol'der/
Here is my code:
echo -n "" > myfile
find . -type d -print0 | xargs -0 -I# | cat | grep -v -P "^.$" | sed -e "s/'/\\\'/g" | xargs -I# echo "- #" >> myfile
The desired output would be like this:
- ./fol'der
- ./fol'der/subfol'der
But the output is:
- ./fol'der
- #
It seems like sed fails at the second occurrence of the single quote (') character, or something. I have no idea. Can you help me? (I'm on OS X 10.7.4.)
I've been grep-ing and sed-ing like an idiot. After thinking about it a little, I came up with a much simpler solution, a for loop.
echo -n "" > myfile
for folder in $(find . -type d)
do
if [[ $folder != "." ]]
then
echo "- ${folder}" >> myfile
fi
done
My previous solution didn't work with names containing whitespace, so the correct one is:
echo -n "" > myfile
find . -type d -print0 | while IFS= read -r -d '' folder
do
if [[ "${folder}" != "." ]]
then
echo "- ${folder}" >> myfile
fi
done
With GNU Parallel you can do:
find . -type d -print0 | parallel -q -0 echo '- '{}
Your output will be garbled if any of your directory names contain \n. If none of them do, you can use:
find . -type d -print | parallel -q echo '- '{}
The -q is only needed if you really need two spaces after '-'.
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
This is on Linux, but it should work on OS X:
find . -type d -print0 | xargs -0 -I # echo '- #'
It works for me regardless of whether the last set of quotes are single or double.
Output:
- ./fol'der
- ./fol'der/subfol'der
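The original pipeline broke because xargs, when not given -0, treats quotes and backslashes in its input specially, so the ' in fol'der was read as the start of a quoted string. One way to avoid that entirely is to let find hand the names straight to printf and add the prefix with sed. A sketch; list_dirs is a made-up name, -mindepth 1 (supported by GNU and BSD find) skips the starting directory, and names containing newlines would still be split across lines.

```shell
# Hypothetical helper: print "- ./path" for every directory below DIR
# (excluding DIR itself) without passing the names through xargs.
list_dirs() {
  (
    cd "$1" || exit
    find . -mindepth 1 -type d -exec printf '%s\n' {} + | sed 's/^/- /'
  )
}
```

Usage would be list_dirs . > myfile.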
