How to find complete file names in UNIX if I know only the extension of the file - shell

Suppose I have a file which contains other file names with some extension [.dat, .sum, etc.].
Text file content:
gdsds sd8ef g/f/temp_temp.sum
ghfp hrwer h/y/test.text.dat
if[-r h/y/somefile.dat] then....
I want to get the complete file names; for the above file the output should be
temp_temp.sum
test.text.dat
somefile.dat
I am using AIX, on which grep -ow [a-zA-Z_] filename does not work, because the AIX grep does not have the -o switch.

sed is good, but as you have a range of types of 'records', maybe awk can help.
My target is any 'word' found by awk that has a '/' in it; take that word and remove everything up to the last '/', leaving just the filename.
{
cat - <<EOS
gdsds sd8ef g/f/temp_temp.sum
ghfp hrwer h/y/test.text.dat
if[-r h/y/somefile.dat] then....
EOS
} \
| awk '{
    for (i=1; i<=NF; i++) {
        if ($i ~ /.*\//) {
            fName = $i
            sub(/.*\//, "", fName)    # strip everything up to the last "/"
            # put any other chars you want to delete inside the "[ ... ]" char list
            gsub(/[][]/, "", fName)   # remove stray "[" and "]" characters
            if (fName) {
                print fName
            }
        }
    }
}'
Output:
temp_temp.sum
test.text.dat
somefile.dat
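Since the AIX grep lacks -o, a sed one-liner can also serve as a rough alternative. This is only a sketch, assuming at most one path per line and that a trailing ']' should be dropped:
sed -n 's/.*\/\([^] ]*\).*/\1/p' yourfile
With -n plus the p flag, only lines where the substitution succeeded (i.e. lines containing a '/') are printed; the capture grabs everything after the last '/' up to a space or ']'.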
(Also, your question headline doesn't seem to match up with your description; if I'm missing something, please feel free to update your question and post a comment indicating the edits, or you can edit your headline.)

Related

How to get paragraphs of text by index number

I am wondering if there is a way to get paragraphs of text (the source file would be a pyx file) by number, as sed does with lines:
sed -n ${i}p
At this moment I'd be interested in using awk with:
awk '/custom-pyx-tag\(/,/\)custom-pyx-tag/'
but I can't find documentation or examples about that.
I'm also trying to trim "\r\n" with gsub(/\r\n/,"; ") in the same awk command, but it doesn't work, and I can't really figure out why.
Any hint would be much appreciated, thanks.
EDIT:
This is just one example and not my exact need, but I would need to know how to do it for a multipurpose project.
Let's take the case that I have exported the ID3 tags of a huge collection of audio files and these have been stored in a pyx-like format, so in the end I will have a nice big file with this pattern repeating for each file in the collection:
audio-genre(
blablabla
)audio-genre
audio-artist(
bla.blabla
)audio-artist
audio-album(
bla-bla-bla
)audio-album
audio-track-num(
0x
)audio-track-num
audio-track-title(
bla.bla-bla
)audio-track-title
audio-lyrics(
blablablablabla
bla.bla.bla.bla
blah-blah-blah
blabla-blabla
)audio-lyrics
...
Now if I want to extract the artist of the 1234th audio file I can use:
awk '/audio-artist\(/, /)audio-artist/' | sed '/audio-artist/d' | sed -n 1234p
so being one line it can be obtained with sed, but I don't know how to get an entire paragraph given its index; for example, if I want to get the lyrics of the 6543rd file, how could I do it?
In the end it is just a question of whether there is a command equivalent to
sed -n ${num}p
but to be used for paragraphs
awk -v indx=1234 '
BEGIN {
    RS=""
}
{
    split($0, arr, "audio-artist")
    for (i=2; i<=length(arr); i=i+2) {
        gsub("[()]", "", arr[i])   # strip the surrounding parentheses
        arts[cnt+=1] = arr[i]      # collect every artist, indexed from 1
    }
}
END {
    print arts[indx]
}' audioartist
One liner:
awk -v indx=1234 'BEGIN {RS=""} NR==1 { split($0,arr,"audio-artist");for (i=2;i<=length(arr);i=i+2) { gsub("[()]","",arr[i]);arts[cnt+=1]=arr[i] } } END { print arts[indx] }' audioartist
Using awk on the file called audioartist, we consume the file as a single record by setting the record separator (RS) to "" (paragraph mode; the data contains no blank lines). We then split the whole file into an array arr based on the separator audio-artist. We loop through the array arr, starting from 2 in steps of 2 until the end of the array, strip out the opening and closing brackets, and build another array arts with an incrementing count as the index and the stripped artist as the value. At the end we print the arts entry specified by the passed indx variable (in this case 1234).
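For the more general case (e.g. the lyrics of the 6543rd file), here is a sketch that pulls the Nth block for an arbitrary tag. It assumes, as in the sample, that the opening and closing tags sit on lines of their own; the file name collection.pyx is hypothetical:
awk -v tag="audio-lyrics" -v n=6543 '
$0 == tag "(" { c++; p = (c == n); next }   # opening tag: count blocks, start printing on the Nth
$0 == ")" tag { p = 0; next }               # closing tag: stop printing
p                                           # print lines while inside the wanted block
' collection.pyx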

How to print all matching names given as arguments?

I want to write a script that, for any name given as an argument, prints the list of paths
to the home directories of people with that name.
I am new to scripting. Is there any simple way to do this with the awk or egrep command?
Example:
$ show names jakub anna (as an argument)
/home/users/jakubo
/home/students/j_luczka
/home/students/kubeusz
/home/students/jakub5z
/home/students/qwertinx
/home/users/lazinska
/home/students/annalaz
Here is my friend's code, but I have to write it in a different way, and it has to be simple like this code:
#!/bin/bash
for name in "$@"
do
awk -v n="$name" -F ':' 'BEGIN{IGNORECASE=1};$5~n{print $6}' /etc/passwd | while read line
do
echo $line
done
done
It is possible to use a simple awk script to look for matching names.
The list of names can be passed as a space-separated list to awk, which constructs (in the BEGIN section) a combined pattern (e.g. '(names|jakub|anna)'). The pattern is used to test the user-name column ($5) of the password file.
#! /bin/sh
awk -v "L=$*" -F: '
BEGIN {
name_pat = "(" gensub(" ", "|", "g", L) ")"
}
$5 ~ name_pat { print $6 }
' /etc/passwd
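Note that gensub is specific to GNU awk. Where awk is not gawk, a portable variant could build the pattern in the shell instead; a sketch under that assumption:
#! /bin/sh
# Build the alternation pattern (e.g. "(jakub|anna)") in the shell,
# avoiding the GNU-awk-only gensub function.
pattern="($(printf '%s' "$*" | tr ' ' '|'))"
awk -F: -v pat="$pattern" '$5 ~ pat { print $6 }' /etc/passwd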
Since at present the question as a whole is unclear, this is more of a long comment, and only a partial answer.
There is one easy simplification, since the sample code includes:
... | while read line
do
echo $line
done
All of the code shown above after and including the | is needless and does nothing (like a UUoC, a Useless Use of Cat), and should therefore be removed. (Strictly, echo $line with an unquoted $line would remove formatting and repeated spaces, but that's not relevant to the task at hand, so we can say the code above does nothing.)
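With that part removed, the loop reduces to the awk call alone; a sketch (note that IGNORECASE only has an effect in GNU awk):
#!/bin/bash
for name in "$@"
do
    awk -v n="$name" -F ':' 'BEGIN{IGNORECASE=1} $5~n{print $6}' /etc/passwd
done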

How to make and name multiple text files after using the cut command?

I have about 50 data text files from which I need to remove several columns.
I have been using the cut command to remove the columns and rename the files individually, but I will have many more of these files and need a way to do it at scale.
Currently I have been using:
cut -f1,6,7,8 filename.txt >> filename_Fixed.txt
And I am able to remove the columns from all the files using:
cut -f1,6,7,8 *.txt
But that only gets me all the output in the terminal, or written to a single text file.
What I want is to edit several files using cut to remove the required columns:
filename1.txt
filename2.txt
filename3.txt
filename4.txt
.
.
.
And get the edited output to write to individual files:
filename_Fixed1.txt
filename_Fixed2.txt
filename_Fixed3.txt
filename_Fixed4.txt
.
.
.
But I haven't been able to find a way to write the output to new text files. I'm new to using the command line and not much of a coder, so maybe I don't know what terms to search for? I haven't found anything through Google searches that has helped. It seems like it should be simple, but I am struggling.
In desperation, I did try this bit of code, knowing it wouldn't work:
cut -f1,6,7,8 *.txt >> ( FILENAME ".fixed" )
I found the portion after ">>" nested in an awk command that output multiple files.
I also tried (again knowing it wouldn't work) to wildcard the output files, but got an ambiguous redirect error.
Did you try a for loop?
for f in *.txt ; do
cut -f 1,6,7,8 "$f" > "$(basename "$f" .txt)_fixed.txt"
done
(N.B. I can't test the basename call right now; you can replace it with "${f}_fixed".)
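As a variant, the shell's own suffix-stripping parameter expansion avoids the basename call entirely; a minimal sketch:
for f in *.txt ; do
    # ${f%.txt} strips the trailing .txt, so data1.txt becomes data1_fixed.txt
    cut -f 1,6,7,8 "$f" > "${f%.txt}_fixed.txt"
done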
You can also process it all in awk itself which would make the process much more efficient, especially for large numbers of files, for example:
awk '
NF < 8 {
print "contains less than 8 fields: ", FILENAME
next
}
{ fn=FILENAME
idx=match(fn, /[0-9]+.*$/)
if (idx == 0) {
print "no numeric suffix for file: ", fn
next;
}
newfn=substr(fn,1,idx-1) "_Fixed" substr(fn,idx)
print $1,$6,$7,$8 > newfn
}
' *.txt
This contains two rules (the expressions between {...}). The first:
NF < 8 {
print "contains less than 8 fields: ", FILENAME
next
}
simply checks that the record contains at least 8 fields (since you want field 8 as your last field). If a record has fewer than 8 fields, next skips to the next input record rather than processing it.
The second rule:
{ fn=FILENAME
idx=match(fn, /[0-9]+.*$/)
if (idx == 0) {
print "no numeric suffix for file: ", fn
next;
}
newfn=substr(fn,1,idx-1) "_Fixed" substr(fn,idx)
print $1,$6,$7,$8 > newfn
}
fn=FILENAME stores the current filename as fn to cut down typing,
idx=match(fn, /[0-9]+.*$/) locates the index where the numeric suffix of the filename begins (e.g. where "3.txt" starts),
if (idx == 0) then a numeric suffix was not found; warn and skip this record,
newfn=substr(fn,1,idx-1) "_Fixed" substr(fn,idx) form the new filename from the non-numeric prefix (e.g. "filename"), add "_Fixed" with string-concatenation and then add the numeric suffix, and finally
print $1,$6,$7,$8 > newfn print fields (columns) 1,6,7,8 redirecting output to the new filename.
For more information on each of the string-functions used above, see the GNU awk User's Guide - 9.1.3 String-Manipulation Functions
If I understand what you were attempting, this should be able to handle as many files as you have, so long as each filename has a numeric suffix to place "_Fixed" before and each record has at least 8 fields (columns). You can just copy/middle-mouse-paste the entire command at the command line to test.

Sed search&replace from CSV file inserts carriage return

I have a file retimp_info.csv with two columns and ~500 rows like this:
rettag, retid
231,1
and a file mdb_ret_exp.csv with multiple rows and columns:
a,s,d,231,f,g
a,s,d,345,f,g
So the goal is to find and replace the occurrences of the rettag with the retid from the first file. There are multiple rettags that need to be replaced inside mdb_ret_exp.csv. (I use the surrounding commas so the column can be pinned down, in case that number occurs anywhere else I may not know about, i.e. in a different column.)
Here's what I tried:
while IFS="," read -r rettag retid; do
sed -i "s/,$rettag,/,$retid,/" mdb_ret_exp.csv
done < $HOME/retimp_info.csv
It almost works, but it adds an extra carriage return on every replacement:
a,s,d,1
,f,g
a,s,d,345,f,g
I expected it to still remain on one line:
a,s,d,1,f,g
a,s,d,345,f,g
How do I avoid the extra carriage return?
This is most likely caused by your retimp_info.csv having DOS/Windows style \r\n line endings. You could remove them from the file while reading:
cat "$HOME/retimp_info.csv" | tr -d '\r' | while IFS="," read -r rettag retid; do
sed -i "s/,$rettag,/,$retid,/" mdb_ret_exp.csv
done
or strip them from the file in advance with dos2unix, or by opening the file in a text editor, choosing "Unix line endings" or an equivalent option, and then saving it again.
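For example, assuming dos2unix is installed, this converts the lookup file in place:
dos2unix "$HOME/retimp_info.csv"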
You're barking up the wrong tree. Just do this:
awk '
BEGIN { FS=OFS="," }
NR==FNR { map[$1] = $2; next }
{
for (i=1; i<=NF; i++) {
if ($i in map) {
$i = map[$i]
}
}
print
}
' $HOME/retimp_info.csv mdb_ret_exp.csv
That will solve all of your current problems and the ones you may not have hit yet, but probably will, related to:
doing regexp instead of string comparisons, and
the fact your current approach can't work for the first or last fields on each line, and
the fact that, as written, your sed loop could replace the replacements after making them
In addition to being far more robust, the awk approach will also be at least an order of magnitude faster than your current approach. See also why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
Oh, and run dos2unix or similar on your input files first as they currently have Windows control-M line endings (use cat -v file to see them).
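For example, with DOS line endings each line shows up under cat -v with a trailing ^M:
a,s,d,231,f,g^M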
Update: I used the following -
while IFS="," read -r rettag retid; do
sed -i "s/,$rettag,/,$retid,/g" mdb_ret_exp.csv
done < $HOME/retimp_info.csv
It worked fine, but now after it replaces the proper value (which resides in the middle of the line/row) it inserts a carriage return, causing the following information to be moved to the next row,
i.e.:
a,s,d,231,f,g
now is -
a,s,d,1
,f,g
Need ,f,g to remain on the same line...

Remove ] from string based on whether it is used as an index

I'm trying with sed (in a bash script) to do some substring editing.
string1=randomthing0]
string2=otherthing[15]}]
string3=reallyotherthing[5]]
The aim is to remove the ]s that are not used to close an index, like [15] in the second one.
The output should be
string1=randomthing0
string2=otherthing[15]}
string3=reallyotherthing[5]
This works for me:
s/\[\([^]]\+\)\]/#B#\1#E#/g
s/\]//g
s/#B#/[/g
s/#E#/]/g
It first replaces all [...] with #B#...#E#, i.e. the only remaining ]'s are the non-balanced ones. Then it just removes them and replaces the #-strings back.
Be careful: your input should never contain the #-strings.
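As a sketch of how this might be run, assuming GNU sed (the \+ is a GNU BRE extension), with the four commands joined by semicolons:
printf '%s\n' 'randomthing0]' 'otherthing[15]}]' 'reallyotherthing[5]]' |
sed 's/\[\([^]]\+\)\]/#B#\1#E#/g; s/\]//g; s/#B#/[/g; s/#E#/]/g'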
if awk is accepted as well, check the awk solution below:
awk 'BEGIN{ OFS=FS="" }            # empty FS: gawk splits each line into one field per character
{
    for (i=1; i<=NF; i++) {
        s += $i=="[" ? 1 : 0       # running count of opening brackets
        e += $i=="]" ? 1 : 0       # running count of closing brackets
        if (e>s) { $i=""; e-- }    # a "]" with no matching "[": delete it
    }
    s=e=0; print $0
}' file
Note
the script above is NOT generic enough: it only removes unbalanced "]"s, which means foo[a[b[c] won't be modified
if there are unbalanced ]s, they are deleted whether or not they are at the end of the line, so foo[x]bar]blah will be changed into foo[x]barblah
An example explains it better (I added two more lines to your input):
#in my new lines(1,2) all "]"s surrounded with * should be removed
kent$ cat a.txt
stringx=randomthi[foo]bar*]*xx*]*
stringy=random[f]x*]*bar[b]*]*blah
string1=randomthing0]
string2=otherthing[15]}]
string3=reallyotherthing[5]]
kent$ awk 'BEGIN{OFS=FS=""}{ for(i=1;i<=NF;i++){
s+=$i=="["?1:0;
e+=$i=="]"?1:0;
if(e>s){$i="";e--} }
s=e=0; print $0; }' a.txt
stringx=randomthi[foo]bar**xx**
stringy=random[f]x**bar[b]**blah
string1=randomthing0
string2=otherthing[15]}
string3=reallyotherthing[5]
hope it helps
sed 's/\([^\[0-9]\)\([0-9\]*\)\]/\1\2/'
This removes any ] that is preceded by a character other than [ or a digit, followed by zero or more digits.
This might work for you (GNU sed):
sed -r 's/([^][]*(\[[^]]*\][^][]*)*)\]/\1/g' file
