Bash - extract file name and extension from a string

Here is the grep command:
grep "%SWFPATH%/plugins/" filename
And its output:
set(hotspot[hs_bg_%2].url,%SWFPATH%/plugins/textfield.swf);
set(hotspot[hs_%2].url,%SWFPATH%/plugins/textfield.swf);
url="%SWFPATH%/plugins/textfield.swf"
url="%SWFPATH%/plugins/scrollarea.swf"
alturl="%SWFPATH%/plugins/scrollarea.js"
url="%SWFPATH%/plugins/textfield.swf"
I'd like to generate a file containing the names of all the files in the 'plugins/' directory that are mentioned in a certain file.
Basically, I need to extract the file name and the extension from every line.
I can manage deleting duplicates, but I can't figure out how to extract the information I need.
This would be the content of the file that I would like to get:
textfield.swf
scrollarea.swf
scrollarea.js
Thanks!!!
PS: The thread "Extract filename and extension in bash" explains how to get the filename and extension from a variable. What I'm trying to achieve is extracting them from a file, which is completely different.

Using awk:
grep "%SWFPATH%/plugins/" filename | \
awk '{ match($0, /plugins\/([^\/[:space:]]+)\.([[:alnum:]]+)/,submatch);
print "filename:"submatch[1];
print "extension:"submatch[2];
}'
Some explanation:
The match function takes every line processed by awk (indicated by $0) and looks for matches to that regex. Submatches (the parts of the string that match the parenthesized parts of the regex) are saved in the array submatch. print is as straightforward as it looks: it just prints.
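The three-argument match() above is a GNU awk extension, and the command prints the name and extension on separate labeled lines. A minimal sketch that uses only the two-argument match() (with RSTART and RLENGTH) available in any POSIX awk, and prints each name.extension once in first-seen order, could look like this. The function name list_plugins is hypothetical, and the character class assumes plugin names made only of letters, digits, _, . and -.

```shell
#!/bin/sh
# Hypothetical helper: print each plugins/<name>.<ext> referenced in the
# given files (or stdin) once, in the order it first appears.
# Assumes plugin names contain only letters, digits, "_", "." and "-".
list_plugins() {
  awk '
    match($0, /plugins\/[A-Za-z0-9_.-]+\.[A-Za-z0-9]+/) {
      # Skip the 8 characters of "plugins/" at the start of the match.
      name = substr($0, RSTART + 8, RLENGTH - 8)
      # seen[name]++ is 0 (false) only the first time a name appears.
      if (!seen[name]++) print name
    }
  ' "$@"
}
```

Usage: list_plugins filename > plugins.txt. Only the first match on each line is used, which fits the grep output shown in the question.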

For this specific problem:
awk '/\/plugins\// {sub(/.*\//, ""); sub(/(\);|")?$/, "");
arr[$0] = $0} END {for (i in arr) print arr[i]}' filename

Use awk to extract the part after the last slash, then sed to strip the trailing ); and " characters:
awk -F/ '{print $NF}' a | sed -e 's/);//' -e 's/"$//'

Related

How can I prefix the output of each match in grep with some text?

I have a file with a list of phrases:
apples
banananas
oranges
I'm running cat file.txt | xargs -I% sh -c "grep -Eio '(an)' >> output.txt"
What I can't figure out is how to make the output contain the original line as well, for example:
banananas,an
oranges,an
How can I prefix the output of grep to also include the value being piped to it?
This is a task for awk; could you please try the following:
awk '/an/{print $0",an"}' Input_file
This looks for the string an in every line of Input_file and prints each matching line with ,an appended.
Solution with sed:
sed '/an/s/$/,an/' input_file
This finds lines that match the pattern /an/ and appends ,an at the end of the pattern space (the $ anchor).
Use awk instead of grep:
$ awk -v s="an" ' # search string
BEGIN {
OFS="," # separating comma
}
match($0,s) { # when there is a match
print $0,substr($0,RSTART,RLENGTH) # output
}' file
Output:
banananas,an
oranges,an
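For comparison, the grep idea from the question can keep the original line if each line is read into a shell variable first, instead of going through xargs. A sketch, assuming a grep with -o (GNU and BSD grep have it) and a hypothetical function name prefix_matches; it starts several processes per line, so the awk and sed answers above will be much faster on large files.

```shell
#!/bin/sh
# Hypothetical helper: for every input line that matches the pattern
# (case-insensitively, like -i in the question), print the line,
# a comma, and the first matched text.
prefix_matches() {
  pattern=$1
  shift
  # With no file arguments, "cat --" reads standard input.
  cat -- "$@" | while IFS= read -r line; do
    # grep -o prints only the matched part; keep the first one.
    m=$(printf '%s\n' "$line" | grep -Eio -- "$pattern" | head -n 1)
    if [ -n "$m" ]; then
      printf '%s,%s\n' "$line" "$m"
    fi
  done
}
```

Usage: prefix_matches an file.txt > output.txt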

Extract one word after a specific word on the same line but there is no space between them

How can I extract a word that comes after a specific word in bash? More precisely, I have a file containing a line like this:
IN=../files/d.txt
I want to read "d" from the line above.
awk '{for(i=1;i<=NF;i++) if ($i=="files/") print $(i+1)}' inputFile
This code doesn't help me because it looks for the word after a space, but here "d" comes directly after "files/" (in the same word) and ends with ".".
You may use sed:
sed -n '/\/files\// s~.*/files/\([^.]*\)\..*~\1~p' file
d
You may also use this awk command:
awk -F/ '$(NF-1) == "files"{sub(/\..*/, "", $NF); print $NF}' file
You can use a lookbehind, (?<=...), for this; it requires GNU grep's -P (Perl-compatible regex) option:
echo "IN=../files/d.txt" | grep -Po "(?<=files\/)\w+"
Results:
d
Using bash parameter expansion (suffix removal) and basename, since your input data looks like a file path:
basename "${IN%.*}"
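basename can also be left out: POSIX parameter expansion alone can split the path into a name and an extension. A short sketch using the IN value from the question:

```shell
#!/bin/sh
IN=../files/d.txt
file=${IN##*/}   # remove the longest prefix ending in "/"   -> d.txt
name=${file%.*}  # remove the shortest suffix starting with "." -> d
ext=${file##*.}  # remove the longest prefix ending in "."   -> txt
printf '%s %s\n' "$name" "$ext"
```

This prints d txt. For a file name without a dot, ext would be the whole name, so check for a dot first if that can happen.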

awk: copy from A to B and output..?

My file is a bookmarks file, backup-6.session.
Inside, the file is one very long run of text. I need to copy all the URLs (there are many); here is an example of the content:
......"charset":"UTF-8","ID":3602197775,"docshellID":0,"originalURI":"https://www.youtube.com/watch?v=axxxxxxxxsxsx","docIdentifier":470,"structuredCloneState":"AAAAA.....
The result should go to text.txt:
https://www.youtube.com/watch?v=axxxxxxxxsxsx
https://www.youtube.com/watch?v=bxxxxxxxxsxsx
https://www.youtube.com/watch?v=cxxxxxxxxsxsx
https://www.youtube.com/watch?v=dxxxxxxxxsxsx
....
....
Each URL starts right after "originalURI":" (A) and ends at the next " (B).
The command could be awk, sed, etc. (I don't know which is best for me).
Thank you.
With GNU awk for multi-char RS and RT:
$ awk -v RS='"originalURI":"[^"]+' 'sub(/.*"/,"",RT){print RT}' file
https://www.youtube.com/watch?v=axxxxxxxxsxsx
You could also use grep, for example:
grep -oh "https://www\.youtube\.com/watch?v=[A-Za-z0-9]*" backup-6.session > text.txt
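This works if the axxxxxxxxsxsx part consists only of the letters A-Z, a-z and the digits 0-9; the match ends at the first character outside that set. Real YouTube video IDs can also contain - and _, so [A-Za-z0-9_-]* may be a safer class.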
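If the file also holds URLs from other sites, matching on the originalURI key itself is more general. A sketch, assuming a grep with -o and -h and the hypothetical function name extract_uris; it assumes the URL contains no double quote (an escaped \" would cut it short).

```shell
#!/bin/sh
# Hypothetical helper: print every "originalURI" value found in the
# given files (or stdin). grep -o puts each match on its own line,
# even when many matches share one long line; -h drops file names.
extract_uris() {
  grep -oh '"originalURI":"[^"]*"' "$@" |
    sed -e 's/^"originalURI":"//' -e 's/"$//'
}
```

Usage: extract_uris backup-6.session > text.txt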
Notice the flags for grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
The awk solution would be as follows:
awk -F, '{ for (i=1;i<=NF;i++) { if ( $i ~ "originalURI") { split($i,add,":");print gensub("\"","","g",add[2])":"gensub("\"","","g",add[3])} } }' filename
We loop through each comma-separated field and test it against "originalURI". We then split that field on ":" with split() and remove the quotation marks with gensub() (a GNU awk function); joining the second and third pieces with ":" restores the colon in "https:".
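A sketch that sticks to POSIX awk features (match(), RSTART, RLENGTH and substr()) and does not depend on where the commas fall: it repeatedly searches the rest of the line, so it also prints several URLs that share one line. The function name extract_uris_awk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every "originalURI" value, scanning each
# line repeatedly with the two-argument match().
extract_uris_awk() {
  awk '{
    line = $0
    while (match(line, /"originalURI":"[^"]*"/)) {
      # "originalURI":" is 15 characters; also drop the closing quote.
      print substr(line, RSTART + 15, RLENGTH - 16)
      # Continue searching after the end of this match.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Usage: extract_uris_awk backup-6.session > text.txt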
The sed solution would be as follows:
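sed -rn 's/^.*originalURI":"([^"]*)","docIdentifier.*$/\1/p' filename
Run sed with extended regular expressions (-r) and suppress automatic printing (-n). The substitution replaces the whole line with the part captured in parentheses (\1), and the p flag prints the result. Because the leading .* is greedy, this prints at most one URL per line, so it suits input where each originalURI appears on its own line.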

How to keep the file name when the file is processed in a script

I have 88 folders, each of which contains a file named pair.<number> (pair.3472, pair.7829 and so on). I need to process the files with awk to extract the second column, but I need to keep the numbers in the output file names. If I try:
#!/bin/bash
for i in {1..88}; do
awk '{print $2}' ~/Documents/attempt.$i/pair* > ~/Results/pred.pair*
done
It doesn't keep the numbers; it produces only one file, literally named pred.pair*.
Thanks for any tips.
You don't need a loop (and see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why that's a Good Thing):
awk '
FNR==1 { close(out); out=FILENAME; sub(/\/Documents.*\//,"/Results/pred.",out) }
{ print $2 > out }
' ~/Documents/attempt.{1..88}/pair*
#!/bin/bash
for i in {1..88}; do
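awk '{fname=FILENAME; sub(".*/", "", fname); print $2 > (ENVIRON["HOME"] "/Results/pred." fname)}' ~/Documents/attempt.$i/pair*
done
Use awk's built-in variable FILENAME. Strip the directory part of FILENAME to get the basename fname, then redirect $2 to $HOME/Results/pred.fname. The home directory comes from ENVIRON["HOME"] because awk does not expand ~ inside a string the way the shell does.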
There are several ways to do it: awk has a FILENAME variable, and you can redirect the output from within your awk script to a file name built from FILENAME.
Or you can do it in bash:
for i in {1..88}; do
to_be_processed_fname=$(ls ~/Documents/attempt.$i/pair*)
extension="${to_be_processed_fname/*./}"
awk '{print $2}' "${to_be_processed_fname}" > "$HOME/Results/pred.${extension}"
done
Of course, the above fails if there is more than one pair* file in the same directory, but I'm leaving that to you.
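One way to handle several pair.* files per directory is to loop over the files themselves instead of over the directory numbers. A minimal sketch, assuming the layout from the question (attempt.N/pair.NUMBER), an existing output directory, and a hypothetical function name extract_second_columns; if two directories hold files with the same number, the later one overwrites the earlier output.

```shell
#!/bin/sh
# Hypothetical helper: write column 2 of every <src>/attempt.*/pair.*
# file to <dst>/pred.<number>.
extract_second_columns() {
  src=$1   # e.g. ~/Documents
  dst=$2   # e.g. ~/Results (must already exist)
  for f in "$src"/attempt.*/pair.*; do
    [ -e "$f" ] || continue    # the glob matched nothing
    num=${f##*.}               # text after the last "." = the number
    awk '{ print $2 }' "$f" > "$dst/pred.$num"
  done
}
```

Usage: extract_second_columns ~/Documents ~/Results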

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to use on this data takes only double-floats, I guess the easiest way is to append d0 to the end of each numeric field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
I imagine it should be something like:
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
You could use this sed command:
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
It skips the first line, then replaces each field that begins with ; (so the first field is left alone) with itself followed by d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set both the input and output field separators to ;. From line 2 on, loop through the fields starting with the second one, appending d0 to each. Finally, the always-true pattern 1 prints the line.
Your data format looks a bit unusual. Enclosing the first column in double quotes suggests that it can contain the delimiter (the semicolon) itself. I don't know which application produces the data, but if that is the case, you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable: with it you define what a field looks like, instead of being limited to specifying a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
Note: the second sed would need a minor modification if lines end with trailing whitespace.
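The head/tail split and the three sed calls can also be combined into one sed invocation that skips line 1 and strips trailing whitespace first. A sketch with a hypothetical function name add_d0; [[:space:]] also matches a trailing carriage return, which helps with files saved with Windows line endings. Unlike the original pipeline, the last substitution only undoes the d0 that follows the quoted first field.

```shell
#!/bin/sh
# Hypothetical helper: from line 2 on, strip trailing whitespace,
# append d0 before every ";" and at the end of the line, then remove
# the d0 placed right after the quoted date.
add_d0() {
  sed -e '1!{' \
      -e 's/[[:space:]]*$//' \
      -e 's/;/d0;/g' \
      -e 's/$/d0/' \
      -e 's/"d0;/";/' \
      -e '}' "$@"
}
```

Usage: add_d0 big-prices.csv > output.txt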
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
