bash get dirname from urls.txt

$ cat urls.txt
/var/www/example.com.com/upload/email/email-inliner.html
/var/www/example.com.com/upload/email/email.html
/var/www/example.com.com/upload/email/email2-inliner.html
/var/www/example.com.com/upload/email/email2.html
/var/www/example.com.com/upload/email/AquaTrainingBag.png
/var/www/example.com.com/upload/email/fitex/fitex-ecr7.jpg
/var/www/example.com.com/upload/email/fitex/fitex-ect7.jpg
/var/www/example.com.com/upload/email/fitex/fitex-ecu7.jpg
/var/www/example.com.com/upload/email/fitex/fitex.html
/var/www/example.com.com/upload/email/fitex/logo.png
/var/www/example.com.com/upload/email/fitex/form.html
/var/www/example.com.com/upload/email/fitex/fitex.txt
/var/www/example.com.com/upload/email/bigsale.html
/var/www/example.com.com/upload/email/logo.png
/var/www/example.com.com/upload/email/bigsale.png
/var/www/example.com.com/upload/email/bigsale-shop.html
/var/www/example.com.com/upload/email/bigsale.txt
Can anyone help me get the dirname of each of these paths?
dirname /var/www/example.com.com/upload/email/sss.png works fine, but what about a whole list of URLs?
Is it possible to achieve this without any kind of loop (for or while)? The number of URLs can exceed several tens of millions. Ideally, the output would be written to a file (e.g. via redirection or tee).

As always when it boils down to things like this, Awk comes to the rescue:
awk 'BEGIN{FS=OFS="/"}{NF--}1' <file>
Be aware that this is an extremely simplified version of dirname: it does not behave identically to dirname in every case, but it works for most of them. A correct version, which covers all cases, is:
awk 'BEGIN{FS=OFS="/"}{gsub("/+","/")}
{s=$0~/^\//;NF-=$NF?1:2;$0=$0?$0:(s?"/":".")};1' <file>
The following table shows the difference:
| path | dirname | awk full | awk short |
|------------+---------+----------+-----------|
| . | . | . | |
| / | / | / | |
| foo | . | . | |
| foo/ | . | . | foo |
| foo/bar | foo | foo | foo |
| foo/bar/ | foo | foo | foo/bar |
| /foo | / | / | |
| /foo/ | / | / | /foo |
| /foo/bar | /foo | /foo | /foo |
| /foo/bar/ | /foo | /foo | /foo/bar |
| /foo///bar | /foo | /foo | /foo// |
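As a sketch, the "awk full" version can be wrapped in a shell function and combined with tee, as the question asks, to write the results to a file. The function name dirnames_of and the output file names are hypothetical. Two small deviations from the one-liner: it compares against "" explicitly instead of relying on truthiness, so that a last component named 0 (e.g. foo/0) isn't mistaken for an empty one, and it assumes an awk that rebuilds $0 when NF is decreased (gawk and mawk do).

```bash
#!/bin/sh
# Sketch: the "awk full" dirname from above as a reusable function.
# dirnames_of is a hypothetical name; it reads one path per line from the
# files given as arguments (or from stdin) and prints one dirname per line.
# Assumes an awk that rebuilds $0 when NF is decreased (gawk and mawk do).
dirnames_of() {
    awk 'BEGIN { FS = OFS = "/" }
    {
        gsub("/+", "/")                         # collapse repeated slashes
        s = ($0 ~ /^\//)                        # 1 if the path is absolute
        NF -= ($NF != "") ? 1 : 2               # drop the last component (and a trailing slash)
        $0 = ($0 != "") ? $0 : (s ? "/" : ".")  # nothing left: "/" or "."
    }
    1' "$@"
}

# Example usage (file names are placeholders): keep every result in dirs.txt
# and, since many URLs share a directory, also a de-duplicated list.
# dirnames_of urls.txt | tee dirs.txt | sort -u > unique_dirs.txt
```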
Note: various alternative solutions can be found in the question "Extracting directory name from an absolute path using sed or awk". All of Kent's solutions work; Solid Kim's solution just needs a tiny tweak to handle multiple consecutive slashes (and deserves more upvotes!).
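If GNU coreutils is available, another loop-free option is to let dirname itself process the list: GNU dirname accepts several NAME operands, and xargs packs as many lines as fit into each invocation, so even tens of millions of paths need only a comparatively small number of dirname processes. This sketch assumes GNU xargs (for -d and -r) and GNU dirname; dirnames_via_xargs is a hypothetical name.

```bash
#!/bin/sh
# Sketch: loop-free dirname over a list of paths, one path per line.
# Assumes GNU xargs (-d, -r) and GNU coreutils dirname (multiple operands).
dirnames_via_xargs() {
    # -d '\n': only newlines separate paths, so spaces and quotes are kept
    # -r: do not run dirname at all when the input file is empty
    xargs -r -d '\n' dirname -- < "$1"
}

# Example usage: dirnames_via_xargs urls.txt | tee dirs.txt
```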

Related

Moving all files from subfolders to main folders with duplicate file names

I've been trying to write a little script to sort image files on my Linux server.
I tried multiple solutions found all over Stack Exchange, but none of them met my requirements.
Explanation:
Each photo_folder is filled with images (various extensions).
Mostly, the images are already directly in this folder.
But sometimes, as in the example below, images are hidden in one or more photo_subfolders, and the file names are often the same in each of them (1.jpg, 2.jpg, ...).
Basically, I would like to move all image files from each photo_subfolder into its photo_folder, renaming duplicate filenames before merging them.
Example:
|parent_folder
| |photo_folder
| | |photo_subfolder1
| | | 1.jpg
| | | 2.jpg
| | | 3.jpg
| | |photo_subfolder2
| | | 1.jpg
| | | 2.jpg
| | | 3.jpg
| | |photo_subfolder3
| | | 1.jpg
| | | 2.jpg
| | | 3.jpg
Expectation:
|parent_folder
| |photo_folder
| | 1_a.jpg
| | 2_a.jpg
| | 3_a.jpg
| | 1_b.jpg
| | 2_b.jpg
| | 3_b.jpg
| | 1_c.jpg
| | 2_c.jpg
| | 3_c.jpg
Note that the file names are just examples; they could be anything.
Thank you!

You can replace the / in the subdirectory path with another character, e.g. _, and then cp/mv the original file to the parent directory under that new name.
I tried to recreate an example of your directory tree here (very simple, but I hope it can be adapted to your case). Note that I am using bash.
#!/bin/bash
# build a small test tree: parent/photoset_N/{1..5}.{jpg,png} and parent/photoset_N/subset_N/{1..5}.{jpg,gif}
bd=parent
mkdir -p "${bd}"
for i in $(seq 3); do
    mkdir -p "${bd}/photoset_${i}/subset_${i}"
    for j in $(seq 5); do
        touch "${bd}/photoset_${i}/${j}.jpg"
        touch "${bd}/photoset_${i}/${j}.png"
        touch "${bd}/photoset_${i}/subset_${i}/${j}.jpg"
        touch "${bd}/photoset_${i}/subset_${i}/${j}.gif"
    done
done
Here is the script that copies the files from the subdirectories to the parent directory. Basically, it:
finds all the files recursively in the subdirectories and loops over them
uses sed to replace each / with _ and stores the result in a variable new_filepath (removing the leading parent_ as well is optional, see below)
copies (or moves) the file at the old filepath into parent, under the name new_filepath
for xtension in jpg png gif; do
    # -mindepth 2 skips files sitting directly in ${bd} (e.g. copies made by a previous run)
    while IFS= read -r -d '' filepath; do
        new_filepath=$(echo "${filepath}" | sed 's#/#_#g')
        cp "${filepath}" "${bd}/${new_filepath}"
    done < <(find "${bd}" -mindepth 2 -type f -name "*.${xtension}" -print0)
done
ls "${bd}"
If you also want to remove the leading parent_ from new_filepath, replace the new_filepath line above with:
new_filepath=$(echo "${filepath}" | sed -e 's#/#_#g' -e "s#^${bd}_##")
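A variant sketch of the same idea, wrapped in a hypothetical function copy_to_parent: it collects all three extensions in a single find pass instead of one pass per extension, and builds the new name with bash parameter expansion instead of starting a sed process for every file. The extension list, the -mindepth 2 choice and the assumption that the directory is given as a plain relative name (e.g. parent, not ./parent) are things to adapt.

```bash
#!/bin/bash
# Sketch: copy every jpg/png/gif below "$1" into "$1" itself, naming each
# copy after its relative path with / replaced by _ (leading "parent_" dropped).
# copy_to_parent is a hypothetical name; adjust the -name tests as needed.
copy_to_parent() {
    local bd=$1 filepath new_filepath
    while IFS= read -r -d '' filepath; do
        new_filepath=${filepath//\//_}            # every / becomes _
        new_filepath=${new_filepath#"${bd}"_}     # drop the leading "parent_"
        cp -- "${filepath}" "${bd}/${new_filepath}"
    done < <(find "${bd}" -mindepth 2 -type f \
                 \( -name '*.jpg' -o -name '*.png' -o -name '*.gif' \) -print0)
}

# Example usage: copy_to_parent parent && ls parent
```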
I assumed that you define all the possible extensions in the script. Otherwise, to find all the extensions in the directory tree, you can use the following snippet from a previous answer:
find . -type f -name '*.*' | sed 's|.*\.||' | sort -u
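The answer above copies everything into the top-level directory. If you want exactly the layout from the question (files moved up into their own photo_folder, with a per-subfolder suffix _a, _b, _c, ...), a minimal bash sketch could look like this. flatten_photo_folder is a hypothetical name; it assumes only one level of subfolders, at most 26 subfolders per photo_folder, and that the glob order of the subfolders is the order in which you want the letters assigned.

```bash
#!/bin/bash
# Sketch: move files from each immediate subfolder of "$1" into "$1",
# appending _a for the first subfolder, _b for the second, and so on.
flatten_photo_folder() {
    local folder=${1%/} letters=abcdefghijklmnopqrstuvwxyz
    local idx=0 sub file base suffix target
    for sub in "$folder"/*/; do
        [ -d "$sub" ] || continue                 # no subfolders: glob left unexpanded
        suffix=${letters:idx:1}
        idx=$((idx + 1))
        for file in "$sub"*; do
            [ -f "$file" ] || continue
            base=${file##*/}
            if [[ $base == *.* ]]; then           # 1.jpg -> 1_a.jpg
                target="$folder/${base%.*}_${suffix}.${base##*.}"
            else                                  # no extension: README -> README_a
                target="$folder/${base}_${suffix}"
            fi
            mv -n -- "$file" "$target"            # -n: never overwrite an existing file
        done
        rmdir -- "$sub" 2>/dev/null || true       # remove the subfolder only if now empty
    done
}

# Example usage: for pf in parent_folder/*/; do flatten_photo_folder "$pf"; done
```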

Use AWK with delimiter to print specific columns

My file looks as follows:
+------------------------------------------+---------------+----------------+------------------+------------------+-----------------+
| Message | Status | Adress | Changes | Test | Calibration |
|------------------------------------------+---------------+----------------+------------------+------------------+-----------------|
| Hello World | Active | up | 1 | up | done |
| Hello Everyone Here | Passive | up | 2 | down | none |
| Hi there. My name is Eric. How are you? | Down | up | 3 | inactive | done |
+------------------------------------------+---------------+----------------+------------------+------------------+-----------------+
+----------------------------+---------------+----------------+------------------+------------------+-----------------+
| Message | Status | Adress | Changes | Test | Calibration |
|----------------------------+---------------+----------------+------------------+------------------+-----------------|
| What's up? | Active | up | 1 | up | done |
| Hi. I'm Otilia | Passive | up | 2 | down | none |
| Hi there. This is Marcus | Up | up | 3 | inactive | done |
+----------------------------+---------------+----------------+------------------+------------------+-----------------+
I want to extract a specific column using AWK.
I can use cut to do it; however, because the width of each column varies from table to table depending on how many characters it contains, I'm not getting the desired output.
cat File.txt | cut -c -44
+------------------------------------------+
| Message |
|------------------------------------------+
| Hello World |
| Hello Everyone Here |
| Hi there. My name is Eric. How are you? |
+------------------------------------------+
+----------------------------+--------------
| Message | Status
|----------------------------+--------------
| What's up? | Active
| Hi. I'm Otilia | Passive
| Hi there. This is Marcus | Up
+----------------------------+--------------
or
cat File.txt | cut -c 44-60
+---------------+
| Status |
+---------------+
| Active |
| Passive |
| Down |
+---------------+
--+--------------
| Adress
--+--------------
| up
| up
| up
--+--------------
I tried using AWK, but I don't know how to specify two different delimiters (| and +) so that all the lines are handled.
cat File.txt | awk 'BEGIN {FS="|";}{print $2,$3}'
Message Status
------------------------------------------+---------------+----------------+------------------+------------------+-----------------
Hello World Active
Hello Everyone Here Passive
Hi there. My name is Eric. How are you? Down
Message Status
----------------------------+---------------+----------------+------------------+------------------+-----------------
What's up? Active
Hi. I'm Otilia Passive
Hi there. This is Marcus Up
The output I'm looking for:
+------------------------------------------+
| Message |
|------------------------------------------+
| Hello World |
| Hello Everyone Here |
| Hi there. My name is Eric. How are you? |
+------------------------------------------+
+----------------------------+
| Message |
|----------------------------+
| What's up? |
| Hi. I'm Otilia |
| Hi there. This is Marcus |
+----------------------------+
or
+------------------------------------------+---------------+
| Message | Status |
|------------------------------------------+---------------+
| Hello World | Active |
| Hello Everyone Here | Passive |
| Hi there. My name is Eric. How are you? | Down |
+------------------------------------------+---------------+
+----------------------------+---------------+
| Message | Status |
|----------------------------+---------------+
| What's up? | Active |
| Hi. I'm Otilia | Passive |
| Hi there. This is Marcus | Up |
+----------------------------+---------------+
or random other columns
+------------------------------------------+----------------+------------------+
| Message | Adress | Test |
|------------------------------------------+----------------+------------------+
| Hello World | up | up |
| Hello Everyone Here | up | down |
| Hi there. My name is Eric. How are you? | up | inactive |
+------------------------------------------+----------------+------------------+
+----------------------------+---------------+------------------+
| Message |Adress | Test |
|----------------------------+---------------+------------------+
| What's up? |up | up |
| Hi. I'm Otilia |up | down |
| Hi there. This is Marcus |up | inactive |
+----------------------------+---------------+------------------+
Thanks in advance.

One idea using GNU awk:
awk -v fldlist="2,3" '
BEGIN { fldcnt=split(fldlist,fields,",") }                  # split fldlist into array fields[]
{ split($0,arr,/[|+]/,seps)                                 # split current line on dual delimiters "|" and "+"
  for (i=1;i<=fldcnt;i++)                                   # loop through our array of fields (fldlist)
      printf "%s%s", seps[fields[i]-1], arr[fields[i]]      # print leading separator/delimiter and field
  printf "%s\n", seps[fields[fldcnt]]                       # print trailing separator/delimiter and terminate line
}
' File.txt
NOTES:
requires GNU awk for the 4th argument to the split() function (seps == array of separators; see gawk string functions for details)
assumes our field delimiters (|, +) do not show up as part of the data
the input variable fldlist is a comma-delimited list of columns that mimics what would be passed to cut (e.g., when a line starts with a delimiter, field #1 is blank)
For fldlist="2,3" this generates:
+------------------------------------------+---------------+
| Message | Status |
|------------------------------------------+---------------+
| Hello World | Active |
| Hello Everyone Here | Passive |
| Hi there. My name is Eric. How are you? | Down |
+------------------------------------------+---------------+
+----------------------------+---------------+
| Message | Status |
|----------------------------+---------------+
| What's up? | Active |
| Hi. I'm Otilia | Passive |
| Hi there. This is Marcus | Up |
+----------------------------+---------------+
For fldlist="2,4,6" this generates:
+------------------------------------------+----------------+------------------+
| Message | Adress | Test |
|------------------------------------------+----------------+------------------+
| Hello World | up | up |
| Hello Everyone Here | up | down |
| Hi there. My name is Eric. How are you? | up | inactive |
+------------------------------------------+----------------+------------------+
+----------------------------+----------------+------------------+
| Message | Adress | Test |
|----------------------------+----------------+------------------+
| What's up? | up | up |
| Hi. I'm Otilia | up | down |
| Hi there. This is Marcus | up | inactive |
+----------------------------+----------------+------------------+
For fldlist="4,3,2" this generates:
+----------------+---------------+------------------------------------------+
| Adress | Status | Message |
+----------------+---------------|------------------------------------------+
| up | Active | Hello World |
| up | Passive | Hello Everyone Here |
| up | Down | Hi there. My name is Eric. How are you? |
+----------------+---------------+------------------------------------------+
+----------------+---------------+----------------------------+
| Adress | Status | Message |
+----------------+---------------|----------------------------+
| up | Active | What's up? |
| up | Passive | Hi. I'm Otilia |
| up | Up | Hi there. This is Marcus |
+----------------+---------------+----------------------------+
Say that again? (fldlist="3,3,3"):
+---------------+---------------+---------------+
| Status | Status | Status |
+---------------+---------------+---------------+
| Active | Active | Active |
| Passive | Passive | Passive |
| Down | Down | Down |
+---------------+---------------+---------------+
+---------------+---------------+---------------+
| Status | Status | Status |
+---------------+---------------+---------------+
| Active | Active | Active |
| Passive | Passive | Passive |
| Up | Up | Up |
+---------------+---------------+---------------+
And if you make the mistake of trying to print the '1st' column, i.e., fldlist="1":
+
|
|
|
|
|
+
+
|
|
|
|
|
+
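If GNU awk isn't available, the separator-capturing split() can be emulated in any POSIX awk by walking the line with match(). In this sketch, split_seps and pick_cols are hypothetical names; the fldlist format and field numbering follow the gawk answer above (Message is field 2), and it likewise assumes | and + never appear inside the data.

```bash
#!/bin/sh
# Sketch: same output as the gawk answer, without gawk's 4-argument split().
# usage: pick_cols "2,3" File.txt
pick_cols() {
    awk -v fldlist="$1" '
    # POSIX stand-in for split(s, arr, /[|+]/, seps): fills arr[] with the
    # fields and seps[] with the delimiter that followed each field
    function split_seps(s, arr, seps,    n) {
        split("", arr); split("", seps)           # clear both arrays portably
        n = 0
        while (match(s, /[|+]/)) {
            arr[++n] = substr(s, 1, RSTART - 1)   # text before the delimiter
            seps[n] = substr(s, RSTART, 1)        # the delimiter itself
            s = substr(s, RSTART + 1)
        }
        arr[++n] = s                              # text after the last delimiter
        return n
    }
    BEGIN { fldcnt = split(fldlist, fields, ",") }
    {
        split_seps($0, arr, seps)
        for (i = 1; i <= fldcnt; i++)
            printf "%s%s", seps[fields[i] - 1], arr[fields[i]]
        printf "%s\n", seps[fields[fldcnt]]
    }' "$2"
}
```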

If GNU awk is available, please try markp-fuso's nice solution.
If not, here is a POSIX-compliant alternative:
#!/bin/bash

# define bash variables
cols=(2 3 6)                            # bash array of desired columns
col_list=$(IFS=,; echo "${cols[*]}")    # create a csv string

awk -v cols="$col_list" '
NR==FNR {
    if (match($0, /^[|+]/)) {           # the record contains a table
        if (match($0, /^[|+]-/))        # horizontally ruled line
            n = split($0, a, /[|+]/)    # split into columns
        else                            # "cell" line
            n = split($0, a, /\|/)
        len = 0
        for (i = 1; i < n; i++) {
            len += length(a[i]) + 1     # accumulated column position
            pos[FNR, i] = len
        }
    }
    next
}
{
    n = split(cols, a, /,/)             # split the variable `cols` on comma into an array
    for (i = 1; i <= n; i++) {
        col = a[i]
        if (pos[FNR, col] && pos[FNR, col+1]) {
            printf("%s", substr($0, pos[FNR, col], pos[FNR, col + 1] - pos[FNR, col]))
        }
    }
    print(substr($0, pos[FNR, col + 1], 1))
}
' file.txt file.txt
Result with cols=(2 3 6) as shown above:
+---------------+----------------+-----------------+
| Status | Adress | Calibration |
+---------------+----------------+-----------------|
| Active | up | done |
| Passive | up | none |
| Down | up | done |
+---------------+----------------+-----------------+
+---------------+----------------+-----------------+
| Status | Adress | Calibration |
+---------------+----------------+-----------------|
| Active | up | done |
| Passive | up | none |
| Up | up | done |
+---------------+----------------+-----------------+
It detects the column positions in the first pass, then slices each line at those positions in the second pass (which is why file.txt is given twice on the command line).
You can control which columns are printed with the bash array cols, assigned at the beginning of the script. Set it to the desired column numbers in increasing order; note that here column 1 is Message (unlike fldlist in the gawk answer, where Message is field 2). If you want to use the bash variable in a different way, please let me know.

Fetch particular column value from rows with specified condition using shell script

I have the following sample output from a command:
+--------------------------------------+------------------+---------------------+-------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+-------------------------------------+
| 04584e8a-c210-430b-8028-79dbf741797c | | 99.99.99.91 | |
| 12d2257c-c02b-4295-b910-2069f583bee5 | 20.0.0.92 | 99.99.99.92 | 37ebfa4c-c0f9-459a-a63b-fb2e84ab7f92 |
| 98c5a929-e125-411d-8a18-89877d3c932b | | 99.99.99.93 | |
| f55e54fb-e50a-4800-9a6e-1d75004a2541 | 20.0.0.94 | 99.99.99.94 | fe996e76-ffdb-4687-91a0-9b4df2631b4e |
+--------------------------------------+------------------+---------------------+-------------------------------------+
Now I want to fetch every floating_ip_address for which the port_id and fixed_ip_address fields are both blank (in the sample above, 99.99.99.91 and 99.99.99.93).
How can I do it with shell scripting?

You can use sed:
fl_ips=($(sed -nE 's/^\|[^|]*\|[[:space:]]*\|([^|]*)\|[[:space:]]*\|[[:space:]]*$/\1/p' inputfile))
Here inputfile is the table provided in the question. The array fl_ips contains the output of sed:
>echo ${#fl_ips[@]}
2 # Array has two elements
>echo ${fl_ips[0]}
99.99.99.91
>echo ${fl_ips[1]}
99.99.99.93

Find references to files, recursively

In a project where XML/JS/Java files can contain references to other such files, I'd like a quick overview of what has to be checked carefully when one file has been updated.
That means I eventually need to look at all files referencing the modified one, all files referencing those files, and so on (recursively on matched files).
For one level, it's quite simple:
grep -E -l -o --include=*.{xml,js,java} -r "$FILE" . | xargs -n 1 basename
But how can I automate that to match (grand-(grand-))parents?
And could the result perhaps be made more readable, for example as a tree structure?
For example, if the file that interests me is called modified.js...
show-referring-files-to modified.js
... I could wish such an output:
some-file-with-ref-to-modified.xml
|__ a-file-referring-to-some-file-with-ref-to-modified.js
another-one-with-ref-to-modified.xml
|__ a-file-referring-to-another-one-with-ref-to-modified.js
|__ a-grand-parent-file-having-ref-to-ref-file.xml
|__ another-file-referring-to-another-one-with-ref-to-modified.js
Or any other output (even flat) that makes it quick to check which files are potentially impacted by a change.
UPDATE: results of the currently proposed answer:
ahmsff.js
|__ahmsff.xml
| |__ahmsd.js
| | |__ahmsd.xml
| | | |__ahmst.xml
| | | | |__BESH.java
| |__ahru.js
| | |__ahru.xml
| | | |__ahrut.xml
| | | | |__ashrba.js
| | | | | |__ashrba.xml
| | | | | | |__STR.java
| | |__ahrufrp.xml
| | | |__ahru.js
| | | | |__ahru.xml
| | | | | |__ahrut.xml
| | | | | | |__ashrba.js
| | | | | | | |__ashrba.xml
| | | | | | | | |__STR.java
| | | | |__ahrufrp.xml
| | | | | |__ahru.js
| | | | | | |__ahru.xml
| | | | | | | |__ahrut.xml
| | | | | | | | |__ashrba.js
| | | | | | | | | |__ashrba.xml
| | | | | | | | | | |__STR.java
| | | | | | |__ahrufrp.xml
(...)

I'd use a shell function (for the recursion) inside a shell script.
Assuming the filenames are unique and contain no characters that need escaping:
File: /usr/local/bin/show-referring-files-to
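#!/bin/sh
get_references() {
    # $1 = file name to search for, $2 = indentation prefix,
    # $3 = \|-separated list of files on the current branch (GNU grep BRE alternation),
    #      filtered out to prevent infinite loops
    # --include is spelled out because /bin/sh has no brace expansion
    grep -F -l --include='*.xml' --include='*.js' --include='*.java' -r "$1" . | grep -v "$3" | while read -r subfile; do
        # read each line of the grep result into the variable subfile
        subfile="$(basename "$subfile")"
        echo "$2""$subfile"
        get_references "$subfile" ' '"$2" "$3"'\|'"$subfile"
    done
}

while test $# -gt 0; do
    # loop so that more than one file can be given as an argument to this script
    echo "$1"
    get_references "$1" '|__' "$1"
    shift
done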
Many performance improvements are still possible.
Edit: added $3 to prevent infinite loops.
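Since the question also accepts flat output, here is a sketch of a different approach: a breadth-first walk that remembers every file it has already visited, so each impacted file is printed exactly once and reference cycles (like the repeating ahru.js/ahrufrp.xml branch in the question's UPDATE) cannot recurse endlessly. It assumes bash 4+ (for the associative array), a grep with -r and --include (GNU or BSD), and unique basenames without newlines; referring_closure is a hypothetical name.

```bash
#!/bin/bash
# Sketch: flat, de-duplicated list of all files that reference "$1"
# directly or indirectly. Requires bash 4+ for the associative array.
referring_closure() {
    local -A seen=()        # basenames already reported (or the start file)
    local queue=("$1") name ref
    seen[$1]=1
    while [ "${#queue[@]}" -gt 0 ]; do
        name=${queue[0]}
        queue=("${queue[@]:1}")                  # pop the front of the queue
        while IFS= read -r ref; do
            ref=${ref##*/}                       # keep only the basename
            [ -n "${seen[$ref]}" ] && continue   # already visited: skip (breaks cycles)
            seen[$ref]=1
            printf '%s\n' "$ref"
            queue+=("$ref")
        done < <(grep -rlF --include='*.xml' --include='*.js' --include='*.java' -- "$name" .)
    done
}

# Example usage: referring_closure modified.js
```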

shell script to extract the name and IP address

Is there a way to use a shell script to get only the name and the net address from the result below?
Result
6cb7f14e-6466-4211-9a09-2b8e7ad92703 | name-erkoev4ja3rv | 2e3900ff36574cf9937d88223403da77 | ACTIVE | Running | net0=10.1.1.2; ing-net=10.1.1.3; net=10.1.1.4;
Expected Result
name-erkoev4ja3rv: 10.1.1.4

$ input="6cb7f14e-6466-4211-9a09-2b8e7ad92703 | name-erkoev4ja3rv | 2e3900ff36574cf9937d88223403da77 | ACTIVE | Running | net0=10.1.1.2; ing-net=10.1.1.3; net=10.1.1.4;"
$ echo "$input" | sed -E 's,^[^|]+ \| ([^ ]+).* net=([0-9.]+).*$,\1: \2,g'
name-erkoev4ja3rv: 10.1.1.4

echo "6cb7f14e-6466-4211-9a09-2b8e7ad92703 | name-erkoev4ja3rv | 2e3900ff36574cf9937d88223403da77 | ACTIVE | Running | net0=10.1.1.2; ing-net=10.1.1.3; net=10.1.1.4;" | awk '{ ip = $13; gsub(/^net=|;$/, "", ip); print $3 ": " ip }'
Does this satisfy your case? (It relies on net= being the 13th whitespace-separated field.)
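A sketch that does not depend on net= being a particular whitespace-separated field: it splits the line on |, trims the name column, and searches the ;-separated entries of the last column for the one that starts with exactly net=. extract_name_net is a hypothetical name.

```bash
#!/bin/sh
# Sketch: print "name: address" for the net= entry of each input line.
extract_name_net() {
    awk -F'|' '{
        name = $2
        gsub(/^ +| +$/, "", name)              # trim the padding around the name
        n = split($NF, kv, ";")                # "net0=...; ing-net=...; net=...;"
        for (i = 1; i <= n; i++) {
            item = kv[i]
            gsub(/^ +| +$/, "", item)
            if (item ~ /^net=/)                # exactly "net=", not "ing-net=" or "net0="
                print name ": " substr(item, 5)
        }
    }' "$@"
}

# Example usage: echo "$input" | extract_name_net
```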
