How to extract links from a text file? [duplicate] - bash

This question already has answers here:
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 3 months ago.
Suppose there is a text file test.txt. It contains text and links to resources such as https://example.com/kqodbjcuic49w95rofwjue. How can I extract only the list of these links from there? (preferably via bash, but not required)
I tried this solution:
sed 's/^.*href="\([^"]*\).*$/\1/'
But it didn't help me.

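grep -oP "((?:(?:http|ftp|ws)s?|sftp):\/\/?)?([^:/\s.#?]+\.[^:/\s#?]+|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]*)?" test.txt
will display the URLs inside the file. The -P flag (GNU grep) is needed because the pattern uses Perl syntax such as (?:...), \s and \d, which plain grep does not understand. Note that the pattern was written to validate a whole string, so its trailing ([^#]+)? group can extend a match to the end of the line.
(The regex comes from BSimjoo's link)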
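If a strict validator is more than the task needs, a shorter pattern is often enough for pulling links out of running text. The sketch below assumes that every link starts with http:// or https:// and ends at whitespace, a quote, or an angle bracket; the function name extract_links is hypothetical.

```shell
#!/usr/bin/env bash
# extract_links FILE: print each http(s) URL found in FILE, one per line.
# Assumption: a URL ends at whitespace, a quote, or an angle bracket.
extract_links() {
    grep -oE "https?://[^[:space:]\"'<>]+" "$1"
}

# Example: build a small sample file and list its links without duplicates.
sample=$(mktemp)
printf '%s\n' \
    'See https://example.com/kqodbjcuic49w95rofwjue for details.' \
    'Mirror: http://example.org/a?b=1 and https://example.com/kqodbjcuic49w95rofwjue' \
    > "$sample"
extract_links "$sample" | sort -u
rm -f "$sample"
```

Punctuation that directly follows a link, such as a full stop at the end of a sentence, is kept as part of the match, so the output may need a second pass if the text is written that way.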
Grep text files guide at https://www.linode.com/docs/guides/how-to-grep-for-text-in-files/

Related

bash script to create folder names out of a portion of a filename [duplicate]

This question already has answers here:
Remove a fixed prefix/suffix from a string in Bash
(9 answers)
Closed last month.
I'm new to bash, and coding. I have a list of files:
test-T01___2022.txt
test-T01__2021.txt
test-T01_NONE.txt
test-T02___2022.txt
test-T02__2021.txt
test-T02_NONE.txt
test-T03___2022.txt
test-T03__2021.txt
test-T03_NONE.txt
I'm trying to write a script to create folders T01 (containing all files with T01), T02 (containing all files with T02), etc. I've been trying wildcards and regexps and something similar to this post, but I'm having some trouble. I'd appreciate some help.
Many thanks!
Use the bash prefix and suffix removal operations. See the link in the comments for more details.
For example:
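files=...
for file in $files
do
a=${file#test-}    # remove the leading "test-"
dir=${a%%_*}       # remove everything from the first underscore onwards
mkdir -p "$dir"    # -p: no error if the folder already exists
mv "$file" "$dir"
done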
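The same idea can be driven by a glob instead of a prepared file list. This is only a sketch: it assumes all files are named test-*.txt and sit in one directory, and the function name sort_into_dirs is made up for illustration.

```shell
#!/usr/bin/env bash
# sort_into_dirs DIR: move every test-*.txt file in DIR into a subfolder
# named after the part between "test-" and the first underscore.
sort_into_dirs() {
    local file base dir
    for file in "$1"/test-*.txt; do
        [ -e "$file" ] || continue   # the glob matched nothing
        base=${file##*/}             # drop the directory part
        base=${base#test-}           # test-T01___2022.txt -> T01___2022.txt
        dir=${base%%_*}              # T01___2022.txt -> T01
        mkdir -p "$1/$dir"
        mv -- "$file" "$1/$dir/"
    done
}
```

Calling sort_into_dirs . sorts the files in the current directory. The %% form matters here: a single % removes only the shortest match, which would leave T01__ for test-T01___2022.txt.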

grep names in a small file matching a large file [duplicate]

This question already has answers here:
Are shell scripts sensitive to encoding and line endings?
(14 answers)
grep not showing result which read id from file
(2 answers)
Closed 12 months ago.
My small file contains this information line by line:
abc.123
abc.258
abc.952
I want to get the lines in my bigger file (~30 GB) that match these names. I tried this command, but it didn't give me any result.
grep -f small.txt big.txt
I have verified that abc.123, abc.258 and abc.952 all exist in the bigger file: when I grep for each of these names one by one, I get exactly the result I want.
grep "abc.123" big.txt
I have no idea where I could be going wrong.
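The linked duplicates point at line endings as a likely cause: if small.txt was saved with Windows (CRLF) line endings, every pattern carries an invisible trailing carriage return and so matches nothing in a file with Unix line endings. A sketch under that assumption (the function name grep_names is hypothetical):

```shell
#!/usr/bin/env bash
# grep_names PATTERNS BIG: print the lines of BIG that contain any name
# listed in PATTERNS, ignoring carriage returns in the pattern file.
grep_names() {
    # tr strips the CRs; -F treats each name as a fixed string, so the dot
    # in abc.123 is literal; -f - reads the cleaned patterns from stdin.
    tr -d '\r' < "$1" | grep -F -f - -- "$2"
}
```

It would be called as grep_names small.txt big.txt. Running od -c small.txt | head shows a \r before each \n if the file really has CRLF endings.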

Bash: how to rename a file to a string containing forward slashes? [duplicate]

This question already has answers here:
Is it possible to use "/" in a filename?
(8 answers)
Closed 1 year ago.
I have a file that I want to rename to a date like "20/02/21", but if I do mv file.txt 20/02/21 it interprets the forward slashes as referencing sub-folders. Is there a way to do this?
No, there's no way to do it. On Unix, the forward slash / separates directories and cannot be used in a filename. You have to use another delimiter, for example 20-02-21, 20.02.21 or 20\02\21 (the backslash has to be quoted or escaped in the shell).
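If the date string comes from somewhere else and already contains slashes, they can be swapped for another character before renaming. A minimal sketch, with the hyphen chosen arbitrarily and safe_name being a hypothetical helper:

```shell
#!/usr/bin/env bash
# safe_name STRING: replace every forward slash with a hyphen so that the
# result can be used as a single file name.
safe_name() {
    printf '%s\n' "$1" | tr '/' '-'
}

safe_name "20/02/21"    # prints 20-02-21
```

The rename itself would then be mv file.txt "$(safe_name "20/02/21")".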

Replace a whole line using sed [duplicate]

This question already has answers here:
Difference between single and double quotes in Bash
(7 answers)
Closed 4 years ago.
I am very new to all this and have used this website to help me find the answers I'm looking for.
I want to replace a line in multiple files across multiple directories. However I have struggled to do this.
I have created multiple directories 'path_{0..30}', each directory has the same 'input' file, and another file 'opt_path_rx_00i.xyz' where i corresponds to the directory that the file is in (i = {0..30}).
I need to be able to change one of the lines (line 7) in the input file, so that it changes with the directory that the input file is in (path_{0..30}). The line is:
pathfile opt_path_rx_00i.xyz
Where i corresponds to the directory that the file is in (i={0..30})
However, I'm struggling to do this using sed. I manage to change the line for each input file in the respective directories, but I'm unable to ensure that the number i changes with the directory. Instead, the input file in each directory just changes line 7 to:
pathfile opt_path_rx_00i.xyz
where i, in this case, is the letter i, and not the numbers {0..30}.
I'll show what I've done below to make this clearer.
for i in {0..30}
do
sed -i '7s/.*/pathfile-opt_path_rx_00$i.xyz/' path_$i/input
done
What I want to happen is, for example in directory path_3, line 7 in the input file will be:
pathfile opt_path_rx_003.xyz
Any help would be much appreciated
Can you try with double quotes? Inside single quotes the shell does not expand $i, so sed never sees the number.
for i in {0..30}; do
sed -i "7s/.*/pathfile opt_path_rx_00$i.xyz/" "path_$i/input"
done
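One thing to watch with a fixed 00 prefix is that directory 10 ends up with 0010. If the file names are zero-padded to three digits (an assumption, since the question only shows the 00i form), printf can build the number. The function name set_pathfile and its ROOT argument are hypothetical.

```shell
#!/usr/bin/env bash
# set_pathfile ROOT N: rewrite line 7 of ROOT/path_N/input so that it names
# the xyz file belonging to directory N.
# Assumption: the number is zero-padded to three digits (003, 010, 030).
set_pathfile() {
    local num
    num=$(printf '%03d' "$2")
    # Double quotes let the shell expand $num before sed sees the script.
    sed -i "7s/.*/pathfile opt_path_rx_$num.xyz/" "$1/path_$2/input"
}
```

A loop such as for i in {0..30}; do set_pathfile . "$i"; done applies it to every directory. The bare -i form is GNU sed; BSD/macOS sed expects -i ''.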

How to replace quotes inside a quoted field of a non-standard CSV file using a one-liner bash command? [duplicate]

This question already has answers here:
What's the most robust way to efficiently parse CSV using awk?
(6 answers)
Closed 4 years ago.
I have a file like this:
col1×col2×col3
12×"Some field with "quotes" inside it"×"Some field without quotes inside but with new lines \n"
And I would like to replace the interior double quotes with single quotes so the result will look like this:
col1×col2×col3
12×"Some field with 'quotes' inside it"×"Some field without quotes inside but with new lines \n"
I guess this can be done with sed, awk or ex but I haven't been able to figure out a clean and quick way of doing it. Real CSV files are of the order of millions of lines.
The preferred solution would be a one-liner using the aforementioned programs.
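A simple workaround using sed, based on your field separator ×, could be:
sed -E "s/([^×])\"([^×])/\1'\2/g" file
This replaces each " that is both preceded and followed by a character other than × with '.
Note that sed does not support lookahead or lookbehind, so we have to capture the neighbouring characters and reinsert them. Because each match consumes those neighbours, two interior quotes separated by a single character (as in "a") are not both replaced in one pass.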
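When interior quotes can sit only one character apart, a single pass with the g flag can miss the second one. One way around that is to let sed repeat the substitution on each line until nothing changes. This is a sketch rather than a full CSV parser: it assumes × never appears inside a field, and the function name fix_inner_quotes is made up.

```shell
#!/usr/bin/env bash
# fix_inner_quotes: read ×-separated lines on stdin and turn double quotes
# that sit strictly inside a field into single quotes.
fix_inner_quotes() {
    # :a and ta repeat the substitution on the same line until no interior
    # quote is left, so quotes one character apart are both handled.
    sed -E -e ':a' -e "s/([^×])\"([^×])/\1'\2/" -e 'ta'
}
```

It reads standard input, so it can be used as fix_inner_quotes < file. Quotes that open or close a field stay untouched because they have × or a line boundary on one side.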
