Extract numbers from filenames disregarding extensions - bash

I'm making a script to rename some video files. Some are named XXX blah blah.ext and some are XXX - XXX blah blah.ext where "X" are digits. Furthermore, some files are .avi and some are mp4. What I'd like is to extract the numbers from these files, separated by a space if there is more than one, and to disregard the "4" in ".mp4" files.
My current implementation is egrep -o "[[:digit:]]*", and while this does separate numbers into different outputs, it also considers ".mp4".
With sed, I haven't managed to output each number separately, and the result still includes the "4". Note: I'm very new to sed, i.e. I began learning it for the purpose of writing this script.
How can I do this?

for file in *
do
    echo "$file" | sed 's/\..*$//' | egrep -o "[[:digit:]]*"
done

You should find this to be pretty robust:
sed 's/^[^[:digit:]]*\([[:digit:]]\+\)[^[:digit:]]\+\( [[:digit:]]\+\)\?[^[:digit:]]\+[[:digit:]]\?$/\1\2/'
If your sed supports -r, you can eliminate the backslashes which are used for escaping:
sed -r 's/^[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+( [[:digit:]]+)?[^[:digit:]]+[[:digit:]]?$/\1\2/'
Demo:
$ echo '123 blah blah.avi
234 blah blah.mp4
345 - 678 blah blah.avi
901 - 234 blah blah.mp4' |
sed -r 's/^[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+( [[:digit:]]+)?[^[:digit:]]+[[:digit:]]?$/\1\2/'
123
234
345 678
901 234
This depends on there being a space in the filename before the second number (when there is one). If there are files that don't have that, then a simple modification can make it work.

This might work for you:
# echo '123 bla bla.avi
456 - 789 bla bla.avi
012bla bla.avi
345-678blabla.avi
901 bla bla.mp4
234 - 567 bla bla.mp4
890bla bla.mp4
123 - 456 - 789 bla bla.mp4' |
sed 's/[^0-9]*[0-9]$//;s/[^0-9]\+/ /g'
123
456 789
012
345 678
901
234 567
890
123 456 789
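For comparison, here is a minimal sketch that avoids a long regular expression: strip the extension with parameter expansion first, then pull out the digit runs and join them with spaces. The function name extract_numbers is made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do).

```bash
# Hypothetical helper: print the numbers found in a file name, ignoring the extension.
extract_numbers() {
    # ${1%.*} removes the shortest trailing ".something", i.e. the extension,
    # so the "4" in ".mp4" is never seen by grep.
    printf '%s\n' "${1%.*}" |
        grep -oE '[0-9]+' |      # one digit run per line
        paste -sd' ' -           # join the lines with single spaces
}

for name in '123 blah blah.avi' '901 - 234 blah blah.mp4'; do
    extract_numbers "$name"
done
```

In a rename script you would call it as extract_numbers "$file" inside the for file in * loop.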

Related

how to combine more than two file into one new file with specific name using bash

I have many files.
List of file names:
p004c01.txt
p004c05.txt
p006c01.txt
p006c02.txt
p007c01.txt
p007c03.txt
p007c04.txt
...
$cat p004c01.txt
#header
122.5 -0.256 547
123.6 NaN 325
$cat p004c05.txt
#header
122.1 2.054 247
122.2 -1.112 105
$cat p006c01.txt
#header
99 -0.200 333
121.4 -1.206 243
$cat p006c02.txt
#header
122.5 2.200 987
99 -1.335 556
I want the files to look like this:
file1
$cat p004.txt
122 -0.256 547
122 2.054 247
122 -1.112 105
file2
$cat p006.txt
122.5 2.200 987
121.4 -1.206 243
99 -1.335 556
99 -0.200 333
The other files should be handled the same way: files that share the same value (?) in
p????cxx.txt
go into the same new file.
I tried it one file at a time, like this:
cat p004* | sed '/#/d'| sort -k 1n | sed '/NaN/d' |awk '{print substr($1,2,3),$2,$3,$4,$5}' > p004.txt
Can anyone help me with a simple script for all the data?
Thank you :)
Perhaps this will work for you (-q stops tail from printing a "==> name <==" banner before each file):
for f in {001..999}; do tail -q -n +2 p"$f"c* > p"$f".txt; done 2>/dev/null
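If you also want the NaN rows dropped and the result sorted, as in your own one-liner, something along these lines might do. It is only a sketch: the function name combine_groups is invented, and it assumes the names always look like pNNNcNN.txt and that header lines start with "#".

```bash
# Hypothetical helper: merge pNNNcNN.txt files into one pNNN.txt per prefix.
combine_groups() {
    local dir=${1:-.} f prefix last=
    for f in "$dir"/p???c??.txt; do
        [ -e "$f" ] || continue            # the glob matched nothing
        prefix=${f%c??.txt}                # ./p004c01.txt -> ./p004
        [ "$prefix" = "$last" ] && continue  # this group was already written
        last=$prefix
        # Drop header lines and rows containing NaN, then sort on column 1.
        # LC_ALL=C makes sort treat "." as the decimal point.
        grep -hv -e '^#' -e 'NaN' "$prefix"c??.txt |
            LC_ALL=C sort -k1,1n > "$prefix.txt"
    done
}
```

Called as combine_groups /path/to/data, it only creates output files for prefixes that actually exist, so no empty p001.txt, p002.txt and so on are left behind.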

A UNIX Command to Find the Name of the Student who has the Second Highest Score

I am new to Unix Programming. Could you please help me to solve the question.
For example, If the input file has the below content
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
The output will be
ABC
I tried something like this
sort -k3,3 -rn -t" " | head -n2 | awk '{print $2}'
Using awk
awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}'
Demo:
$cat file.txt
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
$awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}' file.txt
ABC
$
Explanation:
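NR>1 --> skip the first record (the header)
{arr[$3]=$2} --> build an associative array with the score as index and the name as value
END --> run the following block after the last record has been read
n=asorti(arr,arr_sorted) --> sort the indices of arr (the scores) into arr_sorted; n is the number of elements
print arr[arr_sorted[n-1]] --> arr_sorted[n-1] is the second-highest score; print the name stored under it
Note that asorti is a gawk extension and compares the indices as strings, so this relies on every score having the same number of digits. Two students with the same score also overwrite each other in arr.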
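If gawk is not available, or the scores have different widths (say 9 and 100), a single pass that tracks the two best scores numerically is one alternative. This is a sketch, not a drop-in replacement: the function name second_highest is made up, and it treats "second highest" as the highest score strictly below the top one.

```bash
# Hypothetical helper: print the name with the second-highest score (column 3).
second_highest() {
    awk '
        NR == 1 { next }                   # skip the header row
        {
            score = $3 + 0                 # force a numeric comparison
            if (seen == 0 || score > best) {
                # the old best, if any, becomes the runner-up
                if (seen > 0) { second = best; second_name = best_name; have_second = 1 }
                best = score; best_name = $2
            } else if (score < best && (!have_second || score > second)) {
                second = score; second_name = $2; have_second = 1
            }
            seen++
        }
        END { if (have_second) print second_name }
    ' "$@"
}
```

Run as second_highest file.txt, it prints ABC for the sample input.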
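Your attempt is 90% correct; it just needs one more step.
head -n2 keeps the two highest rows, so the pipeline prints both names. Add tail -n1 to keep only the second row:
sort -k3,3 -rn -t" " file.txt | head -n2 | tail -n1 | awk '{print $2}'
The header line sorts to the bottom because "Score" counts as 0 in a numeric sort, so it does not get in the way.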

Search phrases and terms (stored in a file) in a text article

I have two text files, one containing keywords and phrases (file1.txt) and a paragraph-based text file (file2.txt). I'm trying to find the keywords/phrases in file1.txt that appear in file2.txt.
Here's some sample data:
File 1 (file1.txt):
123 111 1111
ABC 000
A 999
B 000
C 111
Thank you
File 2 (file2.txt)
Hello!
The following order was completed: ABC 000
Item 1 (A 999)
Item 2 (X 412)
Item 3 (8 357)
We will call: 123 111 1111 if we encounter any issues
Thank you very much!
Desired output:
123 111 1111
ABC 000
A 999
Thank you
I've tried the grep command:
grep -Fxf file1.txt file2.txt > output.txt
And I'm getting a blank output.txt
What suggestions do you have to get the right output?
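The -x option in your command only accepts matches that span a whole line of file2.txt, which is why the output is empty. Try this instead:
grep -o -f file1.txt <file2.txt
-o : print only the matching part of each line
-f file1.txt : read the patterns from file1.txt, one per line
< : feed file2.txt on standard input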
Demo :
$cat file1.txt
123 111 1111
ABC 000
A 999
B 000
C 111
Thank you
$cat file2.txt
Hello!
The following order was completed: ABC 000
Item 1 (A 999)
Item 2 (X 412)
Item 3 (8 357)
We will call: 123 111 1111 if we encounter any issues
Thank you very much!
$grep -o -f file1.txt <file2.txt
ABC 000
A 999
123 111 1111
Thank you
$
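One possible refinement, assuming the phrases are meant literally: add -F so that characters such as "." or "*" in file1.txt are not interpreted as regular-expression syntax. The wrapper name find_phrases below is only for illustration.

```bash
# Hypothetical wrapper: list every phrase from the first file that occurs in the second.
find_phrases() {
    # -o  print only the matched text
    # -F  treat each pattern as a fixed string, not a regex
    # -f  read the patterns from the file given as $1
    grep -oFf "$1" "$2"
}
```

find_phrases file1.txt file2.txt prints the matches in the order they occur in file2.txt, not in the order of file1.txt.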

reading a file into an array in bash

Here is my code
#!bin/bash
IFS=$'\r\n'
GLOBIGNORE='*'
command eval
'array=($(<'$1'))'
sorted=($(sort <<<"${array[*]}"))
for ((i = -1; i <= ${array[-25]}; i--)); do
echo "${array[i]}" | awk -F "/| " '{print $2}'
done
I keep getting an error that says "line 5: array=($(<)): command not found"
This is my problem.
As a whole my code should read in a file as a command line argument, sort the elements, then print out column 2 of the last 25 lines. I haven't been able to test this far so if there's a problem there too any help would be appreciated.
This is some of what the file contains:
290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
16227 nicole
15308 daniel
15163 babygirl
14726 monkey
14331 lovely
14103 jessica
13984 654321
13981 michael
13488 ashley
13456 qwerty
13272 111111
13134 iloveu
13028 000000
12714 michelle
11761 tigger
11489 sunshine
11289 chocolate
11112 password1
10836 soccer
10755 anthony
10731 friends
10560 butterfly
10547 purple
10508 angel
10167 jordan
9764 liverpool
9708 justin
9704 loveme
9610 fuckyou
9516 123123
9462 football
9310 secret
9153 andrea
9053 carlos
8976 jennifer
8960 joshua
8756 bubbles
8676 1234567890
8667 superman
8631 hannah
8537 amanda
8499 loveyou
8462 pretty
8404 basketball
8360 andrew
8310 angels
8285 tweety
8269 flower
8025 playboy
7901 hello
7866 elizabeth
7792 hottie
7766 tinkerbell
7735 charlie
7717 samantha
7654 barbie
7645 chelsea
7564 lovers
7536 teamo
7518 jasmine
7500 brandon
7419 666666
7333 shadow
7301 melissa
7241 eminem
7222 matthew
In Linux you can simply do a
sort -nbr file_to_sort | head -n 25 | awk '{print $2}'
read in a file as a command line argument, sort the elements, then
print out column 2 of the last 25 lines.
From that description of the problem, I suggest:
#! /bin/sh
sort -bn "$1" | tail -n 25 | awk '{print $2}'
As a rule, use the shell to operate on filenames, and never use the
shell to operate on data. Utilities like sort and awk are far
faster and more powerful than the shell when it comes to processing a
file.
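If you do want the lines in a bash array, as the title asks, mapfile (bash 4 or later) is probably the least error-prone way to get there, since it needs no IFS, GLOBIGNORE, or eval tricks. The sketch below is only an illustration; the function name last_column2 and its optional count argument are made up.

```bash
#!/bin/bash
# Hypothetical helper: sort a file numerically, then print column 2 of the last N lines.
last_column2() {
    local file=$1
    local count=${2:-25}
    local lines=()
    local line word start
    # mapfile -t stores one line per array element and strips the newline;
    # no word splitting or globbing is applied to the data.
    mapfile -t lines < <(sort -bn -- "$file")
    # Index of the first of the last $count elements (0 if the file is shorter).
    start=$(( ${#lines[@]} > count ? ${#lines[@]} - count : 0 ))
    for line in "${lines[@]:start}"; do
        read -r _ word _ <<<"$line"        # second whitespace-separated field
        printf '%s\n' "$word"
    done
}
```

Called as last_column2 "$1" from a script, it gives the same output as the sort | tail | awk pipeline, which remains the simpler and faster choice for large files.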

How can I compare two 2D-array files with bash?

I have two 2D-array files to read with bash.
What I want to do is extract the elements inside both files.
These two files contain different rows x columns such as:
file1.txt (nx7)
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
.
.
.
file2.txt (mx3)
DESC W S
AAA 100 100
CCC 135 135
EEE 789 789
.
.
.
Here is what I want to do:
Extract the element in DESC column of file2.txt then find the corresponding element in file1.txt.
Extract the W,S elements in such row of file2.txt then find the corresponding W,S elements in such row of file1.txt.
If [W1==W2 && S1==S2]; then echo "${DESC[colindex]} ok"; else echo "${DESC[colindex]} NG"
How can I read this kind of file as a 2D array with bash or is there any convenient way to do that?
bash does not support 2D arrays. You can simulate them by generating 1D array variables like array1, array2, and so on.
Assuming DESC is a key (i.e. has no duplicate values) and does not contain any spaces:
#!/bin/bash
# read data from file1
idx=0
while read -a data$idx; do
    let idx++
done <file1.txt
# process data from file2
while read desc w2 s2; do
    for ((i=0; i<idx; i++)); do
        v="data$i[1]"
        [ "$desc" = "${!v}" ] && {
            w1="data$i[4]"
            s1="data$i[5]"
            if [ "$w2" = "${!w1}" -a "$s2" = "${!s1}" ]; then
                echo "$desc ok"
            else
                echo "$desc NG"
            fi
            break
        }
    done
done <file2.txt
For brevity, optimizations such as taking advantage of sort order are left out.
If the files actually contain the header NO DESC ID TYPE ... then use tail -n +2 to discard it before processing.
A more elegant solution is also possible, which avoids reading the entire file in memory. This should only be relevant for really large files though.
If the row order does not need to be preserved (i.e. the output may be sorted), maybe this is enough. Each input must be sorted on its join field: column 1 for file2.txt, column 2 (DESC) for file1.txt.
join -2 2 -o 1.1,1.2,1.3,2.5,2.6 <(tail -n +2 file2.txt|sort -k1,1) <(tail -n +2 file1.txt|sort -k2,2) |\
sed 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\) \2 \3/\1 OK/' |\
sed '/ OK$/!s/\([^ ]*\) .*/\1 NG/'
For file1.txt
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
and file2.txt
DESC W S
AAA 000 100
CCC 135 135
EEE 789 000
FCK xxx 135
produces:
AAA NG
CCC OK
EEE NG
Explanation:
skip the header line in both files with tail -n +2
sort both files
join the needed columns from both files into one table; only the lines whose DESC field appears in both files end up in the result,
like this:
AAA 000 100 100 100
CCC 135 135 135 135
EEE 789 000 789 789
in the lines where columns 2 and 4 match and columns 3 and 5 match, replace everything after the 1st column with OK
in the remaining lines, replace everything after the 1st column with NG
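A third option, sketched here under the same assumptions (DESC is unique, contains no spaces, and both files have one header line), is a single awk pass. It needs no sorting, keeps the row order of file2.txt, and only holds the W and S columns of file1.txt in memory. The function name compare_ws is invented for the example.

```bash
# Hypothetical helper: $1 = file1.txt (NO DESC ID TYPE W S GRADE), $2 = file2.txt (DESC W S)
compare_ws() {
    awk '
        NR == FNR {                        # first file: remember W and S per DESC
            if (FNR > 1) { w[$2] = $5; s[$2] = $6 }
            next
        }
        FNR > 1 && ($1 in w) {             # second file: compare with the stored values
            # appending "" forces a string comparison, so 000 and 0 differ
            same = ((w[$1] "") == ($2 "")) && ((s[$1] "") == ($3 ""))
            print $1, (same ? "OK" : "NG")
        }
    ' "$1" "$2"
}
```

With the two sample files above, compare_ws file1.txt file2.txt prints AAA NG, CCC OK and EEE NG; rows of file2.txt whose DESC is missing from file1.txt are skipped, as with join.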

Resources