How can I make grep extract strings by length in bash?

I want to extract the words of a specific length from a wordlist into another list.
If I run:
grep -oP '\b(\w{7})\b' infile >> outfile
everything runs fine and the words get extracted, but when I run it in a bash script, nothing is output. I suspected the single quotes were stopping the length number from being read, but if I use double quotes I still get a syntax error. The script looks like this:
read -p "what is in:" in
read -p "what is out:" out
read -p "what is char num:" char
grep -oP '\b(\w{$char})\b' $in >> $out
What am I missing?

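You need double quotes so that bash expands variables in the string. Inside single quotes, $char is passed to grep literally, so the pattern becomes \w{$char} instead of, say, \w{7}.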
Try
grep -oP "\b(\w{$char})\b" "$in" >> "$out"
Seeing it in action:
Input file
$ cat file1
dockerkill container
docker anothercontainer
dockerput contain
Script:
$ cat script.sh
char="6"
grep -oP "\b(\w{$char})\b" file1
Output
$ ./script.sh
docker
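Putting the fix back into the asker's use case, here is a minimal sketch as a reusable function. The name extract_by_length and the numeric check are my own additions, not part of the answer above, and it assumes GNU grep built with PCRE support (-P):

```shell
#!/bin/bash
# extract_by_length FILE LEN  (hypothetical helper name)
# Prints every word of exactly LEN word characters found in FILE.
extract_by_length() {
    local infile=$1 len=$2
    # Reject empty or non-numeric lengths before they reach the regex.
    case $len in
        ''|*[!0-9]*) echo "length must be a number" >&2; return 1 ;;
    esac
    # Double quotes let bash expand $len inside the pattern;
    # \b on both sides stops longer words from matching partially.
    grep -oP "\b\w{$len}\b" "$infile"
}
```

With the read -p lines from the question, the last line of the script would become extract_by_length "$in" "$char" >> "$out".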

Related

How to add string to cat result in bash?

The file info contains some information on a line starting with myline. I'm trying to pass it to a script like this:
bash myscript `cat info | grep myline`
This works well: the script gets "myline" as its first argument. But now I want to add a "w" at the end of it. I tried:
bash myscript `cat info | grep myline`w
This is already problematic: the script gets "wyline" as its first argument.
The next step is that I actually want a condition that decides whether to add the w or not. I tried this:
bash myscript `cat info | grep myline``[ "condition" == "condition"] && echo "w"`
This behaves the same way: the script gets "wyline" as its first argument.
So I have two questions:
1) How to fix the "wyline" result to get desired "mylinew"
2) Is there a better way to write this if statement after cat?
Do not use backticks (`...`); use $(...) instead (see the Bash Hackers wiki page on obsolete and deprecated syntax).
cat file | grep is a useless use of cat (see the "Useless Use of Cat Award"). Just run grep on the file directly.
Just quote the result and add w:
myscript "$(grep myline info)w"
You can add a trailing w to the last line of input with sed:
myscript "$(grep myline info | sed '$s/$/w/')"
I would advise always quoting your variable expansions.
Script gets "wyline" as first argument.
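Your input file has DOS line endings. Each line ends in a carriage return (\r), which moves the cursor back to the start of the line when printed, so the w is displayed over the m. Inspect the output with cat -v, hexdump -C or xxd. Use dos2unix to remove the carriage return characters.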
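A minimal sketch combining both fixes: strip the carriage return, then add the w only when a condition holds. The helper name info_arg and its yes/no second argument are hypothetical stand-ins for the asker's condition:

```shell
#!/bin/bash
# info_arg FILE ADD_W  (hypothetical helper)
# Prints the line(s) matching "myline" from FILE without DOS carriage
# returns, appending "w" when ADD_W is "yes".
info_arg() {
    local line
    # tr -d '\r' removes the CR that made the terminal show "wyline"
    line=$(grep myline "$1" | tr -d '\r')
    if [ "$2" = yes ]; then
        line="${line}w"
    fi
    printf '%s\n' "$line"
}
```

Usage: bash myscript "$(info_arg info yes)". Keeping the condition in a plain if inside the function avoids gluing a second command substitution onto the first.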

Print lines where the third character is a digit

For example, our bash script is named masodik and there is a file text.txt with these lines:
qwer
qw2qw
12345
qwert432
Then I run ./masodik text.txt and I should get:
qw2qw
12345
I tried it many ways and I don't know why this is not working:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
$ grep -E '^.{2}[0-9]' text.txt
qw2qw
12345
And in a script it could be something like:
#!/bin/sh
grep -E '^.{2}[0-9]' "$1"
To print lines whose third character is a digit:
grep '^..[0-9]' text.txt
^ matches the start of the line. The dot . matches any character. [0-9] matches any digit.
You can do it with awk quite easily as well:
awk '/^..[0-9]/' file
Result
With your input in file:
$ awk '/^..[0-9]/' file
qw2qw
12345
(sed works as well: sed -n '/^..[0-9]/p' file)
The problem with the code here:
#!/bin/bash
for i in read u ; do
echo $i $u | grep '^[a-zA-Z0-9][a-zA-Z0-9][0-9]'
done
...is that the for syntax is wrong:
read u is treated as a word list, so the u variable is never set and $u stays empty.
The for loop will run twice -- the 1st time $i will be set to the string "read", the 2nd time $i will be set to the string "u". Since neither string contains a number, the grep returns nothing.
The code never reads text.txt.
See Sasha Khapyorsky's answer for actual working code.
If for some odd reason all external utilities (grep, awk, etc.) are forbidden, this pure POSIX shell code would work:
#!/bin/sh
# Read the file given as the first argument, one line at a time
while IFS= read -r u ; do
case "$u" in
[a-zA-Z0-9][a-zA-Z0-9][0-9]*) printf '%s\n' "$u" ;;
esac
done < "$1"
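To see the word-list behaviour described above, here is a small sketch: broken_loop mirrors the question's for loop, and third_digit_lines (a hypothetical name) is a working read loop that applies the same ^..[0-9] idea as a case pattern:

```shell
#!/bin/bash
# broken_loop mirrors the question: "read" and "u" are only words in the
# for loop's list, so nothing is ever read from a file.
broken_loop() {
    for i in read u; do
        printf '%s\n' "$i"
    done
}

# third_digit_lines FILE  (hypothetical helper)
# Reads FILE line by line and prints lines whose third character is a digit.
third_digit_lines() {
    local line
    # IFS= and -r keep each line exactly as written;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ??[0-9]*) printf '%s\n' "$line" ;;   # same idea as ^..[0-9]
        esac
    done < "$1"
}
```

Running broken_loop prints read and then u, which is exactly why the original grep never sees the file's contents.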
If perl is installed on the system, the shell script could look like this (-n makes perl loop over the input lines):
#!/bin/bash
perl -ne 'print if /^.{2}\d/' "$1"

Extract specific string from line with standard grep, egrep or awk

I'm trying to extract a specific string from grep output.
uci show minidlna
produces a long list:
.
.
.
minidlna.config.enabled='1'
minidlna.config.db_dir='/mnt/sda1/usb/db'
minidlna.config.enable_tivo='1'
minidlna.config.wide_links='1'
.
.
.
So I tried to narrow down what I wanted by running:
uci show minidlna | grep -oE '\bdb_dir=\S+'
This narrows the output to:
db_dir='/mnt/sda1/usb/db'
What I want is to output only:
/mnt/sda1/usb/db
without the quotes and without the leading "db_dir=", so I can run rm /mnt/sda1/usb/db/file.db.
I've used the answers found here:
How to extract string following a pattern with grep, regex or perl
and that's as close as I got.
EDIT: After using Ed Morton's awk command I needed to pass the output to the rm command. I used:
| ( read -r DB; rm "$DB/files.db" )
read -r DB reads the output into the variable DB.
( ... ) groups the commands into one subshell, so DB is still set when rm runs.
rm "$DB/files.db" deletes the file files.db.
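Why the read and the rm have to be grouped: in bash, each part of a pipeline runs in its own subshell (unless the lastpipe option is enabled), so a variable set by read is lost as soon as that part finishes. A small sketch, with hypothetical function names and printf standing in for rm so nothing is deleted:

```shell
#!/bin/bash
# ungrouped: read runs in a pipeline subshell, so DB is empty afterwards.
ungrouped() {
    local DB=
    printf '%s\n' /mnt/sda1/usb/db | read -r DB
    printf 'DB=[%s]\n' "$DB"
}

# grouped: read and the command using DB share one subshell.
# printf stands in for rm so this sketch deletes nothing.
grouped() {
    printf '%s\n' /mnt/sda1/usb/db | ( read -r DB; printf 'DB=[%s]\n' "$DB" )
}
```

In default bash, ungrouped prints DB=[] while grouped prints DB=[/mnt/sda1/usb/db].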
Is this what you're trying to do?
$ awk -F"'" '/db_dir/{print $2}' file
/mnt/sda1/usb/db
That will work in any awk in any shell on every UNIX box.
If that's not what you want then edit your question to clarify your requirements and post more truly representative sample input/output.
Using sed with some effort to avoid single quotes:
sed -n 's/^minidlna.config.db_dir=\s*\S\(\S*\)\S\s*$/\1/p' input
So you end up with a string like db_dir='/mnt/sda1/usb/db'.
I would first remove the quotes by piping this to
.... | tr -d "'"
Now you end up with a string like db_dir=/mnt/sda1/usb/db.
Say you have this string stored in a variable named confstr, then
${confstr##*=}
gives you just /mnt/sda1/usb/db: ## removes the longest prefix matching the pattern *=, i.e. everything from the start up to and including the last equals sign.
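A minimal sketch of the two steps above wrapped in a function (strip_key is a hypothetical name). Because ## removes the longest matching prefix, this also works on the full minidlna.config.db_dir=... line:

```shell
#!/bin/bash
# strip_key STRING  (hypothetical helper)
# Removes single quotes, then everything up to and including the last "=".
strip_key() {
    local confstr
    confstr=$(printf '%s\n' "$1" | tr -d "'")
    printf '%s\n' "${confstr##*=}"
}
```

For example, strip_key "$(uci show minidlna | grep db_dir)". If the value itself could contain an =, a single # (shortest prefix) would be the safer choice.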
I would do this:
Once you have extracted the line above into file.txt (or while piping it into this command), split the fields on the quote character. Use printf to generate the rm command and pipe it into bash to execute it.
$ awk -F"'" '{printf "rm %s/file.db\n", $2}' file.txt | bash
rm: /mnt/sda1/usb/db/file.db: No such file or directory
With your original command:
$ uci show minidlna | grep -oE '\bdb_dir=\S+' | \
awk -F"'" '{printf "rm %s/file.db\n", $2}' | bash

How to decode \u003d escape in bash?

I have some strings like:
dimension\u003d1920x1024:format\u003djpg
These strings are in a file. I want to decode them so they look like:
dimension=1920x1024:format=jpg
I know that:
$ echo -e dimension\u003d1920x1024:format\u003djpg
dimensionu003d1920x1024:formatu003djpg
$ echo -e 'dimension\u003d1920x1024:format\u003djpg'
dimension=1920x1024:format=jpg
$ echo -e "dimension\u003d1920x1024:format\u003djpg"
dimension=1920x1024:format=jpg
So I tried this to get what I want:
$ cat file | xargs -L1 echo -e
dimensionu003d1920x1024:formatu003djpg
But as you can see it doesn't work. How can I get this to work? How can I make xargs pass parameters to echo as if they were quoted?
You are actually asking how to convert the sequence \uXXXX into the corresponding Unicode code point. That's quite different from other backslash escapes, or handling backslashes in general. Neither echo -e nor xargs is particularly suited for this task.
Here is one way:
perl -CSD -pe 's/\\u([[:xdigit:]]{4})/chr(oct("0x$1"))/ge' <<<"string"
Obscurely, oct("0xff") actually performs hex decoding, because of the "0x" prefix.
Obviously, if your input is the text in a file rather than just a string in the shell, simply pass that as the argument to Perl.
For small files:
Bash:
cat file | echo -e "$(cat -)"
Zsh:
cat file | { echo -e "$(cat -)"; }
For large files in both bash and zsh:
while IFS= read -r LINE; do echo -e "$LINE"; done < file
(IFS= keeps the spaces at the beginning of each line, and -r keeps the backslashes for echo -e)
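On the xargs part of the question: without -0, xargs treats backslashes in its input as escape characters, which is why \u003d reached echo as u003d. Reading the lines in the shell avoids xargs altogether. A sketch of that approach as a function, assuming bash 4.2 or newer (whose echo -e and printf %b understand \uHHHH); note that %b also expands every other backslash escape (\n, \t, ...), unlike the Perl version. decode_u_escapes is a hypothetical name:

```shell
#!/bin/bash
# decode_u_escapes  (hypothetical helper): reads stdin, writes stdout.
# Assumes bash >= 4.2, where printf %b understands \uHHHH.
decode_u_escapes() {
    local line
    # IFS= keeps leading blanks, -r keeps the backslashes for %b;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%b\n' "$line"
    done
}
```

Usage: decode_u_escapes < file.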
Here is an attempt with Ruby, where the changes are written back to the file:
$ cat ./file
dimension\u003d1920x1024:format\u003djpg
dimension=800x600:format\u003djpg
(The example file above is made a bit more realistic: its second line already has one decoded =.)
$ cat ./script.rb
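#!/usr/bin/ruby
# Usage: script.rb <filename>; the file is rewritten in place
filename = ARGV[0]
unless filename && File.file?(filename)
  abort "No file with name #{filename} present, Usage script <filename>"
end
contents = File.read(filename)
# Replace \uXXXX (or \u{XXXX}) with the corresponding UTF-8 character
File.write(filename, contents.gsub(/\\[uU]\{?([0-9A-F]{4})\}?/i) { $1.hex.chr(Encoding::UTF_8) })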
$ ./script.rb file
# The changes are written to the file with nothing printed to stdout
$ cat ./file
dimension=1920x1024:format=jpg
dimension=800x600:format=jpg

Use sed substitution from different files

Okay, I am a newbie to Unix scripting. I was given the task of finding a temporary workaround for this:
cat /directory/filename1.xml |sed -e "s/ABCXYZ/${c}/g" > /directory/filename2.xml
$c is a variable from a sqlplus count query. I understand how this sed command works, but here is where I am stuck: I am storing the count associated with the variable in another file called filename3 as count[$c], where $c is replaced with a number. So my question is: how can I update this sed command to substitute ABCXYZ with the count from filename3?
UPDATE: In case anyone has a similar issue I got mine to work using:
rm /directory/folder/variablefilename.dat
echo "$c" >> /directory/folder/variablefilename.dat
d=$(grep '[0-9]' /directory/folder/variablefilename.dat)
sed -e "s/ABC123/${d}/g" /directory/folder/inputfile.xml >> /directory/folder/outputfile.xml
Thank you to Kaz for pointing me in the right direction.
Store the count in filename3 using the syntax c=number. Then you can source the file as a shell script:
. /filename3 # get c variable
sed -e "s/ABCXYZ/${c}/g" /directory/filename1.xml > /directory/filename2.xml
If you can't change the format of filename3, you can write a shell function which scrapes the number out of that file and sets the c variable. Or you can scrape the number out with an external program like grep, and then interpolate its output into a variable assignment using command substitution: $(command arg ...) syntax.
Suppose we can rely on filename3 to contain exactly one line of the form count[42]. Then we can just extract the digits with grep -o:
c=$(grep -E -o '[0-9]+' filename3)
sed -e "s/ABCXYZ/$c/g" /directory/filename1.xml > /directory/filename2.xml
The c variable can be eliminated, of course; you can stick the $(grep ...) into the sed command line in place of $c.
A file which contains numerous instances of syntax like count[42] for various variables could be transformed into a set of shell variable assignments using sed, and then sourced into the current shell to make those assignments happen:
$ sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]\+\)\[\(.*\)\]/\1=\2/p' filename3 > vars.sh
$ . ./vars.sh
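A minimal sketch of that whole flow as one function: convert count[42]-style lines into assignments, source them, then run the substitution. substitute_count and its temporary file are hypothetical; I relaxed \+ to * so the expression stays POSIX (one-letter names also match). As with any sourced file, only do this with trusted input:

```shell
#!/bin/bash
# substitute_count COUNTFILE XMLFILE  (hypothetical helper)
# COUNTFILE holds lines like count[42]; ABCXYZ in XMLFILE becomes 42.
substitute_count() {
    local countfile=$1 xmlfile=$2 vars
    vars=$(mktemp) || return 1
    # count[42]  ->  count=42
    sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]*\)\[\(.*\)]/\1=\2/p' "$countfile" > "$vars"
    . "$vars"          # defines count (and any other NAME[...] lines)
    rm -f "$vars"
    sed -e "s/ABCXYZ/${count}/g" "$xmlfile"
}
```

Redirect its output to the target file, e.g. substitute_count filename3 /directory/filename1.xml > /directory/filename2.xml.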
You can use sed like this:
sed -r "s/ABCXYZ/$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' path_to_file)/g" path_to_file
The expression is double-quoted, which lets the shell run the command substitution below first: it finds the number inside count[...] in the file and uses it as the replacement. The + must sit inside the parentheses; otherwise only the last digit is captured.
$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' path_to_file)
