help on sorting a file using sort - bash

I have this file:
100: pattern1
++++++++++++++++++++
1:pattern2
9:pattern2
+++++++++++++++++++
79: pattern1
61: pattern1
+++++++++++++++++++
and I want to sort it like this:
++++++++++++++++++++
1:pattern2
9:pattern2
+++++++++++++++++++
61:pattern1
79:pattern1
100:pattern1
+++++++++++++++++++
Is it possible using the Linux sort command only?
If I had:
4:pat1
3:pat2
2:pat2
1:pat1
The output should be:
1:pat1
++++++++++++
2:pat2
3:pat2
++++++++++++
4:pat1
So, I want to sort on the first field, but group on the pattern in the second field.
Please note, the thing after the : is a regex pattern, not a literal.

The best you can do is to sort it according to the numerical values; you cannot do anything with the "+" lines.
$ sort -n input
+++++++++++++++++++
+++++++++++++++++++
++++++++++++++++++++
1:pattern2
9:pattern2
61: pattern1
79: pattern1
100: pattern1

I don't believe sort alone can do what you need.
Create a new shell script and put this in it (e.g. mysort.sh):
#!/bin/bash
IFS=$'\n' # This makes the for loop below split on newlines instead of whitespace.
delim='+++++++++++++++++++'
for l in $(grep -v '^+' | sort -g) # Ignore all + lines and sort by number
do
    current=$(echo "$l" | sed 's/^[0-9]*: *//') # Get what comes after the number
    if [ -n "$prev" ] && [ "$prev" != "$current" ] # If it has changed...
    then # ...then output a ++++ delimiter line.
        echo "$delim"
    fi
    prev=$current
    echo "$l" # Output this line.
done
To use it, pipe in the contents of your file like so:
cat input | bash mysort.sh

Probably not -- it's not in the sort of format sort(1) expects, and if you did manage it, it would be one of those amazing hacks, not easily reused. If you have some rule for what goes between the lines of plus signs, you can do it readily enough with an AWK, Perl, or Python script.
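For instance, a minimal AWK sketch, assuming the delimiters are fixed lines of plus signs and the text after the colon is grouped by literal equality:
grep -v '^+' input | sort -n | awk -F':[ ]*' '
    BEGIN { d = "++++++++++++++++++++"; print d }  # leading delimiter
    NR > 1 && $2 != prev { print d }               # pattern changed: start a new group
    { prev = $2; print }                           # remember the pattern, echo the line
    END { print d }                                # trailing delimiter
'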

If your input was space delimited, not ':' delimited:
sort -rk2 | uniq -D -f1
will do the grouping;
I guess you'd need to sort the 'subsections' later (unfortunately my sort(1) doesn't do composite key ordering; I do believe there are versions that allow you to do sort -k2,1n, and you'd be done at once).
use --all-repeated=separate instead of -D to get blank separators between groups. Look at man uniq for more ideas!
However, since your input is colon delimited, a hack is required:
sed 's/\([0-9]\+\):/\1 /' input | sort -rk2 | uniq -D -f1
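Combining that with the blank-separator option (GNU uniq assumed):
sed 's/\([0-9]\+\):/\1 /' input | sort -rk2 | uniq --all-repeated=separate -f1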
HTH

Related

How to get values in a line while looping line by line in a file (shell script)

I have a file which looks like this (file.txt)
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
I want to loop through this line by line and extract the key values, so the result should be like:
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
So I wrote code like this to loop line by line and extract the value between {"key":" and ","rule": because the key value is between these 2 patterns.
while read p; do
echo $p | sed -n "/{"key":"/,/","rule":,/p"
done < file.txt
But this is not working. Can someone help me figure this out? Thanks in advance.
Your sample input is almost valid json. You could tweak it to make it valid and then extract the values with jq with something like:
sed -e 's/squid/"squid/' -e 's/$/"}/' file.txt | jq -r .key
Or, if your actual input really is valid json, then just use jq:
jq -r .key file.txt
If the "random-txt" may include double quotes, making it difficult to massage the input to make it valid json, perhaps you want something like:
awk '{print $4}' FS='"' file.txt
or
sed -n '/{"key":"\([^"]*\).*/s//\1/p' file.txt
or
while IFS=\" read open_brace key colon val _; do echo "$val"; done < file.txt
For the shown data, you can try this awk:
awk -F '"[:,]"' '{print $2}' file
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
With the given example you can simply use
cut -d'"' -f4 file.txt
Assumptions:
there may be other lines in the file so we need to focus on just the lines with "key" and "rule"
the only text between "key" and "rule" is the desired string (eg, squid never shows up between the two patterns of interest)
Adding some additional lines:
$ cat file.txt
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
ignore this line}
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
ignore this line}
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
One sed idea:
$ sed -nE 's/^(.*"key":")([^"]*)(","rule".*)$/\2/p' file.txt
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
Where:
-E - enable extended regex support (allowing capture groups without having to escape the parentheses)
-n - suppress printing of pattern space
^(.*"key":") - [1st capture group] everything from start of line up to and including "key":"
([^"]*) - [2nd capture group] everything that is not a double quote (")
(","rule".*)$ - [3rd capture group] everything from ",rule" to end of line
\2/p - replace the line with the contents of the 2nd capture group and print

How to grep a specific pattern before match?

I'm currently working on multiple configuration files which use the following format:
[Stanza1]
action.script=1
action.ping=0
action.lookup=1
action.notable.param=0
action.script.filename=script.pl
[Stanza2]
action.script=0
action.ping=0
action.lookup=1
[Stanza3]
action.script=1
action.ping=0
action.lookup=0
action.script.filename=script.pl
I want to know which stanzas include "action.script.filename=script.pl", so the expected result would be
[Stanza1]
[Stanza3]
Using something like:
grep -B 10 "action.script.filename=script.pl" file
doesn't work for cases where the stanza name is more than 10 lines before the match, and proves quite cumbersome to use.
Any suggestions on how to do this?
The following sed command would do the trick :
sed -n '/^\[/h;/^action\.script\.filename=script\.pl$/{x;p}'
When it encounters a line that starts with "[", it stores it in its hold buffer. When it encounters an "action.script.filename=script.pl" line, it prints the content of the hold buffer.
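For example, with the sample configuration saved in a file (config.conf is a hypothetical name):
$ sed -n '/^\[/h;/^action\.script\.filename=script\.pl$/{x;p}' config.conf
[Stanza1]
[Stanza3]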
I'm not sure this can be done purely with grep. I would recommend a small bash script:
while read -r line
do
    if [[ $line =~ ^\[ ]]; then
        # save stanza for later
        stanza=$line
    fi
    if [[ $line =~ action\.script\.filename=script\.pl ]]; then
        echo "$stanza"
    fi
done < file
With awk
$ awk '/action\.script\.filename=script\.pl/{print h} /^\[/{h=$0}' ip.txt
[Stanza1]
[Stanza3]
/^\[/ matches lines starting with the [ character; you can also use something like /Stanza/ as long as it uniquely identifies header lines
h=$0 for such lines, saves the content ($0) to the variable h
/action\.script\.filename=script\.pl/ matches if the input line matches the given search criteria
print h prints the value of the h variable
If you are matching the whole line, you can also use a string match, $0 == "action.script.filename=script.pl", instead of a regex match, as sketched below.
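A sketch of that whole-line string-match variant:
awk '$0 == "action.script.filename=script.pl"{print h} /^\[/{h=$0}' ip.txt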
This line of code works for me
grep '^\[Stanza\|^action.script.filename=script.pl$' fileName | grep -B1 'action.script.filename=script.pl' | grep -v 'action.script.filename=script.pl\|\-\-'
Explanation:
grep '^\[Stanza\|^action.script.filename=script.pl$' fileName
matches either the [StanzaN] lines or the action.script.filename=script.pl ones. The output is something like this:
[Stanza1]
action.script.filename=script.pl
[Stanza2]
[Stanza3]
action.script.filename=script.pl
Adding this filter | grep -B1 'action.script.filename=script.pl' will result in this
[Stanza1]
action.script.filename=script.pl
--
[Stanza3]
action.script.filename=script.pl
Now you just need to strip the unwanted parts from the output:
| grep -v 'action.script.filename=script.pl\|\-\-'
This is the final output
[Stanza1]
[Stanza3]
awk '/^\[.*\]$/{stanza=$0;next} /action.script.filename=script.pl/{print stanza}' filename
[Stanza1]
[Stanza3]
You can store each stanza header in a variable called stanza and move to the next line. Whenever you see the string action.script.filename=script.pl, print the variable stanza.

how to iterate in a file using a keyword in bash

in some file there is some content like:
scenario1{
user_range:="1..100"
ip_low:="192.168.1.1"
ip_high:=192.168.1.100
...
}
scenario2{
user_range:="101..200"
ip_low:="192.168.2.1"
ip_high:=192.168.2.100"
...
}
...
I want to replace some values using sed -i, but I can't figure out how to iterate by the keyword "scenario" in order to change the user_ranges and IPs for the whole file.
awk to the rescue!
$ awk -v RS='\n}' 'BEGIN{OFS="\n"}
{from=250*c+1; to=250*(++c);
sub(/:=.*/,":=\""from".."to"\"",$2)}
{print $0 RT}' file
scenario1{
user_range:="1..250"
ip_low:="192.168.1.1"
ip_high:=192.168.1.100
...
}
scenario2{
user_range:="251..500"
ip_low:="192.168.2.1"
ip_high:=192.168.2.100"
...
}
IP addresses can be done similarly if there is a regular pattern; see the sketch below.
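For instance, a sketch along the same lines for the ip lines, assuming the third octet should track the scenario number (GNU awk for RT, as above; the 192.168.<n>.x scheme is an assumption read off the sample data):
$ awk -v RS='\n}' 'BEGIN{OFS="\n"}
    {n=++c;                                       # scenario counter: 1, 2, ...
     sub(/:=.*/, ":=\"192.168." n ".1\"", $3);    # $3 is the ip_low line
     sub(/:=.*/, ":=\"192.168." n ".100\"", $4)}  # $4 is the ip_high line
    {print $0 RT}' file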
If you insist on using sed, you may find it easier to convert your file to a CSV format first.
tr '\n' ',' <testfile | tr '}' '\n' | tr -d "{" |sed 's/^,*//g;s/,*$//g' >csvfile
Since this results in one scenario per line, it will be much easier to use sed.
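For the sample input above, csvfile would then look roughly like this, one scenario per line:
scenario1,user_range:="1..100",ip_low:="192.168.1.1",ip_high:=192.168.1.100,...
scenario2,user_range:="101..200",ip_low:="192.168.2.1",ip_high:=192.168.2.100",...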
It is quite easy with plain bash to separate the values. I assume that the order of the key-value pairs and the number of lines per stanza stay the same (just for demonstration purposes):
while read -r line
do
    scenario=${line//\{/}   # strip the "{" from the header line
    read -r user_range
    read -r ip_low
    read -r ip_high
    read -r rest            # the "..." line
    read -r endchar         # the closing "}"
    # here you can insert every piece of code you need
    # to change your variables
    cat <<EOF
$scenario{
$user_range
$ip_low
$ip_high
$rest
}
EOF
done < file_like_your_example > new_file

UNIX - Replacing variables in sql with matching values from .profile file

I am trying to write a shell script which will take an SQL file as input. Example SQL file:
SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'
Now the script should extract all variables, which in this case is everything starting with %%. So the output file will be something as below:
%%DB
%%TBLEXT
%%CITY
Now I should be able to extract the matching values from the user's .profile file for these variables and create the SQL file with the proper values.
SELECT *
FROM tempdb.TBL_abc
WHERE CITY = 'Chicago'
As of now I am trying to generate file1, which will contain all the variables. Below is a code sample:
sed "s/[(),']//g" "T:/work/shell/sqlfile1.sql" | awk '/%%/{print $NF}' | awk '/%%/{print $NF}' > sqltemp2.sql
which gets me as far as:
%%DB.TBL_%%TBLEXT
%%CITY
Can someone help me get to file1 listing the variables?
You can use grep and sort to get a list of unique variables, as per the following transcript:
$ echo "SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'" | grep -o '%%[A-Za-z0-9_]*' | sort -u
%%CITY
%%DB
%%TBLEXT
The -o flag to grep instructs it to only print the matching parts of lines rather than the entire line, and also outputs each matching part on a distinct line. Then sort -u just makes sure there are no duplicates.
In terms of the full process, here's a slight modification to a bash script I've used for similar purposes:
# Define all translations.
declare -A xlat
xlat['%%DB']='tempdb'
xlat['%%TBLEXT']='abc'
xlat['%%CITY']='Chicago'
# Check all variables in input file.
okay=1
for key in $(grep -o '%%[A-Za-z0-9_]*' input.sql | sort -u) ; do
    if [[ "${xlat[$key]}" == "" ]] ; then
        echo "Bad key ($key) in file:"
        grep -n "${key}" input.sql | sed 's/^/    /'
        okay=0
    fi
done
if [[ ${okay} -eq 0 ]] ; then
    exit 1
fi
# Process input file doing substitutions. Fairly
# primitive use of sed, must change to use sed -i
# at some point.
# Note we sort keys based on descending length so we
# correctly handle extensions like "NAME" and "NAMESPACE";
# doing the longer ones first makes it work properly.
cp input.sql output.sql
for key in $( (
    for key in "${!xlat[@]}" ; do
        echo "${key}"
    done
) | awk '{print length($0)":"$0}' | sort -rnu | cut -d':' -f2) ; do
    sed "s/${key}/${xlat[$key]}/g" output.sql > output2.sql
    mv output2.sql output.sql
done
cat output.sql
It first checks that the input file doesn't contain any keys not found in the translation array. Then it applies sed substitutions to the input file, one per translation, to ensure all keys are substituted with their respective values.
This should be a good start, though there may be some edge cases such as if your keys or values contain characters sed would consider important (like / for example). If that is the case, you'll probably need to escape them such as changing:
xlat['%%UNDEFINED']='0/0'
into:
xlat['%%UNDEFINED']='0\/0'
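If you'd rather not hand-edit each value, here's a small sketch of automating the escaping with bash parameter expansion (it only covers the / and & characters that are special in a sed replacement):
safe=${xlat[$key]//\//\\/}    # escape every / as \/
safe=${safe//&/\\&}           # escape every & as \&
sed "s/${key}/${safe}/g" output.sql > output2.sql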

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note how the last 6 characters, d8 40 32, are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can make it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to the empty string, so that every character becomes its own field (a GNU awk extension). OFS="/" sets the output field separator to /, for print purposes.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
Having GNU sed you can even simplify the pattern using the -r option; then you won't need to escape the {} and () any more. Using ~ as the regex delimiter allows you to use the path separator / without needing to escape it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Explained simply, the pattern matches:
(all (n-5 n-4) (n-3 n-2) (n-1 n))
and replaces it with:
\2/\3/\4/\1
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
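For example, the regex captures the last three two-character pairs of the hash:
$ hash=13febd65d65112badd0aa90a15d84032
$ [[ $hash =~ (..)(..)(..)$ ]] && echo "${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
d8/40/32/13febd65d65112badd0aa90a15d84032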
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
Assuming the input is a correct MD5 string (and nothing else).
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting in the meantime, and came up with this solution:
Run this script with a parameter of the URL you're looking for (www.example.com/article/76232?q=hello for example)
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=${md5:0-2:2}   # last two hex characters
p2=${md5:0-4:2}   # the two before those
p1=${md5:0-6:2}   # and the two before those
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.
