I have a large file that contains 2 IPs per line - and there's about 3 million lines total.
Here's an example of the file:
1.32.0.0,1.32.255.255
5.72.0.0,5.75.255.255
5.180.0.0,5.183.255.255
222.127.228.22,222.127.228.23
222.127.228.24,222.127.228.24
I need to convert each IP to an IP Decimal, like this:
18874368,18939903
88604672,88866815
95682560,95944703
3732923414,3732923415
3732923416,3732923416
I'd prefer a way to do this strictly via command line. I'm okay with perl or python being used, as long as it doesn't require extra modules to be installed.
I thought I had come across a way that someone converted IPs like this using sed but can't seem to find that tutorial anymore. Any help would be appreciated.
If you have GNU awk installed (for the RT variable), you could use this one-liner:
awk -F. -v RS='[\n,]' '{printf "%d%s", (($1*256+$2)*256+$3)*256+$4, RT}' file
18874368,18939903
88604672,88866815
95682560,95944703
3732923414,3732923415
3732923416,3732923416
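Since the question allows Python without extra modules, the same octet arithmetic can be sketched there too. This is only a rough sketch, assuming well-formed dotted-quad input; the names ip_to_int and convert_line are made up for illustration and are not part of the answer above.

```python
def ip_to_int(ip):
    """Fold the four octets into one integer: ((a*256 + b)*256 + c)*256 + d."""
    value = 0
    for octet in ip.split("."):
        value = value * 256 + int(octet)
    return value


def convert_line(line):
    """Convert every comma-separated IP on one line (hypothetical helper)."""
    return ",".join(str(ip_to_int(ip)) for ip in line.strip().split(","))


# Example with two lines from the question
for sample in ("1.32.0.0,1.32.255.255", "222.127.228.22,222.127.228.23"):
    print(convert_line(sample))
```

For the real 3-million-line file you would loop over the file object line by line instead of the sample tuple, so nothing has to be held in memory.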
Here is a Python solution that uses only standard modules (re, sys):
import re
import sys
def multiplier_generator():
    """Cyclic generator of powers of 256 (from 256**3 down to 256**0).
    The multipliers tuple could be replaced by an inline calculation
    of the power, but this approach performs better.
    """
    multipliers = (
        256**3,
        256**2,
        256**1,
        256**0,
    )
    idx = 0
    while True:
        yield multipliers[idx]
        idx = (idx + 1) % 4
def replacer(match_object):
    """re.sub replacer for an IP group"""
    multiplier = multiplier_generator()
    res = 0
    for i in range(1, 5):
        res += next(multiplier) * int(match_object.group(i))
    return str(res)
if __name__ == "__main__":
    std_in = ""
    if len(sys.argv) > 1:
        with open(sys.argv[1], 'r') as f:
            std_in = f.read()
    else:
        std_in = sys.stdin.read()
    print(re.sub(r"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)", replacer, std_in))
This solution replaces every IP address found in the text read from standard input or from the file passed as the first parameter, e.g.:
python convert.py < input_file.txt, or
python convert.py file.txt, or
echo "1.2.3.4, 5.6.7.8" | python convert.py.
With bash:
ip2dec() {
set -- ${1//./ } # split $1 with "." to $1 $2 $3 $4
declare -i dec # set integer attribute
dec=$1*256*256*256+$2*256*256+$3*256+$4
echo -n $dec
}
while IFS=, read -r a b; do ip2dec $a; echo -n ,; ip2dec $b; echo; done < file
Output:
18874368,18939903
88604672,88866815
95682560,95944703
3732923414,3732923415
3732923416,3732923416
With bash, using bit shifts instead of multiplications:
ip2dec() { local IFS=.
set -- $1 # split $1 with "." to $1 $2 $3 $4
printf '%s' "$(( ($1<<24) + ($2<<16) + ($3<<8) + $4 ))"
}
while IFS=, read -r a b; do
printf '%s,%s\n' "$(ip2dec $a)" "$(ip2dec $b)"
done < file
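One thing to watch with the shift approach: in bash arithmetic, as in C and Python, + binds tighter than <<, so each shifted term needs its own parentheses. A small Python sketch of the same computation shows the difference (the function name ip_to_int_shift is hypothetical, and well-formed input is assumed):

```python
def ip_to_int_shift(ip):
    """Combine the four octets with bit shifts instead of multiplications."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    # Each shift is parenthesized on purpose: a << 24 + b would be parsed
    # as a << (24 + b), which is a completely different number.
    return (a << 24) + (b << 16) + (c << 8) + d


print(ip_to_int_shift("5.72.0.0"))
```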
I am trying to write a util function in a bash script that can take a multi-line string and append it to the supplied file if it does not already exist.
This works fine using grep if the pattern does not contain \n.
if grep -qF "$1" $2
then
return 1
else
echo "$1" >> $2
fi
Example usage
append 'sometext\nthat spans\n\tmutliple lines' ~/textfile.txt
I am on MacOS btw which has presented some problems with some of the solutions I've seen posted elsewhere being very linux specific. I'd also like to avoid installing any other tools to achieve this if possible.
Many thanks
If the files are small enough to slurp into a Bash variable (you should be OK up to a megabyte or so on a modern system), and don't contain NUL (ASCII 0) characters, then this should work:
IFS= read -r -d '' contents <"$2"
if [[ "$contents" == *"$1"* ]]; then
return 1
else
printf '%s\n' "$1" >>"$2"
fi
In practice, the speed of Bash's built-in pattern matching might be more of a limitation than ability to slurp the file contents.
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I replaced echo with printf.
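If a scripting language is acceptable (recent macOS releases may not ship Python by default, so treat that as an assumption), the same slurp-and-substring idea can be sketched in Python. The function name append_if_missing is invented for this example, and treating a missing file as empty is my own choice rather than something from the question.

```python
def append_if_missing(path, text):
    """Append text plus a newline to path unless text already occurs in it.

    Returns True when the file was modified, False otherwise.
    """
    try:
        with open(path, "r") as f:
            contents = f.read()
    except FileNotFoundError:
        contents = ""  # assumption: a missing file counts as empty
    if text in contents:
        return False
    with open(path, "a") as f:
        f.write(text + "\n")
    return True
```

The substring test works across newlines, so a multi-line block is matched as a whole, just like the [[ ... == *"$1"* ]] test above.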
Using awk:
awk '
BEGIN {
n = 0 # length of pattern in lines
m = 0 # number of matching lines
}
NR == FNR {
pat[n++] = $0
next
}
{
if ($0 == pat[m])
m++
else if (m > 0 && $0 == pat[0])
m = 1
else
m = 0
}
m == n {
exit
}
END {
if (m < n) {
for (i = 0; i < n; i++)
print pat[i] >>FILENAME
}
}
' - "$2" <<EOF
$1
EOF
If necessary, one would need to properly escape any metacharacters inside FS and OFS:
jot 7 9 |
{m,g,n}awk 'BEGIN { FS = OFS = "11\n12\n13\n"
_^= RS = (ORS = "") "^$" } _<NF || ++NF'
9
10
11
12
13
14
15
jot 7 -2 | (... awk stuff ...)
-2
-1
0
1
2
3
4
11
12
13
Reading a text file into an array, extracting elements and sorting them is taking a very long time.
The text file is ffmpeg console output for R128 audio analysis. I need to get the highest M and S values. Example:
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998 M: -22.2 S: -29.9 I: -27.0 LUFS LRA: 9.8 LU FTPK: -12.4 dBFS TPK: -9.7 dBFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998 M: -22.5 S: -28.6 I: -25.9 LUFS LRA: 11.3 LU FTPK: -12.7 dBFS TPK: -9.7 dBFS
The text file can be hundreds or thousands of lines long depending on the duration of the audio file being analysed
I want to find the highest M (-22.2) and S Values (-28.6) and assign them to variables M and S
This is what I am using currently:
ARRAY=()
while read LINE
do
ARRAY+=("$LINE")
done < $tempDir/text.txt
for LINE in "${ARRAY[@]}"
do
echo "$LINE" | sed -n '/M:/p' | sed 's/S:.*//' | sed -n -e 's/^.*M://p' | sed -n -e 's/-//p' >>/$tempDir/R128M.txt
done
for LINE in "${ARRAY[@]}"
do
echo "$LINE" | sed -n '/M:/p' | sed 's/I:.*//' | sed -n -e 's/^.*S://p' | sed -n -e 's/-//p' >>$tempDir/R128S.txt
done
cat $tempDir/R128M.txt
M=( $(sort $tempDir/R128M.txt) )
cat $tempDir/R128S.txt
S=( $(sort $tempDir/R128S.txt) )
Is there a faster way of doing this?
Rather than reading in the whole file in memory, writing bits of it out to separate file, and reading those in again, just parse it and pick out the largest values:
$ awk '$7 > m || m == "" { m = $7 } $9 > s || s == "" { s = $9 } END { print m, s }' data
-22.2 -28.6
In your data, fields 7 and 9 contain the values of M and S. The awk script updates its m and s variables whenever it finds larger values in these fields and prints the largest ones found at the end. The m == "" and s == "" tests are needed to trigger initialization of the values if no value has been read yet.
Another way with awk, which may look cleaner:
$ awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { print m, s }' data
To assign them to M and S in the shell:
$ declare $( awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { printf("M=%f S=%f\n", m, s) }' data )
$ echo $M $S
-22.200000 -28.600000
Adjust the printf() format to use %s instead of %f if you want the original strings instead of float values, or set the number of decimals you might want with, e.g., %.2f in place of %f.
First of all, three-process pipe is a bit redundant for a single value extraction, especially taking into account you reinstantiate it anew for every line.
Next, you save all the values into a file and then sort that file, while all you need is the maximum value. You can easily find it during the very first (value extraction) loop, for additional O(N) running time, instead of I/O and sorting with all the I/O overhead and O(NlogN) sorting expenses. See ARITHMETIC EXPANSION and conditional expressions in bash manual.
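The single-pass idea is language independent. As an illustration only (the question is about bash, and the regular expression and the name max_m_s are my own assumptions about the log format shown above), it could look like this in Python:

```python
import re

# Matches the "M: <value> S: <value>" pair of an ebur128 log line.
LINE_RE = re.compile(r"\bM:\s*(-?\d+(?:\.\d+)?)\s+S:\s*(-?\d+(?:\.\d+)?)")


def max_m_s(lines):
    """Return (max M, max S) in one pass, or (None, None) if nothing matched."""
    best_m = best_s = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not ebur128 measurements
        m, s = float(match.group(1)), float(match.group(2))
        if best_m is None or m > best_m:
            best_m = m
        if best_s is None or s > best_s:
            best_s = s
    return best_m, best_s
```

Passing an open file object as lines keeps memory use constant, and no temporary files or sorting are involved.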
So today, April 23rd 2015, the Internet Assigned Numbers Authority had decreed the use of port 6379 to Redis, a frabjous day indeed!
I wish to com·mem·o·rate this splendid occasion by adding the following line to my /etc/services file:
redis 6379/tcp
What would be the best way to go about it? By best I mean, of course, the following:
1. Needless to say, the new line should be inserted in its proper place (i.e.g. under the Assigned Numbers block, right after gnutella-rtr 6347/udp on my system)
2. I've considered the use of various text editors, but it feels out of place
3. Ideally, the solution should be a copy-pastable one-liner
4. I can envision the awk script that could do that but I'm looking for something more, a certain je ne sais quoi
Update re @Markus' sed proposal: I'm afraid the problem would be applying this "patch" on other systems that do not necessarily have the same /etc/services file so, expanding on point #1 above, the solution must ensure that, regardless of the specifics of the to-be-preceding service in the file, order is kept.
Update 2: a few points that seem important to state - a) while not mandatory, the solution's length (or lack of rather) is certainly an important part of its elegance (similarly for external dependencies [i.e. lack of these]); b) I/we assumed that /etc/services is sorted, but it would be interesting to see what happens when it isn't; c) assume that you have root privileges and be careful with that rm / -rf command.
A single line sort that puts it in the right place:
echo -e "redis\t\t6379/tcp" | sort -k2 -n -o /etc/services -m - /etc/services
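What sort -m does here is a merge of two inputs by the numeric part of the second field, without re-sorting either of them. A rough Python sketch of that merge follows; port_key and merge_entry are hypothetical names, and treating lines without a numeric port as key 0 is my approximation of how sort -n handles them.

```python
import heapq


def port_key(line):
    """Numeric port taken from the second field, or 0 when there is none."""
    fields = line.split()
    if len(fields) < 2:
        return 0
    digits = fields[1].split("/")[0]
    return int(digits) if digits.isdigit() else 0


def merge_entry(lines, entry):
    """Merge one new entry into lines without re-sorting the existing lines."""
    # heapq.merge accepts a key function from Python 3.5 onwards.
    return list(heapq.merge([entry], lines, key=port_key))


services = ["# comment", "gnutella-rtr\t6347/udp", "metatude-mds\t6382/udp"]
for line in merge_entry(services, "redis\t\t6379/tcp"):
    print(line)
```

Like the one-liner, this relies on the file already being ordered by port; an unsorted file would get the new line at the first point where a larger port shows up.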
Nothing in the rules against answering my own question - this one uses Redis exclusively:
cat /etc/services | redis-cli -x SET services; redis-cli --raw EVAL 'local s = redis.call("GET", KEYS[1]) local b, p, name, port, proto; p = 1 repeat b, p, name, port, proto = string.find(s, "([%a%w\-]*)%s*(%d+)/(%w+)", p) if (p and tonumber(port) > tonumber(ARGV[2])) then s = string.sub(s, 1, b-1) .. ARGV[1] .. "\t\t" .. ARGV[2] .. "/" .. ARGV[3] .. "\t\t\t# " .. ARGV[4] .. "\n" .. string.sub(s, b, string.len(s)) return s end until not(p)' 1 services redis 6379 tcp "remote dictionary server" > /etc/services
Formatted Lua code:
local s = redis.call("GET", KEYS[1])
local b, p, name, port, proto
p = 1
repeat
b, p, name, port, proto = string.find(s, "([%a%w\-]*)%s*(%d+)/(%w+)", p)
if (p and tonumber(port) > tonumber(ARGV[2])) then
s = string.sub(s, 1, b-1) .. ARGV[1] .. "\t\t" .. ARGV[2]
.. "/" .. ARGV[3] .. "\t\t\t# " .. ARGV[4] .. "\n"
.. string.sub(s, b, string.len(s))
return s
end
until not(p)
Note: a similar challenge (https://gist.github.com/jorinvo/2e43ffa981a97bc17259#gistcomment-1440996) had inspired this answer. I chose a pure Lua script approach instead of leveraging Sorted Sets... although I could :)
An idempotent awk one-liner that inserts 6379 in order would be:
awk -v inserted=0 '/^[a-z]/ { if ($2 + 0 == 6379) inserted = 1; if (!inserted && $2 + 0 > 6379) { print "redis\t\t6379/tcp"; inserted = 1 } } { print }' /etc/services > /tmp/services && mv /tmp/services /etc/services
Real men don't use sed/awk :)
TMP_SERVICES=/tmp/services.$RANDOM
while read line
do
printf %b "$line\n" >> $TMP_SERVICES
if [[ $line == *"6347/udp"* ]]; then
printf %b "redis\t\t6379/tcp\n" >> $TMP_SERVICES
fi
done<"/etc/services"
mv -fb $TMP_SERVICES /etc/services
How about a python script, Itamar?
It works on the notion of extracting the port number (called the index in the code) and if we are above 6378 but have not yet printed our Redis line, print it, then mark that sentinel true and just print all lines (including the one we are on) after.
#!/usr/bin/python
lines = open("/etc/services").readlines()
printed = False
for line in lines:
    if printed:
        # the Redis line is already in place: copy the rest verbatim
        print(line.rstrip())
        continue
    if line[0] == "#":
        print(line.rstrip())
    else:
        datafields = line.split()
        try:
            try:
                index, proto = datafields[1].split("/")
                index = int(index)
            except (IndexError, ValueError):
                # no service name on this line: the port is the first field
                index, proto = datafields[0].split("/")
                index = int(index)
            if index > 6378:
                if not printed:
                    print("redis 6379/tcp #Redis DSS")
                    printed = True
            print(line.rstrip())
        except Exception:
            print(datafields)
            raise
The relevant section on my file for comparison:
gnutella-rtr 6347/tcp # gnutella-rtr
# Serguei Osokine <osokin@paragraph.com>
# 6348-6381 Unassigned
redis 6379/tcp #Redis DSS
metatude-mds 6382/udp # Metatude Dialogue Server
metatude-mds 6382/tcp # Metatude Dialogue Server
Notice the line above Redis is a range. Short of breaking the range this is a workable solution for me. You could break the range but IMO this works just fine. Splitting the range seems a bit much for a simple, elegant script. Especially considering the likelihood most services files don't have the unassigned ranges listed (this is on OS X) - and that they are in a comment anyway.
UPDATE
If you don't care about the local file and its comments, this gets you all currently assigned ports which are not Reserved or Discard-ed:
curl -s http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.csv| awk -F',' '$4!~/(Discard|Unassigned|Reserved)/ && $1 && $2+0>0 && $1!~/FIX/ {printf "%-16s\t%s/%s\t#%s\n", $1,$2,$3,$4}' > /etc/services
The FIX test is because some of those lines have embedded newlines - which can be a pain in awk.
sed -i '/6347\/udp/a redis 6379\/tcp' /etc/services
good luck!
update
sed -i '/6347\/udp/a redis \t\t 6379\/tcp' /etc/services
looks better ...
update 2
lol
sed -i ''"$(echo $(echo $(grep -n $(awk {'print$2'} /etc/services | awk -F "/" '$1<6379'{'print$1'} | tail -1) /etc/services | awk -F ':' {'print$1'}|tail -1) + 1)|bc)"'i redis \t\t 6379\/tcp' /etc/services
I like the in-place substitution of sed, so I combine it with an awk search for the next line in services
R="redis\t\t6379/tcp\t\t\t# data structure server" N=$(awk '{if($2+0 == 6379) exit(1);if ($2+0 > 6379) {print $0;exit(0)}}' /etc/services) && sed -i "s~$N~$R\n$N~" /etc/services
I work in telecoms and regularly need to expand number ranges.
For example, 6121234567X [note that there are 10 digits preceding the X] is shorthand for:
61212345670
61212345671
61212345672....... etc (a 10 number range)
and 612123456X [note that there are only 9 digits preceding the X] is shorthand for
61212345600
61212345601....... etc (a 100 number range)
So I need a grep command that...
reads how many characters in the line precede the X (to determine how many suffixes)
writes the appropriate number of lines (10, 100, or 1000) with ascending suffixes
hopefully removes the original line
Below is a Python script that does it; the file name is the expected first argument. Example usage: python script.py file.in > file.out
#!/usr/bin/env python
import sys
def generate(pattern):
    # index of the X placeholder; full numbers are 11 digits long
    p = pattern.lower().find('x')
    if p < 0:
        return pattern  # nothing to expand
    width = 10 - p + 1
    return "\n".join(pattern[:p] + str(i).zfill(width) for i in range(10**width))
if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print("Filename needed!")
    else:
        with open(sys.argv[1]) as f:
            for ln in f:
                print(generate(ln.rstrip()))
You can do this in awk quite quickly:
awk -v val="$a" -v max=11 'BEGIN {
    gsub("X","",val)
    items=max - length(val)
    for (i=0; i<10^items; i++)
        print val*(10^items)+i
}'
This works as an example. To do the same reading from a file, you just need to work with $1 (the first field of the record) instead of val and move all the code from BEGIN into the main block.
Explanation
-v val="$a" -v max=11 pass parameters: $a is the variable containing the string in the form 12345678X and max contains the total number of digits the number will have (11 in your case).
BEGIN {} perform all these actions before (or without) reading a file.
gsub("X","",val) remove X from val.
items=max - length(val) count how many digits are left to fill once the X is removed.
for (i=0; i<10^items; i++) print val*(10^items)+i loop from 0 to 10^items - 1. This means from 0 to 9, or from 0 to 99, and so on, depending on the result of max minus the length without X.
Test
With 9 as maximum:
$ a=12345678X
$ awk -v val=$a -v max=9 'BEGIN {gsub("X","",val); items=max - length(val); for (i=0; i<10^items; i++) print val*(10^items)+i}'
123456780
123456781
123456782
123456783
123456784
123456785
123456786
123456787
123456788
123456789
echo 6121234567X | perl -nE 'm/(.*)X/;
say $1. $_ foreach (0..10**(11-length $1)-1)'
61212345670
61212345671
61212345672
61212345673
61212345674
61212345675
61212345676
61212345677
61212345678
61212345679
It's a little uglier to get the zero padded format:
echo 611234567X | perl -wne 'm/(.*)X/; $b=$1; $r=11 - length $b;
$fmt="%0" . $r . "s\n";
printf "$b$fmt", $_ foreach (0..10**$r-1) '
I am trying to use bc in an awk script. In the code below, I am trying to convert a hexadecimal number to binary and store it in a variable.
#!/bin/awk -f
{
binary_vector = $(bc <<< "ibase=16;obase=2;FF")
}
Where do I go wrong?
Not saying it's a good idea but:
$ awk 'BEGIN {
cmd = "bc <<< \"ibase=16;obase=2;FF\""
rslt = ((cmd | getline line) > 0 ? line : -1)
close(cmd)
print rslt
}'
11111111
Also see http://gnu.org/software/gawk/manual/gawk.html#Bitwise-Functions and http://gnu.org/software/gawk/manual/gawk.html#Nondecimal-Data
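For what it's worth, if the end goal is just the base conversion and the rest of the script doesn't have to be awk, a language with built-in base handling avoids the external bc call altogether. A tiny Python sketch, purely as an illustration (hex_to_binary is a made-up name):

```python
def hex_to_binary(hex_digits):
    """Convert a string of hexadecimal digits to its binary representation."""
    return format(int(hex_digits, 16), "b")


print(hex_to_binary("FF"))
```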
The following one-liner Awk script should do what you want:
awk -vVAR=$(read -p "Enter number: " -u 0 num; echo $num) \
'BEGIN{system("echo \"ibase=16;obase=2;"VAR"\"|bc");}'
Explanation:
-vVAR Passes the variable VAR into Awk
-vVAR=$(read -p ... ) Sets the variable VAR from the
shell to the user input.
system("echo ... |bc") Uses the Awk system built-in command to execute the shell commands. Notice how the quoting stops at the variable VAR and then continues just after it; that's so that Awk interprets VAR as an Awk variable and not as part of the string put into the system call.
Update - to use it in an Awk variable:
awk -vVAR=$(read -p "Enter number: " -u 0 num; echo $num) \
'BEGIN{s="echo \"ibase=16;obase=2;"VAR"\"|bc"; s | getline awk_var;\
close(s); print awk_var}'
s | getline awk_var will put the output of the command s into the Awk variable awk_var. Note the string is built before sending it to getline - if not (unless you parenthesize the string concatenation) Awk will try to send it to getline in separate pieces %s VAR %s.
The close(s) closes the pipe - although for bc it doesn't matter and Awk automatically closes pipes upon exit - if you put this into a more elaborate Awk script it is best to explicitly close the pipe. According to the Awk documentation some commands such as mail will wait on the pipe to close prior to completion.
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_39.html
By the way you wrote your example, it looks like you want to convert an awk record ( line ) into an associative array. Here's an awk executable script that allows that by running the bc command over values in a split type array:
#!/usr/bin/awk -f
{
# initialize the a array
cnt = split($0, a, FS)
if( convertArrayBase(10, 2, a, cnt) > -1 ) {
# use the array here
for(i=1; i<=cnt; i++) {
print a[i]
}
}
}
# Destructively updates input array, converting numbers from ibase to obase
#
# @ibase: ibase value for bc
# @obase: obase value for bc
# @a: a split() type associative array where keys are numeric
# @cnt: size of a ( number of fields )
#
# @return: -1 if there's a getline error, else cnt
#
function convertArrayBase(ibase, obase, a, cnt, i, b, cmd) {
cmd = sprintf("echo \"ibase=%d;obase=%d", ibase, obase)
for(i=1; i<=cnt; i++ ) {
cmd = cmd ";" a[i]
}
cmd = cmd "\" | bc"
i = 0 # reset i
while( (cmd | getline b) > 0 ) {
a[++i] = b
}
close( cmd )
return i==cnt ? cnt : -1
}
When used with an input of:
1 2 3
4 s 1234567
this script outputs the following:
1
10
11
100
0
100101101011010000111
The convertArrayBase function operates on split type arrays. So you have to initialize the input array (a here) with the full row (as shown) or a field's subfields (not shown) before calling it. It destructively updates the array.
You could instead call bc directly with some helper files to get similar output. I didn't find that bc supported - ( stdin as a file name ) so it's a little more than I'd like.
Making a start_cmds file like this:
ibase=10;obase=2;
and a quit_cmd like:
;quit
Given an input file (called data.semi) where the data is separated by a ;, like this:
1;2;3
4;s;1234567
you can run bc like:
$ bc -q start_cmds data.semi quit_cmd
1
10
11
100
0
100101101011010000111
which is the same data that the awk script is outputting, but only calling bc a single time with all of the inputs. Now, while that data isn't in an awk associative array in a script, the bc output could be written as stdin input to awk and reassembled into an array like:
bc -q start_cmds data.semi quit_cmd | awk 'FNR==NR {a[FNR]=$1; next} END { for( k in a ) print k, a[k] }' -
1 1
2 10
3 11
4 100
5 0
6 100101101011010000111
where the final dash is telling awk to treat stdin as an input file and lets you add other files later for processing.
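To make the end result concrete, here is a rough Python sketch of the same position-to-binary mapping for the data.semi contents shown above. It is not a replacement for the awk/bc approach; the function name is hypothetical, and mapping non-numeric fields to 0 is my imitation of bc evaluating the unset variable s as 0.

```python
def to_binary_by_position(text):
    """Map 1-based field position -> binary string for ;- or space-separated data."""
    result = {}
    fields = text.replace(";", " ").split()
    for position, field in enumerate(fields, start=1):
        # bc evaluates an unset variable such as "s" to 0; mimic that here
        value = int(field) if field.isdigit() else 0
        result[position] = format(value, "b")
    return result


for position, binary in to_binary_by_position("1;2;3\n4;s;1234567").items():
    print(position, binary)
```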