Shell: In-file converting first field of a text file from decimal to hexadecimal - bash

This is my example text file:
$ cat RealVNC\ MRU.reg
"10"="Lamborghini-:1"
"16"="Terminus-"
"20"="Midnighter-:5915"
"35"="ThreepWood-:1"
"81"="Midnighter-:1"
"58"="Midnighter-"
And I would like to convert the values of the first field (the numbers between "") from decimal to hexadecimal. It is a .reg file for Windows: I messed it up by assuming the numbers were decimal, and the file is now too long to edit by hand.
Example result that I need to obtain:
$ cat Hex\ RealVNC\ MRU.reg
"0A"="Lamborghini-:1"
"10"="Terminus-"
"14"="Midnighter-:5915"
"23"="ThreepWood-:1"
"51"="Midnighter-:1"
"3A"="Midnighter-"
As can be seen, only the numbers have changed.
Resulting numbers must be two characters long (RegEdit considers them different).
The order of the lines doesn't matter here, but a solution that preserves it would be cleaner.
I don't expect any number (decimal or hex) to be more than 2 characters long, but a solution that handles longer values would be best, as it is more generic.
I have tested so far:
$ cat RealVNC\ MRU.reg | awk -F \" '{print $2}'
10
16
20
35
81
58
But I don't know how to make the decimal-to-hex changes in-line.
My shell is usually Bash, but solutions in related scripting tools (like Perl or Python) are welcome too.

A simple awk:
awk -F\" '$2=sprintf("%02X", $2)' OFS=\" file
"0A"="Lamborghini-:1"
"10"="Terminus-"
"14"="Midnighter-:5915"
"23"="ThreepWood-:1"
"51"="Midnighter-:1"
"3A"="Midnighter-"
Explanation
-F\" : sets field separator (FS) to "
$2=sprintf("%02X", $2) : $2 is replaced by its formatted
version (sprintf) using the %02X mask: hexadecimal with the
letters 'A' to 'F' for hex digits greater than 9, at least two digits
wide, padded with 0
OFS=\" : sets the output field separator (OFS) to match FS
The $2 assignment is always true (sprintf returns a non-empty string) and no action is given, so awk prints every modified line as its default action.
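The question asks for an in-file change, and real .reg files usually also contain lines without a decimal key (a header, section names). A minimal sketch, assuming a POSIX awk and mktemp; the function name dec2hex_inplace is hypothetical, and the guard on $2 is an addition that is not part of the one-liner above:

```shell
# Hypothetical helper: rewrite FILE so that a purely decimal first quoted
# field becomes uppercase hex, at least two digits wide; other lines are kept.
dec2hex_inplace() {
    file=$1
    tmp=$(mktemp) || return 1
    # Write to a temp file first: redirecting straight onto "$file" would
    # truncate it before awk has read it.
    awk -F'"' 'BEGIN { OFS = FS }
        NF > 2 && $2 ~ /^[0-9]+$/ { $2 = sprintf("%02X", $2) }
        { print }' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage would be dec2hex_inplace "RealVNC MRU.reg". Copying the result back with cat keeps the original file's permissions, which a plain mv of the temp file would not.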

You can use perl: with the e flag, the substitution operator evaluates the replacement as code, so a function can compute the replacement value:
echo abc | perl -ne 's/(.+)/uc($1)/e; print' # ABC
You can then use sprintf function to convert decimal to hex with the %X conversion:
%x an unsigned integer, in hexadecimal
%X like %x, but using upper-case letters
$ cat RealVNC\ MRU.reg | perl -ne 's/^"(.*?)"/sprintf("\"%X\"", $1)/e;print;'
"A"="Lamborghini-:1"
"10"="Terminus-"
"14"="Midnighter-:5915"
"23"="ThreepWood-:1"
"51"="Midnighter-:1"
"3A"="Midnighter-"
If you want a leading zero on single-digit values (0-F), use the %02X format:
%02X
 ^^^
 ||+-- Conversion
 |+--- Result length
 +---- Prefix char
And the result:
$ cat RealVNC\ MRU.reg | perl -ne 's/^"(.*?)"/sprintf("\"%02X\"", $1)/e;print;'
"0A"="Lamborghini-:1"
"10"="Terminus-"
"14"="Midnighter-:5915"
"23"="ThreepWood-:1"
"51"="Midnighter-:1"
"3A"="Midnighter-"

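The width in %02X is a minimum, not a maximum, so values above 255 are not truncated. A quick check with the shell's own printf, which accepts the same conversion (the helper name to_hex2 is hypothetical):

```shell
# Hypothetical helper: print a decimal number as uppercase hex,
# zero-padded to at least two digits.
to_hex2() {
    printf '%02X\n' "$1"
}

to_hex2 10     # 0A
to_hex2 58     # 3A
to_hex2 4096   # 1000
```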
Related

How to replace text in file between known start and stop positions with a command line utility like sed or awk?

I have been tinkering with this for a while but can't quite figure it out. A sample line within the file looks like this:
"...~236 characters of data...Y YYY. Y...many more characters of data"
How would I use sed or awk to replace spaces with a B character only between positions 236 and 246? In that example string it starts at character 29 and ends at character 39 within the string. I would want to preserve all the text preceding and following the target chunk of data within the line.
For clarification based on the comments, it should be applied to all lines in the file and expected output would be:
"...~236 characters of data...YBBYYY.BBY...many more characters of data"
With GNU awk:
$ awk -v FIELDWIDTHS='29 10 *' -v OFS= '{gsub(/ /, "B", $2)} 1' ip.txt
...~236 characters of data...YBBYYY.BBY...many more characters of data
FIELDWIDTHS='29 10 *' means 29 characters for first field, next 10 characters for second field and the rest for third field. OFS is set to empty, otherwise you'll get space added between the fields.
With perl:
$ perl -pe 's/^.{29}\K.{10}/$&=~tr| |B|r/e' ip.txt
...~236 characters of data...YBBYYY.BBY...many more characters of data
^.{29}\K match and ignore first 29 characters
.{10} match 10 characters
e flag to allow Perl code instead of string in replacement section
$&=~tr| |B|r convert space to B for the matched portion
Use this Perl one-liner with substr and tr. Note that this uses the fact that you can assign to substr, which changes the original string:
perl -lpe 'BEGIN { $from = 29; $to = 39; } (substr $_, ( $from - 1 ), ( $to - $from + 1 ) ) =~ tr/ /B/;' in_file > out_file
To change the file in-place, use:
perl -i.bak -lpe 'BEGIN { $from = 29; $to = 39; } (substr $_, ( $from - 1 ), ( $to - $from + 1 ) ) =~ tr/ /B/;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
I would use GNU AWK in the following way. For simplicity's sake, say file.txt contains
S o m e s t r i n g
and we want to change the spaces from position 5 (inclusive) to position 10 (inclusive); then
awk 'BEGIN{FPAT=".";OFS=""}{for(i=5;i<=10;i+=1)$i=($i==" "?"B":$i);print}' file.txt
output is
S o mBeBsBt r i n g
Explanation: I set the field pattern (FPAT) to any single character and the output field separator (OFS) to an empty string, so every field holds a single character and no superfluous spaces appear when printing. A for loop visits the desired fields: each one that is a space is assigned B, the others keep their original value. Finally the whole changed line is printed.
Using GNU awk:
awk -v strt=29 -v end=39 '{ ram=substr($0,strt,(end-strt+1));gsub(" ","B",ram);print substr($0,1,(strt-1)) ram substr($0,(end+1)) }' file
Explanation:
awk -v strt=29 -v end=39 '{ # Pass the start and end character positions as strt and end respectively
ram=substr($0,strt,(end-strt+1)); # Extract the 29th to the 39th characters (inclusive) of the line into variable ram
gsub(" ","B",ram); # Replace spaces with B in ram
print substr($0,1,(strt-1)) ram substr($0,(end+1)) # Rebuild the line incorporating ram and print the result
}' file
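The substr approach can be wrapped in a small function and checked on a short string. This is a sketch assuming a POSIX awk; the name space_to_b and its two positional arguments (1-based, inclusive columns) are hypothetical:

```shell
# Hypothetical helper: replace spaces with B between columns $1 and $2
# (1-based, inclusive) on every line read from stdin.
space_to_b() {
    awk -v from="$1" -v to="$2" '{
        head = substr($0, 1, from - 1)          # untouched text before the window
        mid  = substr($0, from, to - from + 1)  # the window itself
        tail = substr($0, to + 1)               # untouched text after the window
        gsub(/ /, "B", mid)
        print head mid tail
    }'
}

printf '%s\n' 'S o m e s t r i n g' | space_to_b 5 10   # S o mBeBsBt r i n g
```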
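This is certainly a suitable task for perl, and it saddens me that my perl has become so rusty that this is the best I can come up with at the moment. Note that it reads the input one character at a time, so $. counts characters from the start of the input and only the first line is handled: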
perl -e 'local $/=\1;while(<>) { s/ /B/ if $. >= 236 && $. <= 246; print }' input;
Another awk but using FS="":
$ awk 'BEGIN{FS=OFS=""}{for(i=29;i<=39;i++)sub(/ /,"B",$i)}1' file
Output:
"...~236 characters of data...YBBYYY.BBY...many more characters of data"
Explained:
$ awk ' # yes awk yes
BEGIN {
FS=OFS="" # set empty field delimiters
}
{
for(i=29;i<=39;i++) # between desired indexes
sub(/ /,"B",$i) # replace space with B
# if($i==" ") # couldve taken this route, too
# $i="B"
}1' file # implicit output
With sed :
sed '
H
s/\(.\{236\}\)\(.\{11\}\).*/\2/
s/ /B/g
H
g
s/\n//g
s/\(.\{236\}\)\(.\{11\}\)\(.*\)\(.\{11\}\)/\1\4\3/
x
s/.*//
x' infile
When you have an input string without \r, you can use:
sed -r 's/(.{236})(.{10})(.*)/\1\r\2\r\3/;:a;s/(\r.*) (.*\r)/\1B\2/;ta;s/\r//g' input
Explanation:
First put \r around the area that you want to change.
Next introduce a label to jump back to.
Next replace a space between 2 markers.
Repeat until all spaces are replaced.
Remove the markers.
In your case, where the length doesn't change, you can do without the markers.
Replace a space after 236..245 characters and try again when it succeeds.
sed -r ':a; s/^(.{236})([^ ]{0,9}) /\1\2B/;ta' input
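The loop is easier to follow with a small window. A sketch for columns 5 to 10, assuming a sed that supports -E (GNU and BSD sed both do); the function name is hypothetical, and the script is split into separate -e parts so the label is parsed the same way everywhere:

```shell
# Hypothetical demo: replace spaces with B in columns 5..10 of each input line.
# {4} skips the 4 columns before the window; {0,5} plus the space itself
# covers the 6 columns of the window.
window_spaces_to_b() {
    sed -E -e ':a' -e 's/^(.{4})([^ ]{0,5}) /\1\2B/' -e 'ta'
}

printf '%s\n' 'S o m e s t r i n g' | window_spaces_to_b   # S o mBeBsBt r i n g
```

Each pass replaces the first remaining space inside the window, and ta jumps back to :a until no substitution succeeds.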
This might work for you (GNU sed):
sed -E 's/./&\n/245;s//\n&/236;h;y/ /B/;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Divide the problem into 2 lines, one with spaces and one with B's where there were spaces.
Then using pattern matching make a composite line from the two lines.
N.B. The newline can be used as a delimiter, as it is guaranteed not to be in sed's pattern space.

Bash ./ get subset with length of 8 chars

I have the following input:
line="before,myinput1,after"
myinput1 can also be first or last, for example line="myinput1,after" or line="before,myinput1".
I'm trying to get only the myinput1 value (which can change). I tried this:
echo "$line" | grep -o -E ',.{0,7}.,'
which returned ,myinput1,. It does not work if the value is first or last, because one of the commas is missing.
Is there any other way to do that?
Using grep, a regex for 8 characters (assuming you only want an 8 character string) is \w{8}. Using OR operators | the three cases needed (start of line, end of line and somewhere in the middle of the line) can be expressed as:
egrep -o ',\w{8},|^\w{8},|,\w{8}$'
To catch fields of 8 characters in a comma delimited string, you can use awk:
awk -v RS=, 'length()==8' <<< "$line"
RS sets the record separator to the comma ,.
awk length() function gives the size of the current record.
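One caveat with the here-string: it appends a newline, and with RS set to a comma that newline becomes part of the last record, so an 8-character value in last position has length 9 and is skipped. A minimal sketch that sidesteps this by turning the commas into newlines first (the function name fields_len8 is hypothetical):

```shell
# Hypothetical helper: print every comma-separated field of $1
# that is exactly 8 characters long.
fields_len8() {
    # One field per line, then keep the lines of length 8.
    printf '%s\n' "$1" | tr ',' '\n' | awk 'length($0) == 8'
}

fields_len8 "before,myinput1,after"   # myinput1
fields_len8 "myinput1,after"          # myinput1
fields_len8 "before,myinput1"         # myinput1
```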
With bash :
(IFS=',';set -- $line;for i;do [ ${#i} -eq 8 ] && echo $i ;done)

How to replace last n characters in the kth occurence of a line containing a certain substring using sed or awk?

Suppose I have a file that resembles the following format:
\\ Random other lines \\
...
27861NA+ NA+89122 13.480 11.554 10.082
27862NA+ NA+89123 2.166 5.896 10.108
27863NA+ NA+89124 8.289 6.843 3.090
27864NA+ NA+89125 12.972 5.936 4.498
27865CL- CL-89126 13.914 2.125 12.915
27866CL- CL-89127 12.050 13.907 3.559
...
\\ Random other lines \\
I am trying to find a way of replacing the last 24 characters of each line with a string that I have prepared, for the first 3 instances of lines in the file that contain the string "NA+".
For example, my output would ideally look like:
\\ Random other lines \\
...
27861NA+ NA+89122 my first string hello
27862NA+ NA+89123 my second string foo
27863NA+ NA+89124 my final string bar $$
27864NA+ NA+89125 12.972 5.936 4.498
27865CL- CL-89126 13.914 2.125 12.915
27866CL- CL-89127 12.050 13.907 3.559
...
\\ Random other lines \\
So far, I have found a sed command that will remove the last 24 characters from every line in the file:
sed 's/.\{24\}$//' myfile.txt
And also an awk command that will return the kth line that contains the desired substring:
awk '/NA+/{i++}i==1' myfile.txt
Does anyone have an idea about how I could replace the last 24 characters in the 1st, 2nd, and 3rd lines of my file that each contain a certain substring?
With single awk:
awk -v str="my string" '!f && /NA\+/{ f=1; n=NR+3 }n && n>NR{ $4=$5=""; $3=str }1' myfile.txt
string="my first string hello"
awk -v string="$string" '{ if ($0 ~ "NA" && ++cnt <= 3) { print substr($0,1,length($0)-23) string } else { print } }' NA
Using awk, set a string and pass it to awk with -v. For each line containing NA, increment the variable cnt. For the first three such lines, print everything but the last 23 characters, followed by the string passed in. Print every other line unchanged.
This might work for you (GNU sed):
sed '/NA+/{x;s/\n/&/3;x;ta;H;s/.\{24\}$/some string/;b;:a;n;ba}' file
This uses the hold space (HS) to keep a count of the number of lines the script has seen of the required string (NA+). Once it has seen n (in this case n=3) such lines it just prints the remainder of the file.
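The question asks for three different replacement strings, one per matching line. A minimal sketch, assuming a POSIX awk; the function name replace_tails and its three arguments are hypothetical, and index() is used so that the + in NA+ is matched literally rather than as a regex operator:

```shell
# Hypothetical helper: on the first three lines containing the literal text
# "NA+", replace the last 24 characters with $1, $2 and $3 respectively.
replace_tails() {
    awk -v s1="$1" -v s2="$2" -v s3="$3" '
        BEGIN { repl[1] = s1; repl[2] = s2; repl[3] = s3 }
        index($0, "NA+") && ++n <= 3 {
            $0 = substr($0, 1, length($0) - 24) repl[n]
        }
        { print }
    '
}
```

Usage would be along the lines of replace_tails "my first string hello" "my second string foo" 'my final string bar $$' < myfile.txt. Note that awk interprets backslash escapes in -v values, so replacement strings containing backslashes need extra care.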

How to add a constant number to all entries of a row in a text file in bash

I want to add or subtract a constant number from all entries of a row in a text file in Bash.
eg. my text file looks like:
21.018000 26.107000 51.489000 71.649000 123.523000 127.618000 132.642000 169.247000 173.276000 208.721000 260.032000 264.127000 320.610000 324.639000 339.709000 354.779000 385.084000
(it has only one row)
and I want to subtract value 18 from all columns and save it in a new file. What is the easiest way to do this in bash?
Thanks a lot!
Use simple awk like this:
awk '{for (i=1; i<=NF; i++) $i -= 18} 1' file > $$.tmp && mv $$.tmp file
cat file
3.018 8.107 33.489 53.649 105.523 109.618 114.642 151.247 155.276 190.721 242.032 246.127 302.61 306.639 321.709 336.779 367.084
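The trailing zeros disappear in that output because awk converts numbers to strings with its default %.6g format. If the six decimals of the input should be kept, sprintf can format each field explicitly. A sketch assuming a POSIX awk; the function name subtract_const is hypothetical:

```shell
# Hypothetical helper: subtract $1 from every field on stdin,
# printing each result with six decimals.
subtract_const() {
    awk -v c="$1" '{
        for (i = 1; i <= NF; i++)
            $i = sprintf("%.6f", $i - c)
        print
    }'
}

echo '21.018000 320.610000' | subtract_const 18   # 3.018000 302.610000
```

To save the result in a new file, as the question asks, redirect as usual: subtract_const 18 < file > new.file.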
Taking advantage of awk's RS and ORS variables we can do it like this:
awk 'BEGIN {ORS=RS=" "}{print $1 - 18 }' your_file > your_new_filename
It sets the record separator for input and output to space. This makes every field a record of its own and we have only to deal with $1.
Give a try to this compact and funny version:
$ printf "%s 18-n[ ]P" $(cat text.file) | dc
dc is a reverse-polish desk calculator (hehehe).
printf generates one string per number. The first string is 21.018000 18-n[ ]P. Other strings follow, one per number.
21.018000 18: the values separated with spaces are pushed to the dc stack.
- Pops two values off, subtracts the first one popped from the second one popped, and pushes the result.
n Prints the value on the top of the stack, popping it off, and does not print a newline after.
[ ] add string (space) on top of the stack.
P Pops off the value on top of the stack. If it is a string, it is simply printed without a trailing newline.
The test with an additional sed to replace the useless last (space) char with a new line:
$ printf "%s 18-n[ ]P" $(cat text.file) | dc | sed "s/ $/\n/" > new.file
$ cat new.file
3.018000 8.107000 33.489000 53.649000 105.523000 109.618000 114.642000 151.247000 155.276000 190.721000 242.032000 246.127000 302.610000 306.639000 321.709000 336.779000 367.084000
----
For history a version with sed:
$ sed "s/\([1-9][0-9]*[.][0-9][0-9]*\)\{1,\}/\1 18-n[ ]P/g" text.file | dc
With Perl, which also works on multiple rows:
perl -i -nlae '@F = map {$_ - 18} @F; print "@F"' num_file
#     ^  ^^^^                                ^
#     |  ||||                                Printing an array in quotes will join
#     |  ||||                                with spaces
#     |  |||Evaluate code instead of expecting filename.pl
#     |  ||Split input on spaces and store in @F
#     |  |Remove (chomp) newline and add newline after print
#     |  Read each line of specified file (num_file)
#     Inplace edit, change original file, take backup with -i.bak

speed up my awk command? Answer must be awk :)

I have some awk code that is running really slow. The format of my file is tab delimited 5 column ASCII. I am operating on column 5 to get a count of appropriate characters to alter the value in column 4.
Example input line:
10 5134832 N 28 Aaaaa*AAAAaAAAaAAAAaAAAA^]a^]a^Fa^]a
If I find any "^" in $5 I want to not count it, or the following character.
Then I want to find out how many characters are ">" or "<" or "*" and remove them from the count. I'm guessing using a gsub, and 3 splits is less than ideal, especially since column 5 can occasionally be a very very long string.
awk '{l=$4; if($5~/>/ || $5~/</ || $5~/*/ ) {gsub(/\^./,"");l-=split($5,a,"<")-1;l-=split($5,a,">")-1;l-=split($5,a,"*")-1}
If the code runs successfully on the line above, l will be 27.
I am omitting the surrounding parts of the command to try and focus on the part I have a question about.
So, what is the best step to make this run faster?
Well as I see, your gsub pattern will not work, as the / was not closed. Anyway, if I get it correctly and you want the character count of $5 without some characters, I'd go with:
count=length(gensub("[><A-Z^]","","g",$5))
You should list your skippable characters between [ and ], and do not start with ^!
Do you need to use awk, or will this work instead?
cut -f 5 < $file | grep -v '^[A-Z]' | tr -d '<>*\n' | wc -c
Translation:
Extract the 5th field from the tab-delimited $file.
Remove all fields starting with a capital letter.
Remove the characters <, >, *, and newlines.
Count the remaining characters.
Here's a guess:
awk '
BEGIN {FS = OFS = "\t"}
{
str = $5
gsub(/\^.|[><*]/, "", str)
l = length(str)
}
'
This might work for you:
echo "10 5134832 N 28 Aaaaa*AAAAaAAAaAAAAaAAAA^]a^]a^Fa^]a" |
awk '/[><*^]/{t=$5;gsub(/[><*]|[\^]./,"",t);$4=length(t)}1'
10 5134832 N 27 Aaaaa*AAAAaAAAaAAAAaAAAA^]a^]a^Fa^]a
if you want to show the amended fifth field:
awk '/[><*^]/{gsub(/[><*]|[\^]./,"",$5);$4=length($5)}1'
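Since gsub returns the number of replacements it made, the three split calls from the question can be folded into a single pass over a copy of $5. A sketch that mirrors the question's arithmetic (start from $4, ignore each ^ and the character after it, subtract one per >, < or *), assuming tab-delimited input; the function name adjust_count is hypothetical:

```shell
# Hypothetical helper: lower column 4 by the number of >, < and * characters
# in column 5, ignoring any character that directly follows a ^.
adjust_count() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        s = $5
        gsub(/\^./, "", s)            # drop each ^ and the character after it
        $4 -= gsub(/[><*]/, "", s)    # gsub returns the number of replacements
        print
    }'
}
```

On the example line from the question this turns the 28 in column 4 into 27 and leaves column 5 untouched, because the substitutions run on the copy s.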

Resources