Unix sort command: sorting by the second-to-last letter

I have a text file containing hexadecimal color codes and I want them sorted by their alpha value. How do I go about it using the -k sort option? I basically want the codes with ff alpha values to be sorted first.
Color Codes:
#b293a6ff
#ead58fff
#a69d36ff
#067806ff
#7f0bf712
#f8b366ff
#8946d744
#c927d4ff
#3e568bff
#3e1ce1ff
#11570a00
Command:
sort -k8,9 colours.txt
Expected Output:
#b293a6ff
#ead58fff
#a69d36ff
#067806ff
#f8b366ff
#c927d4ff
#3e568bff
#3e1ce1ff
#11570a00
#7f0bf712
#8946d744

With GNU sort:
Not exactly your expected output (see the last three entries), but this sorts by the last two characters in reverse order,
using --stable to leave entries with the same value (ff) in their original order:
$ sort --stable -rk1.8 colours.txt
#b293a6ff
#ead58fff
#a69d36ff
#067806ff
#f8b366ff
#c927d4ff
#3e568bff
#3e1ce1ff
#8946d744
#7f0bf712
#11570a00
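If you want exactly the expected output (ff entries first, in their original order, then the remaining entries sorted ascending by alpha), one sketch is to split the file with grep and sort only the non-ff part. This assumes the colours.txt contents shown above; the heredoc just recreates that file:

```shell
# Recreate the sample input from the question
cat > colours.txt <<'EOF'
#b293a6ff
#ead58fff
#a69d36ff
#067806ff
#7f0bf712
#f8b366ff
#8946d744
#c927d4ff
#3e568bff
#3e1ce1ff
#11570a00
EOF

# ff-alpha lines first, in original order, then the rest
# sorted ascending by their alpha value (characters 8-9 of each line)
{ grep 'ff$' colours.txt; grep -v 'ff$' colours.txt | sort -k1.8; }
```

With the sample data this reproduces the expected output above, since every line whose alpha is ff is exactly a line ending in ff.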


Extract 2 fields from string with search

I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't choose a column number. I feel I need to search for "id" and "hwVersion". Any help is GREATLY appreciated.
Totally agree with @KamilCuk. More specifically:
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
Not quite the specified output, but valid JSON
More to the point, your input should probably be processed record by record, and my guess is that a two column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777
Since each line looks like a mapping object, and is in fact valid JSON, something like this should do if you don't mind using Python (which comes with JSON support):
import json

def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input as string s and parse it as JSON into a dictionary d. Then we return a formatted string with the double-quoted id and hwVersion keys, each followed by a colon and the double-quoted value of the corresponding key from the previously obtained dict.
We can try this with these test input strings and print calls:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
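For example, a minimal sketch of that iteration, here over a hard-coded list holding the two sample lines from the question (in a real script you would read them from sys.stdin or a file instead):

```python
import json

def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])

lines = [
    '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}',
    '{"id":"5555","name":"6666","hwVersion":"7777"}',
]
for line in lines:
    print(get_id_hw(line))
```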
If you really wanted to use awk, you could, but it's not the most robust or suitable tool:
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
       h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
       printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention the position is not known, and assuming it can be in any order, we use one regex to extract id and another to get hwVersion, then we print them out in the given format. If the values could be something other than decimal digits as in your example, the [0-9]+ part would need to reflect that.
And for the fun of it (this preserves the order of entries within the file), in sed:
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".

SSRS [Sort Alphanumerically]: How to sort a specific column in a report to be [A-Z] & [ASC]

I have a field set that contains bill numbers and I want to sort them first alphabetically then numerically.
For instance I have a column "Bills" that has the following sequence of bills.
- HB200
- SB60
- HB67
Desired outcome is below
- HB67
- HB200
- SB60
How can I use sorting in SSRS Group Properties to have the field sort from [A-Z] & [1 - 1000....]
This should be doable by adding just 2 separate Sort options in the group properties. To test this, I created a simple dataset using your examples.
CREATE TABLE #temp (Bills VARCHAR(20))
INSERT INTO #temp(Bills)
VALUES ('HB200'),('SB60'),('HB67')
SELECT * FROM #temp
Next, I added a matrix with a single row and a single column for my Bills field with a row group.
In the group properties, I set up two sort options, using the expressions below.
So to get this working, my theory was that you needed to isolate the numeric characters from the non-numeric characters and use each in their own sort option. To do this, I used the relatively unknown Regex Replace function in SSRS.
This expression gets only the non-numeric characters and is used in the top sorting option:
=System.Text.RegularExpressions.Regex.Replace(Fields!Bills.Value, "[0-9]", "")
While this expression isolates the numeric characters:
=System.Text.RegularExpressions.Regex.Replace(Fields!Bills.Value, "[^0-9]", "")
With these sorting options, my results match what you expect to happen.
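The split-key idea itself can be sanity-checked outside SSRS; here is a minimal Python sketch of the same two-key sort, using the question's sample bills:

```python
import re

bills = ["HB200", "SB60", "HB67"]
# First key: the non-numeric prefix; second key: the numeric part as an integer
bills.sort(key=lambda b: (re.sub(r"[0-9]", "", b), int(re.sub(r"[^0-9]", "", b))))
print(bills)  # ['HB67', 'HB200', 'SB60']
```

This mirrors the two Regex.Replace sort expressions: strip digits for the first key, strip non-digits (and compare numerically) for the second.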
In the sort expression for your tablix/table which is displaying the dataset, set the sort to something like:
=IIF(Fields!Bills.Value = "HB67", 1, IIF(Fields!Bills.Value = "HB200", 2, IIF(Fields!Bills.Value = "SB60", 3, 4)))
Then when you sort A-Z, it'll sort by the number given to it in the sort expression.
This is only a solution if you don't have hundreds of values, as it can become quite tedious to create hundreds of conditions.

Find lines that have partial matches

So I have a text file that contains a large number of lines. Each line is one long string with no spacing; however, the line contains several pieces of information. The program knows how to differentiate the important information in each line. The program identifies that the first 4 numbers/letters of the line correspond to a specific instrument. Here is a small example portion of the text file.
example text file
1002IPU3...
POIPIPU2...
1435IPU1...
1812IPU3...
BFTOIPD3...
1435IPD2...
As you can see, there are two lines that contain 1435 within this text file, which corresponds to a specific instrument. However, these lines are not identical. The program I'm using cannot do its calculation if there are duplicates of the same station (i.e., there are two 1435* stations). I need a way to search through my text files and identify any duplicates of the partial strings that represent the stations, so that I can delete one or both of the duplicates. If I could have a Bash script output the numbers of the lines containing the duplicates and what the duplicate lines say, that would be appreciated. I think there might be an easy way to do this, but I haven't been able to find any examples. Your help is appreciated.
If all you want to do is detect if there are duplicates (not necessarily count or eliminate them), this would be a good starting point:
awk '{ if (++seen[substr($0, 1, 4)] > 1) printf "Duplicates found : %s\n",$0 }' inputfile.txt
For that matter, it's a good starting point for counting or eliminating, too, it'll just take a bit more work...
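For instance, eliminating duplicates by keeping only the first line seen for each prefix is a common awk one-liner. This sketch recreates the question's sample lines in a file (the name inputfile.txt is just the one used above):

```shell
# Recreate the sample input from the question
cat > inputfile.txt <<'EOF'
1002IPU3...
POIPIPU2...
1435IPU1...
1812IPU3...
BFTOIPD3...
1435IPD2...
EOF

# Print each line only the first time its 4-character prefix is seen
awk '!seen[substr($0, 1, 4)]++' inputfile.txt
```

On the sample data this drops the second 1435 line and keeps everything else.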
If you want the count of duplicates:
awk '{a[substr($0,1,4)]++} END {for (i in a) {if(a[i]>1) print i": "a[i]}}' test.in
1435: 2
or:
{
a[substr($0,1,4)]++ # put prefixes to array and count them
}
END { # in the end
for (i in a) { # go thru all indexes
if(a[i]>1) print i": "a[i] # and print out the duplicate prefixes and their counts
}
}
Slightly roundabout, but this should work:
cut -c 1-4 file.txt | sort -u > list
for i in `cat list`;
do
echo -n "$i "
grep -c ^"$i" file.txt #This tells you how many occurrences of each 'station'
done
Then you can do whatever you want with the ones that occur more than once.
Use the following Python script (Python 2.7 syntax):
#!/usr/bin/python
file_name = "device.txt"
f1 = open(file_name, 'r')
device = {}
line_count = 0
for line in f1:
    line_count += 1
    if device.has_key(line[:4]):
        device[line[:4]] = device[line[:4]] + "," + str(line_count)
    else:
        device[line[:4]] = str(line_count)
f1.close()
print device
Here the script reads each line, takes the initial 4 characters of each line as the device name, and builds a key-value pair in device, with the key being the device name and the value the line numbers where that device name is found.
The output would be:
{'POIP': '2', '1435': '3,6', '1002': '1', '1812': '4', 'BFTO': '5'}
this might help you out!!

Bash/Awk: Reformat uneven columns with multiple delimiters

I have a CSV where I need to reformat a single column's contents.
The problem is that each cell has completely different lengths to reformat.
Current column looks like (these are two lines of single column) :
Foo*foo*foo*1970,1980+Bar*bar*bar*1970
Foobar*Foobar*foobarbar*1970,1975,1980
Result should look like (still two lines one column)
Foo*foo*foo*1970+Foo*foo*foo*1980+Bar*bar*bar*1970
Foobar*Foobar*foobarbar*1970+Foobar*Foobar*foobarbar*1975+Foobar*Foobar*foobarbar*1980
this is what I'm trying to do
#!/bin/bash
cat foocol | \
awk -F'+' \
'{for i in NF print $i}' \
| awk -F'*' \
'{$Foo=$1"*"$2"*"$3"*" print $4}' \
\
| awk -v Foo=$Foo -F',' \
'{for j in NF do \
print Foo""$j"+" }' \
> newcol
The idea is to iterate over the '+'-delimited records, repeating the first three '*'-delimited values for every ','-delimited year, with a '+' between the resulting records.
But I'm just getting syntax errors everywhere.
Thanks
$ awk --re-interval -F, -v OFS=+ '{match($1,/([^*]*\*){3}/);
prefix=substr($0,RSTART,RLENGTH);
for(i=2;i<=NF;i++) $i=prefix $i }1' file
Foo*foo*foo*1970+Foo*foo*foo*1980+Bar*bar*bar*1970
Foobar*Foobar*foobarbar*1970+Foobar*Foobar*foobarbar*1975+Foobar*Foobar*foobarbar*1980
perhaps add validation with if(match(...
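If you'd rather avoid GNU-specific awk features, the same transformation is easy to sketch in Python (the function name expand is my own, and it assumes exactly the prefix*prefix*prefix*year1,year2 shape shown above):

```python
def expand(line):
    """Copy the '*'-delimited prefix of each '+'-delimited record
    once per comma-separated year in its last field."""
    records = []
    for rec in line.split("+"):
        head, _, years = rec.rpartition("*")
        records += ["{}*{}".format(head, y) for y in years.split(",")]
    return "+".join(records)

print(expand("Foo*foo*foo*1970,1980+Bar*bar*bar*1970"))
# Foo*foo*foo*1970+Foo*foo*foo*1980+Bar*bar*bar*1970
```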
Solution in TXR:
$ txr reformat.txr data
Foo*foo*foo*1970+Foo*foo*foo*1980+Bar*bar*bar*1970
Foobar*Foobar*foobarbar*1970+Foobar*Foobar*foobarbar*1975+Foobar*Foobar*foobarbar*1980
Code in reformat.txr:
#(repeat)
# (coll)#/\+?/#a*#b*#c*#(coll)#{x /[^,+]+/}#(until)+#(end)#(end)
# (output :into items)
# (repeat)
# (repeat)
#a*#b*#c*#x
# (end)
# (end)
# (end)
# (output)
# {items "+"}
# (end)
#(end)
This solution is based on regarding the data to have nested syntax: groups of records are delimited by newlines. Records within groups are separated by + and within records there are four fields separated by *. The last field contains comma-separated items. The data is to be normalized by expanding copies of the records such that the comma-separated items are distributed across the copies.
The outer #(repeat) handles walking over the lines. The outer #(coll) iterates over records, collecting the first three fields into variables a, b and c. Then an inner #(coll) gets each comma separated item into the variable x. The inner #(coll) collects the x-s into a list, and the outer #(coll) also collects all the variables into lists, so a, b, c become lists of strings, and x is a list of lists of strings.
The :into items keyword parameter in the output causes the lines which would normally go the standard output device to be collected into a list of strings, and bound to a variable. For instance:
#(output :into lines)
a
b
cd
#(end)
establishes a variable lines which contains the list ("a" "b" "cd").
So here we are getting the output of the doubly-nested repeat as a bunch of lines, where each line represents a record, stored in a variable called items. Then we output these using the #{items "+"}, a syntax which outputs the contents of a list variable with the given separator.
The doubly nested repeat handles the expansion of records over each comma separated item from the fourth field. The outer repeat implicitly iterates over the lists a, b, c and x. Inside the repeat, these variables denote the items of their respective lists. Variable x is a list of lists, and so the inner repeat iterates over that. Inside the outer repeat, variables a, b, c are already scalar, and stay that way in the scope of the inner repeat: only x varies, which is exactly what we want.
In the data collection across each line, there are some subtleties:
# (coll)#/\+?/#a*#b*#c*#(coll)#{x /[^,+]+/}#(until)+#(end)#(end)
Firstly, we match an optional leading plus with the /\+?/ regex, thereby consuming it. Without this, the a field of every record, except for the first one, would include that separating + and we would get double +-s in the final output. The a, b, c variables are matched simply. TXR is non-greedy with regard to the separating material: #a* means match some characters up to the nearest * and bind them to a variable a. Collecting the x list is trickier. Here we use a positive-regex-match variable: #{x /[^,+]+/} to extract the sub-field. Each x is a sequence of one or more characters which are not pluses or commas, extracted positively without regard for whatever follows, much like a tokenizer extracts a token. This inner collect terminates when it encounters a +, which is what the #(until)+ clause ensures. It will also implicitly terminate if it hits the end of the line; the #(until) match isn't mandatory (by default). That terminating + stays in the input stream, which is why we have to recognize it and discard it in front of the #a.
It should be noted that #(coll), by default, scans for matches and skips regions of text that do not match, just like its cousin #(collect) does with lines. For instance if we have #(coll)#{foo /[a-z]+/}#(end), which collects sequences of lower-case letters into foo, turning foo into a list of such strings, and if the input is 1234abcd-efgh.... ijk, then foo ends up with the list ("abcd" "efgh" "ijk"). This is why there is no explicit logic in the inner #(coll) to consume the separating commas: they are implicitly skipped.

How can I extract some numbers from a value in bash

I'm trying to separate numbers from a value in bash. For example, I have a text file with the following row:
2015 0212 0455 25.0 L -20.270 -70.950 44.0 GUC 4.6LGUC 1
I need to separate the number 0212 in the second column in order to get two numbers: num1=02 and num2=12. The same way for the number in the third column.
I'd like to find a generalized method with awk or sed to do this, because other files have this line:
2015 0212 0455 25.0 L -20.270 -70.950136.0 GUC 4.6LGUC 1
And in that case I also have to separate the value -70.950136.0 in two numbers: -70.950 and 136.0. In this case the first number always has the same length: -70.950, -69.320, -68.000, etc.
Assuming fixed-length records:
sed 's/.\{37\}/& /;s/.\{29\}/& /;s/.\{21\}/& /;s/.\{12\}/& /;s/.\{7\}/& /' YourFile
Adapt it to your needs by adding or removing s/.\{IndexOfCharInLine\}/& /; expressions.
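To see it in action on the run-together sample line (the character positions are the ones assumed by the command above):

```shell
# Each s/.\{N\}/& / inserts a space after column N, applied right-to-left
# so earlier insertions don't shift later positions
echo '2015 0212 0455 25.0 L -20.270 -70.950136.0 GUC 4.6LGUC 1' |
  sed 's/.\{37\}/& /;s/.\{29\}/& /;s/.\{21\}/& /;s/.\{12\}/& /;s/.\{7\}/& /'
```

This splits 0212 into 02 12, 0455 into 04 55, and -70.950136.0 into -70.950 136.0; columns that already had a space end up with two.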
