I have probability values returned from a neural network. The returned list has 50,257 entries, so there are a lot of values. The list looks like [-126.32508850097656, -126.77257537841797, -127.69950866699219, -129.98387145996094, ......]
I need the top K values and their indices. So I converted the list to a Map:
final temp = outputLogits.asMap();
and then sorted them using:
var sortedKeys = temp.keys.toList(growable: false)
..sort((k1, k2) => temp[k2].compareTo(temp[k1]));
It produces the desired result, but the issue is that it takes way too long.
Am I doing this wrong? Is there a more efficient way to get the same result?
Further details:
The unsorted list looks like this:
[-126.32508850097656, -126.77257537841797, -127.69950866699219, -129.98387145996094, -128.03782653808594, -128.08395385742188, -126.33218383789062, -126.6927261352539, -127.6688232421875, -126.58303833007812, -127.32843017578125, -126.1390380859375, -126.54962158203125, -126.38087463378906, -127.82595825195312, -126.3281021118164, -125.81211853027344, -126.20887756347656, -125.95697784423828, -126.07755279541016, -126.35894012451172, -126.70021057128906, -127.03215026855469, -126.67304992675781, -126.92938995361328, -126.64434814453125, -128.20814514160156, -127.24195861816406, -128.25816345214844, -126.73397827148438, -127.62574768066406, -128.8334197998047, -124.46258544921875, -126.03125762939453, -126.18477630615234, -125.85749053955078, -126.11980438232422, -125.64325714111328, -126.06704711914062, -126.35154724121094, -124.83910369873047, -126.90412902832031, -126.02999877929688, -126.60641479492188, -125.97348022460938, -126.56074523925781, -126.58230590820312, -126.49268341064453, -128.5759735107422,
I need to find the top 40 probabilities and their indices, which I currently do like this:
final temp = outputLogits.asMap(); // converts the above list to a Map<int, double>
// sort the map values descending
// then take the largest 40 values
var sortedKeys = temp.keys.toList(growable: false)
..sort((k1, k2) => temp[k2].compareTo(temp[k1]));
final Map<int, double> sortedMap = {};
for (final key in sortedKeys.take(40)) {
sortedMap[key] = temp[key];
}
After sorting, this is what sortedMap looks like:
{198: -117.52079772949219, 383: -118.29053497314453, 887: -119.25838470458984, 1119: -119.66973876953125, 632: -119.74752807617188, 628: -119.87970733642578, 554: -119.88958740234375, 1081: -119.9058837890625, 843: -120.10496520996094, 317: -120.21776580810547, 2102: -120.23406982421875, 770: -120.31946563720703, 2293: -120.40717315673828, 1649: -120.44376373291016, 366: -120.47624969482422, 2080: -120.4794921875, 2735: -120.74302673339844, 3244: -120.89102935791016, 2893: -120.97686004638672, 314: -120.98660278320312, 5334: -121.00469970703125, 1318: -121.03706359863281, 679: -121.12769317626953, 1881: -121.14120483398438, 1629: -121.18737030029297, 50256: -121.19244384765625, 357: -121.22344207763672, 1550: -121.27531433105469, 775: -121.31112670898438, 7486: -121.3316421508789, 921: -121.37474060058594, 1114: -121.43411254882812, 2312: -121.43602752685547, 1675: -121.51364135742188, 4874: -121.5697021484375, 1867: -121.57322692871094, 1439: -121.60330963134766, 8989: -121.60348510742188, 1320: -121.604621
I need the top values and their respective indices, which is why I converted the list to a Map.
Try the following:
void main() {
final temp = [
-126.32508850097656,
-126.77257537841797,
-127.69950866699219,
-129.98387145996094,
-128.03782653808594,
-128.08395385742188,
-126.33218383789062,
-126.6927261352539,
-127.6688232421875,
-126.58303833007812,
-127.32843017578125,
];
final filteredLogitsWithIndexes = Map.fromEntries(
(temp.asMap().entries.toList(growable: false)
..sort((e1, e2) => e2.value.compareTo(e1.value)))
.take(5));
print(filteredLogitsWithIndexes);
// {0: -126.32508850097656, 6: -126.33218383789062, 9: -126.58303833007812,
// 7: -126.6927261352539, 1: -126.77257537841797}
}
This should save you a lot of time since we don't need to make a lookup in the map for each comparison (since a MapEntry contains both key and value).
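Since only the top K entries are needed, another option is to ask for the K largest (value, index) pairs directly instead of sorting all 50,257 entries. Below is a minimal sketch of that idea, written in Ruby rather than Dart purely for illustration; `top_k_with_index` is a hypothetical helper name, and the input is the short sample list from the answer above.

```ruby
# Return the k largest values keyed by their original index,
# ordered from largest to smallest.
def top_k_with_index(values, k)
  values.each_with_index                       # yields [value, index] pairs
        .max_by(k) { |value, _index| value }   # keep only the k largest pairs
        .map { |value, index| [index, value] }
        .to_h
end

logits = [
  -126.32508850097656, -126.77257537841797, -127.69950866699219,
  -129.98387145996094, -128.03782653808594, -128.08395385742188,
  -126.33218383789062, -126.6927261352539, -127.6688232421875,
  -126.58303833007812, -127.32843017578125
]

top = top_k_with_index(logits, 5)
p top.keys   # prints [0, 6, 9, 7, 1]
```

The indices match the Dart output shown above. In Dart the same effect would need a hand-written selection loop or a priority queue, so whether it beats a single sort on entries is something to measure rather than assume.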
I import lines from a file. Each line contains a string wrapped in (escaped) double quotes:
DP,0,"021",257
DP,1,"022",257
DP,2,"023",513
DP,3,"024",513
DP,4,"025",1025
DP,5,"026",1025
DP,6,"081",257
DP,7,"082",257
DP,8,"083",513
DP,9,"084",513
DP,10,"085",1025
DP,11,"086",1025
DP,12,"087",1025
DP,13,"091",257
DP,14,"092",513
DP,15,"093",1025
IS,0,"FIX",0
IS,1,"KARIN02",0
IS,2,"KARUIT02",0
IS,3,"KARIN02HOV",0
IS,4,"KARUIT02HOV",0
IS,5,"KARIN08",0
IS,6,"KARUIT08",0
IS,7,"KARIN08HOV",0
IS,8,"KARUIT08HOV",0
IS,9,"KARIN09",0
IS,10,"KARUIT09",0
IS,11,"KARIN09HOV",0
IS,12,"KARUIT09HOV",0
IS,13,"KARIN10",0
IS,14,"KARUIT10",0
IS,15,"KARIN10HOV",0
I get the following Objects (if DP) :
index - parts1 (int)
name - parts2 (string)
ref - parts3 (int)
I tried using a regex to remove the escape sequence from the lines, but to no effect:
#name_to_ID = {}
kruising = 2007
File.open(cfgFile).each{|line|
  parts = line.split(",")
  if parts[0]=="DP"
    index = parts[1].to_i
    hex = index.to_s(16).upcase.rjust(2, '0')
    cname = parts[2].to_s
    tname = cname.gsub('\\"','')
    p "cname= #{cname} (#{cname.length})"
    p "tname= #{tname} (#{tname.length})"
    p cname == tname
    #name_to_ID[tname] = kruising.to_s + "-" + hex.to_s
  end
}
teststring = "021"
p #name_to_ID[teststring]
> "021" (5)
> "021" (5)
> true
> nil
The problem came to light when I looked up the hash with another string (length 3):
hash[key] finds nothing, because the stored key "021" (length 5, including the quotes) is not equal to the string 021 (length 3).
Is there a method that actually removes the characters I need removed?
EDIT: I used
cname.each_char{|c|
p c
}
> "\""
> "0"
> "2"
> "1"
> "\""
EDIT: requested outcome update:
# Current output:
#name_to_ID["021"] = 2007-00 "021".length = 5
#name_to_ID["022"] = 2007-01 "022".length = 5
#name_to_ID["081"] = 2007-06 "081".length = 5
#name_to_ID["082"] = 2007-07 "082".length = 5
#name_to_ID["091"] = 2007-0D "091".length = 5
#name_to_ID["101"] = 2007-10 "101".length = 5
# -------------
# Expected output:
#name_to_ID["021"] = 2007-00 "021".length = 3
#name_to_ID["022"] = 2007-01 "022".length = 3
#name_to_ID["081"] = 2007-06 "081".length = 3
#name_to_ID["082"] = 2007-07 "082".length = 3
#name_to_ID["091"] = 2007-0D "091".length = 3
#name_to_ID["101"] = 2007-10 "101".length = 3
Your problem is that you don't know exactly which character is in your string. It might not be the same character you see when printing it.
Try parts[2].to_s.bytes to check the exact byte values of the unexpected character. For example:
> "asd".bytes
=> [205, 184, 97, 115, 100]
Alternatively, you can delete the first and the last characters, if you are sure that every part of the string has the same format:
cname = parts[2].to_s[1..-2]
Or you can remove all special characters from the string, if you know that the names will never legitimately contain any:
cname = parts[2].to_s.gsub(/[^0-9A-Za-z]/, '')
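The each_char output in the question shows that the extra characters are plain double quotes; p merely displays them escaped as "\"". In a single-quoted Ruby string, '\\"' is two characters (a backslash followed by a quote), so that gsub never matches. A minimal sketch of the lookup table built with the quotes deleted follows; `build_name_to_id` is a hypothetical helper name, and the prefix 2007 is the kruising value from the question.

```ruby
# Build a name => ID lookup from config lines such as: DP,0,"021",257
def build_name_to_id(lines, kruising = 2007)
  lines.each_with_object({}) do |line, name_to_id|
    parts = line.chomp.split(',')
    next unless parts[0] == 'DP'              # only DP records are mapped

    hex  = parts[1].to_i.to_s(16).upcase.rjust(2, '0')
    name = parts[2].delete('"')               # drop the literal double quotes
    name_to_id[name] = "#{kruising}-#{hex}"
  end
end

lookup = build_name_to_id(['DP,0,"021",257', 'DP,13,"091",257', 'IS,0,"FIX",0'])
p lookup['021']               # prints "2007-00"
p lookup['091']               # prints "2007-0D"
p lookup.keys.first.length    # prints 3
```

Ruby's csv library is another option, since its parser strips the surrounding quotes while splitting the fields.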
Please advise how to find and output cust_JiraTaskId. I need the value of cust_JiraTaskId from the cust_JiraTask node that has the highest externalCode. In this example it will be 111111.
I managed to find the max externalCode, and now I need the matching cust_JiraTaskId value.
<SFOData.cust_JiraReplication>
<cust_HRISId>J000009</cust_HRISId>
<externalCode>7</externalCode>
<cust_JiraTask>
<externalCode>3</externalCode>
<cust_JiraTaskId>12345</cust_JiraTaskId>
</cust_JiraTask>
<cust_JiraTask>
<externalCode>5</externalCode>
<cust_JiraTaskId>111111</cust_JiraTaskId>
</cust_JiraTask>
</SFOData.cust_JiraReplication>
My script is below
// Create an XPath statement to search for the element or elements you care about:
XPath x;
x = XPath.newInstance("//cust_JiraTask/externalCode");
myElements = x.selectNodes(doc);
String maxvalue = "";
for (Element myElement : myElements) {
    if (myElement.getValue() > maxvalue)
    {
        maxvalue = myElement.getValue();
    }
}
props.setProperty("document.dynamic.userdefined.externalCode", maxvalue);
thanks for help.
This works for me with Groovy 2.4.5:
def xml = """
<SFOData.cust_JiraReplication>
<cust_HRISId>J000009</cust_HRISId>
<externalCode>7</externalCode>
<cust_JiraTask>
<externalCode>3</externalCode>
<cust_JiraTaskId>12345</cust_JiraTaskId>
</cust_JiraTask>
<cust_JiraTask>
<externalCode>5</externalCode>
<cust_JiraTaskId>111111</cust_JiraTaskId>
</cust_JiraTask>
</SFOData.cust_JiraReplication>
"""
def xs = new XmlSlurper().parseText(xml)
// Pick the cust_JiraTask with the highest externalCode, then read its cust_JiraTaskId
def maxTask = xs.cust_JiraTask.max { it.externalCode.text() as int }
assert 111111 == maxTask.cust_JiraTaskId.text() as int
I have a log file like the one below:
5082 //open_api/user/get_user_info
5074 /user/get_user_idCard_info?passportId=YRD1412538757&viewSource=02
5029 /user/getuserinfo?passportId=YRD1412538757
4706 /user/getuserinfo?passportId=YRD1507000030516
4611 /user/get_user_idCard_info?passportId=YRD1507000030516&viewSource=02
4040 /salesloan/update_draw_bank
The output should be like:
5082 //open_api/user/get_user_info
9685 /user/get_user_idCard_info
9735 /user/getuserinfo
4040 /salesloan/update_draw_bank
The number at the start of each line is the number of times that URL was called. I want to count how many times each URL (without the query parameters of the GET request) was requested; for example, I only want the total number of times the '/repay/query_need_repay_data.action' URL was called. I am currently using Java to filter and process the lines, but for a 200 MB file it has already taken 4 hours and is still running. How can I get this done quickly?
Java code:
public static void main(String[] args) throws IOException {
    String source = "/Users/leo/logs/p2pservice/access/a2.output";
    String target = "/Users/leo/logs/p2pservice/access/targetUrls";
    File targetFile = new File(target);
    String splinter = "\\?";
    List<String> strings = Files.readLines(new File(source), Charsets.UTF_8);
    for (String string : strings) {
        if (string.contains("?")) {
            String[] split = string.split(splinter);
            Files.append(string.split(splinter)[0].toString() + "\n", targetFile, Charsets.UTF_8);
        } else {
            Files.append(string + "\n", targetFile, Charsets.UTF_8);
        }
    }
}
Thanks in advance.
awk to the rescue!
$ awk -F'[ ?]' '{a[$2]+=$1} END{for(k in a) print a[k], k}' file
14341 /repay/query_need_repay_data.action
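The one-liner splits each line on a space or a question mark, so $1 is the count and $2 is the path without its query string; it then sums the counts per path. The Java version is likely slow because Files.append reopens the target file for every line, and it never aggregates the counts. For anyone who prefers a script, here is a rough sketch of the same aggregation in Ruby; `count_by_path` is a hypothetical helper name, and the sample lines are the ones from the question.

```ruby
# Sum the call counts per URL path, ignoring any query string.
def count_by_path(lines)
  totals = Hash.new(0)
  lines.each do |line|
    count, url = line.split(' ', 2)            # "5074 /user/...?x=1" -> count, url
    next if url.nil? || url.strip.empty?       # skip blank or malformed lines

    path = url.strip.split('?', 2).first       # drop everything after '?'
    totals[path] += count.to_i
  end
  totals
end

sample = [
  '5082 //open_api/user/get_user_info',
  '5074 /user/get_user_idCard_info?passportId=YRD1412538757&viewSource=02',
  '5029 /user/getuserinfo?passportId=YRD1412538757',
  '4706 /user/getuserinfo?passportId=YRD1507000030516',
  '4611 /user/get_user_idCard_info?passportId=YRD1507000030516&viewSource=02',
  '4040 /salesloan/update_draw_bank'
]

totals = count_by_path(sample)
totals.each { |path, count| puts "#{count} #{path}" }
```

For a real log, passing File.foreach(path) instead of an array streams the file line by line, so the whole file never has to sit in memory.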
I have a csv file which I have loaded into hadoop. Data sample is below.
name | shop | balance
tom | shop a | -500
john | shop b | 200
jane | shop c | 5000
Results:
bad 1
normal 1
wealthy 1
I have to get the balance for each person and then put them into groups: bad (<0), normal (1 to 500), good (>500).
I'm not 100% sure where the grouping belongs in MapReduce. Do I put it in the reducer or in the mapper?
Splitting the csv file (mapper):
String[] tokens = value.toString().split(",");
String balance = tokens[2]; // third column: name, shop, balance
Creating groups:
String[] category = new String[3];
category[0] = "Bad";
category[1] = "Normal";
category[2] = "Good";
I also have this if/else statement:
if (bal <= 500) {
    // put into cat 0
} else if (bal >= 501 && bal <= 1500) {
    // put into cat 1
} else {
    // put into cat 2
}
Thanks in advance.
A simple way to implement this would be:
Map:
map() {
    if (bal <= 0) { //or 500, or whatever
        emit (bad, 1);
    } else if (bal <= 500) { // or 1500, or whatever
        emit (normal, 1);
    } else {
        emit (good, 1);
    }
}
Reduce (and combiner, as well):
reduce(key, values) {
    int count = 0;
    while (values.hasNext()) {
        count += values.next();
    }
    emit (key, count);
}
It's exactly the same as the word count example, where, in your case, you have three words (categories): bad, normal, good.
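To make the data flow concrete, here is a small local simulation of that map and reduce logic in plain Ruby. It is only a sketch of the idea and not Hadoop code: the helper names are hypothetical, the thresholds are the ones from the pseudocode above, and the input is assumed to be comma-separated with the balance in the third column.

```ruby
# Classify a balance using the thresholds from the pseudocode above.
def category_for(balance)
  if balance <= 0
    'bad'
  elsif balance <= 500
    'normal'
  else
    'good'
  end
end

# Map step: one input line becomes one (category, 1) pair.
def map_record(line)
  _name, _shop, balance = line.split(',').map(&:strip)
  [[category_for(Integer(balance)), 1]]
end

# Reduce step: sum all the 1s emitted for a category.
def reduce_counts(key, values)
  [key, values.sum]
end

records = ['tom,shop a,-500', 'john,shop b,200', 'jane,shop c,5000']

pairs   = records.flat_map { |line| map_record(line) }
grouped = pairs.group_by(&:first)   # stands in for the shuffle phase
result  = grouped.map { |key, kvs| reduce_counts(key, kvs.map(&:last)) }.to_h

result.each { |category, count| puts "#{category} #{count}" }
```

The classification lives in the mapper because it needs only a single record; the reducer just adds up counts, which is why the same function can also serve as a combiner.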