Removing the subscript/Index of an array in awk

I am using awk's feature of storing values as the subscripts (indexes) of an array. Please have a look at the code below:
stringVariable="hi,bye,cool,hot,how,see";
n=split(stringVariable,stringArray,",");
# This loop iterates over the split values and stores each one as a subscript of referenceIdArray
for(tr=1;tr<=n;tr++)
{
    Count++;
    referenceIdArray[stringArray[tr]]++;
}
So referenceIdArray will have the subscripts hi, bye, cool, hot, how and see.
Consider a sample file which has the following values:
hi
bye
gone
My aim is to read the values from the file, match them against the array loaded previously, and print any value from the file that matches.
awk script:
awk 'BEGIN { n = split("hi,bye,cool,hot,how,see", stringArray, ","); for (tr = 1; tr <= n; tr++) referenceIdArray[stringArray[tr]]++ }
{ if ($0 in referenceIdArray) { print $0 } }' file
This gives me the desired result. But assume that "hi" will appear only once: when the action block finds the value, the value should be printed and the corresponding entry in the array, referenceIdArray["hi"], should also be removed in order to make the search more efficient. Since the values are stored as subscripts, I am not sure how to remove the entry. Any suggestions? Thank you.

You can remove an individual element of an array using the delete statement:
delete array[index]
ref: http://www.math.utah.edu/docs/info/gawk_12.html
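As a minimal sketch of how delete could fit the script from the question, assuming the keys are the comma-separated list shown there and the input has one value per line (the function name print_once and the array names are made up for illustration):

```shell
# Print each input line that is a key of the lookup array, then delete
# that key so it cannot match again. print_once is a hypothetical name.
print_once() {
    awk 'BEGIN {
             # split() returns the number of pieces, so length(array),
             # which not every awk supports, is not needed.
             n = split("hi,bye,cool,hot,how,see", parts, ",")
             for (i = 1; i <= n; i++) ref[parts[i]]
         }
         $0 in ref { print; delete ref[$0] }' "$@"
}

printf '%s\n' hi bye gone hi | print_once
```

With this input, hi and bye are printed once each; the second hi is skipped because its key has already been deleted.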

Related

how to compare array element of one array with elements in another array in bash?

I have stored some strings in array1, let's say
array1[0]=apple
array1[1]=orange
and array2 contains
array2[0]=apple
array2[1]=mango
I want to loop through each element and check if they match. I tried using this condition inside the loop, but it did not work:
if [ "$array[i]" = "$array2[j]" ]

To access the elements of an array in bash you have to use ${array[i]} instead of just $array[i]. Because the [ cannot normally be part of a variable name, bash interprets $array[i] as ${array} followed by a literal [i].
By the way: https://www.shellcheck.net/ would have found this error.
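A short sketch of the corrected comparison inside a nested loop, using the sample values from the question. It needs bash, since plain sh has no arrays; the array name common is an assumption added for illustration:

```shell
#!/usr/bin/env bash
# Compare every element of array1 with every element of array2.
array1=(apple orange)
array2=(apple mango)

common=()   # hypothetical array collecting the values found in both
for i in "${!array1[@]}"; do
    for j in "${!array2[@]}"; do
        # The braces are required so that [i] is read as a subscript.
        if [ "${array1[i]}" = "${array2[j]}" ]; then
            common+=("${array1[i]}")
        fi
    done
done

printf '%s\n' "${common[@]}"
```

Here "${!array1[@]}" expands to the indexes of array1, so the loops also work when the indexes are not contiguous.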

Extract 2 fields from string with search

I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't choose a column number. I feel I need to search for "id" and "hwVersion". Any help is GREATLY appreciated.

Totally agree with @KamilCuk. More specifically:
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
Not quite the specified output, but valid JSON
More to the point, your input should probably be processed record by record, and my guess is that a two column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777

Since the data looks like mapping objects, and even corresponds to JSON format, something like this should do, if you don't mind using Python (which comes with JSON support):
import json

def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input as string s and parse it as JSON into a dictionary d. Then we return a formatted string with the double-quoted id and hwVersion keys, each followed by a colon and the double-quoted value of the corresponding key from the dictionary.
We can try this with these test input strings and prints:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
If you really wanted to use awk, you could (this uses gensub, which is specific to GNU awk), but it's not the most robust or suitable tool:
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
       h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
       printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention the position is not known, and assuming the fields can be in any order, we use one regex to extract id and another to get hwVersion, then we print them out in the given format. If the values could be something other than decimal digits as in your example, the [0-9]+ bit would need to reflect that.
And for the fun of it (this preserves the order of entries from the file), in sed:
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".

Numeric sort within a string of text

I tried some sort examples but can't find a way to solve this. I think I should find the right separator and then sort numerically, but it doesn't work the way I want.
This is my file:
abc_bla_bla_bla_reg0_bla_reg_1_0
abc_bla_bla_bla_reg0_bla_reg_5_0
abc_bla_bla_bla_reg0_bla_reg_2_0
abc_bla_bla_bla_reg0_bla_reg_10_0
abc_bla_bla_bla_reg0_bla_reg_15_0
abc_bla_bla_bla_reg2_bla_reg_15_0
abc_bla_bla_bla_reg2_bla_reg_9_0
abc_bla_bla_bla_reg2_bla_reg_7_0
abc_bla_bla_bla_reg3_bla_reg_26_0
abc_bla_bla_bla_reg3_bla_reg_3_0
abc_bla_bla_bla_reg3_bla_reg_5_0
And this is my desired result:
abc_bla_bla_bla_reg0_bla_reg_1_0
abc_bla_bla_bla_reg0_bla_reg_2_0
abc_bla_bla_bla_reg0_bla_reg_5_0
abc_bla_bla_bla_reg0_bla_reg_10_0
abc_bla_bla_bla_reg0_bla_reg_15_0
abc_bla_bla_bla_reg2_bla_reg_7_0
abc_bla_bla_bla_reg2_bla_reg_9_0
abc_bla_bla_bla_reg2_bla_reg_15_0
abc_bla_bla_bla_reg3_bla_reg_3_0
abc_bla_bla_bla_reg3_bla_reg_5_0
abc_bla_bla_bla_reg3_bla_reg_26_0

$ sort -t_ -k5,5 -k8,8n file
abc_bla_bla_bla_reg0_bla_reg_1_0
abc_bla_bla_bla_reg0_bla_reg_2_0
abc_bla_bla_bla_reg0_bla_reg_5_0
abc_bla_bla_bla_reg0_bla_reg_10_0
abc_bla_bla_bla_reg0_bla_reg_15_0
abc_bla_bla_bla_reg2_bla_reg_7_0
abc_bla_bla_bla_reg2_bla_reg_9_0
abc_bla_bla_bla_reg2_bla_reg_15_0
abc_bla_bla_bla_reg3_bla_reg_3_0
abc_bla_bla_bla_reg3_bla_reg_5_0
abc_bla_bla_bla_reg3_bla_reg_26_0
That may or may not produce the output you expect if the regN value in the 5th column can include 2-digit numbers.
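If regN can reach two digits, one option is to sort the 5th field numerically starting at its 4th character, that is, just after the literal reg. This is only a sketch: it assumes the prefix is always exactly reg, the wrapper name sort_regs is hypothetical, and the sample lines below are made up rather than taken from the question:

```shell
# -k5.4,5n : numeric key starting at character 4 of field 5 (skips "reg")
# -k8,8n   : then a numeric key on field 8
sort_regs() {
    sort -t_ -k5.4,5n -k8,8n "$@"
}

printf '%s\n' \
    abc_bla_bla_bla_reg10_bla_reg_2_0 \
    abc_bla_bla_bla_reg2_bla_reg_15_0 \
    abc_bla_bla_bla_reg2_bla_reg_9_0 |
    sort_regs
```

With these three lines, the two reg2 lines come first (9 before 15) and the reg10 line comes last, whereas a plain -k5,5 key would place reg10 before reg2.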

Using awk
$ awk -F"_" 'function print_array(arr,max,  i){ for(i=1; i<=max; i++) if(i in arr){ print arr[i]; delete arr[i] } } key==$5{a[$8]=$0; max=$8>max?$8:max} key!=$5{print_array(a,max); key=$5; a[$8]=$0; max=$8} END{print_array(a,max)}' file
Output:
abc_bla_bla_bla_reg0_bla_reg_1_0
abc_bla_bla_bla_reg0_bla_reg_2_0
abc_bla_bla_bla_reg0_bla_reg_5_0
abc_bla_bla_bla_reg0_bla_reg_10_0
abc_bla_bla_bla_reg0_bla_reg_15_0
abc_bla_bla_bla_reg2_bla_reg_7_0
abc_bla_bla_bla_reg2_bla_reg_9_0
abc_bla_bla_bla_reg2_bla_reg_15_0
abc_bla_bla_bla_reg3_bla_reg_3_0
abc_bla_bla_bla_reg3_bla_reg_5_0
abc_bla_bla_bla_reg3_bla_reg_26_0
Explanation:
awk -F"_" '
# Print the stored records in index order from 1 to max, emptying the array as we go.
function print_array(arr, max,    i)
{
    for (i = 1; i <= max; i++)
        if (i in arr) { print arr[i]; delete arr[i] }
}
# Same group as the previous line: store the record under its 8th field.
key == $5 { a[$8] = $0; max = ($8 > max ? $8 : max) }
# New group: print the previous group, then start collecting the new one.
key != $5 { print_array(a, max); key = $5; a[$8] = $0; max = $8 }
# Print the last group.
END { print_array(a, max) }' file
key==$5{a[$8]=$0; max=$8>max?$8:max}: key holds the 5th field of the current record set, e.g. reg0 for line one. Initially key is empty, so the first line falls through to the key!=$5 rule below. If the 5th field $5 matches the key set by an earlier line, the record is stored in the array under the value of field 8, which is the value you want to sort by. This works irrespective of the number of digits in $8.
key!=$5{print_array(a,max); key=$5; a[$8]=$0; max=$8}: If key does not match the 5th field, a new record set has started, so before going further we print the array stored for the previous record set.
END{print_array(a,max)}: Prints the last record set.

sort -V file
-V, --version-sort
natural sort of (version) numbers within text

bash: identifying the first value list that also exists in another list

I have been trying to come up with a nice way in BASH to find the first entry in list A that also exists in list B. Where A and B are in separate files.
A B
1024dbeb 8e450d71
7e474d46 8e450d71
1126daeb 1124dae9
7e474d46 7e474d46
1124dae9 3217a53b
In the example above, 7e474d46 is the first entry in A that also appears in B, so I would return 7e474d46.
Note: A can be millions of entries, and B can be around 300.

awk is your friend.
awk 'NR==FNR{a[$1]++;next}{if(a[$1]>=1){print $1;exit}}' file2 file1
7e474d46
Note: the previous version of this answer assumed that the values are listed in a single file as two columns. This one was written after you clarified in a comment that the values are supplied as two files.
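For readers unfamiliar with the NR==FNR idiom, here is a commented sketch of the same approach. The wrapper name first_common and the array name seen are hypothetical. It tests membership with in rather than a[$1]>=1, because merely referencing a[$1] creates an element, which could add up when A has millions of entries:

```shell
# first_common B A: print the first line of A that also occurs in B.
# first_common is a hypothetical wrapper name.
first_common() {
    awk 'NR == FNR { seen[$1]; next }   # first file (B): remember every value
         $1 in seen { print $1; exit }  # second file (A): stop at the first hit
        ' "$1" "$2"
}
```

NR counts records across all files while FNR restarts for each file, so NR == FNR holds only while the first file is being read. Called as first_common B A with the sample data from the question, this should print 7e474d46.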

A few points are not clear, though, such as what should happen if a value in list A occurs two or more times (in your example, 7e474d46 occurs twice). Assuming you need all the line numbers of list A whose values are also present in list B, the following will help.
awk '{col1[$1]=col1[$1]?col1[$1]","FNR:FNR;col2[$2];} END{for(i in col1){if(i in col2){print col1[i],i}}}' Input_file
Or, in non-one-liner form:
awk '{
    col1[$1]=col1[$1]?col1[$1]","FNR:FNR;
    col2[$2];
}
END{
    for(i in col1){
        if(i in col2){
            print col1[i],i
        }
    }
}
' Input_file
The above code produces the following output:
3,5 7e474d46
6 1124dae9
This creates an array col1 indexed by the first field and an array col2 indexed by $2. col1's value is the line number (FNR) of the current line, appended to any line numbers already stored for the same index. In the END section, awk traverses col1 and checks whether each of its indexes is also present in col2; if so, it prints col1's value and its index.

If you have GNU grep, you can try this:
grep -m 1 -f B A
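Note that -f B treats each line of B as a regular expression and matches it anywhere within a line of A. For fixed strings like these, adding -F (fixed strings) and -x (whole-line match) is probably closer to the intent. A sketch that recreates the sample lists in temporary files; the variable names are made up, and -m is not part of POSIX grep, although GNU and BSD grep support it:

```shell
# Recreate the sample lists from the question in temporary files.
dir=$(mktemp -d)
printf '%s\n' 1024dbeb 7e474d46 1126daeb 7e474d46 1124dae9 > "$dir/A"
printf '%s\n' 8e450d71 8e450d71 1124dae9 7e474d46 3217a53b > "$dir/B"

# -F: patterns are fixed strings, -x: match whole lines only,
# -m 1: stop after the first matching line of A.
first=$(grep -F -x -m 1 -f "$dir/B" "$dir/A")
echo "$first"

rm -rf "$dir"
```

For the sample data this prints 7e474d46.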

sorting text lines Google Apps Script

Sorry for this extreme beginner question. I have a string variable originaltext containing some multiline text. I can convert it into an array of lines like so:
lines = originaltext.split("\n");
But I need help sorting this array. This DOES NOT work:
lines.sort;
The array remains unsorted.
And an associated question. Assuming I can sort my array somehow, how do I then convert it back to a single variable with no separators?

Your only issue is a small one - sort is actually a method, so you need to call lines.sort(). In order to join the elements together, you can use the join() method:
var originaltext = "This\nis\na\nline";
var lines = originaltext.split("\n");
lines.sort();
var joined = lines.join("");
