Rearranging string with shell script - bash

Having the following string:
>db|version.x|name of entry 1
properties of entry1
>db|version.x|name of entry 2
properties of entry2
In the lines starting with ">", I would like to move the number to the front, separated by a space from the rest of the old text, leaving properties of the entry intact, so that it looks like this:
>1 db|version.x|name of entry 1
properties of entry1
>2 db|version.x|name of entry 2
properties of entry2

awk '{sub(/^>/, ">"$NF" ")}1' File
>1 db|version.x|name of entry 1
properties of entry1
>2 db|version.x|name of entry 2
properties of entry2

Related

Combining columns of multiple files while matching the order based on a different column and adding 0 for missing values

I have multiple files I'd like to combine in a weird way.
Let's say this is one of my files:
1 group1
5 group5
6 group9
10 group3
2 group10
And this is another file:
0.1 group3
3 group5
52 group2
11 group4
8 group10
I'd like to combine these files into a new file such that I get:
File1 File2
group1 1 0
group2 0 52
group3 10 0.1
group4 0 11
group5 5 3
group9 6 0
group10 2 8
So:
- The values from a column are combined based on the annotation in another column.
- If the file is missing value for a given annotation, it gets "0".
- Filename becomes the header, the "annotations" become the row names in the new file.
Is there a way to do this using bash scripting (or some other convenient way)? I have a few thousand of these files, so doing it manually is really not an option...
Thank you very much!
Edit: I guess I could follow some steps like:
1) I have a list of all possible annotations. For every file (iterating over each), I could check if an annotation exists, if not, I could insert a new line to the file:
0 annotation
2) I could sort every file alphabetically
3) Then I could merge them all into one file (and somehow figure out the header thing here)
Does anyone have ideas for any of these steps?
Here is one way:
awk 'FNR==1 { ++n }
     { a[$2,n] = $1; b[$2] }
     END {
         for (c in b) {
             for (i = 1; i <= n; i++)
                 $i = ((c,i) in a ? a[c,i] : 0)
             $1 = c OFS $1
             print
         }
     }' file1 file2 file3 ...
Hash the 1st field into an array keyed by the 2nd field plus the index of the file on the command line, and keep the unique annotations in a second array as a reference so that we can loop over the first array.
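A self-contained variant of the same idea that also emits the header row the question asks for, built from the input file names. The file names and the use of `FILENAME` for the header are assumptions added here for illustration; note that `for (c in seen)` visits annotations in an unspecified order.

```shell
# Hypothetical sample inputs from the question.
cat > file1 <<'EOF'
1 group1
5 group5
6 group9
10 group3
2 group10
EOF
cat > file2 <<'EOF'
0.1 group3
3 group5
52 group2
11 group4
8 group10
EOF

awk '
FNR == 1 { fname[++n] = FILENAME }      # one output column per input file
{ a[$2, n] = $1; seen[$2] }             # value keyed by (annotation, file index)
END {
    hdr = ""
    for (i = 1; i <= n; i++)
        hdr = hdr (i > 1 ? OFS : "") fname[i]
    print hdr
    for (c in seen) {                   # iteration order is unspecified
        line = c
        for (i = 1; i <= n; i++)
            line = line OFS ((c, i) in a ? a[c, i] : 0)  # 0 for missing values
        print line
    }
}' file1 file2 > merged.txt
cat merged.txt
```

Pipe the data rows through `sort` afterwards if a deterministic row order matters.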

Bash/Linux: Merge rows on match; add last field

I have a set of wireless stats from various branches in the organization:
branchA,171
branchA_guests,1020
branchB,2019
branchB_guests,3409
There are 2 entries for each branch: the 1st is internal wifi usage, the next is guest usage. I'd like to merge them into a single total, as we don't care whether it's guests or staff, etc.
Desired output should be:
branchA,1191
branchB,5428
The input file has a header and some markdown, so the script has to match on the branch name rather than assume that the next line is related. The data could be cleaned first, but in my opinion matching makes this more bulletproof.
Here is my approach: Remove the _guests and tally:
# file: tally.awk
BEGIN {
FS = OFS = ","
}
{
sub(/_guests/, "", $1) # Remove _guests
stat[$1] += $2 # Tally
}
END {
for (branch in stat) {
printf "%s,%d\n", branch, stat[branch]
}
}
Running the script:
awk -f tally.awk data.txt
Notes
In the BEGIN pattern, I set the field separator (FS) and output field separator (OFS) both to a comma
Next, for each line, I remove the _guests part and tally the count
Finally, at the end of the file, I print out the counts
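One caveat: the question says the input also contains a header and some markdown, and the script above would feed those lines into the tally too. A sketch that keeps only rows whose second field is numeric (the numeric test, and what the noise rows look like, are assumptions):

```shell
# Hypothetical input with a markdown header row mixed in.
cat > data.txt <<'EOF'
| Branch | Sessions |
branchA,171
branchA_guests,1020
branchB,2019
branchB_guests,3409
EOF

awk -F, -v OFS=, '
$2 ~ /^[0-9]+$/ {            # keep only rows with a numeric 2nd field (assumption)
    sub(/_guests$/, "", $1)  # fold the guest row into the branch total
    stat[$1] += $2
}
END {
    for (branch in stat)
        print branch, stat[branch]
}' data.txt > totals.txt
cat totals.txt
```

The header row has no comma, so its `$2` is empty, fails the numeric test, and is skipped.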

How do I concatenate lines from a text file into one big string?

I have an input file that looks like this:
3 4
ATCGA
GACTTACA
AACTGTA
ATC
...and I need to concatenate all lines except for the first "3 4" line. Is there a simple solution? I've tried manipulating getline() somehow, but that has not worked for me.
Edit: The number of lines will not be known initially, so the solution has to handle an arbitrary number of lines.
If you just want to concatenate 2 strings into 1, you can simply use the + operator,
e.g:
String a = "WAQAR MUGHAL";
String b = "check";
System.out.println(a + " " + b);
System.out.println("WAQAR MUGHAL" + " " + "CHECK");
Output:
WAQAR MUGHAL check
WAQAR MUGHAL CHECK
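Since the question is about a text file in a shell context, here is an awk sketch of the same idea: skip the first record and append every other line to one string (the file name `seq.txt` is hypothetical). No recursion is needed; awk reads any number of lines, so the count need not be known in advance.

```shell
# Hypothetical input file from the question.
cat > seq.txt <<'EOF'
3 4
ATCGA
GACTTACA
AACTGTA
ATC
EOF

# NR > 1 skips the first "3 4" line; every other line is appended to s,
# which is printed once at end of input.
awk 'NR > 1 { s = s $0 } END { print s }' seq.txt
```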

Subtract fields from duplicate lines

I have a file with two columns. The first column is a string, the second is a positive number. If the first field (string) has no duplicate in the file (i.e. it is unique), I want to copy that line as-is to (let's say) result.txt. If the first field does have a duplicate, then I want to subtract the second fields (numbers) of those duplicated lines and save that in result.txt as well. A first field will have one duplicate at most, no more than that. So, the output file will have all lines whose first field is unique, plus one line per duplicated name carrying the subtracted value. The files are not sorted. Here is an example:
INPUT FILE:
hello 7
something 8
hey 9
hello 8
something 12
nathanforyou 23
OUTPUT FILE that I need (result.txt):
hello 1
something 4
hey 9
nathanforyou 23
I can't have negative numbers in the resulting file, so I have to subtract the smaller number from the bigger one. What have I tried so far? All kinds of sort (I figured out how to find the non-duplicate lines and put them in a separate file, but choked on the duplicate subtraction), arrays in awk (I saved all lines in an array and ran a "for" loop over it... the problem is that I don't know how to get the second field out of an array element that holds a whole line), etc. By the way, the problem is more complicated than I described (I have four fields, the first two are the same, and so on), but in the end it comes down to this.
$ cat tst.awk
{ val[$1,++cnt[$1]] = $2 }
END {
for (name in cnt) {
if ( cnt[name] == 1 ) {
print name, val[name,1]
}
else {
val1 = val[name,1]
val2 = val[name,2]
print name, (val1 > val2 ? val1 - val2 : val2 - val1)
}
}
}
$ awk -f tst.awk file
hey 9
hello 1
nathanforyou 23
something 4
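The run above can be reproduced end to end; a self-contained sketch follows (the trailing `sort` is an addition here, only to make the output order deterministic, since awk's `for (name in cnt)` order is unspecified):

```shell
# Input from the question.
cat > input.txt <<'EOF'
hello 7
something 8
hey 9
hello 8
something 12
nathanforyou 23
EOF

awk '
{ val[$1, ++cnt[$1]] = $2 }             # store each value under (name, occurrence)
END {
    for (name in cnt)
        if (cnt[name] == 1)
            print name, val[name, 1]    # unique: copy through unchanged
        else {
            v1 = val[name, 1]; v2 = val[name, 2]
            print name, (v1 > v2 ? v1 - v2 : v2 - v1)   # absolute difference
        }
}' input.txt | sort > result.txt
cat result.txt
```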

Bash/Sed/Awk - parsing CSV from column ==x until column==x again

I've got a rather large set of CSV's that I need to parse. Most of it is extremely easy, however I've got some 'group' objects with embedded objects that I need to extract correctly.
The file looks something like this
Test_GroupA,Group,-,-,-,-,NodeA,,-,
,,,,,,NodeB,,,
,,,,,,NodeC,,,
,,,,,,NodeD,,,
,,,,,,NodeE,,,
Test_GroupB,Group,-,-,-,-,NodeA,,-,
,,,,,,NodeB,,,
,,,,,,NodeC,,,
,,,,,,NodeX,,,
,,,,,,NodeE,,,
,,,,,,NodeF,,,
So, as you can see, I need something along the lines of:
awk -F"[,|]" '$2=="Group" {
    # pseudo code from here:
    print "create group", $1
    print "add member in $7 to the group found in $1 of the first row"
    # continue until the next row where $2=="Group", then loop
}'
This is perplexing me greatly :)
Edit::
It seems a lot of the values are somewhat bogus and contain '-' when they're blank instead of just being ,,
Something like
sed 's/,-,/,,/g'
should replace them, I'd think; however, I think I need a leading wildcard.
New example:
grp-ext-test-test,Group,-,-,-,-,Net_10.10.10.10,,-,
,,,,,,Net_10.101.10.10,,,
,,,,,,ws-ext-test-10.102,,,
,,,,,,ws-ext-test-10.103,,,
,,,,,,ws-ext-test-10.104,,,
,,,,,,ws-ext-test-10.105,,,
,,,,,,ws-ext-test-10.106,,,
,,,,,,ws-ext-test-10.107,,,
,,,,,,ws-ext-test-10.108,,,
,,,,,,ws-ext-test-10.108,,,
Running the new string on it only produces:
create group grp-ext-test-test
You could try something like this and adapt as required:
awk -F, '$2=="Group"{g=$1; print "create group",g}{print "add " $7 " to " g}' file
Output:
create group Test_GroupA
add NodeA to Test_GroupA
add NodeB to Test_GroupA
add NodeC to Test_GroupA
add NodeD to Test_GroupA
add NodeE to Test_GroupA
create group Test_GroupB
add NodeA to Test_GroupB
add NodeB to Test_GroupB
add NodeC to Test_GroupB
add NodeX to Test_GroupB
add NodeE to Test_GroupB
add NodeF to Test_GroupB
---edit---
To check if the contents of $7 are valid you could try something like:
awk -F, '$2=="Group"{ g=$1; print "create group",g } $7!="-"{print "add " $7 " to " g}' file
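Both commands can be checked against the sample data; a self-contained sketch using a trimmed version of the first example (the file name `groups.csv` is hypothetical, and an extra `$7 != ""` guard is added here to also skip rows with an empty 7th column):

```shell
# Hypothetical sample, trimmed from the question.
cat > groups.csv <<'EOF'
Test_GroupA,Group,-,-,-,-,NodeA,,-,
,,,,,,NodeB,,,
Test_GroupB,Group,-,-,-,-,NodeA,,-,
,,,,,,NodeX,,,
EOF

# Remember the current group name in g; every row (including the "Group" row
# itself) contributes its 7th field as a member, unless it is "-" or empty.
awk -F, '
$2 == "Group"          { g = $1; print "create group", g }
$7 != "-" && $7 != ""  { print "add " $7 " to " g }
' groups.csv > out.txt
cat out.txt
```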
