I have a huge file to read whose structure is:
[...]
(0,0,0,0,0): 5.00634e-33, 5.59393e-33, 6.24691e-33, 7.29338e-33,
(0,0,0,0,4): 7.77607e-33, 8.95879e-33, 9.65316e-33, 1.07434e-32,
(0,0,0,0,8): 1.20824e-32, 1.34983e-32, 1.49877e-32, 1.73061e-32,
(0,0,0,0,12): 1.919e-32, 2.15391e-32, 2.3996e-32, 2.67899e-32,
[...]
I'm interested in reading the values after the ":". Which format should I use in the read statement in Fortran 90?
I've tried with
read(1,'("(",I6,",",I6,",",I6,",",I6,",",I6,"):",F10.4,F10.4,F10.4,F10.4)')idx1,idx2,idx3,idx4,idx5,dummy1,dummy2,dummy3,dummy4
But I got a forrtl: severe (64): input conversion error
Since it appears that the items don't line up in columns, this is tricky to do with formats. I'd approach it this way:
read (55, '(A)') string
colon_pos = index (string, ":")
read (string (colon_pos+1:), * ) real1, real2, real3, real4
That is, read each line into a string, locate the colon, then use list-directed I/O to process the numeric values in the part of the string after the colon.
I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't choose a column number. I feel I need to search for "id" and "hwVersion". Any help is GREATLY appreciated.
Totally agree with #KamilCuk. More specifically
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
Not quite the specified output, but valid JSON.
More to the point, your input should probably be processed record by record, and my guess is that a two-column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777
Since the data looks like a mapping object, and each line is in fact valid JSON, something like this should do, if you don't mind using Python (which comes with JSON support):
import json
def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input as the string s and parse it as JSON into a dictionary d. Then we return a formatted string containing the double-quoted keys id and hwVersion, each followed by a colon and the double-quoted value of the corresponding key from d.
We can try this with these test input strings and prints:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
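As a minimal sketch of iterating over lines of input, the helper below (convert_lines is a name made up for this illustration) applies get_id_hw to every non-blank line of any iterable of lines. An io.StringIO object stands in for an open file here; a real file object or sys.stdin should behave the same way.

```python
import io
import json

def get_id_hw(s):
    # Same function as above, repeated so this sketch is self-contained.
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])

def convert_lines(lines):
    """Yield the reduced form of every non-blank line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield get_id_hw(line)

# io.StringIO is a stand-in for open("/your/file") or sys.stdin.
sample = io.StringIO(
    '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}\n'
    '{"id":"5555","name":"6666","hwVersion":"7777"}\n'
)
results = list(convert_lines(sample))
for r in results:
    print(r)
```

A line that lacks either key would raise a KeyError in this sketch; whether to skip or report such lines depends on your data.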
If you really wanted to use awk, you could (gensub requires GNU awk), but it's not the most robust or suitable tool:
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
       h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
       printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention the position is not known, and assuming the fields can appear in any order, we use one regex to extract id and another to get hwVersion, then print them in the given format. If the values could be something other than decimal digits as in your example, the [0-9]+ bit would need to reflect that.
And for the fun of it, in sed (this preserves the order of the entries in the file):
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".
I'm trying to read from a TXT file, do some calculations, and write the result to another TXT file. But when I read a character it changes to its ASCII code (e.g. '2' becomes 50), and when I write it to the other file the ASCII code is written instead. How can I get back the character I want?
int wf=FileOpen("wf.txt",FILE_WRITE|FILE_ANSI|FILE_TXT);
int rf=FileOpen("rf.txt",FILE_READ|FILE_ANSI|FILE_TXT);
str_size=FileReadInteger(rf,INT_VALUE); //the TXT I read is 1234
str=FileReadString(rf,str_size);
StringToCharArray(str,data1,0,StringLen(str));
RandonNum[0]= str[1];
RandonNum[1]= str[2];
RandonNum[2]= str[3];
FileWrite(wf,str[1],str[2],str[3]); //the TXT I write is 505152
FileReadInteger() is reserved for binary type files.
Unfortunately this is not explicitly stated in the documentation.
Use FileReadNumber() to read a number from txt file. It will return the number as a double, but it can be cast to an integer using a type cast (int)double_value.
I have a specific question for my project:
input = "3d6"
I want to convert parts of this string to integers. For instance, I want to use input[0] as an integer.
How can I do this?
There are two problems here:
How to convert a string to an integer
The most straightforward method is the Atoi (ASCII to integer) function in the strconv package, which takes a string of numeric characters and converts it to an integer for you.
How to extract meaningful components of a known string pattern
In order to use strconv.Atoi, we need the numeric characters of the input by themselves. There are lots of ways to slice and dice a string.
You can just grab the first and last characters directly: input[:1] and input[2:] are the ticket.
You could split the string into two strings on the character "d". Look at the Split function in the strings package.
For more complex problems in this space, regular expressions are used. They're a way to define a pattern the computer can look for. For example, the regular expression ^x(\d+)$ will match on any string that starts with the character x and is followed by one or more numeric characters. It will provide direct access to the numeric characters it found by themselves.
Go has first class support for regular expressions via its regexp package.
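The split and regular-expression options are not specific to Go, so here is a rough, language-neutral illustration written in Python rather than Go. The function names and the pattern ^(\d+)d(\d+)$ are assumptions made for this sketch; the pattern is adapted from the ^x(\d+)$ example above to fit inputs like "3d6".

```python
import re

def parse_by_split(text):
    """Split once on the separator "d" and convert both halves."""
    left, right = text.split("d", 1)  # "3d6" -> "3", "6"
    return int(left), int(right)

# Assumed pattern: one or more digits, a literal "d", one or more digits.
NUM_D_NUM = re.compile(r"^(\d+)d(\d+)$")

def parse_by_regex(text):
    """Use capture groups to pull out the two numeric parts."""
    match = NUM_D_NUM.match(text)
    if match is None:
        raise ValueError("not of the form <digits>d<digits>: {!r}".format(text))
    return int(match.group(1)), int(match.group(2))

print(parse_by_split("3d6"))
print(parse_by_regex("12d20"))
```

The regular-expression version also handles multi-digit numbers and rejects malformed input, which indexing single characters does not.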
For example,
package main
import (
	"fmt"
)
func main() {
	input := "3d6"
	i := int(input[0] - '0')
	fmt.Println(i)
}
Playground: https://play.golang.org/p/061miKcXdIF
Output:
3
In this project the user can type in a text (maximum 140 characters).
To enforce this limit I first used getline():
string text;
getline(cin, text);
text = text.substr(1, 140);
But in this case the result of cout << text << endl; is an empty string.
So I used cin.get() like this:
cin.get(text, 140);
This time I get this error: no matching function for call to ‘std::basic_istream::get(std::__cxx11::string&, int)’
Note that I have included <iostream>.
So the question is: why is this happening, and how can I fix it?
Your first approach is sound with one correction - you need to use
text = text.substr(0, 140);
instead of text = text.substr(1, 140);. Containers (including strings) in C++ are indexed from 0, so starting at position 1 drops the first character. The call itself is valid, but if the string happens to be only one character long, text.substr(1, 140) will not crash the program, yet it will not produce the desired output either.
According to this source, substr throws an out_of_range exception only if the starting position is larger than the string length. For a one-character string, position 1 equals the string length, so the call is well defined and returns an empty string, which is what both you and I see. I recommend you test it yourself in the interactive coding section following the link above.
Your second approach tried to pass a string to a function that expects a C-style character array. Again, more can be found here. As the error says, the compiler couldn't find a matching overload because the argument was a std::string and not a char array, and no implicit conversion from std::string to a char array exists. You could convert the string to a char array yourself, as for instance described in this post, but the first approach is much more in line with C++ practices.
Last note - currently you're only reading a single line of input, I assume you will want to change that.
I have a multiple sequence alignment file in which the lines from the different sequences are interspersed, as in the format output by Clustal and other popular multiple sequence alignment tools. It looks like this:
TGFb3_human_used_for_docking ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPY
tr|B3KVH9|B3KVH9_HUMAN ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPY
tr|G3UBH9|G3UBH9_LOXAF ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPY
tr|G3WTJ4|G3WTJ4_SARHA ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPY
TGFb3_human_used_for_docking LRSADTTHST-
tr|B3KVH9|B3KVH9_HUMAN LRSADTTHST-
tr|G3UBH9|G3UBH9_LOXAF LRSTDTTHST-
tr|G3WTJ4|G3WTJ4_SARHA LRSADTTHST-
Each line begins with a sequence identifier, and then a sequence of characters (in this case describing the amino acid sequence of a protein). Each sequence is split into several lines, so you see that the first sequence (with ID TGFb3_human_used_for_docking) has two lines. I want to convert this to a format in which each sequence has a single line, like this:
TGFb3_human_used_for_docking ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSADTTHST-
tr|B3KVH9|B3KVH9_HUMAN ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSADTTHST-
tr|G3UBH9|G3UBH9_LOXAF ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSTDTTHST-
tr|G3WTJ4|G3WTJ4_SARHA ALDTNYCFRNLEENCCVRPLYIDFRQDLGWKWVHEPKGYYANFCSGPCPYLRSADTTHST-
(In this particular example the sequences are almost identical, but in general they aren't!)
How can I convert from multi-line multiple sequence alignment format to single-line?
Looks like you need to write a script of some sort to achieve this. Here's a quick example I wrote in Python. It won't line the whitespace up prettily like in your example (if you care about that, you'll have to mess around with formatting), but it gets the rest of the job done:
# Create a dictionary to accumulate full sequences
full_sequences = {}
# Loop through the original file (replace test.txt with your file name)
# and add each line to the appropriate dictionary entry
with open("test.txt") as infile:
    for line in infile:
        line = [element.strip() for element in line.split()]
        if len(line) < 2:
            continue
        full_sequences[line[0]] = full_sequences.get(line[0], "") + line[1]
# Now loop through the dictionary and write each entry as a single line
# (note: this overwrites test.txt; use a different name to keep the original)
outstr = ""
with open("test.txt", "w") as outfile:
    for seq in full_sequences:
        outstr += seq + "\t\t" + full_sequences[seq] + "\n"
    outfile.write(outstr)
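If the column alignment does matter, one possible way to handle the formatting is to pad every identifier to the width of the longest one. The sketch below assumes a dictionary shaped like full_sequences above; format_aligned and its gap parameter are names invented for this illustration, and the sample data is shortened.

```python
def format_aligned(full_sequences, gap=2):
    """Return one line per sequence, with all sequences starting in the same column."""
    if not full_sequences:
        return []
    # Pad every identifier to the longest identifier plus a small gap.
    width = max(len(name) for name in full_sequences) + gap
    return [name.ljust(width) + seq for name, seq in full_sequences.items()]

# Hypothetical, shortened sample data for illustration only.
sample = {
    "TGFb3_human_used_for_docking": "ALDTNYLRSADTTHST-",
    "tr|B3KVH9|B3KVH9_HUMAN": "ALDTNYLRSADTTHST-",
}
aligned_lines = format_aligned(sample)
print("\n".join(aligned_lines))
```

The resulting lines could be written out with outfile.write("\n".join(aligned_lines) + "\n") in place of the tab-separated string.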