Parse a log file and store the values in an object in Ruby - ruby

I have a log file that contains the following data:
2015-07-06 11:07:24 +0522 [ERROR]
2015-07-06 11:07:29 +0522 [ERROR] index=healthe-int-legacy host=kdatamap.abc.com com.rp.keplar.collector.CollectorException: Could not process additional data, connection lost to data collector service
I want to store the data in separate fields in a database: date, time, index value, and the error information (for example 'Could not process additional data, connection lost to data collector service'). How can I parse the file so that it is easy to store in the DB? Please guide me.

You will want to read the documentation for the very capable File and String classes.
Consider this rather quick hack:
File.foreach("/your/file.dat") do |line|
  arr = line.split
  next if arr.size < 5   # skip lines that have no index=... field
  puts "date = #{arr[0]}"
  puts "time = #{arr[1]}"
  puts "index = #{arr[4].split('=')[1]}"
end
It does not take into account that the file might not exist or that the lines might be laid out differently. Have a look at regular expressions for a more robust (but unfortunately harder to read) matching algorithm.
Basic I/O is described at http://ruby-doc.com/docs/ProgrammingRuby/html/tut_io.html.
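The regular-expression approach mentioned above could look roughly like the sketch below. It is written in Python rather than Ruby purely as an illustration; the pattern itself carries over to Ruby almost unchanged, with named groups written as (?<name>...) instead of (?P<name>...). The field names, the parse_line helper, and the assumption that the error text is whatever follows the exception class name and its colon are mine, inferred from the two sample lines in the question.

```python
import re

# Pattern inferred from the two sample lines in the question (an assumption,
# not a specification of the log format). Everything after [LEVEL] is optional.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<tz>[+-]\d{4})\s+"
    r"\[(?P<level>\w+)\]"
    r"(?:\s+index=(?P<index>\S+))?"
    r"(?:\s+host=(?P<host>\S+))?"
    r"(?:\s+(?P<exception>[\w.$]+):\s*(?P<message>.*))?"
    r"\s*$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None
```

Fields that are absent from a line (such as index on the first sample line) come back as None, so each dict can be mapped onto database columns without special-casing short lines.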


How to use entrezpy and Biopython Entrez libraries to access ClinVar data from genomic position of variant

[Disclaimer: I published this question 3 weeks ago on Biostars and have had no answers yet. I would really like to get some ideas or discussion towards a solution, so I am posting it here as well.
Biostars post link: https://www.biostars.org/p/447413/]
For one of my PhD projects, I would like to access all variants in the ClinVar database that are at the same genomic position as the variant in each row of an input GSvar file. The language constraint is Python.
So far I have used the entrezpy module, specifically entrezpy.esearch.esearcher. For more on entrezpy, see https://entrezpy.readthedocs.io/en/master/
From the entrezpy docs I followed this guide to retrieve UIDs using the genomic position of a variant: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html. In code:
# first get UIDs for ClinVar records at the same position
# credits: https://entrezpy.readthedocs.io/en/master/tutorials/esearch/esearch_uids.html
chr = variants["chr"].split("chr")[1]
start, end = str(variants["start"]), str(variants["end"])
es = entrezpy.esearch.esearcher.Esearcher('esearcher', self.entrez_email)
genomic_pos = chr + "[chr]" + " AND " + start + ":" + end  # + "[chrpos37]"
entrez_query = es.inquire(
    {'db': 'clinvar',
     'term': genomic_pos,
     'retmax': 100000,
     'retstart': 0,
     'rettype': 'uilist'})  # 'usehistory': False
entrez_uids = entrez_query.get_result().uids
Then I have used Entrez from BioPython to get the available ClinVar records:
# process each VariationArchive of each UID
handle = Entrez.efetch(db='clinvar', id=current_entrez_uids, rettype='vcv')
clinvar_records = {}
tree = ET.parse(handle)
root = tree.getroot()
This approach works. However, it has two main drawbacks:
- entrezpy fills up my log file by recording every interaction with Entrez, which makes the log too big to be read by the hospital collaborator, who is the variant curator.
- The entrezpy call entrez_query.get_result().uids returns all UIDs retrieved so far across all requests (say, one request per variant in the GSvar file), so retrieval is space inefficient: the entrez_uids list grows quickly as I process all variants from a GSvar file. The simple solution I have implemented is to check which UIDs are new in the current request and pass only those to Entrez.efetch(). However, I still need to keep all UIDs seen for previous variants in order to know which ones are new. I do this in code as follows:
# first snippet's first lines go here
entrez_uids = entrez_query.get_result().uids
current_entrez_uids = [uid for uid in entrez_uids if uid not in self.all_entrez_uids_gsvar_file]
self.all_entrez_uids_gsvar_file += current_entrez_uids
Does anyone have suggestions on how to address these two drawbacks?
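As a minimal sketch of the bookkeeping described above: a set makes the "have I seen this UID?" check constant-time instead of scanning a growing list, although every seen UID is still kept in memory. The UidTracker class and quiet_logger helper are hypothetical names, not part of entrezpy or Biopython. The logging part only helps if entrezpy emits its messages through Python's standard logging module; the logger name to pass in (for example "entrezpy") is an assumption that would need checking against the library.

```python
import logging


def quiet_logger(name, level=logging.WARNING):
    """Raise the threshold of a named logger so routine messages are dropped.

    Which logger name a library uses is an assumption to verify; the name
    is therefore a parameter rather than hard-coded.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


class UidTracker:
    """Remember every UID seen so far and report only the new ones."""

    def __init__(self):
        self._seen = set()

    def new_uids(self, uids):
        # Preserve the input order while filtering out already-seen UIDs.
        fresh = []
        for uid in uids:
            if uid not in self._seen:
                self._seen.add(uid)
                fresh.append(uid)
        return fresh
```

With this, the two lines that compute current_entrez_uids and extend self.all_entrez_uids_gsvar_file would collapse into a single call such as tracker.new_uids(entrez_uids), with the tracker created once per GSvar file.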

[Snowflake-jdbc] It hangs when reading from the ResultSet returned by connection.getMetaData().getColumns(...)

I am trying to test a JDBC connection to Snowflake with the code below:
Connection conn = .......
.......
// 1. Set the parameters to get the metadata of table TAB1
ResultSet rs = conn.getMetaData().getColumns(null, "PUBLIC", "TAB1", null);
// 2. It hangs here if the first parameter above is null; with the correct database name it works fine
while (rs.next()) {
    System.out.println("precision:" + rs.getInt(7)
        + ",col type name:" + rs.getString(6)
        + ",col type:" + rs.getInt(5)
        + ",col name:" + rs.getString(4)
        + ",CHAR_OCTET_LENGTH:" + rs.getInt(16)
        + ",buf LENGTH:" + rs.getString(8)
        + ",SCALE:" + rs.getInt(9));
}
.......
When I debug the code above in IntelliJ IDEA, the debugger can't show the details of the object; it always shows "Evaluating..."
The JDBC driver I am using is snowflake-jdbc-3.12.5.jar.
Is this a bug?
When the catalog (database) argument is null, the JDBC code effectively runs the following SQL, which you can verify in your Snowflake account's Query History UI or views:
show columns in account;
This is an expensive metadata query because it has no filters and a very wide scope (columns across the entire account).
Depending on how many databases exist in your organization's account, it may take several minutes or up to an hour to return results, which explains the apparent "hang". In a simple test with 50k+ tables spread across 100+ databases and schemas, it took at least 15 minutes to return results.
Regarding "I debug the codes above in Intellij IDEA, and find that the debugger can't get the details of the object, it always shows 'Evaluating...'":
This may be a quirk of your IDE, but in a pinch you can use the Dump Threads (Ctrl + Escape, or Ctrl + Break) option in IDEA to capture a single thread dump. This should help show that the JDBC client thread isn't hanging (as in, it's not locked or starved); it is only waiting for the server to send back results.
There is no issue with the 3.12.5 jar. I just tested the same version in Eclipse and could inspect all the objects. It could be an issue with your IDE.
ResultSet columns = metaData.getColumns(null, null, "TESTTABLE123", null);
while (columns.next()) {
    System.out.print("Column name and size: " + columns.getString("COLUMN_NAME"));
    System.out.print("(" + columns.getInt("COLUMN_SIZE") + ")");
    System.out.println(" ");
    System.out.println("COLUMN_DEF : " + columns.getString("COLUMN_DEF"));
    System.out.println("Ordinal position: " + columns.getInt("ORDINAL_POSITION"));
    System.out.println("Catalog: " + columns.getString("TABLE_CAT"));
    System.out.println("Data type (integer value): " + columns.getInt("DATA_TYPE"));
    System.out.println("Data type name: " + columns.getString("TYPE_NAME"));
    System.out.println(" ");
}

Strict searching against two different files

I have two questions regarding the following code:
import subprocess
macSource1 = (r"\\Server\path\name\here\dhcp-dump.txt")
macSource2 = (r"\\Server\path\name\here\dhcp-dump-ops.txt")
with open(r"specific-pcs.txt") as file:
    line = []
    for line in file:
        pcName = line.strip().upper()
        with open(macSource1) as source1, open(macSource2) as source2:
            items = []
            for items in source1:
                if pcName in items:
                    items_split = items.rstrip("\n").split('\t')
                    ip = items_split[0]
                    mac = items_split[4]
                    mac2 = ':'.join(s.encode('hex') for s in mac.decode('hex')).lower() # Puts the :'s between the pairs.
                    print mac2
                    print pcName
                    print ip
Firstly, as you can see, the script searches for the contents of "specific-pcs.txt" in macSource1 to get various details. How do I get it to search BOTH macSource1 and macSource2 (as the details could be in either file)?
Secondly, I need a stricter matching process: at the moment a machine called 'itroom02' will not only find its own details, but also return the details of another machine called '2nd-itroom02'. How would I do that?
Many thanks for your assistance in advance!
Chris.
Perhaps you should restructure it a bit more like this:
macSources = [r"\\Server\path\name\here\dhcp-dump.txt",
              r"\\Server\path\name\here\dhcp-dump-ops.txt"]
with open(r"specific-pcs.txt") as file:
    for line in file:
        # ....
        for target in macSources:
            with open(target) as source:
                for items in source:
                    # ....
There's no need to write e.g. line = [] immediately before for line in ...:, because the loop rebinds the name anyway.
As far as the "stricter matching" goes, since you don't give examples of the format of your files, I can only guess. You might want to try something like if items_split[1] == pcName: after you've done the split, instead of if pcName in items: before the split (assuming the name is in the second column; adjust accordingly if not).
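Putting both suggestions together, a sketch might look like the following. It is written for Python 3, whereas the question's code is Python 2. It assumes tab-separated lines with the IP in column 0 and the MAC in column 4 (as in the question) and the machine name in column 1 (the guess made above). The function names and the column parameters are hypothetical.

```python
def format_mac(raw):
    """Turn '001A2B3C4D5E' into '00:1a:2b:3c:4d:5e'."""
    return ":".join(raw[i:i + 2] for i in range(0, len(raw), 2)).lower()


def find_pcs(pc_names, source_paths, name_col=1, ip_col=0, mac_col=4):
    """Return (name, ip, mac) for rows whose name column equals a wanted name.

    Column positions are assumptions about the dump format; adjust as needed.
    """
    wanted = {name.strip().upper() for name in pc_names if name.strip()}
    needed = max(name_col, ip_col, mac_col)
    matches = []
    for path in source_paths:          # search every dump file in turn
        with open(path) as source:
            for line in source:
                fields = line.rstrip("\n").split("\t")
                if len(fields) <= needed:
                    continue           # skip short or malformed rows
                name = fields[name_col].strip().upper()
                if name in wanted:     # exact match, not substring
                    matches.append((name, fields[ip_col],
                                    format_mac(fields[mac_col])))
    return matches
```

Because the comparison is against the whole name field, 'ITROOM02' no longer matches '2ND-ITROOM02'. Reading the wanted names into a set first also means each dump file is read once rather than once per machine name.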

Print current frame during command line render?

Is there a way to print my own output during a command line render?
Let's say I don't need or want all the other output that Maya prints by default. I know you can change the verbosity level, but there are very specific things I'd like to output and I can't figure out how. I currently write the verbose output to a file, so I wanted to print in the terminal (I'm on a Mac) the frame that the render has currently reached.
This may be simple-minded, but here's what I tried:
Render -preFrame "print `currentTime -q`;" -s 1 -e 20 -rd /render/directory/ maya_file.mb
Obviously, -preFrame expects a string, and according to the docs it can take MEL commands, though presumably only certain ones. I'm assuming the currentTime command pulls its information from the timeline in Maya rather than querying it from the renderer itself. When I run the above command, it immediately prints -bash: currentTime: command not found, and soon afterwards the render fails or doesn't start.
Ideally, I'd like to print the following as it starts each frame:
"Started rendering frame XXXX at TIME GOES HERE", so that I can glance at the terminal and see whether the renderer has failed or stalled, which frame it has reached, and when it started that frame.
So my question is: seeing as currentTime is a MEL command used from within Maya, is there another way I could print this information?
Cheers,
Shannon
After many hours of searching for an answer, I found out that you can start Maya as an interactive shell. By doing this, I was able to source a script on startup and run whatever I wanted in memory, as if I had Maya open at the time.
/Applications/Autodesk/maya2014/Maya.app/Contents/MacOS/maya -prompt -script "/Volumes/raid/farm_script/setupRender.mel"
In the setupRender.mel file, I was able to assign variables containing the render options. I was also able to create a global variable for the frame number and increment it during the preFrame callback, like so:
int $startFrame = 100;
int $endFrame = 1110;
global int $frameCount = 0;
string $preRenderStatistics = "'global int $frameCount; $frameCount = " + $startFrame + ";'";
string $preFrameStatistics = "'print(\"Rendering frame: \" + $frameCount++)'";
string $additionalFlags = "";
string $sceneFilePath = "'/Volumes/path/to/file/intro_video_001.mb'";
system("Render -preRender " + $preRenderStatistics + " -preFrame " + $preFrameStatistics + " -s " + $startFrame + " -e " + $endFrame + " -x " + $additionalFlags + " " + $sceneFilePath);
This is a very simplified version of what I currently have, but hopefully this will help others if they stumble across it.
Take a look at the pre render layer MEL and/or pre render frame MEL fields of the Render Settings.
They expect MEL, so you'll either need to write it in MEL or wrap your Python in MEL. For such a simple use, I'd say just write it in MEL:
print `currentTime -q`

Password Changer using VAccess

Hey, I am working on a password changer. The user logs in (successfully), a global variable is loaded with the user's initials, and then a password-expired form is launched. On that form I try to use those initials to retrieve the user's info from the DB.
vaUserLog.FieldValue("USERINIT") = UserInitials
vaUserLog.GetEqual
vaStat = vaUserLog.Status
vaStat keeps coming back as status 4. I am using Pervasive v9. The connection with VAccess looks like this:
With vaUserLog
    .RefreshLocations = True
    .DdfPath = DataPath
    .TableName = "USERLOG"
    .Location = "USERLOG.MKD"
    .Open
    If .Status <> 0 Then
        ErrMsg = "Error Opening File " + .TableName + " - Status " + str$(.Status) + vbCrLf + "Contact IT Department"
    End If
End With
In the DB table, USERINIT is Char, 3. UserInitials is a String.
I'm probably missing something small but can't think of it right now. Any help is appreciated. Let me know if you require more info.
Cheers
Status 4 means that the record could not be found. In your case, the value being searched for could be wrong, the padding could differ (spaces versus binary zeros), or the UserInitials value might simply not be in the data file.
You can use MKDE Tracing to see what's actually being passed to the PSQL engine. Once you've done that, make sure the value you're using works through the Function Executor where you can open the file and perform a GetEqual.
Here are my suggestions:
- Make sure you're pointing to the right data files.
- Make sure you're passing the right value into the GetEqual (by setting the FieldValue).
