Performing MongoDB's cursor.forEach() in ruby - ruby

I started experimenting with Ruby's Sinatra a couple of days ago and I'm trying to query a MongoDB database. The find_one() method works very well, but when I try to get more than one document (i.e. when using find()), a cursor is returned. I'm used to using the cursor.forEach() method to iterate through all the returned documents, but as I am new to Ruby, I am having a hard time figuring out the equivalent.
Would be great if you can point me in the right direction, also if you know of a Mongo/Ruby command dictionary or cheat sheet, I would really appreciate it.
Some code to help with the matter:
#The following code is intentionally formatted the way it is, (i.e the case
#insensitive, the way I'm calling the database), all that is irrelevant,
#but there to show you what I'm doing; I might be screwing up somewhere.
#works fine, returns JSON of required document
settings.mongo_db['col'].find_one({"key" => /#{value}/i}).to_json
#returns cursor, need to iterate
settings.mongo_db['col'].find({"key" => /#{value}/i}).to_json
Your replies/thoughts are much appreciated.

Generally in Ruby you iterate with .each, but since you just want to return your cursor results as JSON, just turn the statement around:
JSON.generate( settings.mongo_db['col'].find({"key" => /#{value}/i}).to_a )
So that should serialize as an array of documents.
Also see other methods in the JSON package.
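To make both options concrete, here is a minimal sketch. It does not talk to MongoDB: FakeCursor is a hypothetical stand-in for whatever find() returns, and the two sample documents are made up. The only assumption about the real cursor is that it is Enumerable, which is what makes .each and .to_a available on it.

```ruby
require "json"

# Hypothetical stand-in for the cursor returned by find().
# Assumption: the real cursor is Enumerable too, so it offers
# the same each / to_a methods used below.
class FakeCursor
  include Enumerable

  def initialize(docs)
    @docs = docs
  end

  # Enumerable only needs each; to_a, map, select come for free.
  def each(&block)
    @docs.each(&block)
  end
end

# Made-up sample documents.
cursor = FakeCursor.new([{ "key" => "Foo" }, { "key" => "food" }])

# Option 1: visit one document at a time, the Ruby counterpart
# of cursor.forEach().
keys = []
cursor.each { |doc| keys << doc["key"] }

# Option 2: collect every document into an array and serialize
# the whole array in one call.
json = JSON.generate(cursor.to_a)
puts json
```

Keep in mind that to_a loads every matching document into memory, so for large result sets iterating with .each is probably the safer choice.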

Related

Confused about XPath Syntax

Problem Summary:
Hi, I'm trying to learn to use the Scrapy Framework for python (available at https://scrapy.org). I'm following along with a tutorial I found here: https://www.scrapehero.com/scrape-alibaba-using-scrapy/, but I was going to use a different site for practice rather than just copy them on Alibaba. My goal is to get game data from https://www.mlb.com/scores.
So I need to use XPath to tell the spider which parts of the HTML to scrape (I'm about halfway down that tutorial page on the scrapehero site, at the "Construct Xpath selectors for the product list" section). The problem is that I'm having a hell of a time figuring out what the syntax should actually be to get the pieces I want. I've been going over XPath examples all morning but I haven't been able to get it right.
Background info:
So what I want is- from https://www.mlb.com/scores, I want an xpath() command which will return an array with all the games displayed.
Following along with the tutorial, what I understand about how to do this is that I'd want to inspect the elements on the webpage, determine their class/id, and specify that in the xpath command.
I've tried a lot of variations to get the data but all are returning empty arrays.
I don't really have any training in XPath so I'm not sure if my syntax is just off somewhere or what, but I'd really appreciate any help on getting this command to return the objects I'm looking for. Thanks for taking the time to read this.
Code:
Here are some of the attempts that didn't work:
response.xpath("//div[@class='g5-component--mlb-scores__game-wrapper']")
response.xpath("//div[@class='g5-component]")
response.xpath("//li[@class='mlb-scores__list-item mlb-scores__list-item--game']")
response.xpath("//li[@class='mlb-scores__list-item']")
response.xpath("//div[@!data-game-pk-id > 0]")'
response.xpath("//div[contains(@class, 'g5-component')]")
Expected Results and Actual Results
I want an XPath command that returns an array containing a selector object for each game on the mlb.com/scores page.
So far I've been able to get generic returns that aren't actually what I want (I can get a selector that returns the whole page by just leaving out the predicates, but whenever I try to specify I end up with an empty array).
So for all my attempts I either get the wrong objects or an empty array.
You always need to check the HTML source code (Ctrl+U in a browser) for the data you need. For the MLB page you'll find that the content you want to parse is loaded dynamically using JavaScript.
You can try to use Scrapy-Splash to get the target content from your start_urls, or you can find the direct HTTP request used to get the information you want (using the Network tab of Chrome Developer Tools) and parse the JSON:
https://statsapi.mlb.com/api/v1/schedule?sportId=1,51&date=2019-06-26&gameTypes=E,S,R,A,F,D,L,W&hydrate=team(leaders(showOnPreview(leaderCategories=[homeRuns,runsBattedIn,battingAverage],statGroup=[pitching,hitting]))),linescore(matchup,runners),flags,liveLookin,review,broadcasts(all),decisions,person,probablePitcher,stats,homeRuns,previousPlay,game(content(media(featured,epg),summary),tickets),seriesStatus(useOverride=true)&useLatestGames=false&language=en&leagueId=103,104,420

Riak ruby client trying to delete CRDT map

I'm using the Ruby client (2.3.0) with Riak 2.0. I've created a CRDT bucket type of 'maps', which stores (surprise) maps.
Everything works including search, etc. but for the life of me I can't work out how to delete a map when I no longer need it.
I've tried this based on things I found:
robject = @bucket.get @key, type: 'maps'
robject.delete
This does not give an error, but the map is not removed from Riak; neither is it 'tombstoned' as I can still retrieve the data from it and the search index still has the data too.
I've also tried:
@bucket.delete @key, 'maps'
but this doesn't work either. It gives error "no implicit conversion of Symbol into Integer" and without 'maps' it doesn't work either.
Looking at the first option in the console, it looks to me it is accessing the correct object, but calling 'delete' on it seems to have no effect.
How do I correctly delete the map? At least if I can have it removed from indexing results would be a big step!
Thanks
d'oh, didn't read the docs correctly.
it's simply:
@bucket.delete @key, type: 'maps'
I missed the 'type:'
Silly

Path to dynamic object?

I have a system_settings table which has a key and value columns. The key looks something like general.site.something.config and the value is a simple string.
I'd like to have a static class which, upon initialization, reads the settings and caches the values. Furthermore, I'd like to be able to access the settings in an OO way, such as SystemSetting.CACHE.General.Site.Something.Config in order to pull back the value for that key. Basically turning the rows in the table into a tree.
Is there an easy way to do this in Ruby 1.8.7?
TL;DR: No. No easy (read 'built-in') way, at least.
The syntax you want is not the way things are done in Ruby (without over-plumbing, that is). To see the over-plumbing I'm referring to, have a look at the code I wrote for this example, which demonstrates some of the functionality you want. I wouldn't suggest using it though, and that's the same reason I'm not posting it here.
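For readers who want to see what that kind of plumbing might look like, here is a minimal sketch. It is not the code referred to above. The ROWS array is a made-up stand-in for the system_settings table, and SettingsNode and build_tree are hypothetical names. It was written against a current Ruby and has not been checked on 1.8.7.

```ruby
# Made-up rows standing in for the system_settings table:
# each entry is [key, value].
ROWS = [
  ["general.site.something.config", "some value"],
  ["general.site.title", "My Site"]
]

# Hypothetical wrapper: resolves unknown method calls as keys
# of the nested Hash it wraps (case-insensitively).
class SettingsNode
  def initialize(tree)
    @tree = tree
  end

  def method_missing(name, *args)
    key = name.to_s.downcase
    return super unless @tree.key?(key)
    value = @tree[key]
    # Branches are wrapped again so calls can be chained;
    # leaves are returned as plain values.
    value.is_a?(Hash) ? SettingsNode.new(value) : value
  end

  def respond_to_missing?(name, include_private = false)
    @tree.key?(name.to_s.downcase) || super
  end
end

# Turns dotted keys into a tree of nested Hashes.
def build_tree(rows)
  tree = {}
  rows.each do |key, value|
    parts = key.split(".")
    leaf = parts.pop
    node = parts.inject(tree) { |hash, part| hash[part] ||= {} }
    node[leaf] = value
  end
  tree
end

CACHE = SettingsNode.new(build_tree(ROWS))
puts CACHE.General.Site.Something.Config
```

The sketch assumes no key is both a value and a prefix of another key (for example general.site alongside general.site.title); that case would need extra handling. It also shows why this counts as over-plumbing: every lookup goes through method_missing, so a typo only surfaces as a NoMethodError at runtime.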

MongoDB find and remove - the fastest way

I have a quick question: what is the fastest way to grab and delete an object from a Mongo collection? Here is the code I currently have:
$cursor = $coll->find()->sort(array('created' => 1))->limit(1);
$obj = $cursor->getNext();
$coll->remove(array('name' => $obj['name']));
As you can see above, it grabs one document from the database and deletes it (so it isn't processed again). However fast this may be, I need it to perform faster. The challenge is that we have multiple processes doing this and processing what they have found, but sometimes two or more of the processes grab the same document, which creates duplicates. Basically I need to make sure a document can only be grabbed once. Any ideas would be much appreciated.
Peter,
It's hard to say what the best solution is here without understanding all the context - but one approach which you could use is findAndModify. This will query for a single document and return it, and also apply an update to it.
You could use this to find a document to process and simultaneously modify a "status" field to mark it as being processed, so that other workers can recognize it as such and ignore it.
There is an example here that may be useful:
http://docs.mongodb.org/manual/reference/command/findAndModify/
Use the findAndRemove function as documented here:
http://api.mongodb.org/java/current/com/mongodb/DBCollection.html
The findAndRemove function retrieves an object from the Mongo database and deletes it in a single (atomic) operation.
findAndRemove(query, sort[, options], callback)
The query object is used to retrieve the object from the database (see collection.find())
The sort parameter is used to sort the results (in case many were found)
I'm adding a new answer to highlight this fact:
As commented by @peterscodeproblems on the accepted answer, the native way to do this in MongoDB right now is to use
findAndModify(query=<document>, remove=True)
as pointed out by the documentation.
As it is native and atomic, I expect this to be the fastest way to do this.
I am new to mongodb and not entirely sure what your query is trying to do, but here is how I would do it
# suppose database is staging
# suppose collection is data
use staging
db.data.remove(<your_query_criteria>)
where <your_query_criteria> is a map and can contain any search criteria you want
Not sure if this would help you.

calling the column_name in the table while connecting via Oracle DB

I am trying to get values from a table (employee) by connecting to an Oracle database. Since there are hundreds of values in one column, I need to iterate over the table and get the exact value.
I have code that works if I use the index number, such as row[1], but I want to use the column name, such as "first name", instead of row[1]. Below is the code that I have, which works.
Code:
def load_borrower
  connection = OCI8.new('usrname', 'pwd', '//host:portno/sid')
  connection.exec("SELECT BI_PREFIX, BI_FNAME, BI_MNAME, BI_LNAME, B.BI_SUFFIX, BI_ID_TYPE, BI_ID_NUMBER, BI_DOB, B1.*, R.*, M.*, C.*, L.* FROM EMPLOYEE, SC_BORROWERPREF_NEW S1, BORROWER_NEW B, BORROWERPREF_NEW B1, RES_ADD R, MAIL_ADD M, CLOS_ADD C, LLORD_ADD L WHERE S2=SCENARIO_ID = S1.SCENARIO_ID AND S1.PREF_ID = B1.PREF_ID AND B1.BORROWER_ID = B.BORROWER_ID AND B1.PREF_ID = R.RES_PREF_ID AND B1.PREF_ID = M.MAIL_PREF_ID AND B1.PREF_ID = C.CLOS_PREF_ID AND B1.PREF_ID = L.LLORD_PREF_ID AND S.RELEASE_ID = '1' AND S.SCENARIO_NO = '2' ORDER BY S1.SC_BORROWERPREF_ID") do |row|
    $BI_PREFIX=row[0].to_s
    $BI_FNAME=row[1].to_s
    $BI_MNAME=row[2].to_s
    $BI_LNAME=row[3].to_s
    $BI_SUFFIX=row[4].to_s
    $BI_BI_ID_TYPE=row[5].to_s
    $BI_BI_ID_NUMBER=row[6].to_s
    $BI_DOB=row[7].to_s
    $BI_EMAIL=row[9].to_s
    $BI_CELL_PH=row[11].to_s
    $BI_WORK_PH=row[12].to_s
    $BI_PREF_CONT=row[13].to_s
    $BI_MAR_STATUS=row[16].to_s
    $BI_EMP_STATUS=row[23].to_s
    $BI_EDUC_YEARS=row[17].to_s
    $BI_NUM_DEPEND=row[21].to_s
  end
end
Now I'm running the above functions below
load_borrower
So the code above works fine right now. But as you can see, I am assigning the variables from the db table as row[5], row[24] and so on, which is very tedious and time-consuming even though it works. So I was wondering if there is any method or command to use the column name to get the value, such as row['Emp_id'], instead of having to find the index of every column.
I am not sure if this is a drawback of Ruby, as it treats the table from the db as an array, and maybe that's why we can't specify columns by name.
Firstly, it appears you are a bit confused by the boundaries and separations between the various bits of technology you are using. There is no Watir in the code you provided, NONE. It's all pure Ruby and a tiny bit of stuff from the OCI8 gem. A gem is a standard way that Ruby folks use to distribute code libraries and programs written in the Ruby language. See HERE for more info to better understand what a gem is and how they are used.
Watir is another Ruby gem that is for driving web browsers, and you might be using it elsewhere in your code, but it doesn't relate to this question or OCI8 other than both of them being Ruby code libraries distributed as gems. So let's leave it aside so as to not confuse things.
The behavior you are seeing is how the OCI8 gem works, NOT anything to do with Ruby specifically. If you want something more elegant, then look into different gems that have been created for doing db access with Ruby, for example ActiveRecord, which was suggested in another answer already. The OCI8 Gem only returns an array if you have the results feeding into a block like you do in your current code. Otherwise the results are in an object called a Cursor, and you can use the cursor's fetch_hash method to get fetched data as a Hash. The hash keys are column names. (see http://ruby-oci8.rubyforge.org/en/api_OCI8Cursor.html)
Allow me to strongly recommend that you spend a little time learning a bit more about the Ruby language before you tear much further into your current project. Given the nature of the coding you seem to be doing, I'd advise you to read Brian Marick's book "Everyday Scripting with Ruby". That's going to give you a lot better understanding of the technology you are using, and you'll understand better when we toss around terms like 'hash' as I just did.
If you will allow it, here is a bit of general advice on how you are going about interfacing with your database. IMHO, you should be taking advantage of the db by constructing a query that returns JUST the data you want, instead of grabbing huge amounts of data and trying to parse through it manually. It's better use of the resource, uses less memory, takes less time to transfer the info from the db, and no matter how good your parsing code might be, it won't be as good as what the Oracle people wrote. Let the db do the heavy lifting, that's what it's there for.
If what you are dealing with here is data to drive your testing, or validate results, rather than construct one huge monolithic array, I'd recommend you use a much more modular approach. Use one global variable such as the EMP_ID of the current user you are testing with or against, and have the test code get query results for just the values needed for each validation, or a small logical group of validations like the parts of an address. It's a lot easier to build up stuff that way on a case by case basis working as you go, instead of trying to write the whole data retrieval bit in one giant piece that will be a nightmare to maintain.
As it stands, all your test code that is verifying function or validating how the site works is going to be tightly coupled to a big monolithic piece that fetches the data from the db. That creates a lot of dependencies and makes your test code hard to maintain. If you deal with things in a more modular way, where each validation step retrieves just the data it needs, then it's a lot easier to expand or modify your test code as the site or database changes.
If you had an array containing the column names then you could zip it up with the row array and build a hash:
Hash[column_names.zip( row )]
I would recommend using activerecord for this though.
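As a quick illustration of the zip approach, here is a small sketch that runs without a database. The column names are taken from the question; the row values are made up for the example.

```ruby
# Column names as they appear in the SELECT list of the question.
column_names = ["BI_PREFIX", "BI_FNAME", "BI_LNAME"]

# Made-up row, in the same order as the columns above.
row = ["Mr", "John", "Smith"]

# zip pairs each column name with the value at the same index,
# and Hash[] turns those pairs into a Hash.
record = Hash[column_names.zip(row)]

# Values can now be looked up by column name instead of index.
puts record["BI_FNAME"]
```

This only works if the array of names is in exactly the same order as the columns returned by the query, so it is safest to build it from the cursor's own column list rather than typing it by hand.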
This should work:
connection = OCI8.new('usrname', 'pwd', '//host:portno/sid')
cursor = connection.exec("SELECT BI_PREFIX ...")
# column names in the same order as the values in each fetched row
cols = cursor.get_col_names
while r = cursor.fetch
  $BI_PREFIX=r[cols.index('BI_PREFIX')].to_s
  ...
end
