Accessing New messages from Yahoo Mail using YQL - yahoo

I am currently writing a Java application in which I need to access the following information from users' Yahoo email messages (to display back to them). YQL looked like a quick, easy way to do this, but it is proving more difficult than expected. All tests were run in the YQL console at http://developer.yahoo.com/yql/console/, and I can replicate the same results using my webapp/OAuth.
To
FromEmail
FromName
Subject
Message
Date
MID
I am having trouble getting all of this into one query call (or even two, although I have not invested as much time researching that as a solution). In short, I currently have the following YQL:
SELECT folder.unread, message FROM ymail.msgcontent
WHERE (fid,mids )
IN
(SELECT folder.folderInfo.fid, mid
FROM ymail.messages
WHERE numMid=2
AND startMid=0)
AND fid='Inbox'
AND message.flags.isRead=0;
This works the best out of all the solutions I have, but it has one crippling flaw. Suppose we have 10 emails, E1 to E10, all unread except E2 and E3. After running that query, the result set shows only E1 rather than E1 and E4, presumably because the sub-select picks the first two messages (E1, E2) before the isRead filter is applied. Obviously this is not good. So I tried moving the "AND message.flags.isRead=0" condition into the sub-select:
SELECT folder.unread, message FROM ymail.msgcontent
WHERE (fid,mids )
IN
(SELECT folder.folderInfo.fid, mid
FROM ymail.messages
WHERE numMid=10
AND startMid=0
AND message.flags.isRead=0)
AND fid='Inbox'
However, this yields 'null' as a result. To debug this, I ran just the sub-select and came up with this:
SELECT folder.folderInfo.fid, mid
FROM ymail.messages
WHERE numMid=10
AND startMid=0
AND messageInfo.flags.isRead=0
This query returns 10 messages; unfortunately, on further review, it does not filter read from unread. After some more toying around, I changed the select statement to the following query:
SELECT folder.folderInfo.fid, messageInfo.mid
FROM ymail.messages
WHERE numMid=10
AND startMid=0
AND messageInfo.flags.isRead=0
Finally, this works! Except 47 emails are returned instead of just 10. To make things more interesting, I know for a fact that I have 207 unread emails in my inbox, so why 47? I have varied numMid (think of this as how many to show) from 0 to 300 and startMid (how many emails in to start, like an offset) from 0 to 300, and neither changes the result set count. Of course, when I change the select statement back from 'messageInfo.mid' to 'mid', numMid and startMid 'work' again, but the isRead filtering no longer does. I know there are other solutions where I set numMid=50000 or something along those lines, but YQL is a bit slow to begin with, and I can only imagine that this would slow it down significantly.
So the question is: has anyone done this? Is YQL just broken or unmaintained, or am I doing something wrong?
Thank you!
EDIT: Apparently this '47' that shows up is from the top 50 emails I have, 3 of which are read. I have yet to figure out how to 'trick' the YQL to allow me to override this 50 limit.

Bit late but I think I have the answer to your question.
Your query is almost correct except for the missing numInfo query parameter. Try changing the query to
SELECT *
FROM ymail.messages
WHERE numMid=75
AND startMid=0 AND numInfo=75
AND messageInfo.flags.isRead=0
Notice the numInfo=75. This should get you the last 75 unread messages. To read more about the different query parameters, refer to the official documentation.
EDIT 1
The table ymail.messages should return unread messages by default. There is a GroupBy parameter which you should use if you want to get unread messages. Find documentation here
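A minimal Python sketch of assembling the query suggested in this answer. The helper name build_unread_query is hypothetical; only the table and parameter names (ymail.messages, numMid, startMid, numInfo, messageInfo.flags.isRead) come from the thread, and keeping numInfo equal to numMid simply mirrors the answer's example rather than any documented rule.

```python
def build_unread_query(count: int, start: int = 0) -> str:
    """Return a YQL string for unread messages, with numInfo matched to numMid.

    Assumption: numInfo should equal numMid, as in the answer's numMid=75 example.
    """
    if count < 1 or start < 0:
        raise ValueError("count must be >= 1 and start must be >= 0")
    return (
        "SELECT * FROM ymail.messages "
        f"WHERE numMid={count} AND startMid={start} AND numInfo={count} "
        "AND messageInfo.flags.isRead=0"
    )


print(build_unread_query(75))
```

Whether startMid pages correctly once the isRead filter is in place is not confirmed anywhere in the thread, so treat the start argument as untested.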

Related

Youtube API - Subscriptions list returns different number of total results in set

I'm trying to get the complete list of my subscriptions. I've tried 3 methods, and each of them returns a different number of subscriptions, so I don't know what to do :)
1: Using Subscriptions: list with channel ID:
https://www.googleapis.com/youtube/v3/subscriptions?part=snippet&channelId=MY_CHANNEL_ID&maxResults=50&key=MY_API_KEY
"totalResults" is 942
2: Using Subscriptions: list with the "mine" flag, the "totalResults" field is 991.
Where do 49 subscriptions appear from?
3: Open a browser in incognito mode, go to
https://www.youtube.com/channel/MY_CHANNEL_ID
Click on the "Channels" tab, scroll down to the end of the subscriptions list, open the console and type something like this:
document.querySelectorAll("#contents #items > *").length
I see 1039. Where do the other 48 subscriptions come from?
And 1039 seems to be the most accurate number: the page shows 6 subscriptions per row and the last row has only 1 item, so 173*6+1 = 1039.
So the question is: how do I get all 1039 subscriptions through the API? And why does it return the wrong number of subscriptions?
You are using Subscriptions: list and shouldn't run into this kind of bug with totalResults. However, the endpoint may share the behavior documented for Search: list, where totalResults is described as:
integer
The total number of results in the result set. Please note that the value is an approximation and may not represent an exact value. In addition, the maximum value is 1,000,000.
You should not use this value to create pagination links. Instead, use the nextPageToken and prevPageToken property values to determine whether to show pagination links.
So I would recommend enumerating all your subscriptions with each of the methods you described and counting them yourself by following nextPageToken.
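As a rough illustration of counting by nextPageToken, here is a Python sketch. The iter_items helper and the fake_fetch stand-in are hypothetical; in real use the callable would issue the Subscriptions: list request with the pageToken parameter set to the token it receives and return the decoded JSON response.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every item by following nextPageToken until it is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break


def fake_fetch(token: Optional[str]) -> Dict:
    """Stand-in for the HTTP call: two canned pages keyed by page token."""
    pages = {
        None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
        "p2": {"items": [{"id": "c"}]},
    }
    return pages[token]


count = sum(1 for _ in iter_items(fake_fetch))
print(count)
```

Counting the yielded items sidesteps totalResults entirely, which is the point of the recommendation.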

How to get all message history from Hipchat for a room via the API?

I was using the Hipchat API (v2) a bit today and ran into an odd issue where I was not able to really pull up all of the history for a room. It seemed as though when I queried a specific date, for example, it would only retrieve a fraction of the history for that date given. I had had plans to simply iterate across all of the dates for a Room to extract the history in a format that I could use, but ended up hitting this and am now unsure if it is really possible to pull out the history fully.
I realize that this is a bit clunky. It pulls the JSON as a string and then I have to turn it into a hash, so I know I'm not doing this as well as it could be done, but here is roughly what I quickly did just to test out the history method for the API:
require 'hipchat'
require 'json'

api_token = "MY_TOKEN"
client = HipChat::Client.new(api_token, :api_version => 'v2')
history = client['ROOM_NAME'].history
history = JSON.parse(history)
# The parsed response is a hash; the message list is the value that is an array.
history.each do |key, value|
  if value.is_a? Array
    value.each do |message|
      if message.is_a? Hash
        puts "#{message['from']['name']}: #{message['message']}"
      end
    end
  end
end
Obviously the extension to that was to just iterate through the dates in the desired range (using: client['ROOM_NAME'].history(:date => '2010-11-19', :timezone => 'PST')), but again, I was only getting a fraction of the history for the room. Are there some additional parameters that I'm missing to make this work as expected?
I got this working but it was a big pain.
Start by sending a query with the current time, in UTC, but without including the time zone, as the start date:
https://internal-hipchat-server/v2/room/2/history?reverse=false&date=2015-06-25T20:42:18.658439&max-results=1000&auth_token=XXX
This is very fiddly:
If you specify just the current date, without a timezone, as documented in the API, it is interpreted as midnight last night and you only get messages from yesterday or older.
If you try specifying tomorrow’s date instead, the response is 400 Bad Request This day has not yet come to pass.
If you specify the time as 2015-06-25T20:42:18.658439+00:00, which is the format that times come in HipChat API responses, HipChat’s parser seems to fail and interpret it as midnight last night.
When you get the response back, take the oldest items.date property, strip the timezone, and resubmit the above URL with an updated date parameter:
https://internal-hipchat-server/v2/room/2/history?reverse=false&date=2015-06-17T19:56:34.533182&max-results=1000&auth_token=XXX
Be sure to include the microseconds, in case a notification posted multiple messages to the same room in the same second.
This will get you the next page of messages. Keep doing this until you get fewer than max-results messages back.
There is a start-index parameter I tried passing before I got the above working, and it will give you a few pages of results, with responses lacking a links.next property, but it won’t give you the full history. On a chatroom with 9166 messages in the history according to statistics.messages_sent, it only returned 3217 messages. So don’t use it. You can use statistics.messages_sent as a sanity check for whether you get all messages.
Oh yeah, and the last_active property in the /v2/room call cannot be trusted because it doesn’t update when notification messages are posted to the room.
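The paging loop described in this answer could be sketched in Python roughly as follows. The fetch_page callable is a hypothetical stand-in for the HTTP request (it takes the date parameter and returns the items list); the sketch also assumes each item carries an id and a date, that the timestamps share one format so they compare correctly as strings, and that de-duplicating by id is enough to handle a message repeated at a page boundary.

```python
from typing import Callable, Dict, List


def strip_timezone(stamp: str) -> str:
    """Drop a trailing UTC offset such as +00:00, keeping the microseconds."""
    date_part, _, time_part = stamp.partition("T")
    for sign in ("+", "-"):
        if sign in time_part:
            time_part = time_part.split(sign, 1)[0]
    return f"{date_part}T{time_part.rstrip('Z')}"


def fetch_all(fetch_page: Callable[[str], List[Dict]], start: str,
              max_results: int = 1000) -> List[Dict]:
    """Page backwards from `start` until a page holds fewer than max_results items."""
    seen, messages, date = set(), [], start
    while True:
        page = fetch_page(date)
        for item in page:
            if item["id"] not in seen:  # a boundary message may be returned twice
                seen.add(item["id"])
                messages.append(item)
        if len(page) < max_results:
            break
        oldest = min(page, key=lambda m: m["date"])
        next_date = strip_timezone(oldest["date"])
        if next_date == date:  # no progress; stop instead of looping forever
            break
        date = next_date
    return messages


# Fake room history used only to exercise the loop.
ROOM = [{"id": n, "date": f"2015-06-2{n}T10:00:00.000000+00:00"} for n in range(1, 6)]


def fake_fetch(date: str) -> List[Dict]:
    older = [m for m in ROOM if strip_timezone(m["date"]) <= date]
    return sorted(older, key=lambda m: m["date"], reverse=True)[:2]


history = fetch_all(fake_fetch, "2015-06-25T20:42:18.658439", max_results=2)
print(len(history))
```

Comparing len(history) with statistics.messages_sent afterwards gives the sanity check mentioned above.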

python slow to check if mongodb record found

I have a python (3.2) request that goes to MongoDB and the request itself is running fast enough. When I then perform an if statement check to see if any records were found it takes 50 times as long:
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    58     27623      6475988    234.4      1.7  itemInDB = db.mainData.find({"x":item[x]}).limit(1)
    59
    60                                           #existing item in db
    61     27623    293419802  10622.3     77.6  if itemInDB.count():
What on earth is the cause for that if statement taking so long?! I presume there must be a better way to check if a record was found but google has come up empty.
Thanks for the help.
Perhaps a Better Way
If you're only interested in returning one value, you might want to use find_one instead of find. It will stop looking for values after one has been found, as opposed to find, which has to run through the collection:
itemInDB = db.mainData.find_one({"x":item[x]})
if itemInDB:
    print("Item found")
else:
    print("Item not found")
For Your Example
According to the PyMongo docs, when querying the count of a cursor, you can pass in a parameter (True or False) to take into account any skip or limit calls previously made to the cursor. The default for that parameter is False (namely, not taking those calls into account). That may be affecting the performance of your count query.
Gauging Query Performance
If you want to see how your query will be carried out by mongo, you can call explain on your cursor:
db.coll.find({"x":4}).explain()
The explain function is also implemented in PyMongo.
Turns out it was due to the find() function and not the if statement. I created an index on "x" (as I should have anyway). Changed the find to find_one and removed the .count() from the if statement. Overall 75% faster.
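A small Python sketch of the existence check that came out of this thread. The record_exists helper and the FakeCollection stand-in are hypothetical; the only PyMongo behavior relied on is that find_one accepts a filter and a projection and returns None when nothing matches. Requesting only _id is an extra assumption on my part to keep the returned document small.

```python
def record_exists(collection, query: dict) -> bool:
    """Return True if at least one document matches, fetching only its _id."""
    return collection.find_one(query, {"_id": 1}) is not None


class FakeCollection:
    """Stand-in with the same find_one(filter, projection) call shape."""

    def __init__(self, docs):
        self.docs = docs

    def find_one(self, query, projection=None):
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return {"_id": doc["_id"]}
        return None


coll = FakeCollection([{"_id": 1, "x": 4}])
print(record_exists(coll, {"x": 4}), record_exists(coll, {"x": 5}))
```

Against a real collection the same helper would be called as record_exists(db.mainData, {"x": value}), ideally after creating the index with db.mainData.create_index("x").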

reading EMV card using PPSE and not PSE

I'm trying to read the data off a contactless Visa Paywave card.
For the Paywave, I have to submit a SELECT using PPSE (2PAY.SYS.DDF01) instead of PSE (1PAY.SYS.DDF01).
The EMV book 1, section 11.3.4, table 43 only describes how to interpret the response for a successful SELECT command using PSE. Does anyone know or can refer me to a source that shows how to process the data returned from a successful SELECT command using PPSE?
Here's my request APDU:
00A404000e325041592e5359532e444446303100
Here's the response:
6F2F840E325041592E5359532E4444463031A51DBF0C1A61184F07A0000000031010500A564953412044454249548701019000
I understand tag 84, tag 85, tag BF0C from the response. According to the examples for reading PSE, I should be able to just send GET PROCESSING OPTIONS (to get the AIP and AFL) with PDOL = null after this successful response, as follows: 80A80000830000.
But request 80A80000830000 returns error code 6985 - Command not allowed; conditions of use not satisfied.
I also tried reading all the files after successfully selecting the PPSE by traversing through every single SFI (0-30) and every single record (0-16) of each SFI. Yes, I also did the 3 bit shift and bitwise-OR the SFI with 0x4. But I got no data.
I'm stuck, any help that would point me into getting some info from my Paywave card would be appreciated!
Have you tried this tool from EMVLAB http://www.emvlab.org/emvtags/
Using that tool,
http://www.emvlab.org/tlvutils/?data=6F2F840E325041592E5359532E4444463031A51DBF0C1A61184F07A0000000031010500A564953412044454249548701019000
2PAY.SYS.DDF01 is for contactless (e.g. NFC ) cards, while 1PAY.SYS.DDF01 is for contact cards.
After successfully (SW1 SW2 = 90 00) reading a PSE, you should only search for the SFI (tag 88) which is a mandatory field in the FCI template returned.
With the SFI as your start index, you would have to read the records starting from the start index until you get a 6A83 (RECORD_NOT_FOUND). E.g. if your SFI is 1, you would do a readRecord with record_number=1. That would probably be successful. Then you increment record_number to 2 and do readRecord again, then increment to 3, and so on. Repeat until you get 6A83 as your status.
The records read would be ADFs (at least 1). Then you would have to compare the ADF Names read with what your terminal supports, also taking the ASI (Application Selection Indicator) into account. At the end you would have a list of possible ADFs (the candidate list).
All the above steps (1-3) are documented in chapter 12.3.2 Book1 v4.3 of the EMV spec.
You would have to make a final selection (Chapter 12.4 Book1)
Read the spec book 1 chapter 12.3 - 12.4 for all the detailed steps.
You seem to have the flow mixed up a bit, you want to:
Send 1PAY or 2PAY, it doesn't actually matter for all of the cards I've tested. This will return a list of the AIDs available on the card. Alternately you can just select an AID straight away if you know it's there but good practice would be to check first.
Get the list of AIDs returned in response to 1PAY/2PAY, in PayWave's case this will probably be A0000000031010 if you sent 2PAY but you may get more if you send 1PAY.
Select one of the AIDs sent back (or one you already know is on there).
Then loop through the SFIs and records sending the Read Records command to get the data.
You don't have to send Get Processing Options before sending the Read Records command, even though that's how a normal transaction flow goes.
I think the information you're looking for is available from this VISA website. But only if you're a registered and/or licensed partner of VISA.
EDIT: Looking at the resulting TLV struct under BF0C:
tag=0xBF0C, length=0x1A
  tag=0x61, length=0x18
    tag=0x4F, length=0x07, value=0xA0000000031010 // looks like an AID to me
    tag=0x50, length=0x0A, value="VISA DEBIT"
    tag=0x87, length=0x01, value=0x01
I would guess that you need to first select A0000000031010 before getting the processing options.
I was selecting application 2PAY.SYS.DDF01 when I should have been selecting AID = 0xA0000000031010. It looks like there are no records under application 2PAY.SYS.DDF01.
But there was 1 record under application 0xA0000000031010. After I got this application, I performed a READ RECORD, and the first record gave me the PAN and all the credit card info I wanted.
Thanks everyone for chiming in.
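To make the resolution concrete, here is a Python sketch that parses the PPSE response quoted in the question, pulls out the AID (tag 4F), and builds the follow-up SELECT and READ RECORD command bytes. The helper names (parse_tlv, find_tag, select_apdu, read_record_apdu) are hypothetical, the parser is a minimal BER-TLV reader rather than a full EMV implementation, and sending the commands to a card is left out entirely.

```python
def parse_tlv(data: bytes) -> list:
    """Parse BER-TLV bytes into (tag_hex, value) pairs; constructed values are nested lists."""
    items, i = [], 0
    while i < len(data):
        start, first = i, data[i]
        i += 1
        if first & 0x1F == 0x1F:          # multi-byte tag, e.g. BF0C
            while data[i] & 0x80:
                i += 1
            i += 1
        tag = data[start:i].hex().upper()
        length = data[i]
        i += 1
        if length & 0x80:                 # long-form length
            count = length & 0x7F
            length = int.from_bytes(data[i:i + count], "big")
            i += count
        value = data[i:i + length]
        i += length
        items.append((tag, parse_tlv(value) if first & 0x20 else value))
    return items


def find_tag(items: list, wanted: str):
    """Depth-first search for the first occurrence of a tag; returns None if absent."""
    for tag, value in items:
        if tag == wanted:
            return value
        if isinstance(value, list):
            found = find_tag(value, wanted)
            if found is not None:
                return found
    return None


def select_apdu(aid: bytes) -> bytes:
    """SELECT by name: 00 A4 04 00 Lc <AID> 00."""
    return bytes([0x00, 0xA4, 0x04, 0x00, len(aid)]) + aid + b"\x00"


def read_record_apdu(sfi: int, record: int) -> bytes:
    """READ RECORD: P1 is the record number, P2 is (SFI << 3) | 4."""
    return bytes([0x00, 0xB2, record, (sfi << 3) | 0x04, 0x00])


response = bytes.fromhex(
    "6F2F840E325041592E5359532E4444463031A51DBF0C1A61184F07"
    "A0000000031010500A564953412044454249548701019000"
)
fci = parse_tlv(response[:-2])            # drop the trailing status word 9000
aid = find_tag(fci, "4F")
label = find_tag(fci, "50")
print(aid.hex().upper(), label.decode("ascii"))
print(select_apdu(aid).hex().upper())
```

The SELECT built here has the same shape as the PPSE request in the question, with the AID in place of the 2PAY.SYS.DDF01 name.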

Scraping Real Time Visitors from Google Analytics

I have a lot of sites and want to build a dashboard showing the number of real time visitors on each of them on a single page. (would anyone else want this?) Right now the only way to view this information is to open a new tab for each site.
Google doesn't have a real-time API, so I'm wondering if it is possible to scrape this data. Eduardo Cereto found out that Google transfers the real-time data over the realtime/bind network request. Anyone more savvy have an idea of how I should start? Here's what I'm thinking:
Figure out how to authenticate programmatically
Inspect all of the realtime/bind requests to see how they change. Does each request have a unique key? Where does that come from? Below is my breakdown of the request:
https://www.google.com/analytics/realtime/bind?VER=8
&key= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&ds= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&pageId=rt-standard%2Frt-overview
&q=t%3A0%7C%3A1%3A0%3A%2Ct%3A11%7C%3A1%3A5%3A%2Cot%3A0%3A0%3A4%2Cot%3A0%3A0%3A3%2Ct%3A7%7C%3A1%3A10%3A6%3D%3DREFERRAL%3B%2Ct%3A10%7C%3A1%3A10%3A%2Ct%3A18%7C%3A1%3A10%3A%2Ct%3A4%7C5%7C2%7C%3A1%3A10%3A2!%3Dzz%3B%2C&f
The q variable URI decodes to this (what the?):
t:0|:1:0:,t:11|:1:5:,ot:0:0:4,ot:0:0:3,t:7|:1:10:6==REFERRAL;,t:10|:1:10:,t:18|:1:10:,t:4|5|2|:1:10:2!=zz;,&f
&RID=rpc
&SID= [What is this? Where does it come from? 16 character uppercase alphanumeric, stays the same each request]
&CI=0
&AID= [What is this? Where does it come from? integer, starts at 1, increments weirdly to 150 and then 298]
&TYPE=xmlhttp
&zx= [What is this? Where does it come from? 12 character lowercase alphanumeric, changes each request]
&t=1
Inspect all of the realtime/bind responses to see how they change. How does the data come in? It looks like some altered JSON. How many times do I need to connect to get the data? Where is the active visitors on site number in there? Here is a dump of sample data:
19
[[151,["noop"]
]
]
388
[[152,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[49,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2,0],"name":"Total"}]}}]]]
]
388
[[153,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[52,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[2,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2],"name":"Total"}]}}]]]
]
388
[[154,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[53,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,3,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3],"name":"Total"}]}}]]]
]
Let me know if you can help with any of the items above!
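For the response format shown in the dump, a Python sketch like the following might be a starting point. It assumes each chunk is a line holding a character count followed by that many characters of JSON; the first chunk in the dump is consistent with this (its payload is 19 characters including newlines), but it remains a guess about an undocumented format. The iter_chunks name and the shortened second chunk are made up for the example.

```python
import json
from typing import Iterator, List


def iter_chunks(body: str) -> Iterator[List]:
    """Yield the decoded JSON payload of each length-prefixed chunk."""
    pos = 0
    while pos < len(body):
        newline = body.index("\n", pos)
        length = int(body[pos:newline])   # assumed: number of characters that follow
        start = newline + 1
        yield json.loads(body[start:start + length])
        pos = start + length


first = '[[151,["noop"]\n]\n]\n'
second = '[[152,["rt",[{"ot:0:0:3":{"timeUnit":"SECONDS"}}]]]\n]\n'
sample = f"{len(first)}\n{first}{len(second)}\n{second}"

chunks = list(iter_chunks(sample))
for chunk in chunks:
    message_id, payload = chunk[0]
    print(message_id, payload[0])
```

Which field, if any, carries the active-visitor count is still an open question in this thread; the sketch only separates the chunks.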
Google has since launched a Real Time Reporting API for this. With it you can retrieve the number of real-time visitors, along with other Google Analytics data, using the following dimensions and metrics: https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets/
It is quite similar to the Google Analytics API. To start development with it, see
https://developers.google.com/analytics/devguides/reporting/realtime/v3/devguide
With Google Chrome I can see the data on the Network Panel.
The request endpoint is https://www.google.com/analytics/realtime/bind
Seems like the connection stays open for 2.5 minutes, and during this time it just keeps getting more and more data.
After about 2.5 minutes the connection is closed and a new one is open.
On the Network panel you can only see the data for the connections that are terminated. So leave it open for 5 minutes or so and you can start to see the data.
I hope that can give you a place to start.
Having google in the loop seems pretty redundant. Suggest you use a common element delivered on demand from the dashboard server and include this item by absolute URL on all pages to be monitored for a given site. The script outputting the item can read the IP of the browser asking and these can all be logged into a database and filtered for uniqueness giving a real time head count.
<?php
$user_ip = $_SERVER["REMOTE_ADDR"];
/// Some MySQL to insert $user_ip to the database table for website XXX goes here
$file = 'tracking_image.gif';
$type = 'image/gif';
header('Content-Type:'.$type);
header('Content-Length: ' . filesize($file));
readfile($file);
?>
Addendum:
A database can also add a timestamp to every row of data it stores. This can be used to further filter results and provide the number of visitors in the last hour or minute.
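The timestamp filtering could look something like this Python sketch. It keeps the hits in memory instead of a database purely so the example is self-contained; VisitorLog and its methods are hypothetical names, and counting one visitor per IP address follows the answer's approach rather than being a reliable way to identify people.

```python
import time
from typing import Dict, Optional


class VisitorLog:
    """In-memory stand-in for the database table: latest hit time per IP."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def record(self, ip: str, when: Optional[float] = None) -> None:
        """Store the time of the most recent hit from this IP."""
        self.last_seen[ip] = time.time() if when is None else when

    def active(self, window_seconds: float, now: Optional[float] = None) -> int:
        """Count distinct IPs whose last hit falls inside the window."""
        now = time.time() if now is None else now
        return sum(1 for seen in self.last_seen.values() if now - seen <= window_seconds)


log = VisitorLog()
log.record("1.1.1.1", when=100)
log.record("1.1.1.1", when=150)
log.record("2.2.2.2", when=10)
print(log.active(60, now=160), log.active(3600, now=160))
```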
Client side Javascript with AJAX for fine tuning or overkill
The onblur and onfocus JavaScript events can be used to tell whether the page is visible, and the data can be passed back to the dashboard server via Ajax. http://www.thefutureoftheweb.com/demo/2007-05-16-detect-browser-window-focus/
When a visitor closes a page this can also be detected by the javascript onunload function in the body tag and Ajax can be used to send data back to the server one last time before the browser finally closes the page.
As you may also wish to collect some information about the visitor, like Google Analytics does, this page https://panopticlick.eff.org/ has a lot of JavaScript that can be examined and adapted.
I needed/wanted realtime data for personal use so I reverse-engineered their system a little bit.
Instead of binding to /bind I get data from /getData (no pun intended).
At /getData the minimum request is apparently: https://www.google.com/analytics/realtime/realtime/getData?pageId&key={{propertyID}}&q=t:0|:1
Here's a short explanation of the possible query parameters and syntax, please remember that these are all guesses and I don't know all of them:
Query Syntax: pageId&key=propertyID&q=dataType:dimensions|:page|:limit:filters
Values:
pageID: Required but seems to only be used for internal analytics.
propertyID: a{{accountID}}w{{webPropertyID}}p{{profileID}}, as specified at the Documentation link below. You can also find this in the URL of all analytics pages in the UI.
dataType:
t: Current data
ot: Overtime/Past
c: Unknown, returns only a "count" value
dimensions (| separated or alone), most values are only applicable for t:
1: Country
2: City
3: Location code?
4: Latitude
5: Longitude
6: Traffic source type (Social, Referral, etc.)
7: Source
8: ?? Returns (not set)
9: Another location code? longer.
10: Page URL
11: Visitor Type (new/returning)
12: ?? Returns (not set)
13: ?? Returns (not set)
14: Medium
15: ?? Returns "1"
page:
At first this seems to work for pagination but after further analysis it looks like it's also used to specify which of the 6 pages (Overview, Locations, Traffic Sources, Content, Events and Conversions) to return data for.
For some reason 0 returns an impossibly high metrictotal
limit: Result limit per page, maximum of 50
filters:
Syntax is as specified at the Documentation 2 link below except the OR is specified using | instead of a comma. Example: 6==CUSTOM;1==United%20States
You can also combine multiple queries in one request by comma separating them (i.e. q=t:1|2|:1|:10,t:6|:1|:10).
Following the above "documentation", if you wanted to build a query that requests the page URL and city of the top 10 active visitors with a traffic source type of CUSTOM located in the US you would use this URL: https://www.google.com/analytics/realtime/realtime/getData?key={{propertyID}}&pageId&q=t:10|2|:1|:10:6==CUSTOM;1==United%20States
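The guessed syntax above can be turned into a small Python URL builder. Everything here rests on the reverse-engineered format from this answer, so it may stop working at any time; the function names are hypothetical, and percent-encoding only the filter values is an assumption based on the United%20States example.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import quote

BASE = "https://www.google.com/analytics/realtime/realtime/getData"


def property_id(account: int, web_property: int, profile: int) -> str:
    """Format the key as a{accountID}w{webPropertyID}p{profileID}."""
    return f"a{account}w{web_property}p{profile}"


def build_query(data_type: str, dimensions: Sequence[int], page: int,
                limit: Optional[int] = None,
                filters: Optional[List[Tuple[int, str]]] = None) -> str:
    """Assemble one q term: dataType:dimensions|:page|:limit:filters."""
    query = f"{data_type}:{'|'.join(str(d) for d in dimensions)}|:{page}"
    if limit is not None:
        query += f"|:{limit}"
    if filters:
        query += ":" + ";".join(f"{dim}=={quote(value)}" for dim, value in filters)
    return query


def build_url(key: str, *queries: str) -> str:
    """Join one or more q terms with commas, as the answer describes."""
    return f"{BASE}?key={key}&pageId&q={','.join(queries)}"


url = build_url(
    property_id(1, 2, 3),
    build_query("t", [10, 2], page=1, limit=10,
                filters=[(6, "CUSTOM"), (1, "United States")]),
)
print(url)
```

Passing several build_query results to build_url produces the comma-separated form used for combined requests.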
Documentation
Documentation 2
I hope that my answer is readable and (although it's a little late) sufficiently answers your question and helps others in the future.

Resources