How to feed Slack archive into GSA? - google-search-appliance

I am wondering how I can use the Slack API to feed message history into GSA (Google Search Appliance) and keep it up to date.
Has anyone written a script for this?

I don't have a ready-made script, but it should be possible as you've imagined. IMO (without being familiar with the Slack API, but with some knowledge of Slack archive sizes, i.e. >500K messages), the main challenge would be to identify and extract only the pieces of information that are important to you from the Slack archive; if you choose your GSA feed record elements too finely (imagine if every message were a separate feed record), you can easily run out of your GSA document index license limit.
In other words, you need to identify discrete feed records, keeping them as large as possible to keep document license usage to a minimum, while keeping them discrete enough to yield accurate results.
Once that's done, or if your GSA index license limit is not a problem, one possible solution is to create an incremental/full feed: read updates from the Slack archive using its API, compile the new records you find into the GSA feed format (with the information you want to be able to search on or omit contained within the tags as appropriate, and the information you need to present in the results contained in HTML meta tags), and push those new records to the GSA.
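To make that concrete, here is a rough Python sketch of the feed approach, not a tested implementation: it assumes a Slack token with history scope, a GSA reachable on the standard feed port 19900, and a hypothetical datasource name, and it lumps a whole channel into one record to conserve index licenses.

```python
import requests
from xml.sax.saxutils import escape

SLACK_TOKEN = "xoxb-..."          # assumed: a Slack token with history scope
GSA_HOST = "gsa.example.com"      # assumed: your appliance hostname
DATASOURCE = "slack_archive"      # hypothetical datasource name

def fetch_channel_messages(channel_id, oldest="0"):
    """Page through a channel's history via the Slack Web API."""
    cursor, messages = None, []
    while True:
        params = {"channel": channel_id, "oldest": oldest, "limit": 200}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(
            "https://slack.com/api/conversations.history",
            headers={"Authorization": f"Bearer {SLACK_TOKEN}"},
            params=params,
        ).json()
        messages.extend(resp.get("messages", []))
        cursor = resp.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            return messages

def build_feed(channel_id, messages):
    """Group a whole channel into one GSA record to conserve index licenses."""
    body = "<br/>".join(escape(m.get("text", "")) for m in messages)
    record_url = f"https://slack.example.com/archive/{channel_id}"  # hypothetical detail URL
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header><datasource>{DATASOURCE}</datasource><feedtype>incremental</feedtype></header>
  <group>
    <record url="{record_url}" mimetype="text/html" action="add">
      <metadata><meta name="channel" content="{channel_id}"/></metadata>
      <content><![CDATA[<html><body>{body}</body></html>]]></content>
    </record>
  </group>
</gsafeed>"""

def push_feed(feed_xml):
    """POST the feed to the GSA feedergate on port 19900."""
    return requests.post(
        f"http://{GSA_HOST}:19900/xmlfeed",
        files={"data": ("feed.xml", feed_xml, "text/xml")},
        data={"feedtype": "incremental", "datasource": DATASOURCE},
    )

messages = fetch_channel_messages("C012AB3CD")  # placeholder channel ID
print(push_feed(build_feed("C012AB3CD", messages)).status_code)
```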
Another solution, if you're able to host a few web application pages that the GSA can crawl, will even allow you to keep its index up to date with a continuous crawl. For this you'd need at least one "jump page", which is just a list of links, each populated with query string parameters to be passed to your detail record page; those parameters identify the set of Slack message archive element IDs that you've determined should be indexed as one discrete record. You'd then set your "jump page" URL to be crawled by the GSA, and also develop your XSLT or other search results consumer service to read/render the returned results from the info contained in the meta tags. Note: when the consumer service makes the search call to the GSA, it will need to pass a "&getfields=*" query string parameter to get the GSA to return all the info contained in the meta tags.
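And, with the same caveats, a sketch of the consumer-side search call with "&getfields=*" so the meta tags come back with the results (the collection and front end names below are GSA defaults and may differ on your appliance):

```python
import requests
import xml.etree.ElementTree as ET

GSA_HOST = "gsa.example.com"  # assumed appliance hostname

def search(query):
    # Standard GSA search protocol parameters; getfields=* returns all meta tags.
    params = {
        "q": query,
        "site": "default_collection",
        "client": "default_frontend",
        "output": "xml_no_dtd",
        "getfields": "*",
    }
    resp = requests.get(f"http://{GSA_HOST}/search", params=params)
    root = ET.fromstring(resp.text)
    for result in root.iter("R"):              # one R element per result
        url = result.findtext("U")             # the record URL
        metas = {m.get("N"): m.get("V") for m in result.iter("MT")}  # returned meta tags
        print(url, metas)

search("slack onboarding")
```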
I hope that my wording is not too esoteric and helps you in some way in designing your solution.

Related

How to retrieve EVERY mail message (including Deleted, Archived, and Recoverable items) for an O365 user with Graph API?

We are working on an application in the compliance/monitoring space where we are monitoring the activity of an individual. Because of this, we want to pull EVERYTHING in a user's Office 365 mailbox: if it has text the user wrote or received, we want it, even if it was deleted, purged, etc.
We are using the Graph API and have an existing implementation that uses the standard "messages" GET command:
GET https://graph.microsoft.com/v1.0/me/messages
We are making use of the GraphApiClient (Microsoft.Graph v1.9.0), so the code actually looks like this:
IUserMessagesCollectionPage pageOfMessages = _graphClient.Users[userId].Messages.Request(options).Top(batchSize).Expand("attachments").GetAsync().Result;
However, at the very least this does not return any items from any of the RecoverableItems folders. After looking into it, I am now suspicious that there might be other folders that are not returned by this command either. There is quite the list of Well-known folder names and I'm not sure what others might not be included in a generic Messages request.
Based on this post, I know you can request the messages in the missing folders by WellKnownFolderName one at a time like this:
GET https://graph.microsoft.com/v1.0/me/MailFolders/RecoverableItemsDeletions/messages
It even works with the GraphApiClient:
IMailFolderMessagesCollectionPage pageOfMessages = _graphClient.Users[userId].MailFolders["RecoverableItemsDeletions"].Messages.Request(options).Top(batchSize).Expand("attachments").GetAsync().Result;
The problems with this are:
I don't know how to build a comprehensive list of every folder that has messages for the user
Some of the folders (like RecoverableItemsDeletions and ArchiveRecoverableItemsDeletions, for example) can contain duplicates so I would need to use a dictionary to get rid of the duplicates
It would be a lot more expensive to first build a list of relevant folders and then request their contents and their children's contents one request at a time (roughly as sketched after this list).
At scale, a folder-by-folder implementation could be subject to throttling (if we are monitoring enough users with big enough mailboxes)
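To illustrate point 3, here is a rough sketch of that folder-by-folder enumeration and dictionary-based dedup using the raw REST endpoints rather than the .NET client; the token, user, the includeHiddenFolders parameter, and the choice of internetMessageId as the dedup key are assumptions on my part:

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
TOKEN = "eyJ..."                      # assumed: a valid access token with mail read permissions
USER_ID = "someone@contoso.com"       # assumed target user
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def get_all(url, params=None):
    """Follow @odata.nextLink paging until the collection is exhausted."""
    items = []
    while url:
        resp = requests.get(url, headers=HEADERS, params=params).json()
        items.extend(resp.get("value", []))
        url, params = resp.get("@odata.nextLink"), None
    return items

def folder_ids(url):
    """Walk a folder listing and all child folders recursively."""
    ids = []
    for folder in get_all(url, params={"includeHiddenFolders": "true"}):
        ids.append(folder["id"])
        ids.extend(folder_ids(f"{GRAPH}/users/{USER_ID}/mailFolders/{folder['id']}/childFolders"))
    return ids

# Primary mailbox folders; the archive mailbox would presumably be walked the
# same way starting from the archivemsgfolderroot well-known folder.
all_ids = folder_ids(f"{GRAPH}/users/{USER_ID}/mailFolders")

# Dedup across folders (e.g. RecoverableItemsDeletions vs. its archive twin);
# internetMessageId is used here because the Graph message id is folder-specific.
unique = {}
for fid in all_ids:
    for msg in get_all(f"{GRAPH}/users/{USER_ID}/mailFolders/{fid}/messages"):
        unique[msg.get("internetMessageId") or msg["id"]] = msg

print(len(unique), "unique messages")
```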
Does anyone know the best way to do this? Thanks!

Why would you send multiple operations in a graphql query document when one can only run at a time?

The GraphQL spec mentions that you can send multiple operations in a request document as long as you select the operation to run:
http://facebook.github.io/graphql/October2016/#ExecuteRequest()
To execute a request, the executor must have a parsed Document (as defined in the “Query Language” part of this spec) and a selected operation name to run if the document defines multiple operations, otherwise the document is expected to only contain a single operation.
In what situation would you send such a document with multiple operations? I can only assume this is to allow switching between operations without having to create a separate document for each one?
I found this answer from the GraphQL team:
The use is threefold:
It's often useful to store GraphQL documents in a client side file; we use .graphql files on iOS and Android for this. We wanted the parser to work on these files as well (so the parser is universal), but you can (and we often do) include multiple queries in a single document.
One optimization that can be made (and that we'll discuss in more detail in the future, I'm sure) is "persisting" documents; you send the document to the server who stores it for you and gives you an identifier for that, that way you don't have to send the whole document string up every time. If you do that, you can send a document up with all your queries, and then pass the document ID + operation name over the wire. This would optimize bytes being sent from the client to the server.
It's possible to write a "batch" API for GraphQL, where you use the results from one query as the parameters to another. We're still working out the exact details there, but if you do that, it's useful to specify multiple queries in a single document; you'd specify which one you wanted to run at the start and the relationships between them.
Reference: https://github.com/facebook/graphql/issues/29#issuecomment-118994619
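As a concrete illustration of the first two points, here is a minimal sketch of a single document holding two named operations, with the client selecting one via operationName on each request (the endpoint URL and schema are hypothetical):

```python
import requests

# One document, two named operations; the server executes only the one named
# in "operationName", per the spec section quoted above.
DOCUMENT = """
query HeroName {
  hero { name }
}

query HeroFriends($id: ID!) {
  hero(id: $id) {
    friends { name }
  }
}
"""

def run(operation_name, variables=None, url="https://example.com/graphql"):  # hypothetical endpoint
    payload = {
        "query": DOCUMENT,
        "operationName": operation_name,
        "variables": variables or {},
    }
    return requests.post(url, json=payload).json()

print(run("HeroName"))
print(run("HeroFriends", {"id": "1000"}))
```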

Mailchimp Custom Automation Suggestions?

Not sure if this is even possible, but we are looking for a way to trigger Mailchimp newsletters based on a custom field value in a WordPress website.
Basically, we will have a field that holds "the number of miles" a person has walked, based on the data they enter, and we will be calculating the "total miles". When they reach 100 miles, for example, we need an email to trigger from Mailchimp; then 200 miles will trigger a 2nd email, and so on.
Does anyone know if this can even be done with Mailchimp? If not is there a better approach to handling this?
THANK YOU!
If you are familiar with Python, I'd recommend using a Jupyter notebook for this to cut down on development work. You could set it to run at regular intervals (on your computer or a server), checking the status of each user and then updating the status merge tag in Mailchimp. You can then have automations that are triggered when the distance merge tag reaches a specific value: at 100 they get the 100 email, at 200 they get the 200 email. (You could also update a user's merge tag in Mailchimp the moment they hit a milestone, but from my experience that's a little more work.)
Net net, there are a few ways to achieve your goal, but I think a Python notebook using pandas to manipulate the data and the mailchimp3 API client would be the lightest lift.
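For illustration, a minimal sketch of the merge-field update half of that, assuming the mailchimp3 client, a hypothetical DISTANCE merge tag, and a placeholder data source for the mileage totals:

```python
import hashlib

import pandas as pd
from mailchimp3 import MailChimp

API_KEY = "xxxx-us1"        # assumed Mailchimp API key
LIST_ID = "abc123"          # assumed audience/list id
MERGE_TAG = "DISTANCE"      # hypothetical merge field holding total miles

client = MailChimp(mc_api=API_KEY, mc_user="anything")

# Assumed: a dataframe of user emails and their current total miles,
# e.g. pulled from the WordPress database on each scheduled notebook run.
totals = pd.DataFrame({"email": ["walker@example.com"], "miles": [214]})

def milestone(miles, step=100):
    """Round down to the last milestone reached (100, 200, ...)."""
    return int(miles // step) * step

for _, row in totals.iterrows():
    # Mailchimp identifies members by the MD5 hash of the lowercased email.
    subscriber_hash = hashlib.md5(row["email"].lower().encode()).hexdigest()
    client.lists.members.update(
        LIST_ID,
        subscriber_hash,
        data={"merge_fields": {MERGE_TAG: milestone(row["miles"])}},
    )
# A Mailchimp automation set to fire when DISTANCE equals 100, 200, ... then sends each email.
```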
TIP: Mailchimp currently has a bug where merge tag information is not always accurately represented in the UI. For example, if via the API you added 500 people with a Distance merge value of 200, and then checked in the UI how many people had a Distance of 200, you would likely see an inaccurate count. If you export the list with that merge tag/value, however, the number will match what you pushed via the API. This is currently an open ticket.

Limiting and Sorting with Parse?

I'm trying to learn how to use Parse, and while it's very simple, it's also... not? Perhaps I'm just missing something, but it seems like Parse requires a lot of client-side code, and even multiple requests for a single operation. For example, in my application each user has a small photo gallery. The images are stored on Parse and fetched from Parse when needed.
I want to make sure that a user cannot store more than 15 images in their gallery at a time, and I also want these images to be ordered by an index.
Currently it seems like the only viable option is to perform the following steps on the client:
Execute a query/request to get the amount of pictures stored.
If the amount is less than 15, then execute a request to upload the picture.
Once the picture is uploaded, execute a request that stores an object linking the user that uploaded the PFFile.
This is a total of 3 (or 6?) requests just to upload a file, depending on whether a "response" is also counted as a request by Parse. It also does not provide any way to order the pictures in the gallery. Would I have to create a custom field called "index" and set it to the number of photos returned by the first query + 1?
It's worse than you think: to create the picture you must create a file, save it, then save a reference to the file in an object and save that, too.
But it's also better than you think: this sort of network usage is expected in a connected app, and some of it can be mitigated with additional logic on the server ("cloud code" in parse parlance).
First, in your app, consider a simple data model where _User has an array of images (represented, say, by a "UserImage" custom class). If you keep this relationship as an array of pointers on the user, then a user's images can be fetched eagerly, when the app starts, so you'll know the image count as a fact along with the user. The UserImage object will have a file reference in it, so you can optionally fetch the image data and just hold the lighter metadata with the current user.
Ordering is a more ephemeral idea. One doesn't order objects as they are saved, but rather as they are retrieved. Queries can be ordered by any attribute, and more to the point, since you're retrieving all 15 images anyway, you should consider ordering them for presentation to be a function of the UI, not the data.
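To illustrate, here is a rough sketch against the Parse REST API (the server URL, keys, object IDs, and the custom "index" column are assumptions) showing a count check before upload and an ordered, limited gallery query:

```python
import json

import requests

PARSE_URL = "https://example.com/parse"    # assumed Parse Server mount point
HEADERS = {
    "X-Parse-Application-Id": "APP_ID",    # assumed app keys
    "X-Parse-REST-API-Key": "REST_KEY",
}

def user_pointer(object_id):
    """Build a pointer constraint to a _User row."""
    return {"__type": "Pointer", "className": "_User", "objectId": object_id}

def image_count(user_id):
    """Ask Parse for just the count of a user's UserImage rows."""
    params = {
        "where": json.dumps({"user": user_pointer(user_id)}),
        "count": 1,
        "limit": 0,
    }
    resp = requests.get(f"{PARSE_URL}/classes/UserImage", headers=HEADERS, params=params)
    return resp.json()["count"]

def gallery(user_id):
    """Fetch up to 15 images, ordered at query time by a custom 'index' column."""
    params = {
        "where": json.dumps({"user": user_pointer(user_id)}),
        "order": "index",
        "limit": 15,
    }
    resp = requests.get(f"{PARSE_URL}/classes/UserImage", headers=HEADERS, params=params)
    return resp.json()["results"]

if image_count("xWMyZ4YEGZ") < 15:          # placeholder user objectId
    print("ok to upload another image")
print(gallery("xWMyZ4YEGZ"))
```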
Finally, Parse limits your app not by transaction count but by transaction rate, with a free tier whose limit is high enough to serve plenty of users.

Segmenting on users who have performed a behaviour is not behaving as expected

I want to look at the effect of having performed a specific action sequence at any (tracked) time in the past on user retention and engagement.
The action sequence is that of performing an optional New User Flow.
This is signalled to Google Analytics via sending it appropriate events. That works fine. The events show up in reports as expected.
My problem is what happens to the results when I use these events to create segments. I have tried two different ways of creating a segment based on this in Advanced Segmentation: via Conditions (defining the segment via the end event, filtered over users, not sessions), and via Sequences (defining start and end events, again filtered over users, not sessions).
What I get when I look at various retention/loyalty reports, using either of these segments, is very clearly a result that does this segmentation within a session, not across a user's sessions. So for NUF completers, I am seeing all my loyalty/recency on Session 1, which is when people are most likely to do the NUF, if they ever do it at all. This is not what I want. (Mind you, it is something that could be really useful in another context, with another event! But not for the new user flow.)
What are my options for getting what I want? I see two possible ways forward:
Using custom dimensions: assigning a custom dimension value in the code when the New User Flow is completed (a rough sketch follows this list). However, I do not know if this will solve the cross-session persistence problem.
Injecting a UserID, which we do not currently do, and (somehow!) using the reports available when you inject a UserID to do this.
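Purely as a sketch of option 1, with placeholder IDs, this is the sort of Measurement Protocol hit I have in mind; the dimension slot (cd1) would need to be configured as a user-scoped custom dimension in the GA admin for it to persist across sessions:

```python
import requests

# Universal Analytics Measurement Protocol hit sent when the New User Flow
# completes. The property ID, client ID, event names, and the cd1 slot are
# all placeholders/assumptions.
payload = {
    "v": 1,                       # protocol version
    "tid": "UA-XXXXXX-1",         # assumed property ID
    "cid": "555.1234567890",      # anonymous client ID for this user/device
    "t": "event",                 # hit type
    "ec": "onboarding",           # event category (hypothetical)
    "ea": "nuf_complete",         # event action (hypothetical)
    "cd1": "NUF_completed",       # custom dimension slot 1, user scope
}
requests.post("https://www.google-analytics.com/collect", data=payload)
```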
Are either of these paths plausible? Is there a better way forward? Is it silly to even try to do this in Google Analytics? I'm way more familiar with app tracking solutions (e.g. Flurry, Mixpanel, DeltaDNA), which do this as a matter of course, than with Google Analytics, and the fact that this is at the very least awkward in Google Analytics comes as a bit of a surprise.
thanks,
Heather
