How to delete index in custom java connector - google-search-appliance

I have built a custom connector that gets data from a web service and then indexes it. The web service response returns only the data to be indexed.
I want to delete documents from the index that were added during the last crawl but are no longer part of the web service response in the current crawl.
Is there any way to achieve this? Alternatively, can I flush the full index programmatically in the connector code and then add the recent content to the index?

Marged is correct. A feed (which is what the connector can send to the GSA) of type full will purge the existing feed contents for that data source and replace them. Otherwise, your connector will have to manage state itself and prune documents that are no longer returned.

Thanks Marged and Michael for the help. I guess I have to write custom logic in the connector to delete the data from the index.
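The state management mentioned above could be sketched roughly as follows: keep the set of document IDs sent in the previous crawl, compare it with the IDs in the current web service response, and treat the difference as the documents to delete. The class and method names here are hypothetical and not part of any GSA connector framework API; how the IDs are persisted between crawls and how the deletes are sent is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of the GSA connector framework.
final class StaleDocumentFinder {

    // Returns the IDs that were indexed in the previous crawl
    // but are absent from the current web service response.
    static Set<String> findStale(Set<String> previousIds, Set<String> currentIds) {
        Set<String> stale = new TreeSet<>(previousIds);
        stale.removeAll(currentIds);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> previous = Set.of("doc-1", "doc-2", "doc-3");
        Set<String> current = Set.of("doc-2", "doc-3", "doc-4");
        // doc-1 is no longer returned, so it is the only stale document.
        System.out.println(findStale(previous, current));
    }
}
```

The connector would then issue a delete for each returned ID and store the current ID set for the next crawl.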

What you're trying to achieve is exactly what happens when you send a "full" content feed. This is from the documentation:
When the feedtype element is set to full for a content feed, the system deletes all the prior URLs that were associated with the data source. The new feed contents completely replace the prior feed contents. If the feed contains metadata, you must also provide content for each record; a full feed cannot push metadata alone. You can delete all documents in a data source by pushing an empty full feed.
Marged is correct that v4.x is the way to go in the future, but if you've already started this with the 3.x connector framework and you're happy with it, there's no need to rush to upgrade. All the related code is open source and 3.x won't disappear any time soon; there are too many third-party connectors based on it.
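As a rough illustration of the "full" feed described above, the sketch below builds the feed XML as a string. The element names (gsafeed, header, datasource, feedtype, group, record, content) follow the GSA feeds protocol as I remember it, so check them against the Feeds Protocol documentation for your appliance version. The FullFeedBuilder class is hypothetical, and sending the XML to the appliance is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that renders a "full" content feed as XML text.
final class FullFeedBuilder {

    // Escapes characters that are not allowed in XML text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // docs maps each record URL to its plain-text content.
    // An empty map produces an empty full feed, which per the quoted
    // documentation deletes all documents in the data source.
    static String build(String dataSource, Map<String, String> docs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<!DOCTYPE gsafeed PUBLIC \"-//Google//DTD GSA Feeds//EN\" \"\">\n");
        xml.append("<gsafeed>\n<header>\n");
        xml.append("<datasource>").append(escape(dataSource)).append("</datasource>\n");
        xml.append("<feedtype>full</feedtype>\n</header>\n<group>\n");
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            xml.append("<record url=\"").append(escape(doc.getKey()))
               .append("\" mimetype=\"text/plain\">\n");
            xml.append("<content>").append(escape(doc.getValue()))
               .append("</content>\n</record>\n");
        }
        xml.append("</group>\n</gsafeed>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("http://example.com/doc/1", "First document body");
        System.out.println(build("webservice", docs));
    }
}
```

Because each full feed replaces everything previously fed for the same data source, the connector only needs to send the records present in the current web service response.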

Related

Problems Deleting Descriptors in Apache Nifi Using Rest API

I am trying to use the REST API to dynamically update and control my Apache NiFi flow. I am using Postman to explore the REST API but am having trouble deleting properties/descriptors.
My current process is to call a GET to this address - http://localhost:8080/nifi-api/processors/{ID}
I then modify the response as desired and do a PUT with the modified response as the body. If I add a descriptor or change the content of a descriptor it works ok. But if I try to delete a descriptor by removing it from the properties and descriptors area then nothing happens.
I still get a 200 OK response, but it is the same as the original.
I am using NiFi 1.1.2 on Windows.
The PropertyDescriptors are specified by the Processor in question. These are read-only values and describe the properties the Processor currently supports. If you want to remove a property, and it is optional, you should be able to remove the value for it by setting its entry to null in the properties object in your request.
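A minimal sketch of such a request body, assuming the usual NiFi 1.x processor entity layout (a revision object plus component.config.properties); verify the exact shape against the GET response you already have. The class name, processor ID, revision number and property name below are all placeholders.

```java
// Hypothetical helper that builds the JSON body for the PUT request.
final class NifiPropertyRemoval {

    // Wraps a string in JSON quotes, escaping backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Sets the named property to null so that NiFi clears its value.
    // revisionVersion must match the version returned by the preceding GET.
    static String buildBody(String processorId, long revisionVersion, String propertyName) {
        return "{\"revision\":{\"version\":" + revisionVersion + "},"
                + "\"component\":{\"id\":" + quote(processorId) + ","
                + "\"config\":{\"properties\":{" + quote(propertyName) + ":null}}}}";
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.println(buildBody("processor-id", 3, "My Optional Property"));
    }
}
```

The resulting string would be sent as the body of the PUT to the same processors/{ID} address used for the GET.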

How to parse RSS feeds with Spring Integration when pubDate not available?

I ran into a problem parsing RSS feeds with spring-integration-feed. I followed the example at
https://spring.io/guides/gs/integration/
My feeds do not include a published date. According to the RSS specifications, the dates are not required.
As the pubDate is null, the entry is not added to the queue of SyndEntry. See FeedEntryMessageSource.java
Is there a workaround for this?
The FeedEntryMessageSource uses the published date to detect new entries; without it, you'd get all the entries on every poll.
The only work-around would be a custom message source - you can invoke it from an inbound channel adapter.
If you have a proposal for another mechanism to detect new posts, feel free to open an improvement JIRA Issue.
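One possible alternative mechanism, shown here only as a sketch and not as anything Spring Integration provides, is to remember a stable key for each entry (for example its link or guid) and emit only entries whose key has not been seen before. The SeenEntryFilter class is hypothetical and uses plain strings instead of SyndEntry objects to stay independent of the library; a custom message source could wrap logic like this.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical filter that detects new feed entries by key instead of by date.
final class SeenEntryFilter {

    // Keys (for example entry links) already emitted on earlier polls.
    private final Set<String> seenKeys = new HashSet<>();

    // Returns only the keys not seen on any earlier poll, preserving feed order.
    List<String> newEntries(List<String> polledKeys) {
        List<String> fresh = new ArrayList<>();
        for (String key : polledKeys) {
            // Set.add returns false when the key was already present.
            if (seenKeys.add(key)) {
                fresh.add(key);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        SeenEntryFilter filter = new SeenEntryFilter();
        System.out.println(filter.newEntries(List.of("http://example.com/a", "http://example.com/b")));
        System.out.println(filter.newEntries(List.of("http://example.com/b", "http://example.com/c")));
    }
}
```

The seen set lives in memory here, so it grows with the feed and is lost on restart; a real implementation would need to bound and persist it.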

How can I index sub-community discussions and events?

I have written a custom crawler to index all the data from the connections seedlists
https:///forums/seedlist/myserver
When we started using subcommunities, I double-checked that subcommunities behave practically the same as communities. They seem to: they have all the same properties in the Connections DB, except that subcommunities have a parent uuid.
I expected my crawler (which basically just iterates through the Atom feed with a Java XML parser) to find the subcommunities' discussions and pull out the relevant information. Are subcommunities not published to this seedlist? If not, there does not seem to be a subcommunity-specific seedlist.
We are currently on Connections 4.5
Thank you.
I have found the answer here.
http://www-10.lotus.com/ldd/appdevwiki.nsf/xpDocViewer.xsp?lookupName=IBM+Connections+4.5+API+Documentation#action=openDocument&res_title=Community_entry_content_ic45&content=pdcontent
There seems to be an additional element that links to the sub-community feed from within the community. A crawler will need to send a GET request to that link.

Combining metadata from multiple sources

In a SPA using breeze, how would I go about combining metadata from multiple sources for related data so that I can use them in one manager on the client? For example, I might have the following:
Entity Framework Metadata from WebAPI controller (e.g. Account)
Custom Metadata from DTOs (e.g. Invoices)
Data from a third party service with metadata provided from client side metadata (e.g. Invoice transmission result)
In each case the data has related properties so I might want to be able to use Account.Transactions.TransmissionResults
UPDATE
I have tried several ways of getting this to work, but to no avail. From Jay's answer, it is not possible at present to update the metadata from the server once it has been retrieved, so unless and until that changes (see the breeze user voice issue) I am left with one of the following approaches:
1 Retrieve metadata from the server from Entity Framework and add metadata on the client for the extra entities. This worked to a degree, but I could not add navigation properties from entity types added on the client to entity types retrieved from the server, because I cannot add the foreign key association to the entity retrieved from the server. This again comes back to the need to modify metadata after it has been retrieved.
2 Write the complete metadata by hand, which will work but makes maintenance that much harder, and it seems wrong to manually write mostly the same code that the designer would write.
3 Generate most of the code from Entity Framework as described in the docs and then update it afterwards to add in the custom entities. This has similar issues to option 2 and seems hacky.
Has anyone else tried something similar? Is there something I am missing? There could be; I've only just started with breeze and JS.
Thanks
A breeze EntityManager can have metadata from any number of DataService endpoints, and you can manually add metadata (new EntityTypes) on the client at any point. The only current restriction is that once you have metadata from a specific service, you can't change it. (We are considering revisiting this last restriction.)
So the question is, what are you trying to do that you can't right now?

Sync mailchimp campaign click and open with some other database

I am working on a MailChimp integration.
I need to pull campaign stats (opens and clicks) and put them in my local database.
Using the MailChimp API, I am getting the list of all the users along with the actions they have taken.
But my issue is how to keep the data in sync at all times.
Is there any way to skip data from the MailChimp API that I have already synced?
The problem is that the entire data set can change between calls and there is no 'since' parameter. The only way to get an updated picture is to query all records and update.
Keeping stat data "synced at all times" really just depends on your solution (for example, have it query for updates when you or your users access that section).
You could expedite the update process by keeping track of the timestamp of previous calls/updates and only updating or adding records that are newer than the last sync.
As I said, there is currently no "since" parameter for the campaignEmailStatsAIMAll method (and no direct equivalent in the export API).
A 'since' parameter would actually be a good feature. So if coding your own solution to track updates via the timestamp is undesirable, you may want to ask the question in the Google group or post a feature request in the Google Code project:
http://code.google.com/p/mailchimp-api/
EDIT: I just opened the feature request, as it may solve a similar issue for an upcoming project:
http://code.google.com/p/mailchimp-api/issues/detail?id=60
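The timestamp tracking suggested above could look roughly like this: fetch all records as before, but only write those whose timestamp is later than the last successful sync. The Action type and its fields are hypothetical stand-ins for whatever the API call returns; they are not MailChimp API types, and the API call itself is not shown.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of client-side filtering by last sync time.
final class IncrementalSync {

    // Stand-in for one open/click record returned by the stats call.
    static final class Action {
        final String email;
        final String type;
        final Instant timestamp;

        Action(String email, String type, Instant timestamp) {
            this.email = email;
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Keeps only the actions that happened strictly after the last sync.
    static List<Action> newerThan(List<Action> all, Instant lastSync) {
        List<Action> fresh = new ArrayList<>();
        for (Action action : all) {
            if (action.timestamp.isAfter(lastSync)) {
                fresh.add(action);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Instant lastSync = Instant.parse("2012-01-02T00:00:00Z");
        List<Action> all = List.of(
                new Action("a@example.com", "open", Instant.parse("2012-01-01T10:00:00Z")),
                new Action("b@example.com", "click", Instant.parse("2012-01-03T10:00:00Z")));
        // Only the click recorded after the last sync needs to be written.
        System.out.println(newerThan(all, lastSync).size());
    }
}
```

This still downloads the full data set on every call; it only reduces the number of local database writes.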
