DSpace: Items only appear in Discovery after moving to another collection

I moved all the items from one collection to another. However, while they were in the source collection, these items didn't appear in Discovery. After the move, the same items do appear in the destination collection. Why didn't these items appear in the source collection before the move?
Still before the move: if I take one of these items' handles and access it directly in a browser, it works. Could this be a problem with the Discovery index?

There could be 2 causes for this issue.
The items need to be re-indexed. Depending on how the move was performed, the index may not have been updated.
If you are using XMLUI, the Cocoon cache needs to be cleared.
Here is my recommendation.
Since this is quick, clear the Cocoon cache from the Admin -> Control Panel -> Java Information page.
If that does not resolve the issue, rebuild your Discovery index by running [dspace-install]/bin/dspace index-discovery -b
The re-index can take a while to complete, and user search results will be affected while it runs.

In addition to what terrywb said in his answer to this question, in order for automatic re-indexing to work, these things also need to be done:
The "discovery" event consumer must be enabled in your dspace.cfg
The Solr data directory for the Discovery index ([dspace]/solr/search/data) needs to be owned by the same user that Tomcat runs under, so that the Tomcat user can add/change/delete files and subdirectories (a quick ownership check is sketched below)
Automatic re-indexing should be triggered whenever you move items through the user interface or via bulk metadata editing.
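On the second point, a minimal sketch of an ownership check (assuming a Unix host, [dspace] installed at /dspace, and a servlet-container account named tomcat; adjust all three to your installation):

    import os
    import pwd

    # Assumptions: [dspace] is /dspace and the servlet container runs as "tomcat".
    SOLR_DATA = "/dspace/solr/search/data"
    TOMCAT_USER = "tomcat"

    owner = pwd.getpwuid(os.stat(SOLR_DATA).st_uid).pw_name

    if owner != TOMCAT_USER:
        print(f"{SOLR_DATA} is owned by {owner!r}, not {TOMCAT_USER!r}; "
              "automatic re-indexing will not be able to write index files.")
    else:
        print(f"{SOLR_DATA} ownership looks fine ({owner}).")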
Honestly, we've been through this before -- it would be helpful if you could give us more information on your original question rather than posting a new one.

Related

Removal of metadata added by feed

I have a GSA that fulfils a number of roles within my organisation. Honestly it's a bit of a frankenmess but it's what I have to work with.
One of the things we have it doing is indexing a number of sites based on a feed we pass it. Each of the items we pass in the feed gets tagged with metadata that allows me to set up a frontend that only queries those items. This is working fine for the most part, except that now I want to remove some metadata from items that are in the index (thereby stopping them from appearing in that particular frontend) and I can't figure out how.
I use a metadata-and-url type feed to push in the URLs I want the system to be aware of. But it also finds a number of them through standard indexing patterns.
Here's the issue: the items in the index that were found as part of the standard crawling are the ones I can't remove. I just need the GSA to forget that I ever attached metadata to them.
Is this possible?
You can push a new feed that either updates the metadata or deletes the individual records that you want to remove from your frontend.
You can also block specific results from appearing in a specific frontend as a temporary measure while you work it out. See this doco.
It sounds like you would be better off using collections to group the subsets of the index that you want to present in a specific frontend.
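If you take the feed route from the first suggestion, the sketch below shows roughly what pushing an updated metadata-and-url feed might look like. The appliance host, port, datasource name, URLs and metadata names are all placeholders, so check the GSA Feeds Protocol guide for the exact format and endpoint your appliance expects.

    import requests

    FEEDERGATE = "http://gsa.example.com:19900/xmlfeed"  # placeholder appliance address
    DATASOURCE = "my_frontend_feed"                      # placeholder datasource name

    # One record is deleted outright; the other is re-submitted with changed
    # metadata so it no longer carries the tag the restricted frontend queries on.
    feed_xml = """<?xml version="1.0" encoding="UTF-8"?>
    <gsafeed>
      <header>
        <datasource>{ds}</datasource>
        <feedtype>metadata-and-url</feedtype>
      </header>
      <group>
        <record url="http://www.example.com/obsolete.html" action="delete"/>
        <record url="http://www.example.com/keep.html" action="add" mimetype="text/html">
          <metadata>
            <meta name="frontend-tag" content="none"/>
          </metadata>
        </record>
      </group>
    </gsafeed>""".format(ds=DATASOURCE)

    resp = requests.post(
        FEEDERGATE,
        files={
            "feedtype": (None, "metadata-and-url"),
            "datasource": (None, DATASOURCE),
            "data": ("feed.xml", feed_xml, "text/xml"),
        },
    )
    print(resp.status_code, resp.text)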

My couchdb view is rebuilding for no reason

I have a CouchDB instance with a database containing ~20M documents. It takes ~12h to build a single view.
I have saved 6 views successfully. They returned results quickly, at first.
After 2 days idle, I added another view. It took much longer to build, and since it was a "nice-to-have" rather than a requirement, I killed it at ~60% completion (restarted the Windows service).
My other views now start rebuilding their indexes when accessed.
Really frustrated.
Additional info: the disk had gotten within 65 GB of full (1 TB local disk).
Sorry, you have no choice but to wait for the views to rebuild here. However, I will try to explain why this is happening. It won't solve your problem, but perhaps it will help you understand what is going on and how to prevent it in future.
From the wiki:
CouchDB view index filenames are based on the contents of the design document (not its name, ID or revision). This means that two design documents with identical view code will share view index files.
It follows that if you change a design document's contents, by adding a new view or updating an existing one, CouchDB will rebuild the indexes for every view in that document.
So I think the most obvious solution is to add new views in new design docs. That prevents re-indexing of the existing views, and the new one will take whatever time it needs to index anyway.
Here is another helpful answer that sheds light on how to use CouchDB design documents and views effectively.
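As a concrete illustration of the "new views in new design docs" approach, here is a minimal sketch against CouchDB's HTTP API; the database name, design document name and view code are invented, and it assumes CouchDB on localhost:5984.

    import requests

    COUCH = "http://localhost:5984"  # assumed local CouchDB
    DB = "mydb"                      # placeholder database name

    # Putting the new view in its own design document leaves the existing design
    # document untouched, so its already-built index files are not invalidated;
    # only this new (initially empty) index has to be built.
    new_ddoc = {
        "language": "javascript",
        "views": {
            "by_created": {
                "map": "function (doc) { if (doc.created) { emit(doc.created, null); } }"
            }
        },
    }

    resp = requests.put(f"{COUCH}/{DB}/_design/reports_v2", json=new_ddoc)
    print(resp.status_code, resp.json())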

Solr deployment strategies for 100% Up Time while creating whole index

I'm working on an e-commerce project using Solr 3.6 and ASP.NET MVC3.
I have an index of approximately 1 lakh (100,000) products in Solr. There have been some changes in requirements, and we need to rebuild the whole index. The full re-index takes almost an hour and a half, during which the site needs to be down.
How can I rebuild the index while keeping the site live, serving content from the older index? What are the best practices to reduce downtime while rebuilding the whole index? I wish I could do it with 100% uptime.
Edit
I'm storing several URLs in the Solr data as stored fields; they are generated dynamically while adding data to Solr. If I deploy the application on a different subdomain such as test.example.com, it picks up the wrong URLs, since they only work with example.com. So hosting another copy of the application is not an option for me.
You can leverage the concept of multiple cores in Solr and thereby have a live core that users are currently searching against and a standby core where you can make schema changes, re-index, etc. Then using the SWAP command you can switch the live and standby cores without any user downtime. The swap will be handled internally by Solr and your users will never notice a difference.
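For illustration, the swap itself is a single CoreAdmin request; a rough sketch is below (the core names and the Solr URL are placeholders, and it assumes both cores are declared in solr.xml).

    import requests

    SOLR = "http://localhost:8983/solr"             # placeholder Solr base URL
    LIVE, STANDBY = "products", "products_rebuild"  # placeholder core names

    # Re-index into STANDBY first; once it is complete and warmed, swap the two
    # cores so user queries are served from the freshly built index.
    resp = requests.get(f"{SOLR}/admin/cores",
                        params={"action": "SWAP", "core": STANDBY, "other": LIVE})
    resp.raise_for_status()
    print("Swapped cores:", STANDBY, "<->", LIVE)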
As I see it, there are several ways to solve this problem correctly:
Do not rebuild the whole index at all; just update the necessary records on the fly when they change. Solr can do this quite simply.
Create 2 Solr instances on different ports and alternate between them. While the first is rebuilding, you can serve the old index from the second; once the first is rebuilt, use it until the second instance's index has been rebuilt in turn.
Add a boolean field to your index named, for example, "old_index". When reindexing starts, update all current records to set old_index=1 and configure your queries to look only for records with old_index=1. Then start reindexing, and afterwards delete the old records. This can be done with Solr's deleteByQuery (a sketch of the deletion step follows below) and either atomic updates in Solr 4.x or a manual update.
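A sketch of the deletion step in the third option, using Solr's XML update handler (the URL assumes a default single-core install):

    import requests

    SOLR_UPDATE = "http://localhost:8983/solr/update"  # placeholder update URL
    HEADERS = {"Content-Type": "text/xml"}

    # Once the re-index has written the new documents (without old_index=1),
    # drop everything still carrying the old marker and commit.
    requests.post(SOLR_UPDATE, data="<delete><query>old_index:1</query></delete>", headers=HEADERS)
    requests.post(SOLR_UPDATE, data="<commit/>", headers=HEADERS)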

couchDB views inaccessible while updating

Sorry I couldn't think of a more descriptive title: We have an issue with updating couchDB views since they're inaccessible while the design doc is being reindexed. Is the only solution to allow stale views?
In one scenario, there are several couchDB nodes which replicate with each other. Updating a view in one will cause all couchDB nodes to reindex the design doc. Is it not possible to update the view on one node and then replicate out the result? I assume the issue there is that new docs could be inserted into other nodes while the one is reindexing.
In another scenario, we have several couchDB nodes which are read/write and replicate with each other. For web apps, there's another cluster with read-only couchDB nodes... they don't replicate out, but are replicated to from the read/write pool. A solution here could be to take a node out of the cluster, update the view and wait for it to reindex. However, won't that node be missing any documents that were created during reindexing? Is it possible for it to continue receiving document inserts while reindexing?
Are there other possible solutions? We're migrating to the second scenario, so that's what I'm primarily concerned with, but I'm wondering if there's a general solution for either case. Using stale views isn't an ideal scenario since reindexing can take a long time and it's a high-traffic site.
It's great to hear that you are having success with CouchDB.
I suggest you use the staging-and-upgrade technique described in the wiki. It requires a little preparation to get working, however once you have it working, it works very well without any human effort.
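In case it helps, a very rough sketch of that staging-and-upgrade flow over CouchDB's HTTP API is shown below; the database, design document and view names are invented, and the authoritative procedure is the one on the wiki page.

    import time
    import requests

    COUCH = "http://localhost:5984"   # assumed CouchDB address
    DB = "mydb"                       # placeholder database name
    STAGING = "_design/myapp_new"     # staging copy holding the updated view code
    LIVE = "_design/myapp"            # design doc the application actually queries

    # 1. Save the updated view code under STAGING (not shown here).
    # 2. Query one of its views so CouchDB starts building the new index in the
    #    background while LIVE keeps serving requests from the old index.
    requests.get(f"{COUCH}/{DB}/{STAGING}/_view/by_date",
                 params={"limit": 0, "stale": "update_after"})

    # 3. Wait for the staging doc's background indexer to finish.
    time.sleep(5)  # give the indexer a moment to start
    while True:
        info = requests.get(f"{COUCH}/{DB}/{STAGING}/_info").json()
        if not info["view_index"]["updater_running"]:
            break
        time.sleep(30)

    # 4. Copy the staging doc over the live one. Because view index files are
    #    named after the design doc's contents, the live doc now re-uses the
    #    index that was just built for the staging doc.
    live_rev = requests.get(f"{COUCH}/{DB}/{LIVE}").json()["_rev"]
    requests.request("COPY", f"{COUCH}/{DB}/{STAGING}",
                     headers={"Destination": f"{LIVE}?rev={live_rev}"})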

update app database regularly without needing an app update

I am working on a WP7 app that contains
CategoryGroups
Categories
Products
The rows for each of these entities are populated on first run of the application.
The issue is that after the app is published, the rows in each of these entities will change (added, deleted, modified). I would like some suggestions on how I should handle this. Any pointers to existing code samples would be great.
I am using an object-oriented database to store my entities. The app also allows the user to add their own entities (which get added to the database as personalized, flagged entities). One solution I was considering is to read an XML file from the server, then loop through the database entries and make the necessary modifications. So on the first run, all the entities would just get inserted. On subsequent runs, if the version number attribute in the XML is different, the system-populated data is reloaded from the XML but the user data is preserved.
Also, the app could check for a new XML file on the server only when an internet connection is available, and only periodically (say, every 2 weeks).
Any other suggestions are welcome. If there is a simpler, cleaner way - please share.
Pratik
I think it's fair to say that this question has nothing to do with WP7 and everything to do with finding an efficient way to compute and deliver update deltas.
Timestamp your items. When requesting an update, specify the time of the last update. Your server can trivially query for items newer than this and return a delta. At the client (i.e. on the phone), it is not necessary to store a last-update time, because you can simply add one second to the most recent timestamp among the items present on the phone.
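A tiny sketch of that idea (plain Python with invented data, standing in for the server-side query and the client-side computation):

    from datetime import datetime, timedelta

    # Stand-in for the server's data; in practice this is a query such as
    # "SELECT * FROM products WHERE updated_at >= :since".
    ITEMS = [
        {"id": 1, "name": "Widget", "updated_at": datetime(2012, 5, 1, 10, 0, 0)},
        {"id": 2, "name": "Gadget", "updated_at": datetime(2012, 5, 3, 9, 30, 0)},
    ]

    def delta_since(since):
        """Server side: return only the items changed at or after `since`."""
        return [item for item in ITEMS if item["updated_at"] >= since]

    # Client side: no stored "last sync time" is needed. Derive it from the data
    # already on the phone by taking the newest local timestamp plus one second.
    local_items = [{"id": 1, "updated_at": datetime(2012, 5, 1, 10, 0, 0)}]
    since = max(item["updated_at"] for item in local_items) + timedelta(seconds=1)

    print(delta_since(since))  # only item 2 comes back in the delta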
