Delete hanging ferret index entry after record deleted

Delete hanging ferret index entry after record deleted - ruby

I'm using ActsAsFerret which is a Rails wrapper for the Ruby-based Ferret search engine.
Sometimes it seems a record is deleted using SQL, rather than via ActiveRecord (ie record.destroy), and when this happens an index entry is left in the Ferret index.
I can see it in this example, where School #136574 has left a hanging index entry: In the search below I'm using the lazy option, which says "Just get the data out of the Ferret index and don't bother hitting the database":
>> #schools = School.search(params)
=> [#<FerretResult wrapper for School with id 136574, #<FerretResult wrapper for School with id 55814]
>> #schools.collect(&:id)
ActiveRecord::RecordNotFound: Couldn't find School with ID=136574
This causes a serious problem because if you try and do anything with the results it does School.find(id) and you get an exception like above.
I did School.rebuild_index, thinking that would fix it, but it hasn't.
I can work around it - it seems that calling .collect on the results has been overridden to convert them into their corresponding object (causing the exception), and I can do something like this:
#schools = #schools.select{|result| School.find_by_id(result.id)};#schools.size
This does work, but it's slow, and a bit of a pain in the ass.
Does anyone know a way to delete the hanging index records?

Related

Spring Data JPA fetching list always returns at least a single result

I've noticed a slight problem with how my API is working where I'm using Spring Data JPA.
My query looks something along the lines of:
#Query("SELECT p.id AS id, COUNT(l) AS likes FROM Post p LEFT JOIN Like l ON l.post = p WHERE p.location.id = ?1")
My actual query is bigger, this this contains everything necessary to explain what the issue is. This query will return a list, but assume the location does not exist, it should return null or an empty list, correct? Oh, how wrong you are, my sweet summer child!
This query will instead always return a list of at least one element, regardless of whether or not there are any posts linked to said location.
[{"id": null, "likes": 0}]
That is what the result looks like when serialized to JSON. I am not quite sure what to do about this little predicament, as I obviously don't want to return a list with faulty data, but needing to use processing to filter out duds also seems dumb and unnecessary.
Is there any way to prevent this that I've yet to find? If it is of any relevance, I am using projections currently for my responses.
What I've tried so far:
Adding a not null condition for fields. Does not work, ignored by COUNT.
Adding constraints to all fields #NotNull. Does not work, will still become null.
For what it's worth, I've tried different kinds of joins, though anything but LEFT JOIN doesn't make much sense.
I haven't been able to find any other case which resembles this either, although it most likely exists, but is drowned out by everything else. I'm not quite sure what can be done in this regard, so I'm curious if it's just a quirk with the framework, or if there is an actual solution.
It might be possible to solve through native queries, but I would prefer not to use them.

I'm no SQL expert but I believe that a left join will give you this result if the ID does not exist.
Have you run the query in your DB? Doesn't it give you one row in your result set for IDs that do not exist?
I believe this is intended to say there is a 0 match.
You might want to validate your query before running it. Meaning checking that the location exists first.

As the issue is inherently due to a COUNT and CASE keyword in my real query, resulting in there always being at least one row, and I can't find any method of doing this automatically, the solution I've used is the following:
List<Item> items = repository.customQuery(id);
if (0 < items.size() && null == items.get(0).getId()) {
items.remove(0);
}
The first condition is arbitrary as I know there is always at least one entry, but is done just as a safety measure. A try-catch block would do the trick as well. In the case where you use a primitive int instead of Integer, you'd need to initialize the value in the constructor to something which would normally never be present in the database, such as -1.
If anyone knows of a better method, I'd love to know about it.

How can I get Watir to make a fresh reference on a non-stale element?

A portion of some tests I am writing calls for checking if an option gets removed from a select list once that option has been used. I am inconsistently getting this error: timed out after 60 seconds, waiting for {:xpath=>"//select[#id = 'newIdentifierType']//option", :index=>31} to be located (Watir::Exception::UnknownObjectException)
It causes my test to fail maybe 2-3 times out of 10 runs and seems kind of random. I think Watir is looking for the "old" select list with this ID since it caches the element and may also include that it had 32 items, but it times out since a select list with this ID and 32 items no longer exists. The new select list has the same ID but only 31 items.
Is there a way to always get a new reference on this element even though it's not technically going stale? Am I experiencing this problem due to a different issue?
My current code for getting the options in the select list:
#browser.elements(:xpath => "//select[#id = 'newIdentifierType']//option")
I am using Ruby/Cucumber with Selenium and Watir Webdriver level. I first tried defining the element as a select_list in a page-object but moved it to the step definitions using #browser.element to see if that would stop the timeout. I thought it may ignore Watir's cached elements and get the most current one with the ID, but that does not appear to be the case.

Please avoid using XPath with Watir. Everything you can do with XPath, Watir has a much more readable API to handle.
To check for a specific option not being there, you should avoid collections and locate directly:
el = browser.select_list(id: "newIdentifierType").option(value: "31"))
# or
el = browser.select_list(id: "newIdentifierType").option(text: "This one"))
Then to see if it has gone away:
el.stale?
# or
el.wait_until(:stale?)
That won't test the right thing if the entire DOM has changed, though, so you might need to just relocate:
browser.select_list(id: "newIdentifierType").option(text: "This one")).present?
If you are intent on using a collection, the correct way to get the list of options is:
options = #browser.select(id: 'newIdentifierType').options
el = options.find { |o| o.text == 'This one' }
# Do things
el.stale?

Laravel 5.3 and Redis (predis) - autoincrement hash and delete hash `row`

I've been flirting with Redis for a while now.
I've watched these series some time ago and they were awesome. I've been through some of the documentation and the mentioning of the Time complexity of the queries blew me away, this is something that's rarely mentioned in web materials but is of huge importance for app building.
Anyhow I'm trying to make my app use the Redis on the consumer end so the users can fetch the data as fast as possible.
So I'm trying to save some objects to hash as:
$redis->hmset("taxi_car", array(
"brand" => "Toyota",
"model" => "Yaris",
"license number" => "RO-01-PHP",
"year of fabrication" => 2010,
"nr_stats" => 0)
as found here and this works nicely.
However I can't find a way to delete the whole entry anywhere.
Did I get this hash thing wrong?
Following this example I would like to delete the entry with given licence number. All I could find is how to delete the licence number from the object:
$redis->hdel("taxi_car", "license number");
and can't figure out how to delete the whole hash row (please do correct with proper word for row here).
Another problem here is that it seems this only allows me to save a single taxi_car in the Redis. How do I set the UUID so I can have multiple Taxi cars?
I'm going to play with this a bit, any help is welcome. Thanks!

To delete a key of any type, Hash included, call the Redis DEL command.
To have multiple keys, give them different names, e.g. taxi_car:1, taxi_car:2 etc.

Solr query conundrum

I've recently swapped from using Lucene for Sitecore to Solr.
For the most part it has been smooth, but the way I was writing some queries (using Sitecore.ContentSearch.Linq) abstraction now don't seem to be compatible.
Specifically, I have a situation where I've got "global" content and "regional" content, like so:
Home (000)
X
Y
Z
Regions (ID: 111)
Region 1 (ID: 221)
A
B
Region 2 (ID: 222)
D
My code worked on Lucene, but now doesn't on Solr. It should find all "global" and a single region's content, excluding all other region's content. So as an example, if the user's current region was Region 1, I'd want the query to return content X, Y, Z, A, B.
Sitecore's Item Crawler has a field for each item in the index called "_path" which is a multivalued string field of IDs, so as an example, Region 1's _path field value would be [000, 111, 221 ].
When I write this using the Linq abstraction it comes out as below which doesn't return results.
-_path:(111) OR _path:(221)
But _path:(111) does return result. Mind blown.
When I use the Solr interface and wrap each side of the OR in extra brackets like below (which I'd consider redundant) it works! Mind blown v2.
(-_path:(111)) OR (_path:(221))
Firstly, what's the difference between those queries?
Secondly, my real problem is I can't add these extra brackets as I'm working in an abstraction Linq so the brackets will be "optimized" out.
Any advice would be awesome! Cheers.

The problem here is, lucene's negative queries don't work like you think they do. They only remove results from what has been found. -_path:111 doesn't find all documents which aren't in 111, it doesn't find anything at all. It only removes results. So you are finding all results with path "221", then removing any that also have path "111", which from your heirarchy, I assume is all of them. See my answer here for a bit more on that topic.
The OR makes it seem like it ought to work, but really -_path:(111) OR _path:(221) is the same as -_path:(111) _path:(221). The moral here is: Don't use Lucene's AND/OR/NOT syntax, if you can help it. Use +/-. +/- syntax actually expresses how the query operates, AND/OR/NOT doesn't. It attempts to shoehorn it into a different, SQL-like retrieval model and leads to some unexpected behavior like this.
So, what about: (-_path:(111)) OR (_path:(221))
Well, first, does it actually work? Or does it just get some results?
If it just gets some results, but just seems to get the same results as _path:221: The reason is -_path:111 gets no results, so your query is, in practice, something like: (nothing) OR (_path:221), which is equivalent to _path:221
If it really does get the results you expect (I'm guessing it probably does): Something is translating your query into something like: (*:* -_path:111) (_path:221). Solr does have some logic along these lines, though I'm not quite sure in this case. Essentially, it puts a match-all in front of any lonely negative queries it finds, allowing them to do what you were expecting. If the implicit *:* makes you nervous about performance, well, it should. But lucene is an inverted index, it does well with finding matches on a term quickly. Getting everything that doesn't match goes against the grain of that retrieval model, and will pretty much have to do a full scan of the index.

Issues with Sinatra and Heroku

So I've created and published a Sinatra app to Heroku without any issues. I've even tested it locally with rackup to make sure it functions fine. There are a series of API calls to various places after a zip code is consumed from the URL, but Heroku just wants to tell me there is an server error.
I've added an error page that tries to give me more description, however, it tells me it can't perform a `count' for #, which I assume means hash. Here's the code that I think it's trying to execute...
if weather_doc.root.elements["weather"].children.count > 1
curr_temp = weather_doc.root.elements["weather/current_conditions/temp_f"].attributes["data"]
else
raise error(404, "Not A Valid Zip Code!")
end
If anyone wants to bang on it, it can be reached at, http://quiet-journey-14.heroku.com/ , but there's not much to be had.

Hash doesn't have a count method. It has a length method. If # really does refer to a hash object, then the problem is that you're calling a method that doesn't exist.

That # doesn't refer to Hash, it's the first character of #<Array:0x2b2080a3e028>. The part between the < and > is not shown in browsers (hiding the tags themselves), but visible with View Source.
Your real problem is not related to Ruby though, but to your navigation in the HTML or XML document (via DOM). Your statement
weather_doc.root.elements["weather"].children.count > 1
navigates the HTML/XML document, selecting the 'weather' elements, and (tries to) count the children. The result of the children call does not have a method count. Use length instead.
BTW, are you sure that the document contains a tag <weather>? Because that's what your're trying to select.

If you want to see what's behind #, try
raise probably_hash.class.to_s

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio