RethinkDB - Search contents of list?

I have a RethinkDB database with a complex JSON key structure that stores data and log files as long text strings.
In each entry, I have something like:
{"serial": "fun",
"Time delay": 20,
"MAC address list": [
"0c:c4:c1:24:24:55",
"0c:c4:c1:24:24:56",
"0c:c4:0a:fd:e2:12"
]}
I need to search all of the (hundreds of thousands of) entries for the presence of a MAC address. Say I'm searching for "0c:c4:0a:fd:e2:12". How would I do that using the RethinkDB HTTP interface (which is JavaScript-based)?
I'm essentially looking for the equivalent of Python's "if X in list1".
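In ReQL, the JavaScript-flavored query language of the web interface, the usual answer is contains, e.g. r.table('entries').filter(r.row('MAC address list').contains('0c:c4:0a:fd:e2:12')), where the table name 'entries' is an assumption. That still scans every document. A secondary index created with indexCreate('macs', r.row('MAC address list'), {multi: true}) and queried with getAll(mac, {index: 'macs'}) lets the server look the value up instead (the index name 'macs' is hypothetical). The plain-JavaScript sketch below only mimics those two strategies over an in-memory array, so it runs without a RethinkDB server; the helper names and the second sample entry are hypothetical.

```javascript
// In-memory stand-ins for two RethinkDB strategies; no server involved.

// Strategy 1: full scan, like filter(r.row("MAC address list").contains(mac)).
function findByScan(entries, mac) {
  // Array.prototype.includes is JavaScript's counterpart to Python's `x in list`.
  return entries.filter((e) => (e["MAC address list"] || []).includes(mac));
}

// Strategy 2: a multi-valued index, like indexCreate(..., { multi: true }):
// every distinct MAC in a document's list becomes a key pointing at that document.
function buildMacIndex(entries) {
  const index = new Map();
  for (const e of entries) {
    for (const mac of new Set(e["MAC address list"] || [])) {
      if (!index.has(mac)) index.set(mac, []);
      index.get(mac).push(e);
    }
  }
  return index;
}

// Lookup by key, like getAll(mac, { index: "macs" }).
function findByIndex(index, mac) {
  return index.get(mac) || [];
}

// Sample data: the first entry is from the question, the second is made up.
const entries = [
  {
    serial: "fun",
    "Time delay": 20,
    "MAC address list": ["0c:c4:c1:24:24:55", "0c:c4:c1:24:24:56", "0c:c4:0a:fd:e2:12"],
  },
  { serial: "other", "Time delay": 5, "MAC address list": ["0c:c4:c1:24:24:57"] },
];

const target = "0c:c4:0a:fd:e2:12";
console.log(findByScan(entries, target).map((e) => e.serial)); // [ 'fun' ]
console.log(findByIndex(buildMacIndex(entries), target).map((e) => e.serial)); // [ 'fun' ]
```

The trade-off is the usual one for indexes: the multi index costs extra storage and write time, in exchange for lookups that do not touch every document.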

Related

What's the term for a data structure where a field is missing if it has no value?

My data looks like this:
[
{"name":"Jimmy H.","title":"Mr."},
{"name": "Janice H."}
]
So if a field has no value, the field name is missing too. What's the proper term for that?
EDIT:
Basically, I'm looking for a term that distinguishes the structure above from one where every field name (even without a value) is guaranteed to exist in every record.
The one in the example is a combination of well-known structures: it does seem to be an array of maps. In JavaScript, at least, it would be an array of objects, but those objects behave like maps the way they are used in the example.
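The difference the question describes is visible directly in JavaScript: in the sparse form an absent field is a missing key, while the dense form (every record has every field name) has to be produced explicitly. A minimal sketch, where the densify helper and the choice of null as the "no value" filler are assumptions for illustration:

```javascript
// The sparse records from the question: the second one has no "title" key.
const records = [
  { name: "Jimmy H.", title: "Mr." },
  { name: "Janice H." },
];

// The "in" operator tells a missing key apart from a present one.
console.log("title" in records[1]); // false

// Hypothetical helper: give every record every field name seen in any record,
// using null where the original record had no value.
function densify(rows) {
  const fields = [...new Set(rows.flatMap((r) => Object.keys(r)))];
  return rows.map((r) => Object.fromEntries(fields.map((f) => [f, f in r ? r[f] : null])));
}

console.log(densify(records));
```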

RethinkDB text search?

I am studying RethinkDB for my next project. My backend is in Haskell, and the RethinkDB Haskell driver looks a bit better than the MongoDB one, so I want to try it.
My question is: how do you do simple text search with RethinkDB?
Nothing too complex: just find a field whose value contains these words.
I assume this should be built in, as even the smallest blog app needs a search facility of some kind, right?
So I am looking for a mongodb equivalent of:
var search = { "$text": { "$search": "some text" } };
Thank you.
EDIT
I am not looking for regular expressions and the match function, because:
It is extremely slow for even moderately large sets.
It does not have any notion of indexes.
It does not have any notion of stemming.
With the RethinkDB Haskell driver documented here, that approach looks like this:
run h $ table "table" # R.filter (\row -> match "some text" (row ! "field"))
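Inside filter, match applies a regular expression to each document in turn, which is why it scans. What MongoDB's $text adds is an inverted index: each normalized word maps to the documents that contain it, so a query only reads the lists for its own words. A toy JavaScript sketch of that idea, with a deliberately crude suffix-stripping function standing in for a real stemmer; all names and sample posts are hypothetical, and none of this is a RethinkDB or MongoDB API.

```javascript
// Crude stand-in for a stemmer: lowercase and strip a few English suffixes.
function normalize(word) {
  const w = word.toLowerCase();
  for (const suffix of ["ing", "es", "s"]) {
    if (w.length > suffix.length + 2 && w.endsWith(suffix)) return w.slice(0, -suffix.length);
  }
  return w;
}

// Split text into alphanumeric words and normalize each one.
function tokenize(text) {
  return (text.match(/[a-z0-9]+/gi) || []).map(normalize);
}

// Inverted index: term -> Set of document ids.
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Ids of documents containing every query term (AND semantics), sorted.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    result = result === null ? new Set(ids) : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort();
}

const posts = {
  p1: "Searching texts in RethinkDB",
  p2: "Some text about Haskell drivers",
  p3: "A blog post with no match",
};
const index = buildIndex(posts);
console.log(search(index, "some text")); // [ 'p2' ]
console.log(search(index, "search text")); // [ 'p1' ]
```

A real setup would use a proper stemmer and keep the index in sync with every write; that bookkeeping is what a built-in text index like MongoDB's does for you.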

Wiktionary/MediaWiki Search & Suffix Filtering

I'm building an application that will hopefully use Wiktionary words and definitions as a data source. In my queries, I'd like to be able to search for all Wiktionary entries that are similar to user provided terms in either the title or definition, but also have titles ending with a specified suffix (or one of a set of suffixes).
For example, I want to find all Wiktionary entries that contain the words "large dog", like this:
https://en.wiktionary.org/w/api.php?action=query&list=search&srsearch=large%20dog
But further filter the results to only contain entries with titles ending with "d". So in that example, "boarhound", "Saint Bernard", and "unleashed" would be returned.
Is this possible with the MediaWiki search API? Do you have any recommendations?
This is mostly possible with ElasticSearch/CirrusSearch, but it is disabled for performance reasons. You can still use it on your own wiki, or attempt smarter search queries.
Usually for Wiktionary I use yanker, which can access the page table of the database. Your example (a one-letter suffix) would return a huge list, but, for instance, .*hound$ finds:
Afghan_hound
Bavarian_mountain_hound
Foxhound
Irish_Wolfhound
Mahound
Otterhound
Russian_Wolfhound
Scottish_Deerhound
Tripehound
basset_hound
bearhound
black_horehound
bloodhound
boarhound
bookhound
boozehound
buckhound
chowhound
coon_hound
coonhound
covert-hound
covert_hound
coverthound
deerhound
double-nosed_andean_tiger_hound
elkhound
foxhound
gazehound
gorehound
grayhound
greyhound
harehound
heckhound
hell-hound
hell_hound
hellhound
hoarhound
horehound
hound
limehound
lyam-hound
minkhound
newshound
nursehound
otterhound
powder_hound
powderhound
publicity-hound
publicity_hound
rock_hound
rockhound
scent_hound
scenthound
shag-hound
sighthound
sleuth-hound
sleuthhound
slot-hound
slowhound
sluthhound
smooth_hound
smoothhound
smuthound
staghound
war_hound
whorehound
wolfhound
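Where server-side suffix matching isn't available, one fallback is to run the ordinary full-text search and filter the returned titles on the client. A sketch, assuming the JSON response of list=search has already been parsed (its hits live in query.search, each with a title field); the helper names and the hand-written sample response are hypothetical. Checking endsWith is equivalent to a $-anchored pattern such as the .*hound$ above.

```javascript
// Keep only titles ending with one of the given suffixes (case-insensitive).
function filterBySuffix(titles, suffixes) {
  const lowered = suffixes.map((s) => s.toLowerCase());
  return titles.filter((t) => lowered.some((s) => t.toLowerCase().endsWith(s)));
}

// Pull the titles out of a parsed list=search response object.
function titlesFromSearchResponse(response) {
  return ((response.query && response.query.search) || []).map((hit) => hit.title);
}

// Hypothetical, hand-written response shaped like the API's JSON output.
const sample = {
  query: {
    search: [
      { title: "boarhound" },
      { title: "Saint Bernard" },
      { title: "unleashed" },
      { title: "mastiff" },
    ],
  },
};

console.log(filterBySuffix(titlesFromSearchResponse(sample), ["d"]));
// [ 'boarhound', 'Saint Bernard', 'unleashed' ]
console.log(filterBySuffix(titlesFromSearchResponse(sample), ["hound"]));
// [ 'boarhound' ]
```

This only filters the page of results the API returned, so a complete answer means paging through the search with srlimit and sroffset and filtering each page.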

Data structure for dictionary app in WP

I'm creating a dictionary app for WP.
One of its features is real-time lookup: while the user is typing a word, the dictionary automatically finds recommended results close to what the user has typed.
For example, when the user types "bility", the dictionary must find "reusability" and "ability" as recommended results.
MY QUESTION IS: what data structure fits my needs?
A hash table or a tree structure isn't suitable here: a hash table can only perform the lookup once the word is fully typed, and a tree structure can only find words matching "userinput*" (if the user types "n", a tree can find "nice" or "night" but not "and" or "ten").
Input sample => recommended-result sample
"bility" => "reusability"; "responsibility"...
"n" => "and"; "ten"; "nice"; "night"
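A suffix trie can find the longest common substring in a string. More to the point here, a generalized suffix trie built over all the dictionary words has a path from the root for every substring of every word, so following the user's input from the root leads to every word that contains it, whether the input appears at the start, in the middle, or at the end.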
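A compact relative of the generalized suffix trie is a sorted array of every suffix of every word (a suffix array, swapped in here for the trie): any word containing the input has a suffix that starts with the input, and all such suffixes sit next to each other in sorted order, so one binary search plus a short scan finds them. A sketch under that assumption; SubstringIndex is a hypothetical name, and storing each suffix as its own string costs memory proportional to the square of each word's length.

```javascript
class SubstringIndex {
  constructor(words) {
    this.words = words;
    // One entry per suffix of every word: [suffix, index of the word it came from].
    this.suffixes = [];
    words.forEach((word, i) => {
      for (let k = 0; k < word.length; k++) this.suffixes.push([word.slice(k), i]);
    });
    // Sort by string order so suffixes sharing a prefix end up adjacent.
    this.suffixes.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  }

  // First position whose suffix is >= query (standard lower-bound binary search).
  lowerBound(query) {
    let lo = 0;
    let hi = this.suffixes.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.suffixes[mid][0] < query) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // All distinct words containing `query` as a substring, alphabetically.
  lookup(query) {
    if (query === "") return [];
    const found = new Set();
    for (let p = this.lowerBound(query); p < this.suffixes.length; p++) {
      const [suffix, wordIndex] = this.suffixes[p];
      if (!suffix.startsWith(query)) break; // past the block of matching suffixes
      found.add(this.words[wordIndex]);
    }
    return [...found].sort();
  }
}

const dict = new SubstringIndex(["ability", "and", "nice", "night", "responsibility", "reusability", "ten", "zebra"]);
console.log(dict.lookup("bility")); // [ 'ability', 'responsibility', 'reusability' ]
console.log(dict.lookup("n")); // [ 'and', 'nice', 'night', 'responsibility', 'ten' ]
```

Note that "responsibility" also contains an "n", so it shows up for that input too; ranking the results (shortest first, most frequent first) would be a separate step.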

What's the right database for this? Mongo, SQL, Couch or something else?

Let's say I've got a collection of 10 million documents that look something like this:
{
"_id": "33393y33y63i6y3i63y63636",
"Name": "Document23",
"CreatedAt": "5/23/2006",
"Tags": ["website", "shopping", "trust"],
"Keywords": ["hair accessories", "fashion", "hair gel"],
"ContactVia": ["email", "twitter", "phone"],
"Body": "Our website is dedicated to making hair products that are..."}
I would like to be able to query the database on any combination of the three attributes Tags, Keywords, and ContactVia, including none of them. I need to be able to select with ANDs (this document includes BOTH attributes X and Y) or ORs (this document includes attribute X OR Y).
Example queries:
Give me the first 10 documents that have the tags "website" and "shopping", with the keywords matching "hair accessories" or "fashion", and with a contact_via including "email".
Give me the second 20 documents that have the tags "website" or "trust", matching the keywords "hair gel" or "hair accessories".
Give me the 50 documents that have the tag "website".
I also need to order these either by other fields in the documents (score-type) or by created or updated dates. So there are basically four "ranges" that are queried regularly.
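For reference, the example queries map onto standard MongoDB query operators: $all for "has BOTH X and Y", $in for "has X OR Y", and $and to combine the per-field conditions, with sort, skip and limit on the cursor for ordering and paging. The sketch below only builds the filter objects in plain JavaScript (the buildFilter helper and its { all, any } input shape are hypothetical); it says nothing about which indexes would make them fast, which is the real performance problem in this question.

```javascript
// Map { field: { all: [...], any: [...] } } onto MongoDB's $all / $in operators.
function buildFilter(criteria) {
  const clauses = [];
  for (const [field, { all = [], any = [] }] of Object.entries(criteria)) {
    if (all.length > 0) clauses.push({ [field]: { $all: all } }); // AND within the field
    if (any.length > 0) clauses.push({ [field]: { $in: any } }); // OR within the field
  }
  // No criteria matches everything; several clauses are ANDed explicitly,
  // which also avoids key clashes when one field has both kinds of condition.
  if (clauses.length === 0) return {};
  return clauses.length === 1 ? clauses[0] : { $and: clauses };
}

// Query 1: tags website AND shopping, keywords "hair accessories" OR "fashion",
// contact via email.
const q1 = buildFilter({
  Tags: { all: ["website", "shopping"] },
  Keywords: { any: ["hair accessories", "fashion"] },
  ContactVia: { all: ["email"] },
});
console.log(JSON.stringify(q1));

// Query 2: tags website OR trust, keywords "hair gel" OR "hair accessories".
const q2 = buildFilter({
  Tags: { any: ["website", "trust"] },
  Keywords: { any: ["hair gel", "hair accessories"] },
});
console.log(JSON.stringify(q2));
```

In the mongo shell, with a hypothetical collection named docs, the first query would run as db.docs.find(q1).sort({CreatedAt: -1}).limit(10), and "the second 20" of the other query as db.docs.find(q2).skip(20).limit(20).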
I started out SQL-based. Then I moved to Mongo because it supports arrays and hashes (which I love). But it doesn't support more than one range using indexes, so my Mongo database is slow: it can't use indexes and has to scan 10 million documents.
Is there a better alternative? This is holding up moving this application into production (and the revenue that comes with it). Any thoughts on the right database or alternative architectures would be greatly appreciated.
I'm in Ruby/Rails if that matters.
When we needed to run multiple queries on arrays, we found the best solution, at least for us, was to go with ElasticSearch. We get this, plus some other bonuses, and we can reduce the index requirements for Mongo, so it's a win-win.
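In Elasticsearch terms, the same AND/OR combinations become a bool query whose filter clauses use term (one required value) and terms (any of several values), with from/size for paging and sort for ordering. A sketch that only assembles the request body, assuming Tags, Keywords and ContactVia are mapped as keyword fields (so whole values like "hair accessories" match exactly) and CreatedAt as a sortable date; buildSearchBody and its options are hypothetical.

```javascript
// Build an Elasticsearch search body: each `all` value becomes its own term
// clause (AND), and an `any` list becomes one terms clause (OR).
function buildSearchBody(criteria, { from = 0, size = 10, sortField = "CreatedAt" } = {}) {
  const filter = [];
  for (const [field, { all = [], any = [] }] of Object.entries(criteria)) {
    for (const value of all) filter.push({ term: { [field]: value } });
    if (any.length > 0) filter.push({ terms: { [field]: any } });
  }
  return {
    query: { bool: { filter } },
    sort: [{ [sortField]: "desc" }],
    from,
    size,
  };
}

// "The second 20 documents tagged website OR trust, with keywords hair gel OR hair accessories."
const body = buildSearchBody(
  { Tags: { any: ["website", "trust"] }, Keywords: { any: ["hair gel", "hair accessories"] } },
  { from: 20, size: 20 },
);
console.log(JSON.stringify(body, null, 2));
```

Using filter rather than must skips relevance scoring, which fits these yes/no conditions; ordering comes entirely from the explicit sort.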
My two cents are for MongoDB. Not only can your data be represented, saved, and loaded as raw Ruby hashes, but Mongo is modern and fast, and really, really easy to learn. Here's all you need to do to start the Mongo server:
mongod --dbpath /path/to/dir/w/dbs
Then, to get the console, which is just a basic JavaScript shell, run mongo (mongosh in MongoDB 6.0 and later). Using it from Ruby is just this simple:
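require 'mongo'
# Connect to the mongod started above (default host and port assumed).
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'somedb')
stuff = client[:stuff]
stuff.find.to_a #=> [] while the collection is empty
stuff.insert_one({id: 'abcd', name: 'Swedish Chef', says: 'Bork bork bork!'})
stuff.find.to_a #=> one document with id, name and says, plus a generated "_id"
# update_one takes two arguments: the selector, then the modification.
stuff.update_one({id: 'abcd'}, {'$set' => {says: 'Bork bork bork!!!! (Bork)!'}})
stuff.find.to_a #=> the same document, now with says: 'Bork bork bork!!!! (Bork)!'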
