wordforms on sphinx - full-text-search

If I use the wordforms file to map a word like this:
television > tv
then when I search for television, I get results containing TV, but I also want the results containing television. Is that possible?

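Yes, that's how it works by default. The mapping is applied at both index time and query time, so "television" is normalized to "tv" in the documents as well as in the query, and a search for either word matches both. You do have to rebuild your index after changing the wordforms file, though.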
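To see why both spellings match, here is a toy model of "apply the same mapping at index time and at query time". It is only an illustration and not how Sphinx is actually implemented; the names wordforms, normalize, tokenize and matches are made up for this sketch.

```javascript
// Toy model only: Sphinx's real tokenizer and index are far more involved.
// The single mapping mirrors the wordforms line "television > tv".
const wordforms = new Map([["television", "tv"]]);

// Normalize one token; used for document tokens and query tokens alike.
function normalize(token) {
  const lower = token.toLowerCase();
  return wordforms.get(lower) || lower;
}

// Split text into normalized tokens.
function tokenize(text) {
  return text.split(/\W+/).filter(Boolean).map(normalize);
}

// A document matches when it contains every normalized query token.
function matches(query, doc) {
  const docTokens = new Set(tokenize(doc));
  return tokenize(query).every((t) => docTokens.has(t));
}

console.log(matches("television", "A new TV stand"));    // document only says "TV"
console.log(matches("television", "Television repair")); // document says "Television"
```

Both calls print true in this model, because the document and the query end up holding the same normalized token.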

Related

IFTTT JavaScript filter - How to make case insensitive searches + How to search Include and Exclude sets of terms

First off, I'm a total novice at JavaScript, so please go gently. I'm aware of how people feel about now having to pay for IFTTT, but it's perfect for what I need.
I am using a more expansive version of the code below to capture certain keywords from tweets and then generate emails if the search returns a positive result. The search works very nicely, except that it is case-sensitive, which is a problem.
Yes, I know you can manipulate the Twitter search to pick up specific words or phrases, and I am very proficient at building searches that way. But I am casting a wide net of approx. 120 search words or phrases, which is too long to achieve through "OR" Twitter search parameters alone; that is why I'm using this.
Q1 - I have tried adding item.toLowerCase() and just .toLowerCase() in various parts of the code so that it wouldn't matter if the case of the search term differs from the case of the original tweet text. I just can't get it to work, though. I've seen various posts on here, but I can't get any of them to work in IFTTT. I believe IFTTT doesn't accept regex either, which is annoying.
Any advice on how to get this code running so it's case-insensitive for text within IFTTT?
Q2 - I have approx. 120 search terms for the tweet text to return positive results. A lot of junk comes through with that. Does anyone know how to add a second layer of 'and exclude' search terms?
I have something like 300-400 words and specific phrases which would be used to stop the email from being triggered, so it'd be something like "IF tweet text contains a, b, c BUT text ALSO contains x, y, z... do not send the email".
let str = Twitter.newTweetFromSearch.Text;
let searchTerms = [
  "Northbound",
  "Westbound",
  "Southbound",
  "Eastbound"
];
let foundOne = 0;
if (searchTerms.some(function (v) { return str.indexOf(v) >= 0; })) {
  foundOne = 1;
}
if (foundOne == 0) {
  Email.sendMeEmail.skip();
}
I have looked at the Twitter API, but that is a step too far for my coding ability, which is why I'm using IFTTT.
Any help is very much appreciated.
Thank you.
I'm playing with IFTTT Filter myself at the moment, so here are some thoughts about solving your problem.
If you want to do a case-insensitive search on the original text, convert the original text to lowercase, and write all your search terms in lowercase.
I first thought you would need to iterate over the searchTerms array yourself, but .some() does the iteration for you. I do prefer includes() over indexOf(), though.
let str = Twitter.newTweetFromSearch.Text.toLowerCase();
let searchTerms = [
  "northbound",
  "westbound",
  "southbound",
  "eastbound"
];
let foundOne = 0;
if (searchTerms.some(function (term) { return str.includes(term); })) {
  foundOne = 1;
}
if (foundOne == 0) {
  Email.sendMeEmail.skip();
}
Or you could just skip having the foundOne variable, and do the search in the if() statement.
let str = Twitter.newTweetFromSearch.Text.toLowerCase();
let searchTerms = [
  "northbound",
  "westbound",
  "southbound",
  "eastbound"
];
if (!searchTerms.some(function (term) { return str.includes(term); })) {
  Email.sendMeEmail.skip();
}
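For Q2 (the 'and exclude' layer), one possible approach is to run the same kind of check against a second list and only send the email when an include term matches and no exclude term does. This is a minimal sketch that I have not tested inside IFTTT itself: shouldSendEmail is a hypothetical helper name, and the exclude terms are placeholders, since the real list isn't shown in the question.

```javascript
// Hypothetical helper (not part of IFTTT): true when the text contains at least
// one include term and none of the exclude terms, ignoring case.
function shouldSendEmail(text, includeTerms, excludeTerms) {
  const lower = text.toLowerCase();
  // Lowercase each term as well, so the lists can be written in any case.
  const matches = (term) => lower.includes(term.toLowerCase());
  return includeTerms.some(matches) && !excludeTerms.some(matches);
}

const includeTerms = ["northbound", "westbound", "southbound", "eastbound"];
// Placeholder exclude terms for illustration only.
const excludeTerms = ["giveaway", "lottery"];

// Inside the IFTTT filter, the wiring would look roughly like this:
// if (!shouldSendEmail(Twitter.newTweetFromSearch.Text, includeTerms, excludeTerms)) {
//   Email.sendMeEmail.skip();
// }
```

With 300-400 exclude terms this still only does plain substring checks per tweet, so a term like "start" would also match inside "restart"; whether that matters depends on your word lists.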

Elasticsearch Result more like google

Is it possible to load only the terms I am looking for? For example, if I search for the word [ expert ], can it look in suggested_tags and return results based on that?
Right now, if I type child, it returns all terms, which makes the file size big and the autocomplete slow to show.
(image)
When we search in Google, it shows results like this:
(image: Google output)
Is this possible in Elasticsearch? I don't want to load all suggested_tags while searching, only the 4 or 5 related to the search; their position doesn't matter.
Thanks

GSA includes the keyword start in the search results when I am searching for restart

I am facing a strange issue with a specific search.
I would expect the two queries below to return the same result set, since a space between keywords is interpreted as AND anyway:
1) inurl:taskcracker Angela restart crash
2) inurl:taskcracker Angela AND restart crash
The first one returns 42 results, and the highlighted keywords on the search result page include 'start' in addition to 'restart'. The second query returns only 2 results, with 'restart' only (no 'start'), which is what I was expecting from the first search as well.
Please note that it does not matter whether I put an AND in front of other keywords on the 2nd query. It only makes a difference when I put or don't put an AND in front of 'restart'
I initially thought that maybe restart is in the synonyms list under Search > Query Settings > Synonym Data > English in the GSA admin panel but it is not there.
So the issue is when I don't put an explicit AND in front of 'restart' GSA expands it to include 'start' as well.
Any ideas whether this comes from a configuration somewhere on the admin panel or likely to be a bug?
This is probably because query expansion isn't applied when you put AND before "restart".
Can you try adding &entqr=0 to the URL being sent to the GSA? It controls the query expansion policy. You can also try &entqrm=0 if the first one doesn't work.

Wiktionary/MediaWiki Search & Suffix Filtering

I'm building an application that will hopefully use Wiktionary words and definitions as a data source. In my queries, I'd like to be able to search for all Wiktionary entries that are similar to user provided terms in either the title or definition, but also have titles ending with a specified suffix (or one of a set of suffixes).
For example, I want to find all Wiktionary entries that contain the words "large dog", like this:
https://en.wiktionary.org/w/api.php?action=query&list=search&srsearch=large%20dog
But further filter the results to only contain entries with titles ending with "d". So in that example, "boarhound", "Saint Bernard", and "unleashed" would be returned.
Is this possible with the MediaWiki search API? Do you have any recommendations?
This is mostly possible with Elasticsearch/CirrusSearch, but it is disabled for performance reasons. You can still use it on your own wiki, or attempt smart search queries.
Usually for Wiktionary I use yanker, which can access the page table of the database. Your example (one-letter suffix) would be huge, but for instance .*hound$ finds:
Afghan_hound
Bavarian_mountain_hound
Foxhound
Irish_Wolfhound
Mahound
Otterhound
Russian_Wolfhound
Scottish_Deerhound
Tripehound
basset_hound
bearhound
black_horehound
bloodhound
boarhound
bookhound
boozehound
buckhound
chowhound
coon_hound
coonhound
covert-hound
covert_hound
coverthound
deerhound
double-nosed_andean_tiger_hound
elkhound
foxhound
gazehound
gorehound
grayhound
greyhound
harehound
heckhound
hell-hound
hell_hound
hellhound
hoarhound
horehound
hound
limehound
lyam-hound
minkhound
newshound
nursehound
otterhound
powder_hound
powderhound
publicity-hound
publicity_hound
rock_hound
rockhound
scent_hound
scenthound
shag-hound
sighthound
sleuth-hound
sleuthhound
slot-hound
slowhound
sluthhound
smooth_hound
smoothhound
smuthound
staghound
war_hound
whorehound
wolfhound
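If fetching the search hits and filtering them on the client is acceptable, the suffix check itself is simple. Below is a minimal sketch, assuming the request from the question is made with format=json added so that the hits arrive under query.search with a title field each. filterTitlesBySuffix is a hypothetical helper name, and sampleResponse is hand-written for illustration, not real API output.

```javascript
// Keep only search hits whose title ends with one of the given suffixes.
// `response` is assumed to look like { query: { search: [{ title: "..." }, ...] } }.
function filterTitlesBySuffix(response, suffixes) {
  const hits = (response.query && response.query.search) || [];
  return hits
    .map((hit) => hit.title)
    .filter((title) =>
      suffixes.some((s) => title.toLowerCase().endsWith(s.toLowerCase()))
    );
}

// Hand-written sample using titles mentioned in the question, plus one non-match.
const sampleResponse = {
  query: {
    search: [
      { title: "boarhound" },
      { title: "Saint Bernard" },
      { title: "unleashed" },
      { title: "mastiff" },
    ],
  },
};

console.log(filterTitlesBySuffix(sampleResponse, ["d"]));
// prints [ 'boarhound', 'Saint Bernard', 'unleashed' ]
```

Note that the search API returns results in pages, so this only filters whatever page of hits you have fetched; a short suffix over many pages means a lot of requests.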

tweepy streaming track filter results

It seems that not all the tweets I get using filter contain the term ("health" in this case). How can I get only tweets that contain this specific term? Can anyone help me?
Thanks so much in advance!!
This is the line when I use filter:
sapi.filter(locations=[-79.55, 37.883, -75.067, 39.717],track = ["health"])
Unfortunately, the Streaming API does not allow filtering by both location and terms. From the docs:
Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
So essentially the reason you are seeing some tweets that do not contain the word "health" is because you are receiving tweets containing the word "health", OR located within your bounding box (in this case, locations=[-79.55, 37.883, -75.067, 39.717]).
You can, however, filter by your term(s) and then parse the tweet data for the location, or alternatively filter by location and then search the tweet text for your term(s). I would suggest the latter if location is necessary to limit the scope of your tweet consumption.
It is very easy: you just need to use this line in your code, which tracks the term without the locations parameter.
twitterStream.filter(track=["health"])
