How does Sphinx calculate the weight?

Note:
This is a cross-post. It was first posted on the Sphinx forum, but I got no answer there, so I am posting it here.
First, take a look at an example.
The following is my table (used only for testing):
+----+--------------------------+----------------------+
| Id | title | body |
+----+--------------------------+----------------------+
| 1 | National first hospital | NASA |
| 2 | National second hospital | Space Administration |
| 3 | National govenment | Support the hospital |
+----+--------------------------+----------------------+
I want to search the contents of the title and body fields, so I configured sphinx.conf
as follows:
--------The sphinx config file----------
source mysql
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =0000
sql_db = testfull
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT * FROM test
}
index mysql
{
source = mysql
path = var/data/mysql_old_test
docinfo = extern
mlock = 0
morphology = stem_en, stem_ru, soundex
min_stemming_len = 1
min_word_len = 1
charset_type = utf-8
html_strip = 0
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312
read_timeout = 5
max_children = 30
max_matches = 1000
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
pid_file = var/log/searchd_mysql.pid
log = var/log/searchd_mysql.log
query_log = var/log/query_mysql.log
}
------------------
Then I reindexed the database and started the searchd daemon.
On the client side I set the attributes as follows:
----------Client side config-------------------
sc = new SphinxClient();
///other thing
HashMap<String, Integer> weiMap=new HashMap<String, Integer>();
weiMap.put("title", 100);
weiMap.put("body", 0);
sc.SetFieldWeights(weiMap);
sc.SetMatchMode(SphinxClient.SPH_MATCH_ALL);
sc.SetSortMode(SphinxClient.SPH_SORT_EXTENDED,"#weight DESC");
When I search for "National hospital", I get the following output:
Query 'National hospital' retrieved 3 of 3 matches in 0.0 sec.
Query stats:
'nation' found 3 times in 3 documents
'hospit' found 3 times in 3 documents
Matches:
1. id=3, weight=101
2. id=1, weight=100
3. id=2, weight=100
The number of matches (three) is right; however, the order of the results is not what I
wanted.
Obviously, documents 1 and 2 are the closest to the search string ("National hospital"),
so in my opinion they should get the largest weights, but they are ordered last.
Is there any way to meet my requirement?
PS:
1) Please do not suggest that I set the sort mode to:
sc.SetSortMode(SphinxClient.SPH_SORT_EXTENDED,"#weight ASC");
This may work for this example, but it would cause other potential problems.
2) Actually, the content of my table is Chinese; I just use "National Hosp..l" as
an example.

1° You ask for "National hospital", but Sphinx searches for "nation" and "hospit" because stemming is enabled:
morphology = stem_en, stem_ru, soundex
2° You assign weights
weiMap.put("title", 100);
weiMap.put("body", 0);
to nonexistent text fields, given this query:
sql_query = SELECT * FROM test
3° Finally, my simple answer to the main question:
You sort by weight, and the third row gets more weight because there are no words between "nation" and "hospit".
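The reported weights can also be reproduced with a small model, offered here only as a sketch. It assumes that SPH_MATCH_ALL ranks by phrase proximity, with the weight being the sum over fields of the field weight times the longest verbatim run of query words in that field. It also assumes, purely as a guess consistent with the observed 101, that a field weight of 0 behaves like 1. The helper names are hypothetical, and stemming is ignored because the sample words match exactly.

```python
# Sample rows from the question (typo in "govenment" kept as in the table).
DOCS = {
    1: {"title": "National first hospital", "body": "NASA"},
    2: {"title": "National second hospital", "body": "Space Administration"},
    3: {"title": "National govenment", "body": "Support the hospital"},
}


def longest_verbatim_run(query_words, field_words):
    """Length, in words, of the longest contiguous run of query words
    that appears verbatim (same order, adjacent) in the field."""
    best = 0
    for qi in range(len(query_words)):
        for fi in range(len(field_words)):
            k = 0
            while (qi + k < len(query_words)
                   and fi + k < len(field_words)
                   and query_words[qi + k] == field_words[fi + k]):
                k += 1
            best = max(best, k)
    return best


def sketch_weight(query, doc, field_weights):
    """Sum of (field weight * longest verbatim run) over all fields."""
    query_words = query.lower().split()
    total = 0
    for name, text in doc.items():
        # ASSUMPTION: a weight of 0 is treated as 1 (not confirmed by the source).
        weight = max(field_weights.get(name, 1), 1)
        total += weight * longest_verbatim_run(query_words, text.lower().split())
    return total


FIELD_WEIGHTS = {"title": 100, "body": 0}
WEIGHTS = {doc_id: sketch_weight("National hospital", doc, FIELD_WEIGHTS)
           for doc_id, doc in DOCS.items()}
```

Under these assumptions, document 3 scores higher only because "hospital" also matches in its body field, which would suggest that phrase matching in the title, not the body weight, is what separates the documents.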

Related

Telegraf unable to pull route table information from Arista MIB

So I'm trying to collect routing stats from some Aristas.
When I run snmpwalk it all seems to work...
snmpwalk -v2c -c pub router.host ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.other = Gauge32: 3
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.connected = Gauge32: 8
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.static = Gauge32: 26
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.ospf = Gauge32: 542
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.bgp = Gauge32: 1623
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.attached = Gauge32: 12
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv4.internal = Gauge32: 25
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv6.other = Gauge32: 3
ARISTA-FIB-STATS-MIB::aristaFIBStatsTotalRoutesForRouteType.ipv6.internal = Gauge32: 1
But when I try to pull the stats with telegraf I get different information with missing context...
BGP,agent_host=10.45.100.20,host=nw01.ny5,hostname=CR.NY aristaFIBStatsTotalRoutesForRouteType=2i 1654976575000000000
BGP,agent_host=10.45.100.20,host=nw01.ny5,hostname=CR.NY aristaFIBStatsTotalRoutes=2260i 1654976575000000000
BGP,agent_host=10.45.100.20,host=nw01.ny5,hostname=CR.NY aristaFIBStatsTotalRoutesForRouteType=8i 1654976575000000000
BGP,agent_host=10.45.100.20,host=nw01.ny5,hostname=CR.NY aristaFIBStatsTotalRoutesForRouteType=63i 1654976575000000000
According to the MIB documentation at
https://www.arista.com/assets/data/docs/MIBS/ARISTA-FIB-STATS-MIB.txt
it uses the protocol definitions from IANA-RTPROTO-MIB.txt, but I have no idea how to derive that information, because the data retrieved via telegraf does not show any of it. Does anyone know how to deal with this?

First, you might want to make telegraf return the index of each row as a tag by setting index_as_tag = true inside the inputs.snmp.table section.
Then add the following processors to your config:
# Parse aristaFIBStatsAF and aristaFIBStatsRouteType from index for BGP table
[[processors.regex]]
namepass = ["BGP"]
order = 1
[[processors.regex.tags]]
## Tag to change
key = "index"
## Regular expression to match on a tag value
pattern = "^(\\d+)\\.(\\d+)$"
replacement = "${1}"
## Tag to store the result
result_key = "aristaFIBStatsAF"
[[processors.regex.tags]]
## Tag to change
key = "index"
## Regular expression to match on a tag value
pattern = "^(\\d+)\\.(\\d+)$"
replacement = "${2}"
## Tag to store the result
result_key = "aristaFIBStatsRouteType"
# Rename index to aristaFIBStatsAF for BGP table with single index row
[[processors.rename]]
namepass = ["BGP"]
order = 2
[[processors.rename.replace]]
tag = "index"
dest = "aristaFIBStatsAF"
[processors.rename.tagdrop]
aristaFIBStatsAF = ["*"]
# Translate tag values for BGP table
[[processors.enum]]
namepass = ["BGP"]
order = 3
tagexclude = ["index"]
[[processors.enum.mapping]]
## Name of the tag to map
tag = "aristaFIBStatsAF"
## Table of mappings
[processors.enum.mapping.value_mappings]
0 = "unknown"
1 = "ipv4"
2 = "ipv6"
[[processors.enum.mapping]]
## Name of the tag to map
tag = "aristaFIBStatsRouteType"
## Table of mappings
[processors.enum.mapping.value_mappings]
1 = "other"
2 = "connected"
3 = "static"
8 = "rip"
9 = "isIs"
13 = "ospf"
14 = "bgp"
200 = "ospfv3"
201 = "staticNonPersistent"
202 = "staticNexthopGroup"
203 = "attached"
204 = "vcs"
205 = "internal"
Disclaimer: I did not test this in telegraf, so there might be some typos.
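To make the intent of the three processors easier to follow, here is a minimal Python sketch of the same transformation: split the index tag into its address-family and route-type parts, then translate the numbers into names. The mapping values are copied from the config above; the function name decode_index is hypothetical, and this is not telegraf code.

```python
# Numeric-to-name mappings, copied from the enum processor above.
ADDRESS_FAMILY = {0: "unknown", 1: "ipv4", 2: "ipv6"}
ROUTE_TYPE = {
    1: "other", 2: "connected", 3: "static", 8: "rip", 9: "isIs",
    13: "ospf", 14: "bgp", 200: "ospfv3", 201: "staticNonPersistent",
    202: "staticNexthopGroup", 203: "attached", 204: "vcs", 205: "internal",
}


def decode_index(index):
    """Turn an index tag such as "1.14" into named tags.

    A two-part index carries address family and route type; a one-part
    index carries the address family only (the rename step above).
    Unknown numbers are kept as their original text.
    """
    parts = index.split(".")
    tags = {"aristaFIBStatsAF": ADDRESS_FAMILY.get(int(parts[0]), parts[0])}
    if len(parts) == 2:
        tags["aristaFIBStatsRouteType"] = ROUTE_TYPE.get(int(parts[1]), parts[1])
    return tags


print(decode_index("1.14"))
print(decode_index("2"))
```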

How to split the reports in a single dataset to Multiple Datasets using JCL

A dataset contains many reports. I need to copy only the first report to another dataset. How can this be achieved using JCL?
Below is a sample of what the dataset looks like. My requirement is to extract only the records under the R0A report.
---Report - R0A---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021
---Report - R0B---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021
---Report - R0C---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021

If the size of the reports is fixed, you can use sort with the COPY and STOPAFT= options:
SORT FIELDS=COPY,STOPAFT=6
If you need a report beyond the first, you can add the SKIPREC= option. E.g. to get the third report, specify:
SORT FIELDS=COPY,SKIPREC=12,STOPAFT=6
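The arithmetic behind the two statements above is that report n starts after (n - 1) complete reports. The following Python sketch only builds the control statement text under that assumption of fixed-length reports; the function name is hypothetical.

```python
def sort_control(report_number, report_length):
    """Build the SORT control statement that copies one fixed-length report.

    report_number is 1-based; report_length is the number of records
    per report (6 in the sample data of the question).
    """
    skip = (report_number - 1) * report_length
    options = ["FIELDS=COPY"]
    if skip > 0:
        # Skip every record that belongs to an earlier report.
        options.append(f"SKIPREC={skip}")
    options.append(f"STOPAFT={report_length}")
    return "SORT " + ",".join(options)


print(sort_control(1, 6))
print(sort_control(3, 6))
```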
If the reports differ in length, you could run a simple REXX exec:
/* REXX - NOTE This is only a skeleton. Error checking must be added. */
/* This code has not been tested, so thorough testing is due. */
"ALLOC F(INP) DS('your.fully.qualed.input.data.set.name') SHR"
"EXECIO * DISKR INP ( STEM InpRec. FINIS"
"FREE F(INP)"
TRUE = 1
FALSE = 0
ReportStartIndicator = "---Report"
ReportName = "- R0B---"
ReportHeader = ReportStartIndicator ReportName
ReportCopy = FALSE
do ii = 1 to InpRec.0 until ReportCopy = TRUE /* UNTIL keeps ii on the header line */
if InpRec.ii = ReportHeader
then ReportCopy = TRUE
end
if ReportCopy
then do
OutRec.1 = InpRec.ii
Outcnt = 1
do jj = ii + 1 to InpRec.0 while ReportCopy = TRUE
if word( InpRec.jj, 1 ) = ReportStartIndicator /* Start of next report? */
then ReportCopy = FALSE
else do
OutCnt = OutCnt + 1
OutRec.Outcnt = InpRec.jj
end
end
"ALLOC F(OUT) DS('your.fully.qualed.output.data.set.name')" ,
"NEW CATLG SPACE(......) RECFM(....) LRECL(....)"
"EXECIO" OutCnt "DISKW OUT ( STEM OutRec. FINIS"
"FREE F(OUT)"
say "Done copying report." OutCnt "records have been copied."
end
else do
say "Report" ReportName "not found."
exit 16
end
As written in the comment in the REXX, I haven't tested this code. Also, error checking needs to be added, especially for the TSO host commands (ALLOC, EXECIO, FREE).
All of these solutions copy a single report to another data set. In the title, you asked about multiple data sets; I'm sure you can build a solution for that from the single-report solutions above.
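For the multiple data set case, the scanning logic can be sketched as follows. This is Python rather than REXX or JCL, it works on a list of lines instead of allocated data sets, and the function name split_reports is hypothetical. It assumes that every report starts with a line beginning with "---Report", as in the sample.

```python
REPORT_START = "---Report"


def split_reports(lines):
    """Group lines by report, keyed by the report name in the header line.

    A header looks like "---Report - R0A---". Lines that appear before
    the first header are ignored.
    """
    reports = {}
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(REPORT_START):
            # "---Report - R0A---" -> "Report - R0A" -> "R0A"
            current = stripped.strip("-").split("-")[-1].strip()
            reports[current] = []
        if current is not None:
            reports[current].append(line)
    return reports


SAMPLE = """---Report - R0A---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021
---Report - R0B---
List of Payments
Date : 23/07/2021
Name Payment-Amt Due-Date
AAAA 233.04 15/08/2021
BBBB 38.07 16/08/2021""".splitlines()

REPORTS = split_reports(SAMPLE)
```

Each value in the resulting dictionary could then be written to its own output data set.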

netcool omnibus probe rules

I have the following possible values for $6 coming into the Netcool OMNIbus probe rules.
I would like to extract the instance name from $6.
E.g.: SQLSERVER1
SQL2012TESTRTM
SQL2012TESTSTD1
SQL2014STD
MSSQLSERVER
The $6 values are:
6 = "Microsoft.SQLServer.DBEngine:TM-B33F-FAD4.cap.dev.net;SQLSERVER1:1"
6 = "Microsoft.SQLServer.2012.Agent:TM-B33F-FAD4.cap.dev.net;SQL2012TESTRTM;SQLAgent$SQL2012TESTRTM:1"
6 = "Microsoft.SQLServer.2012.Agent:TM-B33F-FAD4.cap.dev.net;SQL2012TESTRTM;SQLAgent$SQL2012TESTRTM:1"
6 = "Microsoft.SQLServer.2012.Agent:TM-B33F-FAD4.cap.dev.net;SQL2012TESTSTD1;SQLAgent$SQL2012TESTSTD1:1"
6 = "Microsoft.SQLServer.Database:TM-B33F-FAD4.cap.dev.net;SQL2012TESTSTD1;DB2:1"
6 = "Microsoft.SQLServer.2012.Agent:TM-B33F-FAD4.cap.dev.net;SQL2012TESTRTM;SQLAgent$SQL2012TESTRTM:1"
6 = "Microsoft.SQLServer.2014.Agent:TM-B33F-FAD4.cap.dev.net;SQL2014STD;SQLAgent$SQL2014STD:1"
6 = "Microsoft.SQLServer.Database:TM-B33F-FAD4.cap.dev.net;SQL2012TESTSTD1;DB2:1"
6 = "Microsoft.SQLServer.2014.DBEngine:TM-B33F-FAD4.cap.dev.net;SQL2014STD:1"
6 = "Microsoft.SQLServer.Database:TM-B33F-FAD4.cap.dev.net;SQL2012TESTSTD1;DB2:1"
6 = "Microsoft.SQLServer.2014.Agent:TM-B33F-FAD4.cap.dev.net;SQL2014STD;SQLAgent$SQL2014STD:1"
6 = "Microsoft.SQLServer.Database:TM-B33F-FAD4.cap.dev.net;SQL2012TURKSTD1;DB1:1"
6 = "Microsoft.SQLServer.2014.DBFile:CTNTV01;MSSQLSERVER;SPOT;1;35:1"
6 = "Microsoft.SQLServer.Library.EventLogCollectionTarget:TM-B33F-FAD4.cap.dev.net:1"
I have tried the code below to extract it; it works for most of the values above.
#temp = extract($6, ";([^\:]+)\:")
if (regmatch(#temp, "[\;]"))
{
#temp = extract(#temp, "([^\:]+)\;")
}
But it does not work for
Microsoft.SQLServer.2014.DBFile:CTNTV01;MSSQLSERVER;SPOT;1;35:1
I believe the second extract, inside the if statement, needs a small correction.
It extracts everything up to MSSQLSERVER;SPOT;1, whereas I only want MSSQLSERVER.
Can you please help me correct this?

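Try the code below. The character class in the second extract now excludes semicolons instead of colons, so the match stops at the first semicolon.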
#temp = extract($6, ";([^\:]+)\:")
if (regmatch(#temp, "[\;]"))
{
#temp = extract(#temp, "([^\;]+)\;")
}
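The effect of the corrected pattern can be checked outside Netcool with an equivalent two-step extraction. The sketch below uses Python's re module and assumes that these simple patterns behave the same way in the probe rules language; the function name extract_instance is hypothetical.

```python
import re


def extract_instance(value):
    """Two-step extraction mirroring the rules above.

    Step 1 takes everything between the first ';' and the next ':'.
    Step 2, only if a ';' is still present, keeps the text before
    the first ';'. Returns None when the value contains no ';'.
    """
    match = re.search(r";([^:]+):", value)
    if match is None:
        return None
    temp = match.group(1)
    if ";" in temp:
        temp = re.search(r"([^;]+);", temp).group(1)
    return temp


print(extract_instance(
    "Microsoft.SQLServer.2014.DBFile:CTNTV01;MSSQLSERVER;SPOT;1;35:1"))
```

Values without any semicolon, such as the EventLogCollectionTarget entry, yield no instance name in this sketch.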

advice to make my below Pig code simple

Here is my code. It performs two GROUP ALL operations, and it works. My goal is to compute the unique user count of all students together with their total scores, plus the unique user count of students located in CA. I wonder whether the code can be simplified to use only one GROUP operation, or whether there are other constructive ideas to simplify it, for example using only one FOREACH operation. Thanks.
student_all = group student all;
student_all_summary = FOREACH student_all GENERATE COUNT_STAR(student) as uu_count, SUM(student.mathScore) as count1,SUM(student.verbScore) as count2;
student_CA = filter student by LID==1;
student_CA_all = group student_CA all;
student_CA_all_summary = FOREACH student_CA_all GENERATE COUNT_STAR(student_CA);
Sample input (student ID, location ID, mathScore, verbScore):
1 1 10 20
2 1 20 30
3 1 30 40
4 2 30 50
5 2 30 50
6 3 30 50
Sample output (unique users, unique users in CA, sum of mathScore of all students, sum of verbScore of all students):
6 3 150 240
thanks in advance,
Lin

You might be looking for this.
data = load '/tmp/temp.csv' USING PigStorage(' ') as (sid:int,lid:int, ms:int, vs:int);
gdata = group data all;
result = foreach gdata {
student_CA = filter data by lid == 1;
student_CA_sum = SUM( student_CA.sid ) ;
student_CA_count = COUNT( student_CA.sid ) ;
mathScore = SUM(data.ms);
verbScore = SUM(data.vs);
GENERATE student_CA_sum as student_CA_sum, student_CA_count as student_CA_count, mathScore as mathScore, verbScore as verbScore;
};
Output is:
grunt> dump result
(6,3,150,240)
grunt> describe result
result: {student_CA_sum: long,student_CA_count: long,mathScore: long,verbScore: long}

First load the file (student) into the Hadoop file system. Then perform the steps below.
split student into student_CA if locationId == 1, student_Other if locationId != 1;
student_CA_all = group student_CA all;
student_CA_all_summary = FOREACH student_CA_all GENERATE COUNT_STAR(student_CA) as uu_count,COUNT_STAR(student_CA)as locationCACount, SUM(student_CA.mathScore) as mScoreCount,SUM(student_CA.verbScore) as vScoreCount;
student_Other_all = group student_Other all;
student_Other_all_summary = FOREACH student_Other_all GENERATE COUNT_STAR(student_Other) as uu_count,0 as locationOtherCount:long, SUM(student_Other.mathScore) as mScoreCount,SUM(student_Other.verbScore) as vScoreCount;
student_CAandOther_all_summary = UNION student_CA_all_summary, student_Other_all_summary;
student_summary_all = group student_CAandOther_all_summary all;
student_summary = foreach student_summary_all generate SUM(student_CAandOther_all_summary.uu_count) as studentIdCount, SUM(student_CAandOther_all_summary.locationCACount) as locationCount, SUM(student_CAandOther_all_summary.mScoreCount) as mathScoreCount , SUM(student_CAandOther_all_summary.vScoreCount) as verbScoreCount;
Output:
dump student_summary;
(6,3,150,240)
Hope this helps :)
While solving your problem, I also encountered an issue with Pig. I assume it is caused by improper exception handling in the UNION command. It can hang your command-line prompt without a proper error message if you execute that command. If you want, I can share the snippet that triggers it.

The accepted answer has a logical error.
Try the input file below:
1 1 10 20
2 1 20 30
3 1 30 40
4 2 30 50
5 2 30 50
6 3 30 50
7 1 10 10
With the accepted answer, the output is
(13,4,160,250)
but it should be
(7,4,160,250)
I have modified the script so that it works correctly:
data = load '/tmp/temp.csv' USING PigStorage(' ') as (sid:int,lid:int, ms:int, vs:int);
gdata = group data all;
result = foreach gdata {
student_CA_sum = COUNT( data.sid ) ;
student_CA = filter data by lid == 1;
student_CA_count = COUNT( student_CA.sid ) ;
mathScore = SUM(data.ms);
verbScore = SUM(data.vs);
GENERATE student_CA_sum as student_CA_sum, student_CA_count as student_CA_count, mathScore as mathScore, verbScore as verbScore;
};
Output
(7,4,160,250)
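As a cross-check of the expected numbers, the same four aggregates can be computed in one pass over the rows. This is a plain Python sketch, not Pig; it assumes that student IDs are already unique, so counting rows equals counting unique users, and the function name summarize is hypothetical.

```python
def summarize(rows, ca_location_id=1):
    """Return (students, students in CA, sum of mathScore, sum of verbScore).

    Each row is (student ID, location ID, mathScore, verbScore).
    ASSUMPTION: student IDs are unique, so a row count is a user count.
    """
    total = ca_count = math_sum = verb_sum = 0
    for _sid, lid, math_score, verb_score in rows:
        total += 1
        if lid == ca_location_id:
            ca_count += 1
        math_sum += math_score
        verb_sum += verb_score
    return (total, ca_count, math_sum, verb_sum)


ROWS = [
    (1, 1, 10, 20), (2, 1, 20, 30), (3, 1, 30, 40),
    (4, 2, 30, 50), (5, 2, 30, 50), (6, 3, 30, 50),
    (7, 1, 10, 10),
]
print(summarize(ROWS))
```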

how to set value of cell in multiple record - laravel, ajax

I use an AJAX mechanism to create or modify records in this table:
table:
id | item_type | item_id | creator_id | attitude
1 | exemplar | 3 | 33 | 1
2 | exemplar | 4 | 33 | 0
3 | exemplar | 3 | 35 | 1
In plain English: there are many exemplars for one user to choose from, but a given user can set only one exemplar to value 1. In this particular case, Exemplar #3 is active (attitude = 1). I want to set its "attitude" to 0 in the same controller method that contains the code below.
The code below creates a new record for an exemplar which has never been chosen before, or changes the value of the 'attitude' column.
$user_id = Auth::user()->id;
$countatt = $exemplar->attitudes()->where('creator_id', $user_id)->first();
if (!$countatt)
{
$countatt = new Userattitude;
$countatt->creator_id = $user_id;
$countatt->item_type = 'exemplar';
$countatt->item_id = $exemplar_id;
}
$countatt->attitude = $value; // $value = 1
$countatt->save();
Problem to solve:
1. How, following best practices, do I set all other records of the same user (creator_id) and exemplar_id to 0?
My best guess is to put the four lines below before the code quoted above:
$oldactive= Exemplar::where('creator_id', $user_id)->where('exemplar_id', $exemplar_id)->first();
$zeroing_attitude= $oldactive->attitudes()->first();
$zeroing_attitude->attitude = 0;
$zeroing_attitude->save();
The above solution works only when exactly one exemplar has 'attitude' set to 1. In the future, however, I want to allow users to have multiple active exemplars, and I am not familiar enough with Eloquent to rewrite the logic for that case.
Sometimes no active Exemplar will be set, which means that this result would be empty:
$oldactive= Exemplar::where('creator_id', $user_id)->where('exemplar_id', $exemplar_id)->first();
How should I skip the rest of the code in that case? By adding an IF as below?
if($oldactive) {}
Thank you.

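$oldactive = Exemplar::where('creator_id', $user_id)->where('exemplar_id', $exemplar_id)->first();
// first() returns null when nothing matches, so skip the loop in that case
if ($oldactive) {
// attitudes()->get() runs the relation query and returns an iterable collection
foreach ($oldactive->attitudes()->get() as $zeroing_attitude) {
$zeroing_attitude->attitude = 0;
$zeroing_attitude->save();
}
}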
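The underlying data operation can also be expressed as one set-based update followed by an update-or-insert, which handles the multiple-active and none-active cases alike. The sketch below uses Python's sqlite3 module with an in-memory database purely as a stand-in for the Laravel model layer. The table name userattitudes and the function name set_active_exemplar are assumptions, and the columns follow the table shown in the question.

```python
import sqlite3


def set_active_exemplar(conn, user_id, exemplar_id, value=1):
    """Zero the user's other exemplar rows, then update or insert the chosen one."""
    with conn:  # commit on success, roll back on error
        # Affects zero rows when the user has no other exemplars: no guard needed.
        conn.execute(
            "UPDATE userattitudes SET attitude = 0 "
            "WHERE creator_id = ? AND item_type = 'exemplar' AND item_id <> ?",
            (user_id, exemplar_id),
        )
        cur = conn.execute(
            "UPDATE userattitudes SET attitude = ? "
            "WHERE creator_id = ? AND item_type = 'exemplar' AND item_id = ?",
            (value, user_id, exemplar_id),
        )
        if cur.rowcount == 0:
            # The exemplar has never been chosen by this user: create the row.
            conn.execute(
                "INSERT INTO userattitudes (item_type, item_id, creator_id, attitude) "
                "VALUES ('exemplar', ?, ?, ?)",
                (exemplar_id, user_id, value),
            )


def demo_connection():
    """In-memory table seeded with the three sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userattitudes (id INTEGER PRIMARY KEY, item_type TEXT, "
        "item_id INTEGER, creator_id INTEGER, attitude INTEGER)"
    )
    conn.executemany(
        "INSERT INTO userattitudes (item_type, item_id, creator_id, attitude) "
        "VALUES (?, ?, ?, ?)",
        [("exemplar", 3, 33, 1), ("exemplar", 4, 33, 0), ("exemplar", 3, 35, 1)],
    )
    return conn
```

Because the first statement updates by condition rather than by loading a model first, an empty result simply updates nothing.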
