I'm stuck on creating an algorithm as follows. I know this shouldn't be too difficult, but I simply can't get my head around it, and can't find the right description of this kind of pattern.
Basically I need a multi-level counter, where when a combination exist in the database, the next value is tried by incrementing from the right.
1 1 1 - Start position. Does this exist in database? YES -> Extract this and go to next
1 1 2 - Does this exist in database? YES -> Extract this and go to next
1 1 3 - Does this exist in database? YES -> Extract this and go to next
1 1 4 - Does this exist in database? NO -> Reset level 1, move to level 2
1 2 1 - Does this exist in database? YES -> Extract this and go to next
1 2 2 - Does this exist in database? NO -> Reset level 2 and 1, move to level 3
2 1 1 - Does this exist in database? YES -> Extract this and go to next
2 1 2 - Does this exist in database? YES -> Extract this and go to next
2 1 3 - Does this exist in database? NO -> Reset level 1 and increment level 2
2 2 1 - Does this exist in database? YES -> Extract this and go to next
2 2 2 - Does this exist in database? YES -> Extract this and go to next
2 2 3 - Does this exist in database? YES -> Extract this and go to next
2 3 1 - Does this exist in database? NO -> Extract this and go to next
3 1 1 - Does this exist in database? NO -> Extract this and go to next
3 2 1 - Does this exist in database? NO -> End, as all increments tried
There could be more than three levels, though.
In practice, each value like 1, 2, etc is actually a $value1, $value2, etc. containing a runtime string being matched against an XML document. So it's not just a case of pulling out every combination already existing in the database.
Assuming, the length of the DB key is known upfront, here's one way how it can be implemented. I'm using TypeScript but similar code can be written in your favorite language.
First, I declare some type definitions for convenience.
export type Digits = number[];
export type DbRecord = number;
Then I initialize fakeDb object which works as a mock data source. The function I wrote will work against this object. This object's keys are representing the the database records' keys (of type string). The values are simple numbers (intentionally sequential); they represent the database records themselves.
export const fakeDb: { [ dbRecordKey: string ]: DbRecord } = {
'111': 1,
'112': 2,
'113': 3,
'211': 4,
'212': 5,
'221': 6,
'311': 7,
};
Next, you can see the fun part, which is the function that uses counterDigits array of "digits" to increment depending on whether the record presence or absence.
Please, do NOT think this is the production-ready code! A) there are unnecessary console.log() invocations which only exist for demo purposes. B) it's a good idea to not read a whole lot of DbRecords from the database into memory, but rather use yield/return or some kind of buffer or stream.
export function readDbRecordsViaCounter(): DbRecord[] {
const foundDbRecords: DbRecord[] = [];
const counterDigits: Digits = [1, 1, 1];
let currentDigitIndex = counterDigits.length - 1;
do {
console.log(`-------`);
if (recordExistsFor(counterDigits)) {
foundDbRecords.push(extract(counterDigits));
currentDigitIndex = counterDigits.length - 1;
counterDigits[currentDigitIndex] += 1;
} else {
currentDigitIndex--;
for (let priorDigitIndex = currentDigitIndex + 1; priorDigitIndex < counterDigits.length; priorDigitIndex++) {
counterDigits[priorDigitIndex] = 1;
}
if (currentDigitIndex < 0) {
console.log(`------- (no more records expected -- ran out of counter's range)`);
return foundDbRecords;
}
counterDigits[currentDigitIndex] += 1;
}
console.log(`next key to try: ${ getKey(counterDigits) }`);
} while (true);
}
The remainings are some "helper" functions for constructing a string key from a digits array, and accessing the fake database.
export function recordExistsFor(digits: Digits): boolean {
const keyToSearch = getKey(digits);
const result = Object.getOwnPropertyNames(fakeDb).some(key => key === keyToSearch);
console.log(`key=${ keyToSearch } => recordExists=${ result }`);
return result;
}
export function extract(digits: Digits): DbRecord {
const keyToSearch = getKey(digits);
const result = fakeDb[keyToSearch];
console.log(`key=${ keyToSearch } => extractedValue=${ result }`);
return result;
}
export function getKey(digits: Digits): string {
return digits.join('');
}
Now, if you run the function like this:
const dbRecords = readDbRecordsViaCounter();
console.log(`\n\nDb Record List: ${ dbRecords }`);
you should see the following output that tells you about the iteration steps; as well as reports the final result in the very end.
-------
key=111 => recordExists=true
key=111 => extractedValue=1
next key to try: 112
-------
key=112 => recordExists=true
key=112 => extractedValue=2
next key to try: 113
-------
key=113 => recordExists=true
key=113 => extractedValue=3
next key to try: 114
-------
key=114 => recordExists=false
next key to try: 121
-------
key=121 => recordExists=false
next key to try: 211
-------
key=211 => recordExists=true
key=211 => extractedValue=4
next key to try: 212
-------
key=212 => recordExists=true
key=212 => extractedValue=5
next key to try: 213
-------
key=213 => recordExists=false
next key to try: 221
-------
key=221 => recordExists=true
key=221 => extractedValue=6
next key to try: 222
-------
key=222 => recordExists=false
next key to try: 231
-------
key=231 => recordExists=false
next key to try: 311
-------
key=311 => recordExists=true
key=311 => extractedValue=7
next key to try: 312
-------
key=312 => recordExists=false
next key to try: 321
-------
key=321 => recordExists=false
next key to try: 411
-------
key=411 => recordExists=false
------- (no more records expected -- ran out of counter's range)
Db Record List: 1,2,3,4,5,6,7
It is strongly recommended to read the code. If you want me to describe the approach or any specific detail(s) -- let me know. Hope, it helps.
Related
I have a bunch of codes indicating the stages a person has been in my data displayed horizontally as shown below.
Name code1 code2 code3 code4
A 2 3. 4 Null
B 2 5 4 7
C 1 3 4 5
D 0 9 Null Null
I have another file which has all the valid codes.
ID Value
1 3
2 4
3 5
4 6
5 7
What I would like to do is validate all the columns cell by cell against this lookup and indicate 0 if they are valid and null if they are not valid.
I'm using Apache Spark 1.5.2 and I would like to do this the efficient way. I've tried bunch of combinations and only thing close to what I want I've come is using concat on the cells and then explode it as normalized table and then perform lookups.
You can do this very simply with a single pass through the data, without any joins or explode by code-generating a validation expression:
// Simulate the data
case class Record(Name: String, code1: Option[Int], code2: Option[Int])
val dfData = sc.parallelize(Seq(
Record("A", Some(3), Some(4)),
Record("B", Some(3), None)
)).toDF.registerTempTable("my_data")
// Simulate the lookup table
val dfLookup = sc.parallelize(Seq((1,3), (2,4))).toDF("ID", "Value")
// Build a validation expression
val validationExpression = dfLookup.collect.map{ row =>
s"code${row.getInt(0)} = ${row.getInt(1)}"
}.mkString(" and ")
// Add an is_valid column to the data
sql(s"select *, nvl($validationExpression, false) as is_valid from my_data").show
This produces:
defined class Record
dfData: Unit = ()
dfLookup: org.apache.spark.sql.DataFrame = [ID: int, Value: int]
validationExpression: String = code1 = 3 and code2 = 4
+----+-----+-----+--------+
|Name|code1|code2|is_valid|
+----+-----+-----+--------+
| A| 3| 4| true|
| B| 3| null| false|
+----+-----+-----+--------+
I am trying RxJS.
My use case is to parse a log file and group lines by topic ( i.e.: the beginning of the group is the filename and then after that I have some lines with user, date/time and so on)
I can analyse the lines using regExp. I can determine the beginning of the group.
I use ".scan" to group the lines together, when I've the beginning of new group of line, I create an observer on the lines I've accumulated ... fine.
The issue is the end of the file. I've started a new group, I am accumulating lines but I can not trigger the last sequence as I do not have the information that the end. I would have expect to have the information in the complete (but not)
Here is an example using number. Begin of group can multi of 3 or 5. (remark: I work in typescript)
import * as Rx from "rx";
let r = Rx.Observable
.range(0, 8)
.scan( function(acc: number[], value: number): number[]{
if (( value % 3 === 0) || ( value % 5 === 0)) {
acc.push(value);
let info = acc.join(".");
Rx.Observable
.fromArray(acc)
.subscribe( (value) => {
console.log(info, "=>", value);
});
acc = [];
} else {
acc.push(value);
}
return acc;
}, [])
.subscribe( function (x) {
// console.log(x);
});
This emit:
0 => 0
1.2.3 => 1
1.2.3 => 2
1.2.3 => 3
4.5 => 4
4.5 => 5
6 => 6
I am looking how to emit
0 => 0
1.2.3 => 1
1.2.3 => 2
1.2.3 => 3
4.5 => 4
4.5 => 5
6 => 6
7.8 => 7 last items are missing as I do not know how to detect end
7.8 => 8
Can you help me, grouping items?
Any good idea, even not using scan, is welcome.
Thank in advance
You can use the materialize operator. See the documentation here and the marbles here, and an example of use from SO.
In your case, I would try something like (untested but hopefully you can complete it yourself, note that I don't know a thing about typescript so there might be some syntax errors):
import * as Rx from "rx";
let r = Rx.Observable
.range(0, 8)
.materialize()
.scan( function(acc: number[], materializedNumber: Rx.Notification<number>): number[]{
let rangeValue: number = materializedNumber.value;
if (( rangeValue % 3 === 0) || ( rangeValue % 5 === 0)) {
acc.push(rangeValue);
generateNewObserverOnGroupOf(acc);
acc = [];
} else if ( materializedNumber.kind === "C") {
generateNewObserverOnGroupOf(acc);
acc = [];
} else {
acc.push(rangeValue);
}
return acc;
}, [])
// .dematerialize()
.subscribe( function (x) {
// console.log(x);
});
function generateNewObserverOnGroupOf(acc: number[]) {
let info = acc.join(".");
Rx.Observable
.fromArray(acc)
.subscribe( (value) => {
console.log(info, "=>", value);
});
The idea is that the materialize and dematerialize works with notifications, which encodes whether the message being passed by the stream is one of next, error, completed kinds (respectively 'N', 'E', 'C' values for the kind property). If you have a next notification, then the value passed is in the value field of the notification object. Note that you need to dematerialize to return to the normal behaviour of the stream so it can complete and free resources when finished.
I try to create a linq query but unfortunately I have no ideas to resolve my problem.
I would like get the highest entry of all customers and form this result only 5 entries sort by date.
ID Date ID_Costumer
1 - 01.01.2014 - 1
2 - 02.01.2014 - 2
3 - 02.01.2014 - 1
4 - 03.01.2014 - 1 --> this value
5 - 04.01.2014 - 3
6 - 05.01.2014 - 3 --> this value
7 - 05.01.2014 - 4
8 - 06.01.2014 - 4 --> this value
9 - 08.01.2014 - 5 --> this value
10 - 09.01.2014 - 6 --> this value
I try it with this query
var query = from g in context.Geraete
where g.Online && g.AltGeraet == false
select g;
query.GroupBy(g => g.ID_Anbieter).Select(g => g.Last());
query.Take(5);
but it doesn't work.
You should assign results of selecting last item from group back to query variable:
query = query.GroupBy(g => g.ID_Anbieter).Select(g => g.Last());
var result = query.Take(5);
Keep in mind - operator Last() is not supported by Linq to Entities. Also I think you should add ordering when selecting latest item from each group, and selecting top 5 latest items:
var query = from g in context.Geraete
where g.Online && !g.AltGeraet
group g by g.ID_Anbieter into grp
select grp.OrderByDescending(g => g.Date).First();
var result = query.OrderByDescending(x => x.Date).Take(5);
Alrightie, so I'm building an CSV file this time with ruby. The outer loop will run up to length of num_of_loops, but it runs for an entire set rather than up to the specified row. I want to change the first column of a CSV file to a new name for each row.
If I do this:
class_days = %w[Wednesday Thursday Friday]
num_of_loops = (num_of_loops / class_days.size).ceil
num_of_loops.times {
["Wednesday","Thursday","Friday"].each do |x|
data[0] = x
data[4] = classname()
# Write all to file
#
csv << data
end
}
Then the loop will run only 3 times for a 5 row request.
I'd like it to run the full 5 rows such that instead of stopping at Wed/Thurs/Fri it goes to Wed/Thurs/Fri/Wed/Thurs instead.
class_days = %w[Wednesday Thursday Friday]
num_of_loops.times do |i|
data[0] = class_days[i % class_days.size]
data[4] = classname
csv << data
end
The interesting part is here:
class_days[i % class_days.size]
We need an index into class_days that is between 0 and class_days.size - 1. We can get that with the % (modulo) operator. That operator yields the remainder after dividing i by class_days.size. This table shows how it works:
i i % 3
0 0
1 1
2 2
3 0
4 1
5 2
...
The other key part is that the times method yields indices starting with 0.
Note:
This is a cross-post, it is firstly posted at the sphinx forum,however I got no answer, so I post it here.
First take a look at a example:
The following is my table(just for test used):
+----+--------------------------+----------------------+
| Id | title | body |
+----+--------------------------+----------------------+
| 1 | National first hospital | NASA |
| 2 | National second hospital | Space Administration |
| 3 | National govenment | Support the hospital |
+----+--------------------------+----------------------+
I want to search the contents from the title and body field, so I config the sphinx.conf
as shown followed:
--------The sphinx config file----------
source mysql
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =0000
sql_db = testfull
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query = SELECT * FROM test
}
index mysql
{
source = mysql
path = var/data/mysql_old_test
docinfo = extern
mlock = 0
morphology = stem_en, stem_ru, soundex
min_stemming_len = 1
min_word_len = 1
charset_type = utf-8
html_strip = 0
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312
read_timeout = 5
max_children = 30
max_matches = 1000
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
pid_file = var/log/searchd_mysql.pid
log = var/log/searchd_mysql.log
query_log = var/log/query_mysql.log
}
------------------
Then I reindex the db and start the searchd daemon.
In my client side I set the attribute as:
----------Client side config-------------------
sc = new SphinxClient();
///other thing
HashMap<String, Integer> weiMap=new HashMap<String, Integer>();
weiMap.put("title", 100);
weiMap.put("body", 0);
sc.SetFieldWeights(weiMap);
sc.SetMatchMode(SphinxClient.SPH_MATCH_ALL);
sc.SetSortMode(SphinxClient.SPH_SORT_EXTENDED,"#weight DESC");
When I try to search "National hospital", I got the following output:
Query 'National hospital' retrieved 3 of 3 matches in 0.0 sec.
Query stats:
'nation' found 3 times in 3 documents
'hospit' found 3 times in 3 documents
Matches:
1. id=3, weight=101
2. id=1, weight=100
3. id=2, weight=100
The match number (three matched) is right,however the order of the result is not what I
wanted.
Obviously the document of id 1 and 2 should be the most closed items to the required
string( "National hospital" ), so in my opinion they should be given the largest
weights,but they are orderd at the last position.
I wonder if there is anyway to meet my requirement?
PS:
1)please do not suggestion me set the sortModel to :
sc.SetSortMode(SphinxClient.SPH_SORT_EXTENDED,"#weight ASC");
This may work for just this example, it will caused some other potinal problems.
2)Actuall the contents in my table is Chinese, I just use the "National Hosp..l" to make
a example.
1° You ask "National hospital" but sphinx search "nation" and "hospit" because
morphology = stem_en, stem_ru, soundex
2° You give weight
weiMap.put("title", 100);
weiMap.put("body", 0);
to unexisting text fields
sql_query = SELECT * FROM test
3° finaly my simple answer to main question
You sort by weight,
the third row has more weight because no words between nation and hospit