Context Free Grammar for English Sounding Names - random

I am currently writing an application that will generate random data; specifically, random names. I have made some decent progress, but am not satisfied with many of the generated names. The problem lies in my production rules, which I've attached to the bottom of this post.
The basic idea is: consonant, vowel, consonant, vowel, but some consonants themselves map to vowels (such as b< VO >).
I have not fully created the rules yet, but the final idea would follow the format shown below. However, rather than finishing it, I would like to make a better basis for the production rules.
I have tried to find a reference that discusses either: a CFG already created for English-sounding words, or an English reference that disassembles the basic format of letter combinations for words. Unfortunately, I have not been able to find a useful resource to help me advance farther than I already have. Does anyone know of a place I should look, or a reference I can look at?
ALSO: in your opinion, do you believe a context-sensitive grammar might work better?
//the following will deal with single vowels and consonants
var CO = ['b','c','d','f','g','h','j','k','l','m','n','p','qu','r','s','t','v','w','x','y','z'];
CO.probabilities = [2.41,4.49,6.87,3.59,3.25,9.84,0.24,1.24,6.5,3.88,10.9,3.11,0.153,9.67,10.2,14.6,1.58,3.81,0.242,3.19,0.12];
CO.name = "CO";
var VO = ['a','e','i','o','u'];
VO.probabilities = [21.43,33.33,18.28,19.7,7.23];
VO.name = "VO";
var LETTER = ['<VO>','<CO>'];
LETTER.probabilities = [38.1,61.9];
LETTER.name = "LETTER";
//the following deal with consonant pairs
var BH = ['c','p','r','s','t']; //the first part of a th, ph, sh, pair (before H)
BH.probabilities = [20,10,20,25,25];
BH.name = "BH";
var BL = ['b','c','f','g','p','s']; //before letter l
BL.probabilities = [10,20,10,10,25,25];
BL.name = "BL";
var COP = ['<BH>h','<BL>l']; //consonant pairs
COP.probabilities = [50,50];
COP.name = "COP";
//this is a generic syllable, that does not take grammar rules into consideration
var SYL = ['<CO><VO>','<VO><CO>','<CO><VO><VO>'];
SYL.probabilities = [50,20,30];
SYL.name = "SYL";
//the following deal with mid word syllables
var CLOSED = ['<CO><VO><CO>','<CO><VO><CO><CO>'];
CLOSED.probabilities = [75,25];
CLOSED.name = "CLOSED";
var OPEN = ['<CO><VO>','<CO><CO><VO>'];
OPEN.probabilities = [60,40];
OPEN.name = "OPEN";
var VR = ['<VO>r']; //vowel-r
VR.probabilities = [100];
VR.name = "VR";
var MID = ['<CLOSED>','<OPEN>','<VR>'];
MID.probabilities = [33,33,33];
MID.name = "MID";
//the following will deal with ending syllables
var VCE = ['<VO><CO>e','<LETTER><VO><CO>e'];
VCE.probabilities = [75,25];
VCE.name = "VCE";
var CLE = ['<CO>le'];
CLE.probabilities = [100];
CLE.name = "CLE";
var OE = ['tion','age','ive']; //other endings
OE.probabilities = [33,33,33];
OE.name = "OE";
var ES = ['<VCE>','<CLE>','<OE>','<VR>']; //contains all ending syllables
ES.probabilities = [40,40,20]; //NOTE: four alternatives above but only three weights, so '<VR>' has no weight
ES.name = "ES";
var rules = [CO,VO,BH,BL,COP,LETTER,SYL,CLOSED,OPEN,VR,MID,VCE,CLE,OE,ES];
//These are some highly-defined production rules
var streetSuffix = ['road','street','way','avenue','drive','grove','lane','gardens','place','crescent','close','square','hill','circus','mews','vale','rise','mead'];
streetSuffix.probabilities = [15,15,5,10,5,2.7,2.7,2.7,2.7,2.7,2.7,2.7,2.7,2.7,2.7,2.7,2.7,2.7];
var states = ['Alabama','Alaska','American Samoa','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','Florida','Georgia','Guam','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Marshall Islands','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Palau','Pennsylvania','Puerto Rico','Rhode Island','South Carolina','South Dakota','Tennessee','Texas','Utah','Vermont','Virgin Island','Virginia','Washington','West Virginia','Wisconsin','Wyoming'];
var cityNewWordSuffix = ['city','town',''];
var cityEndWordSuffix = ['polis','ville','ford','furt','forth','shire','berg','gurg','borough','brough','field','kirk','bury','stadt',''];
var siteSuffix = ['com','org','net','edu'];
/**
Generates a random name by expanding a start pattern until no non-terminals remain
*/
function generateRandomName() {
//string will be random length of CO VO pattern for now
var result;
result = "<COP><VO><MID><VO><ES>";
while (hasNonTerminal(result)) {
result = replaceFirstNonTerminal(result);
}
return result;
}
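The helpers hasNonTerminal and replaceFirstNonTerminal are called above but not shown. The following is only a minimal sketch of how they might work, not the original implementation. It assumes that non-terminals are written as an upper-case name in angle brackets, and that rules are looked up by name in a plain object. weightedPick, rulesByName and the rand parameter are hypothetical names introduced here for illustration; the call above passes only the string.

```javascript
// Hypothetical sketch: function names match the calls in generateRandomName,
// but the bodies and the extra parameters are assumptions.

// Matches the first non-terminal such as <VO> or <CLOSED> (no "g" flag on purpose).
var NON_TERMINAL = /<([A-Z]+)>/;

// Pick one alternative from a rule array using its `probabilities` weights.
// The weights do not need to sum to 100; they are normalised by their total.
function weightedPick(rule, rand) {
  var weights = rule.probabilities;
  var total = 0;
  for (var i = 0; i < rule.length; i++) {
    total += weights[i] || 0; // a missing weight counts as zero
  }
  var r = rand() * total;
  for (var j = 0; j < rule.length; j++) {
    r -= weights[j] || 0;
    if (r < 0) return rule[j];
  }
  return rule[rule.length - 1]; // guard against floating-point drift
}

// True while the string still contains something like <VO>.
function hasNonTerminal(text) {
  return NON_TERMINAL.test(text);
}

// Replace only the first non-terminal with a weighted random alternative.
// `rulesByName` maps a rule name to its array; `rand` defaults to Math.random.
function replaceFirstNonTerminal(text, rulesByName, rand) {
  return text.replace(NON_TERMINAL, function (match, name) {
    var rule = rulesByName[name];
    if (!rule) throw new Error("No rule named " + name);
    return weightedPick(rule, rand || Math.random);
  });
}

// Small demo with a made-up vowel rule.
var demoVO = ['a', 'e', 'i'];
demoVO.probabilities = [40, 40, 20];
var demoRules = { VO: demoVO };
var word = "b<VO>l<VO>";
while (hasNonTerminal(word)) {
  word = replaceFirstNonTerminal(word, demoRules);
}
console.log(word); // random: "b", a vowel, "l", a vowel
```

Because an alternative can itself contain non-terminals (for example '<CO><VO>'), the loop keeps expanding until only terminal letters remain. Passing rand explicitly makes the expansion repeatable in tests.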
Here are a few words generated by the machine in its current state:
"cheiroene",
"sloeraase",
"sledehgeute",
"rhaorenone",
"rheerisute",
"chaereehe",
"sletraoege",
"sluureese",
"chaheyleete",
"chierauhe",
"ploclooate",
"glawofhaice",
"thanisgoage",
"slelaodose",
"blaereode",
"shihudeife",
"slaereene",
"pleheaele",
"rhepicsaile",
"ploeruoge",
"sliareuhe",
"thaereafe",
"thaaraeke",
"cheoreate",
"shofetniote",
"phiraoese",
"clilniueye",
"slepceikede",
"cligloueohe",
"phitleoime",

Related

Can the "insample" resampling in mlr3tuning be used for hyperparameter tuning on the full dataset?

I've been trying to tune hyperparameters for a survival SVM model using the AutoTuner class from the mlr3tuning package. I want to tune on the whole dataset (no train/test split). I found the resampling class "insample"; the mlr3 dictionary describes it as: "Uses all observations as training and as test set."
My question is: can the "insample" resampling in mlr3tuning be used for hyperparameter tuning on the full dataset? And if so, why do I get a different concordance index when I pass the tuned hyperparameters to the survivalsvm function from the survivalsvm package?
This is the code I used for hyperparameter tuning
veteran<-veteran
set.seed(1)
task = as_task_surv(x = veteran, time = 'time', event = 'status')
learner = lrn("surv.svm", type = "hybrid", diff.meth = "makediff3",
gamma.mu = c(0.1, 0.1),kernel = 'rbf_kernel')
search_space = ps(gamma = p_dbl(2^-5, 2^5),mu = p_dbl(2^-5, 2^5))
search_space$trafo = function(x, param_set) {
x$gamma.mu = c(x$gamma, x$mu)
x$gamma = x$mu = NULL
x}
ssvm_at = AutoTuner$new(
learner = learner,
resampling = rsmp("insample"),
search_space = search_space,
measure = msr('surv.cindex'),
terminator = trm('evals', n_evals = 5),
tuner = tnr('grid_search'))
ssvm_at$train(task)
And this is the code I have been trying with the survivalsvm function from the survivalsvm package:
survsvm.reg <- survivalsvm(Surv(veteran$time , veteran$status ) ~ .,
data = veteran,
type = "hybrid", gamma.mu = c(32,32),diff.meth = "makediff3",
opt.meth = "quadprog", kernel = "rbf_kernel")
pred.survsvm.reg <- predict(survsvm.reg,veteran)
conindex(pred.survsvm.reg, veteran$time)

DocuSign Require Signing Document Twice with Different Tabs

I have a document that uses tabs to fill in values. The document is signed before and after completion of a task. Is it possible to modify tabs on an envelope, then re-generate a DocuSign_eSign::RecipientViewRequest (still keeping the initial signature / fields)?
Thus far I've been able to generate two DocuSign_eSign::RecipientViewRequest, but cannot figure out how to change the tabs in between signing:
PRE_SIGNER = 'pre_signer'
POST_SIGNER = 'post_signer'
PRIVATE_KEY = CREDENTIALS['private_key']
PUBLIC_KEY = CREDENTIALS['public_key']
USER_ID = CREDENTIALS['user_id']
CLIENT_ID = CREDENTIALS['client_id']
ACCOUNT_ID = CREDENTIALS['account_id']
BASE_URL = CREDENTIALS['base_url']
configuration = DocuSign_eSign::Configuration.new
configuration.host = "#{BASE_URL}/restapi"
configuration.debugging = true
api_client = DocuSign_eSign::ApiClient.new(configuration)
api_client.base_path = BASE_URL
envelope_api = DocuSign_eSign::EnvelopesApi.new(api_client)
pre_signer_text = DocuSign_eSign::Text.new
pre_signer_text.value = 'Alpha'
pre_signer_text.tab_label = 'pre_value'
pre_signer = DocuSign_eSign::Signer.new
pre_signer.role_name = PRE_SIGNER
pre_signer.client_user_id = PRE_SIGNER
pre_signer.recipient_id = 1
pre_signer.name = 'Kevin Sylvestre'
pre_signer.email = 'kevin@fake.com'
pre_signer.tabs = DocuSign_eSign::Tabs.new
pre_signer.tabs.text_tabs = [pre_signer_text]
post_signer = DocuSign_eSign::Signer.new
post_signer.role_name = POST_SIGNER
post_signer.client_user_id = POST_SIGNER
post_signer.recipient_id = 2
post_signer.name = 'Kevin Sylvestre'
post_signer.email = 'kevin@fake.com'
post_signer.tabs = DocuSign_eSign::Tabs.new
post_signer.tabs.text_tabs = []
server_template = DocuSign_eSign::ServerTemplate.new
server_template.sequence = 0
server_template.template_id = TEMPLATE_ID
inline_template = DocuSign_eSign::InlineTemplate.new
inline_template.sequence = 0
inline_template.recipients = DocuSign_eSign::Recipients.new
inline_template.recipients.signers = [
pre_signer,
post_signer,
]
composite_template = DocuSign_eSign::CompositeTemplate.new
composite_template.server_templates = [server_template]
composite_template.inline_templates = [inline_template]
envelope_event = DocuSign_eSign::EnvelopeEvent.new
envelope_event.envelope_event_status_code = 'completed'
envelope_definition = DocuSign_eSign::EnvelopeDefinition.new
envelope_definition.status = 'sent'
envelope_definition.composite_templates = [composite_template]
api_client.request_jwt_user_token(CLIENT_ID, USER_ID, PRIVATE_KEY)
envelope = envelope_api.create_envelope(ACCOUNT_ID, envelope_definition)
pre_signer_recipient_view_request = DocuSign_eSign::RecipientViewRequest.new
pre_signer_recipient_view_request.authentication_method = 'none'
pre_signer_recipient_view_request.client_user_id = PRE_SIGNER
pre_signer_recipient_view_request.user_name = 'Kevin Sylvestre'
pre_signer_recipient_view_request.email = 'kevin@fake.com'
pre_signer_recipient_view_request.return_url = 'https://ksylvest.com'
pre_recipient_view = envelope_api.create_recipient_view(ACCOUNT_ID, envelope.envelope_id, pre_signer_recipient_view_request)
url = pre_recipient_view.url
`open #{url}`
puts "Continue?"
gets
# at this point I'd like to enter values for tabs...
post_signer_text = DocuSign_eSign::Text.new
post_signer_text.value = 'Omega'
post_signer_text.tab_label = 'post_value'
post_signer_recipient_view_request = DocuSign_eSign::RecipientViewRequest.new
post_signer_recipient_view_request.authentication_method = 'none'
post_signer_recipient_view_request.client_user_id = POST_SIGNER
post_signer_recipient_view_request.user_name = 'Kevin Sylvestre'
post_signer_recipient_view_request.email = 'kevin@fake.com'
post_signer_recipient_view_request.return_url = 'https://ksylvest.com'
post_recipient_view = envelope_api.create_recipient_view(ACCOUNT_ID, envelope.envelope_id, post_signer_recipient_view_request)
url = post_recipient_view.url
`open #{url}`
You could add the same person to sign twice, as two separate recipients that are the same person. You can generate different recipient views. You can set the routing order to be different. Only reason I didn't post this as an answer is that you may mean that you need to pause the envelope?
You can add tabs using your code where you have post_signer.tabs. If you want to modify existing tabs that came from the template, you have to create the envelope in draft mode ("created"), make a separate API call to modify the tabs, and then make a final API call to send it. Another option is to pause the envelope and "correct" it.
Pause envelope workflow code examples
https://github.com/docusign/docusign-esign-ruby-client/blob/c477b07c2f578214fdf7d0c5a33355f01e9a0b4e/lib/docusign_esign/api/envelopes_api.rb#L6132 update_recipients() method should do the trick...

gdata.data.PhoneNumber: How do I get the type of Phone Number?

Using the class gdata.data.PhoneNumber, how do I get the type (Home/Business/Mobile/etc.) of that phone number?
This is the documentation I am referencing: https://gdata-python-client.googlecode.com/hg/pydocs/gdata.data.html#PhoneNumber
The "rel" attribute should be what you are looking for.
This is example code from https://github.com/google/gdata-python-client/blob/master/tests/gdata_tests/contacts/service_test.py:
# Create a new entry
new_entry = gdata.contacts.ContactEntry()
new_entry.title = atom.Title(text='Elizabeth Bennet')
new_entry.content = atom.Content(text='Test Notes')
new_entry.email.append(gdata.contacts.Email(
rel='http://schemas.google.com/g/2005#work',
primary='true',
address='liz@gmail.com'))
new_entry.phone_number.append(gdata.contacts.PhoneNumber(
rel='http://schemas.google.com/g/2005#work', text='(206)555-1212'))
new_entry.organization = gdata.contacts.Organization(
org_name=gdata.contacts.OrgName(text='TestCo.'),
rel='http://schemas.google.com/g/2005#work')
It doesn't access the "rel" attribute but it is there, I swear :)
Once you get a PhoneNumber instance you can print every attribute with the built-in dir() function:
print(dir(phone_number))
The following is a list of "rel"s (https://github.com/google/gdata-python-client/blob/master/src/gdata/data.py). I don't know whether all are applicable to phone numbers or not but it may be useful for checking the type:
FAX_REL = 'http://schemas.google.com/g/2005#fax'
HOME_REL = 'http://schemas.google.com/g/2005#home'
HOME_FAX_REL = 'http://schemas.google.com/g/2005#home_fax'
ISDN_REL = 'http://schemas.google.com/g/2005#isdn'
MAIN_REL = 'http://schemas.google.com/g/2005#main'
MOBILE_REL = 'http://schemas.google.com/g/2005#mobile'
OTHER_REL = 'http://schemas.google.com/g/2005#other'
OTHER_FAX_REL = 'http://schemas.google.com/g/2005#other_fax'
PAGER_REL = 'http://schemas.google.com/g/2005#pager'
RADIO_REL = 'http://schemas.google.com/g/2005#radio'
TELEX_REL = 'http://schemas.google.com/g/2005#telex'
TTL_TDD_REL = 'http://schemas.google.com/g/2005#tty_tdd'
WORK_REL = 'http://schemas.google.com/g/2005#work'
WORK_FAX_REL = 'http://schemas.google.com/g/2005#work_fax'
WORK_MOBILE_REL = 'http://schemas.google.com/g/2005#work_mobile'
WORK_PAGER_REL = 'http://schemas.google.com/g/2005#work_pager'
NETMEETING_REL = 'http://schemas.google.com/g/2005#netmeeting'
Those OTHER "rel"s can (or maybe should?) be joined with the object's "label" attribute.

SharePoint 2013 - Sorting Search Results not working (KeywordQuery-SortList)

I am using KeywordQuery to search, but the SortList does not affect the results: the query always returns the first 5 results. Any suggestions? The code is below.
using (KeywordQuery query = new KeywordQuery(site))
{
var fedManager = new FederationManager(application);
var owner = new SearchObjectOwner(SearchObjectLevel.SPSite, site.RootWeb);
query.SourceId = fedManager.GetSourceByName("NewsRS", owner).Id;
query.QueryText = string.Format("WorkflowStatusOWSCHCS:Approved PublishedUntilDate>=\"{0}\" OR NewsNewsPublishedDate<=\"{0}\"", DateTime.Now);
query.KeywordInclusion = KeywordInclusion.AllKeywords;
query.RowLimit = 5;
query.StartRow = 1;
query.SelectProperties.Add("NewsFriendlyUrl");
query.SelectProperties.Add("NewsNewsTeaser");
query.SelectProperties.Add("NewsNewsDate");
query.SelectProperties.Add("NewsPublishedUntilDate");
query.SelectProperties.Add("NewsNewsContent");
query.SelectProperties.Add("NewsNewsPublishedDate");
query.SelectProperties.Add("NewsNewsImage");
query.SortList.Add("NewsNewsDate", SortDirection.Descending);
var searchExecutor = new SearchExecutor();
var myResults = searchExecutor.ExecuteQuery(query);
}
... the NewsNewsDate is marked as Sortable
query.RowLimit = 5; => You are explicitly setting the row limit to 5, which is why the query always returns only 5 results. Change RowLimit to the number of results you need.

Use linq to query blocks of text?

I have a text file that contains some data in a "block" format:
source : source location
filename : somefile.txt
vendor : somevendor
version : xx.xx.xxx
source : source location2
filename : somefile2.txt
vendor : somevendor2
version : yy.yy.yyy
Can I use LINQ to query this data, and if so, how would you go about it? I have used LINQ to query lines of data from a text file many times, but never a "block" of data as above. Thanks for the input.
Yes, you can use LINQ, although this approach is not well optimized for large files. Here is how to get the data:
var lines = File.ReadLines("C:\\text.txt")
.Where(line => !string.IsNullOrWhiteSpace(line))
.ToList();
for (int i = 0; i < lines.Count; i += 4)
{
var location = lines[i].Split(':')[1].Trim(); // Trim removes the space after the colon
var fileName = lines[i + 1].Split(':')[1].Trim();
var vendor = lines[i + 2].Split(':')[1].Trim();
var version = lines[i + 3].Split(':')[1].Trim();
}
A version that uses LINQ:
var result = Enumerable.Range(0, lines.Count / 4).Select(i => new {
location = lines[4*i].Split(':')[1].Trim(),
fileName = lines[4*i + 1].Split(':')[1].Trim(),
vendor = lines[4*i + 2].Split(':')[1].Trim(),
version = lines[4*i + 3].Split(':')[1].Trim()
});

Resources