Machine learning and actual predictions [closed] - algorithm

I have a question about machine learning predictions.
Typically I would have a dataset with x's and y's that I would train my algorithm on. But what if I have a dataset with input variables only (x's) and no actual labels (y's)?
For example, I'm looking for fraudulent transactions.
In dataset A I have a bunch of input variables like amounts, zip codes, merchants, etc., and I have a fraud-status variable that is 1 for possible fraud and 0 for a safe transaction. Here I have known frauds and known non-frauds that I can train my model on.
However, what if I have a dataset with no fraud variable at all? All I have are my input variables, and nothing that indicates whether a transaction is fraudulent or not. How could an ML algorithm then predict the probability of a transaction being fraudulent for this specific dataset?

I think what you are looking for is anomaly detection. In anomaly detection you try to find the data points that differ from the rest of the data; in your case, those are the fraudulent transactions.
There are quite a few such algorithms available in scikit-learn; see its novelty and outlier detection documentation. I would recommend starting with the IsolationForest model for your problem.
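A minimal sketch of that suggestion, assuming scikit-learn is available; the dummy features below stand in for real transaction fields (amount, encoded zipcode, encoded merchant), and the contamination rate is an assumption you would tune:

```python
# Unlabeled anomaly detection with IsolationForest (a hedged sketch,
# not a production fraud model).
import numpy as np
from sklearn.ensemble import IsolationForest

# X: one row per transaction, numeric features only.
# Dummy data here; replace with your real transaction matrix.
rng = np.random.RandomState(42)
X = rng.normal(loc=100, scale=20, size=(1000, 3))

model = IsolationForest(
    n_estimators=100,
    contamination=0.01,  # assumed fraction of anomalies; tune for your data
    random_state=42,
)
model.fit(X)

labels = model.predict(X)        # -1 = anomaly (possible fraud), 1 = normal
scores = model.score_samples(X)  # lower score = more anomalous
print("flagged as anomalous:", (labels == -1).sum())
```

Note that the model never sees labels: it scores each point by how easily it can be isolated from the rest, which is exactly the no-fraud-variable setting you describe.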

Related

What's a good data structure to store and work with a thesaurus? [closed]

I've been working for a few years on an English-language thesaurus project, which combines a few sources (e.g. WordNet, Wiktionary Thesaurus, Moby Thesaurus, Word2vec) into one large thesaurus. Currently I have the data defined as a list of lists. Each link has a score (higher = stronger), so "hotel" and "inn" might have a score of 2.0, while "hotel" and "fleabag" might have a score of 0.2. High scores are near-synonyms; low scores are more distant associations. I've been able to use Dijkstra and A* to find links between words (so-called "synonym chains").
Is there a type of graph database and/or analysis tool that is ideally suited to this sort of data? Word-relationship strengths are often asymmetric. For example, "Hoover Dam" links to "Herbert Hoover" more strongly than "Herbert Hoover" links back to "Hoover Dam". I'm interested in better ways to find the links between words, find unrelated words, and measure word similarity.
I'd appreciate any new pointers/direction.
Interesting question. Not sure about the best data structure, but for processing, you can look at shell neighbors within this package: https://grispy.readthedocs.io/en/latest/api.html
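Since the question already runs Dijkstra and A* over asymmetric link strengths, a weighted directed graph models this directly. A minimal sketch with networkx; the words and scores are illustrative, and converting strength to distance as 1/strength is an assumed convention, not the only one:

```python
# Directed graph of word links with asymmetric strengths (illustrative data).
import networkx as nx

G = nx.DiGraph()
G.add_edge("hotel", "inn", strength=2.0)
G.add_edge("hotel", "fleabag", strength=0.2)
G.add_edge("inn", "fleabag", strength=0.5)
G.add_edge("Hoover Dam", "Herbert Hoover", strength=1.5)
G.add_edge("Herbert Hoover", "Hoover Dam", strength=0.3)  # weaker back-link

# Dijkstra wants a cost, so derive a distance: strong links become short edges.
for u, v, d in G.edges(data=True):
    d["distance"] = 1.0 / d["strength"]

# Shortest "synonym chain" by total distance.
path = nx.dijkstra_path(G, "hotel", "fleabag", weight="distance")
print(path)  # -> ['hotel', 'inn', 'fleabag']: two strong hops beat one weak one
```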

Split text files into two groups - unsupervised learning [closed]

Imagine you are a librarian, and over time you have classified a bunch of text files (approx. 100) under one general, ambiguous keyword. Every text file is actually a topic of keyword_meaning1 or a topic of keyword_meaning2.
Which unsupervised learning approach would you use to split the text files into two groups?
What precision (in percent) of correct classification can be achieved, as a function of the number of text files?
And can the method somehow indicate, within a group, that certain files need a librarian's review because they may be classified incorrectly?
The easiest starting point would be to use a naive Bayes classifier. It's hard to speculate about the expected precision; you have to test it yourself. Just get a program for e-mail spam detection and try it out. For example, SpamBayes (http://spambayes.sourceforge.net/) is quite a good starting point and easily hackable. SpamBayes has a nice feature: it will label messages as "unsure" when there is no clear separation between the two classes.
Edit: if you really want an unsupervised clustering method, then perhaps something like Carrot2 (http://project.carrot2.org/) is more appropriate.
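For the unsupervised route, a minimal sketch with scikit-learn: TF-IDF features plus two-cluster k-means, with a rough "unsure" flag in the spirit of SpamBayes. The texts list is a placeholder for reading your ~100 files, and the review threshold is an assumption to tune:

```python
# Unsupervised two-way split of documents (a hedged sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

texts = ["...contents of file 1...", "...contents of file 2..."]  # ~100 docs

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Flag documents near the cluster boundary for a librarian to review:
# similar distances to both centroids mean the split is least certain there.
distances = km.transform(X)                   # distance to each centroid
margin = abs(distances[:, 0] - distances[:, 1])
unsure = margin < margin.mean() * 0.25        # assumed threshold; tune it

for i, text in enumerate(texts):
    print(labels[i], "REVIEW" if unsure[i] else "", text[:40])
```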

how to efficiently work through algorithms on paper [closed]

I am currently reading a programming textbook, and as I encounter the different algorithms used in the book I find it necessary to understand how they work by working through them. Is there a standard, efficient way to work through simple algorithms on paper?
Write the algorithm down on paper. Write down the corresponding graphs and variables that the algorithm uses.
Now follow the algorithm step by step and note how the variables, graphs, etc. change.
Time slices. Make a table where the column headers are the variables involved and the row headers are step numbers. Fill in row zero with the initial values, if any; each subsequent row shows the result of applying the current step to the previous row, as in the sketch below.
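For example, a hypothetical time-slice trace of Euclid's GCD algorithm, with the table written out as comments:

```python
# Euclid's algorithm, traced by time slices on paper.
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

# Time-slice table for gcd(48, 18):
# step |  a  |  b
# -----+-----+----
#   0  | 48  | 18   (initial values)
#   1  | 18  | 12   (48 % 18 = 12)
#   2  | 12  |  6   (18 % 12 = 6)
#   3  |  6  |  0   (12 % 6 = 0)  -> loop exits, return 6
```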

Ruby Object manipulation [closed]

We have an algorithm that compares Ruby objects coming from MongoDB. The majority of the time is spent taking the results (~1000), assigning a weight to each, and comparing them to a base object. This process takes ~2 seconds for 1000 objects. Afterwards, we order the objects by weight and take the top 10.
Given that the number of initial matches will continue to grow, I'm looking for more efficient ways to compare and sort matches in Ruby.
I know this is kind of vague, but let's assume they are User objects holding arrays of data about each person, and we're comparing them to a single user to find the best match for that user.
Have you considered storing/caching the weight? This works well if the weight depends only on the attributes of each user and not on values external to that user.
Also, how complex is the calculation of the weight between a user and the "base" user? If it's complex, you may want to consider using a graph database, which can store data that is specific to the relation between two nodes/objects.
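A minimal sketch of the caching-plus-top-k idea, illustrated in Python; the scoring function and user fields are hypothetical. Computing each pairwise weight once and keeping only the best k avoids repeated scoring and full sorts as the match set grows:

```python
import heapq

# Hypothetical scoring: overlap between the users' attribute arrays.
def compute_weight(user, base_user):
    return len(set(user["tags"]) & set(base_user["tags"]))

_weight_cache = {}

def top_matches(users, base_user, k=10):
    scored = []
    for user in users:
        key = (user["id"], base_user["id"])
        if key not in _weight_cache:  # weight depends only on the pair, so caching is safe
            _weight_cache[key] = compute_weight(user, base_user)
        scored.append((_weight_cache[key], user["id"]))
    # nlargest keeps only the top k instead of sorting all ~1000 items.
    return heapq.nlargest(k, scored)
```

The same shape carries over to Ruby, e.g. a Hash used as the cache plus Enumerable#max_by(10) for the top-k step.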

Algorithm for creating infinite terrain/landscape/surface? [closed]

Does anyone have an algorithm for creating infinite terrain/landscape/surface?
Constraints:
1. The algorithm should start from a random seed.
2. The algorithm should be deterministic (the same seed gives the same result).
3. Other input parameters are allowed, as long as constraint 2 is fulfilled.
4. The algorithm may output a 2D map.
5. It should create only a surface with varying height (mountains), not trees, oceans, etc.
6. I'm looking for an algorithm, not a piece of software.
7. It should be fast.
None of the other related questions here answers this question.
If anything is unclear, please let me know!
I would suggest something like Perlin noise; I've used it before for something like what you're describing above, and it fits the bill. Check out this example and you can see the sort of output you would expect from the noise generator. Here is a link to pseudocode for the algorithm, too:
http://freespace.virgin.net/hugo.elias/models/m_perlin.htm
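For a concrete starting point, here is a minimal sketch of seeded 2D value noise, a simpler cousin of Perlin noise: the same seed always produces the same surface, and the height can be evaluated at any (x, y) on demand, so the terrain is effectively infinite. The hash constants are arbitrary choices, not part of any standard:

```python
import math

def lattice_value(ix, iy, seed):
    # Deterministic hash of an integer lattice point -> value in [0, 1).
    h = (ix * 374761393 + iy * 668265263 + seed * 144665) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((h ^ (h >> 16)) & 0xFFFF) / 65536.0

def smoothstep(t):
    return t * t * (3.0 - 2.0 * t)

def value_noise(x, y, seed=0):
    # Bilinear interpolation of the four surrounding lattice values,
    # eased with smoothstep so the surface has no visible grid creases.
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    v00 = lattice_value(ix, iy, seed)
    v10 = lattice_value(ix + 1, iy, seed)
    v01 = lattice_value(ix, iy + 1, seed)
    v11 = lattice_value(ix + 1, iy + 1, seed)
    sx, sy = smoothstep(fx), smoothstep(fy)
    top = v00 + sx * (v10 - v00)
    bottom = v01 + sx * (v11 - v01)
    return top + sy * (bottom - top)

def height(x, y, seed=0, octaves=4):
    # Sum octaves at doubling frequency / halving amplitude for
    # mountain-like detail (fractal Brownian motion).
    total, amp, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amp * value_noise(x * freq, y * freq, seed)
        amp *= 0.5
        freq *= 2.0
    return total
```

For classic Perlin noise proper, substitute gradient vectors at the lattice points for the scalar values used here; the interpolation and octave-summing structure stays the same.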
As others have already said, Perlin noise is a possibility. GPU Gems 3 has a nice chapter about procedural generation using (IIRC, it has been some time since I read it) 3D Perlin noise.
Of course there are other methods too; e.g., Vterrain.org might be worth a look.
