Difference between one pass (scan) and two pass (scan) - data-structures

I had an interview a day ago.
The interviewer asked me to "write a program to add a node at the end of a linked list".
I gave him a solution, but he told me to implement it in one pass (one scan).
Can anybody explain what "one pass" means, and how to tell whether a program runs in one pass or two?
Here is my code:
public void atLast(int new_data)
{
    Node new_node = new Node(new_data);
    new_node.next = null;
    if (head == null)
    {
        // Empty list: the new node becomes the head.
        head = new_node;
        return;
    }
    // Walk to the last node; this loop is the single pass.
    Node last = head;
    while (last.next != null)
    {
        last = last.next;
    }
    last.next = new_node;
}

If that is the code you gave, the interviewer must have misread it, because it is a single pass.
In your case a "pass" is your while loop. It could also be done with recursion, a for loop, or any other construct that goes once through the elements of the list (or an array, or any other sequence of items).
In your code you run through the list of Nodes once and insert the element at the end. This is done in one loop, making it a single pass.
Now consider a case with two passes. Say, for example, you were asked to remove the element with the largest value and wrote something similar to this:
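int index = 0;
int count = 0;
int max = Integer.MIN_VALUE; // so that negative values are handled too
Node temp_node = head;
// First pass: find the position of the largest value.
while (temp_node != null)
{
    if (temp_node.data > max)
    {
        index = count;
        max = temp_node.data;
    }
    count++;
    temp_node = temp_node.next;
}
// Second pass: go through the positions again up to the one found.
for (int i = 0; i < count; i++)
{
    if (i == index)
    {
        //Functionality to remove node.
    }
}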
The first pass (while) detects the Node which has the maximum value. The second pass (for) removes this Node by looping through all the elements again until the correct one is found.
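For comparison, the same removal can be done in a single pass by remembering the node just before the current maximum while scanning. This is only a sketch in Python under assumptions of my own: the Node class and the names remove_max and to_list are hypothetical, not from the question.

```python
class Node:
    """Hypothetical singly linked list node (not the asker's class)."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def remove_max(head):
    """Remove the first node holding the largest value; return the new head."""
    if head is None:
        return None
    max_prev = None            # node just before the current maximum
    max_node = head
    prev, node = head, head.next
    while node is not None:    # the only traversal of the list
        if node.data > max_node.data:
            max_prev, max_node = prev, node
        prev, node = node, node.next
    if max_prev is None:       # the maximum was the head itself
        return head.next
    max_prev.next = max_node.next   # unlink without scanning again
    return head


def to_list(head):
    """Helper for inspecting the result (this is an extra traversal)."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out
```

Because the predecessor is recorded during the scan, unlinking needs no second loop.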

I'd imagine "two passes" here means that you iterated through the whole list twice in your code. You shouldn't need to do that to add a new node.
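If the interviewer actually wanted to avoid even the one traversal, a common variation is to keep a reference to the last node. That is a guess about the intent, not something stated in the question. A minimal Python sketch, with hypothetical class and method names:

```python
class Node:
    """Hypothetical node type for this sketch."""
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    """Singly linked list that remembers its last node."""
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        """Add a node at the end without walking the list."""
        node = Node(data)
        if self.head is None:
            # Empty list: the new node is both head and tail.
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def to_list(self):
        """Collect the values in order, for inspection only."""
        out = []
        node = self.head
        while node is not None:
            out.append(node.data)
            node = node.next
        return out
```

The trade-off is one extra field that every operation changing the end of the list has to keep up to date.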

Related

What makes this a fixed-length list in Dart?

List<String> checkLength(List<String> input) {
  if (input.length > 6) {
    var tempOutput = input;
    while (tempOutput.length > 6) {
      var difference = (tempOutput.length/6).round() + 1;
      for (int i = 0; i < tempOutput.length - 1; i + difference) {
        tempOutput.removeAt(i); //Removing the value from the list
      }
    }
    return tempOutput; //Return Updated list
  } else {
    return input;
  }
}
I am trying to delete elements from a temporary list. Why does it not work? I do not see why the list is fixed-length; in other problems I have solved I used a nearly identical approach and it worked.
Please note that I am fairly new to Dart, so please forgive this sort of question; I couldn't figure out the solution.
You can ensure that tempOutput is not a fixed-length list by initializing it as
var tempOutput = new List<String>.from(input);
thereby making tempOutput a growable copy of input.
FYI, it also looks like you have another bug in your program: the update step of your for loop is i + difference, but I think you want i += difference.
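The two fixes (copy before mutating, and assign the step back to the index) can be illustrated outside Dart as well. Below is a rough Python analogue, where a tuple plays the role of the fixed-length input; the function name drop_at_steps is made up for this sketch, and it mirrors only the inner for loop, not the whole checkLength function.

```python
def drop_at_steps(items, step):
    """Remove elements at index 0, step, 2*step, ... of a shrinking copy."""
    out = list(items)        # growable copy; the caller's sequence is untouched
    i = 0
    while i < len(out) - 1:
        del out[i]           # analogue of removeAt(i)
        i += step            # the update must be assigned back to i
    return out
```

Note that the indices refer to the list as it shrinks, exactly as in the original loop, so the removed elements are not evenly spaced in the input.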
Can you please try this code and let me know if that works?
List<String> checkLength(List<String> input) {
  if (input.length > 6) {
    var tempOutput = input.toList(); // growable copy of the input
    while (tempOutput.length > 6) {
      var difference = (tempOutput.length/6).round() + 1;
      for (int i = 0; i < tempOutput.length - 1; i = i + difference) {
        tempOutput.removeAt(i); //Removing the value from the list
      }
    }
    return tempOutput; //Return Updated list
  } else {
    return input.toList();
  }
}
Note: you wrote "i + difference", which computes a value but never assigns it to i. The loop variable therefore stays at 0, and every iteration calls "tempOutput.removeAt(i)" at the same position. The error itself, "Cannot remove from a fixed-length list", is raised because removeAt is called on a fixed-length list.
The value of i has to be incremented or decremented on each iteration; that was missing from your for loop.
The answer of @harry-terkelsen was very helpful for solving the fixed-length problem.
For those who were asking about my algorithm:
The difference is for skipping the amount of characters when wanting to remove some. Also, I had to change the for-loop, as it did not quite do what I wanted it to.
The fix is here! https://github.com/luki/wordtocolor/blob/master/web/algorithms.dart
Thank you for understanding me!

Using find_if with vector object

This code always returns false.
I tried passing the lambda parameter by reference and got the same result.
Any tips, please?
vector<int> v1;
v1.push_back(1);
v1.push_back(2);
v1.push_back(3);
v1.push_back(5);
for (int x : v1)
{
    auto it = find_if(v1.begin(), v1.end(), [x](int y){ return x == y; });
    if (it != v1.end())
        return false;
    return true;
}
To check for duplicates (not remove them, just check for them) then you can do something like this:
Get the first value, and check for it in the rest of the container. You should not check the first element again because that's the element we are checking currently.
If a duplicate is not found then go on to the second element, and check from the third element forward. We don't need to check the first element because that was done in the previous step.
Then continue like that for all elements.
If you find a duplicate then stop the searching and return true. If none are found then continue until the end, and then return false.
This can be done easily using iterators:
// Outer loop, current element to check
// (the iterators must not be const, otherwise ++i and ++j do not compile)
for (auto i = v1.begin(); i != v1.end(); ++i)
{
    // Inner loop, the element to check against
    for (auto j = i + 1; j != v1.end(); ++j)
    {
        if (*i == *j)
            return true; // Duplicate found
    }
}
// No duplicates found
return false;
The above code shows the principle, you could of course use std::find_if instead of the inner loop. The important thing is to start looking at the next element. All the previous have already been checked, and you should not compare the current value with itself.
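To make the principle concrete in another language, here is a Python sketch of the same pairwise check, followed by a different technique (a set of already-seen values) that is not part of the answer above. Both function names are hypothetical.

```python
def has_duplicates_pairwise(values):
    """Compare each element only with the elements after it."""
    n = len(values)
    for i in range(n):
        # Start at i + 1: earlier elements were already checked,
        # and an element must not be compared with itself.
        for j in range(i + 1, n):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_seen(values):
    """Alternative: remember what has been seen (needs hashable values)."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```

The pairwise version needs no extra memory but does a quadratic number of comparisons; the set version trades memory for a single scan.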
Look at the condition in your if. What did you mean it to do, and what does it actually do?
From your use of for, it looks like you want to go over the whole container. Can you find a case where the body of the loop doesn't return immediately on the first iteration?

Converting an if code into forloop statement

Right now I have to write code that prints out "*" for each of the collectedDots; however, if nothing was collected (collectedDots==0) it should print out "". Using too many if statements looks messy, and I was wondering how you would implement this with a for loop.
As a general principle the kind of rearrangement you've done here is good. You have found a way to express the rule in a general way rather than as a sequence of special cases. This is much easier to reason about and to check, and it's obviously extensible to cases where you have more than 3 dots.
You have probably confused your target number with the iteration value. I assume that collectedDots contains the number of dots you have (as per your if statement), so you need to introduce a separate variable that counts up to that value:
for (int i = 0; i < collectedDots; i++)
{
    stars = "*";
    System.out.print(stars);
}
Ok, so you already have a variable called collectedDots that is a number which tells you how many stars to print?
So your loop would be something like
for every collected dot
print *
But you can't just print it out, you need to return a string that will be printed out. So it's more like
for every collected dot
add a * to our string
return the string
The key difference between this and your attempt so far is that you were assigning a star to be your string each time through the loop and then returning that string at the end. No matter how many times you assign a star to the string, the string will always be just one star.
You also need a separate variable to keep track of your loop, this should do the trick:
String stars = "";
for (int i = 0; i < collectedDots; i++)
{
    stars = stars + "*";
}
return stars;
You are almost correct; you just need to change the loop bounds and accumulate the stars instead of overwriting them. Here the loop counter starts at 1, so when collectedDots = 0 the loop body never runs and "" is returned, because stars is initialized to "" before the loop.
String stars = "";
for (int i = 1; i <= collectedDots; i++)
{
    stars = stars + "*";
}
return stars;

How to eliminate duplicate filename in hadoop mapreduce?

I want to eliminate duplicate filenames in my output of the hadoop mapreduce inverted index program. For example, the output is like - things : doc1,doc1,doc1,doc2 but I want it to be like
things : doc1,doc2
Well, you want to remove duplicates that were mapped, i.e. you want to reduce the intermediate value list to an output list with no duplicates. My best bet would be to simply convert the Iterator<Text> in the reduce() method to a Java Set and iterate over it, changing:
while (values.hasNext()) {
    if (!first)
        toReturn.append(", ") ;
    first = false;
    toReturn.append(values.next().toString());
}
To something like:
Set<Text> valueSet = new HashSet<Text>();
while (values.hasNext()) {
    valueSet.add(values.next());
}
for (Text value : valueSet) {
    if (!first) {
        toReturn.append(", ");
    }
    first = false;
    toReturn.append(value.toString());
}
Unfortunately I do not know of any better (more concise) way of converting an Iterator to a Set.
This should have a smaller time complexity than orange's solution but a higher memory consumption.
Edit: a bit shorter:
Set<Text> valueSet = new HashSet<Text>();
while (values.hasNext()) {
    Text next = values.next();
    if (!valueSet.contains(next)) {
        if (!first) {
            toReturn.append(", ");
        }
        first = false;
        toReturn.append(next.toString());
        valueSet.add(next);
    }
}
Contains should be (just like add) constant time so it should be O(n) now.
To do this with the minimal amount of code change, just add an if-statement that checks to see if the thing you are about to append is already in toReturn:
if (!first)
    toReturn.append(", ") ;
first = false;
toReturn.append(values.next().toString());
gets changed to
String v = values.next().toString();
if (toReturn.indexOf(v) == -1) { // indexOf returns -1 if it is not there
    if (!first) {
        toReturn.append(", ") ;
    }
    toReturn.append(v);
    first = false;
}
The above solution is a bit slow because it has to traverse the entire string every time to see if that string is there. Likely the best way to do this is to use a HashSet to collect the items, then combining the values in the HashSet into a final output string.
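The set-based idea can be sketched independently of Hadoop. The Python function below (join_unique is a made-up name) keeps the first occurrence of each value in arrival order. Since it compares whole values, a name such as doc1 is not mistaken for being present just because doc10 was already appended, which a substring search on the output string would do.

```python
def join_unique(values, sep=", "):
    """Join values in first-seen order, skipping exact repeats."""
    seen = set()       # whole values seen so far; membership test is O(1) on average
    unique = []
    for v in values:   # works on any iterable, including a one-shot iterator
        if v not in seen:
            seen.add(v)
            unique.append(v)
    return sep.join(unique)
```

For the example from the question, join_unique(["doc1", "doc1", "doc1", "doc2"]) gives "doc1, doc2".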

word distribution problem

I have a big file of words (~100 GB) and limited memory (4 GB). I need to calculate the word distribution from this file. One option is to divide it into chunks, sort each chunk, and then merge to calculate the word distribution. Is there any other way it can be done faster? One idea is to sample, but I am not sure how to implement that so it returns something close to the correct solution.
Thanks
You can build a Trie structure where each leaf (and some nodes) will contain the current count. As words will intersect with each other 4GB should be enough to process 100 GB of data.
Naively I would just build up a hash table until it hits a certain limit in memory, then sort it in memory and write this out. Finally, you can do n-way merging of each chunk. At most you will have 100/4 chunks or so, but probably many fewer provided some words are more common than others (and how they cluster).
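A rough Python sketch of that hash-table-then-merge approach follows. Everything named here is an assumption for illustration: max_distinct stands in for the real memory limit, words are whatever str.split() produces, and each spilled chunk is a temporary file of sorted "word<TAB>count" lines.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby


def _spill(counts, tmpdir):
    """Write one in-memory table to disk as sorted 'word<TAB>count' lines."""
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for word in sorted(counts):
            f.write(f"{word}\t{counts[word]}\n")
    return path


def _read_run(path):
    """Stream (word, count) pairs back from one spilled chunk."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.rstrip("\n").split("\t")
            yield word, int(count)


def count_words(lines, max_distinct, tmpdir):
    """Yield (word, total) pairs in sorted word order."""
    runs, counts = [], Counter()
    for line in lines:
        counts.update(line.split())
        if len(counts) >= max_distinct:   # table is "full": sort and write out
            runs.append(_spill(counts, tmpdir))
            counts = Counter()
    if counts:
        runs.append(_spill(counts, tmpdir))
    # n-way merge of the sorted chunks; equal words end up adjacent.
    merged = heapq.merge(*(_read_run(p) for p in runs))
    for word, group in groupby(merged, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)
```

Only one hash table plus one line per chunk is held in memory at a time, so the limit is governed by max_distinct rather than by the size of the input.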
Another option is to use a trie which was built for this kind of thing. Each character in the string becomes a branch in a 256-way tree and at the leaf you have the counter. Look up the data structure on the web.
If you can pardon the pun, "trie" this:
using System.Collections.Generic;
using System.Linq; // for Skip() and ToArray()

public class Trie : Dictionary<char, Trie>
{
    // Number of words that end at this node.
    public int Frequency { get; set; }

    public void Add(string word)
    {
        this.Add(word.ToCharArray());
    }

    private void Add(char[] chars)
    {
        if (chars == null || chars.Length == 0)
        {
            throw new System.ArgumentException();
        }
        var first = chars[0];
        if (!this.ContainsKey(first))
        {
            this.Add(first, new Trie());
        }
        if (chars.Length == 1)
        {
            this[first].Frequency += 1;
        }
        else
        {
            this[first].Add(chars.Skip(1).ToArray());
        }
    }

    public int GetFrequency(string word)
    {
        return this.GetFrequency(word.ToCharArray());
    }

    private int GetFrequency(char[] chars)
    {
        if (chars == null || chars.Length == 0)
        {
            throw new System.ArgumentException();
        }
        var first = chars[0];
        if (!this.ContainsKey(first))
        {
            return 0;
        }
        if (chars.Length == 1)
        {
            return this[first].Frequency;
        }
        else
        {
            return this[first].GetFrequency(chars.Skip(1).ToArray());
        }
    }
}
Then you can call code like this:
var t = new Trie();
t.Add("Apple");
t.Add("Banana");
t.Add("Cherry");
t.Add("Banana");
var a = t.GetFrequency("Apple"); // == 1
var b = t.GetFrequency("Banana"); // == 2
var c = t.GetFrequency("Cherry"); // == 1
You should be able to add code to traverse the trie and return a flat list of words and their frequencies.
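As an illustration of what such a traversal could look like, here is a small Python sketch of a comparable trie with a depth-first walk that yields each stored word with its frequency. It is not a translation of the C# class above; the attribute and method names are my own.

```python
class Trie:
    """Minimal trie: one child per character, a counter on each node."""
    def __init__(self):
        self.children = {}
        self.frequency = 0   # number of words ending at this node

    def add(self, word):
        if not word:
            raise ValueError("word must be non-empty")
        node = self
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = Trie()
            node = child
        node.frequency += 1

    def items(self, prefix=""):
        """Yield (word, frequency) for every stored word, depth first."""
        if self.frequency:
            yield prefix, self.frequency
        for ch, child in self.children.items():
            yield from child.items(prefix + ch)
```

The recursion depth equals the length of the longest word, which is fine for ordinary words but would need an explicit stack for very long tokens.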
If you find that this too still blows your memory limit then might I suggest that you "divide and conquer". Maybe scan the source data for all the first characters and then run the trie separately against each and then concatenate the results after all of the runs.
Do you know how many different words you have? If not a lot (say, a hundred thousand), then you can stream the input, split it into words, and use a hash table to keep the counts. After the input is done, just traverse the result.
Just use a DBM file. It’s a hash on disk. If you use the more recent versions, you can use a B+Tree to get in-order traversal.
Why not use any relational DB? The procedure would be as simple as:
Create a table with the word and count.
Create an index on word. Some databases have a word index (e.g. Progress).
Do SELECT on this table with the word.
If word exists then increase counter.
Otherwise - add it to the table.
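The steps above can be sketched with Python's built-in sqlite3 module. This is just one possible reading of the procedure: the table name word_counts, the column names, and the function name are assumptions, and the PRIMARY KEY on word provides the index.

```python
import sqlite3


def count_into_db(conn, words):
    """Increment the stored counter for each word, inserting new words."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        "word TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    for word in words:
        # Try to bump an existing row first.
        cur = conn.execute(
            "UPDATE word_counts SET n = n + 1 WHERE word = ?", (word,)
        )
        if cur.rowcount == 0:
            # No row was updated, so the word is new.
            conn.execute(
                "INSERT INTO word_counts (word, n) VALUES (?, 1)", (word,)
            )
    conn.commit()
```

Here the SELECT-then-update of the steps is folded into an UPDATE whose row count tells whether the word already existed. With a file-backed database instead of ":memory:", the counts live on disk rather than in RAM, at the cost of one or two statements per word.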
If you are using Python, you can iterate over the file object directly. It reads the file line by line, so the whole file is never held in memory and it will not cause memory problems. You should not "return" the values all at once but "yield" them one at a time.
Here is a sample that I used to read a file and get the vector values.
def __iter__(self):
    # Iterating over the file object reads one line at a time.
    with open(self.temp_file_name) as f:
        for line in f:
            yield self.dictionary.doc2bow(line.lower().split())

Resources