Can't get the index of a string in the mapper function - Hadoop

In getIndexes(int number, int size, int characters) I have to put the converted number at the end of the array, because it must be padded with leading zeros. For example, if the converted number is 231, the array needs five zeros followed by 2, 3, 1 (and a two-digit result such as 20 needs six zeros followed by 2, 0).
// Input characters and length of the motif
char[] inputChars = {'a', 'c', 'g', 't'};
int lengthOfMotif = 8;
public void map(Object key, Text value, Context context) throws IOException, InterruptedException
{
/*
* To generate all combinations of length lengthOfMotif, I count all possible strings:
* there are inputChars.length ^ lengthOfMotif of them (4 ^ 8 = 65536 here).
*
* For each number I call getIndexes to get an int[] of length lengthOfMotif, where each
* entry is an index into inputChars.
*
* I build the motif from those indexes, make it the mapper key, and call minDistance, which
* returns "minDistance,bestMatchingString,indexOfBestMatchingString" as the value for that key.
*/
int total = (int) Math.pow(inputChars.length, lengthOfMotif); // compute the loop bound once
for (int i = 0; i < total; i++)
{
StringBuilder motif = new StringBuilder(); // start with an empty motif
// loop over the array returned by getIndexes(); each entry selects a character from inputChars
for (int j : getIndexes(i, lengthOfMotif, inputChars.length))
{
motif.append(inputChars[j]);
}
context.write(new Text(motif.toString()), new Text(minDistance(motif.toString(), value.toString())));
}
}
// It takes a number, the length of the resulting index array, and the number of unique characters
/*
* I convert the number to base (number of unique characters), so every digit is a valid index
* into inputChars as long as the number is less than characters ^ size.
* Then I place the converted digits at the end of the index array, so the leading entries stay 0.
*
* Here the length is 8 and there are 4 characters, so
* if my number is 0,
* converting it to base 4 leaves it as 0;
* placing it at the end of the index array gives
* 0 0 0 0 0 0 0 0
* which, read as indexes into inputChars, gives
* a a a a a a a a
* There are 4 ^ 8 = 65536 numbers; since we start from 0, the largest number is 65535.
* In base 4, 65535 is 3 3 3 3 3 3 3 3, which as indexes becomes
* t t t t t t t t
* So every number from 0 to 65535 is covered, and each combination is passed as a mapper key.
*/
int[] getIndexes(int number, int size, int characters)
{
// initialize the result array (Java fills it with zeros)
int[] result = new int[size];
// I'm stuck here
return result;
}
// return a concatenated string in the format minDistance,bestMatching,index

// indexes holds the number converted to base `characters`, e.g. "231"
// in the string "231":
// '2' -> index 0
// '3' -> index 1
// '1' -> index 2
// The result array has indexes 0 to 7,
// so to add the padding:
// 2 has to be at index 5
// 3 has to be at index 6
// 1 has to be at index 7
// The size variable holds the total required length, which is 8,
// so subtract the length of "231" from 8: 8 - 3 = 5, then add i: 5 + 0 = 5
// the second digit goes at 8 - 3 + 1 = 6
// the third digit goes at 8 - 3 + 2 = 7
// Concatenating "" turns the char into a String, and Integer.parseInt converts it to an int,
// so indexes 5, 6, 7 are filled with 2, 3, 1
for (int i = 0; i < indexes.length(); i++)
result[size - indexes.length() + i] = Integer.parseInt(indexes.charAt(i) + "");
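A minimal, self-contained sketch of getIndexes following the approach in the comments above, assuming (as those comments suggest) that Integer.toString(number, characters) does the base conversion. The class name MotifIndexes and the motifFor helper are hypothetical, added only to show the idea end to end:

```java
import java.util.Arrays;

// Hypothetical helper class; not part of the original Hadoop job.
class MotifIndexes {

    // Returns an array of length `size` holding the base-`characters` digits of
    // `number`, right-aligned and padded with leading zeros.
    // Assumes 2 <= characters <= 36 and 0 <= number < characters^size.
    static int[] getIndexes(int number, int size, int characters) {
        int[] result = new int[size]; // zero-initialised, so the padding is free
        String indexes = Integer.toString(number, characters);
        for (int i = 0; i < indexes.length(); i++) {
            // Character.digit parses one digit in the given radix
            result[size - indexes.length() + i] = Character.digit(indexes.charAt(i), characters);
        }
        return result;
    }

    // Builds the motif for `number` by mapping each index to a character.
    static String motifFor(int number, char[] inputChars, int lengthOfMotif) {
        StringBuilder motif = new StringBuilder(lengthOfMotif);
        for (int j : getIndexes(number, lengthOfMotif, inputChars.length)) {
            motif.append(inputChars[j]);
        }
        return motif.toString();
    }

    public static void main(String[] args) {
        char[] inputChars = {'a', 'c', 'g', 't'};
        System.out.println(Arrays.toString(getIndexes(37, 8, 4))); // [0, 0, 0, 0, 0, 2, 1, 1]
        System.out.println(motifFor(0, inputChars, 8));            // aaaaaaaa
        System.out.println(motifFor(65535, inputChars, 8));        // tttttttt
    }
}
```

Repeated division (take number % characters, then number /= characters, filling result from the last slot backwards) would give the same array without the String round-trip.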

// reducer: keep the value with the smallest distance
String[] tokens = values.iterator().next().toString().split(",");
int minDistance = Integer.parseInt(tokens[0]);
String bestMatching = tokens[1];
int index = Integer.parseInt(tokens[2]);
int minimumDistance = minDistance; // running total of all distances
for (Text t : values)
{
tokens = t.toString().split(",");
int distance = Integer.parseInt(tokens[0]);
if (distance < minDistance)
{
minDistance = distance;
bestMatching = tokens[1];
index = Integer.parseInt(tokens[2]);
}
minimumDistance = minimumDistance + distance;
}
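The selection loop can be tried outside Hadoop as a plain function over the "distance,bestMatching,index" strings. A hedged sketch: the class BestMatch and method pickBest are hypothetical, and the values are plain Strings rather than Hadoop Text objects:

```java
import java.util.Arrays;

// Hypothetical, Hadoop-free version of the reducer's selection loop.
class BestMatch {
    final int minDistance;
    final String bestMatching;
    final int index;
    final int totalDistance; // what the original accumulates in minimumDistance

    BestMatch(int minDistance, String bestMatching, int index, int totalDistance) {
        this.minDistance = minDistance;
        this.bestMatching = bestMatching;
        this.index = index;
        this.totalDistance = totalDistance;
    }

    // Each value has the form "distance,bestMatching,index".
    // Keeps the first value with the smallest distance and sums all distances.
    // Assumes at least one value; otherwise bestMatching stays null.
    static BestMatch pickBest(Iterable<String> values) {
        int minDistance = Integer.MAX_VALUE;
        String bestMatching = null;
        int index = -1;
        int total = 0;
        for (String v : values) {
            String[] tokens = v.split(",");
            int distance = Integer.parseInt(tokens[0]);
            if (distance < minDistance) {
                minDistance = distance;
                bestMatching = tokens[1];
                index = Integer.parseInt(tokens[2]);
            }
            total += distance;
        }
        return new BestMatch(minDistance, bestMatching, index, total);
    }

    public static void main(String[] args) {
        BestMatch b = pickBest(Arrays.asList("3,acgt,5", "1,aagt,2", "1,cagt,9"));
        System.out.println(b.minDistance + "," + b.bestMatching + "," + b.index); // 1,aagt,2
    }
}
```

Starting from Integer.MAX_VALUE lets a single loop handle every value, instead of reading the first value with values.iterator().next() before the loop; in Hadoop the values of one reduce call can generally be traversed only once, so one pass is the safer shape.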

Related

Optimised EmEditor macro to populate column based on another column for a large file

I’ve got a really large file, circa 10m rows, in which I’m trying to populate a column based on conditions on another column via a jsee macro. While it is quite quick for small files, it does take some time for the large file.
//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
THEN cell in Col2 = previous cell in Col2
//jsee code
document.CellMode = true; // Must be cell selection mode
totalLines = document.GetLines();
for( i = 1; i < totalLines; i++ ) {
nref = document.GetCell( i, 1, eeCellIncludeNone );
gsize = document.GetCell( i, 2, eeCellIncludeNone );
if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
}
}
Input File:
Reference    Group Size
14/12/01819  1
14/12/01820  1
15/01/00191  4
15/01/00191
15/01/00191
15/01/00198
15/01/00292  3
15/01/00292
15/01/00292
15/01/00401  5
15/01/00401
15/01/00402  1
15/01/00403  2
15/01/00403
15/01/00403
15/01/00403
15/01/00404
20/01/01400  1
Output File:
Reference    Group Size
14/12/01819  1
14/12/01820  1
15/01/00191  4
15/01/00191  4
15/01/00191  4
15/01/00198
15/01/00292  3
15/01/00292  3
15/01/00292  3
15/01/00401  5
15/01/00401  5
15/01/00402  1
15/01/00403  2
15/01/00403  2
15/01/00403  2
15/01/00403  2
15/01/00404
20/01/01400  1
Any ideas on how to optimise this and make it run even faster?
I wrote an EmEditor JavaScript macro for you. You might need to set the correct column numbers for iColReference and iColGroupSize in the first two lines.
iColReference = 1; // the column index of "Reference"
iColGroupSize = 2; // the column index of "Group Size"
document.CellMode = true; // Must be cell selection mode
sDelimiter = document.Csv.Delimiter; // retrieve the delimiter
nOldHeadingLines = document.HeadingLines; // retrieve old headings
document.HeadingLines = 0; // set No Headings
yBottom = document.GetLines(); // retrieve the number of lines
if( document.GetLine( yBottom ).length == 0 ) { // -1 if the last line is empty
--yBottom;
}
str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom ); // get the whole Reference column from top to bottom, joined with the delimiter
sCol1 = str.split( sDelimiter );
str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom ); // get the whole Group Size column from top to bottom, joined with the delimiter
sCol2 = str.split( sDelimiter );
s1 = "";
s2 = "";
for( i = 0; i < yBottom; ++i ) { // loop through all lines
if( sCol2[i].length != 0 ) {
s1 = sCol1[i];
s2 = sCol2[i];
}
else {
if( s1.length != 0 && sCol1[i] == s1 ) { // same value as previous line, copy s2
if( s2.length != 0 ) {
sCol2[i] = s2;
}
}
else { // different value, empty s1 and s2
s1 = "";
s2 = "";
}
}
}
str = sCol2.join( sDelimiter );
document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote ); // set whole 2nd column from top to bottom with the new values
document.HeadingLines = nOldHeadingLines; // restore the original number of headings
To run this, save this code as, for instance, Macro.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Macro.jsee in the Macros menu.
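The speed-up in this answer comes from reading each column once, filling it in memory, and writing it back once, instead of calling GetCell/SetCell for every row. The fill rule itself is a single linear pass; here is a hedged Java sketch of that pass over two plain String arrays (the class and method names are hypothetical, and no EmEditor API is involved):

```java
import java.util.Arrays;

// Hypothetical illustration of the answer's fill loop; not EmEditor macro code.
class GroupSizeFill {
    // ref[i] and size[i] are the Reference and Group Size cells of row i.
    // An empty size is filled with the last non-empty size, as long as the
    // reference is non-empty and equal to the reference that size belongs to.
    static void fill(String[] ref, String[] size) {
        String lastRef = "";
        String lastSize = "";
        for (int i = 0; i < ref.length; i++) {
            if (!size[i].isEmpty()) {          // a new group size starts here
                lastRef = ref[i];
                lastSize = size[i];
            } else if (!lastRef.isEmpty() && ref[i].equals(lastRef)) {
                size[i] = lastSize;            // same reference: copy the size down
            } else {                           // different reference: forget the group
                lastRef = "";
                lastSize = "";
            }
        }
    }

    public static void main(String[] args) {
        String[] ref = {"15/01/00191", "15/01/00191", "15/01/00198"};
        String[] size = {"4", "", ""};
        fill(ref, size);
        System.out.println(Arrays.toString(size)); // [4, 4, ]
    }
}
```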

Unable to find the error in my segment tree: minimum in a subarray

I am new to data structures and algorithms, and I can't find the error in my code for this problem:
Range Minimum Query
Given an array A of size N, there are two types of queries on this array.
q l r: In this query you need to print the minimum in the sub-array A[l:r].
u x y: In this query you need to update A[x]=y.
Input: First line of the test case contains two integers, N and Q, size of array A and number of queries.
Second line contains N space separated integers, elements of A.
Next Q lines contain one of the two queries.
Output:
For each type 1 query, print the minimum element in the sub-array A[l:r].
Constraints:
1 ≤ N,Q,y ≤ 10^5
1 ≤ l,r,x≤N
#include<bits/stdc++.h>
using namespace std;
long a [100001];
//global array to store input
long tree[400004];
//global array to store tree
// FUNCTION TO BUILD SEGMENT TREE //////////
void build(long i,long start,long end) //i = tree node
{
if(start==end)
{
tree[i]=a[start];
return;
}
long mid=(start+end)/2;
build(i*2,start,mid);
build(i*2+1,mid+1,end);
tree[i] = min(tree[i*2] , tree[i*2+1]);
}
// FUNCTION TO UPDATE SEGMENT TREE //////////
void update (long i,long start,long end,long idx,long val)
//idx = index to be updated
// val = new value to be given at that index
{
if(start==end)
tree[i]=a[idx]=val;
else
{
int mid=(start+end)/2;
if(start <= idx and idx <= mid)
update(i*2,start,mid,idx,val);
else
update(i*2+1,mid+1,end,idx,val);
tree[i] = min(tree[i*2] , tree[i*2+1]);
}
}
// FUNCTION FOR QUERY
long query(long i,long start,long end,long l,long r)
{
if(start>r || end<l || start > end)
return INT_MAX;
else
if(start>=l && end<=r)
return tree[i];
long mid=(start+end)/2;
long ans1 = query(i*2,start,mid,l,r);
long ans2 = query(i*2+1,mid+1,end,l,r);
return min(ans1,ans2);
}
int main()
{
long n,q;
cin>>n>>q;
for(int i=0 ; i<n ; i++)
cin>>a[i];
//for(int i=1 ; i<2*n ; i++) cout<<tree[i]<<" "; cout<<endl;
build(1,0,n-1);
//for(int i=1 ; i<2*n ; i++) cout<<tree[i]<<" "; cout<<endl;
while(q--)
{
long l,r;
char ch;
cin>>ch>>l>>r;
if(ch=='q')
cout<<query(1,0,n-1,l-1,r-1)<<endl;
else
update(1,0,n-1,l,r);
}
return 0;
}
Example input:
5 15
1 5 2 4 3
q 1 5
q 1 3
q 3 5
q 1 5
q 1 2
q 2 4
q 4 5
u 3 1
u 3 100
u 3 6
q 1 5
q 1 5
q 1 2
q 2 4
q 4 5
Expected output:
1
1
2
1
1
2
3
1
1
1
4
3
It appears that all given values assume 1-based indexing: 1 ≤ l,r,x ≤ N
You chose to build your segment tree with 0-based indexing, so all queries and updates should also use the same indexing.
So this part is wrong: you need to set A[x]=y, but because you use 0-based indexing, your code actually sets A[x+1]=y
update(1,0,n-1,l,r);
To fix it, change it to this:
update(1,0,n-1,l-1,r);
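To see the fix in context, here is a hedged Java sketch of the same min segment tree (0-based internally, root at node 1) with the 1-based positions from the input converted at the call site, which reproduces the expected output for the example. The class and method names are illustrative, not taken from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal range-minimum segment tree; node 1 is the root, children are 2i and 2i+1.
class MinSegmentTree {
    private final long[] tree;
    private final int n;

    MinSegmentTree(long[] a) {
        n = a.length;
        tree = new long[4 * n];
        build(a, 1, 0, n - 1);
    }

    private void build(long[] a, int i, int start, int end) {
        if (start == end) { tree[i] = a[start]; return; }
        int mid = (start + end) / 2;
        build(a, 2 * i, start, mid);
        build(a, 2 * i + 1, mid + 1, end);
        tree[i] = Math.min(tree[2 * i], tree[2 * i + 1]);
    }

    // Sets a[idx] = val, idx is 0-based.
    void update(int idx, long val) { update(1, 0, n - 1, idx, val); }

    private void update(int i, int start, int end, int idx, long val) {
        if (start == end) { tree[i] = val; return; }
        int mid = (start + end) / 2;
        if (idx <= mid) update(2 * i, start, mid, idx, val);
        else update(2 * i + 1, mid + 1, end, idx, val);
        tree[i] = Math.min(tree[2 * i], tree[2 * i + 1]);
    }

    // Minimum of a[l..r], 0-based and inclusive.
    long query(int l, int r) { return query(1, 0, n - 1, l, r); }

    private long query(int i, int start, int end, int l, int r) {
        if (r < start || end < l) return Long.MAX_VALUE; // no overlap
        if (l <= start && end <= r) return tree[i];      // full overlap
        int mid = (start + end) / 2;
        return Math.min(query(2 * i, start, mid, l, r), query(2 * i + 1, mid + 1, end, l, r));
    }

    // Runs "q l r" / "u x y" commands whose positions are 1-based, as in the problem.
    static List<Long> run(long[] a, String[] commands) {
        MinSegmentTree t = new MinSegmentTree(a);
        List<Long> out = new ArrayList<>();
        for (String c : commands) {
            String[] p = c.trim().split("\\s+");
            int x = Integer.parseInt(p[1]);
            long y = Long.parseLong(p[2]);
            if (p[0].equals("q")) out.add(t.query(x - 1, (int) y - 1));
            else t.update(x - 1, y); // the 1-based -> 0-based shift from the answer
        }
        return out;
    }

    public static void main(String[] args) {
        long[] a = {1, 5, 2, 4, 3};
        String[] cmds = {"q 1 5", "q 1 3", "q 3 5", "q 1 5", "q 1 2", "q 2 4", "q 4 5",
                "u 3 1", "u 3 100", "u 3 6", "q 1 5", "q 1 5", "q 1 2", "q 2 4", "q 4 5"};
        System.out.println(run(Arrays.copyOf(a, a.length), cmds)); // [1, 1, 2, 1, 1, 2, 3, 1, 1, 1, 4, 3]
    }
}
```

Without the x - 1 shift, "u 3 6" would overwrite the 4 instead of the 2, and "q 2 4" would print 2 instead of 4.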

How do we Construct LCP-LR array from LCP array?

To find the number of occurrences of a given string P (length m) in a text T (length N), we must use binary search against the suffix array of T.
The issue with using standard binary search (without the LCP information) is that in each of the O(log N) comparisons you need to make, you compare P to the current entry of the suffix array, which means a full string comparison of up to m characters. So the complexity is O(m*log N).
The LCP-LR array helps improve this to O(m+log N).
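For reference, here is the plain O(m log N) counting that LCP-LR improves on: two binary searches over the suffix array find the first suffix that starts with P and the first one after the block of such suffixes, and their distance is the number of occurrences. A hedged Java sketch; the suffix array is built by naive sorting purely to keep the example short, and all names are illustrative:

```java
import java.util.Arrays;

class SuffixArrayCount {
    // Naive suffix array: sort start positions by suffix text (fine for small inputs).
    static Integer[] suffixArray(String t) {
        Integer[] sa = new Integer[t.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (x, y) -> t.substring(x).compareTo(t.substring(y)));
        return sa;
    }

    // Compares the first |p| characters of the suffix t[s..] with p:
    // < 0 if the suffix sorts before p, 0 if it starts with p, > 0 otherwise.
    private static int comparePrefix(String t, int s, String p) {
        int end = Math.min(t.length(), s + p.length());
        return t.substring(s, end).compareTo(p);
    }

    // Occurrences of p in t. Each comparison reads up to m characters,
    // so this is O(m log N) once the suffix array exists.
    static int count(String t, Integer[] sa, String p) {
        int lo = 0, hi = sa.length;            // first suffix whose prefix is >= p
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(t, sa[mid], p) < 0) lo = mid + 1; else hi = mid;
        }
        int first = lo;
        hi = sa.length;                        // first suffix whose prefix is > p
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(t, sa[mid], p) <= 0) lo = mid + 1; else hi = mid;
        }
        return lo - first;
    }

    public static void main(String[] args) {
        String t = "mississippi";
        Integer[] sa = suffixArray(t);
        System.out.println(Arrays.toString(sa)); // [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
        System.out.println(count(t, sa, "ssi")); // 2
    }
}
```

LCP-LR attacks exactly the comparePrefix cost: it lets each step skip characters already known to match, which is where the O(m + log N) bound comes from.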
How do we precompute the LCP-LR array from the LCP array?
And how does LCP-LR help in finding the number of occurrences of a pattern?
Please explain the algorithm with an example.
Thank you
// note that arrSize is O(n)
// int arrSize = 2 * 2 ^ (log(N) + 1) + 1; // start from 1
// LCP = new int[N];
// fill the LCP...
// LCP_LR = new int[arrSize];
// std::fill(LCP_LR, LCP_LR + arrSize, maxValueOfInteger); // memset would set bytes, not ints
//
// init: buildLCP_LR(1, 1, N);
// LCP_LR[1] == [1..N]
// LCP_LR[2] == [1..N/2]
// LCP_LR[3] == [N/2+1 .. N]
// rangeI = LCP_LR[i]
// rangeILeft = LCP_LR[2 * i]
// rangeIRight = LCP_LR[2 * i + 1]
// ..etc
void buildLCP_LR(int index, int low, int high)
{
if(low == high)
{
LCP_LR[index] = LCP[low];
return;
}
int mid = (low + high) / 2;
buildLCP_LR(2*index, low, mid);
buildLCP_LR(2*index+1, mid + 1, high);
LCP_LR[index] = min(LCP_LR[2*index], LCP_LR[2*index + 1]);
}
Reference: https://stackoverflow.com/a/28385677/1428052
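A hedged Java port of buildLCP_LR, plus a range-minimum lookup over the finished tree, run on the mississippi LCP array quoted in the next post. It uses root index 1, as the comments above do: with a root of 0, the left child 2*index would be 0 again and overwrite the root's own slot. The class name, the 4*N array size and the rangeMin helper are assumptions for illustration:

```java
import java.util.Arrays;

// Java port of buildLCP_LR: a min segment tree over the LCP array.
class LcpLr {
    final int[] lcpLr;
    private final int n;

    LcpLr(int[] lcp) {
        n = lcp.length;
        lcpLr = new int[4 * n];
        Arrays.fill(lcpLr, Integer.MAX_VALUE); // unused slots stay "infinite"
        build(lcp, 1, 0, n - 1);               // root is 1, range is 0..n-1 inclusive
    }

    private void build(int[] lcp, int index, int low, int high) {
        if (low == high) { lcpLr[index] = lcp[low]; return; }
        int mid = (low + high) / 2;
        build(lcp, 2 * index, low, mid);
        build(lcp, 2 * index + 1, mid + 1, high);
        lcpLr[index] = Math.min(lcpLr[2 * index], lcpLr[2 * index + 1]);
    }

    // Minimum of lcp[l..r] read from the tree (0-based, inclusive).
    int rangeMin(int l, int r) { return rangeMin(1, 0, n - 1, l, r); }

    private int rangeMin(int index, int low, int high, int l, int r) {
        if (r < low || high < l) return Integer.MAX_VALUE;
        if (l <= low && high <= r) return lcpLr[index];
        int mid = (low + high) / 2;
        return Math.min(rangeMin(2 * index, low, mid, l, r),
                        rangeMin(2 * index + 1, mid + 1, high, l, r));
    }

    public static void main(String[] args) {
        LcpLr t = new LcpLr(new int[]{1, 1, 4, 0, 0, 1, 0, 2, 1, 3, 0});
        System.out.println(t.lcpLr[1] + " " + t.rangeMin(7, 9)); // 0 1
    }
}
```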
I don't have enough reputation to comment, so I'm posting this here. Has anybody been able to build the LCP-LR array using @Abhijeet Ashok Muneshwar's solution? For example, for the text mississippi the suffix array is:
0 1 2 3 4 5 6 7 8 9 10
10 7 4 1 0 9 8 6 3 5 2
The LCP array will be:
0 1 2 3 4 5 6 7 8 9 10
1 1 4 0 0 1 0 2 1 3 0
And the LCP-LR will be:
0 1 2 3 4 5 6 7 8 9 10
1 1 0 4 0 0 0 0 0 1 3
But the LCP-LR obtained using the code is not the same as above.
I am passing index=0, low=0, high=n to the method buildLCP_LR.

SpriteKit for loop

Hi, I'm trying to follow a tutorial on the Ray Wenderlich site:
http://www.raywenderlich.com/76740/make-game-like-space-invaders-sprite-kit-and-swift-tutorial-part-1
I'm going through the functions and breaking them down so I can understand how they work. I've commented the parts I think I understand, but this bit has me stumped.
Thanks for looking.
In the for loop, what is the var row = 1 at the beginning doing?
I've only ever written for loops like this:
for Position in 0...9
{
// do something with Position ten times
}
Then, what does the % in if row % 3 mean?
for var row = 1; row <= kInvaderRowCount; row++ // start of loop
{
var invaderType: InvaderType // variable holding AType etc.
if row % 3 == 0
{
invaderType = .AType
} else if row % 3 == 1
Here's the rest of the code:
func makeInvaderOfType(invaderType: InvaderType) -> (SKNode) // takes an InvaderType enum value (AType, BType, CType) and returns an SKNode
{
var invaderColor: SKColor // variable for the colour
switch(invaderType) // switch statement: passing in AType gives red
{
case .AType:
invaderColor = SKColor.redColor()
case .BType:
invaderColor = SKColor.greenColor()
case .CType:
invaderColor = SKColor.blueColor()
default:
invaderColor = SKColor.blueColor()
}
let invader = SKSpriteNode(color: invaderColor, size: kInvaderSize) // sprite node with the colour from the switch statement and the size from kInvaderSize
invader.name = kInvaderName // the name comes from the constant kInvaderName
return invader // return the sprite node with its colour, size and name
}
func setupInvaders()
{
let baseOrigin = CGPoint(x:size.width/3, y:180) // starting point: x is a third of the screen width, y is 180
for var row = 1; row <= kInvaderRowCount; row++ // start of loop
{
var invaderType: InvaderType // variable holding AType etc.
if row % 3 == 0
{
invaderType = .AType
} else if row % 3 == 1
{
invaderType = .BType
} else
{
invaderType = .CType
}
let invaderPositionY = CGFloat(row) * (kInvaderSize.height * 2) + baseOrigin.y // y position for this row: I think it's the loop counter times 16 times 2 = 32, plus 180, so 212 the first time, then 244
/* so if I've got this right, the sum goes: row = 1, kInvaderSize.height * 2 = 32, baseOrigin.y = 180
1 * 32 + 180 = 212
2 * 32 + 180 = 244
*/
println(row)
var invaderPosition = CGPoint(x:baseOrigin.x, y:invaderPositionY) // point where this row starts
println(invaderPosition.y)
for var col = 1; col <= kInvaderColCount; col++
{
var invader = makeInvaderOfType(invaderType) // calls the function and gets back the sprite node with colour, size and name?
invader.position = invaderPosition
addChild(invader)
invaderPosition = CGPoint(x: invaderPosition.x + kInvaderSize.width + kInvaderGridSpacing.width, y: invaderPositionY)
}
}
}
If I understand your question correctly, here's the answer. Based on this code:
for var row = 1; row <= kInvaderRowCount; row++ // start of loop
{
var invaderType: InvaderType // variable holding AType etc.
if row % 3 == 0
{
invaderType = .AType
} else if row % 3 == 1
The first line means:
var row = 1: given a new variable, row, with a value of 1
row <= kInvaderRowCount: as long as the variable row is less than or equal to kInvaderRowCount, keep running the for loop
row++: after each time the loop is run, increment (increase) the value of row by 1
As for the "%", that is the remainder (modulo) operator. It returns the remainder of an integer division. Since 7 divided by 3 is 2 with a remainder of 1:
7 / 3 = 2
7 % 3 = 1
The result of % on integers is an integer. Mathematically 1 / 3 = 0.33..., but 1 % 3 = 1, because 3 goes into 1 zero times with a remainder of 1.
1 % 3 = 1
2 % 3 = 2
3 % 3 = 0
4 % 3 = 1
5 % 3 = 2
6 % 3 = 0
see also: How Does Modulus Divison Work.
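The loop header and the % test work the same way in any C-style language. A hedged Java sketch that mirrors the tutorial's row loop, assuming (hypothetically) 6 rows, an invader height of 16 as in the question's own comment, and a base y of 180 from setupInvaders(); it prints each row's invader type and y position:

```java
// Java mirror of the Swift row loop; the constants are assumptions.
class InvaderRows {
    enum InvaderType { A_TYPE, B_TYPE, C_TYPE }

    static final int INVADER_ROW_COUNT = 6;  // hypothetical value
    static final double INVADER_HEIGHT = 16; // from the question's comment
    static final double BASE_Y = 180;        // baseOrigin.y in setupInvaders()

    // row % 3 is the remainder of row / 3, so rows 1, 2, 3, 4, ... give B, C, A, B, ...
    static InvaderType typeForRow(int row) {
        if (row % 3 == 0) return InvaderType.A_TYPE;
        else if (row % 3 == 1) return InvaderType.B_TYPE;
        else return InvaderType.C_TYPE;
    }

    // Same formula as invaderPositionY: row 1 -> 212, row 2 -> 244, ...
    static double yForRow(int row) {
        return row * (INVADER_HEIGHT * 2) + BASE_Y;
    }

    public static void main(String[] args) {
        // Same shape as Swift's `for var row = 1; row <= kInvaderRowCount; row++`
        for (int row = 1; row <= INVADER_ROW_COUNT; row++) {
            System.out.println(row + " " + typeForRow(row) + " " + yForRow(row));
        }
    }
}
```

C-style for loops were later removed from Swift (in Swift 3); the equivalent there is for row in 1...kInvaderRowCount.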

Most Frequent Digit In a Specific Range

First of all, before you downvote: THIS IS NOT MY HOMEWORK. This question comes from CodingBat, Project Euler, or a similar website. I am NOT asking you to give me a fully completed and coded answer; I am asking for some ideas to HELP me.
I am having trouble with the time limit on this problem. I actually solved it, but my solution is too slow: it must finish within 1 second, and in the worst case my code takes more than 8 seconds. If you could help me with some ideas, or show me a more efficient solution in pseudocode etc., I would really appreciate it.
The first input is the number of test cases. Then, for each case, the user enters two numbers X and Y (0 < X < Y < 100000), and we need to compute the most frequent digit in the range from X to Y (including X and Y). If multiple digits have the same maximum frequency, then we are supposed to print the smallest of them.
To illustrate:
The user first enters the number of test cases: 7
The user enters X and Y (first test case): 0 21
In my solution I split every number into its digits. You may have another idea and are free to use it, but as a hint, we treat the numbers like this: 0 1 2 3 ... (here 10 is split into 1 and 0, and likewise for all the others) 1 0 1 1 1 2 1 3 ... 1 9 2 0 2 1, then we print the most frequent digit between 0 and 21 (in this case: 1).
More examples: (Test cases if you want to check your solution)
X: 7 Y: 956 Result: 1
X: 967 Y: 8000 Result: 7
X: 420 Y: 1000 Result: 5 etc.
Here's my code so far:
package most_frequent_digit;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
public class Main
{
public static int secondP = 0;
public static void getPopularElement(int[] list)
{
Map<Integer, Integer> map = new HashMap<Integer, Integer>();
for (Integer nextInt : list)
{
Integer count = map.get(nextInt);
if (count == null)
{
count = 1;
} else
{
count = count + 1;
}
map.put(nextInt, count);
}
Integer mostRepeatedNumber = null;
Integer mostRepeatedCount = null;
Set<Integer> keys = map.keySet();
for (Integer key : keys)
{
Integer count = map.get(key);
if (mostRepeatedNumber == null)
{
mostRepeatedNumber = key;
mostRepeatedCount = count;
} else if (count > mostRepeatedCount)
{
mostRepeatedNumber = key;
mostRepeatedCount = count;
} else if (count == mostRepeatedCount && key < mostRepeatedNumber)
{
mostRepeatedNumber = key;
mostRepeatedCount = count;
}
}
System.out.println(mostRepeatedNumber);
}
public static void main(String[] args)
{
@SuppressWarnings("resource")
Scanner read = new Scanner(System.in);
int len = read.nextInt();
for (int w = 0; w < len; w++)
{
int x = read.nextInt();
int y = read.nextInt();
String list = "";
for (int i = x; i <= y; i++)
{
list += i;
}
String newList = "";
newList += list.replaceAll("", " ").trim();
int[] listArr = new int[list.length()];
for (int j = 0; j < newList.length(); j += 2)
{
listArr[secondP] = Character.getNumericValue(newList.charAt(j));
secondP++;
}
getPopularElement(listArr);
secondP = 0;
}
}
}
As you can see, it takes too long if the user enters X: 0 Y: 1000000 (around 8 to 9 seconds), but it is supposed to return the answer within 1 second. Thanks for checking...
Listing all digits and then counting them is a very slow way to do this.
There are some simple cases:
Case 1: X = 10ⁿ, Y = 10ⁿ⁺¹ − 1 (n > 0):
The digits 1 to 9 each appear 10ⁿ + n⋅(10ⁿ − 10ⁿ⁻¹) times; 0 appears n⋅(10ⁿ − 10ⁿ⁻¹) times.
E.g.
10, 99: the digits 1 to 9 each appear 19 times, 0 appears 9 times.
100, 999: the digits 1 to 9 each appear 280 times, 0 appears 180 times.
Case 2: X = a⋅10ⁿ, Y = (a+1)⋅10ⁿ − 1 (1 ≤ a ≤ 9):
All digits except a appear n⋅10ⁿ⁻¹ times each; the digit a appears 10ⁿ + n⋅10ⁿ⁻¹ times.
E.g.
10, 19: all digits except 1 appear once, 1 appears 11 times.
200, 299: all digits except 2 appear 20 times, 2 appears 120 times.
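The closed forms for the two cases can be sanity-checked against a brute-force count on small ranges. A hedged Java sketch (names hypothetical) that counts digits by repeated division, so no strings are built, next to the two formulas:

```java
// Brute-force digit counter used only to check the closed forms on small ranges.
class DigitCountBrute {
    // counts[d] = occurrences of digit d in x, x+1, ..., y (assumes 0 <= x <= y)
    static long[] countDigits(int x, int y) {
        long[] counts = new long[10];
        for (int v = x; v <= y; v++) {
            if (v == 0) { counts[0]++; continue; } // "0" is written with one digit
            for (int k = v; k > 0; k /= 10) counts[k % 10]++;
        }
        return counts;
    }

    static long pow10(int k) {
        long p = 1;
        for (int i = 0; i < k; i++) p *= 10;
        return p;
    }

    // Case 1: X = 10^n, Y = 10^(n+1) - 1, n > 0
    static long case1NonZero(int n) { return pow10(n) + n * (pow10(n) - pow10(n - 1)); }
    static long case1Zero(int n)    { return n * (pow10(n) - pow10(n - 1)); }

    // Case 2: X = a*10^n, Y = (a+1)*10^n - 1, 1 <= a <= 9
    static long case2Leading(int n) { return pow10(n) + n * pow10(n - 1); } // the digit a
    static long case2Other(int n)   { return n * pow10(n - 1); }            // every other digit

    public static void main(String[] args) {
        long[] c = countDigits(100, 999);
        System.out.println(c[1] + " " + case1NonZero(2)); // 280 280
        System.out.println(c[0] + " " + case1Zero(2));    // 180 180
    }
}
```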
With this cases you can split off the input into sub cases. E.g.
X = 0, Y = 21. Split it up into
X₁ = 0, Y₁ = 9 (special case, but very simple),
X₂ = 10, Y₂ = 19 (case 2),
X₃ = 20, Y₃ = 21 (case 3)
X = 0, Y = 3521. Split it up into
X₁ = 0, Y₁ = 9 (special case, but very simple),
X₂ = 10, Y₂ = 99 (case 1),
X₃ = 100, Y₃ = 999 (case 1),
X₄ = 1000, Y₄ = 1999 (case 2),
X₅ = 2000, Y₅ = 2999 (case 2),
X₆ = 3000, Y₆ = 3521 (case 3)
I left case 3 open. It looks like X = a⋅10ⁿ, Y = a⋅10ⁿ + b (1 ≤ a ≤ 9, 0 ≤ b < 10ⁿ).
Here you know the digit a appears (b + 1) times in the leading position, plus the number of appearances of each digit in 0 to b. Since X and Y are (n+1)-digit numbers, b is written as an n-digit number with leading zeros, and those zeros count.
The missing parts of case 3 have to be filled by the reader.
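A different technique from the case split above, named plainly: count how often digit d occurs in 0..N position by position (the digits left of, at, and right of each position decide the count), then take f(Y) − f(X − 1) for each of the ten digits. Each query then costs about 10 × (number of digits) steps. A hedged Java sketch; countUpTo and mostFrequentDigit are illustrative names, and it assumes 0 ≤ X ≤ Y:

```java
// Positional digit counting; an alternative to the case split, not the answer's method.
class MostFrequentDigit {
    // Occurrences of digit d in the decimal forms of 0, 1, ..., n (no leading zeros).
    static long countUpTo(int d, long n) {
        if (n < 0) return 0;
        long count = (d == 0) ? 1 : 0;   // the number 0 itself is written "0"
        for (long p = 1; p <= n; p *= 10) {
            long high = n / (p * 10);    // digits to the left of position p
            long cur = (n / p) % 10;     // digit at position p
            long low = n % p;            // digits to the right of position p
            if (d == 0) {
                if (high == 0) break;    // a 0 here would be a leading zero
                count += (high - 1) * p;
            } else {
                count += high * p;
            }
            if (cur > d) count += p;
            else if (cur == d) count += low + 1;
        }
        return count;
    }

    // Smallest digit with the highest count in x..y.
    static int mostFrequentDigit(int x, int y) {
        int best = 0;
        long bestCount = -1;
        for (int d = 0; d <= 9; d++) {
            long c = countUpTo(d, y) - countUpTo(d, x - 1L);
            if (c > bestCount) { best = d; bestCount = c; } // strict > keeps the smallest on ties
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostFrequentDigit(0, 21));     // 1
        System.out.println(mostFrequentDigit(420, 1000)); // 5
    }
}
```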
