How to generate 1 to 1 Source Map? - source-maps

How do you generate a Source Map when the source and target are exactly the same?
One possible way is to write a mapping for every single character in the file, like
x in source == x in target
It works, but the resulting Source Map is huge. Is there a better way to do it?
P.S.
Why do I need such a strange thing? Because I need to join multiple source maps into one for a batch:
a.js -- (possible) transformations --> |
...                                    | -- join --> batch.js.map
x.js -- (possible) transformations --> |
In development we don't use minification (so there is no initial source map for a JS file), but we still use batches, and I need a source map per file (1-to-1 would be fine) to produce the source map for the batch. (I post-process source maps with a special processor that sets the correct offsets for the batch source map, etc., but it needs a source map to start with.)

There are several options to create an identity source map:
You could use the generate-source-map package. It works only for JavaScript files; it parses them and generates one mapping for each JavaScript token (operators, identifiers, etc.). Best-quality maps, but at the cost of generation time and map size.
var generate = require('generate-source-map');
var fs = require('fs');
var file = 'test.js';
var map = generate({
  source: fs.readFileSync(file),
  sourceFile: file
});
fs.writeFileSync(file + '.map', map.toString());
Then there's the more generic source-list-map. It works for any text file and generates 1-to-1 line mappings. Less detailed, but probably faster, with smaller maps.
var SourceListMap = require('source-list-map').SourceListMap;
var fs = require('fs');
var file = 'test.js';
var sourceListMap = new SourceListMap();
var fileContents = fs.readFileSync(file).toString();
sourceListMap.add(fileContents, file, fileContents);
var map = sourceListMap.toStringWithSourceMap({ file: file });
fs.writeFileSync(file + '.map', JSON.stringify(map));
For ultimate flexibility you could use the source-map package and do the splitting yourself:
var sourceMap = require('source-map');
var sourceMapResolve = require('source-map-resolve');
var fs = require('fs');
var file = 'test.js';
var fileContents = fs.readFileSync(file).toString();
var chunks = [];
var line = 1;
fileContents.match(/^[^\r\n]*(?:\r\n?|\n?)/mg).forEach(function(token) {
  if (!/^\s*$/.test(token)) {
    chunks.push(new sourceMap.SourceNode(line, 0, file, token));
  }
  ++line;
});
var node = new sourceMap.SourceNode(null, null, null, chunks);
node.setSourceContent(file, fileContents);
var result = node.toStringWithSourceMap({file: file});
fs.writeFileSync(file + '.map', result.map.toString());
The last two also provide additional functionality, e.g. for merging multiple files together.
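As an illustration of that merging idea (just a sketch, not the batch processor mentioned in the question; the file names a.js, x.js and batch.js are placeholders), you could concatenate several files with 1-to-1 line mappings using source-map's SourceNode:
var sourceMap = require('source-map');
var fs = require('fs');
// placeholder input files for the batch
var files = ['a.js', 'x.js'];
var chunks = files.map(function(file) {
  var contents = fs.readFileSync(file).toString();
  // one 1-to-1 mapping per line of the original file
  var node = new sourceMap.SourceNode(null, null, null,
    contents.split('\n').map(function(line, i) {
      return new sourceMap.SourceNode(i + 1, 0, file, line + '\n');
    }));
  node.setSourceContent(file, contents);
  return node;
});
var result = new sourceMap.SourceNode(null, null, null, chunks)
  .toStringWithSourceMap({ file: 'batch.js' });
fs.writeFileSync('batch.js', result.code);
fs.writeFileSync('batch.js.map', result.map.toString());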

Related

Any AR.js multimarkers learning tutorial?

I have been searching for an AR.js multimarkers tutorial or anything that explains how it works, but all I can find is two examples with no tutorials or explanations.
So far, I understand that it needs to learn the pattern or order of the markers, and then it stores that in localStorage. This data is used later to display the image.
What I don't understand is how this "learner" is implemented. Also, the learning process is only used once by the "creator", right? The output file should be stored and then served later when needed, not created from scratch on each person's phone or computer.
Any help is appreciated.
Since the question is mostly about the learner page, I'll try to break it down as much as I can:
1) You need to have an array of {type, URL} objects.
A sample of creating the default array is shown below (source code):
var markersControlsParameters = [
  {
    type : 'pattern',
    patternUrl : 'examples/marker-training/examples/pattern-files/pattern-hiro.patt',
  },
  {
    type : 'pattern',
    patternUrl : 'examples/marker-training/examples/pattern-files/pattern-kanji.patt',
  }
]
2) You need to feed this to the 'learner' object.
By default the above object is encoded into the URL (source) and then decoded by the learner site. What is important happens on the site:
For each object in the array, an ArMarkerControls object is created and stored:
// array.forEach(function(markerParams){
var markerRoot = new THREE.Group()
scene.add(markerRoot)
// create markerControls for our markerRoot
var markerControls = new THREEx.ArMarkerControls(arToolkitContext, markerRoot, markerParams)
subMarkersControls.push(markerControls)
subMarkersControls is then used to create the object that does the learning. At long last:
var multiMarkerLearning = new THREEx.ArMultiMakersLearning(arToolkitContext, subMarkersControls)
The example learner site has multiple utility functions, but as far as I know the most important here are the ArMultiMakersLearning members, which can be used in the following order (or any other):
// this method resets previously collected statistics
multiMarkerLearning.resetStats()
// this member flag enables data collection
multiMarkerLearning.enabled = true
// this member flag stops data collection
multiMarkerLearning.enabled = false
// To obtain the 'learned' data, simply call .toJSON()
var jsonString = multiMarkerLearning.toJSON()
That's all. If you store the jsonString as
localStorage.setItem('ARjsMultiMarkerFile', jsonString);
then it will be used as the default multimarker file later on. If you want a custom name or more areas, then you'll have to modify the name in the source code.
3) debugUI
It seems that the debug UI is broken: the UI buttons do exist but are nowhere to be seen. A hot fix would be using the 'markersAreaEnabled' span style for the div containing the buttons (see this source bit).
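If you want to try that hot fix quickly from the browser console, the idea is roughly the sketch below; both selectors are hypothetical and have to be adapted to the learner page's actual markup:
// hedged sketch only: copy the inline style of the visible 'markersAreaEnabled'
// span onto the hidden div that holds the debug buttons; '#markersAreaEnabled'
// and '#debugButtons' are hypothetical selectors
var visibleSpan = document.querySelector('#markersAreaEnabled');
var buttonsDiv = document.querySelector('#debugButtons');
if (visibleSpan && buttonsDiv) {
  buttonsDiv.setAttribute('style', visibleSpan.getAttribute('style'));
}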
It's all in this glitch; you can find it under the phrase 'CHANGES HERE' in the arjs code.

Calling it statements inside loops

I am new to Mocha. I am calling the it statement in a loop. I have a working script, which I add here to ask if there is a better way to do this.
The following is the working script:
var xl = require('./excel');
describe("Register User", function(){
var csv = xl.readExcel(); //gets multiple rows as csv.
var arrRows = csv.split("\n");
var arrRow = []; //will store the current row under test
var iRow = 0;
before(function() {
//can variables csv and arrRows be initialized here?
});
beforeEach(function(){
arrRow = xl.splitCsvToArray(arrRows[iRow++]);
});
for(var i = 0; i < arrRows.length - 1; i++){
it('test case X', function(){
console.log("current row is: " + iRow);
console.log("1st column is: " + arrRow[0][1]);
console.log("2nd column is: " + arrRow[0][2]);
});
}
});
The result is:
1st column is: col2row3
2nd column is: col3row3
√ test case X
current row is: 5
1st column is: col2row4
2nd column is: col3row4
√ test case X
current row is: 6
1st column is: col2row5
2nd column is: col3row5
√ test case X
current row is: 7
1st column is: col2row6
2nd column is: col3row6
√ test case X
7 passing (27ms)
Thanks in advance.
There's absolutely no problem calling it inside a synchronous loop like you are showing in your code. I do it whenever I have a finite set of conditions I need to test and the tests can be generated by looping.
If you have a loop that generates tests asynchronously, then you have to run Mocha with --delay and call the global run() callback it provides to indicate when the test generation is done and Mocha can start running the tests.
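For instance, if the rows came from an asynchronous source, a minimal sketch could look like this (run with mocha --delay; readRowsAsync is a hypothetical async loader standing in for xl.readExcel()):
// run with: mocha --delay test.js
// readRowsAsync is a hypothetical async loader
readRowsAsync(function(arrRows) {
  describe("Register User", function() {
    arrRows.forEach(function(row, i) {
      it('test case ' + i, function() {
        // assertions on row go here
      });
    });
  });
  // with --delay, Mocha exposes a global run() callback to signal
  // that test generation is done
  run();
});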
Ideally, you should move your initialization of csv and arrRows into your before hook:
describe("Register User", function(){
var csv;
var arrRows;
var arrRow = []; //will store the current row under test
var iRow = 0;
before(function() {
csv = xl.readExcel(); //gets multiple rows as csv.
arrRows = csv.split("\n");
});
[...]
The only initializations you should feel free to do outside of the before and beforeEach hooks are those that are extremely cheap to do. The problem is that initializations performed outside the hooks are always done even if the suite does not need them. For instance, if you use --grep to select some tests that are outside the describe you show in your question, and your initializations are as you show in your question, then Mocha will load your Excel file and break it into rows even though it is not needed. By putting such initializations in before/beforeEach in the describe block that wraps your tests, you ensure that Mocha will run the initializations only when it needs to run a test that depends on them.
The problem, though, is that arrRows needs to be defined to run the loop. You can:
1) Abandon the ideal of not having initialization code outside of the hooks. This means keeping your initialization code as it is.
2) Move the loop inside it and have one test that checks the entire array. The granularity of your tests is up to you. It is a matter of preference and of how the code you test is structured. There's no hard and fast rule here.
3) If the structure you expect is meant to be regular, with the same set number of rows each time, define a variable, e.g. TABLE_LENGTH = 10, and a) use it as the limit in your loop (for(var i = 0; i < TABLE_LENGTH; i++)), b) include in your before hook an assertion that verifies that the table you get has the length you expect (assert.equal(arrRows.length, TABLE_LENGTH)). This would allow you to perform the initialization as recommended, inside before/beforeEach, and still have a loop that creates multiple it calls (see the sketch below).
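A rough sketch of option 3, assuming xl.readExcel() and xl.splitCsvToArray() behave as in the question and picking TABLE_LENGTH = 10 arbitrarily:
var assert = require('assert');
var xl = require('./excel');
var TABLE_LENGTH = 10; // the number of data rows we expect
describe("Register User", function(){
  var arrRows;
  var arrRow = []; //will store the current row under test
  var iRow = 0;
  before(function() {
    var csv = xl.readExcel(); //gets multiple rows as csv.
    arrRows = csv.split("\n");
    // fail fast if the sheet does not have the shape we assumed
    assert.equal(arrRows.length - 1, TABLE_LENGTH);
  });
  beforeEach(function(){
    arrRow = xl.splitCsvToArray(arrRows[iRow++]);
  });
  for(var i = 0; i < TABLE_LENGTH; i++){
    it('test case ' + i, function(){
      console.log("current row is: " + iRow);
    });
  }
});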

How can I make a dataset/dataframe from a compressed (.zip) local file in Apache Spark

I have large compressed (.zip) files, around 10 GB each. I need to read the content of the files inside the zip without unzipping them, and I want to apply transformations.
System.setProperty("HADOOP_USER_NAME", user)
println("Creating SparkConf")
val conf = new SparkConf().setAppName("DFS Read Write Test")
println("Creating SparkContext")
val sc = new SparkContext(conf)
var textFile = sc.textFile(filePath)
println("Count...."+textFile.count())
var df = textFile.map(some code)
When I pass any .txt, .log, .md etc. file, the above works fine. But when I pass .zip files, it gives a count of zero.
Why is it giving a count of zero?
Please suggest the correct way of doing this if I am totally wrong.
You have to perform this task differently; it's not the same operation as simply loading the other kinds of files that Spark supports out of the box. You need an input format that understands zip entries, for example:
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
// ZipFileInputFormat is not part of Spark or Hadoop itself; it comes from a
// separate library that reads zip entries as records
val rdd = sc.newAPIHadoopFile(
  "file.zip",
  classOf[ZipFileInputFormat],
  classOf[Text],
  classOf[Text],
  new Job().getConfiguration())

Uploaded file becomes zero bytes if I use XmlDocument.Load to check some validation and then upload it to an Azure blob

var xmlDoc = new XmlDocument();
xmlDoc.Load(file.InputStream);
var x = xmlDoc.GetElementsByTagName("typeId");
if (x.Count == 0)
{
    getccdofuseranddoc(Convert.ToInt64(Session["UserID"]));
    ViewBag.FileSizes = true;
    return View();
}
If I use this block of code, the file becomes zero bytes when it is uploaded to the Azure blob; if this block of code is not used, it works fine.
P.S.: this block is used only to check the file; the file's content length is not zero at the time Upload.Tostream is used.
You may be overwriting the file somewhere else. Can you please provide the code showing how you upload to Blob storage?

Apache Spark on YARN: Large number of input data files (combine multiple input files in spark)

Help with implementation best practices is needed.
The operating environment is as follows:
Log data files arrive irregularly.
The size of a log data file is from 3.9 KB to 8.5 MB. The average is about 1 MB.
The number of records in a data file is from 13 lines to 22000 lines. The average is about 2700 lines.
Data files must be post-processed before aggregation.
The post-processing algorithm can be changed.
The post-processed file is managed separately from the original data file, since the post-processing algorithm might change.
Daily aggregation is performed. All post-processed data files must be filtered record-by-record, and aggregation (average, max, min…) is calculated.
Since aggregation is fine-grained, the number of records after the aggregation is not so small. It can be about half of the number of the original records.
At some point, the number of post-processed files can be about 200,000.
A data file should be able to be deleted individually.
In a test, I tried to process 160,000 post-processed files with Spark, starting with sc.textFile() and a glob path; it failed with an OutOfMemory exception on the driver process.
What is the best practice to handle this kind of data?
Should I use HBase instead of plain files to save post-processed data?
I wrote my own loader. It solved our problem with small files in HDFS. It uses Hadoop's CombineFileInputFormat.
In our case it reduced the number of mappers from 100000 to approximately 3000 and made the job significantly faster.
https://github.com/RetailRocket/SparkMultiTool
Example:
import ru.retailrocket.spark.multitool.Loaders
val sessions = Loaders.combineTextFile(sc, "file:///test/*")
// or val sessions = Loaders.combineTextFile(sc, conf.weblogs(), size = 256, delim = "\n")
// where size is split size in Megabytes, delim - line break character
println(sessions.count())
I'm pretty sure the reason you're getting OOM is that you're handling so many small files. What you want is to combine the input files so you don't get so many partitions. I try to limit my jobs to about 10k partitions.
After textFile, you can use .coalesce(10000, false)... I'm not 100% sure that will work, though, because it's been a while since I've done it; please let me know. So try:
sc.textFile(path).coalesce(10000, false)
You can use this approach.
First, get a Buffer/List of S3 paths (the same applies to HDFS or local paths).
If you're trying it with Amazon S3, then:
import scala.collection.JavaConverters._
import java.util.ArrayList
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ObjectListing
import com.amazonaws.services.s3.model.S3ObjectSummary
import com.amazonaws.services.s3.model.ListObjectsRequest
def listFiles(s3_bucket: String, base_prefix: String) = {
  var files = new ArrayList[String]
  //S3 Client and List Object Request
  var s3Client = new AmazonS3Client();
  var objectListing: ObjectListing = null;
  var listObjectsRequest = new ListObjectsRequest();
  //Your S3 Bucket
  listObjectsRequest.setBucketName(s3_bucket)
  //Your Folder path or Prefix
  listObjectsRequest.setPrefix(base_prefix)
  //Adding s3:// to the paths and adding to a list
  do {
    objectListing = s3Client.listObjects(listObjectsRequest);
    for (objectSummary <- objectListing.getObjectSummaries().asScala) {
      files.add("s3://" + s3_bucket + "/" + objectSummary.getKey());
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
  } while (objectListing.isTruncated());
  //Removing Base Directory Name
  files.remove(0)
  //Creating a Scala List for same
  files.asScala
}
Now pass this list to the following piece of code. Note: sc is the SparkContext here; since sc.textFile returns an RDD, the pieces are combined with union:
import org.apache.spark.rdd.RDD

var df: RDD[String] = null;
for (file <- files) {
  val fileDf = sc.textFile(file)
  if (df != null) {
    df = df.union(fileDf)
  } else {
    df = fileDf
  }
}
Now you have a final unified RDD, i.e. df.
Optionally, you can also repartition it into a single big RDD:
val bigRdd = df.repartition(1)
Repartitioning always works :D
