Hadoop - textoutputformat.separator: use Ctrl-A (^A)

I'm trying to use ^A as the separator between Key and Value in my reduce output files.
I found that the config setting "mapred.textoutputformat.separator" is what I want and this correctly switches the separator to ",":
conf.set("mapred.textoutputformat.separator", ",");
But it can't handle the ^A character:
conf.set("mapred.textoutputformat.separator", "\u0001");
throws this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 94; Character reference "&#
I found this ticket, https://issues.apache.org/jira/browse/HADOOP-7542, and see that they tried to fix this but reverted the patch due to XML 1.1 concerns.
So I'm wondering whether anyone has succeeded in setting the separator to ^A (which seems pretty common) with an easy workaround, or whether I should just settle for the tab separator.
Thanks!
I'm running Hadoop 0.20.2-cdh3u5 on CentOS 6.2

Looking around, I've found what look like three options for solving this problem:
Character reference “&#1” is an invalid XML character - similar SO question
Unicode characters/Ctrl G or Ctrl A as TextOutputFormat (Hadoop) delimiter
The possible solutions as detailed in the link above are:
Base64-encode the separator character. You then need a custom TextOutputFormat that overrides the getRecordWriter method and decodes the Base64-encoded separator.
Create a custom TextOutputFormat again, but change its default separator character from tab to the one you want.
Provide the delimiter through an XML resource file. You can specify a custom resource file using the addResource() method of the job's Configuration.
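The first option can be modelled in a few lines. The sketch below is Python rather than Hadoop's Java (a real solution would put the decoding inside your TextOutputFormat subclass's getRecordWriter), and it assumes a hypothetical config key, mapred.textoutputformat.separator.b64, that only your custom class would read. Python's expat-based XML parser stands in for the parser that rejected job.xml, to show that the raw ^A character reference is refused while its Base64 form is plain ASCII.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical key: Hadoop itself does not read it; a custom TextOutputFormat would.
B64_SEPARATOR_KEY = "mapred.textoutputformat.separator.b64"


def xml_accepts(value: str) -> bool:
    """Return True if `value` survives as the text of an XML 1.0 element."""
    try:
        ET.fromstring(f"<value>{value}</value>")
        return True
    except ET.ParseError:
        return False


def encode_separator(sep: str) -> str:
    """Base64-encode the separator so only XML-safe ASCII reaches the config."""
    return base64.b64encode(sep.encode("utf-8")).decode("ascii")


def decode_separator(encoded: str) -> str:
    """The decoding step the custom record writer would perform."""
    return base64.b64decode(encoded).decode("utf-8")


def format_record(key: str, value: str, conf: dict) -> str:
    # Fall back to tab, TextOutputFormat's default, when the key is absent.
    encoded = conf.get(B64_SEPARATOR_KEY)
    sep = decode_separator(encoded) if encoded is not None else "\t"
    return f"{key}{sep}{value}\n"


print(xml_accepts("&#1;"))                   # False: control chars are illegal in XML 1.0
conf = {B64_SEPARATOR_KEY: encode_separator("\u0001")}
print(conf[B64_SEPARATOR_KEY])               # AQ==
print(xml_accepts(conf[B64_SEPARATOR_KEY]))  # True
print(repr(format_record("k", "v", conf)))   # 'k\x01v\n'
```

The Base64 string only ever travels through the XML configuration; the control character reappears after decoding, at the point where records are written.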

Related

MongoDB Error parsing YAML Config illegal map value for replica set

Here is my /etc/mongodb.conf - using MongoDB 3.6.
I'm having trouble getting the config file parsed when starting mongod.
I have a single space after each colon, and indented lines use two spaces.
I took the replica set example from mongoDB docs here: https://docs.mongodb.com/manual/reference/configuration-options/#replication-options
dbpath=/home/ubuntu/data/db
logpath=/home/ubuntu/data/db/log/mongo.log
logappend=true
journal=true

replication:
  replSetName: rep
net:
  bindIp: 127.0.0.1
  port: 27017
The error is:
Error parsing YAML config file: yaml-cpp: error at line 6, column 12: illegal map value
try 'mongod --help' for more information
I don't know what you expect the first four lines to do, but they are certainly not YAML; instead they use a format resembling .properties files, with a = separating the property name from the value.
Since a = in YAML is simply content, the parser reads the first six lines as a multiline scalar, meaning the value of those lines in YAML is the scalar
dbpath=/home/ubuntu/data/db logpath=/home/ubuntu/data/db/log/mongo.log logappend=true journal=true
replication
(Single line breaks are folded into a space, an empty line generates a line break.)
Now the error happens because YAML disallows multiline scalars to be implicit keys of a mapping. Implicit keys are scalars preceding a : on the same line which form a mapping key.
You fix the error by removing the first four lines, or transforming them into proper YAML. It is unclear what your intention with those lines is since not every name has a corresponding setting in the documentation you linked.
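If the first four lines are meant as the older key=value option names, one way to transform them is to map each name onto a YAML option path. The Python sketch below does that; the mapping table (dbpath to storage.dbPath, logpath to systemLog.path, logappend to systemLog.logAppend, journal to storage.journal.enabled) is my assumption and should be checked against the configuration-options page linked in the question, and the emitter only handles simple scalar values.

```python
# Assumed mapping from legacy option names to YAML option paths; verify each
# entry against the MongoDB configuration-options reference before relying on it.
LEGACY_TO_YAML = {
    "dbpath": ("storage", "dbPath"),
    "logpath": ("systemLog", "path"),
    "logappend": ("systemLog", "logAppend"),
    "journal": ("storage", "journal", "enabled"),
}


def convert_legacy(lines):
    """Turn key=value lines into a nested dict keyed by YAML option paths."""
    tree = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        path = LEGACY_TO_YAML.get(name.strip())
        if path is None:
            raise KeyError(f"no known YAML equivalent for {name.strip()!r}")
        node = tree
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value.strip()
    # systemLog.path is only used when the log destination is a file.
    if "path" in tree.get("systemLog", {}):
        tree["systemLog"].setdefault("destination", "file")
    return tree


def to_yaml(tree, indent=0):
    """Emit the nested dict as block-style YAML lines with two-space indentation.

    Values are written as plain scalars, so this is only safe for simple values.
    """
    out = []
    for key, value in tree.items():
        if isinstance(value, dict):
            out.append(" " * indent + f"{key}:")
            out.extend(to_yaml(value, indent + 2))
        else:
            out.append(" " * indent + f"{key}: {value}")
    return out


legacy = """\
dbpath=/home/ubuntu/data/db
logpath=/home/ubuntu/data/db/log/mongo.log
logappend=true
journal=true
"""
print("\n".join(to_yaml(convert_legacy(legacy.splitlines()))))
```

The replication: and net: sections from the question are already YAML and can be appended after the generated lines unchanged.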

Phoenix -> csv -> invalid char between encapsulated token and delimiter

I need to upload a CSV dump file to the Phoenix database.
Files that did not contain any special characters were loaded without problems:
./psql.py -t TTT localhost /home/isaev/output.csv -d';'
But as soon as I tried to load the same file with quotes inside the data fields, I got an error:
java.lang.RuntimeException: java.io.IOException: (line 1) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:132)
at org.apache.phoenix.util.CSVCommonsLoader.upsert(CSVCommonsLoader.java:217)
at org.apache.phoenix.util.CSVCommonsLoader.upsert(CSVCommonsLoader.java:182)
at org.apache.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:308)
Caused by: java.io.IOException: (line 1) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
... 5 more
For example, line 1 contains this entry:
5863355029;007320071; ZAO "With a smile for life";True;
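I found the solution myself: pass -q'\' so that the quote character is a backslash instead of a double quote:
./psql.py -t TTT localhost /home/isaev/output.csv -d';' -q'\'
Hope this comes in handy for someone.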
You can resolve the problem by enclosing the field in double quotes and doubling the quotes inside it:
5863355029;007320071;" ZAO ""With a smile for life""";True;
Each field may or may not be enclosed in double quotes. If a field is not enclosed in double quotes, double quotes may not appear inside it; inside an enclosed field, a double quote is escaped by writing it twice.
Check this link if you are interested in why: https://www.marklogic.com/blog/delimited_text_mlcp/
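The quoting rule above can be checked with Python's csv module. Note this is a different parser from the Apache Commons CSV library that Phoenix uses, so the sketch only illustrates the convention (enclose the field, double each embedded quote), not psql.py's exact behaviour. parse_line and format_line are hypothetical helper names.

```python
import csv
import io


def parse_line(line: str) -> list:
    """Parse one ';'-delimited record with Python's csv module."""
    return next(csv.reader(io.StringIO(line), delimiter=";"))


def format_line(fields: list) -> str:
    """Write one record; QUOTE_MINIMAL encloses fields that contain quotes."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";", lineterminator="\n").writerow(fields)
    return buf.getvalue()


# The third field is enclosed in quotes and each embedded quote is doubled.
escaped = '5863355029;007320071;" ZAO ""With a smile for life""";True;'
fields = parse_line(escaped)
print(fields)
# ['5863355029', '007320071', ' ZAO "With a smile for life"', 'True', '']
print(format_line(fields) == escaped + "\n")  # True
```

Generating the dump with a CSV writer, rather than by string concatenation, applies this escaping automatically.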

How to fix "mapping values are not allowed in this context " error in yaml file?

I've browsed similar questions and believe I've applied everything I've been able to glean from the answers.
I have a .yml file where, as far as I can tell, each element is formatted identically. And yet, according to YamlLint.com:
(<unknown>): mapping values are not allowed in this context at line 119 column 16
In this case, line 119 is the line containing the second instance of the word "transitions" below. Am I missing something here?
landingPage:
include: false
transitions:
-
condition:location
nextState:location
location:
include:false
transitions:
-
condition:excluded
nextState:excluded
excluded:
include:false
transitions:
-
condition:excluded
nextState: excluded
-
condition:age
nextState:age
You cannot have a multiline plain scalar, such as your include:false transitions, be the key of a mapping; that is why you get the "mapping values are not allowed in this context" error.
Either you forgot that you need a space after the value indicator (:) and you meant:
include: false
transitions:
or you need to quote your multi-line scalar:
'include:false
transitions':
or you need to put that plain scalar on one line:
include:false transitions:
Please note that some libraries do not allow value indicators in a plain scalar at all, even if they are not followed by a space.
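When a file is long, a quick scan for the missing-space case described above can save some hunting. This is a heuristic Python sketch, not a YAML parser: it flags lines that start with an identifier-like key followed by : and a non-space character, so it can miss cases and will flag legitimate plain scalars such as a bare http://... line. The function name and sample are hypothetical.

```python
import re

# Heuristic: optional "- ", an identifier-like key, a colon, then a non-space
# character. YAML reads such text as a plain scalar, not as a key/value pair.
MISSING_SPACE = re.compile(r"^\s*(?:-\s+)?([A-Za-z_][\w.-]*):(?=\S)")


def find_missing_space(text: str) -> list:
    """Return (line number, key) for each 'key:value' written without a space."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = MISSING_SPACE.match(line)
        if m:
            hits.append((lineno, m.group(1)))
    return hits


sample = """\
location:
  include:false
  transitions:
  - condition:excluded
    nextState: excluded
"""
print(find_missing_space(sample))  # [(2, 'include'), (4, 'condition')]
```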
I fixed this for myself by simply realizing I had indented a line too far, and un-indenting it.
We need to use a space after the ":"; then it will execute.
Check the YAML with the linter below:
http://www.yamllint.com/
There are a couple of issues in the YAML file. YAML files get messy, but fortunately such issues can be identified easily with tools like yaml-lint.
Install it:
npm install -g yaml-lint
Here is how you can validate a file:
E:\githubRepos\prometheus-sql-exporter-usage\etc>yamllint prometheus.yaml
√ YAML Lint successful.
For me the problem was a Unicode '-' from a copy and paste. Visually it looked OK, but the character was 'EN DASH' (U+2013) instead of 'HYPHEN-MINUS' (U+002D).
In my case it was the space after the : inside a value:
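Lookalike characters are hard to spot by eye, so it can help to list every non-ASCII character with its Unicode name. A minimal Python sketch (hypothetical helper name) follows; it will also list legitimate characters such as accented letters, so the output still needs a human look.

```python
import unicodedata


def find_non_ascii(text: str) -> list:
    """List (line, column, code point, Unicode name) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((lineno, col, f"U+{ord(ch):04X}", name))
    return hits


sample = "items:\n\u2013 first\n- second\n"
for hit in find_non_ascii(sample):
    print(hit)  # (2, 1, 'U+2013', 'EN DASH')
```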
query-url: https://blabla.com/blabla?label=blabla: blabla
To fix:
query-url: https://blabla.com/blabla?label=blabla:%20blabla
Or:
query-url: "https://blabla.com/blabla?label=blabla: blabla"
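Both fixes above come down to the same rule: a plain scalar must not contain ": " (or " #", which starts a comment). The Python sketch below decides whether a value needs quoting and produces a double-quoted YAML scalar; the rule set is deliberately partial and errs toward quoting, which is always safe, and the function names are hypothetical.

```python
# Not an exhaustive list of YAML's rules: only the cases from this thread plus
# common leading indicator characters. Over-quoting is harmless.
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quotes(value: str) -> bool:
    return (
        value == ""
        or ": " in value
        or " #" in value
        or value.endswith(":")
        or value[0] in INDICATORS
    )


def yaml_value(value: str) -> str:
    """Return the value as-is, or as a double-quoted YAML scalar."""
    if not needs_quotes(value):
        return value
    # Inside double quotes, backslash and double quote must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


url = "https://blabla.com/blabla?label=blabla: blabla"
print("query-url: " + yaml_value(url))
# query-url: "https://blabla.com/blabla?label=blabla: blabla"
```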
If you are using PowerShell and have copied the cat command, it won't work properly: in PowerShell, cat is an alias for Get-Content, which returns an array of lines, and inside "$(...)" those lines are joined with spaces, so the line breaks are lost. Instead of "$(cat file.yaml)", use $(Get-Content file.yaml -Raw) without the quotes.
Really annoying!
In my case, some of the original formatting of the chart disappeared when I copied it in IntelliJ IDEA. I could only figure this out with a text-compare tool.
So when you copy and paste in your IDE, please double-check that what you paste is exactly what you copied and that no extra spaces were added.

Expected block end YAML error

When pasting this YAML file into an online yaml parser, I got an expected block end error:
ADDATTEMPTING: 'Tentative d ajout '
ATTEMPTINGTOGIVE: 'Tenter de donner '
ATTEMPTINGTOSET1: 'Tentative de définition '
ATTEMPTINGTOSET2: ' avec '
ALREADYEXISTS: 'Erreur. Package existe déjà’
CANCEL1: 'Annulation...'
(...)
Error
ERROR:
while parsing a block mapping
in "<unicode string>", line 1, column 1:
ADDATTEMPTING: 'Tentative d ajout '
^
expected <block end>, but found '<scalar>'
in "<unicode string>", line 6, column 11:
CANCEL1: 'Annulation...'
^
The line starting ALREADYEXISTS uses ’ as the closing quote; it should use '. The opening quote on the next line (where the error is reported) is then read as the closing quote, and this mix-up causes the error.
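A mismatched quote like this can be caught before parsing by counting ASCII single quotes. Inside a single-quoted YAML scalar a literal quote is written as '', so a correctly closed one-line value always contains an even number of them. The Python sketch below (hypothetical helper name) relies on that; it does not handle multi-line scalars or comments containing quotes.

```python
def unbalanced_single_quotes(text: str) -> list:
    """Line numbers whose single-quoted value has an odd number of ' characters."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        _key, sep, value = line.partition(": ")
        # Only look at values that open with an ASCII single quote.
        if sep and value.lstrip().startswith("'") and value.count("'") % 2:
            bad.append(lineno)
    return bad


sample = """\
ADDATTEMPTING: 'Tentative d ajout '
ATTEMPTINGTOGIVE: 'Tenter de donner '
ATTEMPTINGTOSET1: 'Tentative de définition '
ATTEMPTINGTOSET2: ' avec '
ALREADYEXISTS: 'Erreur. Package existe déjà’
CANCEL1: 'Annulation...'
"""
print(unbalanced_single_quotes(sample))  # [5]
```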
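This error also occurs if the indentation is inconsistent, for example with four-space indentation where the remaining keys of a list item are indented further than the first key after the dash.
e.g., the following would throw the error:
fields:
    - metadata: {}
        name: colName
        nullable: true
whereas switching to consistent two-space indentation, with the keys aligned, fixes it:
fields:
  - metadata: {}
    name: colName
    nullable: true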
I would like to make this answer more meaningful, so that others making the same mistake can fix it without any hassle.
I was getting the same error but for a different reason: I didn't use any quotes, and I still got an error like expected <block end>, but found BlockMappingStart.
I solved it by fixing the alignment (indentation) inside the same .yml file.
If we don't keep consistent indentation for child and parent entries (with spaces, since YAML does not allow tabs for indentation), we run into this kind of problem.
Now it works.
With YAML, remember that it is all about the spaces used to define configuration through the hierarchical structures (indents). Many problems encountered whilst parsing YAML documents simply stem from extra spaces (or not enough spaces) before a key somewhere in the given YAML file.
YAML follows its indentation structure very strictly. Even a single extra space or tab can cause the above issue; in my case it was just one space at the start.
So make sure no extra spaces or tabs are introduced while updating the YAML file.
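Two of the whitespace problems mentioned in these answers can be checked mechanically: tab characters in indentation (which YAML does not allow) and a stray leading space on the first line. The Python sketch below is a heuristic with a hypothetical name; it does not validate nesting in general, for which a real linter or parser is still needed.

```python
def indentation_problems(text: str) -> list:
    """Flag tabs in indentation and lines indented less than the first content line."""
    problems = []
    first = None  # (line number, indent width) of the first content line
    for lineno, line in enumerate(text.splitlines(), start=1):
        content = line.lstrip(" \t")
        if not content or content.startswith("#"):
            continue  # blank lines and comments do not affect the structure
        indent = line[: len(line) - len(content)]
        if "\t" in indent:
            problems.append((lineno, "tab in indentation"))
        if first is None:
            first = (lineno, len(indent))
        elif len(indent) < first[1]:
            problems.append((lineno, f"less indented than line {first[0]}"))
    return problems


sample = " key1: a\nkey2: b\nparent:\n\tchild: c\n"
for problem in indentation_problems(sample):
    print(problem)
```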
I got the same issue and found that there was a space on the next line, which combined that line with the content above it. To fix it I just removed that space. Thanks
In my case, the error occurred when I tried to pass a variable that looked like a bytes object (b"xxxx") but was actually a string.
You can convert the string to a real bytes object like this:
foo.strip('b"').replace("\\n", "\n").encode()
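One caveat about the snippet above: str.strip('b"') removes any run of b and " characters from both ends, not the literal b" prefix, so content that itself starts or ends with b is damaged. A swapped-in alternative from the standard library is ast.literal_eval, which evaluates the string as a Python bytes literal and handles every escape sequence. A minimal comparison (helper names are hypothetical):

```python
import ast


def naive(foo: str) -> bytes:
    # The snippet from the answer above.
    return foo.strip('b"').replace("\\n", "\n").encode()


def literal(foo: str) -> bytes:
    # Evaluate the text as a Python literal and insist that it is bytes.
    value = ast.literal_eval(foo)
    if not isinstance(value, bytes):
        raise ValueError("not a bytes literal")
    return value


foo = 'b"blob\\n"'    # a str that merely looks like a bytes object
print(naive(foo))     # b'lob\n'  (the leading b of "blob" is stripped too)
print(literal(foo))   # b'blob\n'
```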

Unknown Character

I'm facing an issue with an unknown character.
I'm trying to compile some packages in the database through a script and got this error:
SP2-0734: unknown command beginning "?SET DEF..." - rest of line ignored.
When I open the log file in Notepad++, it shows the line as above.
But if I open the same log file in the SciTE editor, it shows:
SP2-0734: unknown command beginning "SET DEF..." - rest of line ignored.
I don't understand what the issue could be.
Any help would be welcome.
Your script has an unprintable character at the start (as you discovered from the comments), which some editors don't display at all and others display as an unknown character. It is the byte order mark (BOM):
The UTF-8 representation of the BOM is the byte sequence
0xEF,0xBB,0xBF. A text editor or web browser interpreting the text as
ISO-8859-1 or CP1252 will display the characters ï»¿ for this.
According to that article, some editors (notably Notepad) add it automatically. It should be safe to open the file in a hex editor and remove those bytes; you'll then be able to run the script normally.
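Instead of a hex editor, the three BOM bytes can also be removed with a short script. A minimal Python sketch follows (helper names and the sample line are hypothetical); it checks for the exact UTF-8 BOM sequence, so files without a BOM are left untouched, and it also shows where the ï»¿ rendering comes from.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop the three-byte UTF-8 BOM (EF BB BF) if the data starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without a leading BOM; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        p.write_bytes(cleaned)
    return cleaned != data


# How a CP1252 reader renders the BOM bytes:
print(codecs.BOM_UTF8.decode("cp1252"))  # ï»¿

sample = codecs.BOM_UTF8 + b"SET ECHO ON\n"  # hypothetical first script line
print(strip_utf8_bom(sample))             # b'SET ECHO ON\n'
```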
