I want to sort the mapper output records by the first 2 fields before feeding them to reducer, and here is how I did it:
hadoop streaming \
-D mapred.job.name="multi_field_key_sort" \
-D mapred.job.map.capacity=100 \
-D mapred.reduce.tasks=1 \
-D stream.num.map.output.key.fields=2 \
-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
-D mapred.text.key.comparator.options="-k1,2n" \
-input "..." \
-output "..." \
-mapper "..." \
-reducer "cat"
But the final results are not sorted by the first 2 fields; they are only sorted by the 1st field. Why?
Is there anything wrong with my Hadoop job conf?
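For what it's worth, KeyFieldBasedComparator options follow Unix sort semantics, so "-k1,2n" defines a single key spanning fields 1 through 2 and compares it numerically, which effectively stops at the end of the leading number. A sketch of the commonly suggested fix, assuming both fields are numeric, is to give each field its own key spec and leave the rest of the command unchanged:
-D mapred.text.key.comparator.options="-k1,1n -k2,2n" \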
Suppose I have the below curl, where I will be reading two of the variables from a file. How can we accommodate both the variables in a single while loop?
while read p; do
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'HTTP_X_MERCHANT_CODE: STA' --header 'AK-Client-IP: 135.71.173.56' --header 'Authorization: Basic qwrewrereererweer' -d '{
"request_details": [
{
"id": "$p", #first dynamic varaible which will be fetched from the file file.txt
"id_id": "$q", #second dynamic varaible to be fetched from the file file.txt
"reason": "Pickup reattempts exhausted"
}
]
}' api.stack.com/ask
done<file.txt
file.txt will have two columns, from which the dynamic variables will be fetched for the above curl. Please let me know how we can accommodate both the variables in the above while loop.
I will need a bit of help regarding the same.
Since you'll want to use a tool like jq to construct the payload anyway, you should let jq parse the file instead of using the shell.
filter='split(" ") |
{ request_details: [
{
id: .[0],
id_id: .[1],
reason: "Pickup reattempts exhausted"
}
]
}'
jq -cR "$filter" file.txt |
while IFS= read -r payload; do
curl -X POST --header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'HTTP_X_MERCHANT_CODE: STA' \
--header 'AK-Client-IP: 135.71.173.56' \
--header 'Authorization: Basic qwrewrereererweer' \
-d "$payload"
done
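For a sample line such as 123 456 in file.txt (hypothetical values), that filter emits a single-line payload like:
{"request_details":[{"id":"123","id_id":"456","reason":"Pickup reattempts exhausted"}]}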
The -c option to jq ensures the entire output appears on one line: curl doesn't need a pretty-printed JSON value.
read accepts multiple target variable names. The last one receives all the content not yet read from the line. So read p reads the whole line, read p q would read the first token (separated by whitespace) into p and the rest into q, and read p q r would read the first two tokens into p and q and any remaining junk into r (for example if you want to support comments or extra tokens in file.txt).
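If you would rather keep the parsing in the shell, the same idea looks roughly like this (a sketch that assumes file.txt is whitespace-separated and still uses jq to escape the values; headers abbreviated, endpoint taken from the question):
while read -r p q; do
# build the JSON safely; $p and $q are the two whitespace-separated columns of the current line
payload=$(jq -cn --arg id "$p" --arg id_id "$q" \
'{request_details: [{id: $id, id_id: $id_id, reason: "Pickup reattempts exhausted"}]}')
curl -X POST --header 'Content-Type: application/json' \
--header 'Accept: application/json' \
-d "$payload" api.stack.com/ask
done < file.txt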
I want to send a big JSON with a long string field via curl; how should I split it across multiple lines? For example:
curl -X POST 'localhost:3000/upload' \
-H 'Content-Type: application/json' \
-d "{
\"markdown\": \"# $TITLE\\n\\nsome content with multiple lines....\\n\\nsome content with multiple lines....\\n\\nsome content with multiple lines....\\n\\nsome content with multiple lines....\\n\\n\"
}"
Use a tool like jq to generate your JSON, rather than trying to manually construct it. Build the multiline string in the shell, and let jq encode it. Most importantly, this avoids any potential errors that could arise from TITLE containing characters that would need to be correctly escaped when forming your JSON value.
my_str="# $TITLE
some content with multiple lines...
some content with multiple lines...
some content with multiple lines..."
my_json=$(jq -n --arg v "$my_str" '{markdown: $v}')
curl -X POST 'localhost:3000/upload' \
-H 'Content-Type: application/json' \
-d "$my_json"
curl has the ability to read the data for -d from standard input, which means you can pipe the output of jq directly to curl:
jq -n --arg v "$my_str" '{markdown: $v}' | curl ... -d @-
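Spelled out against the example above, that pipeline would look like this:
jq -n --arg v "$my_str" '{markdown: $v}' |
curl -X POST 'localhost:3000/upload' \
-H 'Content-Type: application/json' \
-d @-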
You can split anything across multiple lines using the technique already in your post, by terminating lines with \. If you need to split in the middle of a quoted string, terminate the quote and start a new one. For example, these are equivalent:
echo "foobar"
echo "foo""bar"
echo "foo"\
"bar"
But for your specific example I recommend a much better way. Creating the JSON in a double-quoted string is highly error prone, because of having to escape all the internal double quotes, which also becomes hard to read and maintain. A better alternative is to use a here-document, pipe it to curl, and use -d @- to make it read the JSON from stdin. Like this:
formatJson() {
cat << EOF
{
"markdown": "some content with $variable in it"
}
EOF
}
formatJson | curl -X POST 'localhost:3000/upload' \
-H 'Content-Type: application/json' \
-d @-
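One caveat, in the spirit of the jq answer above: the here-document expands $variable verbatim, so a double quote or backslash in its value will still break the JSON. If that is a concern, a variant of the same function could let jq do the encoding (a sketch, not the original answer's code):
formatJson() {
jq -n --arg v "$variable" '{markdown: ("some content with " + $v + " in it")}'
}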
If I were you, I'd save the JSON to a file:
curl -X POST 'localhost:3000/upload' \
-H 'Content-Type: application/json' \
-d "$(cat my_json.json)"
Since I have a special character in one of the fields, I wanted to use a low value as the delimiter. Hive works fine with the delimiter (\0), but Sqoop fails with a NoSuchElementException. It looks like it is not detecting the delimiter as \0.
This is what my Hive and Sqoop scripts look like. Any help, please.
CREATE TABLE SCHEMA.test
(
name CHAR(20),
id int,
dte_report date
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\0'
LOCATION '/user/$USER/test';
sqoop-export \
-Dmapred.job.name="TEST" \
-Dorg.apache.sqoop.export.text.dump_data_on_error=true \
--options-file ${OPTION_FILE_LOCATION}/conn_mysql \
--export-dir /user/$USER/test \
--input-fields-terminated-by '\0' \
--input-lines-terminated-by '\n' \
--input-null-string '\\N' \
--input-null-non-string '\\N' \
--table MYSQL_TEST \
--validate \
--outdir /export/home/$USER/javalib
In the vi editor, the delimiter looks like '^#', and with od -c the delimiter shows as \0.
Setting the character set to UTF-8 in the MySQL connection string can resolve this issue:
mysql.url=jdbc:mysql://localhost:3306/nbs?useJvmCharsetConverters=false&useDynamicCharsetInfo=false&useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8&useEncoding=true
You should use \000 as the delimiter; it will generate that character as the delimiter.
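In other words, only the field-delimiter line of the sqoop-export call above would change, to the octal form (a sketch based on that answer):
--input-fields-terminated-by '\000' \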
How can I pass an empty string as the value?
-D <property=value> use value for given property
I use Hadoop Pipes.
I tried
-D prop1=
-D prop2=value2
but it doesn't work
# HadoopPipes::JobConf
# jobconf.hasKey(prop1) is false
I'd like to run an ldapsearch query repeatedly, substituting the uid from a list, and output the results to a new file.
ldapsearch -h ldap.com -p 389 -x -b "dc=top,dc=com" \
"uid=**value_from_a_text_file**" >>ldap.query.results.
Are there any suggestions on how to accomplish this?
Assuming your file is a list of UIDs, one per line, and is named uidfile.txt:
while read -r line; do
ldapsearch -h ldap.com -p 389 -x -b "dc=top,dc=com" "uid=${line}" >> ldap.query.results
done < uidfile.txt
Assuming data in CSV format with the first field as the UID:
awk -F "," '{print $1}' data.csv | \
while read uiddata
do
ldapsearch -h ldap.com -p 389 -x -b "dc=top,dc=com" "uid=${uiddata}" >> ldap.query.results
done
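If the trailing dot in ldap.query.results. in the question means you want one output file per UID, a small variation of either loop writes the UID into the file name (hypothetical naming scheme):
while read -r uid; do
ldapsearch -h ldap.com -p 389 -x -b "dc=top,dc=com" "uid=${uid}" > "ldap.query.results.${uid}"
done < uidfile.txt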