SparkR: Write into a Parquet file

I am trying to write a data frame in Parquet format. The data frame is:
str(test)
'data.frame': 365 obs. of 4 variables:
$ id : chr "Apple" "Apple" "Apple" "Apple" ...
$ text : chr "譲渡 拡散希望\npsychopass サイコパス トレーディングラバーストラップ 宜野座伸元\n特典円通常円送料にてお譲りします検索からでもお"| truncated "retweet\n\npeachpanther albumin the world right now" "haarlem vacature internet strateeg opzoek naar cto software architectlead developer star applehaarl" "ในอายทเทากน\nผหญงมความเปนผใหญมากกวาผชาย\nไมมผชายคนไหนไปไดสวยกบผหญงอายเทากนไดหรอก\n you are the a""| truncated ...
$ emotion : chr "unknown" "unknown" "unknown" "unknown" ...
$ polarity: chr "positive" "positive" "positive" "positive" ...
When I try to use write.parquet I get the following error:
write.parquet(test,"hdfs://xxx.xxx.xxx.xxx:9000/orcladv/intdata/processedtweets")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘write.parquet’ for signature ‘"data.frame", "character"’
Has anyone faced this issue? Please help me solve it.
Regards
Bala

"data.frame" in the error message indicates you are using an R data.frame. The write.parquet() function you are using operates on Spark DataFrames not R data.frames.
Details of how to convert between the two here: https://spark.apache.org/docs/latest/sparkr.html#creating-dataframes
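For instance, assuming an active SparkR session, the conversion plus write might look like this (a sketch; the HDFS path is the one from the question):
library(SparkR)
sparkR.session()              # assumes Spark is installed and configured
sdf <- createDataFrame(test)  # convert the local R data.frame to a Spark DataFrame
write.parquet(sdf, "hdfs://xxx.xxx.xxx.xxx:9000/orcladv/intdata/processedtweets")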

Related

Could not convert parameter `tx` between node and runtime: No such variant in enum MultiSignature

Hi, I am getting the error below in polkadot-js when I am trying to transfer the balance from Alice to Dave (or any other transfer).
Error:
balances.transferKeepAlive
1002: Verification Error: Execution: Could not convert parameter tx between node and runtime: No such variant in enum MultiSignature: RuntimeApi, Execution: Could not convert parameter tx between node and runtime: No such variant in enum MultiSignature
Please refer to the screenshot below:
[Screen shot omitted]
You are missing some data types in your UI; adding these in the developer settings will do the job.
{
"Address": "MultiAddress",
"LookupSource": "MultiAddress"
}
https://polkadot.js.org/docs/api/FAQ#the-node-returns-a-could-not-convert-error-on-send
The same JSON worked for me on the Substrate "Forkless Upgrade a Chain" tutorial: https://substrate.dev/docs/en/tutorials/forkless-upgrade/sudo-upgrade
Open https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A9944#/settings/developer (Settings -> Developer), change the JSON to match the above, and Save. That's it.

Google automl_v1beta1 error "the provided location ID is not valid"

I am trying to call a trained model from Google Colab with the example provided, but there is an error.
Does anyone know whether it is a beta error, or have I not set something up properly?
Thanks in advance.
The code:
from google.cloud import automl_v1beta1 as automl

automl_client = automl.AutoMlClient()

# Create client for prediction service.
prediction_client = automl.PredictionServiceClient().from_service_account_json(
    'XXXXX.json')

# Get the full path of the model.
model_full_id = automl_client.model_path(
    project_id, compute_region, model_id
)

# Read the file content for prediction.
# with open(file_path, "rb") as content_file:
snippet = "fsfsf"  # content_file.read()

# Set the payload by giving the content and type of the file.
payload = {"text_snippet": {"content": snippet, "mime_type": "text/plain"}}

# params is additional domain-specific parameters.
# Currently there are no additional parameters supported.
params = {}

response = prediction_client.predict(model_full_id, payload, params)
print("Prediction results:")
for result in response.payload:
    print("Predicted class name: {}".format(result.display_name))
    print("Predicted class score: {}".format(result.classification.score))
The error message:
InvalidArgument: 400 List of found errors: 1.Field: name; Message: The provided location ID is not valid.
You have to use a region that supports AutoML beta. This works for me:
create_dataset("myproj-123456", "us-central1", "my_dataset_id", "en", "de")
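Applied to the prediction code in the question, that means the compute_region passed to model_path() must be a supported one. A minimal sketch, with placeholder project and model IDs:
project_id = "myproj-123456"      # placeholder: your GCP project ID
compute_region = "us-central1"    # the region used in the call above
model_id = "TCN1234567890123456"  # placeholder: your AutoML model ID
model_full_id = automl_client.model_path(project_id, compute_region, model_id)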
I cloned the repo "python-docs-samples":
$ git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
I navigated to the AutoML examples:
$ cd /home/MY_USER/python-docs-samples/language/automl/
I set the environment variables:
GOOGLE_APPLICATION_CREDENTIALS
PROJECT_ID
REGION_NAME
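For example (the values are hypothetical; use your own credentials file and project):
$ export GOOGLE_APPLICATION_CREDENTIALS="/home/MY_USER/XXXXX.json"
$ export PROJECT_ID="myproj-123456"
$ export REGION_NAME="us-central1"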
I typed:
$ python automl_natural_language_dataset.py create_dataset automltest1 False
I got this message:
Dataset name: projects/198768927566/locations/us-central1/datasets/TCN7889001684301386365
Dataset id: TCN7889001684301386365
Dataset display name: automltest1
Text classification dataset metadata:
classification_type: MULTICLASS
Dataset example count: 0
Dataset create time:
seconds: 1569367227
nanos: 873147000
I set the environment variable:
DATASET_ID
Please note that I got this value for step 5.
I typed:
python automl_natural_language_dataset.py import_data $DATASET_ID "gs://$PROJECT_ID-lcm/complaints_manual.csv"
I got this message:
Processing import...
Dataset imported.

Error and warning when running a regression model on a panel using the plm package

I have a panel covering 27 years, but get this error when I run a regression.
[panel data of global suicide rate with temperature]
I use the following code:
install.packages("dummies")
library(plm)
library(dummies)

data2 <- cbind(mydata, dummy(mydata$year, sep = "_"))
suicide_fe <- plm(suiciderate ~ dmt, data2,
                  index = c("country", "year"), model = "within")
summary(suicide_fe)
But I got this error:
Error in pdim.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
  to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
  duplicate couples (id-time)
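The warning itself suggests how to find the offending rows. A minimal sketch, assuming the combined data frame is data2 with country and year columns as in the question:
library(plm)
pdata <- pdata.frame(data2, index = c("country", "year"))  # reproduces the warning
table(index(pdata), useNA = "ifany")  # any count > 1 marks a duplicated id-time couple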

PIG: Unable to open iterator for alias AliasName.Scalar has more than one row in the output

I am new to Pig and trying to learn on my own.
I have written a script to append the epoch time to each word read from the words.txt file.
Here is the script:
A = LOAD 'words.txt' AS (word:chararray);
B = FOREACH A GENERATE CONCAT(CONCAT(A.word, '_'), (chararray)ToUnixTime(CurrentTime()));
dump B;
But the issue is, if words.txt has only one word, it gives the proper output.
If it has multiple words, like
word1
word2
word3
word4
then it gives the following error:
ERROR 1066: Unable to open iterator for alias B
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0:
Scalar has more than one row in the output. 1st : (word1 ), 2nd : (word2)
(common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar")
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0:
Scalar has more than one row in the output. 1st : (word1 ), 2nd : (word2)
(common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar")
at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:122)
at o
Please suggest how I can solve this issue.
Thank you.
Solved it on my own: I just removed the A. from the inner CONCAT, and it worked. Inside FOREACH A, the bare field name word refers to the current record, whereas A.word treats A as a scalar relation, which fails as soon as the relation has more than one row.
Script:
A = LOAD 'words.txt' AS (word:chararray);
B = FOREACH A GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime()));
dump B;

What are the child OIDs in an SNMP trap?

I have inherited a MIB and example documentation, and need to re-implement the code that generates traps. (For various reasons the original code is lost and gone forever, but CM is not my question.)
The MIB says:
alertObjects OBJECT IDENTIFIER ::= { corpAlert 1 }
alertEvents  OBJECT IDENTIFIER ::= { corpAlert 2 }

alertDispatchTime OBJECT-TYPE
    SYNTAX      OCTET STRING
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "Time Event Dispatched"
    ::= { alertObjects 3 }

testFailure OBJECT IDENTIFIER ::= { alertEvents 4 }

testFailureClearTrap NOTIFICATION-TYPE
    OBJECTS
    {
        alertDispatchTime,
        [omitted]
    }
    STATUS current
    DESCRIPTION
        "Clear prior failure"
    ::= { testFailure 0 }
Our documentation has the following snippet:
/usr/bin/snmptrap \
-v 1 \
-c public 192.168.0.2:162 [our-base-oid] 127.0.0.1 6 4 '' \
[our-base-oid].2.4.0.4.1.0 s "May 21 2007 10:19PM" \
[etc]
What I can't figure out is the OID used for the alert dispatch time. I would understand it if it were [our-base-oid].1.3.0, or even [our-base-oid].2.4.0.[our-base-oid].1.3. If we were generating a trap at { alertEvents 3 }, what would the suffix be for the individual objects?
It is possible that the MIB was updated after the documentation, so if this looks wrong to an expert then what should the OID be for the alertDispatchTime?
Thanks.
As defined here, alertDispatchTime is a scalar object (only one instance), so its instance subidentifier is always 0 (full OID is [corpAlert].1.3.0). The notification's OID is [corpAlert].2.4.0.
Assuming by "[our-base-oid]" you mean corpAlert, the snmptrap command shown doesn't look correct, because [our-base-oid].2.4.0.4.1.0 would be testFailureClearTrap.4.1.0, which doesn't make sense: traps don't have instance subidentifiers. But I'm making some assumptions here about the parts of the MIB spec you've not included.
If you have a working system, it might be good to generate a trap and look at its contents.
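For what it's worth, if the varbind really is meant to be alertDispatchTime, a corrected version of the documented command might look like this (a sketch, keeping [our-base-oid] as a stand-in exactly as in the snippet above):
/usr/bin/snmptrap \
-v 1 \
-c public 192.168.0.2:162 [our-base-oid] 127.0.0.1 6 4 '' \
[our-base-oid].1.3.0 s "May 21 2007 10:19PM" \
[etc]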
