run demo of sona latest version bug #62

@lcx517

Hi, I'm running the SONA example and the application ended with status FAILED. The stdout log is below.
Please help.

2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(deepthought); groups with view permissions: Set(); users  with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO  UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO  ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN  Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO  AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception: 
java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO  ShutdownHookManager:54 - Shutdown hook called
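
One way to narrow down a `ClassNotFoundException` like the one in the log above is to check whether the application jar actually contains the class passed to `--class`. The following is only a minimal sketch, not part of SONA or Spark: the helper names `class_to_entry` and `jar_has_class` are hypothetical, and it assumes `unzip` is installed on the machine that submits the job.

```shell
# Hypothetical helpers (not part of SONA or Spark) for inspecting a jar.

# Convert a fully qualified class name to its entry path inside a jar,
# e.g. a.b.C becomes a/b/C.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Return 0 if the jar listing mentions the class entry, non-zero otherwise
# (also non-zero when the jar cannot be read). This is a plain substring
# match on the listing, which is good enough for a quick check.
# Assumes `unzip` is available; `jar tf` lists entries similarly.
jar_has_class() {
    unzip -l "$1" 2>/dev/null | grep -qF -- "$(class_to_entry "$2")"
}

# Example call (paths taken from the script in this report):
#   jar_has_class "./../lib/angelml-${SONA_VERSION}.jar" \
#       org.apache.spark.angel.examples.JsonRunnerExamples \
#       && echo "class found" || echo "class missing"
```

If the check reports the class as missing, the jar path or the class name given to `spark-submit` is the first thing to look at.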

My SONA-example script:

source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop

$SPARK_HOME/bin/spark-submit \
        --master yarn-cluster \
        --driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
        --keytab /home/deepthought/deepthought.keytab \
        --principal deepthought \
        --queue longyuan.p0 \
        --conf spark.ps.jars=$SONA_ANGEL_JARS \
        --conf spark.ps.instances=10 \
        --conf spark.ps.cores=2 \
        --conf spark.ps.memory=6g \
        --jars $SONA_SPARK_JARS \
        --name "LR-spark-on-angel" \
        --files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
        --driver-memory 10g \
        --num-executors 10 \
        --executor-cores 2 \
        --executor-memory 4g \
        --class org.apache.spark.angel.examples.JsonRunnerExamples \
        ./../lib/angelml-${SONA_VERSION}.jar \
        data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
        modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
        jsonFile:./lr.json \
        lr:0.1
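
Separately from the reported exception: on YARN, a file shipped with `--files` is placed in the container working directory under its own base name unless a `#alias` is appended. The script above ships `logreg.json` but passes `jsonFile:./lr.json`, so it may be worth checking that the two names agree. A small sketch follows. The helper name `files_arg_matches` is hypothetical, and whether the example class reads `jsonFile` from the working directory is an assumption.

```shell
# Hypothetical helper (not part of SONA or Spark).
# Succeeds when the name a --files entry will have in the container
# (its #alias if present, otherwise its base name) equals the base name
# of the path passed as jsonFile.
files_arg_matches() {
    # Strip everything up to and including the last '#', if any.
    shipped="$(printf '%s' "$1" | sed 's/.*#//')"
    [ "$(basename "$shipped")" = "$(basename "$2")" ]
}

# Example call with the values from the script above:
#   files_arg_matches /data/angel/sona-0.1.0-bin/jsons/logreg.json ./lr.json \
#       || echo "--files name and jsonFile differ"
```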

And my spark-on-angel-env.sh:

export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2

...<not changed default content below>...
