Kafka pod in CrashLoopBackOff during installation #casablanca #deploy

Signorelli Marco <marco.signorelli@...>
 

Hi all,

We are deploying the ONAP Casablanca release by following this guide:

https://docs.onap.org/en/casablanca/submodules/oom.git/docs/oom_quickstart_guide.html

After we run the installation command (helm deploy dev ...), helm reports that all components have been deployed.

But when we run kubectl get pods -n onap, about 30 pods are not in the Running state, in particular:

onap   dev-dmaap-message-router-fb9f4bc7d-grhtt             0/1   Init:0/1           5    57m
onap   dev-dmaap-message-router-kafka-8c558bfc-l7854        0/1   CrashLoopBackOff   15   57m
onap   dev-dmaap-message-router-zookeeper-bbd595c46-hdvhw   1/1   Running            0    57m
onap   dev-sdnc-sdnc-dmaap-listener-84bffc54-sgntp          0/1   Init:0/1           1    2h
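With about 30 pods affected, it can help to filter the listing mechanically. The following is a minimal Python sketch, assuming the six-column layout shown in the listing (namespace, name, ready, status, restarts, age); the function name pods_not_running is hypothetical and not part of any ONAP or Kubernetes tooling.

```python
def pods_not_running(listing):
    """Return (name, status) pairs for pods whose STATUS is not 'Running'.

    Assumes six whitespace-separated columns per line:
    NAMESPACE NAME READY STATUS RESTARTS AGE
    """
    result = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "NAMESPACE":
            continue  # skip blank lines, a header row, or unexpected layouts
        name, status = fields[1], fields[3]
        if status != "Running":
            result.append((name, status))
    return result


# The four pods quoted in this message.
SAMPLE = """
onap dev-dmaap-message-router-fb9f4bc7d-grhtt 0/1 Init:0/1 5 57m
onap dev-dmaap-message-router-kafka-8c558bfc-l7854 0/1 CrashLoopBackOff 15 57m
onap dev-dmaap-message-router-zookeeper-bbd595c46-hdvhw 1/1 Running 0 57m
onap dev-sdnc-sdnc-dmaap-listener-84bffc54-sgntp 0/1 Init:0/1 1 2h
"""

for name, status in pods_not_running(SAMPLE):
    print(status, name)
```

Note that this only looks at the STATUS column, so a pod that is Running but not yet ready (for example 0/1) would not be reported, and finished job pods would be.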

 

The message-router Kafka pod logs the following error:

root@sb4-rancher:~# kubectl logs -n onap dev-dmaap-message-router-kafka-5fbc897f48-hsxlp
[2019-02-04 16:06:06,166] INFO Initiating client connection, connectString=message-router-zookeeper:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@4313f5bc (org.apache.zookeeper.ZooKeeper)
[2019-02-04 16:06:06,199] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2019-02-04 16:06:12,200] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
waiting for kafka to be ready
[2019-02-04 16:06:16,218] INFO Opening socket connection to server 10.42.8.38/10.42.8.38:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-02-04 16:06:16,240] INFO Socket connection established to 10.42.8.38/10.42.8.38:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-02-04 16:06:16,256] INFO Session establishment complete on server 10.42.8.38/10.42.8.38:2181, sessionid = 0x168b8e2e4280005, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-02-04 16:06:16,263] INFO Session: 0x168b8e2e4280005 closed (org.apache.zookeeper.ZooKeeper)
[2019-02-04 16:06:16,266] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server 'message-router-zookeeper:2181' with timeout of 6000 ms
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1233)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:103)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:85)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:341)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:191)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
    at kafka.Kafka$.main(Kafka.scala:65)
    at kafka.Kafka.main(Kafka.scala)
[2019-02-04 16:06:16,266] INFO EventThread shut down for session: 0x168b8e2e4280005 (org.apache.zookeeper.ClientCnxn)
[2019-02-04 16:06:16,271] INFO shutting down (kafka.server.KafkaServer)
[2019-02-04 16:06:16,285] INFO shut down completed (kafka.server.KafkaServer)
[2019-02-04 16:06:16,286] FATAL Exiting Kafka.  (kafka.server.KafkaServerStartable)
[2019-02-04 16:06:16,291] INFO shutting down (kafka.server.KafkaServer)

 

while the ZooKeeper log is:

2019-02-04 16:07:47,856 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.42.0.1:49336
2019-02-04 16:07:47,856 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:745)
2019-02-04 16:07:47,857 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.42.0.1:49336 (no session established for client)
2019-02-04 16:07:48,902 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.42.0.1:49360
2019-02-04 16:07:48,903 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:745)
2019-02-04 16:07:48,903 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.42.0.1:49360 (no session established for client)

  

ZooKeeper is listening on 0.0.0.0:2181, which I believe is correct.
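One detail in the Kafka log above may be worth checking: the client starts at 16:06:06, the 6000 ms wait ends at 16:06:12, and the socket to 10.42.8.38:2181 is only opened at 16:06:16, so the delay appears to occur before the TCP connection is attempted. One possible cause, and this is only an assumption, is slow resolution of the service name. The Python sketch below times name resolution and TCP connect separately; the function name time_connect is hypothetical.

```python
import socket
import time


def time_connect(host, port, timeout=6.0):
    """Return (resolve_seconds, connect_seconds) for a TCP connection.

    Name resolution and the TCP handshake are timed separately so that a
    slow DNS lookup can be told apart from a slow or blocked connection.
    """
    start = time.monotonic()
    # Resolve the name first; use the first address returned.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    resolved = time.monotonic()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)  # applies to the connect call only
        sock.connect(sockaddr)
    connected = time.monotonic()
    return resolved - start, connected - resolved
```

Run from a pod in the onap namespace, a call such as time_connect("message-router-zookeeper", 2181) would show whether most of the time is spent resolving the name or opening the connection.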

 

iptables on each k8s node has the following rule for Kafka:

 -A KUBE-SERVICES -d 10.43.140.245/32 -p tcp -m comment --comment "onap/message-router-kafka:message-router-kafka has no endpoints" -m tcp --dport 9092 -j REJECT --reject-with icmp-port-unreachable
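For context, kube-proxy installs a REJECT rule with a "has no endpoints" comment when a Service has no ready backing pods, which is consistent with the Kafka pod never becoming ready rather than a separate firewall problem. A minimal Python sketch for listing such services from saved rules follows; the regular expression is an assumption based on the single rule quoted above, and the function name services_without_endpoints is hypothetical.

```python
import re

# Matches the kube-proxy comment "<namespace>/<service>:<port name> has no endpoints"
# followed later on the same line by the destination port.
NO_ENDPOINTS_RULE = re.compile(
    r'--comment "(?P<namespace>[^/\s"]+)/(?P<service>[^:\s"]+)'
    r'(?::(?P<port_name>[^\s"]*))?\s+has no endpoints"'
    r'.*?--dport (?P<port>\d+)'
)


def services_without_endpoints(rules):
    """Return (namespace, service, port) for each 'has no endpoints' rule."""
    found = []
    for line in rules.splitlines():
        match = NO_ENDPOINTS_RULE.search(line)
        if match:
            found.append(
                (match["namespace"], match["service"], int(match["port"]))
            )
    return found


# The rule quoted in this message.
RULE = (
    '-A KUBE-SERVICES -d 10.43.140.245/32 -p tcp -m comment --comment '
    '"onap/message-router-kafka:message-router-kafka has no endpoints" '
    '-m tcp --dport 9092 -j REJECT --reject-with icmp-port-unreachable'
)
print(services_without_endpoints(RULE))
```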

 

We also tried changing the zookeeper.connection.timeout parameter in

 oom/kubernetes/dmaap/charts/message-router/resources/config/dmaap/MsgRtrApi.properties

increasing it from 6000 ms to 60000 ms.

We reinstalled the dmaap component, but the value is not updated, so we still get the same error:

"Unable to connect to zookeeper server 'message-router-zookeeper:2181' with timeout of 6000 ms"

 

Did I change the wrong configuration file for ZooKeeper?

 

Has anybody else experienced the same issue and been able to overcome it?

 

Any help will be greatly appreciated.

 

Thanks in advance

Best regards,

 

Marco Signorelli

Michael O'Brien <frank.obrien@...>
 

Signorelli,

   Sorry to hear this.  Try the 3.0.0-ONAP tag for now, until 3.0.1-ONAP is cut this week. 3.0.0 has been highly stable since mid-December; move to 3.0.1 later.

   The pods may need to be managed in both their order and startup timing depending on your undercloud.

   Follow https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-DeploymentIntegritybasedonPodDependencies

   And specifically the order in

https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n228

DEPLOY_ORDER_POD_NAME_ARRAY=('consul msb dmaap dcaegen2 aaf robot aai esr multicloud oof so sdc sdnc vid policy portal log vfc uui vnfsdk appc clamp cli pomba vvp contrib sniro-emulator')
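When deploying only part of the stack, the chosen components should still follow the order listed above. A minimal Python sketch of that selection, assuming the list is copied as-is from cd.sh; the function name ordered_subset is hypothetical and this does not run any deployment itself.

```python
# Component order copied from the array quoted above.
DEPLOY_ORDER = (
    "consul msb dmaap dcaegen2 aaf robot aai esr multicloud oof so sdc sdnc "
    "vid policy portal log vfc uui vnfsdk appc clamp cli pomba vvp contrib "
    "sniro-emulator"
).split()


def ordered_subset(wanted, order=DEPLOY_ORDER):
    """Return the wanted components sorted into the reference deploy order."""
    wanted = set(wanted)
    unknown = wanted - set(order)
    if unknown:
        raise ValueError(f"not in deploy order: {sorted(unknown)}")
    return [name for name in order if name in wanted]


# Example: the components SDNC is said to wait for, plus SDNC itself.
print(ordered_subset(["sdnc", "dmaap", "consul", "sdc"]))
```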

 

    In 3.0.0-ONAP both the MR and DR pods in dmaap will start.

    For your sdnc failure: this is a normal pod dependency. SDNC is a good project that has defined its pod dependencies; it will not start until DMAAP, SDC and Consul are up (in your case dmaap-mr).

https://wiki.onap.org/display/DW/Log+Streaming+Compliance+and+API#LogStreamingComplianceandAPI-DeploymentDependencyTree

 

     Try a subset of the above list to start and let us know

     Thank you

      /michael

 

From: onap-discuss@... <onap-discuss@...> On Behalf Of Signorelli Marco
Sent: Tuesday, February 5, 2019 10:40 AM
To: onap-discuss@...
Subject: [onap-discuss] Kafka's pod in crashloobackoff during installation #casablanca #deploy

 
