Re: APPC install error: It may not be safe to bootstrap the cluster from this node. #appc


Taka Cho
 

Hi,

 

The safe_to_bootstrap issue occurs because APPC is using a persistent volume.
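
For reference, the flag the error message points at lives in grastate.dat on that volume. A minimal sketch of how to inspect it (or force it, as the error text itself suggests), assuming the MariaDB data ends up somewhere under the NFS mount; the exact subpath is a placeholder and depends on your PV layout, and for APPC the cleaner fix is the cleanup described below:

    # illustrative path - locate grastate.dat under your appc-db PV first
    grep safe_to_bootstrap /dockerfs_data/dev-appc/*/grastate.dat
    # a node that was not the last to leave the cluster shows "safe_to_bootstrap: 0";
    # setting it to 1 forces Galera to bootstrap from this node
    sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /dockerfs_data/dev-appc/*/grastate.dat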

 

After you do helm undeploy, you need to delete the PVCs and PVs for the appc items first, then remove the dev-appc directory under the /dockerfs_data NFS mount; that resolves the safe_to_bootstrap issue.
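
A minimal command sketch of that cleanup, assuming the "dev" release prefix and the onap namespace used in this thread; the PVC/PV names below are placeholders, so list them first:

    helm undeploy dev-appc                        # per the OOM helm deploy/undeploy plugin
    kubectl get pvc -n onap | grep appc           # list the leftover APPC claims
    kubectl delete pvc <appc-pvc-name> -n onap    # repeat for each APPC PVC
    kubectl get pv | grep appc                    # find the matching released PVs
    kubectl delete pv <appc-pv-name>              # repeat for each APPC PV
    rm -rf /dockerfs_data/dev-appc                # wipe the data left on the NFS mount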

 

For the ansible server, use ccsdk-ansible-server-image:0.4.1-STAGING-latest in values.yaml, since the ansible container had an issue in the R3 maintenance release. I believe CCSDK will release the fix soon…
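
A hedged sketch of where that override goes, assuming a standard OOM checkout; the chart path is illustrative, so locate the file that pins the image first:

    cd ~/oom/kubernetes
    grep -rn "ccsdk-ansible-server-image" appc/   # find the values.yaml that sets the ansible-server image
    # edit the matching values.yaml so the tag reads 0.4.1-STAGING-latest, e.g.
    #   image: onap/ccsdk-ansible-server-image:0.4.1-STAGING-latest   (repository prefix per your values.yaml)
    # then redeploy APPC the same way you deployed it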

 

Taka

 

From: onap-discuss@... <onap-discuss@...> On Behalf Of jkzcristiano
Sent: Wednesday, March 27, 2019 10:40 AM
To: onap-discuss@...
Subject: [onap-discuss] APPC install error: It may not be safe to bootstrap the cluster from this node. #appc

 

Dear community,

After more than 2 months (with Casablanca), the APPC ansible server went into CrashLoopBackOff state. I tried to update its deployment without success. One day later, other APPC pods went into CrashLoopBackOff state too.

Now I am trying with a simple deployment that consists only of the APPC service.

Below is the state of pods from Kubernetes dashboard:



Describing "dev-appc-appc-db-0":

ubuntu@rancher:~/oom/kubernetes$ kubectl describe pod/dev-appc-appc-db-0 -n onap

Name:           dev-appc-appc-db-0

Namespace:      onap

Node:           k8s-dev/10.0.0.31

Start Time:     Wed, 27 Mar 2019 14:16:01 +0000

Labels:         app=dev-appc-appc-db

                controller-revision-hash=dev-appc-appc-db-7cf88ff6b7

                statefulset.kubernetes.io/pod-name=dev-appc-appc-db-0

Annotations:    pod.alpha.kubernetes.io/initialized=true

Status:         Running

IP:             10.42.5.166

Controlled By:  StatefulSet/dev-appc-appc-db

Init Containers:

  mariadb-galera-prepare:

    Container ID:  docker://610ebd70b379ac6e7a223ba7f4dfbb7b6becf2a8965f0850ad8b9cf11f28b520

    Image:         nexus3.onap.org:10001/busybox

    Image ID:      docker-pullable://nexus3.onap.org:10001/busybox@sha256:4415a904b1aca178c2450fd54928ab362825e863c0ad5452fd020e92f7a6a47e

    Port:          <none>

    Host Port:     <none>

    Command:

      sh

      -c

      chown -R 27:27 /var/lib/mysql

    State:          Terminated

      Reason:       Completed

      Exit Code:    0

      Started:      Wed, 27 Mar 2019 14:16:14 +0000

      Finished:     Wed, 27 Mar 2019 14:16:14 +0000

    Ready:          True

    Restart Count:  0

    Environment:    <none>

    Mounts:

      /var/lib/mysql from dev-appc-appc-db-data (rw)

      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sbhz4 (ro)

Containers:

  appc-db:

    Container ID:   docker://03cfd676d76fc852dd7c75fb795e0223c731b4f4a22ee903f9784df70737d080

    Image:          nexus3.onap.org:10001/adfinissygroup/k8s-mariadb-galera-centos:v002

    Image ID:       docker-pullable://nexus3.onap.org:10001/adfinissygroup/k8s-mariadb-galera-centos@sha256:fbcb842f30065ae94532cb1af9bb03cc6e2acaaf896d87d0ec38da7dd09a3dde

    Ports:          3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP

    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP

    State:          Waiting

      Reason:       CrashLoopBackOff

    Last State:     Terminated

      Reason:       Error

      Exit Code:    1

      Started:      Wed, 27 Mar 2019 14:19:42 +0000

      Finished:     Wed, 27 Mar 2019 14:19:45 +0000

    Ready:          False

    Restart Count:  5

    Liveness:       exec [mysqladmin ping] delay=30s timeout=5s period=10s #success=1 #failure=3

    Readiness:      exec [/usr/share/container-scripts/mysql/readiness-probe.sh] delay=15s timeout=1s period=10s #success=1 #failure=3

    Environment:

      POD_NAMESPACE:        onap (v1:metadata.namespace)

      MYSQL_USER:           my-user

      MYSQL_PASSWORD:       <set to the key 'user-password' in secret 'dev-appc-appc-db'>  Optional: false

      MYSQL_DATABASE:       my-database

      MYSQL_ROOT_PASSWORD:  <set to the key 'db-root-password' in secret 'dev-appc-appc-db'>  Optional: false

    Mounts:

      /etc/localtime from localtime (ro)

      /var/lib/mysql from dev-appc-appc-db-data (rw)

      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sbhz4 (ro)

Conditions:

  Type              Status

  Initialized       True

  Ready             False

  ContainersReady   False

  PodScheduled      True

Volumes:

  dev-appc-appc-db-data:

    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)

    ClaimName:  dev-appc-appc-db-data-dev-appc-appc-db-0

    ReadOnly:   false

  localtime:

    Type:          HostPath (bare host directory volume)

    Path:          /etc/localtime

    HostPathType:

  default-token-sbhz4:

    Type:        Secret (a volume populated by a Secret)

    SecretName:  default-token-sbhz4

    Optional:    false

QoS Class:       BestEffort

Node-Selectors:  <none>

Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s

                 node.kubernetes.io/unreachable:NoExecute for 300s

Events:

  Type     Reason            Age              From               Message

  ----     ------            ----             ----               -------

  Warning  FailedScheduling  4m (x3 over 4m)  default-scheduler  pod has unbound PersistentVolumeClaims

  Normal   Scheduled         4m               default-scheduler  Successfully assigned onap/dev-appc-appc-db-0 to k8s-dev

  Normal   Pulling           4m               kubelet, k8s-dev   pulling image "nexus3.onap.org:10001/busybox"

  Normal   Pulled            4m               kubelet, k8s-dev   Successfully pulled image "nexus3.onap.org:10001/busybox"

  Normal   Created           4m               kubelet, k8s-dev   Created container

  Normal   Started           4m               kubelet, k8s-dev   Started container

  Normal   Pulled            3m (x4 over 4m)  kubelet, k8s-dev   Container image "nexus3.onap.org:10001/adfinissygroup/k8s-mariadb-galera-centos:v002" already present on machine

  Normal   Created           3m (x4 over 4m)  kubelet, k8s-dev   Created container

  Normal   Started           3m (x4 over 4m)  kubelet, k8s-dev   Started container

  Warning  BackOff           3m (x9 over 4m)  kubelet, k8s-dev   Back-off restarting failed container



And here are the logs from the "appc-db" container:

+ CONTAINER_SCRIPTS_DIR=/usr/share/container-scripts/mysql

+ EXTRA_DEFAULTS_FILE=/etc/my.cnf.d/galera.cnf

+ '[' -z onap ']'

+ echo 'Galera: Finding peers'

Galera: Finding peers

++ hostname -f

++ cut -d. -f2

+ K8S_SVC_NAME=appc-dbhost

+ echo 'Using service name: appc-dbhost'

+ cp /usr/share/container-scripts/mysql/galera.cnf /etc/my.cnf.d/galera.cnf

Using service name: appc-dbhost

+ /usr/bin/peer-finder -on-start=/usr/share/container-scripts/mysql/configure-galera.sh -service=appc-dbhost

2019/03/27 15:27:45 Peer list updated

was []

now [dev-appc-appc-db-0.appc-dbhost.onap.svc.cluster.local]

2019/03/27 15:27:45 execing: /usr/share/container-scripts/mysql/configure-galera.sh with stdin: dev-appc-appc-db-0.appc-dbhost.onap.svc.cluster.local

2019/03/27 15:27:45

2019/03/27 15:27:46 Peer finder exiting

+ '[' '!' -d /var/lib/mysql/mysql ']'

+ exec mysqld

2019-03-27 15:27:46 139879155882240 [Note] mysqld (mysqld 10.1.24-MariaDB) starting as process 1 ...

2019-03-27 15:27:47 139879155882240 [Note] WSREP: Read nil XID from storage engines, skipping position init

2019-03-27 15:27:47 139879155882240 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'

2019-03-27 15:27:47 139879155882240 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <info@...> loaded successfully.

2019-03-27 15:27:47 139879155882240 [Note] WSREP: CRC-32C: using hardware acceleration.

2019-03-27 15:27:47 139879155882240 [Note] WSREP: Found saved state: 84b0f5c0-12b6-11e9-a817-1b6ad3281ac6:-1, safe_to_bootsrap: 0

2019-03-27 15:27:47 139879155882240 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = dev-appc-appc-db-0.appc-dbhost.onap.svc.cluster.local; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_

2019-03-27 15:27:47 139879155882240 [Note] WSREP: GCache history reset: old(84b0f5c0-12b6-11e9-a817-1b6ad3281ac6:0) -> new(84b0f5c0-12b6-11e9-a817-1b6ad3281ac6:-1)

2019-03-27 15:27:47 139879155882240 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

2019-03-27 15:27:47 139879155882240 [Note] WSREP: wsrep_sst_grab()

2019-03-27 15:27:47 139879155882240 [Note] WSREP: Start replication

2019-03-27 15:27:47 139879155882240 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

2019-03-27 15:27:47 139879155882240 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .

2019-03-27 15:27:47 139879155882240 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7

2019-03-27 15:27:47 139879155882240 [ERROR] Aborting


It seems to be related to the "safe_to_bootstrap" feature (some info here). I am not sure how to change this parameter, nor why deploying APPC hits this issue now (I am using the same images, the same environment, everything; just deploy/undeploy).

Would appreciate some help!

Kind regards,
Xoan
