DCAE deployment in R2 (oom or heat)
Michal Ptacek
Hi,
Will it be possible to fully deploy DCAE using OOM in R2?
The trend seems to be moving from VMs to containers, and DCAE is heading in that direction too. What I heard here https://wiki.onap.org/display/DW/Meetings?preview=/13598723/31982711/dcae-weekly-20180503.mp4 is that DCAE can currently be spawned on a single bootstrap VM (8-16G of RAM) with all components running as docker containers, and that it should also be possible to deploy it fully using OOM. I tried today to deploy the latest ONAP with OOM (multi-node; single node is no longer possible with the 110-pods-per-k8s-host limitation), but I only see the following dcae pod:
onap beijing-dcae-cloudify-manager-fb9f5d6bd-bss2n 1/1 Running 0 4h
Where is the rest? Please advise.
Have a nice weekend. Thanks, Michal
Lusheng Ji
Michal,
When DCAE is fully deployed by OOM, there should be 8 additional pods. They may be under a different namespace, "dcae", depending on configuration. You can check a working deployment for reference, for example the Integration-Jenkins tenant of the Intel/Windriver lab.
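Because those pods can land in a separate namespace, the query has to cover all namespaces. A minimal sketch, assuming kubectl is configured against the ONAP cluster (the small filter helper is mine, not part of OOM):

```shell
# list_dcae_pods: keep only DCAE-related lines from `kubectl get pods` output
list_dcae_pods() {
  grep -i 'dcae'
}

# Live usage (requires a reachable cluster):
#   kubectl get pods --all-namespaces | list_dcae_pods
```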
Moreover, additional components can be deployed at operation time by CLAMP.
Lusheng
Michal Ptacek
Thanks Lusheng for your hints. I see some trace of dcae now, but it's stuck with an "Error syncing pod" event:
dcae dep-config-binding-service-7b9cfb76b8-l75mp 0/2 ContainerCreating 0 6h
I need to troubleshoot this further ...
Michal
Lusheng Ji
In our experience, this is likely because the deployment of this particular component, dep-config-binding-service-7b9cfb76b8-l75mp, timed out.
And this is usually due to docker image pulls taking too long. We are working on a workaround for this in the DCAE bootstrap container, ETA EoB today.
In the meantime, if you pre-pull the images before starting the helm install, you can run it with an additional parameter to override the default pull policy (which is always pull), something like this:
helm install local/onap -n dev --namespace onap --set global.pullPolicy=IfNotPresent
so that images already present on the host are not pulled again.
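A pre-pull pass could be sketched like this; the helper function and the manifest filename are assumptions of mine, not part of OOM:

```shell
# prepull: read image names (one per line) on stdin and pull each, so that
# pullPolicy=IfNotPresent later finds them in the local docker cache.
prepull() {
  while IFS= read -r img; do
    [ -n "$img" ] && docker pull "$img"
  done
}

# Live usage on each k8s host (manifest filename is an assumption):
#   prepull < onap_docker_images.txt
```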
There is a job called dcae-bootstrap that deploys these 8 pods. You can check its logs for details. Because it is a job, the pod is gone once it is done, so you may have to use the -a flag to see it.
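One hedged way to fish the completed bootstrap pod's name out of the listing (the helper is mine; `-a` applies to the older kubectl releases in use here):

```shell
# bootstrap_pod_name: print the first pod name containing "dcae-bootstrap"
# from `kubectl get pods` output read on stdin.
bootstrap_pod_name() {
  awk '/dcae-bootstrap/ {print $1; exit}'
}

# Live usage (assumes kubectl is configured; -a also lists completed job pods):
#   kubectl get pods -n onap -a | bootstrap_pod_name
#   kubectl logs -n onap "$(kubectl get pods -n onap -a | bootstrap_pod_name)"
```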
Lusheng
Michael O'Brien <Frank.Obrien@...>
Hi, an update: the OOM team and Lusheng (who, for example, ssh'ed into my cluster to identify my consul issue) have been working very hard on getting DCAEGEN2 up, and as of this morning it is. Remember this is environment specific: in my case my AWS cluster running with EFS/NFS on 3 x 30G R4.xlarge VMs is underpowered at 12 vCores; I am bringing up 2x and 4x systems with up to 96 vCores that are more resilient and performant. Everything is up except for 1 of the 6 redis containers, due to the cluster itself. The dcae namespace pods come up separately via the cloudify container.
I am in the process of retrofitting those scripts to bring up a cluster instead of one single 128G VM, for two reasons: we are past the 110-pod limit Mike/Roger alerted me to (we are at 148 pods), and S3P features only run properly on a clustered system with an NFS share behind it (links at the end). I recommend provisioning about 1.3x the memory requirements, which are around 90G, so that your cluster can survive one of the hosts reconnecting (all the pods would need to fit on an n-1 cluster): for example 128G of RAM across 4 nodes plus 1 rancher-only node.
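The n-1 sizing rule above can be checked with a few lines of arithmetic; the numbers are examples taken from this thread, not hard requirements:

```shell
# Check that ONAP still fits when one worker node is lost (n-1 capacity).
ONAP_RAM_G=90     # approximate ONAP footprint mentioned in this thread
NODE_RAM_G=32     # RAM per worker node (example value)
NODES=4           # number of worker nodes

CAPACITY_G=$(( (NODES - 1) * NODE_RAM_G ))
if [ "$CAPACITY_G" -ge "$ONAP_RAM_G" ]; then
  echo "n-1 capacity ${CAPACITY_G}G >= ${ONAP_RAM_G}G: survivable"
else
  echo "n-1 capacity ${CAPACITY_G}G < ${ONAP_RAM_G}G: undersized"
fi
```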
http://jenkins.onap.info/job/oom-cd-master/2863/
14:03:22 dcae dep-config-binding-service-7cccc757bd-gwsmk 2/2 Running 0 1h 10.42.10.144 ip-10-0-0-210.us-east-2.compute.internal
14:03:22 dcae dep-dcae-ves-collector-555b68fb89-8kg4r 2/2 Running 0 1h 10.42.14.108 ip-10-0-0-210.us-east-2.compute.internal
14:03:22 dcae dep-deployment-handler-54bbc89b7d-h4jfw 2/2 Running 0 1h 10.42.28.122 ip-10-0-0-8.us-east-2.compute.internal
14:03:22 dcae dep-inventory-69bfbf8d55-g6tpb 1/1 Running 0 1h 10.42.156.86 ip-10-0-0-66.us-east-2.compute.internal
14:03:22 dcae dep-policy-handler-5d988dc5f-ws7gb 2/2 Running 0 1h 10.42.151.70 ip-10-0-0-210.us-east-2.compute.internal
14:03:22 dcae dep-pstg-write-787c4bb65b-r4j5b 1/1 Running 0 1h 10.42.10.248 ip-10-0-0-8.us-east-2.compute.internal
14:03:22 dcae dep-sa1e86e6d2a4e4b43a755096bd19c4ed7-dcaegen2-analytics-tj26w4 2/2 Running 0 1h 10.42.91.86 ip-10-0-0-210.us-east-2.compute.internal
14:03:22 dcae dep-service-change-handler-548cc6c5f5-zd4hj 1/1 Running 2 1h 10.42.17.244 ip-10-0-0-66.us-east-2.compute.internal
0 1h 10.42.206.174 ip-10-0-0-210.us-east-2.compute.internal
14:03:22 onap onap-consul-767c54c595-9g2qg 1/1 Running 4 1h 10.42.242.86 ip-10-0-0-66.us-east-2.compute.internal
14:03:22 onap onap-consul-server-65c5bdf564-9zz67 1/1 Running 0 1h 10.42.201.55 ip-10-0-0-66.us-east-2.compute.internal
14:03:22 onap onap-dbcl-db-0 1/1 Running 1 1h 10.42.83.243 ip-10-0-0-66.us-east-2.compute.internal
14:03:22 onap onap-dbcl-db-1 1/1 Running 3 1h 10.42.221.10 ip-10-0-0-8.us-east-2.compute.internal
14:03:22 onap onap-dcae-bootstrap-x879p 0/1 Completed 0 1h 10.42.60.49 ip-10-0-0-8.us-east-2.compute.internal
14:03:22 onap onap-dcae-cloudify-manager-854dbcdb4b-24dtb 1/1 Running 0 1h 10.42.70.252 ip-10-0-0-210.us-east-2.compute.internal
14:03:23 onap onap-dcae-db-0 1/1 Running 0 1h 10.42.46.6 ip-10-0-0-66.us-east-2.compute.internal
14:03:23 onap onap-dcae-db-1 1/1 Running 1 1h 10.42.125.54 ip-10-0-0-210.us-east-2.compute.internal
14:03:23 onap onap-dcae-healthcheck-7779c4d877-nf754 1/1 Running 0 1h 10.42.213.124 ip-10-0-0-8.us-east-2.compute.internal
14:03:23 onap onap-dcae-redis-0 1/1 Running 0 1h 10.42.165.40 ip-10-0-0-8.us-east-2.compute.internal
14:03:23 onap onap-dcae-redis-1 1/1 Running 0 1h 10.42.186.109 ip-10-0-0-210.us-east-2.compute.internal
14:03:23 onap onap-dcae-redis-2 1/1 Running 0 1h 10.42.45.188 ip-10-0-0-66.us-east-2.compute.internal
14:03:23 onap onap-dcae-redis-3 1/1 Running 0 1h 10.42.163.26 ip-10-0-0-66.us-east-2.compute.internal
14:03:23 onap onap-dcae-redis-4 1/1 Running 0 50m 10.42.212.250 ip-10-0-0-210.us-east-2.compute.internal
14:03:23 onap onap-dcae-redis-5 0/1 CrashLoopBackOff 13 50m 10.42.144.148 ip-10-0-0-8.us-east-2.compute.internal
Thank you /michael
This message and the information contained herein is proprietary and confidential and subject to the Amdocs policy statement,
you may review at https://www.amdocs.com/about/email-disclaimer
Lusheng Ji
Michael,
Thank you very much for your help!
I tried to get into your system about 10 minutes ago to check on the 6th redis server, but noticed that you had redeployed the system: I only saw 3 redis containers, with the 4th in the ContainerCreating stage. So I waited a while and checked again.
Now you have a full cluster running. Congrats!
As an extra reference point for resource requirements: I am running a K8S cluster of 4 m2.xlarge VMs (8 cores/32G each) with a full ONAP (head of OOM master branch as of last night). All pods are in the Running state except for 1. The Rancher UI (10.12.5.79:8080) reports CPU usage of about 8 cores and memory usage of 112G.
Thanks,
Lusheng
Michal Ptacek
Good job guys, I am also making progress and see nearly the whole environment up (incl. DCAE).
The only remaining component with problems (apart from aaf, OOM-324) is AAI in my case; it looks like cassandra core dumped, I'm just analyzing it (some DNS problem).
I would really appreciate it if someone could comment on the following:
1) How can I recover/redeploy only a "sick" component? In Amsterdam each component could be redeployed easily because it lived in a separate environment; now we usually have a single environment when deployed according to the guide (cf. # helm ls -a).
2) I noticed there are more errors when I work on an rpm-based platform (e.g. CentOS or RHEL), even when using the same versions of everything (docker, kubectl, rancher, helm, ...). I thought rpm-based support was more important in the telco industry, so what are the main reasons we primarily work on Xenial? (What I found so far is that it's because of Rancher, but the same version of Rancher is also available for CentOS/RHEL.)
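For (1), one hedged approach under a single Helm release is to recycle just the sick component's pods and let its controller recreate them. The helper and the label value are assumptions of mine; check the component's actual labels first:

```shell
# recycle_cmd: build the kubectl command that deletes one component's pods
# so its Deployment/StatefulSet recreates them.
recycle_cmd() {
  # $1 = namespace, $2 = value of the component's "app" label
  printf 'kubectl delete pods -n %s -l app=%s\n' "$1" "$2"
}

# Live usage (label value is an assumption):
#   eval "$(recycle_cmd onap aai-cassandra)"
# Alternatively, toggle the sub-chart off and on via its enabled flag:
#   helm upgrade dev local/onap --set aai.enabled=false
#   helm upgrade dev local/onap --set aai.enabled=true
```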
thanks, Michal
Roger Maitland <Roger.Maitland@...>
Hi Michal,
Yes. OOM deploys a few containers, including the DCAE Cloudify Manager container (beijing-dcae-cloudify-manager-fb9f5d6bd-bss2n), which brings up the other containers by communicating directly with K8s. The DCAE health check has passed in an OOM deployment, and it is being used for Integration testing. Note that DCAE might deploy containers into another K8s namespace, so they might not be visible depending on how you did your query.
Cheers,