Main Page

From Essential

Welcome to my experimental WIKI.

CLOUD LAB

I want to share my LAB project.

[Diagram: Infocepo.drawio.png]

INFRA audit

I wrote the ServerDiff.sh script to audit servers. It lets you track configuration drift and check whether your environments are identical.
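ServerDiff.sh itself is not reproduced here. As a minimal sketch of the underlying idea, assume each server has already been reduced to a snapshot of item/value pairs (package versions, kernel, settings). Drift detection then comes down to comparing two snapshots key by key. The snapshot format, `diff_snapshots` and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical snapshot format: configuration item -> observed value.
Snapshot = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def diff_snapshots(reference: Snapshot, candidate: Snapshot) -> Drift:
    """Return every item whose value differs between two snapshots.

    An item present on only one side is reported with None on the other side.
    """
    drift: Drift = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref_value = reference.get(key)
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            drift[key] = (ref_value, cand_value)
    return drift


if __name__ == "__main__":
    # Hypothetical sample data, not real server output.
    prod = {"kernel": "5.14.0-362", "ntp_server": "10.0.0.1", "pkg:openssl": "3.0.7"}
    preprod = {"kernel": "5.14.0-284", "ntp_server": "10.0.0.1", "pkg:nginx": "1.22.1"}
    for item, (ref, cand) in diff_snapshots(prod, preprod).items():
        print(f"{item}: prod={ref} preprod={cand}")
```

An empty result means the two environments match on every collected item.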

CLOUD migration example

  • 1.5 days: physical and virtual target CLOUD architecture diagram
  • 1.5 days: physical compliance of the 2 CLOUDs (6 hypervisors, 6 TB of memory)
  • 1 day: installation of the 2 CLOUDs
  • 0.5 day: stability check
| ACTION | RESULT | OK/NOK |
| Disable all nodes but one (maintenance mode). | All resources are started. | |
| Activate all nodes, then power off a different node than in the previous test. | All resources are started. | |
| Power off all nodes simultaneously, then power them all on simultaneously. | All resources are started. | |
  • 1.5 days: CLOUD automation study
  • 1.5 days: 6 templates (2 CLOUDs, 2 OSes, 8 environments, 2 versions)
  • 1 day: migration diagram

[Diagram: Diagram-migration-ORACLE-KVM-v2.drawio.png]

  • 1.5 days: process stabilization
  • 1.5 days: CLOUD benchmark vs. the old INFRA
  • 0.5 day: calibration of the unavailability time per unit migration
  • 5 minutes (effective load): 82 VMs (env, OS, application_code, 2 IPs)
Total = 15 man-days
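As an arithmetic check, the itemized durations can be tallied. The sketch below copies the day counts from the list, shortens the labels, and leaves out the 5-minute step as negligible.

```python
from typing import Dict

# Day counts copied from the migration list; labels are shortened.
# The 5-minute effective-load step is left out as negligible.
PHASES: Dict[str, float] = {
    "target CLOUD architecture diagram": 1.5,
    "physical compliance of the 2 CLOUDs": 1.5,
    "installation of the 2 CLOUDs": 1.0,
    "stability check": 0.5,
    "CLOUD automation study": 1.5,
    "templates": 1.5,
    "migration diagram": 1.0,
    "process stabilization": 1.5,
    "CLOUD benchmark vs old INFRA": 1.5,
    "unavailability calibration": 0.5,
}


def total_days(phases: Dict[str, float]) -> float:
    """Sum the itemized durations, in days."""
    return sum(phases.values())


if __name__ == "__main__":
    print(f"Itemized total: {total_days(PHASES)} days")
```

With these figures the itemized phases add up to 12 days, so 3 of the stated 15 man-days are not broken out in the list.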

CLOUD improvement

[Diagram: WebModelDiagram.drawio.png]

  • Formalize your infrastructure as much as possible for more flexibility, lower complexity and less technology lock-in.
  • Use a name server that can answer according to your customers' location, such as GDNS.
  • Use minimal instances behind a network load balancer such as LVS. Monitor the overall load of your instances and add or remove instances dynamically as needed.
  • Alternatively, many providers offer dynamic computing services. Compare the prices, but beware of technology lock-in.
  • Use a very efficient, non-blocking TLS termination layer such as ATS.
  • Use a very fast HTTP cache such as VARNISH.
  • Use a large cache for large files, such as ATS.
  • ...
  • Use serverless services for standard runtimes such as Java, Python and PHP, but beware of certain incompatibilities and a lack of consistency over time.
  • ...
  • Whenever you need dynamic computing power, consider load balancing or the providers' native services (but be careful with provider-specific services!).
  • ...
  • Try to use open-source STACKs as much as possible.
  • ...
  • Use a cache such as MEMCACHED in front of your databases.
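The location-aware name server idea can be sketched as a lookup from the client's network to the closest site, answered with that site's address. This is a minimal illustration, not how GDNS is configured or implemented; the networks (RFC 5737 documentation ranges), site names and addresses are hypothetical. Note that an authoritative DNS server usually sees the recursive resolver's address rather than the end user's, so such a mapping is approximate.

```python
import ipaddress
from typing import Dict

# Hypothetical client networks (RFC 5737 documentation ranges) mapped to sites.
SITE_BY_NETWORK: Dict[ipaddress.IPv4Network, str] = {
    ipaddress.IPv4Network("192.0.2.0/24"): "site-a",
    ipaddress.IPv4Network("198.51.100.0/24"): "site-b",
}
# Hypothetical public addresses served for each site.
SITE_ADDRESS: Dict[str, str] = {"site-a": "203.0.113.10", "site-b": "203.0.113.20"}
DEFAULT_SITE = "site-a"


def resolve(client_ip: str) -> str:
    """Return the address of the site serving this client (longest-prefix match)."""
    addr = ipaddress.ip_address(client_ip)
    matches = [(net, site) for net, site in SITE_BY_NETWORK.items() if addr in net]
    if matches:
        _, site = max(matches, key=lambda match: match[0].prefixlen)
    else:
        site = DEFAULT_SITE  # unknown network: fall back to a default site
    return SITE_ADDRESS[site]


if __name__ == "__main__":
    for client in ("192.0.2.55", "198.51.100.7", "10.1.2.3"):
        print(client, "->", resolve(client))
```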
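The "monitor the overall load, add or remove instances" loop behind a load balancer such as LVS can be reduced to a sizing rule. A minimal sketch, assuming load is measured as one aggregate number (for example requests per second across all instances) and that each instance should carry at most a target amount. The function names, target and bounds are hypothetical, and the code only decides; it does not reconfigure LVS.

```python
import math


def desired_instances(total_load: float, target_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Smallest instance count keeping each instance at or under the target load."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(total_load / target_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(current: int, desired: int) -> str:
    """Describe the change to apply to the pool behind the load balancer."""
    if desired > current:
        return f"add {desired - current}"
    if desired < current:
        return f"remove {current - desired}"
    return "no change"


if __name__ == "__main__":
    # Hypothetical figures: 350 req/s in total, 100 req/s per instance, 3 running.
    want = desired_instances(total_load=350.0, target_per_instance=100.0)
    print(want, scaling_action(current=3, desired=want))
```

The `min_instances` and `max_instances` bounds keep the pool from shrinking to zero when idle or growing without limit on a spike; a real setup would usually also wait for a cool-down between actions.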
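Putting a cache such as MEMCACHED in front of a database is commonly done with the cache-aside pattern: read from the cache, and on a miss query the database and store the result with an expiry. In this sketch a small in-process class stands in for memcached so the example runs on its own; `TTLCache`, `cached_query` and the TTL value are hypothetical, and a real deployment would use a memcached client library instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcached-style key/value store with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def cached_query(cache: TTLCache, key: str, query: Callable[[], Any],
                 ttl: float = 60.0) -> Any:
    """Cache-aside read: run the database query only on a cache miss.

    A stored value of None is indistinguishable from a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = query()  # e.g. a SELECT against the database
        cache.set(key, value, ttl)
    return value
```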

CLOUD vs HW

| Function | KUBERNETES | OPENSTACK | AWS | Bare-metal | HPC | CRM | OVIRT |
| DEPLOY | HELM/ANSIBLE/SH | TERRAFORM/ANSIBLE/SH/JUJU | TERRAFORM/CLOUDFORMATION/ANSIBLE/JUJU | ANSIBLE/SH | XCAT/CLUSH | ANSIBLE/SH | ANSIBLE/PYTHON/SH |
| BOOTSTRAP | API/CLI | PXE/API/CLI | API/CLI | PXE/IPMI | PXE/IPMI | PXE/IPMI | PXE/API |
| Router | API/CLI (kube-router) | API/CLI (router/subnet) | API/CLI (Route table/subnet) | LINUX/OVS/external | XCAT/external | LINUX/external | API |
| Firewall | INGRESS/EGRESS/ISTIO | API/CLI (Security groups) | API/CLI (Security group) | LINUX (NFT) | LINUX (NFT) | LINUX (NFT) | API |
| VLAN | DANM | API/CLI (VPC) | API/CLI (VPC) | OVS/LINUX/external | XCAT/external | LINUX/external | API |
| Name server | coredns | dns-nameserver | Amazon Route 53 | GDNS | XCAT | LINUX/external | API/external |
| Load balancer | kube-proxy/LVS (IPVS) | LVS | Network Load Balancer | LVS | SLURM | Ldirectord | |
| Storage | many | SWIFT/CINDER/NOVA | S3/EFS/FSX/EBS | OPENSTACK SWIFT/XFS/EXT4/RAID10 | GPFS | SAN | NFS/SAN |

CLOUD REF

CLOUD providers

Infrastructure example

IT salaries

REDHAT package browser

HA COROSYNC+PACEMAKER

Typical architecture

  • 2 rooms
  • 2 power supplies
  • 2 FC per server (SAN, active/active)
  • 2 × 10 Gbit/s Ethernet per server (active/passive; active/active is possible if PXE runs on native VLAN 0)
  • IPMI VLAN (for fencing)
  • ADMIN VLAN (admin, provisioning, heartbeat), which must be the native VLAN if nodes bootstrap by PXE
  • USER VLAN (application services)
  • NTP
  • DNS+DHCP+PXE+TFTP+HTTP for auto-provisioning
  • PROXY (for updates), or else an internal REPOSITORY
  • Choose between a 2-node cluster and a cluster with more nodes.
  • For a 2-node architecture, you need a 2-node configuration in COROSYNC, and you must configure a staggered 10-second fencing delay for one of the nodes (otherwise the cluster becomes unstable).
  • Resources are stateless.
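The two 2-node precautions can be illustrated with a toy model. With plain majority voting, a 2-node cluster needs both votes, so losing one node loses quorum; corosync's votequorum provides a `two_node` option for this case. Reading the staggered 10-second delay as a fencing delay (an assumption): when the nodes lose sight of each other, each tries to fence the other, and a delay before one specific node can be fenced means that node wins the race instead of both nodes being shot at once. The functions below are a hypothetical model of that reasoning, not corosync or Pacemaker code.

```python
from typing import Dict, Set


def majority_quorum(total_votes: int) -> int:
    """Votes needed for a strict majority: floor(n / 2) + 1."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int, two_node: bool = False) -> bool:
    """Toy quorum check; two_node models a 2-node cluster staying quorate on one vote."""
    if two_node and total_votes == 2:
        return votes_present >= 1
    return votes_present >= majority_quorum(total_votes)


def fence_race(fence_delay: Dict[str, float]) -> Set[str]:
    """Both nodes try to fence each other at t=0; return the surviving nodes.

    Expects exactly two nodes. fence_delay[node] is the delay in seconds before
    that node can be fenced. The first fencing action to complete kills its
    target, which can then no longer fence its peer. Equal delays model the
    worst case, where both nodes are shot.
    """
    (node_a, delay_a), (node_b, delay_b) = fence_delay.items()
    if delay_a > delay_b:
        return {node_a}  # node_b is fenced first
    if delay_b > delay_a:
        return {node_b}  # node_a is fenced first
    return set()


if __name__ == "__main__":
    print(majority_quorum(2), is_quorate(1, 2), is_quorate(1, 2, two_node=True))
    print(fence_race({"node1": 10.0, "node2": 0.0}))
```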

For DB resources, plan on 4 GB per database in general, and double that for a 2-node cluster (to survive the loss of one node). CPU requirements are generally modest. Tip: for time-critical compression, use PZSTD.
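A worked example of the sizing rule, read literally: 4 GB per database, doubled on a 2-node cluster, so 10 databases give 40 GB, or 80 GB on a 2-node cluster. No figure is given for larger clusters, so this sketch simply returns the general figure for them; the function name is hypothetical.

```python
GB_PER_DATABASE = 4  # general rule of thumb from the paragraph above


def db_sizing_gb(databases: int, cluster_nodes: int) -> int:
    """Return the sizing figure in GB for a number of databases."""
    base = GB_PER_DATABASE * databases
    # 2-node cluster: double it so the cluster can absorb the loss of one node.
    # No rule is given for other cluster sizes, so they keep the general figure.
    return base * 2 if cluster_nodes == 2 else base


if __name__ == "__main__":
    print(db_sizing_gb(10, cluster_nodes=1), db_sizing_gb(10, cluster_nodes=2))
```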

Typical service pattern

  • MULTIPATH
  • LUN
  • LVM (LVM resource)
  • FS (FS resource)
  • NFS (FS resource)
  • USER
  • IP (IP resource)
  • DNS name
  • PROCESS (PROCESS resource)
  • LISTENER (LISTENER resource)
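In Pacemaker, a stack like this is commonly expressed as a resource group, whose members start in the listed order and stop in reverse order. The sketch below models only that ordering, plus a simple rollback of already-started layers when one fails to start (an assumption, not Pacemaker's exact recovery behavior). It does not talk to Pacemaker; the `start`/`stop` callbacks stand in for hypothetical resource agents.

```python
from typing import Callable, List

# Layers of the typical service pattern, bottom (storage) to top (listener).
SERVICE_STACK: List[str] = [
    "MULTIPATH", "LUN", "LVM", "FS", "NFS", "USER", "IP", "DNS name", "PROCESS", "LISTENER",
]


def start_group(stack: List[str], start: Callable[[str], bool],
                stop: Callable[[str], None]) -> bool:
    """Start layers bottom-up; on the first failure, stop the started ones top-down."""
    started: List[str] = []
    for layer in stack:
        if not start(layer):
            for running in reversed(started):
                stop(running)
            return False
        started.append(layer)
    return True


def stop_group(stack: List[str], stop: Callable[[str], None]) -> None:
    """Stop layers top-down, the reverse of the start order."""
    for layer in reversed(stack):
        stop(layer)


if __name__ == "__main__":
    def start(layer: str) -> bool:
        print(f"start {layer}")
        return True

    def stop(layer: str) -> None:
        print(f"stop {layer}")

    if start_group(SERVICE_STACK, start, stop):
        stop_group(SERVICE_STACK, stop)
```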