Main Page: Difference between revisions

[[File:Infocepo-picture.png|thumb|right|Discover cloud and AI on infocepo.com]]


= infocepo.com – Cloud, AI & Labs =


Welcome to the '''infocepo.com''' portal.


This wiki is intended for system administrators, cloud engineers, developers, students, and enthusiasts who want to:


* Understand modern architectures (Kubernetes, OpenStack, bare-metal, HPC…)
* Deploy private AI assistants and productivity tools
* Build hands-on labs to learn by doing
* Prepare large-scale audits, migrations, and automations


The goal: turn theory into '''reusable scripts, diagrams, and architectures'''.
----


= Getting started quickly =


== Recommended paths ==


; 1. Build a private AI assistant
* Deploy a typical stack: '''Open WebUI + Ollama + GPU''' (H100 or consumer-grade GPU)
* Add a chat model and a summarization model
* Integrate internal data (RAG, embeddings)
; 2. Launch a Cloud lab
* Create a small cluster (Kubernetes, OpenStack, or bare-metal)
* Set up a deployment pipeline (Helm, Ansible, Terraform…)
* Add an AI service (transcription, summarization, chatbot…)
 
; 3. Prepare an audit / migration
* Inventory servers with '''ServerDiff.sh'''
* Design the target architecture (cloud diagrams)
* Automate the migration with reproducible scripts
 
== Content overview ==
 
* '''AI guides & tools''': assistants, models, evaluations, GPUs
* '''Cloud & infrastructure''': HA, HPC, web-scale, DevSecOps
* '''Labs & scripts''': audit, migration, automation
* '''Comparison tables''': Kubernetes vs OpenStack vs AWS vs bare-metal, etc.
 
----
 
= Future =
[[File:Automation-full-vs-humans.png|thumb|right|The world after automation]]
 
= AI Assistants & Cloud Tools =
 
== AI Assistants ==
 
; '''ChatGPT'''
* [https://chatgpt.com ChatGPT] – Public conversational assistant, suited for exploration, writing, and rapid experimentation.
 
; '''Self-hosted AI assistants'''
* [https://github.com/open-webui/open-webui Open WebUI] + [https://www.scaleway.com/en/h100-pcie-try-it-now/ H100 GPU] + [https://ollama.com Ollama]
: Typical stack for private assistants, self-hosted LLMs, and OpenAI-compatible APIs.
* [https://github.com/ynotopec/summarize Private summary] – Local, fast, offline summarizer for your own data.
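As a sketch, the Open WebUI + Ollama stack can be brought up with Docker. The image names come from the upstream projects; the GPU flag assumes the NVIDIA container toolkit is installed, and the model choices are illustrative:

```shell
# Ollama serves models on port 11434 (upstream default).
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull one chat model and one summarization-capable model (example choices).
docker exec ollama ollama pull llama3
docker exec ollama ollama pull qwen3

# Open WebUI, pointed at the Ollama API.
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```

Open WebUI is then reachable on http://localhost:3000, and Ollama also exposes an OpenAI-compatible API under its /v1 path.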
 
== Development, models & tracking ==
 
; '''Discovering and tracking models'''
* [https://ollama.com/library LLM Trending] – Model library (chat, code, RAG…) for local deployment.
* [https://huggingface.co/models Models Trending] – Model marketplace, filterable by task, size, and license.
* [https://huggingface.co/models?pipeline_tag=image-text-to-text&sort=trending Img2txt Trending] – Vision-language models (image → text).
* [https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena Txt2img Evaluation] – Image generation model comparisons.
 
; '''Evaluation & benchmarks'''
* [https://lmarena.ai/leaderboard ChatBot Evaluation] – Chatbot rankings (open-source and proprietary models).
* [https://huggingface.co/spaces/mteb/leaderboard Embedding Leaderboard] – Benchmark of embedding models for RAG and semantic search.
* [https://ann-benchmarks.com Vectors DB Ranking] – Vector database comparison (latency, memory, features).
* [https://top500.org/lists/green500/ HPC Efficiency] – Ranking of the most energy-efficient supercomputers.
 
; '''Development & fine-tuning tools'''
* [https://github.com/search?q=stars%3A%3E15000+forks%3A%3E1500+created%3A%3E2022-06-01&type=repositories&s=updated&o=desc Project Trending] – Major recent open-source projects, sorted by popularity and activity.
* [https://github.com/hiyouga/LLaMA-Factory LLM Fine Tuning] – Advanced framework for LLM fine-tuning (instruction tuning, LoRA, etc.).
* [https://www.perplexity.ai Perplexity AI] – Search and synthesis assistant, positioned as a "research copilot".
 
== AI Hardware & GPUs ==
 
; '''GPUs & accelerators'''
* [https://www.nvidia.com/en-us/data-center/h100/ NVIDIA H100] – Datacenter GPU for Kubernetes clusters and intensive AI workloads.
* NVIDIA GeForce RTX 5080 – Consumer GPU for lower-cost private LLM deployments.
* [https://www.mouser.fr/ProductDetail/BittWare/RS-GQ-GC1-0109?qs=ST9lo4GX8V2eGrFMeVQmFw%3D%3D GROQ LLM accelerator] – Hardware accelerator dedicated to LLM inference.
 
----
 
= Open models & internal endpoints =
 
''(Last update: 2026-02-13)''
 
The models below correspond to '''logical endpoints''' (for example via a proxy or gateway), selected for specific use cases.
 
{| class="wikitable"
! Endpoint !! Description / Primary use case
|-
| '''ai-chat''' || Based on '''gpt-oss-20b''' – General-purpose chat, good cost / quality balance.
|-
| '''ai-translate''' || gpt-oss-20b, temperature = 0 – Deterministic, reproducible translation (FR, EN, other languages).
|-
| '''ai-summary''' || qwen3 – Model optimized for summarizing long texts (reports, documents, transcriptions).
|-
| '''ai-code''' || gpt-oss-20b – Code reasoning, explanation, and refactoring.
|-
| '''ai-code-completion''' || gpt-oss-20b – Fast code completion, designed for IDE auto-completion.
|-
| '''ai-parse''' || qwen3 – Structured extraction, log / JSON / table parsing.
|-
| '''ai-RAG-FR''' || qwen3 – RAG usage in French (business knowledge, internal FAQs).
|-
| '''gpt-oss-20b''' || Agentic tasks.
|}
Usage idea: each endpoint is associated with one or more labs (chat, summary, parsing, RAG, etc.) in the Cloud Lab section.
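As an illustration, a logical endpoint such as '''ai-translate''' could be called through any OpenAI-compatible gateway. The URL and API key below are placeholders, not real infrastructure:

```shell
# Hypothetical gateway URL; "ai-translate" maps to gpt-oss-20b per the table above.
# temperature=0 is what makes the translation deterministic and reproducible.
curl -s https://ai-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai-translate",
        "temperature": 0,
        "messages": [{"role": "user", "content": "Translate to English: Bonjour le monde"}]
      }'
```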


----
= News & Trends =


* [https://www.youtube.com/@lev-selector/videos Top AI News] – Curated AI news videos.
* [https://betterprogramming.pub/color-your-captions-streamlining-live-transcriptions-with-diart-and-openais-whisper-6203350234ef Real-time transcription with Diart + Whisper] – Example of real-time transcription with speaker detection.
* [https://github.com/openai-translator/openai-translator OpenAI Translator] – Modern extension / client for LLM-assisted translation.
* [https://opensearch.org/docs/latest/search-plugins/conversational-search OpenSearch with LLM] – Conversational search based on LLMs and OpenSearch.


----


= Training & Learning =


* [https://www.youtube.com/watch?v=4Bdc55j80l8 Transformers Explained] – Introduction to Transformers, the core architecture of LLMs.
* Hands-on labs, scripts, and real-world feedback in the [[LAB project|CLOUD LAB]] project below.


----
 
= Cloud Lab & Audit Projects =
 
[[File:Infocepo.drawio.png|400px|Cloud Lab reference diagram]]
 
The '''Cloud Lab''' provides reproducible scenarios: infrastructure audits, cloud migration, automation, high availability.
 
== Audit project – Cloud Audit ==
 
; '''[[ServerDiff.sh]]'''
Bash audit script to:
 
* detect configuration drift,
* compare multiple environments,
* prepare a migration or remediation plan.
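The idea can be sketched in a few lines of portable shell. File names and facts below are illustrative; the real [[ServerDiff.sh]] collects its data from live servers:

```shell
# drift_check: compare two normalized fact snapshots, in the spirit of ServerDiff.sh.
drift_check() {
  sort "$1" >/tmp/_a.norm    # normalize: one "key=value" fact per line, sorted,
  sort "$2" >/tmp/_b.norm    # so ordering differences are not reported as drift
  if diff -u /tmp/_a.norm /tmp/_b.norm; then
    echo "no drift"
  else
    echo "drift detected"
  fi
}

# demo: two hosts agree on the kernel but not on the nginx package version
printf 'kernel=6.1\npkg=nginx-1.24\n' >/tmp/hostA.facts
printf 'kernel=6.1\npkg=nginx-1.25\n' >/tmp/hostB.facts
drift_check /tmp/hostA.facts /tmp/hostB.facts   # prints a unified diff, then "drift detected"
```

The normalization step is what makes the comparison usable at scale: once every fact is one sorted line, plain diff is enough to compare whole environments.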
 
== Example of Cloud migration ==
 
[[File:Diagram-migration-ORACLE-KVM-v2.drawio.png|400px|Cloud migration diagram]]
 
Example: migration of virtual environments to a modernized cloud, including audit, architecture design, and automation.
 
{| class="wikitable"
! Task !! Description !! Duration (days)
|-
| Infrastructure audit || 82 services, automated audit via '''ServerDiff.sh''' || 1.5
|-
| Cloud architecture diagram || Visual design and documentation || 1.5
|-
| Compliance checks || 2 clouds, 6 hypervisors, 6 TB of RAM || 1.5
|-
| Cloud platform installation || Deployment of main target environments || 1.0
|-
| Stability verification || Early functional tests || 0.5
|-
| Automation study || Identification and automation of repetitive tasks || 1.5
|-
| Template development || 6 templates, 8 environments, 2 clouds / OS || 1.5
|-
| Migration diagram || Illustration of the migration process || 1.0
|-
| Migration code writing || 138 lines (see '''MigrationApp.sh''') || 1.5
|-
| Process stabilization || Validation that migration is reproducible || 1.5
|-
| Cloud benchmarking || Performance comparison vs legacy infrastructure || 1.5
|-
| Downtime tuning || Calculation of outage time per migration || 0.5
|-
| VM loading || 82 VMs: OS, code, 2 IPs per VM || 0.1
|-
! colspan=2 align="right"| '''Total''' !! 15 person-days
|}


=== Stability checks (minimal HA) ===

{| class="wikitable"
! Action !! Expected result
|-
| Shutdown of one node || All services must automatically restart on remaining nodes.
|-
| Simultaneous shutdown / restart of all nodes || All services must recover correctly after reboot.
|}

----

= Web Architecture & Best Practices =
[[File:WebModelDiagram.drawio.png|400px|Reference web architecture]]
 
Principles for designing scalable and portable web architectures:
 
* Favor '''simple, modular, and flexible''' infrastructure.
* Follow client location (GDNS or equivalent) to bring content closer.
* Use network load balancers (LVS, IPVS) for scalability.
* Systematically compare costs and beware of '''vendor lock-in'''.
* TLS:
** HAProxy for fast frontends,
** Envoy for compatibility and advanced use cases (mTLS, HTTP/2/3).
* Caching:
** Varnish, Apache Traffic Server for large content volumes.
* Favor open-source stacks and database caches (e.g., Memcached).
* Use message queues, buffers, and quotas to smooth traffic spikes.
* For complete architectures:
** [https://wikitech.wikimedia.org/wiki/Wikimedia_infrastructure Wikimedia Cloud Architecture]
** [https://github.com/systemdesign42/system-design System Design GitHub]
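For the LVS layer, a minimal sketch of one virtual service with two real servers (documentation-range IPs, NAT mode; a real setup also needs health checks, e.g. via Keepalived):

```shell
# Declare a TCP virtual service on the VIP, round-robin scheduling.
ipvsadm -A -t 203.0.113.10:80 -s rr
# Attach two real servers in masquerade (NAT) mode.
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.11:80 -m
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -m
# Inspect the resulting IPVS table.
ipvsadm -L -n
```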
 
----
 
= Comparison of major Cloud platforms =
 
{| class="wikitable"
! Feature !! Kubernetes !! OpenStack !! AWS !! Bare-metal !! HPC !! CRM !! oVirt
|-
| '''Deployment tools''' || Helm, YAML, ArgoCD, Juju || Ansible, Terraform, Juju || CloudFormation, Terraform, Juju || Ansible, Shell || xCAT, Clush || Ansible, Shell || Ansible, Python
|-
| '''Bootstrap method''' || API || API, PXE || API || PXE, IPMI || PXE, IPMI || PXE, IPMI || PXE, API
|-
| '''Router control''' || Kube-router || Router/Subnet API || Route Table / Subnet API || Linux, OVS || xCAT || Linux || API
|-
| '''Firewall control''' || Istio, NetworkPolicy || Security Groups API || Security Group API || Linux firewall || Linux firewall || Linux firewall || API
|-
| '''Network virtualization''' || VLAN, VxLAN, others || VPC || VPC || OVS, Linux || xCAT || Linux || API
|-
| '''DNS''' || CoreDNS || DNS-Nameserver || Route 53 || GDNS || xCAT || Linux || API
|-
| '''Load Balancer''' || Kube-proxy, LVS || LVS || Network Load Balancer || LVS || SLURM || Ldirectord || N/A
|-
| '''Storage options''' || Local, Cloud, PVC || Swift, Cinder, Nova || S3, EFS, EBS, FSx || Swift, XFS, EXT4, RAID10 || GPFS || SAN || NFS, SAN
|}


This table serves as a starting point for choosing the right stack based on:

* Desired level of control (API vs bare-metal),
* Context (on-prem, public cloud, HPC, CRM…),
* Existing automation tooling.

----
 
= Useful Cloud & IT links =
 
* [https://cloud.google.com/free/docs/aws-azure-gcp-service-comparison Cloud Providers Compared] – AWS / Azure / GCP service mapping.
* [https://global-internet-map-2021.telegeography.com/ Global Internet Topology Map] – Global Internet mapping.
* [https://landscape.cncf.io/?fullscreen=yes CNCF Official Landscape] – Overview of cloud-native projects (CNCF).
* [https://wikitech.wikimedia.org/wiki/Wikimedia_infrastructure Wikimedia Cloud Wiki] – Wikimedia infrastructure, a real large-scale example.
* [https://openapm.io OpenAPM – SRE Tools] – APM / observability tooling.
* [https://access.redhat.com/downloads/content/package-browser Red Hat Package Browser] – Package and version search at Red Hat.
* [https://www.silkhom.com/barometre-2021-des-tjm-dans-informatique-digital Freelance daily rates] – Barometer of IT freelance daily rates.
* [https://www.glassdoor.fr/salaire/Hays-Salaires-E10166.htm IT Salaries (Glassdoor)] – Salary indicators.
 
----
 
= Advanced: High Availability, HPC & DevSecOps =
 
== High Availability with Corosync & Pacemaker ==
 
[[File:HA-REF.drawio.png|400px|HA cluster architecture]]
 
Basic principles:
 
* Multi-node or multi-site clusters for redundancy.
* IPMI for fencing; auto-provisioning via PXE, DHCP, DNS, TFTP, and NTP.
* For a 2-node cluster:
** carefully sequence fencing (e.g., stagger shutdown of one node by ~10 seconds) to avoid split-brain,
** 3 or more nodes remain recommended for production.
 
=== Common resource patterns ===
 
* Multipath storage, LUNs, LVM, NFS.
* User resources and application processes.
* Virtual IPs, DNS records, network listeners.
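A minimal sketch of such resources with the standard pcs CLI. Names, IPs, and devices are placeholders, and fence-agent parameter names vary by agent version:

```shell
# IPMI fencing for one node (parameter names per recent fence_ipmilan releases).
pcs stonith create fence-node1 fence_ipmilan ip=10.0.0.101 username=admin password=secret pcmk_host_list=node1

# A filesystem and a virtual IP, kept together and started in order.
pcs resource create webfs ocf:heartbeat:Filesystem device=/dev/vg0/web directory=/srv/web fstype=xfs
pcs resource create vip ocf:heartbeat:IPaddr2 ip=203.0.113.20 cidr_netmask=24 op monitor interval=30s
pcs constraint colocation add vip with webfs
pcs constraint order webfs then vip
```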
 
== HPC ==
 
[[File:HPC.drawio.png|400px|Overview of an HPC cluster]]
 
* Job orchestration (SLURM or equivalent).
* High-performance shared storage (GPFS, Lustre…).
* Possible integration with AI workloads (large-scale training, GPU inference).
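For job orchestration, a minimal SLURM batch script; the job name, resource requests, and training script are illustrative:

```shell
#!/bin/sh
#SBATCH --job-name=gpu-train       # illustrative job name
#SBATCH --nodes=1
#SBATCH --gres=gpu:1               # request one GPU on one node
#SBATCH --time=01:00:00            # wall-clock limit
# train.py is a hypothetical workload; srun launches it under SLURM's control.
srun python3 train.py
```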
 
== DevSecOps ==


[[File:DSO-POC-V3.drawio.png|400px|DevSecOps reference design]]

* CI/CD pipelines with built-in security checks (linting, SAST, DAST, SBOM).
* Observability (logs, metrics, traces) integrated from design time.
* Automated vulnerability scanning, secret management, policy-as-code.
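A minimal security-gate stage for such a pipeline could look like this; the tool choices (ShellCheck, Trivy, Syft) are common examples, not a prescribed stack:

```shell
#!/bin/sh
set -e                                              # fail the pipeline on the first error
shellcheck scripts/*.sh                             # lint shell code
trivy fs --exit-code 1 --severity HIGH,CRITICAL .   # fail on serious vulnerabilities
syft dir:. -o spdx-json > sbom.json                 # generate an SBOM for the build
```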


----


= About & Contributions =


For more examples, scripts, diagrams, and feedback, see:


* [https://infocepo.com infocepo.com]


Suggestions for corrections, diagram improvements, or new labs are welcome. 
This wiki aims to remain a '''living laboratory''' for AI, cloud, and automation.
