These are Antonio's notes.
<pre>
== Useful links ==
#https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07
== Wiki ==
http://localhost:8585/index.php/Notes_antonio


==AS2 link test==
*Configure the test_as2 link in EDISEND from WebAccess

(echo "TEST GENERIX (<User>-<DEV|PROD>)">>~/work/test.txt)

MyAS2Name=RICARD_AS2
MyEntry="NAME=test_as2" ;MyBase="edisend" ;MyDump=~/work/${MyBase}.${MyEntry}.$(date '+%Y%m%d-%H').dump;logview -ls ~/$MyBase -f "${MyEntry}" -D $MyDump;grep -e "^NAME=" $MyDump;ls -l $MyDump
logchange -s ~/edisend -f "NAME=test_as2" -v "RECIPIENT=${MyAS2Name}" ;sleep 1 ;edisend test_as2 ~/work/test.txt
logchange -s ~/$MyBase -f "NAME=test_as2" -V $MyDump ;rm -f $MyDump

(MyAS2Name=DANONE_FRANCE_AS2 ;logchange -s ~/edisend -f "NAME=test_as2" -v "RECIPIENT=${MyAS2Name}" ;edisend test_as2 ~/work/test.txt)

*Check the result in the WebAccess SYSLOG
==Firewall configuration==

*When the platform is not TRADEXPRESS (for example HUBXPRESS), send the IPs and ports to the BO; otherwise:
*Connect to the TOD-AS2-<DEV|PROD> server
*Back up /etc/httpd/acl-in

cd /etc/httpd; cp -p acl-in acl-in.$(date '+%Y%m%d-%H') && vi acl-in;diff acl-in acl-in.$(date '+%Y%m%d-%H')

*Add the IP to /etc/httpd/acl-in
*Send an information mail to the SAV:
We inform you of "root" actions on the following servers:
tod-as2-pp1, tod-as2-prod1
*Restart the httpd service:
service httpd restart
*Mail
Authorization added for address 93.174.36.82 for the AS2 links of the "H1.fbdgroup" production environment.
==Export dump==

MyEntry="NAME=IMPORT_CLIENTS" ;MyBase="edisend" ;MyDump=~/work/${MyBase}.${MyEntry}.$(date '+%Y%m%d-%H').dump;logview -ls ~/$MyBase -f "${MyEntry}" -D $MyDump;grep -e "^NAME=" $MyDump;ls -l $MyDump
==Import dump==
MyPP=tod-central-pp2
scp h2-$(echo ${USER}|sed "s/^h[123]-//g")@${MyPP}:~/work/*$(date '+%Y%m%d')*.dump ~/work/.;ls -t1 ~/work/*$(date '+%Y%m%d')*.dump


Fine Tune Large Language Model (LLM) on a Custom Dataset with QLoRA
Suman Das
Jan 25, 2024

Fine-Tuning LLM

The field of natural language processing has been revolutionized by large language models (LLMs), which showcase advanced capabilities and sophisticated solutions. Trained on extensive text datasets, these models excel in tasks like text generation, translation, summarization, and question-answering. Despite their power, LLMs may not always align with specific tasks or domains.

In this tutorial, we will explore how fine-tuning LLMs can significantly improve model performance, reduce training costs, and enable more accurate and context-specific results.
What is LLM Fine-tuning?

Fine-tuning an LLM means giving a pre-existing model, which has already acquired patterns and features from an extensive dataset, additional training on a smaller, domain-specific dataset. In the context of “LLM Fine-Tuning,” LLM denotes a “Large Language Model,” such as the GPT series by OpenAI. This approach matters because training a large language model from the ground up is highly resource-intensive in terms of both computational power and time. Reusing the knowledge already embedded in the pre-trained model allows high performance on specific tasks with substantially reduced data and computational requirements.

Below are some of the key steps involved in LLM fine-tuning:

    Select a pre-trained model: The first step is to carefully select a base pre-trained model that aligns with our desired architecture and functionalities. Pre-trained models are general-purpose models that have been trained on a large corpus of unlabeled data.
    Gather a relevant dataset: Then we need to gather a dataset that is relevant to our task. The dataset should be labeled or structured in a way that the model can learn from it.
    Preprocess the dataset: Once the dataset is ready, we need to preprocess it for fine-tuning by cleaning it, splitting it into training, validation, and test sets, and ensuring it is compatible with the model we want to fine-tune.
    Fine-tuning: After selecting a pre-trained model, we fine-tune it on our preprocessed dataset, which is specific to the task at hand. The dataset we select might be related to a particular domain or application, allowing the model to adapt and specialize for that context.
    Task-specific adaptation: During fine-tuning, the model’s parameters are adjusted based on the new dataset, helping it better understand and generate content relevant to the specific task. This process retains the general language knowledge gained during pre-training while tailoring the model to the nuances of the target domain.

Fine-tuning LLMs is commonly used in natural language processing tasks such as sentiment analysis, named entity recognition, summarization, translation, or any other application where understanding context and generating coherent language is crucial. It helps leverage the knowledge encoded in pre-trained models for more specialized and domain-specific tasks.
Fine-tuning methods

Fine-tuning a Large Language Model (LLM) involves a supervised learning process. In this method, a dataset comprising labeled examples is utilized to adjust the model’s weights, enhancing its proficiency in specific tasks. Now, let’s delve into some noteworthy techniques employed in the fine-tuning process.
 
    Full Fine Tuning (Instruction fine-tuning): Instruction fine-tuning is a strategy to enhance a model’s performance across various tasks by training it on examples that guide its responses to queries. The choice of the dataset is crucial and tailored to the specific task, such as summarization or translation. This approach, known as full fine-tuning, updates all model weights, creating a new version with improved capabilities. However, it demands sufficient memory and computational resources, similar to pre-training, to handle the storage and processing of gradients, optimizers, and other components during training.
    Parameter Efficient Fine-Tuning (PEFT) is a form of instruction fine-tuning that is much more efficient than full fine-tuning. Training a language model, especially for full LLM fine-tuning, demands significant computational resources. Memory allocation is not only required for storing the model but also for essential parameters during training, presenting a challenge for simple hardware. PEFT addresses this by updating only a subset of parameters, effectively “freezing” the rest. This reduces the number of trainable parameters, making memory requirements more manageable and preventing catastrophic forgetting. Unlike full fine-tuning, PEFT maintains the original LLM weights, avoiding the loss of previously learned information. This approach proves beneficial for handling storage issues when fine-tuning for multiple tasks. There are various ways of achieving Parameter efficient fine-tuning. Low-Rank Adaptation LoRA & QLoRA are the most widely used and effective.
 
What is LoRA?
 
LoRA is an improved finetuning method where instead of finetuning all the weights that constitute the weight matrix of the pre-trained large language model, two smaller matrices that approximate this larger matrix are fine-tuned. These matrices constitute the LoRA adapter. This fine-tuned adapter is then loaded into the pre-trained model and used for inference.
 
After LoRA fine-tuning for a specific task or use case, the outcome is an unchanged original LLM and the emergence of a considerably smaller “LoRA adapter,” often representing a single-digit percentage of the original LLM size (in MBs rather than GBs).
 
During inference, the LoRA adapter must be combined with its original LLM. The advantage lies in the ability of many LoRA adapters to reuse the original LLM, thereby reducing overall memory requirements when handling multiple tasks and use cases.
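To make the low-rank idea concrete, here is a minimal, self-contained sketch (not from the original article; the class name and dimensions are illustrative) of how a LoRA adapter adds a trainable low-rank update on top of a frozen linear layer:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Illustrative sketch: wrap a frozen nn.Linear with a trainable low-rank update.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # original weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r                         # LoRA scales the update by alpha/r

    def forward(self, x):
        # frozen projection + scaled low-rank update x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8, alpha=16)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B are trainable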
What is Quantized LoRA (QLoRA)?
 
QLoRA represents a more memory-efficient iteration of LoRA. QLoRA takes LoRA a step further by also quantizing the weights of the LoRA adapters (smaller matrices) to lower precision (e.g., 4-bit instead of 8-bit). This further reduces the memory footprint and storage requirements. In QLoRA, the pre-trained model is loaded into GPU memory with quantized 4-bit weights, in contrast to the 8-bit used in LoRA. Despite this reduction in bit precision, QLoRA maintains a comparable level of effectiveness to LoRA.
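For reference, a QLoRA-style quantization configuration in transformers/bitsandbytes looks like the snippet below; enabling double quantization is a variation shown here only for comparison, while the configuration actually used later in this tutorial keeps it disabled:

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization enabled (QLoRA-paper style).
qlora_style_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)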
 
In this tutorial, we will use Parameter-efficient fine-tuning with QLoRA.
 
Now let’s explore how we can fine-tune LLM on a custom dataset using QLoRA on a single GPU.
 
    Setting up the NoteBook
    Install required libraries
    Loading dataset
    Create Bitsandbytes configuration
    Loading the Pre-Trained model
    Tokenization
    Test the Model with Zero Shot Inferencing
    Pre-processing dataset
    Preparing the model for QLoRA
    Setup PEFT for Fine-Tuning
    Train PEFT Adapter
    Evaluate the Model Qualitatively (Human Evaluation)
    Evaluate the Model Quantitatively (with ROUGE Metric)
 
1. Setting up the NoteBook.
 
While we will utilize a Kaggle notebook for this demonstration, feel free to use any Jupyter notebook environment. Kaggle offers a generous allowance of 30 hours of free GPU usage per week, which is ample for our experimentation. To begin, let’s open a new notebook, establish some headings, and then proceed to connect to the runtime.
notebook-with-headings
 
Here, we will select the GPU P100 as the ACCELERATOR. Feel free to try other GPU options available in Kaggle or any other environment.
 
In this tutorial, we will be using HuggingFace libraries to download and train the model. To download models from HuggingFace, we will need an Access Token. If you’ve already signed up with HuggingFace, you can generate a new Access Token from the settings section or use any existing Access Token.
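If you prefer to authenticate non-interactively rather than through the interactive login used later in this notebook, one option is to pass the token programmatically; the HF_TOKEN variable name below is an assumption, not something defined by the article:

import os
from huggingface_hub import login

# Assumes the access token was exported beforehand, e.g. export HF_TOKEN=hf_xxx
login(token=os.environ["HF_TOKEN"])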
2. Install required libraries
 
Now, let’s install the necessary libraries for this experiment.
 
!pip install -q -U bitsandbytes transformers peft accelerate datasets scipy einops evaluate trl rouge_score
 
Let’s understand the importance of some of these libraries.
 
    Bitsandbytes: An excellent package that provides a lightweight wrapper around custom CUDA functions that make LLMs go faster — optimizers, matrix multiplication, and quantization. In this tutorial, we’ll be using this library to load our model as efficiently as possible.
    transformers: A library by Hugging Face (🤗) that provides pre-trained models and training utilities for various natural language processing tasks.
    peft: A library by Hugging Face (🤗) that enables parameter-efficient fine-tuning.
    accelerate: Accelerate abstracts exactly and only the boilerplate code related to multi-GPUs/TPU/fp16 and leaves the rest of your code unchanged.
    datasets: Another library by Hugging Face (🤗) that provides easy access to a wide range of datasets.
    einops: A library that simplifies tensor operations.
 
Loading the required libraries
 
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login
 
interpreter_login()
 
For this tutorial we are not going to track our training metrics, so let’s disable Weights and Biases. The W&B Platform constitutes a fundamental collection of robust components for monitoring, visualizing data and models, and conveying the results. To deactivate Weights and Biases during the fine-tuning process, set the below environment property.
 
import os
# disable Weights and Biases
os.environ['WANDB_DISABLED']="true"
 
If you have an account with Weights and Biases, feel free to enable it and experiment with it.
3. Loading dataset
 
Numerous datasets are available for fine-tuning the model. In this instance, we will utilize the DialogSum DataSet from HuggingFace for the fine-tuning process. DialogSum is an extensive dialogue summarization dataset, featuring 13,460 dialogues along with manually labeled summaries and topics.
 
There is no specific reason for selecting this dataset. Feel free to try this experiment with any custom dataset.
 
Let’s execute the below code to load the above dataset from HuggingFace.
 
huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)
 
Once the dataset is loaded, we can take a look at it to understand what it contains:
a sample row of the dataset
 
It contains the below fields.
 
    dialogue: text of the dialogue.
    summary: human-written summary of the dialogue.
    topic: human written topic/one-liner of the dialogue.
    id: unique file id of an example.
 
4. Create Bitsandbytes configuration
 
To load the model, we need a configuration class that specifies how we want the quantization to be performed. We’ll be using BitsAndBytesConfig to load our model in 4-bit format. This will reduce memory consumption considerably, at a cost of some accuracy.
 
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=False,
    )
 
5. Loading the Pre-Trained model
 
Microsoft recently open-sourced Phi-2, a Small Language Model (SLM) with 2.7 billion parameters. Here, we will use Phi-2 for the fine-tuning process. This language model exhibits remarkable reasoning and language understanding capabilities, achieving state-of-the-art performance among base language models.
 
Let’s now load Phi-2 using 4-bit quantization from HuggingFace.
 
model_name='microsoft/phi-2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name,
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)
 
The model is loaded in 4-bit using the `BitsAndBytesConfig` from the bitsandbytes library. This is a part of the QLoRA process, which involves quantizing the pre-trained weights of the model to 4-bit and keeping them fixed during fine-tuning.
6. Tokenization
 
Now, let’s configure the tokenizer, incorporating left-padding to optimize memory usage during training.
 
tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
 
7. Test the Model with Zero Shot Inferencing
 
We will evaluate the base model that we loaded above using a few sample inputs.
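The cell below calls a gen() helper that the article never defines. A minimal sketch of what such a helper could look like, assuming it simply tokenizes the prompt, calls model.generate() with a cap on new tokens, and decodes the output, is:

def gen(model, prompt, max_new_tokens=100):
    # Hypothetical helper (not shown in the article): prompt in, decoded generations out.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)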
 
%%time
from transformers import set_seed
seed = 42
set_seed(seed)
 
index = 10
 
prompt = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']
 
formatted_prompt = f"Instruct: Summarize the following conversation.\n{prompt}\nOutput:\n"
res = gen(original_model,formatted_prompt,100,)
#print(res[0])
output = res[0].split('Output:\n')[1]
 
dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')
 
base model output
 
From the observation above, it’s evident that the model faces challenges in summarizing the dialogue compared to the baseline summary. However, it manages to extract essential information from the text, suggesting the potential for fine-tuning the model for the specific task at hand.
8. Pre-processing dataset
 
The dataset cannot be directly employed for fine-tuning. It is essential to format the prompt in a way that the model can comprehend. Referring to the HuggingFace model documentation, it is evident that a prompt needs to be generated using dialogue and summary in the specified format below.
Prompt Format
 
We’ll create some helper functions to format our input dataset, ensuring its suitability for the fine-tuning process. Here, we need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM.
 
def create_prompt_formats(sample):
    """
    Format various fields of the sample ('instruction','output')
    Then concatenate them using two newline characters
    :param sample: Sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"
   
    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"
   
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]
 
    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt
 
    return sample
 
The above function can be used to convert our input into prompt format.
 
Now, we will use our model tokenizer to process these prompts into tokenized ones.
 
Our aim here is to generate input sequences with consistent lengths, which is beneficial for fine-tuning the language model by optimizing efficiency and minimizing computational overhead. It is essential to ensure that these sequences do not surpass the model’s maximum token limit.
 
from functools import partial
 
# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(model.config, length_setting, None)
        if max_length:
            print(f"Found max length: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length
 
 
def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )
 
# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int,seed, dataset):
    """Format & tokenize it so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    """
   
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
   
    # Apply preprocessing to each batch of the dataset & and remove 'instruction', 'context', 'response', 'category' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )
 
    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
   
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)
 
    return dataset
 
By utilizing these functions, our dataset will be prepared for the fine-tuning process!
 
## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)
 
train_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['validation'])
 
9. Preparing the model for QLoRA
 
# 2 - Using the prepare_model_for_kbit_training method from PEFT
# Preparing the Model for QLoRA
original_model = prepare_model_for_kbit_training(original_model)
 
Here, the model is prepared for QLoRA training using the `prepare_model_for_kbit_training()` function. This function initializes the model for QLoRA by setting up the necessary configurations.
10. Setup PEFT for Fine-Tuning


Let us now define the LoRA config for fine-tuning the base model.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

MyDump=/usr/local/tx/users/h1-conforama/work/RoutageDemat.INDEX=1.20150907-14.dump ;MyBase="RoutageDemat" ;MyName=$(grep -e "^NAME=" $MyDump);logview -ls ~/$MyBase -f "${MyName}" -D ~/work/${MyBase}.${MyName}.$(date '+%Y%m%d-%H').dump.old && logadd -s ~/$MyBase -D $MyDump
==[Install files]==
MyScript=MyPatchInstall.sh;mkdir -p ~/work/adasilva;cd ~/work/adasilva;touch $MyScript;chmod u+x $MyScript;nano $MyScript;./$MyScript
#!/bin/bash

MyDir=bin ;MyPreprod=tod-central-pp2
MyUser=`echo ${USER}|sed "s/^h[123]-//g"`

nano ~/work/adasilva/MyList.txt
cd ~/$MyDir
for MyLine in $(cat ~/work/adasilva/MyList.txt)
do
  if [ -f $MyLine ] && [ ! -f "N-1/${MyLine}.$(date '+%Y%m%d')" ]
  then
    cp -p $MyLine "N-1/${MyLine}.$(date '+%Y%m%d')" || (echo "error with $MyLine"; exit)
  fi
  scp h2-${MyUser}@${MyPreprod}:${MyDir}/$MyLine .
  ls -l ~/${MyDir}/$MyLine
done
==[evening] PDF signature==
*Run a command prompt as administrator
route PRINT
(
route delete 129.185.33.0
route delete 212.142.249.0
)
route add 129.185.33.0 mask 255.255.255.0 192.168.137.1 metric 2
route add 212.142.249.0 mask 255.255.255.0 192.168.137.1 metric 2
Sinadura software
*Make a local copy
PROD5:/usr/local/tx/users/h1-europde/outbox/pdf_s/in
PROD4:/usr/local/tx/users/h1-eurotunnel/RGS2/envoi_demat
PROD6:/usr/local/tx/users/h1-sod/outbox/pdf/in
PROD6:/usr/local/tx/users/h1-husqvarna/outbox/pdf/in
(PP5:/usr/local/tx/users/h2-husqvarna/outbox/pdf/in)
*Sign
*Delete from the servers
*Upload the signed documents to the servers
outbox/pdf_s/out
h1-eurotunnel/RGS2/retour_demat
*If NOK, send a mail to:
RIOU Sebastien;KLEIN Christophe;GHESQUIER Julien;MEFTAH Ahmed Kamel
CC
COLLEUC Souad;KYPRIANOU Kypros
==Carrefour: add AS2 partner==
*Fill in the tracking file
*Convert the certificate
*Upload the certificates to the AS2-SERVER-PP1 and AS2-SERVER-PROD1 servers, in the /usr/local/tx/users/h[12]-as2server/ssl/certs directory
===Configuration http://h1.as2server.tradexpressondemand.com ===
====Certificate configuration====
*Add the certificates from the security table (/../ssl/certs/RICARD.cer) and run the verification action; if NOK, add the CA (and then the ROOT)
(Merci de nous fournir les clés CA/ROOT pour le certificat SSL.
Please, send us your CA/ROOT keys.)
====Partner creation====
*When creating the partner in WebExpress, add tx: to the certificate name (so that it looks up the certificate on the remote server)
(-type dedicated)
===[Firewall configuration]===
grep 62.23.69.188 /etc/httpd/acl-in
===Send/receive test===
*Send a coordination mail:
We are ready to test the link on the production environment.
Please, send us test files as soon as possible.
And send us an e-mail when we can start sending test files.
Nous sommes prêts pour tester la connexion sur l'environnement de production.
Envoyez-nous les fichiers de test dès que possible, s'il vous plaît.
Et envoyez-nous un e-mail quand nous pouvons commencer à envoyer des fichiers de test.
[AS2 link test]
*Check reception
We received your test file.
Have you received our test file?


We therefore validate the "ULRIC_DE_VARENS" AS2 link.
# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
The send/receive tests are good:
original_model.gradient_checkpointing_enable()
Thank you.
 
We validate the AS2 link named "YPLON MCBRIDE".
peft_model = get_peft_model(original_model, config)
Thanks.
 
==Carrefour Process (from Karim)==
Note the rank (r) hyper-parameter, which defines the rank/dimension of the adapter to be trained. r is the rank of the low-rank matrix used in the adapters, which thus controls the number of parameters trained. A higher rank will allow for more expressivity, but there is a compute tradeoff.
http://prod.mprp.tradexpressondemand.com (192.168.86.40)
 
http://test.mprp.tradexpressondemand.com (192.168.82.40)
alpha here is the scaling factor for the learned weights. The weight matrix is scaled by alpha/r, and thus a higher value for alpha assigns more weight to the LoRA activations.
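For example, with the values used in the config above (r=32 and lora_alpha=32), the update is scaled by alpha/r = 32/32 = 1, so the learned low-rank update is applied at full strength.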
 
Once everything is set up and the PEFT is prepared, we can use the print_trainable_parameters() helper function to see how many trainable parameters are in the model.
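The helper called in the next cell is not shown in the article either; a plausible implementation (the name is taken from the call site, the body is an assumption) that reports trainable versus total parameters is:

def print_number_of_trainable_model_parameters(model):
    # Hypothetical helper: count trainable vs. all parameters of the PEFT-wrapped model.
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    all_params = sum(p.numel() for p in model.parameters())
    return (f"trainable model parameters: {trainable_params}\n"
            f"all model parameters: {all_params}\n"
            f"percentage of trainable model parameters: {100 * trainable_params / all_params:.2f}%")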
 
print(print_number_of_trainable_model_parameters(peft_model))
 
trainable parameters
11. Train PEFT Adapter
 
Define training arguments and create Trainer instance.


- AS2:
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
import transformers
 
peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1000,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    overwrite_output_dir = True,
    group_by_length=True,
)


h1.as2server.tradexpressondemand.com (192.168.84.7)
peft_model.config.use_cache = False
h2.as2server.tradexpressondemand.com (192.168.82.7)


- X400HUBX: (192.168.181.101)
peft_trainer = transformers.Trainer(
h1.x4srv.tradexpressondemand.com (192.168.181.101)
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)


- (MPRP:
Here, we have used 1000 training steps. It seems to be good enough for our custom dataset. We need to try out different numbers before finalizing with training steps. Also, the hyperparameters used above might vary depending on the dataset/model we are trying to fine-tune. This is just to show the capability of fine-tuning.
monitoring.tradexpressondemand.com
monitoring-ppd.tradexpressondemand.com)


Procedure:
Let’s start the training now. Training the model will take some time depending upon the hyperparameters used in TrainingArguments.


*Create the AS2 partner on h1.as2server.tradexpressondemand.com
peft_trainer.train()
*Create the X400 partner on h1.x4srv.tradexpressondemand.com
*Create the entity, the recipient, and the routing on MPRP
- The partner name in AS2SERVER/X4SRV must be the same as the entity name
- Suppliers with several GLNs: put them as RVA in the entity
- The directory code must be the GLN, except for non-EDIFACT partners


*In recipient, declare all the suppliers with the entity GLN code or NON-EDIFACT PARTNER NAME
Once the model is trained successfully, we can use it for inference. Let’s now prepare the inference model by adding an adapter to the original Phi-2 model. Here, we are setting is_trainable=False because the plan is only to perform inference with this PEFT model.
- For routing, either EDIFACT with a GLN or TEXT with a supplied CODE


*Create the partner in the TOD environment in BUSINESSPARTNER
import torch
- The partner NAME in BUSINESSPARTNER must be the same as in the MPRP RECIPIENT base
from transformers import AutoTokenizer, AutoModelForCausalLM


*Run a manual test:
base_model_id = "microsoft/phi-2"
*On h1-csif01@tod-csif-prod1 or h1-csif02@tod-csif-prod1
base_model = AutoModelForCausalLM.from_pretrained(base_model_id,  
edisend RECIPIENT=MPRP MPRP.RECIPIENT_CODE=3010270300103 TEST_TRANSPORT  work/test.text
                                                      device_map='auto',
*Check the Carrefour syslog, the COM machine, and the MPRP tracking base
                                                      quantization_config=bnb_config,
==Export dump==
                                                      trust_remote_code=True,
MyEntry="NAME=IMPORT_CLIENTS" ;MyBase="edisend" ;MyDump=~/work/${MyBase}.${MyEntry}.$(date '+%Y%m%d-%H').dump;logview -ls ~/$MyBase -f "${MyEntry}" -D $MyDump;grep -e "^NAME=" $MyDump;ls -l $MyDump
                                                      use_auth_token=True)
==Import dump==
MyPP=tod-central-pp2
scp h2-$(echo ${USER}|sed "s/^h[123]-//g")@${MyPP}:~/work/*$(date '+%Y%m%d')*.dump ~/work/.;ls -lrt  ~/work/*$(date '+%Y%m%d')*.dump


MyDump=/usr/local/tx/users/h1-husqvarna/work/edisend.NAME=IMPORT_CLIENTS.20150827-14.dump ;MyBase="partner" ;MyName=$(grep -e "^NAME=" $MyDump);logview -ls ~/$MyBase -f "${MyName}" -D ~/work/${MyBase}.${MyName}.$(date '+%Y%m%d-%H').dump.old && logadd -s ~/$MyBase -D $MyDump
eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
==Infra Generix==
eval_tokenizer.pad_token = eval_tokenizer.eos_token
preprodhubx
prodhubx1
prodhubx2


tod-central-pp2
from peft import PeftModel
tod-central-prod2
tod-central-prod3
tod-central-prod4
tod-brcdp


tod-central-pp5
ft_model = PeftModel.from_pretrained(base_model, "/kaggle/working/peft-dialogue-summary-training-1705417060/checkpoint-1000",torch_dtype=torch.float16,is_trainable=False)
tod-central-prod5
tod-central-prod6
==Creation Base==
MyBase=RoutageDemat ;MyPreprod=tod-central-pp2
MyUser=`echo ${USER}|sed "s/^h[123]-//g"`
mkdir -p ~/$MyBase ;scp h2-${MyUser}@${MyPreprod}:~/${MyBase}/*.cfg ~/${MyBase}/. ;logcreate -s $MyBase
==[New customer creation]==
===Configuration file generation===
*Run from root@preprodhubx:
/root/bin/new-env.proc.sh
*Open the configuration mail you receive
===DNS configuration===
====Primary server configuration (tod-srv2)====
MyCustomer=future-home
cd $NAMED;cp -p named.conf named.conf.$(date '+%Y%m%d')&&vi named.conf; diff named.conf named.conf.$(date '+%Y%m%d')
*Create entries modeled on formenti


=====MASTER=====
Fine-tuning is often an iterative process. Based on the validation and test sets results, we may need to make further adjustments to the model’s architecture, hyperparameters, or training data to improve its performance. Let’s now see how to evaluate the results of Fine-tuned LLM.
*Create copies of the files in the internal and external directories and modify them, modeled on formenti. Do not forget to update the date.
12. Evaluate the Model Qualitatively (Human Evaluation)
cd $MASTER
ls -lrt internal/*formenti*; ls -lrt external/*formenti*
cp -p internal/master.internal.com.tradexpressondemand.formenti internal/master.internal.com.tradexpressondemand.$MyCustomer
cp -p external/master.tradexpressondemand.com.external.formenti external/master.tradexpressondemand.com.external.$MyCustomer
sed -i "s/formenti/${MyCustomer}/g" "internal/master.internal.com.tradexpressondemand.$MyCustomer"
sed -i "s/formenti/${MyCustomer}/g" "external/master.tradexpressondemand.com.external.$MyCustomer"
sed -i "s/2014072109/$(date '+%Y%m%d%H')/g" "internal/master.internal.com.tradexpressondemand.$MyCustomer"
sed -i "s/2014072109/$(date '+%Y%m%d%H')/g" "external/master.tradexpressondemand.com.external.$MyCustomer"
ls -l "internal/master.internal.com.tradexpressondemand.$MyCustomer" ;cat "internal/master.internal.com.tradexpressondemand.$MyCustomer"
ls -l "external/master.tradexpressondemand.com.external.$MyCustomer" ;cat "external/master.tradexpressondemand.com.external.$MyCustomer"


====Secondary server configuration (tod-srv1)====
Now, let’s perform inference using the same input but with the PEFT model, as we did previously in step 7 with the original model.
*Configure with the same process as the primary, except for $MASTER
=====Apache configuration=====
cd /etc/httpd/conf.d
*Create copies of the files modeled on formenti
cp -p user-formenti.conf "user-${MyCustomer}.conf"
*Edit the files
sed -i "s/formenti/${MyCustomer}/g" "user-${MyCustomer}.conf"
ls -l "user-${MyCustomer}.conf" ;grep "${MyCustomer}" "user-${MyCustomer}.conf"
=====Mail configuration=====
cd /etc/postfix;cp -p canonical canonical.$(date '+%Y%m%d')&&vi canonical; diff canonical canonical.$(date '+%Y%m%d')
*Add the entries
cp -p virtual virtual.$(date '+%Y%m%d')&&vi virtual; diff virtual virtual.$(date '+%Y%m%d')
* Send mail:
We inform you of the start of "root" actions on the following servers:
tod-srv-1, tod-srv-2, tod-central-dev1, tod-central-pp2, tod-central-prod4, tod-as2-pp1, tod-as2-prod1, ftphubx


*Load the new configuration:
%%time
postmap canonical;postmap virtual
from transformers import set_seed
====Restart the services on the primary and then the secondary DNS====
set_seed(seed)
service named restart; service httpd restart
===Environment creation on each h[123] server===
su - ediroot


MyCustomer=future-home
index = 5
ediadm
dialogue = dataset['test'][index]['dialogue']
*Use the name h[123]-<CustomerName> for the Unix user
summary = dataset['test'][index]['summary']


*Set the password h[123]-<CustomerName> for the user:
prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
MyHCustomer=`grep -m 1 "^h.-${MyCustomer}" /etc/passwd|cut -d ":" -f 1`;MyH=`echo ${MyHCustomer}|cut -d "-" -f 1`
passwd $MyHCustomer


====h3 only====
peft_model_res = gen(ft_model,prompt,100,)
*Add the customer to the ename-to-uname file:
peft_model_output = peft_model_res[0].split('Output:\n')[1]
cd /etc/httpd && cp -p ename-to-uname ename-to-uname.$(date '+%Y%m%d') && vi ename-to-uname ;diff ename-to-uname ename-to-uname.$(date '+%Y%m%d')
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')


cd /usr/local/tx/www/conf/group
dash_line = '-'.join('' for x in range(100))
cp -p edigroups.w3 edigroups.w3.$(date '+%Y%m%d') ;echo " h3-${MyCustomer}">>edigroups.w3;vi edigroups.w3 ;diff edigroups.w3 edigroups.w3.$(date '+%Y%m%d')
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')


cp -p ediusers.w3 ediusers.w3.$(date '+%Y%m%d') ;htpasswd -b ediusers.w3 h3-${MyCustomer} h3-${MyCustomer} ;diff ediusers.w3 ediusers.w3.$(date '+%Y%m%d')
PEFT model output
13. Evaluate the Model Quantitatively (with ROUGE Metric)


====h1 and h2====
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.
cd /etc/httpd/conf.d && cp -p prod.conf prod.conf.$(date '+%Y%m%d') && echo "Use WebAccesTX ${MyH}.${MyCustomer}.tradexpressondemand.com ${MyCustomer} ${MyHCustomer}">>prod.conf ;diff prod.conf prod.conf.$(date '+%Y%m%d')


====h1, h2 and h3====
Let’s now use the ROUGE metric to quantify the validity of summarizations produced by models. It compares summarizations to a “baseline” summary which is usually created by a human. While it’s not a perfect metric, it does indicate the overall increase in summarization effectiveness that we have accomplished by fine-tuning.
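As a quick standalone illustration of the metric itself (a toy example, separate from the article's pipeline), the evaluate library's ROUGE implementation can be called directly on a pair of hand-written strings:

import evaluate

rouge = evaluate.load('rouge')
toy_scores = rouge.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on the mat."],
    use_stemmer=True,
)
print(toy_scores)  # rouge1, rouge2, rougeL, rougeLsum values between 0 and 1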
*Create the log directory:
cd /var/log/httpd/users && mkdir ${MyCustomer} && chmod 775 ${MyCustomer}; ls -ld /var/log/httpd/users/${MyCustomer}
su - ${MyHCustomer}


MyCustomer=`echo ${USER}|sed "s/^h[123]-//g"`;MyHCustomer=$USER;MyH=`echo ${MyHCustomer}|cut -d "-" -f 1`
To demonstrate the capability of ROUGE Metric Evaluation we will use some sample inputs to evaluate.
cd ~/www && rm log ; ln -s /var/log/httpd/users/$MyCustomer log;ls -l log


*Edit w3i_confedi
original_model = AutoModelForCausalLM.from_pretrained(base_model_id,
cd ~/www/conf
                                                      device_map='auto',
sed -i "s/h2-default/${MyHCustomer}/g" ~/www/conf/w3i_confedi.cfg; sed -i "s/client/${MyCustomer}/g" w3i_confedi.cfg
                                                      quantization_config=bnb_config,
sed -i "s/h2-${MyCustomer}/${MyHCustomer}/g" w3i_confedi.cfg
                                                      trust_remote_code=True,
grep "${MyHCustomer}" w3i_confedi.cfg
                                                      use_auth_token=True)
====h1 and h2====
echo " ${USER} ${MyCustomer}">>edigroups.w3;vi edigroups.w3;cat edigroups.w3


htpasswd -b ediusers.w3 ${MyCustomer} <voir mail>
import pandas as pd
====h2 only====
htpasswd -b ediusers.w3 ${MyHCustomer} 71gen75
====h1 only====
htpasswd -b ediusers.w3 ${MyHCustomer} <voir mail>
sed -i "s/WAGUI=1/WAGUI=3/g" ~/www/conf/w3i_confedi.cfg
====h1, h2 and h3 as root====
(exit)
* Restart
service httpd restart
===Add the AS2 protocol===
====On the PP and PROD servers====
cd /usr/local/tx/bin && cp -p as2receive.cgi.conf as2receive.cgi.conf.$(date '+%Y%m%d') && vi as2receive.cgi.conf ; diff as2receive.cgi.conf as2receive.cgi.conf.$(date '+%Y%m%d')
====On the AS2 PP and PROD servers====
cd /etc/httpd
cp -p as2_id-to-host as2_id-to-host.$(date '+%Y%m%d') && vi as2_id-to-host ; diff as2_id-to-host as2_id-to-host.$(date '+%Y%m%d')


cp -p as2_id-to-user as2_id-to-user.$(date '+%Y%m%d') && vi as2_id-to-user ; diff as2_id-to-user as2_id-to-user.$(date '+%Y%m%d')
dialogues = dataset['test'][0:10]['dialogue']
===Add the X400 protocol===
human_baseline_summaries = dataset['test'][0:10]['summary']
====On the PP and PROD2 servers====
*Edit the users file using tabs
cd /usr/local/tx/x400 && cp -p users users.$(date '+%Y%m%d') && vi users ;diff users users.$(date '+%Y%m%d')


*Edition Prod>HUB1 Preprod>HUB2
original_model_summaries = []
MyCustomer=future-home
instruct_model_summaries = []
peft_model_summaries = []


MyHCustomer=`grep -m 1 "^h.-${MyCustomer}" /etc/passwd|cut -d ":" -f 1` && su - ${MyHCustomer}
for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
   
    original_model_res = gen(original_model,prompt,100,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]
   
    peft_model_res = gen(ft_model,prompt,100,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')


MyCustomer=`echo ${USER}|sed "s/^h[123]-//g"`;MyHCustomer=$USER;MyH=`echo ${MyHCustomer}|cut -d "-" -f 1`
    original_model_summaries.append(original_model_text_output)
sed -i "s/h2-default/${MyHCustomer}/g" .x4rc ;sed -i "s/FROM=S=TODPP2/FROM=S=`echo ${MyCustomer}|tr '[:lower:]' '[:upper:]'`/g" .x4rc ;grep -i $MyCustomer .x4rc
    peft_model_summaries.append(peft_model_text_output)
====h1 only====
sed -i "s/HUB2/HUB1/g" .x4rc ;grep -i HUB1 .x4rc
===Add the FTP protocol===
*Connect via FTPHubX


  MyCustomer=future-home
zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
   
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df


cd /usr/local/tx/users/h11ftp/public
import evaluate
mkdir -p "${MyCustomer}/${MyCustomer}/dev/in" ;mkdir -p "${MyCustomer}/${MyCustomer}/dev/out"
mkdir -p "${MyCustomer}/${MyCustomer}/prod/in" ;mkdir -p "${MyCustomer}/${MyCustomer}/prod/out"
chown h11ftp:ediusers -R ${MyCustomer} ;chmod og-rwx -R ${MyCustomer} ;find ${MyCustomer} -type d -exec ls -ld {} \;


cd /usr/local/tx/users/h11ftp/private
rouge = evaluate.load('rouge')
mkdir -p "${MyCustomer}/${MyCustomer}/dev/in" ;mkdir -p "${MyCustomer}/${MyCustomer}/dev/out"
mkdir -p "${MyCustomer}/${MyCustomer}/prod/in" ;mkdir -p "${MyCustomer}/${MyCustomer}/prod/out"
chown h11ftp:ediusers -R ${MyCustomer} ;chmod og-rwx -R ${MyCustomer} ;find ${MyCustomer} -type d -exec ls -ld {} \;


* Connect to the internal Web
original_model_results = rouge.compute(
http://ftphubx.hubxpress.net/tx/cgi/w3i_first.cgi
    predictions=original_model_summaries,
*Copy 3 partner entries from another customer and modify them in Webexpress with the password from the mail
    references=human_baseline_summaries[0:len(original_model_summaries)],
* Send the end-of-actions mail:
    use_aggregator=True,
We inform you of the end of the "root" actions on the following servers:
    use_stemmer=True,
tod-srv-1, tod-srv-2, tod-central-dev1, tod-central-pp2, tod-central-prod4, tod-as2-pp1, tod-as2-prod1, ftphubx
)
==mprp configuration==
*Go to http://<test|prod>.mprp.tradexpressondemand.com/
==AS2 process==
*Send a mail with our TRADEXPRESS or HUBXPRESS infrastructure parameters and ask for theirs
*Receive the partner configuration
*Add the certificate to ssl/certs
*Create the partner in WebExpress/PARTNER
*Run the [AS2 link test]
*Configure our firewall
[Firewall configuration]
*If NOK, check that the partner is ready
==X400 link test==
(echo "TEST GENERIX (<User>-<DEV|PROD>)">>~/work/test.txt)
edisend test_X400 ~/work/test.txt
==[IMOD creation][316297 ]DOLE / IMOD / update / date: 12/08==
*The request must come from a Generix consultant or sales representative; otherwise forward the mail to csm-cgi and wait for validation
*Log in to http://invoicemanagerondemand.com/
===Environment selection===
*Go to the "Invoice Manager" tab
*Select the environment. If it does not exist in the menu, see the "Environment assignment" item.
====(Environment assignment)====
*Go to "Invoice Manager/Environnements/Affectation"
*Choose the person to assign to
*Search for the customer by name or by code
*Assign the desired environments
===Add partner===
*Go to "Invoice Manager/Paramétrage/Partenaires"
*Select "Ajouter un partenaire" and configure according to the customer request (the partner sometimes already exists)


*Code type: EAN, otherwise ZZZ
peft_model_results = rouge.compute(
===Add a profile===
    predictions=peft_model_summaries,
*Go to "Invoice Manager/Paramétrage/Profils"
    references=human_baseline_summaries[0:len(peft_model_summaries)],
*Sur Doublon non préciser voir d'autres profils
    use_aggregator=True,
==AS2/EDISEND configuration==
    use_stemmer=True,
Startup: external
)
Type : EDIFACT
Transport to the Recipient : enable
Log Level : 2
Alarms : OFF
==Certificate update==
===Script===
MyScript=MyCertUpdate.sh ;mkdir -p ~/work/adasilva ; cd ~/work/adasilva;touch $MyScript; chmod u+x $MyScript;nano $MyScript;./$MyScript
#!/bin/bash


MyCertOldMd5=6b8922e22af632085404391f13b90cac
print('ORIGINAL MODEL:')
MyCertNew=CertNew.cer
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)


chown root:root CertNew.cer
print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")
chmod a+r CertNew.cer


>/tmp/$$tmp
improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
#!/bin/bash
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')


MyCertOldMd5=`md5sum old.cer`
Rouge metric evaluation
MyCertNew=new.cer


chown root:root $MyCertNew
As we can see in the results above, the PEFT model shows a significant improvement over the original model, expressed as a percentage.
chmod a+r $MyCertNew


>/tmp/$$tmp
If you’d like to access the complete notebook, please refer to the repository below.
for MyFile in /usr/local/tx/users/*/ssl/certs/*.cer
FineTune Phi-2 on Custom DataSet
do
  echo "md5sum $MyFile >>/tmp/$$tmp"
  md5sum $MyFile >>/tmp/$$tmp
done


for MyFile in $(grep -e "^${MyCertOldMd5}" /tmp/$$tmp|cut -d " " -f3)
www.kaggle.com
do
Conclusion
  MyBackup=`echo $MyFile|sed 's/\/ssl\/certs\//\/ssl\/certs\/N-1\//g'`.$(date '+%Y%m%d')


  echo "cp -p ${MyFile} ${MyBackup} && cp -p CertNew.cer ${MyFile}"
Fine-tuning Large Language Models (LLMs) has become essential for enterprises seeking to optimize their operational processes. While the initial training of LLMs imparts a broad language understanding, the fine-tuning process refines these models into specialized tools capable of handling specific topics and providing more accurate results. Tailoring LLMs for distinct tasks, industries, or datasets extends the capabilities of these models, ensuring their relevance and value in a dynamic digital landscape. Looking ahead, ongoing exploration and innovation in LLMs, coupled with refined fine-tuning methodologies, are poised to advance the development of smarter, more efficient, and contextually aware AI systems.
  cp -p ${MyFile} ${MyBackup} && cp -p CertNew.cer ${MyFile}
done


rm -f /tmp/$$tmp
==AS2 partner creation from PARTNER==
===tradexpressondemand.com===
*Configure the certificate
*Configure the firewall
[Firewall configuration]
Partner name :
Transport mechanism : HTTP
EDI Syntax : NONE or EDIFACT (if GLN)
Partner info : AS2 <PartnerName>


URL :
Proxy Address : http://tod-as2-<pp1|prod1>.tradexpressondemand.com
File disposition : attachment
Keep HTTP Headers on output : enable
Proxy Port : 8080
Force HTTP 1.0: enable
Security Method : EDIINT


Scenario step 1 : pkcs7-signature
Scenario step 2 : zlib-compressed
Scenario step 3 : pkcs7-encrypted


Subject : <CustomerName> TO <PartnerName> (<PROD|DEV>)
AS2 From : <IdNameClient>-<PROD|DEV>
AS2 To: <IdNamePartner>


MDN : Synchronous
MDN Signature : pkcs7-signature
MDN Receipt URL : http://as2-<ppd|prod>.tradexpressondemand.com
MDN Time out : 600
Private signature key : generix
Private signature passphrase : <voir SSL generix>


Recipient encryption certificate : <Nom certificat Partner>
Recipient signature certificate : <Nom certificat Partner>


Encryption algorithm : des3
Signature Hash : sha1
===hubxpress.net===
* Same as for tradexpressondemand.com except:
no proxy
MDN Receipt URL : http://as2hubx.hubxpress.net/ediint-<IdNameClient>-SYNC-<prod|dev>
==Cora/X400 creation==
(*Connect to the WAB URL according to KeePass)
*Launch the RDP connection "VNC@ALLEGRO-CORA-PROD1"
*Connect to the client (CTRL+ALT+END)
*Launch Allegro (warning: this action creates locks; close it as soon as possible. Whether our actions are finished or not, a safeguard closes the application after 10 minutes)
*Select the partner application
*Check whether the O/R address already exists and associate it with the new partner by creating it from the partner list
*Otherwise, create an X400_P2 link, or ALLEGRO if O=ALLEGRO:
Mnemonic + Ident. = GLN
Qualif = 14
O/R :
country + admin domain + Organization + Personal name + OK
*Remember to close the application


2. Config Web Access
*Make a copy of a MALONGO partner (X400)
*Modify the partner name + GLN (x2)
</pre>
[[File:Linwinmac.jpg]]

Latest revision as of 09:21, 7 March 2024

#https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07
Fine Tune Large Language Model (LLM) on a Custom Dataset with QLoRA
Suman Das

Suman Das
·

Follow
15 min read
·
Jan 25, 2024

Fine-Tuning LLM

The field of natural language processing has been revolutionized by large language models (LLMs), which showcase advanced capabilities and sophisticated solutions. Trained on extensive text datasets, these models excel in tasks like text generation, translation, summarization, and question-answering. Despite their power, LLMs may not always align with specific tasks or domains.

In this tutorial, we will explore how fine-tuning LLMs can significantly improve model performance, reduce training costs, and enable more accurate and context-specific results.
What is LLM Fine-tuning?

Fine-tuning LLM involves the additional training of a pre-existing model, which has previously acquired patterns and features from an extensive dataset, using a smaller, domain-specific dataset. In the context of “LLM Fine-Tuning,” LLM denotes a “Large Language Model,” such as the GPT series by OpenAI. This approach holds significance as training a large language model from the ground up is highly resource-intensive in terms of both computational power and time. Utilizing the existing knowledge embedded in the pre-trained model allows for achieving high performance on specific tasks with substantially reduced data and computational requirements.

Below are some of the key steps involved in LLM Fine-tuning:

    Select a pre-trained model: For LLM Fine-tuning first step is to carefully select a base pre-trained model that aligns with our desired architecture and functionalities. Pre-trained models are generic purpose models that have been trained on a large corpus of unlabeled data.
    Gather relevant Dataset: Then we need to gather a dataset that is relevant to our task. The dataset should be labeled or structured in a way that the model can learn from it.
    Preprocess Dataset: Once the dataset is ready, we need to do some preprocessing for fine-tuning by cleaning it, splitting it into training, validation, and test sets, and ensuring it’s compatible with the model on which we want to fine-tune.
    Fine-tuning: After selecting a pre-trained model we need to fine tune it on our preprocessed relevant dataset which is more specific to the task at hand. The dataset which we will select might be related to a particular domain or application, allowing the model to adapt and specialize for that context.
    Task-specific adaptation: During fine-tuning, the model’s parameters are adjusted based on the new dataset, helping it better understand and generate content relevant to the specific task. This process retains the general language knowledge gained during pre-training while tailoring the model to the nuances of the target domain.

Fine-tuning LLMs is commonly used in natural language processing tasks such as sentiment analysis, named entity recognition, summarization, translation, or any other application where understanding context and generating coherent language is crucial. It helps leverage the knowledge encoded in pre-trained models for more specialized and domain-specific tasks.
Fine-tuning methods

Fine-tuning a Large Language Model (LLM) involves a supervised learning process. In this method, a dataset comprising labeled examples is utilized to adjust the model’s weights, enhancing its proficiency in specific tasks. Now, let’s delve into some noteworthy techniques employed in the fine-tuning process.

    Full Fine Tuning (Instruction fine-tuning): Instruction fine-tuning is a strategy to enhance a model’s performance across various tasks by training it on examples that guide its responses to queries. The choice of the dataset is crucial and tailored to the specific task, such as summarization or translation. This approach, known as full fine-tuning, updates all model weights, creating a new version with improved capabilities. However, it demands sufficient memory and computational resources, similar to pre-training, to handle the storage and processing of gradients, optimizers, and other components during training.
    Parameter Efficient Fine-Tuning (PEFT) is a form of instruction fine-tuning that is much more efficient than full fine-tuning. Training a language model, especially for full LLM fine-tuning, demands significant computational resources. Memory allocation is not only required for storing the model but also for essential parameters during training, presenting a challenge for simple hardware. PEFT addresses this by updating only a subset of parameters, effectively “freezing” the rest. This reduces the number of trainable parameters, making memory requirements more manageable and preventing catastrophic forgetting. Unlike full fine-tuning, PEFT maintains the original LLM weights, avoiding the loss of previously learned information. This approach proves beneficial for handling storage issues when fine-tuning for multiple tasks. There are various ways of achieving Parameter efficient fine-tuning. Low-Rank Adaptation LoRA & QLoRA are the most widely used and effective.

What is LoRa?

LoRA is an improved finetuning method where instead of finetuning all the weights that constitute the weight matrix of the pre-trained large language model, two smaller matrices that approximate this larger matrix are fine-tuned. These matrices constitute the LoRA adapter. This fine-tuned adapter is then loaded into the pre-trained model and used for inference.

After LoRA fine-tuning for a specific task or use case, the outcome is an unchanged original LLM and the emergence of a considerably smaller “LoRA adapter,” often representing a single-digit percentage of the original LLM size (in MBs rather than GBs).

During inference, the LoRA adapter must be combined with its original LLM. The advantage lies in the ability of many LoRA adapters to reuse the original LLM, thereby reducing overall memory requirements when handling multiple tasks and use cases.
What is Quantized LoRA (QLoRA)?

QLoRA represents a more memory-efficient iteration of LoRA. QLoRA takes LoRA a step further by also quantizing the weights of the LoRA adapters (smaller matrices) to lower precision (e.g., 4-bit instead of 8-bit). This further reduces the memory footprint and storage requirements. In QLoRA, the pre-trained model is loaded into GPU memory with quantized 4-bit weights, in contrast to the 8-bit used in LoRA. Despite this reduction in bit precision, QLoRA maintains a comparable level of effectiveness to LoRA.

In this tutorial, we will use Parameter-efficient fine-tuning with QLoRA.

Now let’s explore how we can fine-tune LLM on a custom dataset using QLoRA on a single GPU.

    Setting up the NoteBook
    Install required libraries
    Loading dataset
    Create Bitsandbytes configuration
    Loading the Pre-Trained model
    Tokenization
    Test the Model with Zero Shot Inferencing
    Pre-processing dataset
    Preparing the model for QLoRA
    Setup PEFT for Fine-Tuning
    Train PEFT Adapter
    Evaluate the Model Qualitatively (Human Evaluation)
    Evaluate the Model Quantitatively (with ROUGE Metric)

1. Setting up the NoteBook.

While we will utilize a Kaggle notebook for this demonstration, feel free to use any Jupyter notebook environment. Kaggle offers a generous allowance of 30 hours of free GPU usage per week, which is ample for our experimentation. To begin, let’s open a new notebook, establish some headings, and then proceed to connect to the runtime.
notebook-with-headings

Here, we will select the GPU P100 as the ACCELERATOR. Feel free to try other GPU options available in Kaggle or any other environment.

In this tutorial, we will be using HuggingFace libraries to download and train the model. To download models from HuggingFace, we will need an Access Token. If you’ve already signed up with HuggingFace, you can generate a new Access Token from the settings section or use any existing Access Token.
2. Install required libraries

Now, let’s install the necessary libraries for this experiment.

!pip install -q -U bitsandbytes transformers peft accelerate datasets scipy einops evaluate trl rouge_score

Let’s understand the importance of some of these libraries.

    Bitsandbytes: An excellent package that provides a lightweight wrapper around custom CUDA functions that make LLMs go faster — optimizers, matrix multiplication, and quantization. In this tutorial, we’ll be using this library to load our model as efficiently as possible.
    transformers: A library by Hugging Face (🤗) that provides pre-trained models and training utilities for various natural language processing tasks.
    peft: A library by Hugging Face (🤗) that enables parameter-efficient fine-tuning.
    accelerate: Accelerate abstracts exactly and only the boilerplate code related to multi-GPU/TPU/fp16 training and leaves the rest of your code unchanged.
    datasets: Another library by Hugging Face (🤗) that provides easy access to a wide range of datasets.
    einops: A library that simplifies tensor operations.

Loading the required libraries

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

interpreter_login()

For this tutorial we are not going to track training metrics, so let's disable Weights & Biases. The W&B platform provides a set of tools for monitoring experiments, visualizing data and models, and sharing results. To deactivate Weights & Biases during the fine-tuning process, set the environment variable below.

import os
# disable Weights and Biases
os.environ['WANDB_DISABLED']="true"

If you have an account with Weights and Biases, feel free to enable it and experiment with it.
3. Loading dataset

Numerous datasets are available for fine-tuning the model. In this instance, we will utilize the DialogSum DataSet from HuggingFace for the fine-tuning process. DialogSum is an extensive dialogue summarization dataset, featuring 13,460 dialogues along with manually labeled summaries and topics.

There is no specific reason for selecting this dataset. Feel free to try this experiment with any custom dataset.

Let’s execute the below code to load the above dataset from HuggingFace.

huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)

Once the dataset is loaded, we can take a look at it to understand what it contains:
[Image: a sample row of the dataset]

It contains the below fields.

    dialogue: text of the dialogue.
    summary: human-written summary of the dialogue.
    topic: human-written topic/one-liner of the dialogue.
    id: unique file id of an example.
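
To inspect a record yourself, printing one sample should show these fields (a quick check, assuming the usual train/validation/test split names used on the Hub):

print(dataset)                         # split names and sizes
sample = dataset['train'][0]
print(sample['topic'])
print(sample['summary'])
print(sample['dialogue'][:200])        # first part of the conversation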

4. Create Bitsandbytes configuration

To load the model, we need a configuration class that specifies how we want the quantization to be performed. We’ll be using BitsAndBytesConfig to load our model in 4-bit format. This will reduce memory consumption considerably, at a cost of some accuracy.

compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=False,
    )

5. Loading the Pre-Trained model

Microsoft recently open-sourced Phi-2, a Small Language Model (SLM) with 2.7 billion parameters. Here, we will use Phi-2 for the fine-tuning process. This language model exhibits remarkable reasoning and language understanding capabilities, achieving state-of-the-art performance among base language models of comparable size.

Let’s now load Phi-2 using 4-bit quantization from HuggingFace.

model_name='microsoft/phi-2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name, 
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

The model is loaded in 4-bit using the `BitsAndBytesConfig` from the bitsandbytes library. This is a part of the QLoRA process, which involves quantizing the pre-trained weights of the model to 4-bit and keeping them fixed during fine-tuning.
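
To confirm the effect of 4-bit loading, you can check the model's memory footprint; get_memory_footprint() is a standard method on Hugging Face models, and for a 2.7B-parameter model loaded in 4-bit it should report a figure in the low gigabytes.

print(f"Memory footprint: {original_model.get_memory_footprint() / 1e9:.2f} GB")
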
6. Tokenization

Now, let’s configure the tokenizer, incorporating left-padding to optimize memory usage during training.

tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
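
As a quick illustrative sanity check, padding a small batch should place the pad (EOS) tokens on the left of the shorter sequence:

batch = tokenizer(["Hi there", "A much longer sentence that needs no padding at all"],
                  padding=True, return_tensors="pt")
print(batch["input_ids"][0])           # pad ids appear at the start of the shorter sequence
print(batch["attention_mask"][0])      # leading zeros mark the left padding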

7. Test the Model with Zero Shot Inferencing

We will evaluate the base model that we loaded above using a few sample inputs.
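
The gen() helper used below simply wraps the model's generate() call. If it is not already defined in your notebook, a minimal sketch along these lines should work (parameter choices are illustrative):

def gen(model, prompt, max_new_tokens=100):
    # Tokenize the prompt, generate on the model's device, and decode back to text
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs,
                                    max_new_tokens=max_new_tokens,
                                    pad_token_id=tokenizer.eos_token_id)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)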

%%time
from transformers import set_seed
seed = 42
set_seed(seed)

index = 10

prompt = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

formatted_prompt = f"Instruct: Summarize the following conversation.\n{prompt}\nOutput:\n"
res = gen(original_model,formatted_prompt,100,)
#print(res[0])
output = res[0].split('Output:\n')[1]

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

[Image: base model output]

From the observation above, it’s evident that the model faces challenges in summarizing the dialogue compared to the baseline summary. However, it manages to extract essential information from the text, suggesting the potential for fine-tuning the model for the specific task at hand.
8. Pre-processing dataset

The dataset cannot be directly employed for fine-tuning. It is essential to format the prompt in a way that the model can comprehend. Referring to the HuggingFace model documentation, it is evident that a prompt needs to be generated using dialogue and summary in the specified format below.
[Image: prompt format]

We’ll create some helper functions to format our input dataset, ensuring its suitability for the fine-tuning process. Here, we need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM.

def create_prompt_formats(sample):
    """
    Format the fields of the sample ('dialogue', 'summary'),
    then concatenate them using two newline characters.
    :param sample: sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"
    
    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"
    
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt

    return sample

The above function can be used to convert our input into prompt format.
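
For example (illustrative), applying it to one training record and printing the generated text field:

formatted = create_prompt_formats(dataset['train'][0])
print(formatted["text"][:500])         # intro blurb, instruction, dialogue, summary, end marker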

Now, we will use our model tokenizer to process these prompts into tokenized ones.

Our aim here is to generate input sequences with consistent lengths, which is beneficial for fine-tuning the language model by optimizing efficiency and minimizing computational overhead. It is essential to ensure that these sequences do not surpass the model’s maximum token limit.

from functools import partial

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(model.config, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int,seed, dataset):
    """Format & tokenize it so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    """
    
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
    
    # Apply preprocessing to each batch of the dataset and remove the 'id', 'topic', 'dialogue' and 'summary' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )

    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

By utilizing these functions, our dataset will be prepared for the fine-tuning process!

## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)

train_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['validation'])
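
Before training, it is worth a quick (illustrative) check that the processed splits now carry the tokenized columns and that over-long samples were filtered out:

print(train_dataset)                          # number of rows after filtering
print(train_dataset.column_names)             # e.g. ['text', 'input_ids', 'attention_mask']
print(len(train_dataset[0]['input_ids']))     # token count of the first example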

9. Preparing the model for QLoRA

# 2 - Using the prepare_model_for_kbit_training method from PEFT
# Preparing the Model for QLoRA
from peft import prepare_model_for_kbit_training

original_model = prepare_model_for_kbit_training(original_model)

Here, the model is prepared for QLoRA training using the `prepare_model_for_kbit_training()` function. This function initializes the model for QLoRA by setting up the necessary configurations.
10. Setup PEFT for Fine-Tuning

Let us now define the LoRA config for Fine-tuning the base model.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
original_model.gradient_checkpointing_enable()

peft_model = get_peft_model(original_model, config)

Note the rank (r) hyper-parameter, which defines the rank/dimension of the adapter to be trained. r is the rank of the low-rank matrix used in the adapters, which thus controls the number of parameters trained. A higher rank will allow for more expressivity, but there is a compute tradeoff.

alpha here is the scaling factor for the learned weights. The weight matrix is scaled by alpha/r, and thus a higher value for alpha assigns more weight to the LoRA activations.
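
As a rough worked example of how r drives the adapter size (assuming, hypothetically, a single square projection of dimension 2560, roughly Phi-2's hidden size):

d, r = 2560, 32                        # hypothetical projection size and the rank used above
lora_params = r * (d + d)              # A is (r x d), B is (d x r)
full_params = d * d
print(lora_params, full_params, f"{100 * lora_params / full_params:.2f}%")
# 163840 trainable vs 6553600 frozen for that one matrix, i.e. about 2.5%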

Once everything is set up and the PEFT model is prepared, we can use the print_number_of_trainable_model_parameters() helper to see how many trainable parameters are in the model.
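
If print_number_of_trainable_model_parameters() is not already defined earlier in your notebook, a minimal version could look like the sketch below (PEFT models also expose a built-in peft_model.print_trainable_parameters() method):

def print_number_of_trainable_model_parameters(model):
    trainable, total = 0, 0
    for _, param in model.named_parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return (f"trainable model parameters: {trainable}\n"
            f"all model parameters: {total}\n"
            f"percentage of trainable model parameters: {100 * trainable / total:.2f}%")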

print(print_number_of_trainable_model_parameters(peft_model))

[Image: trainable parameters output]
11. Train PEFT Adapter

Define training arguments and create Trainer instance.

output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
import transformers

peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1000,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

Here, we train for 1,000 steps, which appears to be sufficient for our custom dataset; in practice, experiment with different step counts before settling on one. The hyperparameters used above may also vary depending on the dataset and model being fine-tuned; they are chosen here simply to demonstrate the fine-tuning process.

Let’s start the training now. Training the model will take some time depending upon the hyperparameters used in TrainingArguments.

peft_trainer.train()

Once the model is trained successfully, we can use it for inference. Let’s now prepare the inference model by adding an adapter to the original Phi-2 model. Here, we are setting is_trainable=False because the plan is only to perform inference with this PEFT model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "microsoft/phi-2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "/kaggle/working/peft-dialogue-summary-training-1705417060/checkpoint-1000",torch_dtype=torch.float16,is_trainable=False)

Fine-tuning is often an iterative process. Based on the results on the validation and test sets, we may need to make further adjustments to the model's architecture, hyperparameters, or training data to improve its performance. Let's now see how to evaluate the results of the fine-tuned LLM.
12. Evaluate the Model Qualitatively (Human Evaluation)

Now, let’s perform inference using the same input but with the PEFT model, as we did previously in step 7 with the original model.

%%time
from transformers import set_seed
set_seed(seed)

index = 5
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"

peft_model_res = gen(ft_model,prompt,100,)
peft_model_output = peft_model_res[0].split('Output:\n')[1]
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')

[Image: PEFT model output]
13. Evaluate the Model Quantitatively (with ROUGE Metric)

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.

Let's now use the ROUGE metric to quantify the quality of the summaries produced by the models. It compares generated summaries to a "baseline" summary, which is usually written by a human. While it is not a perfect metric, it does indicate the overall improvement in summarization quality that we have achieved by fine-tuning.
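
As a toy illustration (not part of the original notebook) of what the metric returns, comparing one candidate against one reference yields a score between 0 and 1 for each ROUGE variant:

import evaluate

rouge = evaluate.load('rouge')
toy = rouge.compute(predictions=["the cat sat on the mat"],
                    references=["a cat was sitting on the mat"],
                    use_stemmer=True)
print(toy)                             # rouge1, rouge2, rougeL and rougeLsum scores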

To demonstrate ROUGE metric evaluation, we will evaluate the models on a few sample inputs.

original_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

import pandas as pd

dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
peft_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
    
    original_model_res = gen(original_model,prompt,100,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]
    
    peft_model_res = gen(ft_model,prompt,100,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df

import evaluate

rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

[Image: ROUGE metric evaluation results]

As the results above show, the PEFT model achieves a significant improvement over the original model, expressed here as absolute gains (in percentage points) on each ROUGE score.

If you’d like to access the complete notebook, please refer to the repository below.
FineTune Phi-2 on Custom DataSet (Kaggle notebook): www.kaggle.com
Conclusion

Fine-tuning Large Language Models (LLMs) has become essential for enterprises seeking to optimize their operational processes. While the initial training of LLMs imparts a broad language understanding, the fine-tuning process refines these models into specialized tools capable of handling specific topics and providing more accurate results. Tailoring LLMs for distinct tasks, industries, or datasets extends the capabilities of these models, ensuring their relevance and value in a dynamic digital landscape. Looking ahead, ongoing exploration and innovation in LLMs, coupled with refined fine-tuning methodologies, are poised to advance the development of smarter, more efficient, and contextually aware AI systems.
References
microsoft/phi-2 · Hugging Face (huggingface.co)
Fine-tuning large language models (LLMs) in 2024 | SuperAnnotate (www.superannotate.com)
microsoft/phi-2 · How to fine-tune this? + Training code (huggingface.co)
Phi-2: The surprising power of small language models (www.microsoft.com)
While fine-tuning a decoder-only LLM like LLaMA on a chat dataset, what kind of padding should one use? (ai.stackexchange.com)
LoRA (huggingface.co)
ROUGE - a Hugging Face Space by evaluate-metric (huggingface.co)
GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch (github.com)