Tmp: Difference between revisions

{{Infobox programming language
| name                  = Haskell
| logo                  = [[File:Haskell-Logo.svg|120px|Logo of Haskell]]
| paradigm              = [[functional programming|functional]], lazy/[[non-strict programming language|non-strict]], [[modular programming|modular]]
| year                  = 1990
| designer              = [[Simon Peyton Jones]], [http://www.cs.yale.edu/homes/hudak-paul/ Paul Hudak], [[Philip Wadler]], et al.
| developer              =
| latest release version = Haskell 2010<ref>http://www.haskell.org/pipermail/haskell/2009-November/021750.html</ref>
| latest release date    = {{start date and age|2009|11|24}}
| latest test version    = Haskell 2011
| latest test date      =
| typing                = [[static typing|static]], [[strong typing|strong]], [[type inference|inferred]]
| implementations        = [[Glasgow Haskell Compiler|GHC]], [[Hugs]], [http://www.cs.york.ac.uk/fp/nhc98/ NHC], [http://repetae.net/john/computer/jhc/ JHC], [[Yhc]]
| dialects              = [[Helium (Haskell)|Helium]], [[Gofer (software)|Gofer]]
| influenced            = [[Agda (theorem prover)|Agda]], [[Bluespec, Inc.|Bluespec]], [[Clojure]], [[C Sharp (programming language)|C#]], [[CAL (Quark Framework)|CAL]], [[Cat (programming language)|Cat]], [[Cayenne (programming language)|Cayenne]], [[Clean (programming language)|Clean]], [[Curry (programming language)|Curry]], [[Epigram (programming language)|Epigram]], [[Escher (programming language)|Escher]], [[F Sharp (programming language)|F#]], [[Factor (programming language)|Factor]], [[Isabelle theorem prover|Isabelle]], [[Generics in Java|Java Generics]], [[Kaya (programming language)|Kaya]], [[Language Integrated Query|LINQ]], [[Mercury (programming language)|Mercury]], [[Ωmega interpreter|Omega]], [[Perl 6]], [[Python (programming language)|Python]], [[Qi (programming language)|Qi]], [[Scala (programming language)|Scala]], [[Timber (programming language)|Timber]], [[Visual Basic .NET|Visual Basic 9.0]]
| influenced by          = [[Alfl]], [[APL (programming language)|APL]], [[FP (programming language)|FP]], [[Hope (programming language)|Hope, Hope+]], [[Id (programming language)|Id]], [[ISWIM]], [[Kent Recursive Calculator|KRC]], [[Lisp (programming language)|Lisp]], [[Miranda (programming language)|Miranda]], [[ML (programming language)|ML, Standard ML]], [[Lazy ML]], [[Orwell (programming language)|Orwell]], [[Ponder (programming language)|Ponder]], [[SASL (programming language)|SASL]], [[SISAL]], [[Scheme (programming language)|Scheme]]
| operating system      = [[Cross-platform]]
| license                =
| website                = {{url|http://haskell.org}}
| file ext              = <code>.hs</code>, <code>.lhs</code>
}}


'''Haskell''' ({{pron-en|ˈhæskəl}})<ref>http://www.haskell.org/pipermail/haskell-cafe/2008-January/038756.html</ref><ref>http://www.haskell.org/pipermail/haskell-cafe/2008-January/038758.html</ref> is a standardized, general-purpose [[purely functional]] [[programming language]], with [[non-strict programming language|non-strict semantics]] and [[Strong typing|strong]] [[Type system#Static typing|static typing]]. It is named after [[logician]] [[Haskell Curry]]. In Haskell, "a function is a [[first-class object|first-class citizen]]"<ref>Rod Burstall, "Christopher Strachey—Understanding Programming Languages", ''Higher-Order and Symbolic Computation'' '''13''':52 (2000)</ref> of the programming language. As a functional programming language, the primary control construct is the [[subroutine|function]]; the language is rooted in the observations of Haskell Curry<ref>{{Citation | last1=Curry | first1=Haskell | title=Proceedings of the National Academy of Sciences | chapter=Functionality in Combinatory Logic | year=1934  | volume=20 | pages=584–590}}</ref><ref name="CurryFeys_paragraph9E">{{Citation | last1=Curry | first1=Haskell B. | last2=Feys | first2=Robert | other1-last=Craig | other1-first=William | title=Combinatory Logic Vol. I | publisher=North-Holland | location=Amsterdam | year=1958}}, with 2 sections by William Craig, see paragraph 9E</ref> and his intellectual descendants,<ref>De Bruijn, Nicolaas (1968), ''Automath, a language for mathematics'', Department of Mathematics, Eindhoven University of Technology, TH-report 68-WSK-05. Reprinted in revised form, with two pages commentary, in: ''Automation and Reasoning, vol 2, Classical papers on computational logic 1967-1970'', Springer Verlag, 1983, pp. 159-200.</ref><ref>{{Citation | last1=Howard | first1=William A.
| chapter=The formulae-as-types notion of construction
| pages=479–490
| editor1-last=Seldin | editor1-first=Jonathan P.
| editor1-link=Jonathan P. Seldin
| editor2-last=Hindley | editor2-first=J. Roger
| editor2-link=J. Roger Hindley
| title=To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism
| origyear=original paper manuscript from 1969
| publisher=[[Academic Press]] | location=Boston, MA | isbn=978-0-12-349050-6 | month=09 | year=1980}}.</ref> that "[[Curry&ndash;Howard isomorphism#Origin, scope, and consequences|a proof is a program; the formula it proves is a type for the program]]".


== History ==


Following the release of [[Miranda (programming language)|Miranda]] by Research Software Ltd, in 1985, interest in [[Lazy evaluation|lazy functional languages]] grew: by 1987, more than a dozen [[non-strict]], [[purely functional]] programming languages existed. Of these, Miranda was the most widely used, but was not in the public domain. At the conference on Functional Programming Languages and Computer Architecture (FPCA '87) in [[Portland, Oregon]], a meeting was held during which participants formed a strong consensus that a committee should be formed to define an [[open standard]] for such languages. The committee's purpose was to consolidate the existing [[functional languages]] into a common one that would serve as a basis for future research in functional-language design.<ref name="Pref98">{{cite web|url=http://haskell.org/onlinereport/index.html|title=Haskell 98 Language and Libraries: The Revised Report|year=2002|month=December}}</ref>


=== Haskell 1.0 ===
The first version of Haskell ("Haskell 1.0") was defined in 1990.<ref>{{cite web|url=http://www.haskell.org/haskellwiki/History_of_Haskell|title=The History of Haskell}}</ref> The committee's efforts resulted in a series of language definitions.


=== Haskell 98 ===
In late 1997, the series culminated in '''Haskell 98''', intended to specify a stable, minimal, portable version of the language and an accompanying standard [[library (computer science)|library]] for teaching, and as a base for future extensions. The committee expressly welcomed the creation of extensions and variants of Haskell 98 via adding and incorporating experimental features.<ref name=Pref98/>


In February 1999, the Haskell 98 language standard was originally published as "The Haskell 98 Report".<ref name=Pref98/> In January 2003, a revised version was published as "Haskell 98 Language and Libraries: The Revised Report".<ref name="RevisedReport">{{cite web|url=http://haskell.org/onlinereport/|title=Haskell 98 Language and Libraries: The Revised Report|year=2002|month=December|author=Simon Peyton Jones (editor)|authorlink=Simon Peyton Jones}}</ref> The language continues to evolve rapidly, with the [[Glasgow Haskell Compiler|Glasgow Haskell Compiler (GHC)]] implementation representing the current ''de facto'' standard.


=== Haskell Prime ===
In early 2006, the process of defining a successor to the Haskell 98 standard, informally named '''Haskell&prime;''' ("Haskell Prime"), was begun.<ref>{{cite web|url=http://hackage.haskell.org/trac/haskell-prime|title=Welcome to Haskell'|work=The Haskell' Wiki}}</ref> This is an ongoing incremental process to revise the language definition, producing a new revision once per year.  The first revision, named Haskell 2010, was announced in November 2009.<ref>Simon Marlow, Tue Nov 24 05:50:49 EST 2009: "[Haskell] Announcing [http://www.haskell.org/pipermail/haskell/2009-November/021750.html Haskell 2010]"</ref>


==== Haskell 2010 ====
Haskell 2010 adds the Foreign Function Interface (FFI) to Haskell, allowing for bindings to other programming languages, fixes some syntax issues (changes in the formal grammar) and bans so-called "n-plus-k-patterns", that is, definitions of the form <code>fak (n+1) = (n+1) * fak n</code> are no longer allowed. It introduces the language-pragma syntax extension, which allows a Haskell source file to be designated as Haskell 2010 or to require certain extensions to the Haskell language. The names of the extensions introduced in Haskell 2010 are
DoAndIfThenElse, HierarchicalModules, EmptyDataDeclarations, FixityResolution, ForeignFunctionInterface, LineCommentSyntax, PatternGuards, RelaxedDependencyAnalysis, LanguagePragma, NoNPlusKPatterns.<ref>http://www.haskell.org/pipermail/haskell/2009-November/021750.html Haskell 2010 announcement</ref>
<!-- To do: Describe the change-over from I/O based on lazy streams to monadic I/O. -->


== Features ==
{{Main|Haskell features}}
{{See also|Glasgow Haskell Compiler#Extensions to Haskell}}


Haskell features [[lazy evaluation]], [[pattern matching]], [[list comprehensions]], typeclasses, and [[type polymorphism]]. It is a [[purely functional]] language, which means that in general, functions in Haskell do not have [[Side effect (computer science)|side effects]]. There is a distinct type for representing side effects, [[Orthogonal#Computer science|orthogonal]] to the type of functions. A pure function may return a side effect which is subsequently executed, modeling the impure functions of other languages.


Haskell has a [[strongly typed programming language|strong]], [[static type#Static typing|static]] type system based on [[Hindley–Milner type inference]]. Haskell's principal innovation in this area is to add [[type class]]es, which were originally conceived as a principled way to add overloading to the language,<ref name="wadler89">{{cite journal|last1=Wadler|first1=P.|first2=S. |last2=Blott|year=1989|title=How to make ad-hoc polymorphism less ad hoc|journal=Proceedings of the 16th ACM [[SIGPLAN]]-[[SIGACT]] [[Symposium on Principles of Programming Languages]]|publisher=[[Association for Computing Machinery|ACM]]|pages=60–76|doi=10.1145/75277.75283}}</ref> but have since found many more uses.<ref name="hallgren01">{{cite journal|last=Hallgren|first=T.|date=January 2001|title=Fun with Functional Dependencies, or Types as Values in Static Computations in Haskell|journal=Proceedings of the Joint CS/CE Winter Meeting|location=Varberg, Sweden|url=http://www.cs.chalmers.se/~hallgren/Papers/wm01.html}}</ref>


The type which represents side effects is an example of a [[monad (functional programming)|monad]]. Monads are a general framework which can model different kinds of computation, including error handling, [[Nondeterministic algorithm|nondeterminism]], [[parsing]], and [[software transactional memory]]. Monads are defined as ordinary datatypes, but Haskell provides some [[syntactic sugar]] for their use.


The language has an open, published specification,<ref name=RevisedReport/> and [[#Implementations|multiple implementations exist]].


There is an active community around the language, and more than 2600 third-party open-source libraries and tools are available in the online package repository Hackage.<ref name="hackage-stats">http://hackage.haskell.org/cgi-bin/hackage-scripts/stats</ref>


The main implementation of Haskell, [[Glasgow Haskell Compiler|GHC]], is both an [[Interpreter (computing)|interpreter]] and native-code [[compiler]] that runs on most platforms. GHC is noted for its high-performance implementation of concurrency and parallelism,<ref name="shootout">[http://shootout.alioth.debian.org/ Computer Language Benchmarks Game]</ref> and for having a rich type system incorporating recent innovations such as [[generalized algebraic data type]]s and [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html Type Families].


== Code examples ==
{{See also|Haskell features#Examples}}
The following is a [[Hello world program]] written in Haskell (note that except for the last line all lines can be omitted):
<source lang="haskell">
module Main where


main :: IO ()
main = putStrLn "Hello, World!"
</source>


Here is the factorial function in Haskell, defined in five different ways:


<source lang="haskell">
-- type
factorial :: Integer -> Integer


-- using recursion
factorial 0 = 1
factorial n = n * factorial (n - 1)


-- using lists
factorial n = product [1..n]


-- using recursion but written without pattern matching
factorial n = if n > 0 then n * factorial (n-1) else 1


-- using fold
factorial n = foldl (*) 1 [1..n]


-- using only prefix notation and n+k-patterns (no longer allowed in Haskell 2010)
factorial 0 = 1
factorial (n+1) = (*) (n+1) (factorial n)
</source>


An efficient implementation of the [[Fibonacci numbers]], as an infinite list, is this:
<source lang="haskell">
-- Point-free style
fib :: Integer -> Integer
fib = (fibs !!)
      where fibs = 0 : scanl (+) 1 fibs


-- Explicit
fib :: Integer -> Integer
fib n = fibs !! n
        where fibs = 0 : scanl (+) 1 fibs
</source>


== Implementations ==


The following all comply fully, or very nearly, with the Haskell 98 standard, and are distributed under [[open source]] licenses. There are currently no proprietary Haskell implementations.


* The '''[[Glasgow Haskell Compiler]]''' (GHC) compiles to native code on a number of different architectures—as well as to [[ANSI C]]—using [[C--]] as an [[intermediate language]]. GHC is probably the most popular Haskell compiler, and there are quite a few useful libraries (e.g. bindings to [[OpenGL]]) that will work only with GHC. GHC is also distributed along with the [[Haskell platform]].
* '''[[Gofer (software)|Gofer]]''' was an educational dialect of Haskell, with a feature called "constructor classes", developed by Mark Jones. It was supplanted by Hugs (see below).
* '''[http://www.cs.chalmers.se/~augustss/hbc/hbc.html HBC]''' is another native-code Haskell compiler. It has not been actively developed for some time but is still usable.
* '''[[Helium (Haskell)|Helium]]''' is a newer dialect of Haskell. The focus is on making it easy to learn by providing clearer error messages. It currently lacks full support for type classes, rendering it incompatible with many Haskell programs.
* The '''[http://www.cs.uu.nl/wiki/UHC Utrecht Haskell Compiler]''' (UHC) is a Haskell implementation from [[Utrecht University]]. UHC supports almost all Haskell 98 features plus many experimental extensions. It is implemented using [[attribute grammar]]s and is currently mainly used for research into generated type systems and language extensions.
* '''[[Hugs]]''', the '''Haskell User's Gofer System''', is a [[bytecode]] interpreter. It offers fast compilation of programs and reasonable execution speed. It also comes with a simple graphics library. Hugs is good for people learning the basics of Haskell, but is by no means a "toy" implementation. It is the most portable and lightweight of the Haskell implementations.
* '''[http://repetae.net/john/computer/jhc/ Jhc]''' is a Haskell compiler written by John Meacham emphasising speed and efficiency of generated programs as well as exploration of new program transformations. [[LHC (Haskell compiler)|LHC]] is a recent fork of Jhc.
* '''[http://www.cs.york.ac.uk/fp/nhc98/ nhc98]''' is another bytecode compiler, but the bytecode runs significantly faster than with Hugs. Nhc98 focuses on minimizing memory usage, and is a particularly good choice for older, slower machines.
* '''[[Yhc]]''', the '''York Haskell Compiler''' was a fork of nhc98, with the goals of being simpler, more portable and more efficient, and integrating support for [http://www.haskell.org/hat/ Hat], the Haskell tracer.  It also featured a [[JavaScript]] backend allowing users to run [http://haskell.org/haskellwiki/Haskell_in_web_browser Haskell programs in a web browser].


== Applications ==
Haskell is increasingly being used in commercial situations.<ref>See [http://industry.haskell.org/index Industrial Haskell Group] for collaborative development, [http://cufp.galois.com/ Commercial Users of Functional Programming] for specific projects and [http://www.haskell.org/haskellwiki/Haskell_in_industry Haskell in industry] for a list of companies using Haskell commercially</ref> [[Audrey Tang]]'s [[Pugs]] is an implementation for the long-forthcoming [[Perl 6]] language with an interpreter and compilers that proved useful after just a few months of its writing; similarly, GHC is often a testbed for advanced functional programming features and optimizations. [[Darcs]] is a revision control system written in Haskell, with several innovative features. [[Linspire]] GNU/Linux chose Haskell for system tools development.<ref>{{cite web|url=http://urchin.earth.li/pipermail/debian-haskell/2006-May/000169.html|title=Linspire/Freespire Core OS Team and Haskell|work=Debian Haskell mailing list|year=2006|month=May}}</ref> [[Xmonad]] is a [[window manager]] for the [[X Window System]], written entirely in Haskell.


Bluespec SystemVerilog is a language for semiconductor design that is an extension of Haskell. Additionally, [[Bluespec, Inc.]]'s tools are implemented in Haskell. [[Cryptol]], a language and toolchain for developing and verifying cryptographic algorithms, is implemented in Haskell. Notably, the first formally verified [[microkernel]], [[SeL4#Current research and development|seL4]] was verified using Haskell.


== Related languages ==


[[Clean (programming language)|Concurrent Clean]] is a close relative of Haskell. Its biggest deviation from Haskell is in the use of [[uniqueness type]]s instead of [[monad (functional programming)|monads]] for I/O and side-effects.


A series of languages inspired by Haskell, but with different type systems, have been developed, including:


* [[Epigram (programming language)|Epigram]], a functional language with dependent types suitable for proving properties of programs
* [[Agda (theorem prover)|Agda]], a functional language with dependent types


Other related languages include:


* [[Curry (programming language)|Curry]], a language based on Haskell
* [[Jaskell]], a functional scripting programming language that runs in Java VM


Haskell has served as a testbed for many new ideas in language design. There have been a wide number of Haskell variants produced, exploring new language ideas, including:


* Parallel Haskell:
** From [[Glasgow University]].<ref>[http://www.macs.hw.ac.uk/~dsg/gph/ Glasgow Parallel Haskell]</ref> supports clusters of machines or single multiprocessors.<ref>[http://www.haskell.org/ghc/docs/6.6/html/users_guide/lang-parallel.html GHC Language Features: Parallel Haskell]</ref>  Also within Haskell is support for Symmetric Multiprocessor parallelism.<ref>[http://www.haskell.org/ghc/docs/6.6/html/users_guide/sec-using-smp.html Using GHC: Using SML parallelism]</ref>
** From [[Massachusetts Institute of Technology|MIT]]<ref>[http://csg.csail.mit.edu/projects/languages/ph.shtml MIT Parallel Haskell]</ref>
* Distributed Haskell (formerly Goffin) and [[Eden (computing)|Eden]].{{Citation needed|date=April 2009}}
* [[Eager Haskell]], based on [[speculative execution|speculative evaluation]].
* Several [[object-oriented programming|object-oriented]] versions: Haskell++, O'Haskell, and Mondrian.
* [[generic programming#Generic Haskell|Generic Haskell]], a version of Haskell with type system support for [[generic programming]].
* O'Haskell, an extension of Haskell adding [[object-oriented programming|object-orientation]] and [[concurrent programming]] support.
* Disciple, an explicitly lazy dialect of Haskell which supports destructive update, computational effects, type directed field projections and allied functional goodness.


== Criticism ==


Jan-Willem Maessen, in 2002, and [[Simon Peyton Jones]], in 2003, discussed problems associated with lazy evaluation while also acknowledging the theoretical motivation for it,<ref>Jan-Willem Maessen. ''Eager Haskell: Resource-bounded execution yields efficient iteration''. Proceedings of the 2002 [[Association for Computing Machinery|ACM]] SIGPLAN workshop on Haskell.</ref><ref>Simon Peyton Jones. [http://research.microsoft.com/~simonpj/papers/haskell-retrospective ''Wearing the hair shirt: a retrospective on Haskell'']. Invited talk at [[POPL]] 2003.</ref> in addition to purely practical considerations such as improved performance.<ref>Lazy evaluation can lead to excellent performance, such as in The Computer Language Benchmarks Game [http://www.haskell.org/pipermail/haskell/2006-June/018127.html]</ref> They note that, in addition to adding some performance overhead, laziness makes it more difficult for programmers to reason about the performance of their code (particularly its space usage).


Bastiaan Heeren, Daan Leijen, and Arjan van IJzendoorn in 2003 also observed some stumbling blocks for Haskell learners: "The subtle syntax and sophisticated type system of Haskell are a double edged sword — highly appreciated by experienced programmers but also a source of frustration among beginners, since the generality of Haskell often leads to cryptic error messages."<ref>{{cite journal|first1=Bastiaan |last1=Heeren |first2=Daan |last2=Leijen |first3=Arjan |last3=van IJzendoorn|year=2003|title=Helium, for learning Haskell|journal=Proceedings of the 2003 [[Association for Computing Machinery|ACM]] [[SIGPLAN]] workshop on Haskell|url=http://www.cs.uu.nl/~bastiaan/heeren-helium.pdf}}</ref> To address these, they developed an advanced interpreter called [[Helium (Haskell)|Helium]] which improved the user-friendliness of error messages by limiting the generality of some Haskell features, and in particular removing support for [[type class]]es.


== Conferences and workshops ==


The Haskell community meets regularly for research and development activities. The primary events are:
* [http://www.haskell.org/haskell-symposium/ The Haskell Symposium] (formerly the Haskell Workshop)
* [http://haskell.org/haskellwiki/HaskellImplementorsWorkshop The Haskell Implementors Workshop]
* [http://www.icfpconference.org/ The International Conference on Functional Programming]


Since 2006 there has been a series of organized "hackathons", the [http://haskell.org/haskellwiki/Hackathon Hac] series, aimed at improving the programming language tools and libraries<ref>http://haskell.org/haskellwiki/Hackathon</ref>:


*Ghent, Nov 2010
*Kiev, Oct 2010
*Australian Hackathon, Jul 2010
*Philadelphia, May 2010
*Zurich, Mar 2010
*Portland, OR, Sep 2009
*Edinburgh, Aug 2009
*Philadelphia, Jul 2009
*Utrecht, Apr 2009
*Leipzig, Apr 2008
*Göteborg, Apr 2008
*Freiburg, Oct 2007
*Oxford, Jan 2007
*Portland, Sep 2006


Since 2005, a growing number of [http://haskell.org/haskellwiki/User_groups Haskell User Groups] have formed, in the United States, Canada, Australia, South America, Europe and Asia.


== References ==
{{reflist}}


== External links ==
* {{official|http://haskell.org}}, HaskellWiki
* {{dmoz|Computers/Programming/Languages/Haskell|Haskell}}
* [http://portal.acm.org/citation.cfm?doid=1238844.1238856  A History of Haskell: being lazy with class]
* [http://www.willamette.edu/~fruehr/haskell/evolution.html The Evolution of a Haskell Programmer], slightly humorous overview of different programming styles available in Haskell
* [http://haskell.readscheme.org/ Online Bibliography of Haskell Research]
* [http://www.se-radio.net/podcast/2008-08/episode-108-simon-peyton-jones-functional-programming-and-haskell SE-Radio Podcast with Simon Peyton Jones on Haskell]
* [http://www.techworld.com.au/article/261007/-z_programming_languages_haskell?pp=1 Techworld interview on innovations of Haskell]
* [http://sequence.complete.org/ The Haskell Sequence], weekly news site
* [http://themonadreader.wordpress.com/ Monad Reader], quarterly magazine on Haskell topics
=== Tutorials ===
{{wikibooks|Haskell}}
{{wikibooks|Write Yourself a Scheme in 48 Hours}}
* [http://tryhaskell.org/ Try Haskell!], interactive tutorial, runs in browser
* [http://book.realworldhaskell.org/ Real World Haskell], fast-moving book focusing on practical examples, published with [[Creative Commons]] license
* [http://learnyouahaskell.com/ Learn You a Haskell For Great Good!], humorous introductory tutorial with illustrations
* [http://www.umiacs.umd.edu/~hal/docs/daume02yaht.pdf Yet Another Haskell Tutorial], by Hal Daume III; assumes far less prior knowledge than official tutorial
* [http://haskell.org/tutorial/ A Gentle Introduction to Haskell 98], more advanced tutorial, also available as [http://www.haskell.org/tutorial/haskell-98-tutorial.pdf pdf file]
* [http://cheatsheet.codeslower.com/ The Haskell Cheatsheet], compact language reference and mini-tutorial
* [http://www.doc.ic.ac.uk/teaching/distinguished-projects/2009/w.jones.pdf Warp speed Haskell]
* [http://haskell.org/sitewiki/images/8/85/TMR-Issue13.pdf The Typeclassopedia], about Haskell's type classes, start at p. 17 of pdf


[[Category:Haskell programming language family|*]]
[[Category:Functional languages]]
[[Category:Articles with example Haskell code]]
[[Category:Programming languages created in 1990]]
[[Category:Educational programming languages]]


[[bg:Haskell]]
[[ca:Haskell]]
[[cs:Haskell]]
[[de:Haskell (Programmiersprache)]]
[[et:Haskell]]
[[el:Haskell]]
[[es:Haskell]]
[[eo:Haskell]]
[[fa:هسکل (زبان برنامه‌نویسی)]]
[[fr:Haskell]]
[[gl:Haskell]]
[[id:Haskell]]
[[ko:하스켈]]
[[hr:Haskell (programski jezik)]]
[[it:Haskell (linguaggio)]]
[[he:Haskell]]
[[la:Haskell]]
[[lv:Haskell]]
[[hu:Haskell]]
[[ms:Haskell]]
[[nl:Haskell (programmeertaal)]]
[[ja:Haskell]]
[[pl:Haskell]]
[[pt:Haskell (linguagem de programação)]]
[[ro:Haskell]]
[[ru:Haskell]]
[[sk:Haskell (programovací jazyk)]]
[[sl:Haskell]]
[[fi:Haskell]]
[[sv:Haskell]]
[[tg:Haskell]]
[[tr:Haskell]]
[[uk:Haskell]]
[[zh:Haskell]]

<pre>
#https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07
Fine Tune Large Language Model (LLM) on a Custom Dataset with QLoRA
Suman Das · 15 min read · Jan 25, 2024

The field of natural language processing has been revolutionized by large language models (LLMs), which showcase advanced capabilities and sophisticated solutions. Trained on extensive text datasets, these models excel in tasks like text generation, translation, summarization, and question-answering. Despite their power, LLMs may not always align with specific tasks or domains.

In this tutorial, we will explore how fine-tuning LLMs can significantly improve model performance, reduce training costs, and enable more accurate and context-specific results.

What is LLM Fine-tuning?

Fine-tuning an LLM involves the additional training of a pre-existing model, which has previously acquired patterns and features from an extensive dataset, using a smaller, domain-specific dataset. In the context of “LLM Fine-Tuning,” LLM denotes a “Large Language Model,” such as the GPT series from OpenAI. This approach matters because training a large language model from the ground up is highly resource-intensive in terms of both computational power and time. Utilizing the existing knowledge embedded in the pre-trained model allows for achieving high performance on specific tasks with substantially reduced data and computational requirements.

Below are some of the key steps involved in LLM fine-tuning:

    Select a pre-trained model: The first step is to carefully select a base pre-trained model that aligns with our desired architecture and functionalities. Pre-trained models are general-purpose models that have been trained on a large corpus of unlabeled data.
    Gather a relevant dataset: Then we need to gather a dataset that is relevant to our task. The dataset should be labeled or structured in a way that the model can learn from it.
    Preprocess the dataset: Once the dataset is ready, we need to do some preprocessing for fine-tuning by cleaning it, splitting it into training, validation, and test sets, and ensuring it is compatible with the model we want to fine-tune.
    Fine-tuning: After selecting a pre-trained model, we fine-tune it on the preprocessed dataset, which is more specific to the task at hand. The dataset might be related to a particular domain or application, allowing the model to adapt and specialize for that context.
    Task-specific adaptation: During fine-tuning, the model’s parameters are adjusted based on the new dataset, helping it better understand and generate content relevant to the specific task. This process retains the general language knowledge gained during pre-training while tailoring the model to the nuances of the target domain.

Fine-tuning LLMs is commonly used in natural language processing tasks such as sentiment analysis, named entity recognition, summarization, translation, or any other application where understanding context and generating coherent language is crucial. It helps leverage the knowledge encoded in pre-trained models for more specialized and domain-specific tasks.

Fine-tuning methods

Fine-tuning a Large Language Model (LLM) involves a supervised learning process. In this method, a dataset comprising labeled examples is utilized to adjust the model’s weights, enhancing its proficiency in specific tasks. Now, let’s delve into some noteworthy techniques employed in the fine-tuning process.

    Full fine-tuning (instruction fine-tuning): Instruction fine-tuning is a strategy to enhance a model’s performance across various tasks by training it on examples that guide its responses to queries. The choice of dataset is crucial and tailored to the specific task, such as summarization or translation. This approach, known as full fine-tuning, updates all model weights, creating a new version with improved capabilities. However, it demands memory and computational resources comparable to pre-training, in order to handle the storage and processing of gradients, optimizer states, and other components during training.
    Parameter-Efficient Fine-Tuning (PEFT) is a form of instruction fine-tuning that is much more efficient than full fine-tuning. Training a language model, especially for full LLM fine-tuning, demands significant computational resources. Memory is needed not only for storing the model but also for essential parameters during training, which is a challenge for modest hardware. PEFT addresses this by updating only a subset of parameters and effectively “freezing” the rest. This reduces the number of trainable parameters, making memory requirements more manageable and preventing catastrophic forgetting. Unlike full fine-tuning, PEFT maintains the original LLM weights, avoiding the loss of previously learned information. This approach also eases storage when fine-tuning for multiple tasks. Of the various parameter-efficient approaches, Low-Rank Adaptation (LoRA) and QLoRA are the most widely used and effective.

What is LoRA?

LoRA is an improved fine-tuning method where, instead of fine-tuning all the weights that constitute the weight matrix of the pre-trained large language model, two smaller matrices that approximate the update to this larger matrix are fine-tuned. These matrices constitute the LoRA adapter. This fine-tuned adapter is then loaded into the pre-trained model and used for inference.

After LoRA fine-tuning for a specific task or use case, the outcome is an unchanged original LLM and the emergence of a considerably smaller “LoRA adapter,” often representing a single-digit percentage of the original LLM size (in MBs rather than GBs).

During inference, the LoRA adapter must be combined with its original LLM. The advantage lies in the ability of many LoRA adapters to reuse the original LLM, thereby reducing overall memory requirements when handling multiple tasks and use cases.

What is Quantized LoRA (QLoRA)?

QLoRA is a more memory-efficient iteration of LoRA. QLoRA takes LoRA a step further by also quantizing the weights of the LoRA adapters (smaller matrices) to lower precision (e.g., 4-bit instead of 8-bit). This further reduces the memory footprint and storage requirements. In QLoRA, the pre-trained model is loaded into GPU memory with quantized 4-bit weights, in contrast to the 8-bit used in LoRA. Despite this reduction in bit precision, QLoRA maintains a comparable level of effectiveness to LoRA.

In this tutorial, we will use parameter-efficient fine-tuning with QLoRA.

Now let’s explore how we can fine-tune an LLM on a custom dataset using QLoRA on a single GPU. The main steps are:

    Setting up the Notebook
    Install required libraries
    Loading dataset
    Create Bitsandbytes configuration
    Loading the Pre-Trained model
    Tokenization
    Test the Model with Zero Shot Inferencing
    Pre-processing dataset
    Preparing the model for QLoRA
    Setup PEFT for Fine-Tuning
    Train PEFT Adapter
    Evaluate the Model Qualitatively (Human Evaluation)
    Evaluate the Model Quantitatively (with ROUGE Metric)

1. Setting up the Notebook

While we will utilize a Kaggle notebook for this demonstration, feel free to use any Jupyter notebook environment. Kaggle offers a generous allowance of 30 hours of free GPU usage per week, which is ample for our experimentation. To begin, let’s open a new notebook, establish some headings, and then proceed to connect to the runtime.

Here, we will select the GPU P100 as the ACCELERATOR. Feel free to try other GPU options available in Kaggle or any other environment.

In this tutorial, we will be using HuggingFace libraries to download and train the model. To download models from HuggingFace, we will need an Access Token. If you’ve already signed up with HuggingFace, you can generate a new Access Token from the settings section or use any existing Access Token.

2. Install required libraries

Now, let’s install the necessary libraries for this experiment.

!pip install -q -U bitsandbytes transformers peft accelerate datasets scipy einops evaluate trl rouge_score

Let’s understand the importance of some of these libraries.

    Bitsandbytes: An excellent package that provides a lightweight wrapper around custom CUDA functions that make LLMs go faster — optimizers, matrix multiplication, and quantization. In this tutorial, we’ll be using this library to load our model as efficiently as possible.
    transformers: A library by Hugging Face (🤗) that provides pre-trained models and training utilities for various natural language processing tasks.
    peft: A library by Hugging Face (🤗) that enables parameter-efficient fine-tuning.
    accelerate: Accelerate abstracts exactly and only the boilerplate code related to multi-GPU/TPU/fp16 use and leaves the rest of your code unchanged.
    datasets: Another library by Hugging Face (🤗) that provides easy access to a wide range of datasets.
    einops: A library that simplifies tensor operations.

Loading the required libraries

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

interpreter_login()

For this tutorial we are not going to track our training metrics, so let’s disable Weights and Biases. The W&B platform provides components for monitoring runs, visualizing data and models, and sharing results. To deactivate Weights and Biases during the fine-tuning process, set the environment property below.

import os
# disable Weights and Biases
os.environ['WANDB_DISABLED']="true"

If you have an account with Weights and Biases, feel free to enable it and experiment with it.

3. Loading dataset

Numerous datasets are available for fine-tuning the model. In this instance, we will utilize the DialogSum dataset from HuggingFace for the fine-tuning process. DialogSum is an extensive dialogue summarization dataset, featuring 13,460 dialogues along with manually labeled summaries and topics.

There is no specific reason for selecting this dataset. Feel free to try this experiment with any custom dataset.

Let’s execute the code below to load the above dataset from HuggingFace.

huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)

Once the dataset is loaded, we can take a look at it to understand what it contains. Each example contains the fields below.

    dialogue: text of the dialogue.
    summary: human-written summary of the dialogue.
    topic: human-written topic/one-liner of the dialogue.
    id: unique file id of an example.

4. Create Bitsandbytes configuration

To load the model, we need a configuration class that specifies how we want the quantization to be performed. We’ll be using BitsAndBytesConfig to load our model in 4-bit format. This will reduce memory consumption considerably, at a cost of some accuracy.

compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=False,
    )

5. Loading the Pre-Trained model

Microsoft recently open-sourced Phi-2, a Small Language Model (SLM) with 2.7 billion parameters. Here, we will use Phi-2 for the fine-tuning process. This language model exhibits remarkable reasoning and language understanding capabilities, achieving state-of-the-art performance among base language models.

Let’s now load Phi-2 using 4-bit quantization from HuggingFace.

model_name='microsoft/phi-2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name,  
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

The model is loaded in 4-bit using the `BitsAndBytesConfig` from the bitsandbytes library. This is a part of the QLoRA process, which involves quantizing the pre-trained weights of the model to 4-bit and keeping them fixed during fine-tuning.

6. Tokenization

Now, let’s configure the tokenizer, incorporating left-padding to optimize memory usage during training.

tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

7. Test the Model with Zero Shot Inferencing

We will evaluate the base model that we loaded above using a few sample inputs.

%%time
from transformers import set_seed
seed = 42
set_seed(seed)

index = 10

prompt = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

formatted_prompt = f"Instruct: Summarize the following conversation.\n{prompt}\nOutput:\n"
res = gen(original_model,formatted_prompt,100,)
#print(res[0])
output = res[0].split('Output:\n')[1]

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

From the observation above, it’s evident that the model faces challenges in summarizing the dialogue compared to the baseline summary. However, it manages to extract essential information from the text, suggesting the potential for fine-tuning the model for the specific task at hand.

8. Pre-processing dataset

The dataset cannot be directly employed for fine-tuning. It is essential to format the prompt in a way that the model can comprehend. Referring to the HuggingFace model documentation, it is evident that a prompt needs to be generated using dialogue and summary in the specified format below.
Prompt format: inference prompts use the "Instruct: ... Output:" layout from step 7, while the training prompts built by the helper below wrap each example as "### Instruct: ... ### Output: ... ### End".
 
We’ll create some helper functions to format our input dataset, ensuring its suitability for the fine-tuning process. Here, we need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM.
 
def create_prompt_formats(sample):
    """
    Format various fields of the sample ('dialogue', 'summary')
    Then concatenate them using two newline characters
    :param sample: Sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"
   
    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"
   
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]
 
    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt
 
    return sample
 
The above function can be used to convert our input into prompt format.
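As a quick, optional sanity check (not in the original notebook), you can apply it to a single training example and print the resulting text:

sample = create_prompt_formats(dataset['train'][0])
print(sample['text'])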
 
Now, we will use our model tokenizer to process these prompts into tokenized ones.
 
Our aim here is to generate input sequences with consistent lengths, which is beneficial for fine-tuning the language model by optimizing efficiency and minimizing computational overhead. It is essential to ensure that these sequences do not surpass the model’s maximum token limit.
 
from functools import partial
 
# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(model.config, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length
 
 
def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )
 
# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int,seed, dataset):
    """Format & tokenize it so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    """
   
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
   
    # Apply preprocessing to each batch of the dataset and remove the 'id', 'topic', 'dialogue', and 'summary' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )
 
    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
   
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)
 
    return dataset
 
By utilizing these functions, our dataset will be prepared for the fine-tuning process!
 
## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)
 
train_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['validation'])
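Optionally, you can inspect the processed splits to confirm that only the tokenized fields (plus the formatted text) remain and that sequence lengths respect the limit; this small check is not part of the original notebook:

print(train_dataset)
print(eval_dataset)
print(len(train_dataset[0]['input_ids']), max_length)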
 
9. Preparing the model for QLoRA
 
# Prepare the model for k-bit (QLoRA) training using the prepare_model_for_kbit_training method from PEFT
from peft import prepare_model_for_kbit_training

original_model = prepare_model_for_kbit_training(original_model)
 
Here, the model is prepared for QLoRA training using the `prepare_model_for_kbit_training()` function. This function initializes the model for QLoRA by setting up the necessary configurations.
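If you want to see what this step changed, one quick, optional check (my addition, not from the article) is to look at the parameter dtypes and the trainable-parameter count; after k-bit preparation you should see mostly quantized uint8 weights plus a small number of float32 parameters such as layer norms, and zero trainable parameters until the LoRA adapter is attached:

from collections import Counter

# distribution of parameter dtypes after k-bit preparation (illustrative)
print(Counter(str(p.dtype) for p in original_model.parameters()))
# base weights are frozen at this point, so this should report 0
print(sum(p.numel() for p in original_model.parameters() if p.requires_grad))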
10. Setup PEFT for Fine-Tuning
 
Let us now define the LoRA config for Fine-tuning the base model.
 
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
 
config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)
 
# Enable gradient checkpointing to reduce memory usage during fine-tuning
original_model.gradient_checkpointing_enable()
 
peft_model = get_peft_model(original_model, config)
 
Note the rank (r) hyper-parameter, which defines the rank/dimension of the adapter to be trained. r is the rank of the low-rank matrix used in the adapters, which thus controls the number of parameters trained. A higher rank will allow for more expressivity, but there is a compute tradeoff.
 
alpha here is the scaling factor for the learned weights. The weight matrix is scaled by alpha/r, and thus a higher value for alpha assigns more weight to the LoRA activations.
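To make the memory argument concrete, here is a rough back-of-the-envelope calculation (my illustration, not from the article) for a single 2560x2560 projection matrix, which is roughly the shape of Phi-2's attention projections, using the r=32 chosen above:

d_in, d_out, r = 2560, 2560, 32
full_update = d_in * d_out                # weights touched by full fine-tuning for this matrix
lora_update = (d_in * r) + (r * d_out)    # weights in the two low-rank LoRA matrices
print(full_update, lora_update, f"{100 * lora_update / full_update:.2f}%")
# roughly 6.55M vs 0.16M parameters per projection, i.e. about 2.5%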
 
Once everything is set up and the PEFT model is prepared, we can use a helper function to see how many trainable parameters are in the model.
 
print(print_number_of_trainable_model_parameters(peft_model))
 
trainable parameters
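The print_number_of_trainable_model_parameters() helper is not defined in the article; a minimal version (an assumption about what the author used) could look like the following, and peft_model.print_trainable_parameters() from PEFT reports essentially the same information:

def print_number_of_trainable_model_parameters(model):
    trainable_params = 0
    all_params = 0
    for _, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    return (f"trainable model parameters: {trainable_params}\n"
            f"all model parameters: {all_params}\n"
            f"percentage of trainable parameters: {100 * trainable_params / all_params:.2f}%")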
11. Train PEFT Adapter
 
Define training arguments and create Trainer instance.
 
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
import transformers
 
peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1000,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)
 
peft_model.config.use_cache = False
 
peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
 
Here, we have used 1,000 training steps, which seems to be good enough for our custom dataset; it is worth trying a few different values before settling on a final number. The hyperparameters above may also need to change depending on the dataset and model being fine-tuned. This setup is simply meant to show the fine-tuning workflow.
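For reference, the effective batch size and the total number of sequences seen during training follow directly from these settings (simple arithmetic, shown here for clarity):

per_device_batch = 1
grad_accum_steps = 4
max_steps = 1000
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)              # 4 sequences per optimizer step
print(effective_batch * max_steps)  # about 4000 training sequences processed in total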
 
Let’s start the training now. Training the model will take some time depending upon the hyperparameters used in TrainingArguments.
 
peft_trainer.train()
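Checkpoints are already written every save_steps, but if you want an explicit copy of the final adapter (and the tokenizer) in a known location, you can save them manually; the directory name below is just an example:

final_adapter_dir = f"{output_dir}/final-adapter"   # hypothetical location
peft_trainer.model.save_pretrained(final_adapter_dir)
tokenizer.save_pretrained(final_adapter_dir)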
 
Once the model is trained successfully, we can use it for inference. Let’s now prepare the inference model by adding an adapter to the original Phi-2 model. Here, we are setting is_trainable=False because the plan is only to perform inference with this PEFT model.
 
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
 
base_model_id = "microsoft/phi-2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_id,
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)
 
eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token
 
from peft import PeftModel
 
ft_model = PeftModel.from_pretrained(base_model, "/kaggle/working/peft-dialogue-summary-training-1705417060/checkpoint-1000",torch_dtype=torch.float16,is_trainable=False)
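As a rough illustration of the "MBs rather than GBs" point made earlier, you can check how much space the saved checkpoint takes on disk; this is my own check, and the exact number depends on your run, the LoRA rank, and whatever else the Trainer stored alongside the adapter:

import os

ckpt_dir = "/kaggle/working/peft-dialogue-summary-training-1705417060/checkpoint-1000"
size_mb = sum(os.path.getsize(os.path.join(ckpt_dir, f))
              for f in os.listdir(ckpt_dir)
              if os.path.isfile(os.path.join(ckpt_dir, f))) / 1e6
print(f"checkpoint size on disk: {size_mb:.1f} MB")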
 
Fine-tuning is often an iterative process. Based on the validation and test sets results, we may need to make further adjustments to the model’s architecture, hyperparameters, or training data to improve its performance. Let’s now see how to evaluate the results of Fine-tuned LLM.
12. Evaluate the Model Qualitatively (Human Evaluation)
 
Now, let’s perform inference using the same input but with the PEFT model, as we did previously in step 7 with the original model.
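The gen() helper used here (and in the zero-shot test earlier) is never shown in the article; a minimal sketch that matches how its return value is used, assuming it simply wraps tokenization and model.generate, would be:

def gen(model, prompt, max_new_tokens, tok=eval_tokenizer):
    # encode the prompt and move it to the model's device
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tok.eos_token_id,
    )
    # return a list so callers can index [0], mirroring the usage in this article
    return [tok.decode(output_ids[0], skip_special_tokens=True)]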
 
%%time
from transformers import set_seed
set_seed(seed)
 
index = 5
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']
 
prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
 
peft_model_res = gen(ft_model,prompt,100,)
peft_model_output = peft_model_res[0].split('Output:\n')[1]
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')
 
dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')
 
PEFT model output
13. Evaluate the Model Quantitatively (with ROUGE Metric)
 
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.
 
Let’s now use the ROUGE metric to quantify the validity of summarizations produced by models. It compares summarizations to a “baseline” summary which is usually created by a human. While it’s not a perfect metric, it does indicate the overall increase in summarization effectiveness that we have accomplished by fine-tuning.
 
To demonstrate the capability of ROUGE Metric Evaluation we will use some sample inputs to evaluate.
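If you have not used the evaluate library's ROUGE implementation before, a toy example (independent of our models) shows the shape of the output it produces:

import evaluate

rouge = evaluate.load('rouge')
print(rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat lay on the mat"],
))
# e.g. {'rouge1': 0.83..., 'rouge2': 0.6, 'rougeL': 0.83..., 'rougeLsum': 0.83...}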
 
original_model = AutoModelForCausalLM.from_pretrained(base_model_id,
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)
 
import pandas as pd
 
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']
 
original_model_summaries = []
instruct_model_summaries = []
peft_model_summaries = []
 
for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
   
    original_model_res = gen(original_model,prompt,100,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]
   
    peft_model_res = gen(ft_model,prompt,100,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')
 
    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)
 
zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df
 
import evaluate
 
rouge = evaluate.load('rouge')
 
original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)
 
peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)
 
print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)
 
print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")
 
improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')
 
Rouge metric evaluation
 
As the results above show, the PEFT model scores noticeably higher than the original model on every ROUGE metric, with the improvement expressed in percentage points.
 
If you’d like to access the complete notebook, please refer to the Kaggle notebook “FineTune Phi-2 on Custom DataSet” (www.kaggle.com).
Conclusion
 
Fine-tuning Large Language Models (LLMs) has become essential for enterprises seeking to optimize their operational processes. While the initial training of LLMs imparts a broad language understanding, the fine-tuning process refines these models into specialized tools capable of handling specific topics and providing more accurate results. Tailoring LLMs for distinct tasks, industries, or datasets extends the capabilities of these models, ensuring their relevance and value in a dynamic digital landscape. Looking ahead, ongoing exploration and innovation in LLMs, coupled with refined fine-tuning methodologies, are poised to advance the development of smarter, more efficient, and contextually aware AI systems.
References

    microsoft/phi-2 · Hugging Face (huggingface.co)
    Fine-tuning large language models (LLMs) in 2024 (www.superannotate.com)
    microsoft/phi-2 · How to fine-tune this? + Training code (huggingface.co)
    Phi-2: The surprising power of small language models (www.microsoft.com)
    While fine-tuning a decoder-only LLM like LLaMA on a chat dataset, what kind of padding should one use? (ai.stackexchange.com)
    LoRA (huggingface.co)
    ROUGE - a Hugging Face Space by evaluate-metric (huggingface.co)
    GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch (github.com)
</pre>
[[File:Linwinmac.jpg‎]]

Latest revision as of 09:21, 7 March 2024

#https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07
Fine Tune Large Language Model (LLM) on a Custom Dataset with QLoRA
Suman Das

Suman Das
·

Follow
15 min read
·
Jan 25, 2024

Fine-Tuning LLM

The field of natural language processing has been revolutionized by large language models (LLMs), which showcase advanced capabilities and sophisticated solutions. Trained on extensive text datasets, these models excel in tasks like text generation, translation, summarization, and question-answering. Despite their power, LLMs may not always align with specific tasks or domains.

In this tutorial, we will explore how fine-tuning LLMs can significantly improve model performance, reduce training costs, and enable more accurate and context-specific results.
What is LLM Fine-tuning?

Fine-tuning LLM involves the additional training of a pre-existing model, which has previously acquired patterns and features from an extensive dataset, using a smaller, domain-specific dataset. In the context of “LLM Fine-Tuning,” LLM denotes a “Large Language Model,” such as the GPT series by OpenAI. This approach holds significance as training a large language model from the ground up is highly resource-intensive in terms of both computational power and time. Utilizing the existing knowledge embedded in the pre-trained model allows for achieving high performance on specific tasks with substantially reduced data and computational requirements.

Below are some of the key steps involved in LLM Fine-tuning:

    Select a pre-trained model: The first step in LLM fine-tuning is to carefully select a base pre-trained model that aligns with our desired architecture and functionality. Pre-trained models are general-purpose models that have been trained on a large corpus of unlabeled data.
    Gather a relevant dataset: Next, we need to gather a dataset that is relevant to our task. The dataset should be labeled or structured in a way that the model can learn from.
    Preprocess the dataset: Once the dataset is ready, we preprocess it for fine-tuning by cleaning it, splitting it into training, validation, and test sets, and ensuring it is compatible with the model we want to fine-tune.
    Fine-tuning: After selecting a pre-trained model, we fine-tune it on our preprocessed dataset, which is specific to the task at hand. The dataset might be related to a particular domain or application, allowing the model to adapt and specialize for that context.
    Task-specific adaptation: During fine-tuning, the model’s parameters are adjusted based on the new dataset, helping it better understand and generate content relevant to the specific task. This process retains the general language knowledge gained during pre-training while tailoring the model to the nuances of the target domain.

Fine-tuning LLMs is commonly used in natural language processing tasks such as sentiment analysis, named entity recognition, summarization, translation, or any other application where understanding context and generating coherent language is crucial. It helps leverage the knowledge encoded in pre-trained models for more specialized and domain-specific tasks.
Fine-tuning methods

Fine-tuning a Large Language Model (LLM) involves a supervised learning process. In this method, a dataset comprising labeled examples is utilized to adjust the model’s weights, enhancing its proficiency in specific tasks. Now, let’s delve into some noteworthy techniques employed in the fine-tuning process.

    Full Fine Tuning (Instruction fine-tuning): Instruction fine-tuning is a strategy to enhance a model’s performance across various tasks by training it on examples that guide its responses to queries. The choice of the dataset is crucial and tailored to the specific task, such as summarization or translation. This approach, known as full fine-tuning, updates all model weights, creating a new version with improved capabilities. However, it demands sufficient memory and computational resources, similar to pre-training, to handle the storage and processing of gradients, optimizers, and other components during training.
    Parameter Efficient Fine-Tuning (PEFT) is a form of instruction fine-tuning that is much more efficient than full fine-tuning. Training a language model, especially with full LLM fine-tuning, demands significant computational resources: memory is needed not only for the model itself but also for the gradients, optimizer states, and other quantities required during training, which is a challenge for modest hardware. PEFT addresses this by updating only a small subset of parameters and “freezing” the rest. This reduces the number of trainable parameters, making memory requirements more manageable and preventing catastrophic forgetting. Because PEFT leaves the original LLM weights untouched, previously learned information is not lost, which is also beneficial for storage when fine-tuning for multiple tasks. There are various ways of achieving parameter-efficient fine-tuning; Low-Rank Adaptation (LoRA) and QLoRA are among the most widely used and effective.

What is LoRA?

LoRA is an improved fine-tuning method in which, instead of updating all the weights that constitute the weight matrices of the pre-trained large language model, two much smaller matrices are trained whose product forms a low-rank update to the frozen weights. These two matrices constitute the LoRA adapter. The fine-tuned adapter is then loaded alongside the pre-trained model and used for inference.

After LoRA fine-tuning for a specific task or use case, the outcome is an unchanged original LLM and the emergence of a considerably smaller “LoRA adapter,” often representing a single-digit percentage of the original LLM size (in MBs rather than GBs).

During inference, the LoRA adapter is combined with its original LLM. The advantage is that many LoRA adapters can reuse the same original LLM, which reduces overall memory requirements when handling multiple tasks and use cases.
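
To make the size argument concrete, here is a minimal back-of-the-envelope sketch in Python. The layer dimensions and rank are illustrative assumptions rather than Phi-2's exact shapes; the point is simply that the two low-rank factors hold far fewer parameters than the frozen matrix they update.

# Illustrative LoRA parameter count for a single d_in x d_out projection.
# The frozen weight W is never updated; only the low-rank factors
# A (r x d_in) and B (d_out x r) are trained, and the effective weight
# at inference time is W + B @ A.
d_in, d_out, r = 2560, 2560, 32            # assumed sizes, for illustration only

full_params = d_in * d_out                 # parameters in the frozen matrix
lora_params = r * d_in + d_out * r         # parameters in the LoRA adapter

print(f"frozen weight matrix: {full_params:,} parameters")
print(f"LoRA adapter (r={r}): {lora_params:,} parameters")
print(f"adapter size        : {100 * lora_params / full_params:.1f}% of the original")
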
What is Quantized LoRA (QLoRA)?

QLoRA represents a more memory-efficient iteration of LoRA. It takes LoRA a step further by quantizing the frozen, pre-trained base model weights to lower precision: the model is loaded into GPU memory as 4-bit weights (typically the NF4 data type) instead of the 8-bit or 16-bit weights commonly used with plain LoRA, while the small LoRA adapter matrices themselves are still trained in higher precision. This further reduces the memory footprint and storage requirements. Despite the reduction in bit precision, QLoRA maintains a level of effectiveness comparable to LoRA.
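
As a rough illustration of why the bit width matters, the sketch below estimates the weight-only memory footprint of a 2.7-billion-parameter model (Phi-2's size) at different precisions. These are ballpark figures that ignore activations, optimizer state, and quantization overhead.

# Approximate GPU memory needed just to hold 2.7B weights at various precisions
num_params = 2.7e9
for bits, label in [(16, "fp16/bf16"), (8, "int8"), (4, "4-bit (NF4)")]:
    gigabytes = num_params * bits / 8 / 1e9
    print(f"{label:<12}: ~{gigabytes:.1f} GB")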

In this tutorial, we will use Parameter-efficient fine-tuning with QLoRA.

Now let’s explore how we can fine-tune LLM on a custom dataset using QLoRA on a single GPU.

    Setting up the Notebook
    Install required libraries
    Loading dataset
    Create Bitsandbytes configuration
    Loading the Pre-Trained model
    Tokenization
    Test the Model with Zero Shot Inferencing
    Pre-processing dataset
    Preparing the model for QLoRA
    Setup PEFT for Fine-Tuning
    Train PEFT Adapter
    Evaluate the Model Qualitatively (Human Evaluation)
    Evaluate the Model Quantitatively (with ROUGE Metric)

1. Setting up the Notebook

While we will utilize a Kaggle notebook for this demonstration, feel free to use any Jupyter notebook environment. Kaggle offers a generous allowance of 30 hours of free GPU usage per week, which is ample for our experimentation. To begin, let’s open a new notebook, establish some headings, and then proceed to connect to the runtime.
[Image: notebook with headings]

Here, we will select the GPU P100 as the ACCELERATOR. Feel free to try other GPU options available in Kaggle or any other environment.

In this tutorial, we will be using HuggingFace libraries to download and train the model. To download models from HuggingFace, we will need an Access Token. If you’ve already signed up with HuggingFace, you can generate a new Access Token from the settings section or use any existing Access Token.
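
The notebook below authenticates interactively via interpreter_login(); if you prefer to log in programmatically once the Hugging Face libraries are installed (next section), a minimal sketch using the huggingface_hub login helper looks like this. The token string is a placeholder for your own Access Token.

from huggingface_hub import login

# Paste your Hugging Face Access Token here, or better, read it from an
# environment variable or a Kaggle secret instead of hard-coding it.
login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxx")
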
2. Install required libraries

Now, let’s install the necessary libraries for this experiment.

!pip install -q -U bitsandbytes transformers peft accelerate datasets scipy einops evaluate trl rouge_score

Let’s understand the importance of some of these libraries.

    Bitsandbytes: An excellent package that provides a lightweight wrapper around custom CUDA functions that make LLMs go faster — optimizers, matrix multiplication, and quantization. In this tutorial, we’ll be using this library to load our model as efficiently as possible.
    transformers: A library by Hugging Face (🤗) that provides pre-trained models and training utilities for various natural language processing tasks.
    peft: A library by Hugging Face (🤗) that enables parameter-efficient fine-tuning.
    accelerate: Accelerate abstracts exactly and only the boilerplate code related to multi-GPU/TPU/fp16 execution and leaves the rest of your code unchanged.
    datasets: Another library by Hugging Face (🤗) that provides easy access to a wide range of datasets.
    einops: A library that simplifies tensor operations.

Loading the required libraries

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

interpreter_login()

For this tutorial we are not going to track our training metrics, so let's disable Weights and Biases. The W&B platform provides a set of tools for monitoring and visualizing data and models and for sharing results. To deactivate Weights and Biases during the fine-tuning process, set the environment variable below.

import os
# disable Weights and Biases
os.environ['WANDB_DISABLED']="true"

If you have an account with Weights and Biases, feel free to enable it and experiment with it.
3. Loading dataset

Numerous datasets are available for fine-tuning the model. In this instance, we will utilize the DialogSum DataSet from HuggingFace for the fine-tuning process. DialogSum is an extensive dialogue summarization dataset, featuring 13,460 dialogues along with manually labeled summaries and topics.

There is no specific reason for selecting this dataset. Feel free to try this experiment with any custom dataset.

Let’s execute the below code to load the above dataset from HuggingFace.

huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)

Once the dataset is loaded, we can take a look at it to understand what it contains:
[Image: a sample row of the dataset]

It contains the below fields.

    dialogue: the text of the dialogue.
    summary: a human-written summary of the dialogue.
    topic: a human-written topic/one-liner for the dialogue.
    id: the unique file id of an example.

4. Create Bitsandbytes configuration

To load the model, we need a configuration class that specifies how we want the quantization to be performed. We’ll be using BitsAndBytesConfig to load our model in 4-bit format. This will reduce memory consumption considerably, at a cost of some accuracy.

compute_dtype = getattr(torch, "float16")      # dtype used for computation on the 4-bit weights
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # load the model weights in 4-bit
        bnb_4bit_quant_type='nf4',             # use the NormalFloat4 (NF4) quantization type
        bnb_4bit_compute_dtype=compute_dtype,  # perform matrix multiplications in float16
        bnb_4bit_use_double_quant=False,       # no nested (double) quantization
    )

5. Loading the Pre-Trained model

Microsoft recently open-sourced Phi-2, a Small Language Model (SLM) with 2.7 billion parameters. Here, we will use Phi-2 for the fine-tuning process. This language model exhibits remarkable reasoning and language-understanding capabilities, achieving state-of-the-art performance among base language models of its size.

Let’s now load Phi-2 using 4-bit quantization from HuggingFace.

model_name='microsoft/phi-2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name, 
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

The model is loaded in 4-bit using the `BitsAndBytesConfig` defined above, which relies on the bitsandbytes library under the hood. This is a part of the QLoRA process, which involves quantizing the pre-trained weights of the model to 4-bit and keeping them fixed during fine-tuning.
6. Tokenization

Now, let’s configure the tokenizer, incorporating left-padding to optimize memory usage during training.

tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

7. Test the Model with Zero Shot Inferencing

We will evaluate the base model that we loaded above using a few sample inputs.
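
The cells below call a small generation helper, gen(model, prompt, max_new_tokens), whose definition is not included in this excerpt. A minimal sketch that matches how it is used here is shown below; it defaults to the tokenizer configured above, and the same pattern applies later with eval_tokenizer for the fine-tuned model.

def gen(model, prompt, max_new_tokens, tok=None):
    # Default to the tokenizer configured earlier unless one is passed in
    tok = tok or tokenizer
    # Tokenize the prompt and move it to the same device as the model
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    # Generate up to max_new_tokens continuation tokens
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tok.eos_token_id,
    )
    # Return the decoded text (prompt included) as a list of strings
    return tok.batch_decode(outputs, skip_special_tokens=True)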

%%time
from transformers import set_seed
seed = 42
set_seed(seed)

index = 10

prompt = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

formatted_prompt = f"Instruct: Summarize the following conversation.\n{prompt}\nOutput:\n"
res = gen(original_model,formatted_prompt,100,)
#print(res[0])
output = res[0].split('Output:\n')[1]

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

[Image: base model output]

From the observation above, it’s evident that the model faces challenges in summarizing the dialogue compared to the baseline summary. However, it manages to extract essential information from the text, suggesting the potential for fine-tuning the model for the specific task at hand.
8. Pre-processing dataset

The dataset cannot be directly employed for fine-tuning. It is essential to format the prompt in a way that the model can comprehend. Referring to the HuggingFace model documentation, it is evident that a prompt needs to be generated using dialogue and summary in the specified format below.
[Image: Prompt Format]
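
The prompt-format image is not reproduced here; based on the helper function that follows, the assembled training prompt looks roughly like this, where {dialogue} and {summary} stand for the corresponding dataset fields:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruct: Summarize the below conversation.

{dialogue}

### Output:
{summary}

### End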

We’ll create some helper functions to format our input dataset, ensuring its suitability for the fine-tuning process. Here, we need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM.

def create_prompt_formats(sample):
    """
    Format the dialogue and summary fields of the sample into a single prompt,
    then concatenate the parts using two newline characters.
    :param sample: Sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"
    
    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"
    
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt

    return sample

The above function can be used to convert our input into prompt format.

Now, we will use our model tokenizer to process these prompts into tokenized ones.

Our aim here is to generate input sequences with consistent lengths, which is beneficial for fine-tuning the language model by optimizing efficiency and minimizing computational overhead. It is essential to ensure that these sequences do not surpass the model’s maximum token limit.

from functools import partial

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(model.config, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed, dataset):
    """Format and tokenize the dataset so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    :param seed: Seed used to shuffle the dataset after preprocessing
    :param dataset: The dataset split to preprocess
    """
    
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
    
    # Apply preprocessing to each batch of the dataset and remove the original 'id', 'topic', 'dialogue', 'summary' columns
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )

    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

By utilizing these functions, our dataset will be prepared for the fine-tuning process!

## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)

train_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['validation'])

9. Preparing the model for QLoRA

from peft import prepare_model_for_kbit_training

# 2 - Using the prepare_model_for_kbit_training method from PEFT
# Preparing the Model for QLoRA
original_model = prepare_model_for_kbit_training(original_model)

Here, the model is prepared for QLoRA training using the `prepare_model_for_kbit_training()` function from PEFT. This function freezes the base model's weights, casts the remaining non-quantized modules (such as the layer norms) to full precision for numerical stability, and enables gradient checkpointing, getting the quantized model ready for parameter-efficient training.
10. Setup PEFT for Fine-Tuning

Let us now define the LoRA config for Fine-tuning the base model.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
original_model.gradient_checkpointing_enable()

peft_model = get_peft_model(original_model, config)

Note the rank (r) hyper-parameter, which defines the rank/dimension of the adapter to be trained. r is the rank of the low-rank matrix used in the adapters, which thus controls the number of parameters trained. A higher rank will allow for more expressivity, but there is a compute tradeoff.

alpha here is the scaling factor for the learned weights. The low-rank update (not the original weight matrix) is scaled by alpha/r, so a higher value of alpha assigns more weight to the LoRA activations.

Once everything is set up and the PEFT model is prepared, we can use a helper function to see how many of the model's parameters are actually trainable.
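
The helper called below is not defined in this excerpt; a minimal sketch of what it might look like is shown here. PEFT models also expose a built-in peft_model.print_trainable_parameters() method that reports the same information.

def print_number_of_trainable_model_parameters(model):
    # Count the trainable (LoRA adapter) parameters versus all parameters
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    all_params = sum(p.numel() for p in model.parameters())
    return (
        f"trainable model parameters: {trainable_params:,}\n"
        f"all model parameters: {all_params:,}\n"
        f"percentage of trainable model parameters: {100 * trainable_params / all_params:.2f}%"
    )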

print(print_number_of_trainable_model_parameters(peft_model))

[Image: trainable parameters output]
11. Train PEFT Adapter

Define training arguments and create Trainer instance.

output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
import transformers

peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1000,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

Here, we have used 1000 training steps, which seems to be good enough for our custom dataset. You may need to try out different values before settling on the number of training steps. Also, the hyperparameters used above might vary depending on the dataset/model we are trying to fine-tune. This is just to show the capability of fine-tuning.

Let’s start the training now. Training the model will take some time depending upon the hyperparameters used in TrainingArguments.

peft_trainer.train()

Once the model is trained successfully, we can use it for inference. Let’s now prepare the inference model by adding an adapter to the original Phi-2 model. Here, we are setting is_trainable=False because the plan is only to perform inference with this PEFT model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "microsoft/phi-2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "/kaggle/working/peft-dialogue-summary-training-1705417060/checkpoint-1000",torch_dtype=torch.float16,is_trainable=False)

Fine-tuning is often an iterative process. Based on the validation and test sets results, we may need to make further adjustments to the model’s architecture, hyperparameters, or training data to improve its performance. Let’s now see how to evaluate the results of Fine-tuned LLM.
12. Evaluate the Model Qualitatively (Human Evaluation)

Now, let’s perform inference using the same input but with the PEFT model, as we did previously in step 7 with the original model.

%%time
from transformers import set_seed
set_seed(seed)

index = 5
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"

peft_model_res = gen(ft_model,prompt,100,)
peft_model_output = peft_model_res[0].split('Output:\n')[1]
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')

[Image: PEFT model output]
13. Evaluate the Model Quantitatively (with ROUGE Metric)

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation in natural language processing. The metrics compare an automatically produced summary or translation against a reference (or a set of references), typically human-produced summaries or translations.

Let’s now use the ROUGE metric to quantify the validity of summarizations produced by models. It compares summarizations to a “baseline” summary which is usually created by a human. While it’s not a perfect metric, it does indicate the overall increase in summarization effectiveness that we have accomplished by fine-tuning.

To demonstrate ROUGE metric evaluation, we will evaluate on a small set of sample inputs.

original_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

import pandas as pd

dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
peft_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
    
    original_model_res = gen(original_model,prompt,100,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]
    
    peft_model_res = gen(ft_model,prompt,100,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df

import evaluate

rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

[Image: ROUGE metric evaluation results]

As the results above show, the PEFT model achieves a noticeable improvement over the original model across the ROUGE metrics, expressed here as an absolute gain in percentage points.

If you’d like to access the complete notebook, please refer to the repository below.
FineTune Phi-2 on Custom DataSet (www.kaggle.com)
Conclusion

Fine-tuning Large Language Models (LLMs) has become essential for enterprises seeking to optimize their operational processes. While the initial training of LLMs imparts a broad language understanding, the fine-tuning process refines these models into specialized tools capable of handling specific topics and providing more accurate results. Tailoring LLMs for distinct tasks, industries, or datasets extends the capabilities of these models, ensuring their relevance and value in a dynamic digital landscape. Looking ahead, ongoing exploration and innovation in LLMs, coupled with refined fine-tuning methodologies, are poised to advance the development of smarter, more efficient, and contextually aware AI systems.
References
    microsoft/phi-2 · Hugging Face (huggingface.co)
    Fine-tuning large language models (LLMs) in 2024 | SuperAnnotate (www.superannotate.com)
    microsoft/phi-2 · How to fine-tune this? + Training code (huggingface.co)
    Phi-2: The surprising power of small language models (www.microsoft.com)
    While fine-tuning a decoder-only LLM like LLaMA on a chat dataset, what kind of padding should one use? (ai.stackexchange.com)
    LoRA (huggingface.co)
    ROUGE - a Hugging Face Space by evaluate-metric (huggingface.co)
    GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch (github.com)
Tags: LLM, GenAI, Machine Learning, Artificial Intelligence

[Image: Linwinmac.jpg]