Wednesday, October 16, 2024

Import a question answering fine-tuned model into Amazon Bedrock as a custom model


Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique to optimize the output of FMs by providing context around the questions for these use cases. Fine-tuning the FM is recommended to further optimize the output to follow the brand and industry voice or vocabulary.

Custom Model Import for Amazon Bedrock, in preview now, allows you to import customized FMs created in other environments, such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises, into Amazon Bedrock. This post is part of a series that demonstrates various architecture patterns for importing fine-tuned FMs into Amazon Bedrock.

In this post, we provide a step-by-step approach to fine-tuning a Mistral model using SageMaker and importing it into Amazon Bedrock using the Custom Model Import feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.

Key Features

Some of the key features of Custom Model Import for Amazon Bedrock are:

  1. This feature allows you to bring your fine-tuned models and use the fully managed serverless capabilities of Amazon Bedrock.
  2. Currently, the feature supports the Llama 2, Llama 3, Flan, and Mistral model architectures with precisions of FP32, FP16, and BF16, with further quantizations coming soon.
  3. To use this feature, you run the import process (covered later in this post) with your model weights stored in Amazon Simple Storage Service (Amazon S3).
  4. You can also use models created with Amazon SageMaker by referencing the Amazon SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker.
  5. Amazon Bedrock automatically scales your model as your traffic increases and, when the model is not in use, scales it down to 0, reducing your costs.

Let us dive into a use case and see how easy it is to use this feature.

Solution overview

At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.

In this post, we walk through the following high-level steps:

  1. Fine-tune the model using SageMaker.
  2. Import the fine-tuned model into Amazon Bedrock.
  3. Test the imported model.
  4. Evaluate the imported model using the FMEval library.

The following diagram illustrates the solution architecture.

The process includes the following steps:

  1. We use a SageMaker training job to fine-tune the model using a SageMaker JupyterLab notebook. This training job reads the dataset from Amazon Simple Storage Service (Amazon S3) and writes the model back into Amazon S3. This model will then be imported into Amazon Bedrock.
  2. To import the fine-tuned model, you can use the Amazon Bedrock console, the Boto3 library, or APIs.
  3. An import job orchestrates the process to import the model and make the model available from the customer account.
    1. The import job copies all the model artifacts from the user's account into an AWS managed S3 bucket.
  4. When the import job is complete, the fine-tuned model is made available for invocation from your AWS account.
  5. We use the SageMaker FMEval library in a SageMaker notebook to evaluate the imported model.

The copied model artifacts remain in the Amazon Bedrock account until the custom imported model is deleted from Amazon Bedrock. Deleting model artifacts in the S3 bucket of your AWS account doesn't delete the model or the related artifacts in the Amazon Bedrock managed account. You can delete an imported model from Amazon Bedrock, along with all the copied artifacts, using the Amazon Bedrock console, the Boto3 library, or APIs.

Additionally, all data (including the model) remains within the selected AWS Region. The model artifacts are imported into the AWS operated deployment account using a virtual private cloud (VPC) endpoint, and you can encrypt your model data using an AWS Key Management Service (AWS KMS) customer managed key.

In the following sections, we dive deep into each of these steps to deploy, test, and evaluate the model.

Prerequisites

We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version produced by Mistral AI. This model is easy to fine-tune, and Mistral AI has provided example fine-tuned models. We use Mistral for this use case because this model supports a 32,000-token context capacity and is fluent in English, French, Italian, German, Spanish, and coding languages. With the Mixture of Experts (MoE) feature, it can achieve higher accuracy for customer support use cases.

Mistral-7B-v0.3 is a gated model on the Hugging Face model repository. You need to review the terms and conditions and request access to the model by submitting your details.

We use Amazon SageMaker Studio to preprocess the data and fine-tune the Mistral model using a SageMaker training job. To set up SageMaker Studio, refer to Launch Amazon SageMaker Studio. Refer to the SageMaker JupyterLab documentation to set up and launch a JupyterLab notebook. You will submit a SageMaker training job to fine-tune the Mistral model from the SageMaker JupyterLab notebook, which can be found on the GitHub repo.

Fine-tune the model using QLoRA

To fine-tune the Mistral model, we apply QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the Fully Sharded Data Parallel (FSDP) PyTorch API to perform distributed model tuning. You use supervised fine-tuning (SFT) to fine-tune the Mistral model.

Prepare the dataset

The first step in the fine-tuning process is to prepare and format the dataset. After you transform the dataset into the Mistral Default Instruct format, you upload it as a JSONL file into the S3 bucket used by the SageMaker session, as shown in the following code:

from datasets import load_dataset

# create_conversation and training_input_path are defined earlier in the notebook

# Load dataset from the hub and keep only the FLAN-derived records
dataset = load_dataset("Open-Orca/OpenOrca")
flan_dataset = dataset.filter(lambda example, indice: "flan" in example["id"], with_indices=True)
flan_dataset = flan_dataset["train"].train_test_split(test_size=0.01, train_size=0.035)

# Replace the original columns with the conversation format
columns_to_remove = list(dataset["train"].features)
flan_dataset = flan_dataset.map(create_conversation, remove_columns=columns_to_remove, batched=False)

# save datasets to s3
flan_dataset["train"].to_json(f"{training_input_path}/train_dataset.json", orient="records", force_ascii=False)
flan_dataset["test"].to_json(f"{training_input_path}/test_dataset.json", orient="records", force_ascii=False)
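
The create_conversation helper is not shown in this post. As a rough illustration of what such a mapping function could look like, the following sketch converts one OpenOrca record into a chat-style messages list. It assumes the OpenOrca columns system_prompt, question, and response; the function body is hypothetical and the notebook's actual implementation may differ.

```python
# Hypothetical sketch of a create_conversation helper; the notebook's real
# implementation may differ. Assumes OpenOrca's system_prompt, question,
# and response columns.
def create_conversation(sample: dict) -> dict:
    """Turn one OpenOrca record into a chat-style "messages" list."""
    messages = []
    system_prompt = (sample.get("system_prompt") or "").strip()
    if system_prompt:
        # Only emit a system turn when the record actually has one
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": sample["question"]})
    messages.append({"role": "assistant", "content": sample["response"]})
    return {"messages": messages}


example = {
    "id": "flan.123",
    "system_prompt": "You are a helpful assistant.",
    "question": "What is the capital of France?",
    "response": "Paris.",
}
print(create_conversation(example))
```

Returning a single messages column keeps the dataset compatible with the chat template applied later in the training script.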

You transform the dataset into the Mistral Default Instruct format within the SageMaker training job, as instructed in the training script (run_fsdp_qlora.py):

    ################
    # Dataset
    ################
    
    train_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "train_dataset.json"),
        split="train",
    )
    test_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "test_dataset.json"),
        split="train",
    )

    ################
    # Model & Tokenizer
    ################

    # Tokenizer        
    tokenizer = AutoTokenizer.from_pretrained(script_args.model_id, use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.chat_template = MISTRAL_CHAT_TEMPLATE
    
    # template dataset
    def template_dataset(examples):
        return {"text": tokenizer.apply_chat_template(examples["messages"], tokenize=False)}
    
    train_dataset = train_dataset.map(template_dataset, remove_columns=["messages"])
    test_dataset = test_dataset.map(template_dataset, remove_columns=["messages"])
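
The MISTRAL_CHAT_TEMPLATE string itself is not shown in this post. To give a feel for what an instruct-style template produces, here is a plain-Python sketch that renders a messages list into [INST] ... [/INST] turns. The token spacing and the choice to fold a system prompt into the first user turn are assumptions made for illustration, not the template used by the training script.

```python
# Illustrative renderer only. The exact template string used by the training
# script is not shown in this post, so spacing and system-prompt handling
# here are assumptions.
BOS, EOS = "<s>", "</s>"


def render_mistral_instruct(messages: list[dict]) -> str:
    """Render chat messages as [INST] ... [/INST] turns."""
    text = BOS
    pending_system = ""
    for message in messages:
        role, content = message["role"], message["content"].strip()
        if role == "system":
            # Hold the system prompt and prepend it to the next user turn
            pending_system = content
        elif role == "user":
            if pending_system:
                content = f"{pending_system}\n\n{content}"
                pending_system = ""
            text += f"[INST] {content} [/INST]"
        elif role == "assistant":
            text += f" {content}{EOS}"
        else:
            raise ValueError(f"Unsupported role: {role}")
    return text


messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]
print(render_mistral_instruct(messages))
```

Because the special tokens are already written into the text at this stage, the trainer is later configured not to add them again.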

Optimize fine-tuning using QLoRA

You optimize your fine-tuning using QLoRA, with the precision passed into the training script as a SageMaker training job parameter. QLoRA is an efficient fine-tuning approach that reduces memory usage enough to fine-tune a 65-billion-parameter model on a single 48 GB GPU while preserving full 16-bit fine-tuning task performance. In this notebook, you use the bitsandbytes library to set up quantization configurations, as shown in the following code:

    # Model
    torch_dtype = torch.bfloat16 if training_args.bf16 else torch.float32
    quant_storage_dtype = torch.bfloat16

    if script_args.use_qlora:
        print(f"Using QLoRA - {torch_dtype}")
        quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch_dtype,
                bnb_4bit_quant_storage=quant_storage_dtype,
            )
    else:
        quantization_config = None
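
To see why 4-bit loading matters, a back-of-the-envelope estimate of the memory taken by the base model weights at different precisions is useful. The sketch below is simple arithmetic, not a measurement: the 7e9 parameter count is a nominal assumption for a 7B model, and the estimate ignores activations, optimizer state, LoRA parameters, and quantization constants.

```python
# Rough weight-memory estimate. The parameter count is a nominal assumption,
# and activations, optimizer state, and quantization constants are ignored.
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # nominal size of a 7B model
for label, bits in [("fp32", 32), ("bf16", 16), ("nf4", 4)]:
    print(f"{label}: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, the 4-bit weights take roughly a quarter of the memory of their 16-bit counterparts, which is what leaves room on the GPU for adapters and training state.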

You use the LoRA config based on the QLoRA paper and Sebastian Raschka's experiment, as shown in the following code. Two key points to consider from the Raschka experiment are that QLoRA offers 33% memory savings at the cost of a 39% increase in runtime, and that LoRA should be applied to all layers to maximize model performance.

################
# PEFT
################
# LoRA config based on QLoRA paper & Sebastian Raschka experiment
peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    )
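
To make the effect of r and lora_alpha concrete, the following sketch counts the trainable parameters that LoRA adds next to a single frozen linear layer and computes the standard LoRA scaling factor lora_alpha / r. The layer width of 4096 is an assumed value used only for illustration.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_out x d_in weight."""
    return r * (d_in + d_out)


r, lora_alpha = 16, 8   # values from the LoraConfig above
hidden = 4096           # assumed layer width, for illustration only

full = hidden * hidden
added = lora_params(hidden, hidden, r)
print(f"scaling factor (lora_alpha / r) = {lora_alpha / r}")
print(f"frozen weights: {full}, trainable LoRA weights: {added}")
print(f"trainable fraction for this layer = {added / full:.2%}")
```

With target_modules="all-linear", this small per-layer overhead is repeated for every linear layer, which is still a small fraction of the frozen base weights.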

You use SFTTrainer to fine-tune the Mistral model:

    ################
    # Training
    ################
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        eval_dataset=test_dataset,
        peft_config=peft_config,
        max_seq_length=script_args.max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,  # We template with special tokens
            "append_concat_token": False,  # No need to add additional separator token
        },
    )

At the time of writing, only merged adapters are supported using the Custom Model Import feature for Amazon Bedrock. Let's look at how to merge the adapter with the base model next.

Merge the adapters

Adapters are new modules added between layers of a pre-trained network. Creation of these new modules is possible by back-propagating gradients through a frozen, 4-bit quantized pre-trained language model into low-rank adapters in the fine-tuning process. To import the Mistral model into Amazon Bedrock, the adapters need to be merged with the base model and saved in Safetensors format. Use the following code to merge the model adapters and save them in Safetensors format:

        # load PEFT model in fp16
        model = AutoPeftModelForCausalLM.from_pretrained(
            training_args.output_dir,
            low_cpu_mem_usage=True,
            torch_dtype=torch.float16
        )
        # Merge LoRA and base model and save
        model = model.merge_and_unload()
        model.save_pretrained(
            sagemaker_save_dir, safe_serialization=True, max_shard_size="2GB"
        )
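
Conceptually, merging folds the low-rank update into the frozen weight so that no separate adapter is needed at inference time. The following toy example uses plain Python lists to show the standard LoRA formulation, W_merged = W + (alpha / r) * B @ A, and checks that the merged weight gives the same output as the base weight plus the adapter path. It is a sketch of the arithmetic only, with made-up 2 x 2 values, and makes no claim about how the PEFT library implements the merge internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
a = [[1.0, 2.0]]               # LoRA A (r x d_in), with r = 1
b = [[0.5], [1.0]]             # LoRA B (d_out x r)
x = [[3.0], [4.0]]             # input column vector
alpha, r = 2, 1

merged = merge_lora(w, a, b, alpha, r)

# Output with the adapter kept separate: W x + (alpha / r) * B (A x)
base_out = matmul(w, x)
adapter_out = matmul(b, matmul(a, x))
separate = [[bo[0] + (alpha / r) * ao[0]] for bo, ao in zip(base_out, adapter_out)]

print("merged weight:", merged)
print("same output:", matmul(merged, x) == separate)
```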

To import the Mistral model into Amazon Bedrock, the model needs to be in an uncompressed directory within an S3 bucket accessible by the Amazon Bedrock service role used in the import job.
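
If the merged model sits in a local directory, one way to keep it uncompressed in Amazon S3 is to upload each file individually under a common prefix rather than as a tar archive. The following is a minimal sketch of that idea: plan_uploads and upload_model_dir are hypothetical helper names, and the bucket and prefix are placeholders you supply. The only Boto3 call used is the S3 client's upload_file method.

```python
from pathlib import Path


def plan_uploads(local_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Map every file under local_dir to an S3 key under prefix."""
    root = Path(local_dir)
    return [
        (str(path), f"{prefix.rstrip('/')}/{path.relative_to(root).as_posix()}")
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


def upload_model_dir(s3_client, local_dir: str, bucket: str, prefix: str) -> None:
    """Upload each file on its own; s3_client is a boto3 S3 client."""
    for local_path, key in plan_uploads(local_dir, prefix):
        s3_client.upload_file(local_path, bucket, key)


# Usage (requires AWS credentials and the boto3 package):
# import boto3
# upload_model_dir(boto3.client("s3"), sagemaker_save_dir, "<bucket>", "<prefix>")
```

Separating the planning step from the upload makes it easy to inspect the resulting key layout before any data is transferred.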

Import the fine-tuned model into Amazon Bedrock

Now that you have fine-tuned the model, you can import the model into Amazon Bedrock. In this section, we demonstrate how to import the model using the Amazon Bedrock console or the SDK.

Import the model using the Amazon Bedrock console

To import the model using the Amazon Bedrock console, see Import a model with Custom Model Import. You use the Import model page as shown in the following screenshot to import the model from the S3 bucket.

After you successfully import the fine-tuned model, you can see the model listed on the Amazon Bedrock console.

Import the model using the SDK

The AWS Boto3 library supports importing custom models into Amazon Bedrock. You can use the following code to import a fine-tuned model from within the notebook into Amazon Bedrock. This is an asynchronous method.

import boto3
import datetime

# Fill in the Region, model name, and IAM role ARN for your account
br_client = boto3.client('bedrock', region_name="")
pt_model_nm = ""
pt_imp_jb_nm = f"{pt_model_nm}-{datetime.datetime.now().strftime('%Y%m%d%M%H%S')}"
role_arn = "<>"
# pt_pubmed_model_s3_path is the S3 URI of the merged model directory
pt_model_src = {"s3DataSource": {"s3Uri": f"{pt_pubmed_model_s3_path}"}}
# The call returns immediately; the import job itself runs asynchronously
resp = br_client.create_model_import_job(jobName=pt_imp_jb_nm,
                                  importedModelName=pt_model_nm,
                                  roleArn=role_arn,
                                  modelDataSource=pt_model_src)
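
Because the import call is asynchronous, you typically need to wait for the job to finish before invoking the model. One way to do this is to poll the job status, as in the following sketch. It assumes the Boto3 bedrock client's get_model_import_job call and a status value of InProgress while the job is running. The helper name wait_for_import_job and the polling interval and timeout are illustrative choices, not part of the original notebook.

```python
import time


def wait_for_import_job(bedrock_client, job_identifier: str,
                        poll_seconds: float = 30.0,
                        timeout_seconds: float = 3600.0) -> dict:
    """Poll get_model_import_job until the job leaves the InProgress state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = bedrock_client.get_model_import_job(jobIdentifier=job_identifier)
        if job["status"] != "InProgress":
            # Finished, either successfully or with a failure
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Import job {job_identifier} is still in progress")
        time.sleep(poll_seconds)


# Usage, continuing from the import call above:
# job = wait_for_import_job(br_client, resp["jobArn"])
# print(job["status"], job.get("importedModelArn"))
```

Passing the client in as an argument keeps the helper independent of any particular Region or credentials setup.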

Test the imported model

Now that you have imported the fine-tuned model into Amazon Bedrock, you can test the model. In this section, we demonstrate how to test the model using the Amazon Bedrock console or the SDK.

Test the model on the Amazon Bedrock console

You can test the imported model using an Amazon Bedrock playground, as illustrated in the following screenshot.

Test the model using the SDK

You can also use the Amazon Bedrock Invoke Model API to run the fine-tuned imported model, as shown in the following code:

import json

import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "<>"


def call_invoke_model_and_print(native_request):
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

prompt = "will there be a season 5 of shadowhunters"
# Wrap the prompt in the instruct tags used during fine-tuning
formatted_prompt = f"[INST] {prompt} [/INST]"
native_request = {
    "prompt": formatted_prompt,
    "max_tokens": 64,
    "top_p": 0.9,
    "temperature": 0.91
}
call_invoke_model_and_print(native_request)

The custom Mistral model that you imported using Amazon Bedrock supports the temperature, top_p, and max_gen_len parameters when invoking the model for inference. The inference parameters top_k, max_seq_len, max_batch_size, and max_new_tokens are not supported for a custom Mistral fine-tuned model.
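
To avoid sending an unsupported parameter by accident, you could validate the request before invoking the model. The following sketch builds the JSON request body and rejects the parameters listed above as unsupported. The build_request helper is hypothetical and only checks the parameter names mentioned in this post.

```python
import json

# Parameters this post lists as unsupported for the imported Mistral model
UNSUPPORTED_PARAMS = {"top_k", "max_seq_len", "max_batch_size", "max_new_tokens"}


def build_request(prompt: str, **inference_params) -> str:
    """Wrap the prompt in [INST] tags and serialize the request body as JSON."""
    rejected = UNSUPPORTED_PARAMS & inference_params.keys()
    if rejected:
        raise ValueError(f"Unsupported inference parameters: {sorted(rejected)}")
    return json.dumps({"prompt": f"[INST] {prompt} [/INST]", **inference_params})


print(build_request("will there be a season 5 of shadowhunters",
                    temperature=0.91, top_p=0.9))
```

Failing early on the client side gives a clearer error message than a rejected or silently ignored parameter at invocation time.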

Evaluate the imported model

Now that you have imported and tested the model, let's evaluate the imported model using the SageMaker FMEval library. For more details, refer to Evaluate Bedrock Imported Models. To evaluate the question answering task, we use the metrics F1 Score, Exact Match Score, Quasi Exact Match Score, Precision Over Words, and Recall Over Words. The key metrics for question answering tasks are Exact Match, Quasi-Exact Match, and F1 over words, evaluated by comparing the model's predicted answers against the ground truth answers. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation. Because you fine-tuned the Mistral model for question answering, you can use the QA Accuracy algorithm, as shown in the following code. The FMEval library supports these metrics for the QA Accuracy algorithm.

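from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="data/test_dataset.json",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer"
)
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output="outputs[0].text",
    # Single quotes outside so the JSON double quotes stay intact
    content_template='{"prompt": $prompt, "max_tokens": 500}',
)

eval_algo = QAAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config, 
                                    prompt_template="[INST]$model_input[/INST]", save=True)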

You can get the consolidated metrics for the imported model as follows:

for op in eval_output:
    print(f"Eval Name: {op.eval_name}")
    for score in op.dataset_scores:
        print(f"{score.name} : {score.value}")
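
To build intuition for the question answering metrics reported above, the following sketch computes simplified versions of Exact Match, Quasi-Exact Match, and precision, recall, and F1 over words for a single prediction. The normalization rules (lowercasing, stripping punctuation and English articles) are assumptions chosen for illustration. FMEval's own implementation may normalize and tokenize differently, so treat this as an explanation of the idea rather than a reimplementation.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def qa_scores(prediction: str, target: str) -> dict:
    """Score one predicted answer against one ground truth answer."""
    pred_tokens = normalize(prediction).split()
    target_tokens = normalize(target).split()
    # Number of words shared by prediction and target, counting duplicates
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(target_tokens) if target_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {
        "exact_match": float(prediction == target),
        "quasi_exact_match": float(normalize(prediction) == normalize(target)),
        "precision_over_words": precision,
        "recall_over_words": recall,
        "f1_score": f1,
    }


print(qa_scores("The Eiffel Tower.", "eiffel tower"))
print(qa_scores("Paris France", "Paris"))
```

The first example shows why Quasi-Exact Match is more forgiving than Exact Match: the strings differ in case, punctuation, and an article, yet they match after normalization.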

Clean up

To delete the imported model from Amazon Bedrock, navigate to the model on the Amazon Bedrock console. On the options menu (three dots), choose Delete.
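
As noted earlier, you can also delete the imported model with the Boto3 library. The following is a minimal sketch that assumes the bedrock client's delete_imported_model call with a modelIdentifier argument. The wrapper function name and the Region and identifier placeholders in the usage comment are illustrative.

```python
def delete_imported_model(bedrock_client, model_identifier: str) -> None:
    """Delete an imported model by name or ARN using a boto3 "bedrock" client."""
    bedrock_client.delete_imported_model(modelIdentifier=model_identifier)


# Usage (requires AWS credentials and the boto3 package):
# import boto3
# br_client = boto3.client("bedrock", region_name="<your-region>")
# delete_imported_model(br_client, "<imported-model-name-or-arn>")
```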

To delete the SageMaker domain along with the SageMaker JupyterLab space, refer to Delete an Amazon SageMaker domain. You may also want to delete the S3 buckets where the data and model are stored. For instructions, see Deleting a bucket.

Conclusion

In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both an Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs or FMs fine-tuned either on premises, on SageMaker, or on Amazon EC2 into Amazon Bedrock and use the models without any heavy lifting in your generative AI applications. Explore the Custom Model Import feature for Amazon Bedrock to deploy FMs fine-tuned for code generation tasks in a secure and scalable manner. Visit our GitHub repository to explore samples prepared for fine-tuning and importing models from various families.


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services. He helps AWS customers overcome business challenges related to AI/ML on top of AWS. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, artificial intelligence, machine learning, and system design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock, where he has had the privilege to listen to customer needs first-hand and understands what it takes to build and launch scalable and secure Gen AI products. Prior to Bedrock, he worked on numerous products in Amazon, ranging from devices to Ads to Robotics.

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.
