This post was co-written with Seunghyun Jeong, Sunwoo Lee, and Eric Davis from SK Telecom.
SK Telecom (SKT), South Korea's leading telecommunications company serving 30 million customers, is at the forefront of AI innovation. In line with its AI Pyramid Strategy, which aims to unlock AI's potential for anyone, anywhere, anytime, SKT has collaborated with the AWS Generative AI Innovation Center (GenAIIC) Custom Model Program to explore domain-trained models on Amazon Bedrock for telco-specific use cases.
This collaboration aligns with SKT's vision of using AI expertise and strategic partnerships to develop innovative AI-based products and services. One such initiative focused on developing a custom solution for grounded question answering (Q&A) based on reference documents.
Retrieval Augmented Generation (RAG) is a popular technique for Q&A tasks, offering improved factual accuracy and knowledge grounding. However, RAG can produce responses that don't match the preferred tone, style, and manner for telco use cases, and it can retrieve irrelevant documents, potentially leading to inaccurate responses. To address this, SKT and AWS GenAIIC used model customization to improve Anthropic's Claude models on Amazon Bedrock in three key areas:
- Providing concise and informative answers
- Correctly referencing links from the retrieved documents
- Answering in a tone and style consistent with SKT's and similar to the ground truth answers
Additionally, the team explored boosting the performance of smaller models using synthetic data generated by larger large language models (LLMs), both for knowledge distillation and for scenarios with limited labeled training data.
Amazon Bedrock is a fully managed service that offers a variety of LLMs and foundation models (FMs), along with capabilities such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and Amazon Bedrock Guardrails that can expedite many generative AI use cases. Amazon Bedrock is the only fully managed service that lets you fine-tune Claude models, providing an intuitive and secure way to fine-tune Anthropic's Claude models and more. A fine-tuned Claude model can be deployed on Amazon Bedrock and can use Amazon Bedrock capabilities seamlessly, for example, Amazon Bedrock Knowledge Bases for telco domain-specific RAG or Amazon Bedrock Agents for agentic use cases.
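As a concrete illustration of how such a model is consumed, the following minimal sketch calls a Claude model on Amazon Bedrock through the runtime API. A fine-tuned custom model is invoked the same way as a base model, except that the model ID is the ARN of its Provisioned Throughput. The Region, ARN, system prompt, and user question below are placeholders, not details from SKT's deployment.

```python
import json

import boto3

# Minimal sketch, assuming a fine-tuned Claude model already deployed behind
# Provisioned Throughput. The ARN, Region, and prompts below are placeholders.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Replace with the ARN of your provisioned (fine-tuned) model, or use a base model ID
# such as "anthropic.claude-3-sonnet-20240229-v1:0" to test the same request shape.
MODEL_ID = "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/EXAMPLE"

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "system": "Answer concisely and cite the links of the documents you used.",
    "messages": [
        {
            "role": "user",
            "content": "Using the documents below, how do I reset my router?\n<documents>...</documents>",
        }
    ],
})

response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=body)
answer = json.loads(response["body"].read())["content"][0]["text"]
print(answer)
```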
In this post, we share how SKT customized Anthropic's Claude models on Amazon Bedrock for telco-specific Q&A over SKT's technical telecommunication documents.
Solution overview
The team explored combinations of prompt optimization, customization (fine-tuning), and data augmentation with synthetic data. This multifaceted approach aimed to maximize the benefits of each technique for the grounded Q&A generation task.
In the following sections, we explore these methods in more detail.
Anthropic's Claude customization with prompt optimization
Fine-tuning, which is available through Amazon Bedrock for various FMs, including Anthropic's Claude, allows you to adapt pre-trained language models to specific use cases. It is particularly effective for tailoring response style and format adherence.
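For orientation, a customization job is submitted through the Amazon Bedrock control-plane API. The sketch below is illustrative only: the role ARN, S3 locations, base model identifier, and hyperparameter values are placeholders rather than the configuration used in this project, and the supported base models and hyperparameters vary by model and Region.

```python
import boto3

# Minimal sketch of submitting a fine-tuning (model customization) job on Amazon Bedrock.
# All names, ARNs, S3 URIs, and hyperparameter values are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="telco-qa-finetune-demo",
    customModelName="telco-qa-claude-demo",
    customizationType="FINE_TUNING",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    # Check the Bedrock documentation for the base model identifiers that
    # support fine-tuning in your Region.
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
    trainingDataConfig={"s3Uri": "s3://example-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://example-bucket/output/"},
    # Hyperparameter keys and valid ranges depend on the base model.
    hyperParameters={"epochCount": "2", "batchSize": "8", "learningRateMultiplier": "1.0"},
)
print(response["jobArn"])
```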
The team first optimized the system prompt, implementing standardized guidelines for answer formatting and document citation based on Anthropic's prompting best practices. Key focus areas included the following (a hypothetical example prompt is shown after the list):
- Clear presentation of system instructions
- Consistent use of code block formatting
- Context-based, tailored responses
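The production prompt itself is not shown here; the following is a hypothetical system prompt illustrating how guidelines of this kind can be expressed.

```python
# Hypothetical system prompt encoding the guidelines above; the wording is
# illustrative, not SKT's production prompt.
SYSTEM_PROMPT = """You are a support assistant for a telecommunications company.

Follow these rules:
1. Answer only from the retrieved documents provided inside <documents> tags.
2. Keep answers concise and informative.
3. Format any commands, settings, or configuration values as code blocks.
4. End each answer with the reference links of the documents you actually used.
5. Match the polite, professional tone of the example answers.
"""
```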
This prompt engineering, combined with fine-tuning, yielded substantial improvements:
- Over 50% increase in ROUGE-3 score
- Over 25% improvement in ROUGE-L score
- Over 4% increase in embedding similarity score
- Significant gains in accurate reference citation
The iterative enhancement process demonstrated cumulative benefits, with prompt updates alone showing 35–40 percent improvements in key metrics, and the final customized model achieving 50–60 percent gains on some metrics.
This progression clearly illustrates the cumulative benefits of model customization through RAG, prompt engineering, and fine-tuning, resulting in a model that significantly outperformed both the baseline and the prompt-updated versions in terms of ROUGE scores and citation accuracy. The ROUGE score measures the similarity between ground truths and generated results by computing N-gram word overlaps. The following table summarizes these improvements.
| LLM | Prompt update | Fine-tuning | ROUGE-3 (relative to baseline) | ROUGE-L (relative to baseline) | Citation accuracy (relative to baseline) |
| --- | --- | --- | --- | --- | --- |
| Anthropic's Claude 3 Sonnet | – | – | Baseline | Baseline | Baseline |
| Anthropic's Claude 3 Sonnet | ✅ | – | +38.30% | +13.4% | +52.94% |
| Anthropic's Claude 3 Sonnet | ✅ | ✅ | +58.1% | +26.8% | +70.59% |
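As a side note on the metric, ROUGE-3 and ROUGE-L can be computed with the open-source rouge-score package, assuming its rougeN variants (n up to 9). The snippet below is a generic illustration with made-up English strings, not the project's evaluation pipeline; Korean text would additionally need a language-appropriate tokenizer.

```python
# Generic ROUGE illustration using the `rouge-score` package (pip install rouge-score).
# The reference/candidate strings are made-up examples.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge3", "rougeL"], use_stemmer=False)

reference = "Restart the router and check the signal indicator in the app."
candidate = "Please restart the router, then check the signal indicator in the app."

scores = scorer.score(reference, candidate)
print(f"ROUGE-3 F1: {scores['rouge3'].fmeasure:.3f}")
print(f"ROUGE-L F1: {scores['rougeL'].fmeasure:.3f}")
```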
Synthetic data for fine-tuning
To address the challenge of limited high-quality labeled training data, the team explored synthetic data generation strategies. This approach also facilitates knowledge distillation from larger LLMs to smaller, more targeted models, offering benefits such as lower latency and cost.
The team conducted controlled experiments using:
- A baseline set of 500 ground truth samples
- An augmented set of 500 original plus 1,500 synthetic samples
- A larger original set of 2,000 samples
Synthetic data was generated with Anthropic's Claude 3 Sonnet, creating new question-answer pairs over the same retrieved documents used in the ground truth examples.
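Conceptually, the generation step amounts to prompting a larger Claude model with an already-retrieved document and asking for a new question-answer pair in the target style. The sketch below shows that general pattern; the prompt wording, model ID, and absence of output parsing or quality filtering are simplifications, not the team's actual pipeline.

```python
import json

import boto3

# Sketch of synthetic Q&A generation over a retrieved document. The prompt wording and
# model ID are illustrative; a real pipeline would also parse and quality-filter outputs.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_synthetic_qa(document_text: str) -> str:
    prompt = (
        "Based only on the document below, write one new customer question and a "
        "concise answer that cites the document's reference link.\n\n"
        f"<document>\n{document_text}\n</document>"
    )
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=body,
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```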
The results were evaluated using both LLM-based comparison and human preference evaluation. Human evaluators blindly ranked the model outputs, with scores assigned based on preference (Best: 4, Second: 3, Third: 2, Worst: 1). The following table shows the human preference evaluation scores.
| Rank | Model | Cumulative score (best possible: 160) |
| --- | --- | --- |
| 1 | Fine-tuned with 2,000 original samples | 114 |
| 2 | Fine-tuned with 500 original and 1,500 synthetic samples | 112 |
| 3 | Fine-tuned with 500 original samples | 85 |
| 4 | No fine-tuning (baseline) | 84 |
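To clarify how the cumulative score is formed: each evaluator's ranking maps to points and the points are summed per model, so the maximum of 160 corresponds to being ranked best on every ranked item (40 items is inferred from the 160 maximum, not stated explicitly). A tiny sketch of that aggregation:

```python
# Tiny sketch of the cumulative preference score: ranks map to points and are summed.
RANK_POINTS = {1: 4, 2: 3, 3: 2, 4: 1}

def cumulative_score(ranks):
    """Sum preference points for one model across all ranked items."""
    return sum(RANK_POINTS[r] for r in ranks)

# Hypothetical example: a model ranked best on 25 items and second on 15 items.
print(cumulative_score([1] * 25 + [2] * 15))  # 145
```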
Some key findings include:
- Small training sets (500 samples) showed minimal improvement over the baseline
- Larger training sets (2,000 samples) scored considerably higher
- Synthetically augmented data performed comparably to equivalent-sized original data
Although having a large amount of domain-specific training data is always ideal, many businesses have only limited datasets available. In such scenarios, synthetic data can play a crucial role in place of original data, demonstrating the potential of synthetic data for model customization.
Conclusion
SK Telecom's collaboration with AWS GenAIIC showcases the company's commitment to developing innovative AI solutions for telco challenges. By using Amazon Bedrock to customize Anthropic's Claude models, SKT has achieved significant performance improvements for telco-specific, Korean-language use cases without the need to build models from scratch. The proof of concept demonstrated significant improvements:
- ~58% increase in ROUGE-3 score
- ~27% increase in ROUGE-L score
- Substantial improvement in returning correct reference links
This approach, combined with synthetic data generation strategies, aligns with SKT's AI Pyramid Strategy, enabling faster testing and development of new approaches. As SKT continues to focus on key areas such as personal AI assistants, AI healthcare, and AI data centers, this collaboration with AWS represents a significant step in its AI evolution and long-term competitiveness in the global AI landscape.
For those interested in working with AWS on similar projects, visit the Generative AI Innovation Center.
About the Authors
Sungmin Hong is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he helps expedite a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, Sungmin enjoys hiking, reading, and cooking.
Sujeong Cha is a Deep Learning Architect at the AWS Generative AI Innovation Center, where she specializes in model customization and optimization. She has extensive hands-on experience in solving customers' business use cases with generative AI as well as traditional AI/ML solutions. Sujeong holds an M.S. degree in Data Science from New York University.
Arijit Ghosh Chowdhury is a Scientist with the AWS Generative AI Innovation Center, where he works on model customization and optimization. In his role, he works on applied research in fine-tuning and model evaluation to enable generative AI for various industries. He has a Master's degree in Computer Science from the University of Illinois at Urbana-Champaign, where his research focused on question answering, search, and domain adaptation.
Yiyue Qian is an Applied Scientist II at the AWS Generative AI Innovation Center, where she helps provide generative AI solutions to AWS customers. In this role, she collaborates with a team of experts to develop innovative AI-driven models for AWS customers across various industries. Yiyue holds a Ph.D. in Computer Science from the University of Notre Dame, where her research focused on advanced machine learning and deep learning techniques.
Wei-Chih Chen is a Machine Learning Engineer at the AWS Generative AI Innovation Center, where he works on model customization and optimization for LLMs. He also builds tools to help his team tackle various aspects of the LLM development life cycle, including fine-tuning, benchmarking, and load testing, accelerating the adoption of diverse use cases for AWS customers. He holds an M.S. degree in Computer Science from UC Davis.
Hannah Marlowe is a Senior Manager of Model Customization at the AWS Generative AI Innovation Center. Her team focuses on helping customers develop differentiating generative AI solutions using their unique and proprietary data to achieve key business outcomes. She holds a Ph.D. in Physics from the University of Iowa, with a focus on astronomical X-ray analysis and instrumentation development. Outside of work, she can be found hiking, mountain biking, and skiing in the mountains of Colorado.
Seunghyun Jeong (Steve) is a team leader of the Platform Application team at SKT. He is responsible for commercializing the Global Intelligence Platform (GIP), which provides AI models and tools. His team is expanding the availability of models and features to make it easier for internal teams to apply AI, contributing to SKT's AI transformation. Before entering the AI domain, he spent most of his career as a Product Manager, developing and operating various mobile services, such as a mobile wallet, fashion streaming, and unified login services, for SK in the US and Korea.
Sunwoo Lee (Lois) is the team leader of the Data Development and Evaluation Group within SK Telecom's Global AI Tech division. She oversees the design and construction of training data for language models, the model performance evaluation process, and its application to services. Her career has centered on NLP within IT, which is a great fit with her background in Linguistics and Korean language education. Alongside her world-class team, she continues to explore and solve interesting problems, such as how to optimize the design of data for language model training, which tasks and methods to use for validating language model performance, and the ideal design of AI-human conversations.
Eric Davis is the Vice President of the AI Tech Collaboration Group at SKT. Eric oversees tech collaborations with international tech partners to customize large language models (LLMs) for the telecommunications domain. His teams are responsible for designing and building the datasets used to tune LLMs, as well as for benchmarking LLMs both in general and for the telecommunications domain. Eric holds a Master of Science degree in Computer Science from Carnegie Mellon University's Language Technologies Institute and a Bachelor of Arts in Linguistics and Psychology from the University of California, Los Angeles.