RSS.Social

philschmid.de - RSS feed


Posts

The 10 Steps for AI product generation with Gemini 2.5 Flash

Memory in Agents: Make LLMs remember

Google Gemini CLI Cheatsheet

Code Sandbox MCP: A Simple Code Interpreter for Your AI Agents

Integrating Long-Term Memory with Gemini 2.5

The New Skill in AI is Not Prompting, It's Context Engineering

Single vs Multi-Agent System?

Zero to One: Learning Agentic Patterns

Google Gemini LangChain Cheatsheet

OpenAI Codex CLI: how does it work?

Model Context Protocol (MCP): an overview

ReAct agent from scratch with Gemini 2.5 and LangGraph

Pass@k vs Pass^k: Understanding Agent Reliability

Google Gemma 3 Function Calling Example

Function Calling Guide: Google DeepMind Gemini 2.0 Flash

From PDFs to Insights: Structured Outputs from PDFs with Gemini 2.0

Mini-R1: Reproduce the Deepseek R1 "aha moment", an RL tutorial

How to align open LLMs in 2025 with DPO and synthetic data

Bite: How Deepseek R1 was trained

How to use Anthropic MCP Server with open LLMs, OpenAI or Google Gemini

Fine-tune classifier with ModernBERT in 2025

How to fine-tune open LLMs in 2025 with Hugging Face

Deploy QwQ-32B-Preview, the best open Reasoning Model, on AWS with Hugging Face

Deploy Llama 3.2 Vision on Amazon SageMaker

How to Fine-Tune Multimodal Models or VLMs with Hugging Face TRL

Evaluate open LLMs with Vertex AI and Gemini

Evaluate LLMs using Evaluation Harness and Hugging Face TGI/vLLM

Deploy open LLMs with Terraform and Amazon SageMaker

LLM Evaluation doesn't need to be complicated

Evaluating Open LLMs with MixEval: The Closest Benchmark to LMSYS Chatbot Arena

Train and Deploy open Embedding Models on Amazon SageMaker

Deploy Mixtral 8x7B on AWS Inferentia2 with Hugging Face Optimum

Fine-tune Llama 3 with PyTorch FSDP and Q-Lora on Amazon SageMaker

Fine-tune Embedding models for Retrieval Augmented Generation (RAG)

Understanding the Cost of Generative AI Models in Production

Deploy Llama 3 70B on AWS Inferentia2 with Hugging Face Optimum

Deploy open LLMs with vLLM on Hugging Face Inference Endpoints

Efficiently fine-tune Llama 3 with PyTorch FSDP and Q-Lora

Deploy Llama 3 on Amazon SageMaker

Accelerate Mixtral 8x7B with Speculative Decoding and Quantization on Amazon SageMaker

Deploy Llama 2 70B on AWS Inferentia2 with Hugging Face Optimum

Fine-Tune and Evaluate LLMs in 2024 with Amazon SageMaker

Evaluate LLMs with Hugging Face Lighteval on Amazon SageMaker

How to fine-tune Google Gemma with ChatML and Hugging Face TRL

RLHF in 2024 with DPO and Hugging Face

How to Fine-Tune LLMs in 2024 with Hugging Face

Scale LLM Inference on Amazon SageMaker with Multi-Replica Endpoints

Fine-tune Llama 7B on AWS Trainium

Programmatically manage 🤗 Inference Endpoints

Deploy Mixtral 8x7B on Amazon SageMaker

Deploy Embedding Models on AWS Inferentia2 with Amazon SageMaker

Deploy Llama 2 7B on AWS Inferentia2 with Amazon SageMaker

Deploy Stable Diffusion XL on AWS Inferentia2 with Amazon SageMaker

Amazon Bedrock: How good (bad) is Titan Embeddings?

Evaluate LLMs and RAG: a practical example using LangChain and Hugging Face

Deploy Idefics 9B and 80B on Amazon SageMaker

Train and Deploy Mistral 7B with Hugging Face on Amazon SageMaker

Llama 2 on Amazon SageMaker: a Benchmark

Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA and Flash Attention

Fine-tune Falcon 180B with QLoRA and Flash Attention on Amazon SageMaker

Deploy Falcon 180B on Amazon SageMaker

Optimize open LLMs using GPTQ and Hugging Face Optimum

LLMOps: Deploy Open LLMs using Infrastructure as Code with AWS CDK

Deploy Llama 2 7B/13B/70B on Amazon SageMaker

Introducing EasyLLM - streamline open LLMs

Extended Guide: Instruction-tune Llama 2

LLaMA 2 - Every Resource you need

Fine-tune LLaMA 2 (7-70B) on Amazon SageMaker

Train LLMs using QLoRA on Amazon SageMaker

Deploy LLMs with Hugging Face Inference Endpoints

Optimize and Deploy BERT on AWS Inferentia2

Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker

Deploy Falcon 7B and 40B on Amazon SageMaker

Fine-tune BERT for Text Classification on AWS Trainium

Introducing the Hugging Face LLM Inference Container for Amazon SageMaker

Generative AI for Document Understanding with Hugging Face and Amazon SageMaker

How to scale LLM workloads to 20B+ with Amazon SageMaker using Hugging Face and PyTorch FSDP

Setting up AWS Trainium for Hugging Face Transformers

Train and Deploy BLOOM with Amazon SageMaker and PEFT

Introducing IGEL, an instruction-tuned German Large Language Model

Efficient Large Language Model training with LoRA and Hugging Face

Deploy FLAN-UL2 20B on Amazon SageMaker

Getting started with PyTorch 2.0 and Hugging Face Transformers

Controlled text-to-image generation with ControlNet on Inference Endpoints

Combine Amazon SageMaker and DeepSpeed to fine-tune FLAN-T5 XXL

Fine-tune FLAN-T5 XL/XXL using DeepSpeed and Hugging Face Transformers

Deploy FLAN-T5 XXL on Amazon SageMaker

Hugging Face Transformers Examples

Getting started with Transformers and TPU using PyTorch

Fine-tune FLAN-T5 for chat and dialogue summarization

Managed Transcription with OpenAI Whisper and Hugging Face Inference Endpoints

Stable Diffusion Inpainting example with Hugging Face Inference Endpoints

Stable Diffusion with Hugging Face Inference Endpoints

Document AI: LiLT, a better language-agnostic LayoutLM model

Multi-Model GPU Inference with Hugging Face Inference Endpoints

Serverless Machine Learning Applications with Hugging Face Gradio and AWS Lambda

Accelerate Stable Diffusion inference with DeepSpeed-Inference on GPUs

Stable Diffusion on Amazon SageMaker

Deploy T5 11B for inference for less than $500

Outperform OpenAI GPT-3 with SetFit for text-classification

Fine-tuning LayoutLM for document-understanding using Keras and Hugging Face Transformers

Deploy LayoutLM with Hugging Face Inference Endpoints

Document AI: Fine-tuning LayoutLM for document-understanding using Hugging Face Transformers

Custom Inference with Hugging Face Inference Endpoints

Accelerate GPT-J inference with DeepSpeed-Inference on GPUs

Document AI: Fine-tuning Donut for document-parsing using Hugging Face Transformers

Use Sentence Transformers with TensorFlow

Pre-Training BERT with Hugging Face Transformers and Habana Gaudi

Accelerate BERT inference with DeepSpeed-Inference on GPUs

Accelerate Sentence Transformers with Hugging Face Optimum

Deep Learning setup made easy with EC2 Remote Runner and Habana Gaudi

Accelerate Vision Transformer (ViT) with Quantization using Optimum

Optimizing Transformers for GPUs with Optimum

Hugging Face Transformers and Habana Gaudi AWS DL1 Instances

Optimizing Transformers with Hugging Face Optimum

Convert Transformers to ONNX with Hugging Face Optimum

Setup Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS

Static Quantization with Hugging Face `optimum` for ~3x latency improvements

Advanced PII detection and anonymization with Hugging Face Transformers and Amazon SageMaker

An Amazon SageMaker Inference comparison with Hugging Face Transformers

Semantic Segmentation with Hugging Face's Transformers and Amazon SageMaker

Automatic Speech Recognition with Hugging Face's Transformers and Amazon SageMaker

Serverless Inference with Hugging Face's Transformers, DistilBERT and Amazon SageMaker

Accelerated document embeddings with Hugging Face Transformers and AWS Inferentia

Save up to 90% training cost with AWS Spot Instances and Hugging Face Transformers

Speed up BERT inference with Hugging Face Transformers and AWS Inferentia

Creating document embeddings with Hugging Face's Transformers and Amazon SageMaker

Autoscaling BERT with Hugging Face Transformers, Amazon SageMaker and a Terraform module

Multi-Container Endpoints with Hugging Face Transformers and Amazon SageMaker

Asynchronous Inference with Hugging Face Transformers and Amazon SageMaker

Deploy BERT with Hugging Face Transformers, Amazon SageMaker and a Terraform module

Task-specific knowledge distillation for BERT using Transformers and Amazon SageMaker

Distributed training on multilingual BERT with Hugging Face Transformers and Amazon SageMaker

Financial Text Summarization with Hugging Face Transformers, Keras and Amazon SageMaker

Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker

Image Classification with Hugging Face Transformers and `Keras`

Workshop: Enterprise-Scale NLP with Hugging Face and Amazon SageMaker

Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition

New Serverless Transformers using Amazon SageMaker Serverless Inference and Hugging Face

Hugging Face Transformers BERT fine-tuning using Amazon SageMaker and Training Compiler

MLOps: Using the Hugging Face Hub as model registry with Amazon SageMaker

A remote guide to re:Invent 2021 machine learning sessions

MLOps: End-to-End Hugging Face Transformers with the Hub and SageMaker Pipelines

Going Production: Auto-scaling Hugging Face Transformers with Amazon SageMaker

Deploy BigScience T0_3B to AWS and Amazon SageMaker

Scalable, Secure Hugging Face Transformer Endpoints with Amazon SageMaker, AWS Lambda, and CDK

Few-shot learning in practice with GPT-Neo

Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker

Multilingual Serverless XLM RoBERTa with Hugging Face and AWS Lambda

Serverless BERT with Hugging Face, AWS Lambda, and Docker

AWS Lambda with custom Docker images as runtime

New Serverless BERT with Hugging Face, AWS Lambda, and AWS EFS

efsync: my first open-source MLOps toolkit

My path to becoming a certified Solutions Architect

Create a custom GitHub Action in 4 steps

Fine-tune a non-English GPT-2 Model with Hugging Face

Mount your AWS EFS volume into AWS Lambda with the Serverless Framework

Serverless BERT with Hugging Face and AWS Lambda

How to use Google Tag Manager and Google Analytics without Cookies

BERT Text Classification in a different language

Scaling Machine Learning from ZERO to HERO

Getting Started with AutoML and AWS AutoGluon

K-Fold as Cross-Validation with a BERT Text-Classification Example

How to Set Up a CI/CD Pipeline for AWS Lambda with GitHub Actions and Serverless

Set up a CI/CD Pipeline for your Web app on AWS with GitHub Actions

Getting started with CNNs by calculating LeNet layers manually

Google Colab: the free GPU/TPU Jupyter Notebook Service