XiongjieDai/GPU-Benchmarks-on-LLM-Inference - GitHub
Accelerate Larger LLMs Locally on RTX With LM Studio - NVIDIA …
Build a Custom LLM with ChatRTX | NVIDIA
GitHub - NVIDIA/TensorRT-LLM: TensorRT-LLM provides users …
Getting Started — NVIDIA NIM for Large Language Models (LLMs)
Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM ...
NVIDIA TensorRT-LLM Supercharges Large Language ... - NVIDIA …
Introducing New KV Cache Reuse Optimization Strategies in NVIDIA TensorRT-LLM
NVIDIA TensorRT-LLM - NVIDIA Docs - NVIDIA Documentation …
Large Language Models up to 4x Faster on RTX With TensorRT-LLM …