If you possess a decent computer, you can use Kobold to host your own LLMs.
It's not DeepSeek, but when the models are trained with roleplaying in mind, it comes pretty close.
This guide describes how, but the website has been down lately: https://waiki.trashpanda.land/guides:self_hosting_local_kobold
You can use the Wayback Machine to view the archived version, or keep reading, because I'm copy-pasting most of it here.
Massive credit to whoever wrote the guide. Here's hoping they can fix the website.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Check Your Hardware
RAM/VRAM: Press Ctrl + Shift + Esc > “Performance” tab.
VRAM: Under “GPU” (look for “Dedicated GPU Memory”)
RAM: Under “Memory”
Rule of Thumb:
7B models need ~8GB RAM (use Q4/Q5 quantization)
13B+ models need ~16GB+ RAM
Anything above that you can probably guess. (The 8GB means RAM + VRAM combined if you offload to your GPU. You also need to account for context using up more RAM.)
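If you want to sanity-check the rule of thumb for a specific model, here is a rough sketch in Python. It is not from the original guide. The bits-per-weight numbers are approximations I'm assuming for common GGUF quants, and the layer/head numbers in the example are the published Llama 3 8B config. Real usage will be a bit higher because of compute buffers and whatever else your system is running.

```python
# Rough memory estimate: quantized weights + KV cache (the "context" cost).
# All figures are approximations, not exact file sizes.

# Assumed, approximate bits per weight for common GGUF quantizations.
APPROX_BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size: keys + values for every layer and token.

    bytes_per_value=2 assumes a 16-bit cache.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value
    return total_bytes / 1e9


def total_gb(params_billion: float, quant: str, n_layers: int,
             n_kv_heads: int, head_dim: int, context: int) -> float:
    """Weights plus KV cache, ignoring compute buffers and OS overhead."""
    return (weights_gb(params_billion, APPROX_BPW[quant])
            + kv_cache_gb(n_layers, n_kv_heads, head_dim, context))


if __name__ == "__main__":
    # Example: an 8B Llama-3-style model (32 layers, 8 KV heads, head_dim 128)
    # at Q4_K_M with a 4096-token context.
    estimate = total_gb(8, "Q4_K_M", n_layers=32, n_kv_heads=8,
                        head_dim=128, context=4096)
    print(f"Estimated RAM + VRAM needed: {estimate:.1f} GB")
```

Compare the number against your RAM + VRAM combined. If it lands close to your limit, drop to a smaller quant or a smaller context.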
Where to get models? HuggingFace (search for GGUF files)
Starter Picks:
8B: Stheno 3.2 8B or Llama 3 8B
12B: MN-Violet-Lotus-12B
Quantization: Use Q4_K_M, Q5_K_M, or higher (avoid anything lower, they’re kinda dumb)
((nobody asked me, subs455, but I'm a fan of Mawdistical_Squelching-Fantasies-qw3-14B-Q4_K_M and MN-12b-RP-Ink-Q6_K))
Download KoboldCPP (the easiest way to run GGUF models for me personally)
Open koboldcpp.exe.
(If you don’t have a GPU, use LM Studio! There are guides out there specifically for it)
Click Browse and select your GGUF model file.
Backend Settings:
NVIDIA GPU? Use CuBLAS.
AMD GPU? Use Vulkan.
No GPU? Use OpenBLAS (CPU-only mode)
GPU Layers:
Example: For a 7B model with 33 layers, offload 32 layers to your GPU (if you have 6GB+ VRAM).
Pro Tip: Start with 80% of your VRAM capacity (6GB VRAM ≈ 32 layers). Layer size varies between models! (You can also use this helpful calculator)
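The 80% rule above can be turned into a quick back-of-the-envelope calculation. This is only a sketch, not something from the original guide: it assumes every layer is the same size (model file size divided by layer count) and it ignores the VRAM that context and compute buffers take, so treat the result as a starting point and lower it if you run out of memory. The function name and parameters are made up for illustration.

```python
def layers_to_offload(vram_gb: float, model_file_gb: float,
                      n_layers: int, budget: float = 0.8) -> int:
    """Estimate how many layers fit in a fraction (budget) of your VRAM.

    Assumes all layers are equally sized, which is only roughly true.
    """
    per_layer_gb = model_file_gb / n_layers   # average size of one layer
    usable_gb = vram_gb * budget              # leave headroom for context etc.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Example: 6 GB of VRAM, a 4.4 GB GGUF file with 33 layers.
    print(layers_to_offload(vram_gb=6, model_file_gb=4.4, n_layers=33))
    # Example: 4 GB of VRAM, a 7.5 GB GGUF file with 40 layers.
    print(layers_to_offload(vram_gb=4, model_file_gb=7.5, n_layers=40))
```

If the result equals the model's full layer count, the whole model should fit. Otherwise the remaining layers stay in normal RAM and run on the CPU.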
Context Size: Start at 4096 (increase if you have RAM to use).
Faster Processing: Enable MMQ, FlashAttention, ContextShift, and FastForwarding
MMQ: Basically, does the matrix math directly on the quantized weights, which makes it more VRAM friendly
FlashAttention: Works out the same attention result in small blocks instead of building one giant table for every pair of tokens, which saves memory and time (this is really dumbed down, don't quote me)
ContextShif
Personality: this chatbot chastises the user for clicking on the chatbot.
Scenario: this chatbot chastises the user for clicking on the chatbot.
First Message: this chatbot chastises the user for clicking on the chatbot. "Oops. You're not supposed to be here. You should go back and read the instructions, dork."
Example Dialogs: