Paul Planchon

Behind the scenes of creating a great LLM chat application

Not all chat applications have good UI/UX. And for someone like me, who deeply appreciates great engineering work and well-crafted UI, that's a shame.

I used T3.chat for a few months, and to be honest, it's a great product. I'm not a huge fan of the purple color scheme, but I stuck with it because the boring theme is really ugly. I was using it for the UX anyway.

The chat is fast. When you create a message, everything loads instantly; you don't get weird spinners or nonsensical interactions. The app feels good, and that is what brings AI into my day-to-day workflow.

After some time using T3.chat, I noticed some UX quirks that bugged me a little. The biggest one is the lack of keyboard shortcuts. To create a new chat, you need to use the awful CTRL-SHIFT-O shortcut popularized by ChatGPT, not to mention all the other interactions, such as changing models or adding tools, which are mouse-only. All of the navigation in T3.chat is really designed for mouse-and-click interaction, whereas I prefer keyboard-only navigation.

And over the weeks, many other little quality-of-life issues made me want to create my own chat app.

So that's why I tried to build a chat app all by myself. It was a personal challenge, and also a way to try out the new AI technologies.

So the MVP for karnet.app was simple: I wanted to recreate a very simple chat interface where I can ask any model questions, in any modality. I also wanted to be able to change the model behind a question, and to fork the conversation into another chat.

And of course, I wanted to make the whole application keyboard-first. Every interaction needed to feel fast (feeling fast and being fast are very different things in UX; of course I want every API endpoint to be as fast as possible, but primarily, I want the application to feel fast).

In summary, here are the requirements of the MVP:

  • All LLMs available, great token speed and resumable streams
  • Replay and try another model features
  • Very fast application (interaction time budget is <25ms)
  • All navigation should be possible and optimized for keyboard-only users

After 4 weeks of constant grind in my 5-to-9, I finally landed on a version of the Karnet.app chat I'm proud of.

Before going in depth on all the technical challenges I encountered on this journey, I want to demo what I built. I use this chat application every day for my normal LLM chatting sessions. The application is stable, and I enjoy playing with it!

Karnet is really built for speed. When you mostly use the keyboard to navigate an application, everything needs to render fast; otherwise, the app feels really slow.

Karnet feels very fast because most of the data you see on screen has been loaded before you click on it. Some of the data is pre-loaded by calling our API, some of it is stored on the client device. For example, all the LLM chats are stored on the client, so they load instantly.

I tried to make Karnet loader-free.

All of the interactions in the chat are available through multiple keyboard shortcuts. Everything is thought out to be easily accessible without using a mouse (a small sketch of how this can be wired up follows the list):

  • changing the model M or @ (inside of the chat input)
  • adding / removing tools S or / (inside of the chat input)
  • creating a new chat c+c (create chat)
  • scrolling in the page j or J and k or K
  • replaying the last message up (like in a terminal)
  • many more…
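To give an idea of how little machinery this needs, here is a minimal sketch of a global shortcut hook. It is illustrative only (the hook name and key handling are my own, not Karnet's actual code), but it shows the general pattern: listen on window, skip keystrokes typed inside inputs, and run an action.

import { useEffect } from "react";

// Illustrative global-shortcut hook: binds a single key to an action.
export function useShortcut(key: string, action: () => void) {
    useEffect(() => {
        const onKeyDown = (event: KeyboardEvent) => {
            // Ignore keystrokes typed inside inputs or the chat editor.
            const target = event.target as HTMLElement;
            if (
                target.tagName === "INPUT" ||
                target.tagName === "TEXTAREA" ||
                target.isContentEditable
            ) {
                return;
            }
            if (event.key === key) {
                event.preventDefault();
                action();
            }
        };
        window.addEventListener("keydown", onKeyDown);
        return () => window.removeEventListener("keydown", onKeyDown);
    }, [key, action]);
}

// Usage (illustrative): useShortcut("m", openModelPicker); useShortcut("j", scrollDown);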

I want the keyboard navigation to be the main navigation in the app. Everything should be accessible a few keyboard shortcuts away.

For example, let's imagine you have a research paper from last year on your computer and you want an update on the state of research on that topic. You could press cc to create a chat, /file to attach the file to the request, then s to add the search tool, and Enter to launch the request. All of this without having to touch the mouse.

This way of interacting should not be a premium feature; it should be as normal as mouse interaction.

The chat renders beautiful markdown, even while tokens are streaming. It can also resume when the connection between your machine and our server drops (you will never see an empty screen).

You can also retry a question, in the same chat, using another LLM.

The chat is really nice. But honestly, I had very little to do with it. I use the latest AI libraries from Vercel (ai-sdk, resumable-stream and streamdown). These libraries are very well built, and the documentation is very precise. I think I spent less than a week on all the frontend chat features.

Thanks to OpenRouter, all LLMs are available in Karnet. They are available to try as soon as OpenRouter releases them on their website.

Also, to avoid the mess of loading thousands of LLM models into the app's model-selection dropdowns, you can pre-select LLMs in the settings of the app. You can also select a default model for text generation and one for image generation.

We are still in the early alpha of the application, but if you want to try it yourself you can join the waitlist here:
https://karnet.app/waitlist

Karnet is not only a chat app; the chat is one feature of the complete application. I'm building my vision of the perfect productivity platform. Karnet has a lot of other features, which I'm not showing in this article 😊

I started by creating a new Next.js project, adding new pages, and tinkering with the ai-sdk and its integration through Vercel's LLM gateway. The amazing documentation available online allowed me to have everything set up in a few hours and to get the first version of this app deployed to production very quickly.

At first, I wanted to create a sync engine all by myself, but I soon realized that it was a very hard problem, and I did not want to spend too much time outside of the core product.

That’s why I looked at all the different platforms / frameworks that allow you to create real-time syncing between React and your backend / database, and tried some of them.

In the end, Convex was the most mature one and the easiest to deploy and use, in my opinion. So I really started to create a perfect clone of T3.chat: and to be honest, the tech stack is quite magical ✨.

I think I’m using the same technologies as Theo and I'm pretty sure that we are doing the same tricks on the Next.js side to make everything lightning fast.

Having a basic chat interface is quite easy, but pushing it to the limit and having a very fast one is not simple.

Next.js is very good for SEO-optimized websites: all of the React Server Components features allow you to create very fast static websites. Web crawlers can then load your website and index it without having to render the JS, which is perfect for e-commerce and landing pages.

But when you want to create a website where the user interacts a lot with the different pages, and where data changes all the time, Next.js hits its limits.

One of the big problems is that every time you navigate to another page in Next.js, you need to call Vercel's backend to fetch it; this is how Next.js works. There is no purely client-side routing.

Next.js takes a few milliseconds here and there, making everything seem laggy.

Keyboard navigation does not have the same standards as mouse interaction. I feel that when we click on something, a little wait can be expected. I don't think our brains are wired to wait for keyboard inputs: when you press a key, something needs to happen. That's why keyboard-only navigation is so hard to get right.

This is Karnet before the client-side optimizations I did.

As you can see, each page change takes 200ms+. For an SPA, this is not a great user experience.

So I went looking for a solution to keep Next.js and get instant routing (like in an SPA).

The most obvious solution would be to not use Next.js at all. Next.js is not really designed for client-side interaction, whereas “normal” React is.

I'm forced to use Next.js on this project for two reasons:

  • I want the landing page statically rendered using all the Next.js features, and I don't want to use a subdomain or subpath for my application. (I could host the landing page on karnet.app and have the application on something.karnet.app or on karnet.app/something, but I did not want that.)
  • I want to use the /api feature

Why?

The /api feature is a great way to build full-stack apps easily. You just create an api folder in your /app folder, and then everything inside of it is turned into routes for your API. This feature is heavily undervalued: most of our applications are CRUD-monkey applications, as DHH would say. The /api folder is AWS Lambda done right: I wanted to use it. It is simple. I love simple.
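For readers who have not used it, this is all it takes. A minimal sketch of a route handler (the path and payload are illustrative, not an actual Karnet endpoint):

// app/api/hello/route.ts — every route.ts file under app/api/ becomes an HTTP endpoint
export async function GET(request: Request) {
    const { searchParams } = new URL(request.url);
    const name = searchParams.get("name") ?? "world";

    return Response.json({ message: `hello ${name}` });
}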


The solution I found was to use the React Router library inside of the Next.js application.

This is not easy to set up because Next.js is not really built for that. Nevertheless, Next.js has a very convenient feature, “catch-all routes”: you can create a folder named [...slug], and every request that does not match any other route will be routed to it.

Using this strategy, I'm able to split my application into two parts: the server-side rendered one and the client-side rendered one.

That's why my application has both an /app and a /page folder (you can read the source code here). The first is the “traditional Next.js App Router” and the second is the SPA routing.
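Here is a minimal sketch of how the catch-all entry point can hand control over to React Router. The component and folder names are illustrative, not Karnet's actual code; the idea is simply that the catch-all page renders a client-only component which owns its own routes.

// app/[...slug]/page.tsx — the Next.js side: render the SPA shell, skip SSR
"use client";

import dynamic from "next/dynamic";

const ClientApp = dynamic(() => import("../../page/ClientApp"), { ssr: false });

export default function CatchAllPage() {
    return <ClientApp />;
}

// page/ClientApp.tsx — the React Router side: client-only routing from here on
import { BrowserRouter, Route, Routes } from "react-router-dom";
import { ChatPage } from "./chat/ChatPage";          // illustrative pages
import { SettingsPage } from "./settings/SettingsPage";

export default function ClientApp() {
    return (
        <BrowserRouter>
            <Routes>
                <Route path="/chat/:chatId" element={<ChatPage />} />
                <Route path="/settings" element={<SettingsPage />} />
            </Routes>
        </BrowserRouter>
    );
}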

All of this makes navigation much smoother: most of the links live inside React Router's world, so they just trigger an unmount of the old page component and then the mounting of the new one (which is super fast).

This is the chat experience on Karnet. You can ask questions to any OpenRouter model, retry them on the same model, fork the question or retry it on another model. Everything is smooth, animated and beautiful to use. Using this chat app is effortless and, really, a joy 😊

Getting responses from an LLM into a React application can be very difficult if you are not using the right technologies. Token streaming, event serialization and deserialization, and many other details are not trivial in these kinds of applications. You could reinvent the wheel, or just use the state-of-the-art libraries. I chose simplicity and the state of the art.

I went with the ai-sdk to create the chat. It is very easy to set up, though you will encounter some challenges when loading data into the useChat hook.
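For context, the server side of an ai-sdk chat boils down to a small route handler that streams tokens back to useChat. A minimal sketch, assuming an ai-sdk v4-style API and reaching OpenRouter through the OpenAI-compatible provider (helper names differ between ai-sdk versions, so treat this as a sketch rather than Karnet's actual code):

// app/api/chat/route.ts — illustrative chat endpoint
import { streamText, convertToCoreMessages } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

// OpenRouter speaks the OpenAI wire protocol, so the OpenAI provider can point at it.
const openrouter = createOpenAI({
    baseURL: "https://openrouter.ai/api/v1",
    apiKey: process.env.OPENROUTER_API_KEY,
});

export async function POST(req: Request) {
    const { messages, model } = await req.json();

    const result = streamText({
        model: openrouter(model ?? "openai/gpt-4o-mini"), // default model is illustrative
        messages: convertToCoreMessages(messages),
    });

    // Streams tokens back in the format the useChat hook expects.
    return result.toDataStreamResponse();
}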

But the real challenge comes when you want to optimize the TBLR (Time Before LLM Request). You cannot really change the token throughput of the provider (I'm using OpenRouter, so I always get the best provider for a given LLM).

The only real impact you can have on your “TFT” (really the difference between the time the request is sent to the LLM provider and the time the first tokens are received) is to reduce all of the processing you do before sending the request.

Having a small TBLR means having a fast chat experience. Most of the time, design is used to disguise loading time, but for my app I wanted something genuinely as fast as possible.

At the moment I'm not doing much processing (intentionally, because I want a very small TFT) before sending the request: I don't use a memory system, complex LLM routing strategies or even AI agents.

The budget I set for myself, for now, is <100ms for all the processing I do before I send the LLM request (Time Before LLM Request). And at first, I had a horrible TBLR: it was more than a second.

To better understand where all of my TBLR was going, I used sentry.io to instrument my app.

Fun note: you can now install Sentry with an AI CLI. The CLI makes all the right choices for your app: it worked perfectly for me and I didn't change any of the generated code, gg Sentry!
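For reference, marking a chunk of work as its own span is a one-liner with the newer Sentry JS SDK (v8-style startSpan; the span and function names here are illustrative, not Karnet's actual instrumentation):

import * as Sentry from "@sentry/nextjs";

// Wrap all pre-request processing in a custom span so it shows up in the trace
// and can be compared against the TBLR budget.
export async function withTblrSpan<T>(work: () => Promise<T>): Promise<T> {
    return Sentry.startSpan({ name: "prepare-llm-request", op: "chat.tblr" }, work);
}

// Usage (illustrative): const token = await withTblrSpan(() => getToken());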

Here are the results I got.

This is a “cold request” on the vercel-production environment. As you can see, this is not great: the TBLR here is 1.5s… But I'm losing 800ms to Vercel's internal routing (because my application is not loaded on Vercel's servers, the request takes longer: acceptable). The real shame was the 600ms I was losing to token validation and generation.

I cannot reduce the Vercel “resolve page component” span, but I can optimize the getToken function. To do so, I set up a Redis cache and cache all the getToken responses. This is slow for the first request, but very fast afterwards.
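A minimal sketch of that caching layer, assuming Upstash Redis and Clerk's JWT template for Convex (key names and the TTL are illustrative; the TTL has to stay below the token's actual expiry so an expired JWT is never served):

import { Redis } from "@upstash/redis";
import { auth } from "@clerk/nextjs/server";

const redis = Redis.fromEnv();

export async function getCachedToken(): Promise<string | null> {
    const { userId, getToken } = await auth();
    if (!userId) return null;

    const cacheKey = `clerk-token:${userId}`;
    const cached = await redis.get<string>(cacheKey);
    if (cached) return cached;

    // Cache miss: pay the ~200ms Clerk round trip once, then reuse the token.
    const token = await getToken({ template: "convex" });
    if (token) {
        await redis.set(cacheKey, token, { ex: 30 }); // keep well under the token expiry
    }
    return token;
}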

Here is the same request once all the caches are hit:

As you can see, here the request to OpenRouter is sent in less than 50ms, which is within my TBLR budget. This made a very big difference in the feel of the chat: instead of getting a spinner, you get the token stream almost instantly, making the app feel faster.


I could completely remove these two waits if I hosted all of the backend code in a non-serverless environment and used a different auth solution (avoiding the getToken request…).

As always in software engineering, you need to make tradeoffs. Here is my reasoning behind the two tradeoffs I'm making:

The /api feature of Next.js is so great. Not having to think about servers, CI/CD or dev environments is really a game changer. When you push code to Vercel, you get a PR environment ready in minutes, where you can test your application “in the real world”, effortlessly.

In my day job, I manage a big AWS infrastructure running Docker images inside a Kubernetes cluster (like everybody, no?).

All of the work you need to do to keep everything running and to create all of the pipelines, only to end up with a worse developer experience, is not worth it.

In an upcoming article, I will explain all the hoops I had to jump through to get a great CI/CD for the software I'm building. Months of work, and I still don't have PR envs...

I totally understand why Vercel needs cold starts, especially since I'm a free-tier user. I don't think I will be moving away from this kind of development, or off Vercel, anytime soon.

The Convex / Next.js combination is amazing for the CRUD / AI operations. For the more complex requests, I use Trigger.dev. More on that in another article…

Most of the time spent in Clerk is when I call their server to generate a token in a specific shape. This is needed because Clerk does not give you the private key behind your JWKS (otherwise you could just bypass most of their features, ahah). Clerk takes, on average, ~200ms to return the token (on my account), which is a lot of time just to sign a JWT.

I don't really like this. But for now I'm accepting it because the developer experience is incredible and I'm still on the free tier. If I were to upgrade, I think I would want to be able to sign my tokens on my end, meaning I would want access to my private signing key (to avoid the request round trip; is that even possible?).

Using Clerk saves me from having to worry about most of the security and user management (and even the subscription part in the future, if needed!). So, for now, I'm OK with the getToken situation. I found a workaround with the caching strategy. It's not perfect, but it works.

The last piece of engineering I did to make Karnet fast was to avoid all the basic spinners you find in a normal SPA. Loaders and skeletons are the plague of any good software: they look ugly and make you lose time. When you click on a link, you want the data to be loaded instantly.

If something can work without a spinner, it should not have one. Whenever there is a way to avoid it, take it.

The CRUD stack I'm using on Karnet is Convex and TanStack Query. Convex hosts the data and does its magic to keep it up to date with the client, while TanStack Query provides a nicer hook interface and enables an easy first render with local data.

Karnet is not yet a collaborative application, so you are the only one modifying your data. This allows me to store the data on your device without any major risk (the server is always right; on conflict, we replace local data with the remote data). I use localStorage for this storage layer.

This is the general “query” hook on Karnet:

This hook tries to avoid loading states in the application. I prefer to render out-of-date data for a few seconds rather than an ugly spinner.

import { useQuery } from "@tanstack/react-query";
import { convexQuery } from "@convex-dev/react-query";
import type { FunctionArgs, FunctionReference } from "convex/server";

type UseSyncProps<
    ConvexQueryReference extends FunctionReference<"query">,
    Args extends FunctionArgs<ConvexQueryReference> | "skip",
> = {
    args: Args;
    queryFn: ConvexQueryReference;
    key: (args: Args) => string;
    options?: {
        isLocallyStored?: boolean;
    };
};

export const useSync = <
    ConvexQueryReference extends FunctionReference<"query">,
    Args extends FunctionArgs<ConvexQueryReference> | "skip",
>({
    args,
    queryFn,
    key,
    options,
}: UseSyncProps<ConvexQueryReference, Args>) =>
    // useQuery from TanStack Query, fed with Convex-aware query options
    useQuery({
        ...convexQuery(queryFn, args),
        // Seed the first render with the locally persisted copy, if any,
        // so known data never shows a loading state.
        initialData: () => {
            if (options?.isLocallyStored) {
                const rawData = localStorage.getItem(key(args));
                if (rawData) {
                    return JSON.parse(rawData);
                }
            }
        },
    });

This works very, very well. In the future I will add some parameters, like where the data is coming from (to restrict interactions when the data comes from local storage, to avoid weird sync issues).

Convex is an incredible piece of backend, but their React hooks are not great yet. Mixing them with TanStack Query makes everything magical.

Note that I'm doing the same trick for mutations: I update the UI optimistically when possible and “persist” the data once it is validated by the backend. Again, so you never have to wait.
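The mutation side is the standard TanStack Query optimistic-update pattern. A minimal sketch (the rename endpoint, query keys and fields are illustrative, not Karnet's actual code):

import { useMutation, useQueryClient } from "@tanstack/react-query";

export const useRenameChat = (chatKey: string) => {
    const queryClient = useQueryClient();

    return useMutation({
        mutationFn: (newTitle: string) =>
            fetch("/api/chats/rename", {
                method: "POST",
                body: JSON.stringify({ chatKey, newTitle }),
            }),
        onMutate: async (newTitle) => {
            // Optimistically patch the cached chat so the UI updates instantly.
            await queryClient.cancelQueries({ queryKey: [chatKey] });
            const previous = queryClient.getQueryData([chatKey]);
            queryClient.setQueryData([chatKey], (old: any) => ({ ...old, title: newTitle }));
            return { previous };
        },
        onError: (_err, _newTitle, context) => {
            // Server is always right: roll back the optimistic update on failure.
            queryClient.setQueryData([chatKey], context?.previous);
        },
        onSuccess: () => {
            // "Persist" the validated data locally so the next cold render is instant.
            localStorage.setItem(chatKey, JSON.stringify(queryClient.getQueryData([chatKey])));
        },
    });
};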

The only big problem I could have with this data model is migrations. If I make a major schema change, I will have a lot of client-side data to invalidate / reshape.
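One simple guard, sketched here as a suggestion rather than Karnet's current code, is to stamp every cached entry with a schema version and drop stale entries instead of trying to migrate them:

const SCHEMA_VERSION = 3; // bump on breaking schema changes (value is illustrative)

export function writeLocalCache<T>(key: string, data: T) {
    localStorage.setItem(key, JSON.stringify({ version: SCHEMA_VERSION, data }));
}

export function readLocalCache<T>(key: string): T | undefined {
    const raw = localStorage.getItem(key);
    if (!raw) return undefined;
    try {
        const { version, data } = JSON.parse(raw);
        if (version !== SCHEMA_VERSION) {
            localStorage.removeItem(key); // stale shape: fall back to the server data
            return undefined;
        }
        return data as T;
    } catch {
        return undefined;
    }
}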

It’s better to have a warning or an error 1% of the time than to have to render a spinner 100% of the time.

In most situations, you should not trust the client. Trust creates a threat vector. In Karnet, each new chat gets a new ID, unique for you and for the platform. I could have used a UUID for that and taken the easy way of just trusting the client.

Instead, I went the hard way: when you land on the new chat page, it instantly fires a “get new chat ID” request to the backend. I bet this request will land before you finish writing your question (if you are faster than 100ms, you should become the LLM).

Then, when you send your message in this “new chat”, I use the ID that was generated.

This is the best way possible: Karnet does not have to trust any user input.

Finally, this pre-determined ID allows me to create a more readable one, a smallId. Instead of having to share a nanoID, you can use an ID made for humans, like chat-42.
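A minimal sketch of the “get new chat ID” idea as a Convex mutation (table, index and field names are illustrative; the real implementation may differ):

// convex/chats.ts — reserve an ID before the first message is sent
import { mutation } from "./_generated/server";

export const reserveChatId = mutation({
    args: {},
    handler: async (ctx) => {
        const identity = await ctx.auth.getUserIdentity();
        if (!identity) throw new Error("Not authenticated");

        // Count the user's existing chats to derive a human-friendly smallId like "chat-42".
        const existing = await ctx.db
            .query("chats")
            .withIndex("by_user", (q) => q.eq("userId", identity.subject))
            .collect();

        const smallId = `chat-${existing.length + 1}`;
        const id = await ctx.db.insert("chats", {
            userId: identity.subject,
            smallId,
            status: "draft", // reserved but empty until the first message arrives
        });

        return { id, smallId };
    },
});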

It's only been a few weeks since I started working on Karnet. For now, I think I have built 60% of the features I want in the app. As you may know, the last 40% will take much more time than the first 60%.

At the moment, the rich text editor is not perfect. There is a lot of potential in this part of the chat: having access to your last messages, asking a small LLM to reprompt your input, better copy and paste (images, code, files, etc.).

Better chat output visualisation is also something I want to work on: avoiding the gray “thinking” pattern most chat interfaces implement, by trying to have a custom widget per thinking mode.

I also want to add agentic features to the chat, but that is yet to come…