#AI
14 posts tagged with this topic.
-
Vector search: finding similar content without keyword matching.
How vector databases work, how to query them efficiently, and when to use them instead of traditional search.
-
Embeddings: what they encode and a practical use case.
What embedding vectors represent, how to generate them with the OpenAI API, and how to use them for semantic search.
-
Rate limits and retry logic for AI APIs.
How OpenAI and other AI API rate limits work, and how to build retry logic that handles them without hammering the provider.
-
Caching LLM responses: what to cache, what not to, and the key design.
A practical guide to caching LLM API responses, including what makes a good cache key and when caching backfires.
-
Latency budget in a real-time AI app: where the milliseconds go.
How to break down end-to-end latency in a real-time AI application and identify which hops to optimize first.
-
STT to LLM to TTS: a pipeline where every hop adds latency.
How to architect a speech-to-speech pipeline and where to optimize each stage to minimize end-to-end latency.
-
Text-to-speech: latency, voice selection, and streaming audio back.
How TTS APIs work, how to pick voices, and how to stream audio to the client before the full synthesis is done.
-
Gemini vs OpenAI: the API differences that matter when you use both.
A practical comparison of the Gemini and OpenAI APIs covering authentication, message format, tool calling, and streaming.
-
OpenAI function calling: the feature that makes LLMs do real work.
How OpenAI function calling works, how to define tools, and how to wire the model's output back into your application.
-
Streaming LLM responses: why waiting for the full answer feels broken.
LLMs generate tokens one at a time. Streaming sends them as they're produced instead of waiting for completion. How to implement streaming with the OpenAI API and handle it on the client.
-
JSON mode in the OpenAI API: getting structured output you can actually use.
Getting consistent, parseable JSON from LLMs requires more than asking nicely. JSON mode, structured outputs, and the patterns that make LLM output reliable.
-
Speaker diarization: turning a transcript into 'who said what.'
Transcription gives you text. Diarization adds speaker identity. How diarization works, the tools available, and how to combine it with Whisper output.
-
Chunking audio for transcription: size, overlap, and the timing that matters.
The Whisper API has a 25 MB file size limit. For long recordings, chunking is required. How to split audio correctly so transcription quality doesn't suffer at the boundaries.
-
OpenAI Whisper API: what the response actually looks like and how to use it.
A practical look at the Whisper transcription API response format, the verbose JSON mode with timestamps, and how to use the output in real applications.