How LLMs See the World
- Authors: AbnAsia.org (@steven_n_t)

When you type "Hello world" into ChatGPT or Claude, the model doesn't process those letters and spaces the way you're reading this post right now. It converts everything into numbers through a process most people never think about.
Preprocessing comes first. Text gets normalized: Unicode characters, spacing quirks, and special symbols all get cleaned up and standardized. How aggressive this cleanup is varies from tokenizer to tokenizer, but the goal is the same. "Hello world" becomes a consistent format that the model can actually work with.
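To make the normalization step concrete, here is a minimal sketch using only Python's standard library. The `normalize` helper is a hypothetical name, and the specific choices (NFKC folding, collapsing whitespace) are assumptions for illustration; real tokenizers each apply their own rules, and some barely normalize at all.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical preprocessing helper: one possible normalization recipe."""
    # NFKC folds "compatibility" characters into canonical forms,
    # e.g. full-width letters become ASCII and a no-break space becomes a space.
    text = unicodedata.normalize("NFKC", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space.
    text = re.sub(r"\s+", " ", text)
    # Drop leading and trailing whitespace.
    return text.strip()


# Full-width "Hello", a no-break space, extra spaces, and a trailing newline.
messy = "Ｈｅｌｌｏ\u00a0  world\n"
print(normalize(messy))
```

Two inputs that look the same to a human but differ byte for byte now end up as the same string, which is the point of this step.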
Then comes tokenization. This is where things get interesting. A tokenizer splits the text into tokens, and there are three main approaches.
1 - Character-based tokenization breaks everything down to individual characters. "Hello world" becomes ["H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"]. Simple, but inefficient: even short texts turn into long sequences.
2 - Word-based tokenization splits on whole words: ["Hello", "world"]. Cleaner, but it struggles with rare words and creates massive vocabularies.
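3 - Subword-based tokenization is what modern LLMs actually use. GPT, Gemini, and Claude all rely on it. "Hello world" becomes something like ["Hell", "o", "world"]. The exact split depends on the tokenizer's vocabulary, and a common word like "Hello" is often kept as a single token. This approach balances efficiency with flexibility, handling rare words by breaking them into known subword pieces.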
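The three approaches can be sketched side by side in a few lines of Python. This is a toy illustration, not how any production tokenizer works: `TOY_VOCAB` is a made-up vocabulary chosen to reproduce the split above, and the subword function uses simple greedy longest-match, whereas real LLM tokenizers learn their vocabularies from data (for example with byte-pair encoding).

```python
def char_tokenize(text: str) -> list[str]:
    # Every character, including the space, becomes its own token.
    return list(text)


def word_tokenize(text: str) -> list[str]:
    # Split on whitespace; each whole word is one token.
    return text.split()


# Hypothetical toy vocabulary, chosen only to mirror the example in the post.
TOY_VOCAB = {"Hell", "o", "world"}


def subword_tokenize(text: str, vocab: set[str]) -> list[str]:
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Greedy longest-match: try the longest remaining piece first.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # No known piece starts here: fall back to a single character.
                tokens.append(word[i])
                i += 1
    return tokens


print(char_tokenize("Hello world"))
print(word_tokenize("Hello world"))
print(subword_tokenize("Hello world", TOY_VOCAB))
print(subword_tokenize("worldly", TOY_VOCAB))
```

The last call shows the rare-word behavior: "worldly" is not in the vocabulary, but it still gets tokenized as a known piece ("world") plus leftover characters instead of failing.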
The final step is token IDs. Those subwords get mapped to numbers like [15496, 345, 995]; the exact values depend on the tokenizer's vocabulary. Each token ID corresponds to an embedding vector inside the model, and those vectors are what the neural network actually processes.
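As a rough sketch of that last step, the snippet below maps tokens to IDs and then looks up one vector per ID. Everything here is an assumption for illustration: the `TOKEN_TO_ID` table simply reuses the example numbers from the post, `EMBED_DIM` is tiny, and the vectors are random. In a real model the embeddings are learned during training and stored as one large matrix with the token ID as the row index.

```python
import random

# Hypothetical token-to-ID table reusing the illustrative IDs from the post.
TOKEN_TO_ID = {"Hell": 15496, "o": 345, "world": 995}

EMBED_DIM = 4  # Real models use hundreds or thousands of dimensions.

# Stand-in embedding table: one random vector per known token ID.
rng = random.Random(0)
EMBEDDINGS = {
    token_id: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for token_id in TOKEN_TO_ID.values()
}


def encode(tokens: list[str]) -> list[int]:
    # Replace each token with its integer ID.
    return [TOKEN_TO_ID[token] for token in tokens]


def embed(token_ids: list[int]) -> list[list[float]]:
    # Replace each ID with its embedding vector.
    return [EMBEDDINGS[token_id] for token_id in token_ids]


ids = encode(["Hell", "o", "world"])
vectors = embed(ids)
print(ids)
print(len(vectors), "vectors of dimension", len(vectors[0]))
```

The network never sees the strings or even the IDs directly; it only receives the sequence of vectors.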
Author
Ai Base Network (ABN), ABN ASIA was founded by people with deep roots in academia, with work experience in the US, Holland, Hungary, Japan, South Korea, Singapore, and Vietnam. ABN Asia is where academia and technology meet opportunity. With our cutting-edge solutions and competent software development services, we're helping businesses level up and take on the global scene. Our commitment: Faster. Better. More reliable. In most cases: Cheaper as well.

© ABN ASIA