In 4D applications, large documents are commonplace: financial reports, internal guidelines, technical manuals… Searching for an exact keyword often isn’t enough. Scrolling through 30-page reports to find one paragraph is not only time-consuming but also error-prone. This is where AI can help.
The semantic approach based on vectors, introduced in 4D 20 R10, already makes it possible to find a relevant 4D Write Pro document even when different wordings are used (for example, “insert image” vs. “add picture”).
But what happens when a document spans multiple pages and covers various subtopics? Even if the entire text can be converted into a single vector, results are often better when we work at a finer scale. This is the idea behind chunking: splitting a document into coherent segments, each represented by its own vector.
This is precisely what allows us to go further: retrieving not only the right document, but also the exact passage that matches the search.
Divide to Search Better: the Chunking Strategy
Instead of indexing an entire document as a single vector, we split it into coherent segments (chunks), typically a paragraph or a block of 400 to 800 characters.
Each chunk is stored with:
- a unique chunk identifier
- the document identifier
- its position in the text (startOffset, endOffset) used for precise highlighting
- its vector embedding
- and optionally a short text extract for quick verification
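As a language-neutral illustration of the record described above, here is a minimal Python sketch (not the demo's 4D code) that splits plain text into paragraph chunks and records each chunk's offsets. The function name `split_into_chunks`, the dictionary keys, and the choice of a newline as paragraph separator are assumptions made for this example, and the embedding is left out. Offsets here are 0-based and end-exclusive, a Python convention that may differ from the one used by 4D Write Pro ranges.

```python
def split_into_chunks(doc_id, text):
    """Return one record per non-empty paragraph (hypothetical helper)."""
    chunks = []
    offset = 0  # position of the current line within the full text
    for line in text.split("\n"):
        start, end = offset, offset + len(line)
        if line.strip():  # ignore empty lines between paragraphs
            chunks.append({
                "chunk_id": len(chunks) + 1,
                "doc_id": doc_id,
                "start_offset": start,
                "end_offset": end,
                "extract": line[:80],  # short extract for quick verification
            })
        offset = end + 1  # step over the newline separator
    return chunks
```

Because the offsets refer to the original text, slicing the text with a chunk's `start_offset` and `end_offset` gives back exactly that paragraph, which is what makes precise highlighting possible later.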
This level of granularity unlocks new possibilities:
- more precise search
- direct selection of the relevant passage
- contextual highlighting in 4D Write Pro
Illustrated Workflow: From Search to Automatic Highlighting
Here’s how the search process unfolds from the user’s perspective:
1. User query
The user enters some text (e.g., “professional training”) and triggers a document search.
2. Vector search
The text is vectorized and compared to the stored embeddings. The system returns the k closest chunks (e.g., docID=42, chunkID=3, startOffset=1200, endOffset=1600).
3. Display of relevant paragraphs
The application shows a list of relevant paragraphs, along with the document name and snippet.
4. Automatic display and selection of the paragraph
By clicking on an extract in the list, the user is redirected to the full document and sees the passage highlighted, surrounded by its neighboring paragraphs. Thanks to the stored text offsets, the exact portion of the document can be selected.
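The vector search in step 2 can be sketched in a few lines of Python. This is an illustration of the general technique, not the demo's implementation: the names `cosine_similarity` and `top_k_chunks` and the `embedding` key are assumptions, and the toy vectors stand in for real embeddings returned by an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, chunks, k=5):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vector, c["embedding"]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]
```

Each returned chunk still carries its document identifier and offsets, so the list shown in step 3 and the selection in step 4 need no further lookup beyond loading the document.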
Why this approach is effective
- Relevance: an embedding of a short fragment represents its topic more faithfully than a single embedding of an entire text, which blends many subtopics together.
- User experience: the user is taken directly to the passage of interest, saving time and avoiding scrolling through dozens of pages.
- Flexibility: chunking can be adjusted (fixed size, by paragraph, with or without 10–15% overlap).
- Scalability: this method works efficiently even with hundreds or thousands of documents.
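To make the fixed-size option with overlap concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the function name `fixed_size_chunks` and its parameters are hypothetical, the defaults (600 characters with a 60-character overlap, i.e. 10%) are picked from the ranges mentioned in this article, and offsets are 0-based and end-exclusive.

```python
def fixed_size_chunks(text, size=600, overlap=60):
    """Return (start, end) spans of at most `size` characters.

    Consecutive spans share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in one of the two neighbors.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap  # distance between two consecutive chunk starts
    spans = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        spans.append((start, end))
        if end == len(text):
            break  # the last chunk reached the end of the text
        start += step
    return spans
```

For a 1,000-character text with `size=400` and `overlap=40`, this yields three spans: (0, 400), (360, 760), and (720, 1000). Paragraph-based chunking tends to keep segments more coherent, while fixed-size chunks give more uniform lengths; which one works better depends on the documents.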
Technical Details
A demo database showcases this feature using several 4D Write Pro documents.
1. Splitting a Document into Chunks
Each chunk corresponds to a text interval within a document.
For each chunk, we store:
- its vector embedding: embedding
- its document reference: ID_Document
- a unique identifier: ID
- its text boundaries: startOffset, endOffset
- optionally, a short text extract: textExtract

When a 4D Write Pro document is saved, the code computes one vector for each paragraph of the document:
$colRange:=WP Get elements($doc.WP; wk type paragraph)
// For each paragraph, create a chunk
For each ($paragraph; $colRange)
    $chunk:=ds.Chunk.new()
    $chunk.ID_Document:=$doc.ID
    $chunk.startOffset:=WP Paragraph range($paragraph).start
    $chunk.endOffset:=WP Paragraph range($paragraph).end
    $chunk.textExtract:=WP Get text($paragraph)
    // Generate vector embedding using AIManagement
    $chunk.embedding:=cs.AIManagement.new($apiKey).generateVector($chunk.textExtract)
    $chunk.save()
End for each
2. Vector search within chunks
We vectorize the user’s prompt and compare it with every vector in the Chunk table:
// Generate a vector from the custom prompt using the AIManagement class
var $vector:=cs.AIManagement.new($apiKey).generateVector($prompt)
// Sort the entities by cosine similarity, most similar first
return ds.Chunk.all().orderByFormula(Formula(This.embedding.cosineSimilarity($vector)); dk descending).slice(0; 5)
We limit results to the top five most relevant paragraphs.
3. Selecting the matching paragraph in 4D Write Pro
When the user selects a passage in the list box, 4D displays the document in the 4D Write Pro area and highlights the corresponding paragraph:
Case of
    : (Form event code=On Selection Change)
        // Load the selected document in the 4D Write Pro area
        WParea:=WP New(Form.currentItem.document.WP)
        GOTO OBJECT(WParea)
        // Select the text range defined by startOffset and endOffset
        WP SELECT(WParea; Form.currentItem.startOffset; Form.currentItem.endOffset)
End case
Conclusion
This approach combines the best of both worlds: semantic search identifies the relevant documents, while the granularity of chunks allows you to pinpoint the exact passage. The result is a 4D Write Pro search assistant that doesn’t just find a good document, but guides the user directly to the answer they need.
Imagine the time saved and the improved user experience when your team can jump directly to the relevant passage. How could this level of precision change the way you or your users explore and interact with your content? Share your thoughts on the 4D Forum!
