4D Write Pro – Full text indexation

“I want to find all of the documents that talk about tango! I need them quickly! Can I do that?”

OK, but first, breathe!

Keyword searches within 4D Write Pro documents simply require adding a new indexing attribute to each document. This isn’t done by default: this type of search is not often needed, so it wouldn’t make sense to systematically increase the size of every document. However, when it’s needed, this type of index is very easy to build.

Take a document that contains the sentence “Tango is a traditional Argentinean dance”. Our objective is to create a list of the words found in the content of each document. This is mainly done using two commands:

  • WP Get text – returns the raw text of the document
  • GET TEXT KEYWORDS – fills a text array with the words found in the text provided. Once this array has been filled, simply add it as an attribute of the 4D Write Pro document object and use it as the target for keyword searches.

Let’s take a closer look at the details.

Don’t miss anything inside the document

WP Get text returns the content of the target as plain text. The target can be a given range or an entire document. If you pass a document as an argument, only the text in the body of the document will be returned. Headers and footers will be ignored, which is probably not desirable. Since each section of a document can have its own headers and footers, you’ll need to loop through each one and read their content. Be sure to take any variants into account (e.g., first page, right-hand page, left-hand page).
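As a rough sketch of that loop (not necessarily the code behind this post), the method below collects the body text, then the text of every header and footer. The method name WP_GetFullText is hypothetical. The sketch also assumes that WP Get header and WP Get footer return Null when a section or variant has no header or footer, and that WP Get text accepts a header or footer element as its target.

```4d
// Hypothetical project method: WP_GetFullText
// $1: 4D Write Pro document
// $0: body text followed by the text of every header and footer found
C_OBJECT($1; $doc; $header; $footer)
C_TEXT($0; $text)
C_COLLECTION($sections)
C_LONGINT($i; $j)
ARRAY LONGINT($types; 3)

$doc:=$1
$types{1}:=wk first page
$types{2}:=wk left page
$types{3}:=wk right page

// Body text only: headers and footers are not included here
$text:=WP Get text($doc)

$sections:=WP Get sections($doc)
For ($i; 1; $sections.length)
	// $j=0 reads the default header and footer of the section,
	// $j=1 to 3 read the first page, left page and right page variants
	For ($j; 0; 3)
		If ($j=0)
			$header:=WP Get header($doc; $i)
			$footer:=WP Get footer($doc; $i)
		Else 
			$header:=WP Get header($doc; $i; $types{$j})
			$footer:=WP Get footer($doc; $i; $types{$j})
		End if 
		// Assumption: Null means "no header/footer defined for this variant"
		If ($header#Null)
			$text:=$text+" "+WP Get text($header)
		End if 
		If ($footer#Null)
			$text:=$text+" "+WP Get text($footer)
		End if 
	End for 
End for 

$0:=$text
```

A space is used as the separator between parts so that the last word of one part and the first word of the next are not glued together before the keywords are extracted.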

Putting the text in an array

GET TEXT KEYWORDS fills an array based on the content of the source text. The only subtlety here is not to forget the optional star (*) parameter. It fills the array with only the distinct words of the text, so a word that appears several times is stored once. This avoids unnecessary “weight” 🙂
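Here is a minimal illustration of the star parameter. The sample sentence and the variable names are made up for the example.

```4d
C_TEXT($text)
ARRAY TEXT($keywords; 0)

// "Tango" and "is" both appear twice in this sample text
$text:="Tango is a traditional Argentinean dance. Tango is danced in pairs."

// With the * parameter, each distinct word is stored only once
GET TEXT KEYWORDS($text; $keywords; *)
```

Dropping the * from the call would keep one array element per occurrence, which only makes the stored index heavier without making the searches any better.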

Create the index in the document

The name of the attribute that will contain the index is up to you. However, it’s strongly recommended to prefix it with an underscore to avoid any potential conflict with existing (or future) public attributes of 4D Write Pro documents (for example, “_keywords”).

Of course, this indexing must be redone each time a document is modified, but it’s a very quick task and won’t cause any noticeable slowdown for users.
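Put together, building the index for one record could look like the sketch below. It assumes a [SAMPLE] table whose object field WP holds the 4D Write Pro document, as in the query examples of this post. The method name WP_BuildKeywordIndex is hypothetical.

```4d
// Hypothetical project method: WP_BuildKeywordIndex
// Works on the current [SAMPLE] record; [SAMPLE]WP holds the 4D Write Pro document
C_TEXT($text)
ARRAY TEXT($keywords; 0)

// Body text only in this sketch; header and footer text
// would be appended to $text in the same way
$text:=WP Get text([SAMPLE]WP)

// Distinct words only, thanks to the * parameter
GET TEXT KEYWORDS($text; $keywords; *)

// Store the array as a custom attribute of the document object
OB SET ARRAY([SAMPLE]WP; "_keywords"; $keywords)

// Reassign the field so the record is flagged as modified
// (harmless if 4D has already detected the change)
[SAMPLE]WP:=[SAMPLE]WP
```

A method like this would typically be called just before the record is saved, so the index always matches the current content of the document.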

Index the index!

For queries on the keyword index to be fast and efficient, the object field that holds the document must, of course, be indexed itself. There’s no need to worry about the size of this index: only exposed (public) attributes are indexed, so only a very slight size increase is to be expected. In particular, there is no need to create a new object field that would only serve this purpose. It would be useless, and it would take up more space than using an existing object (i.e., the 4D Write Pro document itself).

When you choose the index type for an object field, “Automatic” means “cluster B-tree”, which is perfect for business letters or similar documents.

Making queries on this index

Classic queries must be made “by attribute”. Keep in mind that the attribute is a collection, so you must append opening and closing brackets [] to its name to tell 4D to search within this collection. For example:

QUERY BY ATTRIBUTE([SAMPLE]; [SAMPLE]WP; "_keywords[]"; =; $val)

ORDA queries are carried out following the same principle:

$entitySel:=ds.SAMPLE.query("WP._keywords[] = :1"; $val)

Conclusion

If this type of indexing was not planned for your documents from the beginning and the need arises, it’s never too late! Running the indexing method on the existing records with APPLY TO SELECTION will do the job perfectly. Searches can also be combined to take several words into account, either together (all of the words) or separately (at least one of the words). This will allow you (and especially the users of your databases) to find texts very quickly according to the words they contain.
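Both ideas can be sketched as follows. WP_BuildKeywordIndex stands for a hypothetical project method of your own that builds the “_keywords” attribute of the current [SAMPLE] record. The search words are made up, and the combined ORDA query assumes that “and” and “or” between two conditions on the same collection behave like the classic & and | conjunctions.

```4d
C_OBJECT($all; $any)

// Index every existing document in one pass
ALL RECORDS([SAMPLE])
APPLY TO SELECTION([SAMPLE]; WP_BuildKeywordIndex)

// All of the words: chain the conditions with &
QUERY BY ATTRIBUTE([SAMPLE]; [SAMPLE]WP; "_keywords[]"; =; "tango"; *)
QUERY BY ATTRIBUTE([SAMPLE]; & ; [SAMPLE]WP; "_keywords[]"; =; "dance")

// At least one of the words: chain the conditions with |
QUERY BY ATTRIBUTE([SAMPLE]; [SAMPLE]WP; "_keywords[]"; =; "tango"; *)
QUERY BY ATTRIBUTE([SAMPLE]; | ; [SAMPLE]WP; "_keywords[]"; =; "dance")

// Same searches with ORDA
$all:=ds.SAMPLE.query("WP._keywords[] = :1 and WP._keywords[] = :2"; "tango"; "dance")
$any:=ds.SAMPLE.query("WP._keywords[] = :1 or WP._keywords[] = :2"; "tango"; "dance")
```

On a large table the first pass can take a while, so it may be worth running it once during a maintenance window rather than at startup.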

Now you have two methods to create (and maybe remove) the full-text index! They can be used starting with 4D v17. Check out this code snippet that you can use in your own application!

Roland Lannuzel

• Product Owner & 4D Expert •

After studying electronics, Roland went into industrial IT as a developer and consultant, building solutions for customers with a variety of databases and technologies. In the late 80’s he fell in love with 4D and has used it in writing business applications that include accounting, billing and email systems.

Eventually joining the company in 1997, Roland’s valuable contributions include designing specifications, testing tools, demos as well as training and speaking to the 4D community at many conferences. He continues to actively shape the future of 4D by defining new features and database development tools.