Truncation is a search technique used in databases: you shorten a keyword to its root so that the search retrieves every word that starts with the same group of letters. It is sometimes called wildcard searching, and it lets you search for the root form of a word together with all of its different endings. Because it matches several forms of a word at once, truncation increases the number of search results found. The term has a second, separate meaning in machine learning: when the entries in a batch have different lengths, truncation and padding are the strategies used to create rectangular tensors.
Padding adds a special padding token so that shorter sequences reach the same length as the longest sequence in the batch or the maximum length accepted by the model. Truncation works in the other direction by cutting long sequences down. Other strategies, only_first and only_second, specify whether truncation is applied exclusively to the first or to the second sequence when the input is a pair. On the database side, different databases use different truncation symbols, so it is important to consult the 'Help' or 'Search Tips' information in the database for details on which symbol to use.
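The padding and truncation behaviour described above can be illustrated with a small sketch in plain Python. It is not the API of any particular tokenizer library: the function names, the PAD_ID value and the error handling are assumptions made for illustration, and only the strategy names only_first and only_second come from the text.

```python
from typing import List, Optional, Tuple

PAD_ID = 0  # hypothetical padding token id, chosen only for this sketch


def truncate_pair(first: List[int], second: List[int], max_length: int,
                  strategy: str = "only_first") -> Tuple[List[int], List[int]]:
    """Trim one side of a sequence pair so the combined length fits max_length."""
    overflow = len(first) + len(second) - max_length
    if overflow <= 0:
        return first, second  # already short enough, nothing to cut
    if strategy == "only_first":
        target, other = first, second
    elif strategy == "only_second":
        target, other = second, first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    if overflow > len(target):
        # The chosen side alone cannot absorb the overflow.
        raise ValueError("sequence to truncate is too short for this strategy")
    trimmed = target[:len(target) - overflow]
    return (trimmed, other) if strategy == "only_first" else (other, trimmed)


def pad_batch(batch: List[List[int]], max_length: Optional[int] = None,
              pad_id: int = PAD_ID) -> List[List[int]]:
    """Truncate to max_length (if given), then pad to the longest sequence."""
    if max_length is not None:
        batch = [seq[:max_length] for seq in batch]
    longest = max(len(seq) for seq in batch)
    # Every row now has the same length, so the batch is rectangular.
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


if __name__ == "__main__":
    print(pad_batch([[1, 2, 3], [4], [5, 6, 7, 8, 9]], max_length=4))
    print(truncate_pair([1, 2, 3, 4], [5, 6, 7], max_length=5))
```

Passing max_length=None in this sketch skips truncation entirely, so the batch is only padded to its longest sequence.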
In most cases, padding the batch to the length of its longest sequence and truncating to the maximum length that the model can accept works quite well. If the model does not have a specific maximum input length, truncation or padding to max_length is disabled. To truncate a search term, perform a keyword search in a database, but remove the ending of the word and put the truncation symbol, often an asterisk (*), in its place. For example, build* will retrieve build, builds, building, builder, etc.
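To make the matching rule for a truncated search term concrete, here is a minimal sketch that treats a trailing asterisk as "any word ending" and translates the term into a regular expression. Real databases implement truncation in their own ways; the helper names truncation_to_regex and search, and the choice of case-insensitive whole-word matching, are assumptions for illustration.

```python
import re
from typing import List


def truncation_to_regex(term: str, symbol: str = "*") -> re.Pattern:
    """Turn a truncated term such as 'build*' into a word-prefix pattern."""
    if term.endswith(symbol):
        root = term[:-len(symbol)]
        # Root at the start of a word, followed by any word characters.
        return re.compile(rf"\b{re.escape(root)}\w*", re.IGNORECASE)
    # No truncation symbol: match the exact word only.
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def search(term: str, documents: List[str]) -> List[str]:
    """Return the documents containing at least one word matched by the term."""
    pattern = truncation_to_regex(term)
    return [doc for doc in documents if pattern.search(doc)]


if __name__ == "__main__":
    docs = ["The builder arrived", "A new building", "Rebuild the wall"]
    print(search("build*", docs))
```

Because the pattern is anchored at a word boundary, "Rebuild" is not matched: truncation in this sketch only expands the end of the word, not its beginning.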
If you want to narrow or expand your search, find out whether truncation and proximity operators are available in the database you are using. When truncating token sequences, truncation can also be guided by TF-IDF (term frequency-inverse document frequency) scores: with a 512-token limit, the truncation step would keep either the 512 tokens with the highest TF-IDF scores or simply the first 512 tokens. A YouTube clip from the Gumberg Library gives a little more information about using quotes, truncation and wildcards. In conclusion, truncation is an effective way to increase your search results when looking for different forms of a word. Consult the 'Help' or 'Search Tips' information in the database for details on which symbol to use and how it should be applied.
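The difference between keeping the first tokens and keeping the highest-scoring tokens can be sketched as follows. This is one possible reading of TF-IDF-based truncation, not a reference implementation: the function names, the unsmoothed IDF formula log(N / df) and the tie-breaking rule (earlier position wins) are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(corpus: List[List[str]]) -> Dict[str, float]:
    """Compute log(N / df) for every token seen in the corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}


def truncate_first(tokens: List[str], max_tokens: int = 512) -> List[str]:
    """Plain truncation: keep only the first max_tokens tokens."""
    return tokens[:max_tokens]


def truncate_by_tfidf(tokens: List[str], idf: Dict[str, float],
                      max_tokens: int = 512) -> List[str]:
    """Keep the max_tokens highest-scoring tokens, in their original order."""
    if len(tokens) <= max_tokens:
        return list(tokens)
    tf = Counter(tokens)
    scores = [tf[tok] / len(tokens) * idf.get(tok, 0.0) for tok in tokens]
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:max_tokens])
    return [tokens[i] for i in keep]


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"],
              ["the", "cat", "ran", "far"]]
    idf = inverse_document_frequency(corpus)
    print(truncate_first(corpus[2], max_tokens=2))
    print(truncate_by_tfidf(corpus[2], idf, max_tokens=2))
```

In this toy corpus, "the" appears in every document and so scores zero; score-based truncation drops it, while plain truncation keeps it simply because it comes first.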