Text to vector refactoring #4375

Draft
nicolo-rinaldi wants to merge 7 commits into apache:main from SeaseLtd:text-to-vector-refactoring

@nicolo-rinaldi
Contributor

Description

This refactoring prepares the code for the new "Document Enrichment with LLMs" module, already proposed in PR #4229, so that the two modules duplicate as little code as possible.

Solution

Abstract classes have been created for both the model and the store/rest packages. The exception has also been renamed so that it generalizes across models. Concrete subclasses of the new abstract classes have been implemented for the Text-to-vector module.
Claude Code has been used as an assistant for coding this PR.

Tests

No tests have been added, since only interfaces and abstract classes have been developed.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change

github-actions bot added the tests label on Apr 28, 2026
Contributor

Would it be worth also moving the common part of this logic to SolrLanguageModel and generalizing it?

Contributor Author

Since each model might have different parameters, I wanted to keep this separate.

Contributor

To discuss

* limitations under the License.
*/

/** Contains model store related classes. */
Contributor

Isn't this REST-specific?

Contributor Author

Yes, this is a leftover from a copy-paste of the license. Changed.

Contributor

Where?

Contributor Author

I replied to the wrong comment. I have changed it now, adding "rest".

@@ -39,12 +39,12 @@ public void cleanup() throws Exception {
}

@Test
- public void testModelAreStoredCompact() throws Exception {
+ public void testModelAreStored() throws Exception {
Contributor

What does this mean?

Contributor Author

Previously the model file was saved in a compact format. This was probably a copy-paste leftover from the LTR module, where saving model files in compact form makes sense because of the potentially high number of features. In both of these contributions (text-to-vector and the upcoming document enrichment), there is no need to compact the file to save space. The change is also a consequence of the fact that the method

  @Override
  protected ManagedResourceStorage createStorage(
      ManagedResourceStorage.StorageIO storageIO, SolrResourceLoader loader) throws SolrException {
    return new ManagedResourceStorage.JsonStorage(storageIO, loader, -1);
  }

from ManagedTextToVectorStore is no longer overridden. "-1" means "save in a compact format".
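
As a rough illustration of what that last argument controls, here is a minimal sketch. It is not Solr's implementation: the hypothetical helper `toJson` treats a negative `indentSize` as "write everything on one line" and a non-negative one as "one entry per line, indented by that many spaces". The helper name, the flat string-only map, and the sample values are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CompactJsonDemo {
  // Hypothetical helper, not Solr's code. No string escaping is done,
  // so it is only suitable for simple sample values like the ones below.
  static String toJson(Map<String, String> flat, int indentSize) {
    boolean compact = indentSize < 0;
    String pad = compact ? "" : " ".repeat(indentSize);
    StringJoiner entries =
        compact
            ? new StringJoiner(",", "{", "}") // everything on a single line
            : new StringJoiner(",\n", "{\n", "\n}"); // one entry per line
    for (Map.Entry<String, String> e : flat.entrySet()) {
      entries.add(pad + "\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
    }
    return entries.toString();
  }

  public static void main(String[] args) {
    Map<String, String> model = new LinkedHashMap<>();
    model.put("class", "example.Model"); // sample values, assumed for illustration
    model.put("name", "model-1");

    // Compact form: {"class":"example.Model","name":"model-1"}
    System.out.println(toJson(model, -1));
    // Indented form: each entry on its own line, prefixed by two spaces
    System.out.println(toJson(model, 2));
  }
}
```

The compact form saves space when a file has many entries, which matches the LTR use case described above; for a handful of model parameters the indented form is easier to read and costs little.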

Contributor

So does it make sense to have this test?
Was this made just to check if compact?

Contributor Author

Yes, it can be removed.

* via the REST API. Concrete subclasses supply the REST endpoint and the model instantiation logic.
*/
@ThreadSafe
public abstract class ManagedModelStore<M extends SolrLanguageModel> extends ManagedResource
Contributor

Is there a better variable name than "M"?

Contributor Author

Generic type parameters are usually single uppercase letters, optionally followed by a number. Do you still want me to rename this one?

Contributor

Maybe LM? To discuss.

- /** Simple store to manage CRUD operations on the {@link SolrTextToVectorModel} */
- public class TextToVectorModelStore {
+ /** Generic store to manage CRUD operations on models that extend {@link SolrLanguageModel} */
+ public class LanguageModelStore<M extends SolrLanguageModel> {
Contributor

Again, a better name instead of "M"?

Contributor Author

Same as before
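
To make the naming question concrete, here is a minimal sketch of a bounded generic store. It is not the PR's code: `SolrLanguageModel` is reduced to a hypothetical one-method interface, `TextToVectorModel` is a hypothetical stand-in for a concrete model, and the method names on the store are assumptions. Renaming `M` to `LM` would change nothing else, since only the `extends SolrLanguageModel` bound constrains the type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the PR's SolrLanguageModel; only a name is assumed.
interface SolrLanguageModel {
  String getName();
}

// Hypothetical concrete model, used only to exercise the store.
class TextToVectorModel implements SolrLanguageModel {
  private final String name;

  TextToVectorModel(String name) {
    this.name = name;
  }

  @Override
  public String getName() {
    return name;
  }
}

// "M" is the type parameter under discussion; the bound restricts it to language models.
class LanguageModelStore<M extends SolrLanguageModel> {
  private final Map<String, M> models = new LinkedHashMap<>();

  // Rejects a second model with the same name instead of silently replacing it.
  public synchronized void addModel(M model) {
    if (models.putIfAbsent(model.getName(), model) != null) {
      throw new IllegalArgumentException("model '" + model.getName() + "' already exists");
    }
  }

  public synchronized M getModel(String name) {
    return models.get(name);
  }

  public synchronized M delete(String name) {
    return models.remove(name);
  }

  // Returns a copy so callers cannot modify the store's internal state.
  public synchronized List<M> getModels() {
    return new ArrayList<>(models.values());
  }
}

class GenericStoreDemo {
  public static void main(String[] args) {
    LanguageModelStore<TextToVectorModel> store = new LanguageModelStore<>();
    store.addModel(new TextToVectorModel("model-1"));
    System.out.println(store.getModel("model-1").getName()); // prints "model-1"
  }
}
```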


- /** Simple store to manage CRUD operations on the {@link SolrTextToVectorModel} */
- public class TextToVectorModelStore {
+ /** Generic store to manage CRUD operations on models that extend {@link SolrLanguageModel} */
Contributor

Is this class necessary? Can't we just have ManagedModelStore with everything inside?

Contributor Author

We have a different endpoint for each module, so I wanted to keep them separate.

Contributor

To discuss

}

@Override
protected ManagedResourceStorage createStorage(
Contributor Author

See the comment in TestModelManagerPersistence

@ThreadSafe
public abstract class ManagedModelStore<M extends SolrLanguageModel> extends ManagedResource
implements ManagedResource.ChildResourceSupport {
private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
Contributor Author

I kept it here because most of the logic is in functions that are already in the abstract class.

* @param modelMap a map containing {@code "class"}, {@code "name"}, and {@code "params"} keys
* @return the instantiated model
*/
protected abstract M fromModelMap(SolrResourceLoader loader, Map<String, Object> modelMap);
Contributor Author

A static method cannot be declared abstract.
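
A minimal sketch of the resulting pattern, under stated assumptions: the model types are hypothetical stand-ins, the `SolrResourceLoader` argument of the real signature is dropped so the example stays self-contained, and `addModelFromMap` is an invented name for the shared logic. Because Java rejects the `abstract static` combination, the factory is an instance method that each concrete store overrides.

```java
import java.util.Map;

// Hypothetical stand-in for the PR's SolrLanguageModel.
interface SolrLanguageModel {
  String getName();
}

// Hypothetical concrete model for the text-to-vector module.
class TextToVectorModel implements SolrLanguageModel {
  private final String name;

  TextToVectorModel(String name) {
    this.name = name;
  }

  @Override
  public String getName() {
    return name;
  }
}

abstract class ManagedModelStore<M extends SolrLanguageModel> {
  // An instance method, so it may be abstract; "abstract static" does not compile.
  protected abstract M fromModelMap(Map<String, Object> modelMap);

  // Shared validation lives in the base class; only instantiation is delegated.
  public M addModelFromMap(Map<String, Object> modelMap) {
    if (!(modelMap.get("name") instanceof String)) {
      throw new IllegalArgumentException("model map needs a string \"name\" entry");
    }
    return fromModelMap(modelMap);
  }
}

// Concrete store: supplies the model instantiation logic for one module.
class ManagedTextToVectorModelStore extends ManagedModelStore<TextToVectorModel> {
  @Override
  protected TextToVectorModel fromModelMap(Map<String, Object> modelMap) {
    return new TextToVectorModel((String) modelMap.get("name"));
  }
}

class AbstractFactoryDemo {
  public static void main(String[] args) {
    ManagedModelStore<TextToVectorModel> store = new ManagedTextToVectorModelStore();
    TextToVectorModel model = store.addModelFromMap(Map.of("name", "model-1"));
    System.out.println(model.getName()); // prints "model-1"
  }
}
```

The trade-off is that callers need a store instance before they can build a model, which is usually acceptable here because the store is the component that receives the model map in the first place.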

2 participants