Text to vector refactoring#4375
Conversation
…ture "Document enrichment with LLMs"
…nt enrichment module
…nt enrichment module
There was a problem hiding this comment.
Would it be worth also moving the common part of this logic to SolrLanguageModel and generalizing it?
There was a problem hiding this comment.
Since each model might have different parameter, I wanted to keep this separated
| * limitations under the License. | ||
| */ | ||
|
|
||
| /** Contains model store related classes. */ |
There was a problem hiding this comment.
Isn't this rest specific?
There was a problem hiding this comment.
Yes, this is leftover from a copy-paste of the license. Changed
There was a problem hiding this comment.
I replied to the wrong comment. I changed it now, adding "rest"
| @@ -39,12 +39,12 @@ public void cleanup() throws Exception { | |||
| } | |||
|
|
|||
| @Test | |||
| public void testModelAreStoredCompact() throws Exception { | |||
| public void testModelAreStored() throws Exception { | |||
There was a problem hiding this comment.
Before it was saved as a compact file, this was done due to a (possibly) copy-paste leftover of this module from the LTR module, where it makes sense to save the files related to the models as compact, due to the possible high number of features. In both these contributions (text-to-vector and following document enrichment), it doesn't make sense to apply the compression to reduce space. This is also a consequence of the fact that the method
@Override
protected ManagedResourceStorage createStorage(
ManagedResourceStorage.StorageIO storageIO, SolrResourceLoader loader) throws SolrException {
return new ManagedResourceStorage.JsonStorage(storageIO, loader, -1);
}
from ManagedTextToVectorStore is no longer overridden. "-1" means "save in a compact format".
There was a problem hiding this comment.
So does it make sense to have this test?
Was this made just to check if compact?
There was a problem hiding this comment.
Yes, it can be removed.
| * via the REST API. Concrete subclasses supply the REST endpoint and the model instantiation logic. | ||
| */ | ||
| @ThreadSafe | ||
| public abstract class ManagedModelStore<M extends SolrLanguageModel> extends ManagedResource |
There was a problem hiding this comment.
Is there a better name variable instead of "M"?
There was a problem hiding this comment.
Generics are usually uppercase single letters that can be followed by a number. Do you still want me to rename this variable?
| /** Simple store to manage CRUD operations on the {@link SolrTextToVectorModel} */ | ||
| public class TextToVectorModelStore { | ||
| /** Generic store to manage CRUD operations on models that extend {@link SolrLanguageModel} */ | ||
| public class LanguageModelStore<M extends SolrLanguageModel> { |
There was a problem hiding this comment.
Again, a better name instead of "M"?
There was a problem hiding this comment.
Same as before
|
|
||
| /** Simple store to manage CRUD operations on the {@link SolrTextToVectorModel} */ | ||
| public class TextToVectorModelStore { | ||
| /** Generic store to manage CRUD operations on models that extend {@link SolrLanguageModel} */ |
There was a problem hiding this comment.
Is this class necessary? Can't we just have ManagedModelStore with everything inside?
There was a problem hiding this comment.
We have a different endpoint for each module, so I wanted to keep them different
| * via the REST API. Concrete subclasses supply the REST endpoint and the model instantiation logic. | ||
| */ | ||
| @ThreadSafe | ||
| public abstract class ManagedModelStore<M extends SolrLanguageModel> extends ManagedResource |
There was a problem hiding this comment.
Generics are usually uppercase single letters that can be followed by a number. Do you still want me to rename this variable?
| * limitations under the License. | ||
| */ | ||
|
|
||
| /** Contains model store related classes. */ |
There was a problem hiding this comment.
Yes, this is leftover from a copy-paste of the license. Changed
|
|
||
| /** Simple store to manage CRUD operations on the {@link SolrTextToVectorModel} */ | ||
| public class TextToVectorModelStore { | ||
| /** Generic store to manage CRUD operations on models that extend {@link SolrLanguageModel} */ |
There was a problem hiding this comment.
We have a different endpoint for each module, so I wanted to keep them different
| /** Simple store to manage CRUD operations on the {@link SolrTextToVectorModel} */ | ||
| public class TextToVectorModelStore { | ||
| /** Generic store to manage CRUD operations on models that extend {@link SolrLanguageModel} */ | ||
| public class LanguageModelStore<M extends SolrLanguageModel> { |
There was a problem hiding this comment.
Same as before
| } | ||
|
|
||
| @Override | ||
| protected ManagedResourceStorage createStorage( |
There was a problem hiding this comment.
See the comment in TestModelManagerPersistence
| @ThreadSafe | ||
| public abstract class ManagedModelStore<M extends SolrLanguageModel> extends ManagedResource | ||
| implements ManagedResource.ChildResourceSupport { | ||
| private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); |
There was a problem hiding this comment.
I kept it here due to the fact that most of the logic is in the function that are already in the abstract class
| * @param modelMap a map containing {@code "class"}, {@code "name"}, and {@code "params"} keys | ||
| * @return the instantiated model | ||
| */ | ||
| protected abstract M fromModelMap(SolrResourceLoader loader, Map<String, Object> modelMap); |
There was a problem hiding this comment.
A static function cannot be declared as abstract
Description
This refactoring is needed to prepare the code for the addition of the new module "Document Enrichment with LLMs" already in PR #4229, to avoid duplicated code as much as possible.
Solution
Abstract classes have been created for both the model and the store/rest packages. Also, renamed the exception to generalize across models. Implemented actual instances of the new abstract classes for the Text-to-vector module.
Claude code has been used as an assistant for coding this PR.
Tests
No tests have been added, since only interfaces and abstract classes have been developed.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.