Uploading trained models
While model training occurs outside of the Learning to Rank plugin, you can use the plugin for logging feature scores. After you have trained a model, you can upload it to the plugin in the available serialization formats, such as RankLib and XGBoost.
RankLib model training
The feature logging process generates a RankLib-consumable judgment file. In the following judgment file, the query with ID 1 (rambo) includes the logged features 1 (a title TF*IDF score) and 2 (a description TF*IDF score) for a set of documents:
4 qid:1 1:9.8376875 2:12.318446 # 7555 rambo
3 qid:1 1:10.7808075 2:9.510193 # 1370 rambo
3 qid:1 1:10.7808075 2:6.8449354 # 1369 rambo
3 qid:1 1:10.7808075 2:0.0 # 1368 rambo
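Each judgment line follows a fixed grammar: a relevance grade, a qid:<id> pair, ordinal:value feature pairs, and a trailing comment. As a minimal sketch, the lines above can be parsed in Python (the helper name is hypothetical):

```python
def parse_judgment_line(line):
    """Parse one RankLib judgment line into (grade, query_id, features, comment)."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    grade = int(tokens[0])                       # relevance grade (e.g. 4)
    qid = tokens[1].split(":", 1)[1]             # query ID after "qid:"
    features = {                                  # ordinal -> logged feature score
        int(k): float(v)
        for k, v in (t.split(":", 1) for t in tokens[2:])
    }
    return grade, qid, features, comment.strip()

grade, qid, feats, note = parse_judgment_line(
    "4 qid:1 1:9.8376875 2:12.318446 # 7555 rambo"
)
```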
The RankLib library can be called using the following command:
cmd = "java -jar RankLib-2.8.jar -ranker %s -train %s -save %s -frate 1.0" % (whichModel, judgmentsWithFeaturesFile, modelOutput)
The judgmentsWithFeaturesFile is the input provided to RankLib for training. Additional parameters can be passed; see the RankLib documentation for more information.
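As a sketch, the same invocation can be assembled as an argument list and run with subprocess, which avoids shell-quoting issues. The file paths are assumptions; ranker ID 6 selects LambdaMART in RankLib:

```python
import subprocess

def build_ranklib_cmd(ranker, judgments_file, model_output):
    """Build the RankLib training command as an argument list."""
    return [
        "java", "-jar", "RankLib-2.8.jar",
        "-ranker", str(ranker),      # ranker type (6 = LambdaMART)
        "-train", judgments_file,    # judgment file with logged features
        "-save", model_output,       # output path for the serialized model
        "-frate", "1.0",             # feature sampling rate
    ]

cmd = build_ranklib_cmd(6, "judgments_with_features.txt", "model.txt")
# subprocess.run(cmd, check=True)  # uncomment to actually run training
```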
RankLib outputs the model in its own serialization format. As shown in the following example, a LambdaMART model is an ensemble of regression trees:
## LambdaMART
## No. of trees = 1000
## No. of leaves = 10
## No. of threshold candidates = 256
## Learning rate = 0.1
## Stop early = 100
<ensemble>
<tree id="1" weight="0.1">
<split>
<feature> 2 </feature>
...
Within the RankLib model, each tree in the ensemble examines feature values, makes decisions based on those values, and outputs a relevance score. Features are referred to by their ordinal position, starting from 1, so ordinal 1 corresponds to the 0th feature in the original feature set. RankLib does not use feature names during model training.
Other RankLib models
RankLib is a library that implements several other model types in addition to LambdaMART, such as MART, RankNet, RankBoost, AdaRank, Coordinate Ascent, ListNet, and Random Forests. Each of these models has its own set of parameters and training process.
For example, the RankNet model is a neural network that learns to predict the probability of a document being more relevant than another document. The model is trained using a pairwise loss function that compares the predicted relevance of two documents with the actual relevance. The model is serialized in a format similar to the following example:
## RankNet
## Epochs = 100
## No. of features = 5
## No. of hidden layers = 1
...
## Layer 1: 10 neurons
1 2
1
10
0 0 -0.013491530393429608 0.031183180961270988 0.06558792020112071 -0.006024092627087733 0.05729619574181734 -0.0017010373987742411 0.07684848696852313 -0.06570387602230028 0.04390491141617467 0.013371636736099578
...
All these models can be used with the Learning to Rank plugin, provided that the model is serialized in the RankLib format.
XGBoost model training
Unlike the RankLib model, the XGBoost model is serialized in a format specific to gradient-boosted decision trees, as shown in the following example:
[ { "nodeid": 0, "depth": 0, "split": "tmdb_multi", "split_condition": 11.2009, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "tmdb_title", "split_condition": 2.20631, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "leaf": -0.03125 },
...
XGBoost parameters
Optional parameters can be specified for an XGBoost model. These parameters are specified as an object, with the decision trees provided in the splits field. The supported parameters include objective, which defines the model learning objective as described in the XGBoost documentation and can transform the final model prediction. The supported objective values are binary:logistic, binary:logitraw, rank:ndcg, rank:map, rank:pairwise, reg:linear, and reg:logistic.
Simple linear models
Machine learning (ML) models such as Support Vector Machines (SVMs) and linear regression output a linear weight for each feature. The Learning to Rank plugin supports a simple format for representing these linear weights. In the following example output, the weights indicate the relative importance of the features in the model's prediction:
{
"title_query" : 0.3,
"body_query" : 0.5,
"recency" : 0.1
}
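A hedged sketch of producing such a weight map: here an ordinary least-squares fit via NumPy stands in for a real SVM or linear-regression trainer, and the feature values are invented:

```python
import json
import numpy as np

# Hypothetical logged feature scores (rows) and relevance grades (y).
feature_names = ["title_query", "body_query", "recency"]
X = np.array([
    [9.8, 12.3, 0.1],
    [10.7, 9.5, 0.4],
    [10.7, 6.8, 0.9],
    [2.1, 0.0, 0.2],
])
y = np.array([4.0, 3.0, 3.0, 0.0])

# Least-squares fit stands in for a real SVM/linear-regression trainer.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The plugin's model/linear format is just {feature_name: weight}.
weights = {name: round(float(w), 4) for name, w in zip(feature_names, coef)}
definition = json.dumps(weights, indent=2)
```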
Feature normalization
Feature normalization converts feature values to a consistent range, typically between 0 and 1 or -1 and 1. It is applied during the training phase so that the relative impact of each feature can be assessed on a common scale. Some models, especially linear ones such as SVMRank, rely on normalization to function correctly.
Model upload process
After training your model, the next step is to make it available for search operations. This involves uploading the model to the Learning to Rank plugin. When uploading a model, you must provide the following information:
- Feature set used during training
- Model type, for example, RankLib or XGBoost
- Model content
The following example request shows how to upload a RankLib model that was trained using the more_movie_features
feature set:
POST _ltr/_featureset/more_movie_features/_createmodel
{
"model": {
"name": "my_ranklib_model",
"model": {
"type": "model/ranklib",
"definition": "## LambdaMART\n
## No. of trees = 1000
## No. of leaves = 10
## No. of threshold candidates = 256
## Learning rate = 0.1
## Stop early = 100
<ensemble>
<tree id="1" weight="0.1">
<split>
<feature> 2 </feature>
...
"
}
}
}
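As an illustration, the upload request can also be assembled programmatically. The helper below is hypothetical; it only builds the URL and JSON body, which could then be sent with any HTTP client:

```python
import json

def build_create_model_request(host, feature_set, model_name, model_type, definition):
    """Build the (url, body) pair for the _createmodel endpoint."""
    url = f"{host}/_ltr/_featureset/{feature_set}/_createmodel"
    body = {
        "model": {
            "name": model_name,
            "model": {"type": model_type, "definition": definition},
        }
    }
    return url, json.dumps(body)

url, payload = build_create_model_request(
    "http://localhost:9200",        # hypothetical cluster address
    "more_movie_features",
    "my_ranklib_model",
    "model/ranklib",
    "## LambdaMART\n<ensemble>...</ensemble>",  # RankLib -save output
)
# A client such as `requests` could then POST `payload` to `url`
# with a Content-Type: application/json header.
```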
The following example request shows how to upload an XGBoost model that was trained using the more_movie_features
feature set:
POST _ltr/_featureset/more_movie_features/_createmodel
{
"model": {
"name": "my_xgboost_model",
"model": {
"type": "model/xgboost+json",
"definition": "[ { \"nodeid\": 0, \"depth\": 0, \"split\": \"tmdb_multi\", \"split_condition\": 11.2009, \"yes\": 1, \"no\": 2, \"missing\": 1, \"children\": [
{ \"nodeid\": 1, \"depth\": 1, \"split\": \"tmdb_title\", \"split_condition\": 2.20631, \"yes\": 3, \"no\": 4, \"missing\": 3, \"children\": [
{ \"nodeid\": 3, \"leaf\": -0.03125 },
..."
}
}
}
The following example request shows how to upload an XGBoost model that was trained using the more_movie_features
feature set with parameters:
POST _ltr/_featureset/more_movie_features/_createmodel
{
"model": {
"name": "my_xgboost_model",
"model": {
"type": "model/xgboost+json",
"definition": "{
\"objective\": \"reg:logistic\",
\"splits\": [ { \"nodeid\": 0, \"depth\": 0, \"split\": \"tmdb_multi\", \"split_condition\": 11.2009, \"yes\": 1, \"no\": 2, \"missing\": 1, \"children\": [
{ \"nodeid\": 1, \"depth\": 1, \"split\": \"tmdb_title\", \"split_condition\": 2.20631, \"yes\": 3, \"no\": 4, \"missing\": 3, \"children\": [
{ \"nodeid\": 3, \"leaf\": -0.03125 },
...
]
}"
}
}
}
The following example request shows how to upload a simple linear model that was trained using the more_movie_features
feature set:
POST _ltr/_featureset/more_movie_features/_createmodel
{
"model": {
"name": "my_linear_model",
"model": {
"type": "model/linear",
"definition": """
{
"title_query" : 0.3,
"body_query" : 0.5,
"recency" : 0.1
}
"""
}
}
}
Creating a model with feature normalization
Feature normalization is a crucial preprocessing step that can be applied before model evaluation. LTR supports two types of feature normalization: min-max and standard normalization.
Standard normalization
Standard normalization transforms features as follows:
- Maps the mean value to 0
- Maps one standard deviation above the mean to 1
- Maps one standard deviation below the mean to -1
The following example request shows how to create a model with standard feature normalization:
POST _ltr/_featureset/more_movie_features/_createmodel
{
"model": {
"name": "my_linear_model",
"model": {
"type": "model/linear",
"feature_normalizers": {
"release_year": {
"standard": {
"mean": 1970,
"standard_deviation": 30
}
}
},
"definition": """
{
"release_year" : 0.3,
"body_query" : 0.5,
"recency" : 0.1
}
"""
}
}
}
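The standard normalizer amounts to (value - mean) / standard_deviation. A quick sketch using the example's mean of 1970 and standard deviation of 30:

```python
def standard_normalize(value, mean, standard_deviation):
    """Map mean -> 0, mean + std -> 1, mean - std -> -1."""
    return (value - mean) / standard_deviation

# release_year with mean=1970, standard_deviation=30 (from the example above)
print(standard_normalize(1970, 1970, 30))  # 0.0
print(standard_normalize(2000, 1970, 30))  # 1.0
print(standard_normalize(1940, 1970, 30))  # -1.0
```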
Min-max normalization
Min-max normalization scales features to a fixed range, typically between 0 and 1. Min-max normalization transforms features as follows:
- Maps the specified minimum value to 0
- Maps the specified maximum value to 1
- Scales the values between 0 and 1 linearly
The following example request shows how to implement min-max normalization:
"feature_normalizers": {
"vote_average": {
"min_max": {
"minimum": 0,
"maximum": 10
}
}
}
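Min-max normalization amounts to (value - minimum) / (maximum - minimum). A quick sketch using the vote_average bounds above:

```python
def min_max_normalize(value, minimum, maximum):
    """Linearly map [minimum, maximum] onto [0, 1]."""
    return (value - minimum) / (maximum - minimum)

# vote_average with minimum=0, maximum=10 (from the example above)
print(min_max_normalize(0, 0, 10))    # 0.0
print(min_max_normalize(7.5, 0, 10))  # 0.75
print(min_max_normalize(10, 0, 10))   # 1.0
```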
Model independence from feature sets
Models are initially created with reference to a feature set. After their creation, they exist as independent top-level entities.
Accessing models
To retrieve a model, use a GET request:
GET _ltr/_model/my_linear_model
To delete a model, use a DELETE request:
DELETE _ltr/_model/my_linear_model
Model names must be globally unique across all feature sets.
Model persistence
When a model is created, its features are copied. This prevents changes to the original features from affecting existing models or models in production. For example, if the feature set used to create the model is deleted, you can still access and use the model.
Model response
When retrieving a model, you receive a response that includes the features used to create it, as shown in the following example:
{
"_index": ".ltrstore",
"_type": "store",
"_id": "model-my_linear_model",
"_version": 1,
"found": true,
"_source": {
"name": "my_linear_model",
"type": "model",
"model": {
"name": "my_linear_model",
"feature_set": {
"name": "more_movie_features",
"features": [
{
"name": "body_query",
"params": [
"keywords"
],
"template": {
"match": {
"overview": "{{keywords}}"
}
}
},
{
"name": "title_query",
"params": [
"keywords"
],
"template": {
"match": {
"title": "{{keywords}}"
}
}
}
]
}
}
}
}
Next steps
Learn about searching with LTR.