January 26, 2024
Release highlights
This release includes numerous improvements to the developer framework, including new features for model and dataset initialization, easier testing, support for additional inputs and the Azure OpenAI API, updated notebooks, bug fixes, and much more.
ValidMind Developer Framework (v1.25.3)
Improvements to init_model
When initializing a model, you can now pass a dataset with pre-computed model predictions if they are available. By default, if no prediction column is specified when calling `init_model`, the ValidMind Developer Framework will compute the model predictions on the entire dataset.
To illustrate how passing a dataset that includes a prediction column can help, consider the following example without a prediction column:
```python
vm_model = vm.init_model(
    model,
    train_ds=vm_train_ds,
    test_ds=vm_test_ds,
)
```
Internally, this example invokes the `predict()` method of the model for the training and test datasets when the model is initialized. This approach can be problematic with large datasets: `init_model` can simply take too long to compute.
You can now avoid this issue by providing a dataset with a column containing pre-computed predictions, similar to the following example. If `init_model` detects this column, it will not generate model predictions at runtime.
| x1  | x2  | … | target_column | prediction_column |
|-----|-----|---|---------------|-------------------|
| 0.1 | 0.2 | … | 0             | 0                 |
| 0.2 | 0.4 | … | 1             | 1                 |
Usage example with a prediction column:
```python
vm.init_dataset(
    dataset=df,
    feature_columns=[...],
    target_column=...,
    extra_columns={
        'prediction_column': 'NAME-OF-PREDICTION-COLUMN',
    },
)
```
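For instance, here is a minimal end-to-end sketch, assuming a scikit-learn-style model with a `predict()` method and a pandas DataFrame `df`; the `feature_columns` list and the `model_predictions` column name are placeholders:

```python
# Pre-compute predictions once, outside of init_model
df["model_predictions"] = model.predict(df[feature_columns])

# Register the dataset and point the framework at the prediction column
vm_dataset = vm.init_dataset(
    dataset=df,
    feature_columns=feature_columns,
    target_column="target_column",
    extra_columns={
        "prediction_column": "model_predictions",
    },
)

# init_model detects the pre-computed predictions and skips
# calling predict() at initialization time
vm_model = vm.init_model(
    model,
    train_ds=vm_dataset,
)
```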
Improvements to init_dataset
When initializing a dataset, the new `feature_columns` argument lets you specify a list of feature names for prediction to improve efficiency. Internally, the function filters the dataset to retain only these specified features for prediction-related tasks, leaving the remaining dataset available for other purposes, such as segmentation.
This improvement replaces the existing behavior of `init_dataset`, which loaded the entire dataset, incorporating all available features for prediction tasks. While this approach worked well, it could impose limitations when generating segmented tests and proved somewhat inefficient with large datasets containing numerous features, of which only a small subset were relevant for prediction.
Usage example:
```python
feature_columns = ['CreditScore', 'Age', 'Balance', 'NumOfProducts', 'EstimatedSalary']

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    target_column=demo_dataset.target_column,
    feature_columns=feature_columns
)
```
A new notebook illustrates how you can configure these dataset features:

- How to utilize the `feature_columns` parameter when initializing `validmind` datasets and model objects
- How `feature_columns` can be used to report by segment
Improvements to run_documentation_tests()
The `run_documentation_tests()` function, used to collect and run all the tests associated with a template, now supports running multiple sections at a time. This means that you no longer need to call the same function twice for two different sections, reducing the potential for errors and enabling you to use a single `config` object. The previous behavior was to allow running only one section at a time. This change maintains backward compatibility with the existing syntax, requiring no updates to your code.
Existing example usage: Multiple function calls are needed to run multiple sections
```python
full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section="section_1",
    config={
        "validmind.tests.data_validation.ClassImbalance": ...
    }
)

full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section="section_2",
    config={
        "validmind.tests.data_validation.Duplicates": ...
    }
)
```
New example usage: A single function call runs multiple sections
```python
full_suite = vm.run_documentation_tests(
    inputs={
        ...
    },
    section=["section_1", "section_2"],
    config={
        "validmind.tests.data_validation.ClassImbalance": ...,
        "validmind.tests.data_validation.Duplicates": ...
    }
)
```
Support for custom inputs
The ValidMind Developer Framework now supports passing custom inputs as an `inputs` dictionary when running individual tests or test suites. This support replaces the standard inputs `dataset`, `model`, and `models`, which are now deprecated.
New recommended syntax for passing inputs:
```python
test_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_dataset,
        "model": vm_model,
    },
)
```
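The same dictionary also works when running an individual test. As a minimal sketch, assuming the individual-test runner `vm.tests.run_test` accepts a test ID plus the same `inputs` argument (verify the exact signature against your installed version):

```python
# Run a single test with custom inputs instead of standard inputs
result = vm.tests.run_test(
    "validmind.tests.data_validation.ClassImbalance",
    inputs={
        "dataset": vm_dataset,
    },
)
```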
To make it easier for you to adopt custom inputs, we have updated our how-to notebooks and code samples to use the new recommended syntax:
- How-to notebooks, including:
- Code samples, including:
  - NLP and LLM models
  - Regression models
  - Time series models
Also check Standard inputs are deprecated.
Enhancements
Support for Azure OpenAI Service. The ValidMind Developer Framework now supports running LLM-powered tests with the Azure OpenAI Service via API, in addition to the previously supported OpenAI API. To work with Azure OpenAI API endpoints, you need to set the following environment variables before calling `vm.init()`, as shown in the sketch after this list:

- `AZURE_OPENAI_KEY`: API key for authentication
- `AZURE_OPENAI_ENDPOINT`: API endpoint URL
- `AZURE_OPENAI_MODEL`: Specifies the language model or service to use
- `AZURE_OPENAI_VERSION` (optional): Allows specifying a specific version of the service if available
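As an illustration, here is a minimal sketch of wiring these variables up in a notebook; all values and the `vm.init()` arguments are placeholders for your own credentials:

```python
import os

# Placeholder Azure OpenAI credentials; replace with your own values
os.environ["AZURE_OPENAI_KEY"] = "<your-api-key>"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_MODEL"] = "<your-deployment-name>"
os.environ["AZURE_OPENAI_VERSION"] = "<api-version>"  # optional

import validmind as vm

# Initialize the framework only after the variables above are set
vm.init(
    api_host="<api-host>",
    api_key="<api-key>",
    api_secret="<api-secret>",
    project="<project-identifier>",
)
```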
To learn more about configuring Azure OpenAI Service, see Authentication in the official Microsoft documentation.
Bug fixes
- Fixed support for OpenAI library >=1.0. We have updated our demonstration notebooks for large language models (LLMs) to provide the correct support for `openai >= 1.0.0`. Previously, some notebooks were using an older version of the OpenAI client API.
Deprecations
Standard inputs are deprecated. The ValidMind Developer Framework now supports passing custom inputs as an `inputs` dictionary when running individual tests or test suites. As a result, the standard inputs `dataset`, `model`, and `models` are deprecated and might be removed in a future release. If you are a developer, you should update your code to use the new, recommended syntax.

Deprecated legacy usage for passing inputs:
```python
test_suite = vm.run_documentation_tests(
    dataset=vm_dataset,
    model=vm_model
)
```
New recommended usage for passing inputs:
```python
test_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_dataset,
        "model": vm_model,
    },
)
```
Also check Support for custom inputs.
Removed deprecated high-level API methods: The API methods `run_template` and `run_test_plan` had been deprecated previously. They have now been removed from the ValidMind Developer Framework. If you are a developer, you should update your code to use the recommended high-level API methods, as shown in the sketch after this list:

- `run_template` (removed): Use `vm.run_documentation_tests`
- `run_test_plan` (removed): Use `vm.run_test_suite`
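A minimal migration sketch; the test suite identifier below is a placeholder, and the assumption that `vm.run_test_suite` takes a suite identifier as its first argument should be checked against your installed version:

```python
# Before (removed in this release):
# vm.run_template(...)
# vm.run_test_plan(...)

# After, for template-driven documentation tests:
vm.run_documentation_tests(
    inputs={
        "dataset": vm_dataset,
        "model": vm_model,
    },
)

# After, for a named test suite (assumed calling convention):
vm.run_test_suite("<test-suite-id>")
```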
User guide
Updated Python requirements. We have updated our user guide to clarify the Python versions supported by the ValidMind Developer Framework. We now support Python ≥3.8 and <3.11.
How to upgrade
To access the latest version of the ValidMind Platform UI, reload your browser tab.
To upgrade the ValidMind Developer Framework:
Using JupyterHub: Reload your browser tab and re-run the `%pip install --upgrade validmind` cell.

In your own developer environment: Restart your notebook and re-run:

```python
%pip install validmind
```