Faithfulness

Evaluates the faithfulness of the generated answers with respect to retrieved contexts.

This metric uses a judge LLM to measure the factual consistency of the generated answer against the given context(s). It is calculated from the text answer generated by the LLM and the contexts retrieved by the RAG process. The score is a value between 0 and 1, where a higher score indicates that the generated answer is more faithful to the given context(s).

The generated answer is regarded as faithful if all the claims made in the answer can be inferred from the given context. To calculate this, a set of claims is first extracted from the generated answer. Each of these claims is then cross-checked against the given context to determine whether it can be inferred from that context. The faithfulness score formula is as follows:

\[ \text{Faithfulness score} = \frac{|\text{Number of claims in the generated answer that can be inferred from the given context}|}{|\text{Total number of claims in the generated answer}|} \]
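To make the formula concrete, the sketch below computes the score from a set of claim verdicts. The claims and their verdicts are hypothetical stand-ins for the output of the judge LLM, not part of the metric's actual implementation.

# Hypothetical claim verdicts produced by a judge LLM for one generated answer.
claim_verdicts = {
    "The Eiffel Tower is in Paris.": True,             # inferable from the contexts
    "It was completed in 1889.": True,                 # inferable from the contexts
    "It is the tallest structure in France.": False,   # not inferable from the contexts
}

supported = sum(claim_verdicts.values())   # claims that can be inferred: 2
total = len(claim_verdicts)                # total claims in the answer: 3

faithfulness_score = supported / total if total else 0.0
print(faithfulness_score)  # 2 / 3 ≈ 0.67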

Configuring Columns

This metric requires the following columns in your dataset:

- contexts (List[str]): A list of text contexts which are retrieved to generate the answer.
- answer (str): The response generated by the model, which will be evaluated for faithfulness against the given contexts.
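For reference, the snippet below sketches what a dataset with the default column names might look like. pandas is used here purely for illustration; it is not a requirement of the metric, and the data is hypothetical.

import pandas as pd

# Illustrative dataset with the default contexts / answer columns (hypothetical data).
df = pd.DataFrame(
    {
        "contexts": [
            ["The Eiffel Tower is located in Paris.", "It was completed in 1889."],
        ],
        "answer": [
            "The Eiffel Tower, completed in 1889, stands in Paris.",
        ],
    }
)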

If the above data is not in the appropriate column, you can specify different column names for these fields using the parameters contexts_column and answer_column.

For example, if your dataset has this data stored in different columns, you can pass the following parameters:

{
    "contexts_column": "context_info",
    "answer_column": "my_answer_col",
}

If the data is stored as a dictionary in another column, specify the column and key like this:

pred_col = dataset.prediction_column(model)
params = {
    "contexts_column": f"{pred_col}.contexts",
    "answer_column": f"{pred_col}.answer",
}

For more complex situations, you can use a function to extract the data:

pred_col = dataset.prediction_column(model)
params = {
    "contexts_column": lambda row: [row[pred_col]["context_message"]],
    "answer_column": lambda row: "\n\n".join(row[pred_col]["messages"]),
}
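Each function receives a single dataset row and returns the value for its field. For the lambdas above to work, each row's prediction column would need to hold a dictionary along the lines of the hypothetical structure below; the actual keys depend on what your model returns.

# Hypothetical shape of row[pred_col] assumed by the lambdas above.
row_prediction = {
    "context_message": "Retrieved passage used to generate the answer.",
    "messages": [
        "First part of the generated answer.",
        "Second part of the generated answer.",
    ],
}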