Evaluation of similarity scores

One common use of the similarity scores that distributional semantic models produce for words and phrases is to compare them against gold standard similarity values, e.g., those elicited from humans in experiments. DISSECT currently supports three standard measures for comparing the model scores against such reference values: Pearson correlation, Spearman correlation, and AUC.

Jump to command-line usage.

Python code

The toy input file word_sims.txt contains word pairs with “gold standard” scores.
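Each line of this file holds a word pair followed by its gold score in whitespace-separated fields; the code below reads fields 0 and 1 as the pair and field 2 as the gold score. For illustration only (these values are made up and are not the actual contents of word_sims.txt), a line could look like:

car automobile 0.9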

#ex20.py
#-------
from composes.utils import io_utils
from composes.utils import scoring_utils
from composes.similarity.cos import CosSimilarity

#read in a space
my_space = io_utils.load("data/out/ex01.pkl")

#compute similarities of a list of word pairs
fname = "data/in/word_sims.txt"
word_pairs = io_utils.read_tuple_list(fname, fields=[0,1])
predicted = my_space.get_sims(word_pairs, CosSimilarity())

#compute correlations
gold = io_utils.read_list(fname, field=2)
print "Spearman"
print scoring_utils.score(gold, predicted, "spearman")
print "Pearson"
print scoring_utils.score(gold, predicted, "pearson")
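To get a feel for what these numbers mean, the same correlation measures can be cross-checked outside DISSECT, e.g., with SciPy. The snippet below is only an illustrative sketch, assuming SciPy is installed; the gold and predicted values are made up and the script is not part of DISSECT.

#check_correlations.py (illustrative sketch, not part of DISSECT)
from scipy.stats import pearsonr, spearmanr

#made-up gold and predicted similarity scores
gold = [0.9, 0.1, 0.5, 0.7]
predicted = [0.8, 0.2, 0.4, 0.9]

#Pearson measures the linear association between the two lists of scores
print "Pearson", pearsonr(gold, predicted)[0]

#Spearman compares the rankings induced by the two lists of scores
print "Spearman", spearmanr(gold, predicted)[0]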

On the command line

The following script can be used to evaluate similarity scores against some gold standard.

Usage:

python2.7 evaluate_similarities.py [options] [config_file]

Options:

-i, --input input_file

Input file containing the gold and predicted similarity scores (as produced, for example, by running the compute_similarities.py script on a list of pairs already annotated with gold scores). One of -i or --in_dir has to be provided.

-c, --columns columns_in_the_input_file

Columns in the input file containing the gold and the predicted similarity scores. For example, -c 3,4 if the gold score is in field 3 and the model-generated similarity score is in field 4 (the relative order of gold and predicted score does not matter, but it has to be consistent across lines; see the illustrative line after this list).

--in_dir input_directory

When provided, all the files in this directory are treated as input files (they should be in the same format as for -i) and evaluated. One of -i or --in_dir has to be provided.

-m, --correlation_measure correlation_measures

Comma-separated list of correlation measures, for example pearson,spearman. Each measure must be one of auc, pearson, or spearman.

--filter filter_string

If --in_dir is provided, this string acts as a filter on the files in that directory: file names not containing the filter string are ignored. Optional; by default no filter is applied.

-l, --log file

Logger output file. Optional, by default no logging output is produced.

-h, --help

Displays help message.
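To make the -c option concrete: with -c 3,4 (and fields counted starting from 1, which is an assumption of this illustration), an input line could look like

car automobile 0.9 0.81

where 0.9 would be the gold score (field 3) and 0.81 the model-generated score (field 4); the words and values here are made up for illustration.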

Examples:

python2.7 evaluate_similarities.py -i ../examples/data/in/sim_data.txt -c 3,5 -m pearson,spearman
python2.7 evaluate_similarities.py --in_dir ../examples/data/in/ --filter sim_data -c 3,5 -m pearson,spearman

Here is what the output of the second command (sent to standard output) looks like:

sim_data.txt
CORRELATION:pearson
          -0.988618
CORRELATION:spearman
          -0.866025
sim_data2.txt
CORRELATION:pearson
          -0.150445
CORRELATION:spearman
          -0.500000
sim_data3.txt
CORRELATION:pearson
          -0.988618
CORRELATION:spearman
          -0.866025