Evaluation of similarity scores

One common use of the similarity scores that distributional semantic models produce for words and phrases is to compare them against gold standard similarity values, e.g., those elicited from humans in experiments. DISSECT currently supports three standard measures for comparing the model scores against such reference values: Pearson correlation, Spearman correlation, and AUC.

Jump to command-line usage.

Python code

The toy input file word_sims.txt contains word pairs with “gold standard” scores.
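Each line of this file holds a word pair followed by its gold score in whitespace-separated fields; the code below reads fields 0 and 1 as the pair and field 2 as the gold score. For illustration only (these values are made up and are not the actual contents of word_sims.txt), a line could look like:

car automobile 0.9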

#ex20.py
#-------
from composes.utils import io_utils
from composes.utils import scoring_utils
from composes.similarity.cos import CosSimilarity

#read in a space
my_space = io_utils.load("data/out/ex01.pkl")

#compute similarities of a list of word pairs
fname = "data/in/word_sims.txt"
word_pairs = io_utils.read_tuple_list(fname, fields=[0,1])
predicted = my_space.get_sims(word_pairs, CosSimilarity())

#compute correlations
gold = io_utils.read_list(fname, field=2)
print "Spearman"
print scoring_utils.score(gold, predicted, "spearman")
print "Pearson"
print scoring_utils.score(gold, predicted, "pearson")
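To get a feel for what these numbers mean, the same correlation measures can be cross-checked outside DISSECT, e.g., with SciPy. The snippet below is only an illustrative sketch, assuming SciPy is installed; the gold and predicted values are made up and the script is not part of DISSECT.

#check_correlations.py (illustrative sketch, not part of DISSECT)
from scipy.stats import pearsonr, spearmanr

#made-up gold and predicted similarity scores
gold = [0.9, 0.1, 0.5, 0.7]
predicted = [0.8, 0.2, 0.4, 0.9]

#Pearson measures the linear association between the two lists of scores
print "Pearson", pearsonr(gold, predicted)[0]

#Spearman compares the rankings induced by the two lists of scores
print "Spearman", spearmanr(gold, predicted)[0]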

On the command line

The following script can be used to evaluate similarity scores against some gold standard.

Usage:

python2.7 evaluate_similarities.py [options] [config_file]

Options:

-i, --input input_file

Input file containing the gold and predicted similarity scores (as produced, for example, by running the compute_similarities.py script on a list of pairs already annotated with gold scores). One of -i or --in_dir has to be provided.

-c, --columns columns_in_the_input_file

Columns in the input file containing the gold and the predicted similarity scores. For example, -c 3,4 if the gold score is in field 3 and the model-generated similarity score is in field 4 (the relative order of gold and predicted score does not matter, but it has to be consistent across lines; see the illustrative line after this list).

--in_dir input_directory

When provided, all the files in this directory are treated as input files (they should be in the same format as for -i) and evaluated. One of -i or --in_dir has to be provided.

-m, --correlation_measure correlation_measures

Comma-separated list of correlation measures, for example pearson,spearman. Each measure must be one of auc, pearson, or spearman.

--filter filter_string

If --in_dir is provided, this string acts as a filter on the files in that directory: file names not containing the filter string are ignored. Optional; by default no filter is applied.

-l, --log file

Logger output file. Optional, by default no logging output is produced.

-h, --help

Displays help message.
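To make the -c option concrete: with -c 3,4 (and fields counted starting from 1, which is an assumption of this illustration), an input line could look like

car automobile 0.9 0.81

where 0.9 would be the gold score (field 3) and 0.81 the model-generated score (field 4); the words and values here are made up for illustration.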

Examples:

python2.7 evaluate_similarities.py -i ../examples/data/in/sim_data.txt -c 3,5 -m pearson,spearman
python2.7 evaluate_similarities.py --in_dir ../examples/data/in/ --filter sim_data -c 3,5 -m pearson,spearman

Here is what the output of the second command (sent to standard output) looks like:

sim_data.txt
CORRELATION:pearson
          -0.988618
CORRELATION:spearman
          -0.866025
sim_data2.txt
CORRELATION:pearson
          -0.150445
CORRELATION:spearman
          -0.500000
sim_data3.txt
CORRELATION:pearson
          -0.988618
CORRELATION:spearman
          -0.866025