A semantic space can be used to compute a similarity score between two words or phrases, given a similarity measure:
    #ex06.py
    #-------
    from composes.utils import io_utils
    from composes.similarity.cos import CosSimilarity

    #load a space
    my_space = io_utils.load("./data/out/ex01.pkl")

    print my_space.cooccurrence_matrix
    print my_space.id2row

    #compute similarity between two words in the space
    print my_space.get_sim("car", "car", CosSimilarity())
    print my_space.get_sim("car", "book", CosSimilarity())
List of available similarity measures.
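As a rough illustration of what a cosine-based score measures, the following standalone sketch computes the cosine between plain Python lists. It does not use DISSECT: the function name `cosine` and the toy vectors are assumptions made for this example only, and the toolkit's own CosSimilarity may differ in detail (for instance in how it treats zero vectors).

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

# Toy co-occurrence vectors, invented for illustration.
car = [5.0, 0.0, 1.0]
book = [0.0, 4.0, 1.0]

sim_car_car = cosine(car, car)    # a vector compared with itself
sim_car_book = cosine(car, book)  # two vectors sharing little mass
```

A vector compared with itself scores 1 (up to floating-point rounding), while vectors that share few dimensions score close to 0.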
The words/phrases to be compared DO NOT have to be stored in the same semantic space (although of course they must be represented by the same number of dimensions). Computing similarities between elements from different spaces:
    #ex07.py
    #-------
    from composes.utils import io_utils
    from composes.similarity.cos import CosSimilarity

    #load two spaces
    my_space = io_utils.load("./data/out/ex01.pkl")
    my_per_space = io_utils.load("./data/out/PER_SS.ex05.pkl")

    print my_space.id2row
    print my_per_space.id2row

    #compute similarity between a word and a phrase in the two spaces
    print my_space.get_sim("car", "sports_car", CosSimilarity(), space2 = my_per_space)
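The dimensionality requirement for cross-space comparisons can be pictured with a small standalone sketch in which each "space" is a plain dictionary mapping words or phrases to vectors. The helper `sim_across` and the toy data are hypothetical. They only mimic the idea behind the space2 argument and are not part of DISSECT.

```python
import math

def sim_across(word1, word2, space1, space2=None):
    """Cosine between word1 (from space1) and word2 (from space2, or space1 if omitted)."""
    if space2 is None:
        space2 = space1
    u, v = space1[word1], space2[word2]
    if len(u) != len(v):
        raise ValueError("the two spaces must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# Toy spaces, invented for illustration.
core = {"car": [5.0, 0.0, 1.0], "book": [0.0, 4.0, 1.0]}
peripheral = {"sports_car": [4.0, 0.0, 2.0]}
mismatched = {"sports_car": [4.0, 0.0]}  # only two dimensions

cross_sim = sim_across("car", "sports_car", core, space2=peripheral)

try:
    sim_across("car", "sports_car", core, space2=mismatched)
    dimension_error = False
except ValueError:
    dimension_error = True
```

Raising an error on a dimension mismatch is a design choice of this sketch; the source does not say how the toolkit itself reacts in that case.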
A semantic space can also be used to compute the k-nearest neighbours of a word or phrase, according to a similarity measure:
    #ex08.py
    #-------
    from composes.utils import io_utils
    from composes.similarity.cos import CosSimilarity

    #load a space
    my_space = io_utils.load("./data/out/ex01.pkl")

    #get the top 2 neighbours of "car"
    print my_space.get_neighbours("car", 2, CosSimilarity())
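Conceptually, a nearest-neighbour query scores the target against every element of a space and keeps the k best. The sketch below shows that idea on a plain dictionary. The names `cosine` and `nearest_neighbours` and the toy vectors are assumptions for illustration, not DISSECT code. Keeping the query word itself among the candidates is a simplification made here and may not match the toolkit's behaviour.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def nearest_neighbours(word, k, space):
    """Return the k (neighbour, score) pairs most similar to word, best first."""
    target = space[word]
    scored = [(other, cosine(target, vec)) for other, vec in space.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy space, invented for illustration.
toy_space = {
    "car": [5.0, 0.0, 1.0],
    "bike": [4.0, 1.0, 1.0],
    "book": [0.0, 4.0, 1.0],
}

top_two = nearest_neighbours("car", 2, toy_space)
top_two_names = [name for name, _ in top_two]
```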
Again, the neighbours can be extracted from a different space:
    #ex09.py
    #-------
    from composes.utils import io_utils
    from composes.similarity.cos import CosSimilarity

    #load two spaces
    my_space = io_utils.load("./data/out/ex01.pkl")
    my_per_space = io_utils.load("./data/out/PER_SS.ex05.pkl")

    print my_space.id2row
    print my_space.cooccurrence_matrix
    print my_per_space.id2row
    print my_per_space.cooccurrence_matrix

    #get the top two neighbours of "car" in a peripheral space
    print my_space.get_neighbours("car", 2, CosSimilarity(), space2 = my_per_space)
Usage:
    python2.7 compute_similarities.py [options] [config_file]
Options:
- -i, --input input_comparison_file
Input file containing the list of word/phrase pairs to be compared (NB: if an element of a pair to be compared is not in the space(s) used for the comparison, the pair will be assigned 0 similarity).
- -c, --columns columns_in_the_input_file
Columns in the input file containing the words/phrases to be compared. For example -c 1,2 if the words/phrases are given as the first two columns.
- -o, --output directory
Output directory. After running the command, this directory will contain new text files named SIMS.input_comparison_file.space_file.similarity_measure (e.g., SIMS.word_pairs1.txt.CORE_SS.myfile.ppmi.euclidean) or, when two spaces are used, SIMS.input_comparison_file.space_file1.space_file2.similarity_measure (names of this sort quickly become monstrous, as in SIMS.word_pairs2.txt.CORE_SS.myfile.ppmi.nmf_200.PER_SS.perfile.CORE_SS.myfile.ppmi.nmf_200.cos). A separate file is created for each input semantic space or semantic space pair, and for each similarity measure. Each output file contains the lines of the input file with the similarity score of the word/phrase pair appended (e.g., if the input contains the line car book, the output might contain car book 0.438529009654; if the input has the line car book 1, the output will have car book 1 0.438529009654).
- -s, --space space_file or space_file1,space_file2
File(s) containing the space(s) to be used. If a second file is provided, the second element of each pair is retrieved from the additional space. Pickle format (and .pkl extension) required. One of -s or --in_dir is required.
- --in_dir directory
Input directory for the semantic spaces. If provided, all files with the .pkl extension in the input directory are loaded one at a time and the -s value is ignored. In this case, output files are produced for all input files, but it is not possible to request cross-space measurements as it is with the -s option. One of -s or --in_dir is required.
- -m, --sim_measure similarity_measures
List of comma-separated similarity measures. Example: cos,lin. List of available similarity measures.
- -l, --log file
Logger output file. Optional, by default no logging output is produced.
- -h, --help
Displays help message.
Examples:
    python2.7 compute_similarities.py -i ../examples/data/in/word_pairs1.txt -c 1,2 -s ../examples/data/out/ex01.pkl -o ../examples/data/out/ -m cos,euclidean
    python2.7 compute_similarities.py -i ../examples/data/in/word_pairs2.txt -c 1,2 -s ../examples/data/out/ex01.pkl,../examples/data/out/PER_SS.ex05.pkl -o ../examples/data/out/ -m cos,euclidean
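The input and output conventions described for -i, -c and -o can be pictured with a short standalone sketch: pick the two columns named by -c, score the pair, and append the score to the original line, with 0 for pairs that contain an element missing from the space. The helper `append_scores`, the whitespace column separator, and the toy scores are assumptions made for this illustration. The sketch does not reproduce the script's actual implementation.

```python
def append_scores(lines, columns, score_fn, vocabulary):
    """Append a similarity score to each input line (hypothetical helper).

    columns holds the two 1-based column indices, as in "-c 1,2".
    Pairs with an element missing from vocabulary get a score of 0.
    """
    first, second = columns
    out = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        w1, w2 = fields[first - 1], fields[second - 1]
        if w1 in vocabulary and w2 in vocabulary:
            score = score_fn(w1, w2)
        else:
            score = 0.0
        out.append("%s %s" % (line, score))
    return out

# Toy scores, invented for illustration; a real run would query the space.
toy_scores = {("car", "book"): 0.438529009654}
vocabulary = set(["car", "book"])

def lookup(w1, w2):
    return toy_scores[(w1, w2)]

result = append_scores(["car book 1", "car unicorn 0"], (1, 2), lookup, vocabulary)
```

Extra columns, such as a gold-standard rating in the third column, are carried over unchanged, which keeps the output easy to compare against human judgements.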
Usage:
    python2.7 compute_neighbours.py [options] [config_file]
Options:
- -i, --input input_file
Input file containing the list of words/phrases, one per line.
- -o, --output directory
Output directory. Naming conventions are the same as for compute_similarities.py, with the prefix NEIGHBOURS instead of SIMS. The output file contains each input word/phrase on a separate line, followed by tab-prefixed lines showing its neighbours and the corresponding similarity scores.
- -s, --space space_file or space_file1,space_file2
File(s) containing the space(s) to be used. If a second file is provided, the neighbours are extracted from the additional space. Pickle format (and .pkl extension) required.
- -m, --sim_measure similarity_measure
Similarity measure. Example: cos. List of available similarity measures.
- -n, --no_neighbours number_of_neighbours
Number of neighbours to be returned. Optional, default: 20.
- -l, --log file
Logger output file. Optional, by default no logging output is produced.
- -h, --help
Displays help message.
Examples:
    python2.7 compute_neighbours.py -i ../examples/data/in/word_list.txt -n 2 -s ../examples/data/out/ex01.pkl -o ../examples/data/out/ -m cos
    python2.7 compute_neighbours.py -i ../examples/data/in/word_list.txt -n 2 -s ../examples/data/out/ex01.pkl,../examples/data/out/PER_SS.ex05.pkl -o ../examples/data/out/ -m cos
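A NEIGHBOURS file laid out as described for -o (each input word on its own line, followed by tab-prefixed neighbour lines) could be read back with a few lines of Python. This is only a sketch: the helper `parse_neighbours` is hypothetical, the sample text is invented, and the assumption that the score is the last whitespace-separated field on each neighbour line should be checked against a real output file.

```python
def parse_neighbours(text):
    """Parse NEIGHBOURS-style text into {word: [(neighbour, score), ...]}."""
    result = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("\t"):
            if current is None:
                raise ValueError("neighbour line found before any input word")
            # Assumption: the score is the last whitespace-separated field.
            neighbour, score = line.strip().rsplit(None, 1)
            result[current].append((neighbour, float(score)))
        else:
            # A line without a leading tab starts a new input word/phrase.
            current = line.strip()
            result[current] = []
    return result

# Invented sample in the described layout, not real toolkit output.
sample = "car\n\tcar 1.0\n\tbike 0.97\nbook\n\tbook 1.0\n\tcar 0.05\n"
parsed = parse_neighbours(sample)
```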