
semantic_space Package

operation Module

Created on Jun 6, 2012

@author: thenghia.pham

class composes.semantic_space.operation.DimensionalityReductionOperation(dim_reduction)

Bases: composes.semantic_space.operation.Operation

This class implements the application and the projection of dimensionality reduction transformations.

apply(matrix_)

Applies a dimensionality reduction operation.

Args:
matrix_: matrix on which the reduction is applied, of type Matrix
Returns:
the reduced matrix

The transformation matrix obtained in the reduction (specific to each reduction method) is stored in the operation object. It is later used to project the dimensionality reduction onto a space peripheral to the space on which it was originally applied.

project(matrix_)

Projects a dimensionality reduction operation.

Args:
matrix_: matrix on which the reduction is projected, of type Matrix
Returns:
the reduced matrix

Uses the transformation matrix stored in the operation object to project the dimensionality reduction method on a new space, peripheral to the original one.
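The apply/project split can be illustrated with a small sketch. It assumes plain Python lists of lists instead of the package's Matrix type, and it uses a rank-1 truncated SVD computed by power iteration as the reduction. The class ToyDimReductionOperation and its transmat attribute are hypothetical names, not part of composes. apply() computes and stores the top right singular vector; project() reuses it on peripheral rows without recomputing anything.

```python
import math


def _matvec(matrix, vector):
    # Multiplies a list-of-lists matrix by a vector.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _transpose(matrix):
    return [list(column) for column in zip(*matrix)]


class ToyDimReductionOperation:
    """Hypothetical sketch, not the composes API: rank-1 truncated SVD."""

    def __init__(self, iterations=100):
        self.iterations = iterations
        self.transmat = None  # top right singular vector, set by apply()

    def apply(self, matrix):
        # Power iteration on A^T A converges to the top right singular vector v.
        transposed = _transpose(matrix)
        v = [1.0] * len(matrix[0])
        for _ in range(self.iterations):
            w = _matvec(transposed, _matvec(matrix, v))
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                raise ValueError("starting vector lies in the null space of the matrix")
            v = [x / norm for x in w]
        self.transmat = v  # stored for later projections
        # A v equals sigma * u: each row is reduced to a single coordinate.
        return [[x] for x in _matvec(matrix, v)]

    def project(self, matrix):
        if self.transmat is None:
            raise ValueError("apply() must be called before project()")
        # Peripheral rows are mapped with the stored vector, not a new SVD.
        return [[x] for x in _matvec(matrix, self.transmat)]
```

Projecting the core matrix itself with the stored vector gives the same result as apply(), which is why the stored transformation is enough to handle peripheral data consistently.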

class composes.semantic_space.operation.FeatureSelectionOperation(feat_selection)

Bases: composes.semantic_space.operation.Operation

This class implements the application and the projection of feature selection transformations.

apply(matrix_)

Applies a feature selection operation.

Args:
matrix_: matrix on which the feature selection is applied, of type Matrix
Returns:
the reduced matrix

The selected columns are stored in the operation object. They are later used to project the feature selection onto a space peripheral to the original space on which it was applied.

get_original_columns()
get_selected_columns()
original_columns

List of strings, the id2column of the space before applying the feature selection.

project(matrix_)

Projects a feature selection operation.

Args:
matrix_: matrix on which the selection is projected, of type Matrix
Returns:
the reduced matrix

Uses the information on selected columns stored in the operation object to project the feature selection method on a new space, peripheral to the original one.

selected_columns

List of integers, indices of the columns selected.

set_original_columns(original_columns)
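A similar sketch for feature selection, again on lists of lists. It assumes a hypothetical selection criterion (keep the reduced_dimension columns with the largest sums); the class name and the criterion are illustrative assumptions, while selected_columns, original_columns and set_original_columns mirror the members documented above.

```python
class ToyFeatureSelectionOperation:
    """Hypothetical sketch, not the composes API."""

    def __init__(self, reduced_dimension):
        self.reduced_dimension = reduced_dimension
        self.selected_columns = None  # list of int, filled in by apply()
        self.original_columns = None  # id2column of the space before selection

    def set_original_columns(self, original_columns):
        self.original_columns = list(original_columns)

    def apply(self, matrix):
        # Rank column indices by column sum (an assumed selection criterion).
        sums = [sum(column) for column in zip(*matrix)]
        ranked = sorted(range(len(sums)), key=lambda j: sums[j], reverse=True)
        # Keep the original column order among the selected columns.
        self.selected_columns = sorted(ranked[: self.reduced_dimension])
        return self._select(matrix)

    def project(self, matrix):
        if self.selected_columns is None:
            raise ValueError("apply() must be called before project()")
        # Peripheral rows keep exactly the columns chosen on the core space.
        return self._select(matrix)

    def _select(self, matrix):
        return [[row[j] for j in self.selected_columns] for row in matrix]
```

The id2column of the reduced space can then be recovered as [op.original_columns[j] for j in op.selected_columns].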
class composes.semantic_space.operation.Operation

Bases: object

This class implements both the application and the projection of a transformation on a semantic space.

An operation object can be used to apply or to project a specific transformation on a semantic space. After a transformation is applied, for example on a core space, the operation object stores the information required to later project the same operation onto a space peripheral to the core space.

class composes.semantic_space.operation.ScalingOperation(scaling)

Bases: composes.semantic_space.operation.Operation

This class implements the application and the projection of scaling transformations.

apply(matrix_)

Applies a scaling operation.

Args:
matrix_: matrix on which the scaling is applied, of type Matrix
Returns:
the scaled matrix

The column statistics computed by the scaling transformation, if any, are stored in the current operation object. For example, PPMI scaling needs column sums in order to be projected onto peripheral spaces, while PLOG scaling does not.

project(matrix_)

Projects a scaling operation.

Args:
matrix_: matrix on which the scaling is projected, of type Matrix
Returns:
the scaled matrix

If the current operation object has column_stats, this structure is used in the projection.
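The role of column_stats can be sketched with PPMI, where PPMI(r, c) = max(0, log(f(r, c) * N / (f(r) * f(c)))), with f(r) and f(c) the row and column sums and N the total count. This is a minimal sketch assuming that a projected row uses its own row sum but the core space's column sums and total; the class name is hypothetical and the exact composes behaviour may differ.

```python
import math


class ToyPpmiScalingOperation:
    """Hypothetical sketch, not the composes API."""

    def __init__(self):
        self.column_stats = None  # column sums of the core matrix

    @staticmethod
    def _ppmi(matrix, column_sums, total):
        scaled = []
        for row in matrix:
            row_sum = sum(row)
            new_row = []
            for count, column_sum in zip(row, column_sums):
                if count > 0 and row_sum > 0 and column_sum > 0:
                    pmi = math.log(count * total / (row_sum * column_sum))
                    new_row.append(max(0.0, pmi))  # keep only positive PMI
                else:
                    new_row.append(0.0)
            scaled.append(new_row)
        return scaled

    def apply(self, matrix):
        self.column_stats = [sum(column) for column in zip(*matrix)]
        return self._ppmi(matrix, self.column_stats, sum(self.column_stats))

    def project(self, matrix):
        if self.column_stats is None:
            raise ValueError("apply() must be called before project()")
        # Column marginals and total come from the core space, not from matrix.
        return self._ppmi(matrix, self.column_stats, sum(self.column_stats))
```

A PLOG-style operation would need no such state, since its output for each cell depends only on that cell.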

peripheral_space Module

Created on Sep 26, 2012

@author: georgianadinu

class composes.semantic_space.peripheral_space.PeripheralSpace(core_space, matrix_, id2row, row2id=None)

Bases: composes.semantic_space.space.Space


add_rows(matrix_, id2row)

Adds rows to a peripheral space.

Args:
matrix_: Matrix type, the matrix of the elements to be added.
id2row: list, string identifiers of the rows to be added.

Modifies the current space by appending the new rows. All operations of the core space are projected to the new rows.

Raises:
ValueError: if attempting to add row strings which are already in the space, or if the matrix of the new data is not consistent in shape with the current data matrix.
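How add_rows might replay the core operations can be sketched as follows. The sketch assumes list-based matrices and operations that expose only a project(matrix) method; ToyPeripheralSpace and ScaleColumns are hypothetical names. It checks the shape of the new rows after projection, which is one possible reading of the documented shape check.

```python
class ScaleColumns:
    """Hypothetical operation: multiplies each column by a stored factor."""

    def __init__(self, factors):
        self.factors = factors  # side information computed on the core space

    def project(self, matrix):
        return [[x * f for x, f in zip(row, self.factors)] for row in matrix]


class ToyPeripheralSpace:
    """Hypothetical sketch, not the composes API."""

    def __init__(self, core_operations, matrix, id2row):
        self.operations = list(core_operations)
        self.matrix, self.id2row, self.row2id = [], [], {}
        self.add_rows(matrix, id2row)

    def add_rows(self, matrix, id2row):
        if len(matrix) != len(id2row):
            raise ValueError("matrix and id2row have different lengths")
        if len(set(id2row)) != len(id2row) or any(w in self.row2id for w in id2row):
            raise ValueError("some row strings are duplicated or already in the space")
        # Replay the core operations in the order they were applied.
        for operation in self.operations:
            matrix = operation.project(matrix)
        if self.matrix and matrix and len(matrix[0]) != len(self.matrix[0]):
            raise ValueError("new rows do not match the shape of the current matrix")
        for word, row in zip(id2row, matrix):
            self.row2id[word] = len(self.id2row)
            self.id2row.append(word)
            self.matrix.append(row)
```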
classmethod build(core_space, **kwargs)

Reads in data files and extracts the data to construct a semantic space.

If the data is read in dense format and no columns are provided, the column indexing structures are set to empty.

Args:
data: file containing the counts
format: format of the input data file, one of sm/dm
rows: file containing the row elements. Optional; if not provided, they are extracted from the data file.
cols: file containing the column elements
Returns:
A semantic space built from the input data files.
Raises:
ValueError: if one of the data/format arguments is missing,
if cols is missing and format is “sm”, or
if the input columns provided are not consistent with the shape of the matrix (for “dm” format)

space Module

Created on Sep 21, 2012

@author: georgianadinu

class composes.semantic_space.space.Space(matrix_, id2row, id2column, row2id=None, column2id=None, **kwargs)

Bases: object

This class implements semantic spaces.

A semantic space describes a list of targets (words, phrases, etc.) in terms of co-occurrence with contextual features.

It contains a matrix storing (some type of) co-occurrence strength values between targets and contextual features: by convention, targets are rows and features are columns. The space also stores structures that encode the mappings between the matrix row/column indices and the associated target/context-feature strings.

Transformations which rescale the matrix elements can be applied to a semantic space. A semantic space also allows for similarity computations between row elements of the space.

apply(transformation)

Applies a transformation on the current space.

All transformations affect the data matrix. If the transformation reduces the dimensionality of the space, the column indexing structures are also updated. The operation applied is appended to the list of operations that the space holds.

Args:
transformation: of type Scaling, DimensionalityReduction or FeatureSelection
Returns:
A new space on which the transformation has been applied.
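The fact that apply() returns a new space and records the operation can be sketched with a hypothetical ToySpace class. Here a transformation is modelled as a plain function mapping (matrix, id2column) to a new (matrix, id2column); this is an assumption made for brevity, since composes uses Scaling, DimensionalityReduction and FeatureSelection objects and stores Operation objects.

```python
class ToySpace:
    """Hypothetical sketch, not the composes API."""

    def __init__(self, matrix, id2row, id2column, operations=None):
        self.matrix = matrix
        self.id2row = list(id2row)
        self.id2column = list(id2column)
        self.operations = list(operations or [])

    def apply(self, transformation):
        # The current space is left untouched; a new space is returned.
        matrix, id2column = transformation(self.matrix, self.id2column)
        return ToySpace(matrix, self.id2row, id2column,
                        self.operations + [transformation])


def keep_first_column(matrix, id2column):
    """Hypothetical feature selection: keeps only column 0."""
    return [row[:1] for row in matrix], id2column[:1]
```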
assert_1dim_element()

Asserts that the elements of the space are one-dimensional.

classmethod build(**kwargs)

Reads in data files and extracts the data to construct a semantic space.

If the data is read in dense format and no columns are provided, the column indexing structures are set to empty.

Args:
data: file containing the counts
format: format of the input data file, one of sm/dm
rows: file containing the row elements. Optional; if not provided, they are extracted from the data file.
cols: file containing the column elements
Returns:
A semantic space built from the input data files.
Raises:
ValueError: if one of the data/format arguments is missing,
if cols is missing and format is “sm”, or
if the input columns provided are not consistent with the shape of the matrix (for “dm” format)
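The argument rules above can be illustrated with a parser that works on lists of lines instead of files. It assumes that a dm line holds a row string followed by its values and that an sm line holds a row string, a column string and a value; build_toy_space and its format_ parameter are hypothetical, and malformed lines are not handled.

```python
def build_toy_space(data=None, format_=None, rows=None, cols=None):
    """Hypothetical sketch; returns (matrix, id2row, id2column) as lists."""
    if data is None or format_ is None:
        raise ValueError("data and format are required")
    if format_ not in ("sm", "dm"):
        raise ValueError("format must be one of sm/dm")
    if format_ == "sm" and cols is None:
        raise ValueError("cols must be provided for the sm format")
    records = [line.split() for line in data if line.strip()]
    if rows is None:
        # Row elements are extracted from the data, in order of first appearance.
        rows, seen = [], set()
        for record in records:
            if record[0] not in seen:
                seen.add(record[0])
                rows.append(record[0])
    row2id = {row: i for i, row in enumerate(rows)}

    if format_ == "dm":
        # dm: one dense row per line, the row string followed by its values.
        width = len(records[0]) - 1 if records else 0
        if cols is not None and len(cols) != width:
            raise ValueError("cols are not consistent with the matrix shape")
        matrix = [[0.0] * width for _ in rows]
        for record in records:
            if record[0] in row2id:
                matrix[row2id[record[0]]] = [float(x) for x in record[1:]]
        # With no columns provided, the column structures stay empty.
        id2column = list(cols) if cols is not None else []
        return matrix, list(rows), id2column

    # sm: one non-zero cell per line, as "row column value".
    column2id = {col: j for j, col in enumerate(cols)}
    matrix = [[0.0] * len(cols) for _ in rows]
    for row, col, value in records:
        if row in row2id and col in column2id:
            matrix[row2id[row]][column2id[col]] = float(value)
    return matrix, list(rows), list(cols)
```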
column2id

Dictionary, maps column strings to integer ids.

cooccurrence_matrix

Co-occurrence matrix associated with the semantic space, of type Matrix.

element_shape

Shape of row elements, of type tuple. By default, in standard spaces, element_shape=(no_cols,).

Used in composition models that represent words as matrices or higher-order tensors instead of simple vectors. If the representation of a word is a matrix of shape (2,2), for example, then element_shape=(2,2). The space matrix still stores each element as a linearized vector, just as in standard spaces.
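The linearization can be sketched as below, assuming row-major order (the default of numpy.reshape). The helper names linearize and delinearize are hypothetical, and composes may lay out elements differently.

```python
def linearize(element):
    """Flattens a 2-D element (list of rows) into one row-major vector."""
    return [value for row in element for value in row]


def delinearize(vector, element_shape):
    """Rebuilds a 2-D element of shape (rows, cols) from a linearized vector."""
    rows, cols = element_shape
    if len(vector) != rows * cols:
        raise ValueError("vector length does not match element_shape")
    return [list(vector[i * cols:(i + 1) * cols]) for i in range(rows)]
```

Under this layout, a space whose elements have element_shape=(2,2) stores each element as a row of 4 values.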

export(file_prefix, **kwargs)

Exports the current space to disk. If the space has no column information, it cannot be exported in sparse format (sm).

Args:
file_prefix: string, prefix of the files to be exported
format: string, one of dm/sm
Writes:
  • matrix in file_prefix.<format>
  • row elements in file_prefix.<row>
  • col elements in file_prefix.<col>
Raises:
ValueError: if the space has no column info and “sm” exporting is attempted
NotImplementedError: if the space matrix is dense and “sm” exporting is attempted
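The difference between the two export formats can be sketched by rendering lines instead of writing files. The line layouts (a row string followed by its values for dm; row, column and value per non-zero cell for sm) and the helper name export_lines are assumptions; the sketch does not model the dense/sparse matrix distinction behind the NotImplementedError.

```python
def export_lines(matrix, id2row, id2column, format_):
    """Hypothetical sketch; returns the lines instead of writing files."""
    if format_ == "dm":
        # One line per row: the row string followed by all its values.
        return [" ".join([word] + [str(x) for x in row])
                for word, row in zip(id2row, matrix)]
    if format_ == "sm":
        if not id2column:
            raise ValueError("sm export needs column information")
        # One line per non-zero cell: row string, column string, value.
        return ["%s %s %s" % (word, id2column[j], x)
                for word, row in zip(id2row, matrix)
                for j, x in enumerate(row) if x != 0]
    raise ValueError("format must be one of dm/sm")
```

The sm layout has to name each column explicitly, which is why it cannot be produced without column information.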
get_column2id()
get_cooccurrence_matrix()
get_element_shape()
get_id2column()
get_id2row()
get_neighbours(word, no_neighbours, similarity, space2=None)

Computes the neighbours of a word in the semantic space.

Args:
word: string, target word
no_neighbours: int, the number of neighbours desired
similarity: of type Similarity, the similarity measure to be used
space2: Space type, Optional. If provided, the neighbours are retrieved from this space rather than the current space. By default, neighbours are retrieved from the current space.
Returns:
list of (neighbour_string, similarity_value) tuples.
Raises:
KeyError: if the word is not found in the semantic space.
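A minimal neighbour search, assuming a space represented as a dict from strings to vectors and cosine as the similarity measure. get_neighbours here is a hypothetical stand-alone function with the same argument order plus a leading rows argument, not the composes method, and it does not exclude the query word from its own neighbour list.

```python
import math


def cosine(u, v):
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def get_neighbours(rows, word, no_neighbours, similarity=cosine, rows2=None):
    """rows, rows2: hypothetical dicts mapping row strings to vectors."""
    target = rows[word]  # KeyError if the word is not in the space
    candidates = rows if rows2 is None else rows2
    scored = [(other, similarity(target, vector))
              for other, vector in candidates.items()]
    # Highest similarity first, then keep the requested number of neighbours.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:no_neighbours]
```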
get_operations()
get_row(word)

Returns the row vector of a word.

Args:
word: string

Returns: Matrix type (of shape (1, no_cols)), the row of the word argument.

Raises:
KeyError: if the word is not found in the space
get_row2id()
get_rows(words)

Returns the sub-matrix corresponding to a list of words.

Args:
words: list of strings
Returns: Matrix type (of shape (len(words), no_cols)), the sub-matrix containing the words given as input.
Raises:
KeyError: if one of the words is not found in the space
get_sim(word1, word2, similarity, space2=None)

Computes the similarity between two targets in the semantic space.

If one of the two targets to be compared is not found, it returns 0.0.

Args:
word1: string
word2: string
similarity: of type Similarity, the similarity measure to be used
space2: Space type, Optional. If provided, word2 is interpreted in this space rather than the current space. By default, both words are interpreted in the current space.
Returns:
scalar, similarity score
get_sims(word_pair_list, similarity, space2=None)

Computes the similarities for a list of target pairs in the semantic space.

If one of the two targets in a pair is not found, the similarity for that pair is 0.0.

Args:
word_pair_list: list of (string, string) tuples, the word pairs to be compared
similarity: of type Similarity, the similarity measure to be used
space2: Space type, Optional. If provided, the second word of each pair is interpreted in this space rather than the current space. By default, both words are interpreted in the current space.
Returns:
list, list of similarity scores
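get_sim and get_sims can be sketched together, using a hypothetical dict-based space and cosine similarity. The functions below are stand-ins, not the composes methods; they reproduce the documented behaviour of returning 0.0 when a target is missing and of looking word2 up in space2 when one is given.

```python
import math


def cosine(u, v):
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def get_sim(rows, word1, word2, similarity=cosine, rows2=None):
    """rows, rows2: hypothetical dicts mapping row strings to vectors."""
    other = rows if rows2 is None else rows2  # word2 is looked up here
    if word1 not in rows or word2 not in other:
        return 0.0  # a missing target gives 0.0 rather than an error
    return similarity(rows[word1], other[word2])


def get_sims(rows, word_pair_list, similarity=cosine, rows2=None):
    # One score per (word1, word2) pair, in the order of the input list.
    return [get_sim(rows, word1, word2, similarity, rows2)
            for word1, word2 in word_pair_list]
```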
id2column

List of strings, the column elements.

id2row

List of strings, the row elements.

operations

List of operations which have been applied on the semantic space. List of Operation type objects.

The operations, together with their associated side information, are stored because they may need to be projected on peripheral data.

row2id

Dictionary, maps row strings to integer ids.

set_cooccurrence_matrix(matrix_)
to_dense()

Converts the matrix of the current space to DenseMatrix.

to_sparse()

Converts the matrix of the current space to SparseMatrix.

classmethod vstack(space1, space2)

Classmethod. Stacks two semantic spaces.

The rows in the two spaces are concatenated.

Args:
space1, space2: spaces to be stacked, of type Space
Returns:
Stacked space, type Space.
Raises:
ValueError: if the spaces have a different number of columns or their columns are not identical
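A sketch of the stacking rule, assuming each space is a hypothetical (matrix, id2row, id2column) triple of plain lists; only the two documented error conditions are checked.

```python
def vstack(space1, space2):
    """Each space is a hypothetical (matrix, id2row, id2column) triple."""
    matrix1, id2row1, id2column1 = space1
    matrix2, id2row2, id2column2 = space2
    width1 = len(matrix1[0]) if matrix1 else 0
    width2 = len(matrix2[0]) if matrix2 else 0
    if width1 != width2:
        raise ValueError("the spaces have a different number of columns")
    if id2column1 != id2column2:
        raise ValueError("the columns of the two spaces are not identical")
    # Rows of space2 are appended after the rows of space1.
    return matrix1 + matrix2, id2row1 + id2row2, list(id2column1)
```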