Home

Overview

The EDISON COmpetencies ClassificatiOn (E-CO-2) system is a distributed, automated tool designed to support gap analysis. It measures the similarity of a document against a set of predefined categories. It can therefore be used to perform a gap analysis based on the EDISON DS-taxonomy and identify mismatches between education and industry. Moreover, students, practitioners, educators and other stakeholders can use the tool to identify gaps in their skills and competences.

System: EDISON-COmpetencies ClassificatiOn (E-CO^2)

Performs the following actions:

Train

Manually define each category. In practice, provide for each category a set of plain txt files that contain keywords and definitions representing the category. The quality of the classification depends on this step, so the text has to be concrete and representative and contain specific nouns. For example, expressions like "analyze large data sets and investigate possible solutions" are not concrete.
Perform term extraction on the text files to produce a list of terms, that is, the terms that characterize the subject or content.
Generate TF-IDF values for the collection of extracted terms. The output is a table with the extracted terms as its header; each row contains the TF-IDF values for one processed document.
Sum the values of the table per term and filter them to create a single vector that represents one category.
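
The training steps above can be sketched as follows. This is a minimal illustration, not the E-CO-2 implementation: the function names (`extract_terms`, `tfidf_table`, `category_vector`), the regex tokenizer standing in for real term extraction, the `log(N / df)` IDF variant, and the `min_weight` filter are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def extract_terms(text):
    """Rough stand-in for term extraction: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_table(documents):
    """Return (terms, rows): the header and one row of TF-IDF values per document."""
    tokenized = [extract_terms(doc) for doc in documents]
    terms = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        rows.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in terms]
        )
    return terms, rows


def category_vector(terms, rows, min_weight=0.0):
    """Sum each column, then keep only terms whose total exceeds min_weight."""
    sums = [sum(column) for column in zip(*rows)]
    return {t: s for t, s in zip(terms, sums) if s > min_weight}


# Hypothetical contents of two txt files that describe one category.
docs = ["hadoop spark cluster", "spark streaming kafka"]
terms, rows = tfidf_table(docs)
vector = category_vector(terms, rows)
```

In this sketch the IDF is computed over the documents of a single category, so a term that occurs in every file (here `spark`) gets a weight of zero and is filtered out. Whether E-CO-2 computes IDF per category or across all categories is not stated here.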

Classify

Provide the input text to classify.
Filter the text.
Find the terms that overlap with the category vectors.
Calculate the TF-IDF values of those terms.
For each category vector, calculate the cosine similarity with the input.
The output is a table with the similarity score for each category.
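
The similarity step above can be sketched as follows. This is a minimal illustration under stated assumptions: vectors are stored as `{term: weight}` dictionaries, and the category names and weights are hypothetical values invented for the example, not output of E-CO-2.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(input_vector, category_vectors):
    """Return a {category: similarity} table for one input vector."""
    return {
        name: cosine_similarity(input_vector, vector)
        for name, vector in category_vectors.items()
    }


# Hypothetical category vectors and input weights, for illustration only.
categories = {
    "data_analytics": {"regression": 0.4, "clustering": 0.3},
    "data_engineering": {"hadoop": 0.5, "spark": 0.5},
}
document = {"hadoop": 0.2, "spark": 0.2}
scores = classify(document, categories)
```

Because cosine similarity depends only on the direction of the vectors, a short input and a long category description can still score highly when their term weights are proportional.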
