java.lang.Object
  org.olat.core.commons.services.text.impl.nutch.NGramProfile
public class NGramProfile
This class runs an ngram analysis over submitted text; the results can be used for automatic language identification. The similarity calculation is experimental. You have been warned. Methods are provided to build new NGramProfile instances.
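To make the role of the `minlen` and `maxlen` bounds concrete, here is a minimal standalone sketch of character ngram counting for a single word. It is not the NGramProfile source: the class name `NGramSketch` and the method `ngrams` are hypothetical, and any word-boundary padding or normalization the real class may apply is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.olat or Nutch.
final class NGramSketch {

    /** Counts every character ngram of length minlen..maxlen found in one word. */
    static Map<String, Integer> ngrams(String word, int minlen, int maxlen) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int n = minlen; n <= maxlen; n++) {
            // Slide a window of width n across the word.
            for (int i = 0; i + n <= word.length(); i++) {
                counts.merge(word.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {t=2, e=1, x=1, te=1, ex=1, xt=1}
        System.out.println(ngrams("text", 1, 2));
    }
}
```

A profile in this sense is the accumulated count table over many words; a larger training text gives more stable counts.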
| Field Summary | |
|---|---|
| static OLog | log |

| Constructor Summary | |
|---|---|
| NGramProfile(java.lang.String name, int minlen, int maxlen) | Construct a new ngram profile |

| Method Summary | |
|---|---|
| void | add(java.lang.StringBuffer word): Add ngrams from a single word to this profile |
| void | analyze(java.lang.StringBuilder text): Analyze a piece of text |
| static NGramProfile | create(java.lang.String name, java.io.InputStream is, java.lang.String encoding): Create a new language profile from a (preferably quite large) text file |
| java.lang.String | getName() |
| float | getSimilarity(NGramProfile another): Calculate a score for how well two NGramProfiles match each other |
| java.util.List<org.olat.core.commons.services.text.impl.nutch.NGramProfile.NGramEntry> | getSorted(): Return a sorted list of ngrams (sort done by 1. |
| void | load(java.io.InputStream is): Loads an ngram profile from an InputStream (assumes UTF-8 encoded content) |
| static void | main(java.lang.String[] args): main method used for testing only |
| void | save(java.io.OutputStream os): Writes the NGramProfile content to an OutputStream using UTF-8 encoding |
| java.lang.String | toString() |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Field Detail |
|---|
public static final OLog log
| Constructor Detail |
|---|
public NGramProfile(java.lang.String name,
int minlen,
int maxlen)
name - is the name of the profile
minlen - is the min length of ngram sequences
maxlen - is the max length of ngram sequences

| Method Detail |
|---|
public java.lang.String getName()
public void add(java.lang.StringBuffer word)
word - is the word to add
public void analyze(java.lang.StringBuilder text)
text - the text to be analyzed
public java.util.List<org.olat.core.commons.services.text.impl.nutch.NGramProfile.NGramEntry> getSorted()
public java.lang.String toString()
toString in class java.lang.Object
public float getSimilarity(NGramProfile another)
another - ngram profile to compare against
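
This page does not say how the score is computed, so the following is only one plausible way to compare two profiles, assuming each profile can be reduced to a map of ngram counts. `ProfileDistanceSketch`, `normalize`, and `distance` are hypothetical names, and whether `getSimilarity` returns a distance or a similarity, or uses this scale at all, is not confirmed by this documentation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the getSimilarity implementation.
final class ProfileDistanceSketch {

    /** Converts raw ngram counts into relative frequencies that sum to 1. */
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        double total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> freq = new HashMap<>();
        if (total == 0) {
            return freq; // empty profile: nothing to normalize
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freq.put(e.getKey(), e.getValue() / total);
        }
        return freq;
    }

    /** Sum of absolute frequency differences: 0 for identical profiles, 2 for disjoint ones. */
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        double sum = 0;
        for (String k : keys) {
            // An ngram missing from one profile counts with frequency 0 there.
            sum += Math.abs(a.getOrDefault(k, 0.0) - b.getOrDefault(k, 0.0));
        }
        return sum;
    }
}
```

With a measure of this kind, language identification would pick the reference profile with the smallest distance to the profile of the unknown text.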
public void load(java.io.InputStream is)
throws java.io.IOException
is - the InputStream to read
java.io.IOException
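
Because `load` assumes UTF-8 content, the reader should be created with an explicit charset rather than the platform default. The sketch below shows that pattern on a hypothetical one-entry-per-line format (ngram, space, count); the format, the class `ProfileIoSketch`, and its methods are assumptions for illustration, not the documented file layout. The writer is included only so the example can round-trip on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the real profile file format may differ.
final class ProfileIoSketch {

    /** Writes one "ngram count" line per entry, always encoded as UTF-8. */
    static void write(Map<String, Integer> counts, OutputStream os) throws IOException {
        Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + " " + e.getValue() + "\n");
        }
        w.flush(); // flush, but leave closing the stream to the caller
    }

    /** Reads "ngram count" lines, decoding the bytes as UTF-8. */
    static Map<String, Integer> read(InputStream is) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader r = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = r.readLine()) != null) {
            int sep = line.lastIndexOf(' ');
            if (sep <= 0) {
                continue; // skip blank or malformed lines
            }
            counts.put(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1).trim()));
        }
        return counts;
    }

    /** Convenience helper: writes to memory and reads the bytes back. */
    static Map<String, Integer> roundTrip(Map<String, Integer> counts) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            write(counts, out);
            return read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Fixing the charset on both sides keeps non-ASCII ngrams intact regardless of the JVM's default encoding.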
public static NGramProfile create(java.lang.String name,
java.io.InputStream is,
java.lang.String encoding)
name - is the name of the profile
is - is the stream to read
encoding - is the encoding of the stream
public void save(java.io.OutputStream os)
throws java.io.IOException
os - the Stream to output to
java.io.IOException
public static void main(java.lang.String[] args)
args -