Web services
Text analytics web services
We have a wide range of web services available. On this page, we describe what they do. For the technical details, check out our DevOps Center.
How does it work?
Textgain provides a set of URLs to which you can send requests, i.e. text to analyze. Note the ?q in the example below. You need to send your personal key with each request (replace ***). The server responds with a JSON string, a standardized, compact data format.
$r = 'https://api.textgain.com/1/age?q=lets+roll&key=***'; // replace *** with your key
$r = file_get_contents($r);  // send the request
$r = json_decode($r, true);  // true: decode into an associative array
echo $r['age'];
Requests
https://api.textgain.com/1/age?q=lets+roll&key=***
Response
The server returns:
{"age": "25-", "confidence": 0.75}
If the daily limit is exceeded, the server returns an HTTP 429 “Too Many Requests” status code instead.
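The request and response handling described above can be wrapped in two small helpers. This is a minimal sketch, not official client code: the function names `textgain_url` and `textgain_parse` are hypothetical, the URL pattern is inferred from the single `age` example (paths for other services are an assumption), and the error handling covers only what this page states, namely a JSON body on success and HTTP 429 when the daily limit is exceeded.

```php
<?php
// Hypothetical helpers; the names are not part of the Textgain API.

// Build a request URL for a service such as "age".
// http_build_query() URL-encodes the text and the key.
function textgain_url(string $service, string $text, string $key): string
{
    $query = http_build_query(['q' => $text, 'key' => $key]);
    return 'https://api.textgain.com/1/' . rawurlencode($service) . '?' . $query;
}

// Turn an HTTP status code and a response body into an associative array.
function textgain_parse(int $status, string $body): array
{
    if ($status === 429) {
        throw new RuntimeException('Too Many Requests: daily limit exceeded');
    }
    $data = json_decode($body, true); // true: arrays instead of objects
    if (!is_array($data)) {
        throw new RuntimeException('Response is not valid JSON');
    }
    return $data;
}

// Parse the sample response shown above.
$result = textgain_parse(200, '{"age": "25-", "confidence": 0.75}');
echo $result['age'], ' (', $result['confidence'], ')', PHP_EOL;
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without spending requests from the daily limit.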
Profiling
Discover the author behind a text through writing style analysis.
Age Prediction
Age prediction estimates whether a text is written by an adolescent or an adult. Online, adolescents use more informal language, including abbreviated utterances (omg, wow) and mood words (awesome, lame). Adolescents tend to talk about school, parents, and partying. Adults tend to talk about work, children, and health, and use more complex sentence structures.
Gender Prediction
Gender prediction estimates whether a text is written by a man or a woman. Statistically, women tend to talk more about people and relationships (family, friends), while men are more interested in objects and things (e.g., cars, games). As a result, women will use more personal pronouns (I, you, we) in a social context, and men will use more determiners (a, an, the) and more quantifiers (one, many).
Gender Tagging
Gender tagging assigns each word in a text a male, female or neutral tag. These tags are estimated from observed language usage by male and female writers. Gender tagging differs from gender prediction in that it indicates which words the respective genders have been observed to use more in writing, as opposed to measuring typical male vs. female writing style.
Education Prediction
Education prediction estimates whether a text displays basic or advanced writing skills. Statistically, people with higher education will use more formal language, more punctuation marks (, ; :), correct spelling and capitalization, longer words and sentences, and fewer emoji (cf. idk lol just talkin ☺☺☺).
Personality Prediction
Personality prediction estimates whether a text is written by an extraverted or an introverted person. Extraverts tend to be more sociable, assertive and playful, while introverts are more solitary, reserved and shy. As a result, extraverts will use "we" more often, along with more positive adjectives and less formal language. Introverts will use "I" more often, and they employ a broader vocabulary.
Sentiment Analysis
Measure whether people are communicating in a positive, neutral or negative way.
Sentiment analysis
Sentiment analysis predicts whether a text is objective (fact) or subjective (opinion). Subjective text contains adverbs and adjectives with a positive or negative ‘polarity’ that capture the author’s personal opinion (e.g., an excellent opportunity or a bad product).
Sentiment tagging
Sentiment tagging assigns each token in a text a sentiment score that expresses its polarity. Unlike sentiment analysis, it is not sensitive to more complex linguistic patterns, such as negation and modality. You can use it as an alternative way of calculating the overall sentiment, or to extract the subjective terms in your document.
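Both uses of per-token scores can be sketched in a few lines. This is an illustration only: the `[token, polarity]` pair layout, the score values and the function names `overall_polarity` and `subjective_terms` are assumptions made for the example, not the documented response format. The arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical tagger output: [token, polarity] pairs with made-up scores.
// The real response format may differ.
$tagged = [
    ['an', 0.0],
    ['excellent', 1.0],
    ['opportunity', 0.0],
    ['but', 0.0],
    ['a', 0.0],
    ['bad', -0.5],
    ['product', 0.0],
];

// Overall sentiment: mean polarity of the tokens that are not neutral.
function overall_polarity(array $tagged): float
{
    $scores = array_filter(array_column($tagged, 1), fn($s) => $s != 0.0);
    return $scores ? array_sum($scores) / count($scores) : 0.0;
}

// Subjective terms: the tokens whose polarity is not neutral.
function subjective_terms(array $tagged): array
{
    $hits = array_filter($tagged, fn($pair) => $pair[1] != 0.0);
    return array_values(array_column($hits, 0));
}

echo overall_polarity($tagged), PHP_EOL;
echo implode(', ', subjective_terms($tagged)), PHP_EOL;
```

Averaging token scores ignores negation ("not bad" would still count as negative), which is the limitation the paragraph above describes.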
Concept Extraction & Conversion
Extract concepts from text and apply conversion, such as geocoding, anonymization or even simple word-translation.
Concept extraction
Concept extraction identifies keywords, key phrases and ‘named entities’ (names of persons, products, organizations, locations, dates, and so on). Keywords are nouns that appear more often in a text, and often at the start of a text. Named entities frequently start with a capital letter (e.g., Barack Obama). Concept extraction can be used to summarize a text or, for example, to compare whether two texts discuss similar topics.
Geocoding
Geocoding looks for place names in a text (in any language) and returns a list of possible locations, along with their longitude, latitude and country. Note that the results are exhaustive! For example, Berlin in Germany as well as Berlin in Colombia (Berlín) will be returned. The results are sorted by population size (if known).
Concept translation
A simple translation engine that finds English translations for words in a text. This is a word-based translation model and should not be considered a machine translation solution.
Lexicon & Readability
These services are designed to help you grammatically analyze your documents and measure their readability.
Lemmatization
Lemmatization involves the morphological analysis of words to reduce them to their dictionary form (lemma). It is more powerful than stemming, which simply strips morphological suffixes rather than taking into account a word's part of speech and allomorphic transformations. For example, "bathing" would be stemmed to "bath", but would be lemmatized as "bathe".
Part-of-Speech Tagging
Part-of-speech tagging identifies sentence breaks and word types. Words have different roles depending on how they are used. For example, the word "shop" can be a noun (a shop, object) or a verb (to shop, action).
Passive Voice
The use of the passive voice helps you to draw attention away from the agent of the action. Stylistically, however, it is often frowned upon, because it reduces readability. This classifier identifies the verbs that form the passive voice in a sentence.
Syllable Counts / Hyphenation
Readability metrics often rely on syllable counts. Hyphenation and syllabification go hand in hand. This classifier outputs hyphenation patterns and syllable counts. It is fairly robust to noisy language (e.g., the misspelling *awsome).
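To show how syllable counts feed a readability metric, here is the widely used Flesch Reading Ease formula as a small function. It is one example of such a metric; this page does not say which metrics, if any, the service computes, and the function name is hypothetical. The word, sentence and syllable counts are assumed to come from elsewhere (for instance, from tagging and syllable-count output).

```php
<?php
// Flesch Reading Ease: higher scores indicate text that is easier to read.
// score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
function flesch_reading_ease(int $words, int $sentences, int $syllables): float
{
    if ($words < 1 || $sentences < 1) {
        throw new InvalidArgumentException('Need at least one word and one sentence');
    }
    return 206.835
        - 1.015 * ($words / $sentences)
        - 84.6 * ($syllables / $words);
}

// Example: 100 words in 5 sentences with 150 syllables in total.
echo round(flesch_reading_ease(100, 5, 150), 3), PHP_EOL;
```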
Identification
Determine what language or genre your documents are in.
Language identification
Language identification detects the language a text is written in. Different languages use different characters. For example, Russian (Кириллица), Chinese (汉字) and Arabic (العربية) are easy to distinguish. Languages that use the same characters (e.g., Latin alphabet, abc) often have cues that set them apart (e.g., é ↔ ë).
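The first step, telling writing systems apart, can be sketched with Unicode script properties in a regular expression. This is an illustration of the idea rather than the service's method: the function name `dominant_script` is hypothetical, only four scripts are checked, and distinguishing languages that share a script (such as those using the Latin alphabet) needs the finer cues mentioned above. The source file is assumed to be UTF-8, and `array_key_first` needs PHP 7.3 or later.

```php
<?php
// Guess the writing system of a text by counting characters per Unicode script.
function dominant_script(string $text): string
{
    $counts = [];
    foreach (['Cyrillic', 'Han', 'Arabic', 'Latin'] as $script) {
        // \p{...} matches any character of the named script; /u enables UTF-8.
        $counts[$script] = (int) preg_match_all('/\p{' . $script . '}/u', $text);
    }
    arsort($counts); // highest count first
    $best = array_key_first($counts);
    return $counts[$best] > 0 ? $best : 'unknown';
}

foreach (['Кириллица', '汉字', 'العربية', 'abc'] as $sample) {
    echo $sample, ' => ', dominant_script($sample), PHP_EOL;
}
```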
Genre classification
Genre classification predicts the type of text, based on its length, tone of voice and content.