Diffbot is a robot that sees the web the way people do, and helps developers extract the important parts from any web page.

Text and natural language processing, including word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, and more.


We're constantly evolving

We'd be glad to discuss your workflow and deliver the required functionality.


Text

  • Apply Regular Expression

    Apply a regular expression to a piece of text.

    You can use Pythex to test your regular expression.

  • Find All Matches to a Regex

    Apply a regular expression to a piece of text and return all non-overlapping matches of the regex.

    You can use Pythex to test your regular expression.
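The difference between the two actions above can be sketched with Python's standard `re` module, which is the regex dialect Pythex tests against. This is only an illustration: it assumes "Apply Regular Expression" behaves like a single search and "Find All Matches to a Regex" behaves like `re.findall`. The sample text and pattern are made up for the example.

```python
import re

# Made-up sample input and pattern, for illustration only.
text = "Order 66 shipped on 2024-05-01; order 67 ships on 2024-05-09."
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# Assumed analogue of "Apply Regular Expression": the first match only
# (search returns None when nothing matches).
first = date_pattern.search(text)
first_date = first.group(0) if first else None

# Assumed analogue of "Find All Matches to a Regex": every non-overlapping
# match, scanned left to right.
all_dates = date_pattern.findall(text)

# Non-overlapping means a character consumed by one match is not reused,
# so "aaaa" yields two matches of "aa", not three.
overlap_demo = re.findall(r"aa", "aaaa")

print(first_date)
print(all_dates)
print(overlap_demo)
```

Testing the pattern in Pythex first is a quick way to confirm that it matches what you expect before wiring it into a workflow.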

Sentiment

  • Text: The text to analyse.
  • Polarity: The sentiment polarity.
  • Strength: The sentiment strength.
  • Language: The detected language in which the source text was written.
  • Extract Sentiment

    Extract positive or negative sentiment from the given text.
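To make the shape of a sentiment result more concrete, here is a minimal sketch of a word-list scorer that returns text, polarity, and strength fields. It is not how the service itself computes sentiment: the `extract_sentiment` function, the tiny lexicons, the "neutral" fallback, and the strength formula are all hypothetical, and language detection is left out.

```python
import re

# Hypothetical toy lexicons; a real service would likely use a trained model.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def extract_sentiment(text):
    """Return a dict with assumed 'text', 'polarity' and 'strength' fields."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    hits = pos + neg

    if pos > neg:
        polarity = "positive"
    elif neg > pos:
        polarity = "negative"
    else:
        polarity = "neutral"  # assumption: used for ties or no lexicon hits

    # Assumed strength: share of lexicon hits that agree with the polarity,
    # net of those that disagree; ranges from 0.0 to 1.0.
    strength = abs(pos - neg) / hits if hits else 0.0
    return {"text": text, "polarity": polarity, "strength": strength}


result = extract_sentiment("I love this great product")
print(result["polarity"], result["strength"])
```

A scorer like this ignores negation and context ("not good" counts as positive), which is one reason to prefer a dedicated sentiment service for real workloads.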
