NLP

The Infermedica API features custom Natural Language Processing technology that allows your applications to understand symptoms described by users in plain English text.

The service is easy to use: simply send the user’s original message, and our endpoint will process it and do its best to identify any symptom mentions.

Standard usage

The service is accessible via the /parse endpoint. It returns a list of the symptom mentions recognized in the message. Our language technology can also detect some negated mentions (as in “I don't have a headache”) and handle spelling errors (a common problem in chat language).

The endpoint expects a simple JSON object containing a single attribute named text. Here's an example:

curl "https://api.infermedica.com/v2/parse" \
  -X "POST" \
  -H "App-Id: XXXXXXXX" -H "App-Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
  -H "Content-Type: application/json" \
  -d '{"text": "i feel smoach pain but no couoghing today"}'

And the response:

{
  "mentions": [
    {
      "id": "s_13",
      "orth": "stomach pain",
      "choice_id": "present",
      "name": "Abdominal pain",
      "type": "symptom"
    },
    {
      "id": "s_102",
      "orth": "coughing",
      "choice_id": "absent",
      "name": "Cough",
      "type": "symptom"
    }
  ]
}

Each mention is associated with a symptom ID (the id attribute) and a modality (the choice_id attribute, either present or absent). These attributes are directly compatible with the /diagnosis endpoint. The name attribute contains the main name of the symptom, while orth contains the orthographic form of the mention, that is, the words used in the text (after spelling correction).
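Because id and choice_id carry over directly to /diagnosis, turning a /parse response into evidence takes only a few lines. The Python sketch below is illustrative only: mentions_to_evidence is a hypothetical helper, and it assumes that /diagnosis accepts evidence items shaped as {"id": ..., "choice_id": ...}; check the /diagnosis reference for the exact request format. Keeping only the first mention of each symptom ID is a design choice of this sketch, not documented API behavior.

```python
from typing import Any


def mentions_to_evidence(parse_result: dict[str, Any]) -> list[dict[str, str]]:
    """Convert /parse mentions into evidence items (hypothetical helper).

    Assumes /diagnosis evidence items look like {"id": ..., "choice_id": ...}.
    """
    evidence = []
    seen = set()
    for mention in parse_result.get("mentions", []):
        # Design choice of this sketch: keep only the first mention of each symptom ID
        if mention["id"] in seen:
            continue
        seen.add(mention["id"])
        evidence.append({"id": mention["id"], "choice_id": mention["choice_id"]})
    return evidence
```

Applied to the response above, this yields one present item (s_13) and one absent item (s_102). The resulting list would go into the evidence part of a /diagnosis request body, alongside whatever other fields that endpoint requires.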

The analyzed text cannot be longer than 1,024 characters per /parse call. Longer texts are rejected with an HTTP 400 error.
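Since over-long texts are rejected, a client may prefer to validate input before calling the service. The following Python sketch builds, but does not send, a /parse request using only the standard library. build_parse_request and MAX_TEXT_LENGTH are hypothetical names, and the sketch assumes the API counts characters the same way Python's len() does (Unicode code points), which is not specified here. It also accepts the optional include_tokens flag described under Obtaining detailed output.

```python
import json
import urllib.request

PARSE_URL = "https://api.infermedica.com/v2/parse"
MAX_TEXT_LENGTH = 1024  # per-call limit stated above (hypothetical constant name)


def build_parse_request(text: str, app_id: str, app_key: str,
                        include_tokens: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for /parse (hypothetical helper)."""
    # Reject over-long input locally instead of waiting for a 400 response.
    # Assumption: len() (code points) matches how the API counts characters.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(
            f"text has {len(text)} characters; the limit is {MAX_TEXT_LENGTH}")
    payload = {"text": text}
    if include_tokens:
        payload["include_tokens"] = True
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "App-Id": app_id,
            "App-Key": app_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass it to urllib.request.urlopen and decode the body with json.load; a rejected request surfaces as urllib.error.HTTPError, whose code attribute holds the HTTP status.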

Obtaining detailed output (advanced)

Optionally, you can pass include_tokens: true to obtain additional tokenization details in the output. This may be helpful if you plan to perform further stages of text processing, since it makes it easier to align the output of our service with the output of other NLP tools you might wish to use. For instance:

curl "https://api.infermedica.com/v2/parse" \
  -X "POST" \
  -H "App-Id: XXXXXXXX" -H "App-Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
  -H "Content-Type: application/json" \
  -d '{"text": "I often feel sad.", "include_tokens": true}' 

will yield the following structure:

{
  "mentions": [
    {
      "id": "s_169",
      "positions": [0, 2, 3],
      "head_position": 2,
      "name": "Depressed mood",
      "choice_id": "present",
      "orth": "I feel sad",
      "type": "symptom"
    }
  ],
  "tokens": ["I", "often", "feel", "sad", "."]
}

The extended structure contains a list named tokens. Tokens are words, numbers, symbols and punctuation captured in the input text. Words in the list are given as orthographic forms (that is, forms encountered in the input text but after spelling correction).
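Because tokens are returned after spelling correction, they cannot always be found verbatim in the original message. If you need character offsets to align /parse output with another tool, one best-effort approach is to search for each token from left to right. The Python sketch below does this; token_offsets is a hypothetical helper, and returning None for tokens that do not occur verbatim (for example, corrected misspellings) is a convention of this sketch, not something the API provides.

```python
from typing import Optional


def token_offsets(text: str, tokens: list[str]) -> list[Optional[tuple[int, int]]]:
    """Best-effort (start, end) character spans of /parse tokens in the original text."""
    spans: list[Optional[tuple[int, int]]] = []
    cursor = 0  # search only to the right of the previously matched token
    for token in tokens:
        start = text.find(token, cursor)  # case-sensitive, exact match
        if start == -1:
            # Not found verbatim: probably altered by spelling correction
            spans.append(None)
            continue
        end = start + len(token)
        spans.append((start, end))
        cursor = end
    return spans
```

For "I often feel sad." and the tokens shown above, every token is found, and the final "." maps to the span (16, 17). Because the search is exact and greedy, a short token can occasionally match inside a neighboring misspelled word, so treat the spans as approximate.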

The representation of each mention is also enriched with references to token positions. The positions attribute contains a list of the indices of the tokens that make up the mention (referring to the tokens list, counting from 0). As the example above shows, mentions are not always contiguous: “I feel sad” skips the token “often”. The mention’s syntactic head is designated by the head_position attribute. A syntactic head is the token that determines the syntactic type of the whole phrase; in other words, if the parse tree underlying the entire mention were collapsed into a single word, that word would be the head.
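With include_tokens: true, each mention can be resolved back to the words it covers. The Python sketch below shows one way to do this; describe_mention is a hypothetical helper, and it assumes positions are listed in ascending order, as in the example response above.

```python
from typing import Any


def describe_mention(mention: dict[str, Any], tokens: list[str]) -> dict[str, Any]:
    """Resolve a mention's token indices against the tokens list (hypothetical helper).

    Requires a /parse response obtained with include_tokens: true.
    """
    positions = mention["positions"]
    words = [tokens[i] for i in positions]
    # Contiguous means the indices form an unbroken run such as [2, 3, 4]
    contiguous = all(b == a + 1 for a, b in zip(positions, positions[1:]))
    return {
        "words": words,
        "head": tokens[mention["head_position"]],
        "contiguous": contiguous,
    }
```

For the “I often feel sad.” response, this returns the words ['I', 'feel', 'sad'], the head 'feel' and contiguous set to False, matching the gap left by “often”.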

Limitations

The service attempts to capture mentions of symptoms that are present in our knowledge base. If a symptom is not in the knowledge base, its mentions will not be recognized. However, if a more general symptom is present, chances are it will be captured instead (for instance, there is currently no separate entry for “rash on legs” in our knowledge base, but the service will recognize “rash” if this phrase is sent).

Also note that, due to the ambiguity of natural language and the endless variety of expressions that may be used to convey an idea, we cannot guarantee 100% recognition accuracy. Nevertheless, we believe the service already performs well, and it is continually being improved.
