SiteTruth

SiteTruth application program interface - Version 3

beta test

SiteTruth has an Application Program Interface, or API, intended for use by "AJAX" applications. The API allows programs to request a rating for a URL or set of URLs. We are offering this interface for open use as part of our beta test.

Change log

  • Version 1: XML output only.
  • Version 2: (2011) "format" argument and JSON output added. Backwards compatible.
  • Version 3: (2012) Business location information added. Backwards compatible.
  • September 2014: Deprecated CGI interface discontinued.

Requests

A request for site ratings is submitted with the following fields encoded in the URL:

  • url, url, url, ... - URLs to be rated. These are separate fields, all beginning with "url". Either a domain name or a full URL can be submitted. Only the domain name part of the URL is used, so, for privacy reasons, it is best to send only that. When multiple url fields are provided, the entries are handled in the order given. At least 20 url fields can be used in a single query; the actual limit is higher.
  • priority - request priority. When making multiple requests to rate domains returned from some search result, use priority 1 for the first search result, priority 2 for the second, and so forth. This will ensure that the top search results are rated first. This field is optional; the minimum value is 0, and the default value is 1.
  • key - user key. This identifies the application using the API. Currently, a key of guest should be used. Invalid keys will be rejected with an HTTP error 403.
  • format - output format. Options are xml or json. The default is xml.

A typical URL is of the form

http://www.sitetruth.com/fcgi/rateapiv3.fcgi?url=ftc.gov&key=guest
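As a rough illustration, a request URL of this form could be assembled as follows. This is a minimal sketch, not an official client: the function name build_rating_url is hypothetical, and it assumes that multiple domains are sent as repeated "url" fields.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
API_ENDPOINT = "http://www.sitetruth.com/fcgi/rateapiv3.fcgi"


def build_rating_url(domains, key="guest", fmt="xml", priority=None):
    """Return a request URL with one "url" field per domain, in order.

    Assumption: each domain is sent as a repeated field named "url".
    """
    if fmt not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    fields = [("url", domain) for domain in domains]
    if priority is not None:
        if priority < 0:
            raise ValueError("priority must be 0 or greater")
        fields.append(("priority", str(priority)))
    fields.append(("key", key))
    fields.append(("format", fmt))
    return API_ENDPOINT + "?" + urlencode(fields)


print(build_rating_url(["ftc.gov"], fmt="json"))
```

Passing only domain names, rather than full URLs, follows the privacy advice given for the url field.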

 

Replies

The reply to each request is in XML or JSON. The XML reply is simply a sequence of elements of the form

<sitetruth:rating url="url" rating="letter" ratinginfo="message" status="200"></sitetruth:rating>

The attributes shown above are always present. Additional attributes, listed in the "Business information" and "Miscellaneous" sections below, may also be present.

Example XML output:

<x xmlns:sitetruth='http://www.sitetruth.com/schema'>
<sitetruth:rating url="berkshirehathaway.com" status="200" rating="Q"
salesmin="1000000000" salescurrency="USD"
name="BERKSHIRE HATHAWAY INC"
confidence="medium" matchconfidence="high"
location="OMAHA" state="NE" countrycode="US">
</sitetruth:rating>
</x>
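A reply in this shape could be read with Python's standard xml.etree.ElementTree module, roughly as sketched here. The function name parse_xml_reply is hypothetical, and the sample text is a well-formed copy of the example output, with a space between the element name and its first attribute.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared on the wrapper element of the example reply.
SITETRUTH_NS = "http://www.sitetruth.com/schema"


def parse_xml_reply(text):
    """Return one dict of attributes per sitetruth:rating element."""
    root = ET.fromstring(text)
    tag = "{%s}rating" % SITETRUTH_NS
    return [dict(elem.attrib) for elem in root.iter(tag)]


sample = """<x xmlns:sitetruth='http://www.sitetruth.com/schema'>
<sitetruth:rating url="berkshirehathaway.com" status="200" rating="Q"
salesmin="1000000000" salescurrency="USD"
name="BERKSHIRE HATHAWAY INC"
confidence="medium" matchconfidence="high"
location="OMAHA" state="NE" countrycode="US">
</sitetruth:rating>
</x>"""

ratings = parse_xml_reply(sample)
print(ratings[0]["url"], ratings[0]["rating"])  # prints: berkshirehathaway.com Q
```

All XML attribute values arrive as strings, so numeric fields such as salesmin need an explicit int() conversion before use.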

In JSON format, the result is an array of JSON objects.

[{"domain": "domain", "rating": "letter", "ratinginfo": "message", "status": "200", "err": "error text"}]

Example JSON output:

[{"status": "200", "rating": "Q", "domain": "berkshirehathaway.com", "countrycode": "US", "salesmin": 1000000000, "name": "BERKSHIRE HATHAWAY INC", "err": "", "salescurrency": "USD", "confidence": "medium", "state": "NE", "location": "OMAHA", "matchconfidence": "high"}]
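The JSON form can be decoded with the standard json module. The sketch below indexes the reply items by their "domain" field; the function name parse_json_reply is hypothetical, and the sample text is the example output above.

```python
import json


def parse_json_reply(text):
    """Return a dict mapping each "domain" value to its reply object."""
    items = json.loads(text)
    return {item["domain"]: item for item in items}


sample = (
    '[{"status": "200", "rating": "Q", "domain": "berkshirehathaway.com", '
    '"countrycode": "US", "salesmin": 1000000000, '
    '"name": "BERKSHIRE HATHAWAY INC", "err": "", "salescurrency": "USD", '
    '"confidence": "medium", "state": "NE", "location": "OMAHA", '
    '"matchconfidence": "high"}]'
)

by_domain = parse_json_reply(sample)
reply = by_domain["berkshirehathaway.com"]
print(reply["rating"], reply["salesmin"])  # prints: Q 1000000000
```

In the example output, "status" is a string while "salesmin" is a number, so client code should not assume that every field has the same type.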

The HTTP reply status is normally 200. 4xx or 5xx values indicate a rejected request or a server problem. The error text field ("err" in JSON output) contains internal diagnostic information only.

HTTP reply status
status  Meaning           Notes
200     OK                Good results are attached.
403     Forbidden         The key value was not accepted.
414     Request too long  The request is unreasonably huge.
410     Gone              This API version has been discontinued. Please upgrade.
502     Overload          Too many requests pending from this IP address
5xx     Server error      Other Server problems

With an HTTP reply status other than 200, a human-readable HTML document will be returned instead of XML or JSON.
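Because a non-200 reply carries HTML rather than XML or JSON, a client would normally check the HTTP status before parsing the body. The following is one possible sketch; the exception class SiteTruthHTTPError and the function check_http_status are hypothetical names, and the messages are taken from the table above.

```python
# Notes copied from the "HTTP reply status" table.
HTTP_STATUS_NOTES = {
    403: "Forbidden: the key value was not accepted.",
    414: "Request too long: the request is unreasonably huge.",
    410: "Gone: this API version has been discontinued.",
    502: "Overload: too many requests pending from this IP address.",
}


class SiteTruthHTTPError(Exception):
    """Hypothetical exception type for non-200 HTTP replies."""


def check_http_status(code):
    """Return normally for HTTP 200; raise SiteTruthHTTPError otherwise."""
    if code == 200:
        return
    if code in HTTP_STATUS_NOTES:
        note = HTTP_STATUS_NOTES[code]
    elif 500 <= code <= 599:
        note = "Server error: other server problems."
    else:
        note = "Unexpected HTTP status."
    # The body is an HTML page in this case, so it must not be parsed as a rating.
    raise SiteTruthHTTPError("%d %s" % (code, note))
```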

Field values

SiteTruth rating codes
rating  Icon                  Meaning
A       Green checkmark       Site ownership verified.
Q       Yellow question mark  Site ownership identified but not verified.
X       Red "do not enter"    Site ownership unknown or questionable.
U       Grey circle           Not rated.
W       Rotating wait icon    Ask again later. (See "Retries" below.)

The rating letter is always in upper case. The icons above may be used in conjunction with SiteTruth ratings.

Additional information
ratinginfo        Meaning
""                (No message)
"error"           An internal error occurred in the rating system.
"no_domain"       The domain name is not valid.
"no_website"      No web site was found at the domain.
"blocked"         Access to the web site was refused (by password or "robots.txt" file).
"no_location"     No street address could be associated with the web site.
"negative_info"   Negative information about this site was found.
"non_commercial"  The site appears to be non-commercial.
"unverified"      Reserved for future use.
"bad_url"         The url field has invalid syntax or has an IP address, not a domain.

These enumeration values will appear exactly as shown, to allow for translation to multiple languages in the client. For simple English display, convert underscores to spaces and make the first letter upper-case.
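The simple English conversion described above takes only a few lines. This is a minimal sketch; the function name ratinginfo_to_english is hypothetical.

```python
def ratinginfo_to_english(code):
    """Convert a ratinginfo value such as "no_website" to display text."""
    if not code:
        return ""  # "" means there is no message to display.
    text = code.replace("_", " ")
    return text[0].upper() + text[1:]


print(ratinginfo_to_english("negative_info"))  # prints: Negative info
```

A client that supports other languages would instead use the raw enumeration value as a lookup key into its own translation table.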

Business information fields - may or may not be present
Field name       Type                                  Meaning
name             text                                  Name of the business.
location         text                                  City of business location
state            text                                  State of business location
countrycode      text                                  Country of business location
activesince      numeric                               Earliest date business known to be active.
confidence       "low", "medium", "high", "test_only"  Data quality for the business data
matchconfidence  "low", "medium", "high", "test_only"  Confidence that the correct business was matched to the domain
salesmin         numeric                               Minimum estimate of annual sales volume
salesmax         numeric                               Maximum estimate of annual sales volume
salescurrency    text                                  Currency unit of annual sales volume

SiteTruth attempts to match domain names to business background information. Some of that information is supplied through this API.

Notes:

  • For businesses with multiple locations, the location will be the corporate headquarters if it can be identified.
  • The earliest date the business is known to be active is the earliest date for which we have records. This will not usually be earlier than 2007.
  • Sales figures are approximate; they are meant only as an indicator of business size. For companies with multiple locations or subsidiaries, the sales figures may be for a part of the company, rather than the entire company. See the SiteTruth details page for more information. For public companies in the US we will usually have a link to Securities and Exchange Commission filings with detailed financial data.
  • The "confidence" field represents the quality of the data source. SEC filings are considered to be of high quality; other data sources less so.
  • The "matchconfidence" field represents the confidence of the match between the domain and the reported business data.

Miscellaneous fields - may or may not be present
Field name  Type  Meaning
note        text  Reserved for future use; will contain short additional textual information.
warning     text  Reserved for future use; will contain short warning messages.

Status codes
status  Meaning                Notes
200     OK                     Normal completion
202     Accepted               Sent with "W" rating - site not yet rated, queued for rating. See below.
500     Internal server error  Internal error, try again later.

See below for how to handle a 202 status. Note that this is the status value in the XML or JSON reply, not the HTTP reply status.

Retries and flow control

If a requested domain is in the SiteTruth database, the rating will be returned immediately. If the domain has not been previously rated, it will be queued for rating, and usually rated within a minute or two. When a site is queued for rating, a rating of "W" and a status of "202" are returned. The request should be retried every 5 seconds, for up to two minutes.

A single XML reply with multiple sitetruth:rating items may contain both 200 and 202 status items. Completed items (status 200) are done, and should not be retried.
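The retry rule can be sketched as a loop that re-requests only the items still marked 202. This is an illustration under stated assumptions: fetch is a hypothetical callable that takes a list of domains and returns a list of dicts with "domain" and "status" keys (for example, a decoded JSON reply), and the "domain" value in each reply is assumed to match the domain that was submitted.

```python
import time

RETRY_INTERVAL_SECS = 5   # Retry every 5 seconds...
MAX_WAIT_SECS = 120       # ...for up to two minutes.


def rate_with_retries(domains, fetch, sleep=time.sleep):
    """Return {domain: reply item}, retrying only items with status 202."""
    results = {}
    pending = list(domains)
    waited = 0
    while pending:
        for item in fetch(pending):
            results[item["domain"]] = item
        # Completed items (status 200) are done; keep only queued ones.
        pending = [
            d for d in pending
            if str(results.get(d, {}).get("status")) == "202"
        ]
        if not pending or waited >= MAX_WAIT_SECS:
            break
        sleep(RETRY_INTERVAL_SECS)
        waited += RETRY_INTERVAL_SECS
    return results
```

Domains that are still queued when the two minutes run out remain in the result with their last "W" reply, so the caller can show the wait icon or try again later.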

Retrying the same request more rapidly than once every 5 seconds may result in blocking of the client's IP address. Clients should cache SiteTruth replies to avoid making the same request repeatedly. A cache expiration time of at least one hour is suggested. We use an expiration time of one week in our demo applications.
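A small in-memory cache with an expiration time might look like the sketch below. The class name RatingCache is hypothetical, the one-hour default follows the suggested minimum above, and skipping replies with status 202 is a design choice of this sketch, since those items still need to be retried.

```python
import time


class RatingCache:
    """Hypothetical in-memory cache of reply items, keyed by domain."""

    def __init__(self, ttl_secs=3600, clock=time.time):
        self._ttl = ttl_secs
        self._clock = clock
        self._entries = {}

    def put(self, reply):
        """Store a reply item (a dict with "domain" and "status" keys)."""
        if str(reply.get("status")) == "202":
            return  # Still queued for rating; do not cache a "W" reply.
        self._entries[reply["domain"]] = (self._clock(), reply)

    def get(self, domain):
        """Return the cached reply, or None if it is absent or expired."""
        entry = self._entries.get(domain)
        if entry is None:
            return None
        stored_at, reply = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[domain]
            return None
        return reply
```

A client would call get before building a request and send only the domains for which it returns None.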

SiteTruth applies "fair queuing" to requests. Multiple requests from a single IP address are permitted but will not yield faster responses. If an excessive number of requests, more than 100, are outstanding from a single IP address, further requests will be rejected with an HTTP reply code of 502 (Overload). This error indicates that the querying program is defective, and is making requests without waiting for the successful completion of previous requests. (This means you, "fwvplab.elet.polimi.it".)
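One simple way to keep the number of outstanding requests low is to group domains into batches and send the batches one after another. The sketch below assumes a hypothetical fetch callable that sends one request for a list of domains and returns the reply items; the batch size of 20 follows the url field count mentioned under "Requests".

```python
def rate_in_batches(domains, fetch, batch_size=20):
    """Send domains in sequential batches and collect all reply items."""
    replies = []
    for start in range(0, len(domains), batch_size):
        batch = domains[start:start + batch_size]
        # The next batch is not sent until this call has returned,
        # so at most one request is outstanding at a time.
        replies.extend(fetch(batch))
    return replies
```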

Obtaining rating details

The application program interface above provides basic information about the site. More detailed information, in the form of a pop-up web page, is available by using URLs of the following form:

http://www.sitetruth.com/fcgi/ratingsummary.fcgi?url=www.ftc.gov

 

This is best displayed as a pop-up page opened in a new window. We suggest opening a browser window with the properties:

'height=600,width=700,toolbar=no,menubar=no,scrollbars=yes,resizable=no,location=no,status=no'

 

This provides a summary page, which includes basic information about the business behind the web site. Information provided when available includes the SiteTruth rating, the business name and address, annual revenue, number of employees, and an aerial photo of the company's location. Buttons which display detailed information from other data sources, such as the U.S. Securities and Exchange Commission, may appear. The page will also contain a link to a larger SiteTruth page with full details, more information than most users will want.

If a SiteTruth rating icon is displayed by a program using the API, it should be made a clickable link of the form above. This allows users to easily obtain the information behind the rating.
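The link target for a rating icon can be built from the summary URL form shown earlier in this section. This is a minimal sketch; the function name rating_summary_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the example summary URL in this section.
SUMMARY_ENDPOINT = "http://www.sitetruth.com/fcgi/ratingsummary.fcgi"


def rating_summary_url(domain):
    """Return the URL of the SiteTruth summary page for a domain."""
    return SUMMARY_ENDPOINT + "?" + urlencode({"url": domain})


print(rating_summary_url("www.ftc.gov"))
# prints: http://www.sitetruth.com/fcgi/ratingsummary.fcgi?url=www.ftc.gov
```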

This feature is currently available only in English.

As of September 2014, the older URL form shown below has been discontinued:

http://www.sitetruth.com/cgi-bin/ratingdetails.cgi?url=www.ftc.gov

 

Terms of use

This service is provided at no charge on a "best-efforts" basis. Ratings reflect the automated opinion of SiteTruth®. This is a beta test. We reserve the right to modify or discontinue this service. We retain copyright in SiteTruth ratings. This service may not be sold, resold, or used in a commercial product without our express written permission. Use of this service in free software (as defined by the Free Software Foundation) is encouraged.

SiteTruth. Search, with less evil.

Another service from the publishers of Downside