This guide for webmasters describes in some detail what SiteTruth looks for when rating a site. It covers the alpha release of SiteTruth.
Most legitimate sites are already ready for SiteTruth. Our initial experience is that about 15% of sites have some easily fixed technical problem which reduces the site's SiteTruth ranking. This guide helps webmasters find and fix those problems.
Analyzing a site
First, use SiteTruth's webmaster analysis tool to see the details of site analysis.
How SiteTruth examines a site
SiteTruth reads through the first few pages of a web site, looking for seals of approval, mailing addresses, and indicators that the site is selling something. The whole site is not processed: SiteTruth examines no more than twenty pages, and no more than 1 megabyte of each page.
SiteTruth starts at the site's home page and follows links likely to contain information identifying the business behind the web site. The text associated with the link must contain a word that indicates some likelihood of finding a street address. The current word list includes "About", "Site Map", "Contact", "Location", "English" (for non-English sites with an English language page), "Order", "Return", "Checkout", "Cart", and similar words. Capitalization does not matter. We suggest a "Contact" or "Contact Us" page.
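As an illustration only, the link selection described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not SiteTruth's actual code: the keyword list is just the partial list quoted above, matching is a plain case-insensitive substring test, and the names `LinkCollector` and `candidate_links` are hypothetical.

```python
from html.parser import HTMLParser

# Partial keyword list quoted in this guide; the real list contains more words.
KEYWORDS = ("about", "site map", "contact", "location", "english",
            "order", "return", "checkout", "cart")


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside an <a href=...> element counts as link text.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def candidate_links(html):
    """Return hrefs whose link text contains a keyword, ignoring case."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href, text in collector.links
            if any(word in text.lower() for word in KEYWORDS)]
```

With this sketch, a link labeled "CONTACT US" or "About our company" would be followed, while a link labeled "News" would not.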
Only on-site links are followed. "On-site" is interpreted broadly: subdomains, and multiple domains pointed to the same IP address, are treated as part of the same site.
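A rough sketch of an on-site test follows. It covers only the subdomain case; the same-IP-address case is left out because it needs a DNS lookup. The helper names are hypothetical, and taking the last two labels of the host name as the base domain is a simplifying assumption that is wrong for suffixes such as "co.uk".

```python
from urllib.parse import urljoin, urlsplit


def base_domain(host):
    """Naive base domain: the last two labels of the host name (an assumption)."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def is_on_site(link, page_url):
    """True if link, resolved against page_url, stays on the same base domain."""
    target = urlsplit(urljoin(page_url, link))
    if target.scheme not in ("http", "https"):
        return False  # mailto:, javascript:, and similar links are never followed
    source_host = urlsplit(page_url).hostname or ""
    return base_domain(target.hostname or "") == base_domain(source_host)
```

Under these assumptions, a link from www.example.com to shop.example.com counts as on-site, while a link to example.org does not.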
Street addresses outside the United States must end with a country name or country code to be recognized. Only street addresses in Roman character sets are currently recognized.
The SiteTruth parser will make an attempt to parse incorrectly formatted web pages, but may not find addresses or links on them. We don't insist on perfect HTML or XHTML, but if a page is totally rejected as unparseable by the W3C validator, do not expect SiteTruth to parse it.
SiteTruth looks through each web site for "the legal name and ... the complete street address from which the business is actually conducted", in line with California B&P Code § 17538 and the European Directive on Electronic Commerce. We interpret this to mean an address formatted as an ordinary mailing address.
SiteTruth is looking for standard mailing addresses, like these:
1234 Example St.<br>
Example, IL 12345
24 Grosvenor Square<br>
London, W1A 1AE<br>
United Kingdom
The street address can be inside a <table> or <div>, or decorated with <font> and <span> tags. It should be surrounded by some white space, using either <p>, <br>, or some other enclosing tag such as <li>, <td>, or <div>. This handles most real-world cases.
Street addresses within an <a> tag will not be recognized. Watch for unclosed tags. Also, <li> tags outside a <ul> or <ol> result in problems. Street addresses with each line in a separate table box or <div> will not be recognized. Nor will addresses displayed only as images, even if there is an "alt" attribute on the image.
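To give a feel for what "mailing address format" means to a program, here is a deliberately rough Python sketch that looks for a US-style street line followed by a "City, ST 12345" line. The patterns and the function name `find_us_addresses` are assumptions made for illustration. SiteTruth's real parser is not described in this guide, and this sketch does not reproduce its rules about <a> tags or table cells.

```python
import re

# Rough pattern for the last line of a US address: "City, ST 12345" or ZIP+4.
US_LAST_LINE = re.compile(r"^[A-Za-z .'-]+,\s*[A-Z]{2}\s+\d{5}(-\d{4})?$")
# Rough pattern for a street line: a house number followed by a street name.
STREET_LINE = re.compile(r"^\d+\s+\S.*$")


def find_us_addresses(html_fragment):
    """Return (street, city_line) pairs found on consecutive text lines."""
    # Treat <br> and <p> as line breaks, then drop any remaining tags.
    text = re.sub(r"(?i)<\s*(br|/?p)\b[^>]*>", "\n", html_fragment)
    text = re.sub(r"<[^>]+>", "", text)
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [(a, b) for a, b in zip(lines, lines[1:])
            if STREET_LINE.match(a) and US_LAST_LINE.match(b)]
```

An address that exists only inside an image tag yields nothing here, because all tags, including their "alt" text, are discarded before matching.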
Our experience is that if a site has a standard mailing address on the site in a reasonable place, SiteTruth will find it.
SiteTruth matches these addresses against data sources such as business directories.
Seals of approval
SiteTruth recognizes certain "seals of approval" as providing a guarantee of business identity. In the initial version of SiteTruth, the only approved seal is that of the Better Business Bureau Online.
BBBonline seals normally consist of an image of the BBB seal with a clickable link in this format:
SiteTruth recognizes such links, and will then check the BBBonline web site, performing the "click to verify" operation that users are supposed to do but never do. If the site is verified successfully, the BBBonline seal summary will read:
Business name: Example, Inc.
429 F Street
Davis, CA 95616
BBBonline seal valid, confirmed by BBBonline database.
Currently, a valid BBBonline seal will give a site the top SiteTruth rating, unless other serious negative information is discovered. We suggest obtaining a BBBonline seal as a low-cost way for small businesses to achieve high SiteTruth ratings.
Seal links are checked. The ID number in the URL must match the BBB database, and the BBB's database must specify the domain of the site with the link. Otherwise, this will appear:
BBBonline seal appears to be misused.
Error: This company is currently not active under the BBB Online Reliability Seal Program.
If that message appears, your site will receive a very low rating.
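The two conditions for a valid seal link can be sketched as follows. The dictionary stands in for the BBBonline database, which in reality is queried over the web. The seal ID "1001", the function name `check_seal`, and the exact comparison are all hypothetical. The seal ID is assumed to have been extracted from the link URL already.

```python
# Hypothetical stand-in for the BBBonline database: seal ID -> registered domain.
SEAL_DATABASE = {"1001": "example.com"}


def check_seal(seal_id, linking_domain, database):
    """Return the summary line for a seal link found on linking_domain."""
    registered = database.get(seal_id)
    # Both conditions must hold: the ID exists, and it is registered
    # to the domain that carries the seal link.
    if registered is not None and registered.lower() == linking_domain.lower():
        return "BBBonline seal valid, confirmed by BBBonline database."
    return "BBBonline seal appears to be misused."
```

Copying another site's seal link fails the second condition: the ID is real, but it is registered to a different domain.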
Other seal programs may be recognized in the future. Such programs would need to meet standards comparable to those of BBBonline, or the CA/Browser Forum standards for issuing "Extended Validation" certificates.
For SiteTruth's purposes, seals of approval from certificate issuers are redundant, since SiteTruth can check business identity via the site's SSL certificate.
SiteTruth looks at SSL certificates in some detail. If SiteTruth is satisfied with the certificate, the SSL certificate summary will read:
This valid certificate has sufficient information to identify the business.
If that message appears, there's no problem. If it does not appear, there is no SSL certificate or the certificate is being ignored, and this appears instead:
No valid certificate.
The "Details" section will contain a more detailed error message, such as one of these:
SSL certificate validation failed (timed out)
SSL certificate validation failed (unexpected eof)
SSL certificate validation failed ((0, 'Error'))
These messages all indicate that the site did not return a certificate on request. SiteTruth opens the base domain of the site with SSL and reads any certificate offered. This depends only on the domain; no URL is involved. Because of the way SSL works, the web server must present a certificate before receiving a URL. So SiteTruth just opens a secure connection, performs the secure handshake, obtains a certificate, and closes the connection without reading a URL.
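Webmasters who want to see what their own server offers can reproduce this kind of handshake-only probe with Python's standard ssl module. The sketch below is an approximation, not SiteTruth's code: the function name `fetch_certificate` and the error message format, loosely modeled on the messages above, are assumptions.

```python
import socket
import ssl


def fetch_certificate(host, port=443, timeout=10.0):
    """Open a TLS connection, read the server certificate, and close.

    Returns (der_bytes, None) on success or (None, message) on failure.
    No HTTP request is sent, so no URL is involved.
    """
    # Verification is disabled here so that any certificate offered can be
    # read; validating it is a separate, later step.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return tls.getpeercert(binary_form=True), None
    except OSError as exc:  # includes ssl.SSLError, refusals, and timeouts
        return None, f"SSL certificate validation failed ({exc})"
```

Calling `fetch_certificate("www.example.com")` against a server with nothing on port 443 returns an error message rather than raising, which mirrors the variety of failure messages listed above.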
There seems to be no accepted standard on the proper way for a web server to reject such a request. Some servers refuse to open a connection on the HTTPS port (443). Some open a connection and close it without sending anything, and some don't respond at all. Thus, an assortment of error messages can appear.
Once a certificate has been read, it is validated. First, the SSL certificate chain is validated. SiteTruth trusts the same set of root certificates as Firefox. Any certificate not traceable back to one of those root certificates will be rejected. Expired certificates will also be rejected.
The certificate must, of course, be tied to the correct domain, to prevent spoofing by copying another site's certificate. This can cause problems in some legitimate cases:
Wrong host for SSL certificate. Certificate for "pro4.abac.com", actual host "www.countrysidecabinetry.com".
Here, the hosting provider has returned their own certificate, not one for the hosted domain. Such certificates are ignored. SiteTruth applies no rating penalty for this. We do some special handling so that the presence or absence of "www." at the beginning of a domain name won't cause problems.
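The host comparison, including the "www." allowance, might look roughly like this. It is a sketch only: the function names are hypothetical, and wildcard certificates and alternative names in the certificate are not handled.

```python
def strip_www(host):
    """Lower-case a host name and drop one leading "www." if present."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def host_matches(cert_host, actual_host):
    """True if the two names are equal once a leading "www." is ignored."""
    return strip_www(cert_host) == strip_www(actual_host)


def describe_host_check(cert_host, actual_host):
    """Return None when the hosts match, else a message like the one above."""
    if host_matches(cert_host, actual_host):
        return None
    return (f'Wrong host for SSL certificate. Certificate for "{cert_host}", '
            f'actual host "{actual_host}".')
```

Here a certificate for "example.com" is accepted for "www.example.com", while a hosting provider's own certificate is reported as a wrong host.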
The certificate must identify the site owner, not just the domain. Domain-only certificates, which contain no useful information about the business, are rejected:
This certificate identifies the domain only, not the actual business.
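One simple way to tell a domain-only certificate from one that names a business is to look for an organization name in the certificate subject. The sketch below assumes the certificate has already been parsed into the dictionary form that Python's `ssl.SSLSocket.getpeercert()` returns after a validated handshake. Treating a missing `organizationName` as "domain only" is our assumption for illustration, not a statement of SiteTruth's exact rule.

```python
def subject_fields(cert):
    """Flatten the 'subject' entry of a getpeercert()-style dictionary."""
    return {name: value
            for rdn in cert.get("subject", ())
            for name, value in rdn}


def identifies_business(cert):
    """True if the subject names an organization, not just a domain."""
    return bool(subject_fields(cert).get("organizationName"))


# Example subjects in getpeercert() form; the values are made up.
domain_only = {"subject": ((("commonName", "www.example.com"),),)}
business = {"subject": ((("countryName", "US"),),
                        (("stateOrProvinceName", "California"),),
                        (("organizationName", "Example, Inc."),),
                        (("commonName", "www.example.com"),))}
```

With these sample values, `identifies_business(business)` is true and `identifies_business(domain_only)` is false.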
The contents of the certificate are checked. SiteTruth looks for these elements of the certificate as it works to obtain the name and address of the business:
||2081 N. Webb Rd
||Required for CA and US
||Required in EV certificates
||Required in EV certificates
||Corporation registration number
||Required in EV certificates
Most, but not all, SSL certificates issued by the major certificate authorities do contain the appropriate information. All the "Required" fields must be present, or the certificate will be rejected.
Currently, a valid SSL certificate that meets all the criteria above will give a site the top SiteTruth rating, unless other serious negative information is discovered.
SiteTruth recognizes "Extended Validation" certificates, but does not at present give them any greater value than other certificates bearing all the required fields that identify a business.
The top problems SiteTruth finds in web sites
- Seriously unbalanced HTML tags, which can cause part of a page to be completely ignored.
- Invalid or expired SSL certificates on web sites.
- Expired or invalid seals of approval.
- Business addresses that aren't in mailing address format.
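For the first problem in the list above, a quick self-check for unbalanced tags can be written with Python's standard html.parser module. This is a strict sketch with hypothetical names. It also flags closing tags that HTML allows authors to omit (for example </p> or </li>), so treat its output as hints rather than errors.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatches."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag in self.stack:
            # Everything opened after the matching tag was left unclosed.
            while self.stack[-1] != tag:
                self.problems.append(f"unclosed <{self.stack.pop()}>")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}>")


def tag_problems(html):
    """Return a list of human-readable tag balance problems."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    leftovers = [f"unclosed <{tag}>" for tag in reversed(checker.stack)]
    return checker.problems + leftovers
```

Running `tag_problems` over a contact page before publishing is a cheap way to catch an unclosed tag near the street address.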