Web Crawling and Black List

It is best practice to define black and white lists to control which parts of the web application are scanned. Set up a black list to identify the URLs you do not want the service to scan. Any link that matches a black list entry will not be scanned unless it also matches a white list entry. Set up a white list to identify the URLs you want to be sure the service scans. When you set up a white list only (no black list), no links are crawled unless they match a white list entry.
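
The precedence rules above can be sketched in Python. This is a minimal illustration of the behavior described here, not the service's implementation; the function name should_crawl and the regex-style list entries are assumptions made for the example.

    import re

    def should_crawl(url, black_list, white_list):
        """Decide whether the crawler should request a URL.

        Behavior mirrored from the description above:
          - A URL matching a black list entry is skipped unless it also
            matches a white list entry.
          - If only a white list is defined (no black list), only URLs
            matching a white list entry are crawled.
        """
        matches_black = any(re.search(p, url) for p in black_list)
        matches_white = any(re.search(p, url) for p in white_list)

        if white_list and not black_list:
            # White list only: crawl nothing that is not explicitly listed.
            return matches_white
        if matches_black:
            # Black listed, but a white list match overrides the exclusion.
            return matches_white
        return True

    # Example: exclude the admin area but still allow the reports page.
    black_list = [r"/admin/"]
    white_list = [r"/admin/reports"]
    print(should_crawl("https://example.com/admin/settings", black_list, white_list))  # False
    print(should_crawl("https://example.com/admin/reports", black_list, white_list))   # True
    print(should_crawl("https://example.com/catalog", black_list, white_list))         # True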

The web crawler automatically follows links it encounters in web page HTML. To do this, it makes requests to the web server just as the web application itself does when users take certain actions. For example, when the crawler encounters a button in a form, it submits a request for the button's URI to the web server. These requests can have undesirable effects. For example, if the button is designed to "delete all" data, such as accounts and configurations, then all of that data would be deleted. In an administrator web application, a button might change the authentication type for the subscription account, changing the authentication behavior for all users of the web application.
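
Building on the should_crawl sketch above, a black list can keep the crawler away from such state-changing URLs. The paths below are hypothetical examples only and do not correspond to any real application.

    # Hypothetical black list entries that keep the crawler away from
    # state-changing administrator actions (example paths only).
    destructive_black_list = [
        r"/admin/delete_all",
        r"/admin/authentication",   # changing the auth type affects all users
    ]

    for url in ("https://example.com/admin/delete_all",
                "https://example.com/reports/view"):
        print(url, should_crawl(url, destructive_black_list, white_list=[]))
    # Only the /reports/view URL would be crawled; the destructive
    # administrator actions are excluded.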


Related Topics

Creating Web Applications

Web Application: Black/White Lists