Content Grabber is used for web scraping and web automation. It can extract content from almost any website and save it as structured data in a format of your choice, including Excel reports, XML, CSV and most databases. Web scraping is the process of extracting data from websites and storing it in a structured, easy-to-use format. The value of a web-scraping tool like Content Grabber is that you can easily specify and collect large amounts of source data, even when that data is highly dynamic (changing very frequently).
Usually, data available on the Internet has little or no structure and is only viewable with a web browser. Elements such as text, images, video, and sound are built into a web page so that they are presentable in a web browser. It can be very tedious to manually capture and separate this data, and can require many hours of effort to complete. With Content Grabber, you can automate this process and capture website data in a fraction of the time that it would take using other methods.
Web-scraping software interacts with websites in the same way as you do when using your web browser. However, in addition to displaying the data in a browser on your screen, web-scraping software saves the data from the web page to a local file or database.
Performance & Scalability
Content Grabber was designed from the very beginning with performance and scalability as the top priority. Multi-threading is used wherever appropriate to limit common web scraping bottlenecks such as web page retrieval.
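Because page retrieval is I/O-bound, running fetches concurrently is the standard way to remove this bottleneck. The following is a generic illustration of the idea in Python, not Content Grabber's actual implementation; the `fetch` function is a stand-in that simulates network latency with a short sleep.

```python
# Generic sketch of multi-threaded page retrieval (illustrative only, not
# Content Grabber internals): I/O-bound fetches run on a thread pool so many
# pages load concurrently instead of one after another.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for a real HTTP request; the sleep simulates network latency.
    time.sleep(0.1)
    return (url, "<html>...</html>")

urls = [f"https://example.com/page/{i}" for i in range(20)]

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    pages = dict(pool.map(fetch, urls))
elapsed = time.time() - start

# With 10 workers, 20 fetches of ~0.1 s each finish in roughly 0.2 s
# rather than the ~2 s a sequential loop would take.
print(len(pages))
```

The same pattern applies whether the "browser" is a full dynamic browser or a lightweight HTML parser: the pool size caps concurrency while the threads overlap their waiting time.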
Optimized web browsers
Web browsers are used to load and parse web pages, and Content Grabber has a range of different browsers to achieve maximum performance in every scenario, from a fully dynamic web browser to an ultra-fast HTML5-parser-only browser. Different types of browsers can be used on the same website, and Content Grabber will normally use many browsers at the same time, all running multi-threaded.
All web scraping tools spend most of their time waiting for new web pages to load, so it's important to optimize this process. Content Grabber will automatically optimize page loads, but will also allow you to get under the hood to fine-tune every aspect of the process.
Web scraping is notoriously unreliable and will often fail because of problems you have no control over. We understand that reliability is extremely important in many situations, so we have tackled this difficult issue head on and added strong support for debugging, error handling and logging.
Content Grabber has one of the best debuggers of any web automation software, helping you build reliable agents by catching, at design time, every issue that can be resolved there.
Many web scraping errors are unavoidable even with the best-designed agents, and this is where error handling comes into play. One example is an unreliable website that suddenly starts returning only error pages and requires a web browser restart before it functions again.
Many dynamic websites have bugs causing errors that are impossible to handle gracefully. Dynamic websites are small applications running in your web browser, and they may crash, hang, leak memory or cause many other fatal issues.
Content Grabber uses a health monitor process that looks for problems in the running web browsers, and restarts browsers that have run into trouble. A restarted web browser will continue from the point where it failed, so in most situations, this will not cause any interruption to the web scraping process.
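The supervisor-and-resume idea described above can be sketched as follows. This is a simplified illustration with made-up names, not Content Grabber's internals: a supervising loop catches a simulated "browser crash", restarts the worker, and resumes from the last page that completed, so no progress is lost.

```python
# Simplified sketch of a health-monitor pattern (hypothetical names, not
# Content Grabber internals): restart a crashed worker and resume from the
# point of failure instead of starting over.
def scrape_pages(pages, start_index, fail_at=None):
    """Process pages from start_index; raise to simulate a browser crash."""
    for i in range(start_index, len(pages)):
        if i == fail_at:
            raise RuntimeError("browser crashed")
        yield i, f"data from {pages[i]}"

def supervised_scrape(pages, max_restarts=3):
    results, index, restarts = [], 0, 0
    while index < len(pages):
        try:
            # Inject one crash on the first attempt only, to demonstrate recovery.
            fail_at = 2 if restarts == 0 else None
            for i, data in scrape_pages(pages, index, fail_at):
                results.append(data)
                index = i + 1  # remember the last completed page
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise
            # "Restart the browser" and loop again from the failure point.
    return results, restarts

pages = ["page-a", "page-b", "page-c", "page-d"]
data, restarts = supervised_scrape(pages)
print(len(data), restarts)  # all 4 pages scraped after 1 restart
```

The key design point is that progress is tracked outside the worker, so a restart picks up exactly where the crash occurred.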
Logging & notifications
Some website errors may occur very rarely, and may be impossible to catch during debugging. An example could be CAPTCHA protection that appears after hours of web scraping, or simply a broken Internet connection. Content Grabber can log all activity and errors, including the full HTML of web pages that are causing problems. This makes it much easier to identify runtime errors and take appropriate action to resolve them.
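The value of logging the full page HTML is that a failure which only surfaces hours into a run can still be diagnosed afterwards. A minimal sketch of the idea (illustrative only; Content Grabber's own logging is configured through its UI, and `handle_page` is a hypothetical helper):

```python
# Minimal sketch of error logging that captures the offending page's HTML
# (illustrative only, not Content Grabber's logging API).
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent")

def handle_page(url, html):
    if "captcha" in html.lower():
        # Record the error together with the full HTML so the failure can be
        # diagnosed later, even if it only appears hours into a run.
        log.error("CAPTCHA detected at %s\n--- page HTML ---\n%s", url, html)
        return None
    return html

handle_page("https://example.com/items",
            "<html><body>CAPTCHA required</body></html>")
```

In a real deployment the log would go to a file or central store rather than the console, so that an administrator can review it after the run.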
Notifications can alert an administrator to specific problems, such as missing web content or other errors.
Content Grabber can email status reports to an administrator when errors or notifications have occurred during web scraping.
The Content Grabber agent editor has a typical point-and-click user interface where you click on the content you want to extract, or on the buttons and links you want to follow.
The agent editor sets itself apart from the crowd with its built-in smarts that automatically detect and configure all commands. It will automatically create lists of content and links, handle pagination and web forms, download or upload files, and configure any other action you perform on a web page. At the same time, you always have the option to manually fine-tune the commands, so Content Grabber gives you both simplicity and control.
The Content Grabber agent editor is so simple to use that it can easily be used by beginners, and the built-in smarts enable users to quickly build large numbers of web scraping agents.
Data is everything when it comes to web scraping. Content Grabber allows you to load data from any source and use it in your agents for anything you need. You can also export extracted data to almost anywhere. This flexibility is key — enabling your technology to grow with your business.
Once data has been extracted and exported, it can be distributed by email, FTP or a custom defined destination.
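As a concrete illustration of exporting extracted rows to one of the structured formats mentioned above (CSV), here is a short generic sketch; the row data is made up and the code is not tied to Content Grabber's export pipeline.

```python
# Generic illustration of writing extracted records as CSV (one of the
# structured output formats mentioned above); the rows here are made up.
import csv
import io

rows = [
    {"title": "Widget A", "price": "9.99"},
    {"title": "Widget B", "price": "14.50"},
]

buf = io.StringIO()  # in a real export this would be a file or database
writer = csv.DictWriter(buf, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The same records could just as easily be serialized to XML or inserted into a database; the point is that the extracted data is structured and therefore trivially convertible between destinations.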
Agent Management Tools
Content Grabber is designed to manage hundreds of agents in a professional web scraping environment with development, testing and production servers.
Logs, schedules and status information for all agents can be managed in one centralized location, and all proxies, database connections and script libraries can be managed on a per server basis.
No one wants to write scripts to get things done, and with Content Grabber you rarely have to. However, if you have unusual requirements, or you need to fine-tune a process, it's nice to know the ability is there.
Content Grabber has a fully fledged built-in script editor with IntelliSense that is more than capable of handling smaller scripts.
Distribute Executable Agents Royalty Free
Build royalty free self-contained web scraping agents that can run anywhere without the Content Grabber software. A self-contained agent is a single executable file that is easy to send or copy anywhere, and has a multitude of powerful configuration options.
You are free to sell or give away your self-contained agents, and you can add promotional messages and advertisements to the agents' user interface. Content Grabber branding and advertisements are also included. Note: if you want to white-label your self-contained agent, you will need the Premium Edition of Content Grabber.
You can run agents from the command-line by using the Content Grabber command-line program. With this you can specify command-line parameters that can easily be used as input data by your agents.
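The pattern of turning command-line parameters into agent input data can be sketched generically as follows. This is a hypothetical script using Python's `argparse`, not the actual Content Grabber command-line program; the parameter names are invented for illustration.

```python
# Generic sketch of command-line parameters as agent input data
# (hypothetical script and parameter names, not the Content Grabber CLI).
import argparse

def parse_agent_args(argv):
    parser = argparse.ArgumentParser(description="Run a scraping agent")
    parser.add_argument("--search-term", required=True,
                        help="value the agent types into the site's search box")
    parser.add_argument("--max-pages", type=int, default=10,
                        help="stop paginating after this many pages")
    return parser.parse_args(argv)

# Simulate invoking the agent as:
#   agent --search-term laptops --max-pages 5
args = parse_agent_args(["--search-term", "laptops", "--max-pages", "5"])
print(args.search_term, args.max_pages)  # laptops 5
```

Driving agents this way makes them easy to schedule or chain: a wrapper script or cron job supplies different parameter values on each run without touching the agent itself.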