How to Archive Better with Spider

Utilize Full Resource Storing

Spider lets you enable Full Resource storing in your account settings or in an individual website’s configuration. With this feature on, all the elements of a website, including images, scripts, and stylesheets, are captured. Here’s how to enable it:

  1. Access Settings: Log into your Spider account and navigate to the settings panel.
  2. Enable Full Resource Storing: Find the option for Full Resource storing and toggle it on. This will ensure that every resource on the website is archived.
  3. Configure Website-Specific Settings: For specific websites, you can override these settings on the website configuration screen, so the archiving process fits the needs of each site.
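
To get a feel for what a full-resource capture has to cover, the following minimal sketch lists the images, scripts, and stylesheets that a single HTML page references. It uses only the Python standard library and does not call Spider; the names `ResourceCollector` and `list_page_resources` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects image, script, and stylesheet URLs referenced by one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = {"images": [], "scripts": [], "stylesheets": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.resources["images"].append(urljoin(self.base_url, attrs["src"]))
        elif tag == "script" and attrs.get("src"):
            self.resources["scripts"].append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet"> counts as a stylesheet here.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources["stylesheets"].append(
                    urljoin(self.base_url, attrs["href"])
                )


def list_page_resources(html, base_url):
    """Return a dict of absolute resource URLs found in the given HTML."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

Running this against a page before and after enabling Full Resource storing gives you a checklist of files you would expect to find in the archive.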

Schedule Regular Crawls

Regularly scheduled crawls can help maintain up-to-date archives of your websites. Here’s how to set them up:

  1. Navigate to Crawl Scheduler: Within Spider, go to the Crawl Scheduler section from the dashboard.
  2. Set Frequency and Timing: Choose how often you want the crawl to take place (daily, weekly, monthly) and set the specific timing according to your preference.
  3. Select Websites for Crawling: Choose the websites you wish to include in the regular crawling schedule.
  4. Activate Scheduling: Save your settings to activate the scheduled crawls.
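
If you want to reason about when crawls will fire, or mirror the schedule in your own tooling, the frequency options above can be sketched as a small date calculation. This is an illustration only, not how Spider's scheduler is implemented; `next_crawl_time` is a hypothetical helper, and the choice to clamp monthly runs to the last day of shorter months is an assumption.

```python
import calendar
from datetime import datetime, timedelta


def next_crawl_time(last_run, frequency):
    """Return the next run time after last_run for a daily, weekly, or monthly schedule."""
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "monthly":
        # Roll over to January of the next year after December.
        year = last_run.year + (last_run.month // 12)
        month = last_run.month % 12 + 1
        # Clamp the day so that, e.g., the 31st maps to the last day of a shorter month.
        last_day = calendar.monthrange(year, month)[1]
        return last_run.replace(year=year, month=month, day=min(last_run.day, last_day))
    raise ValueError(f"unsupported frequency: {frequency!r}")
```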

Leverage Incremental Crawling

Incremental crawling captures only the changes made since the last crawl, which saves time and storage space. Here’s how to use it:

  1. Enable Incremental Crawling: In the website configuration panel, enable the incremental crawling feature.
  2. Configure Change Detection: Set up criteria for change detection, such as monitoring file modifications or specific webpage elements.
  3. Run Incremental Crawls: Once set up, Spider will save only the changes, making the archiving process more efficient.
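
One common way to detect changes between crawls is to compare content hashes. The sketch below shows that general idea under the assumption that you have the raw bytes of each page; it is not a description of Spider's internal change detection, and `detect_changes` is a hypothetical name.

```python
import hashlib


def detect_changes(previous_hashes, current_pages):
    """Compare page contents against hashes from the previous crawl.

    previous_hashes maps URL -> SHA-256 hex digest from the last crawl.
    current_pages maps URL -> raw page bytes from this crawl.
    Returns (changed_urls, new_hashes); new or modified pages count as changed.
    """
    new_hashes = {}
    changed_urls = []
    for url, content in current_pages.items():
        digest = hashlib.sha256(content).hexdigest()
        new_hashes[url] = digest
        if previous_hashes.get(url) != digest:
            changed_urls.append(url)
    return changed_urls, new_hashes
```

Persist `new_hashes` after each run and pass it back in as `previous_hashes` next time, so that only the pages in `changed_urls` need to be stored again.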

Download and Store Data Effectively

Downloading and storing your data in an orderly way ensures that you can find and access your archives when you need them. Here’s how to manage it effectively:

  1. Run Crawls and Download Data: After running your scheduled or manual crawls, go to the download section to obtain your archived data.
  2. Organize Downloads: Store the downloaded data in a well-organized file system. Consider using a dedicated folder structure that categorizes data by date or website.
  3. Backup Archives: Regularly back up your downloaded archives to an external storage solution or cloud-based service to prevent data loss.
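
The folder structure and backup steps above can be sketched as follows. The layout (one folder per website, then one per crawl date) and the `manifest.json` checksum file are assumptions made for illustration, and `archive_dir` and `write_manifest` are hypothetical helpers, not part of Spider.

```python
import hashlib
import json
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def archive_dir(root, site_url, crawl_date):
    """Build a path of the form <root>/<hostname>/<YYYY-MM-DD>."""
    host = urlparse(site_url).netloc or site_url
    return Path(root) / host / crawl_date.isoformat()


def write_manifest(directory):
    """Record a SHA-256 checksum for every file so backups can be verified later."""
    directory = Path(directory)
    manifest = {}
    for path in sorted(directory.rglob("*")):
        if path.is_file() and path.name != "manifest.json":
            relative = path.relative_to(directory).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    (directory / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

For example, `archive_dir("archives", "https://example.com", date(2024, 5, 1))` points at `archives/example.com/2024-05-01`. After copying that folder to external or cloud storage, recomputing the checksums and comparing them with the manifest confirms the backup is intact.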

Monitor and Review Archives

Regularly monitoring and reviewing your archives helps maintain their integrity. Here’s how to perform these tasks:

  1. Periodic Reviews: Schedule regular reviews of your archived data to ensure everything is captured accurately.
  2. Compare and Validate: Compare current live websites with your archives to validate the completeness of the stored data.
  3. Issue Tracking: Keep an issue tracker for any discrepancies or missing elements within your archives, and address them promptly using Spider’s tools.
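
The compare-and-track steps above can be sketched as a simple set comparison between the URLs found on the live site and the URLs present in the archive. This assumes you already have both lists; `ArchiveIssue` and `find_discrepancies` are hypothetical names, and the issue records are meant as input for whatever tracker you use.

```python
from dataclasses import dataclass


@dataclass
class ArchiveIssue:
    url: str
    problem: str


def find_discrepancies(live_urls, archived_urls):
    """Return issues for URLs that appear on only one side of the comparison."""
    live, archived = set(live_urls), set(archived_urls)
    issues = [
        ArchiveIssue(url, "missing from archive") for url in sorted(live - archived)
    ]
    issues += [
        ArchiveIssue(url, "no longer on live site") for url in sorted(archived - live)
    ]
    return issues
```

An empty result means the two lists match; it does not prove that the archived content itself is identical to the live content.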

By combining these features and best practices, you can make your website archiving process with Spider more effective and keep your data stored accurately and efficiently for future reference.