ICONICO

Data Extractor (for Windows)

Extract any data, including email addresses and URLs, from your files and webpages.

Posted in the Data Extractor Forum.




Tips for fast extraction

Here are a few tips for speedy data extraction. As an example, I recently extracted 26 fields from over 36,000 pages in only a few minutes (note that this was only the extraction time, not the time to download the pages).

1) Download all the pages locally to your hard drive. There are various automated ways to do this depending on how the site of interest is structured, so I can't give a simple "works for all sites" solution here, but you need to save just the HTML and ignore images. For the site I recently worked on, I used Excel to generate a list of sequentially numbered URLs, then made a web page with all those URLs as links and used the Firefox add-on "DownThemAll!" to download them.
2) Split the downloaded pages up into separate directories with no more than 5,000 pages in each directory. THIS IS IMPORTANT!
3) Run Data Extractor and tell it to extract from each directory in turn, saving the results each time.
4) Recombine the results in Word etc. and bingo, you're done.
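
For anyone who would rather script the file shuffling than do it by hand, here is a rough Python sketch of steps 1, 2 and 4. It is not part of Data Extractor and not the method used in the post above (that was Excel plus DownThemAll!). The function names, the `batch_001` style directory names, and the idea that each result file is plain text that can simply be concatenated are all assumptions made for illustration.

```python
import shutil
from pathlib import Path

MAX_PER_DIR = 5000  # the per-directory limit suggested in tip 2


def make_urls(template, first, last):
    """Step 1 helper: build sequentially numbered URLs.

    `template` is a hypothetical pattern such as
    "http://example.com/page{}.html"; adjust it to the real site.
    """
    return [template.format(n) for n in range(first, last + 1)]


def split_into_batches(source_dir, dest_root, max_per_dir=MAX_PER_DIR):
    """Step 2: move the HTML files in source_dir into numbered
    sub-directories of dest_root, at most max_per_dir files in each."""
    source = Path(source_dir)
    dest_root = Path(dest_root)
    # Only plain files with an HTML extension; sub-directories are skipped.
    pages = sorted(
        p for p in source.iterdir()
        if p.is_file() and p.suffix.lower() in {".htm", ".html"}
    )
    batch_dirs = []
    for start in range(0, len(pages), max_per_dir):
        batch = dest_root / f"batch_{start // max_per_dir + 1:03d}"
        batch.mkdir(parents=True, exist_ok=True)
        for page in pages[start:start + max_per_dir]:
            shutil.move(str(page), str(batch / page.name))
        batch_dirs.append(batch)
    return batch_dirs


def combine_results(result_files, combined_path):
    """Step 4: concatenate the saved result files into one file.

    Assumes plain-text results; if each file starts with a header row,
    the repeated headers would still need removing afterwards.
    """
    with open(combined_path, "w", encoding="utf-8") as out:
        for path in result_files:
            text = Path(path).read_text(encoding="utf-8")
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
```

Typical use would be something like `split_into_batches("downloaded", "batches")`, then pointing Data Extractor at each `batch_...` directory in turn (step 3 is still done in the program itself), and finally calling `combine_results` on the saved result files.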

by Martin King on Jan 27 2010 5:19pm
