A System for Gathering Data From Classifieds Websites
A program that gathers data from classifieds websites. Web crawlers imitate the actions of a website user and collect the required information. In addition to text data, the robots also recognize information contained in images: locations, phone numbers, etc.
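As an illustration of the image-recognition step, the sketch below shows one way phone numbers could be pulled out of text after OCR. It assumes the OCR engine (for example Tesseract) has already turned the image into a plain string. The pattern, the function name `extract_phone_numbers`, and the length limits are illustrative assumptions, not the project's actual code.

```python
from __future__ import annotations

import re

# Assumed pattern: an optional "+", then digits mixed with common separators.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_phone_numbers(ocr_text: str) -> list[str]:
    """Return phone numbers found in OCR output, reduced to digits
    with an optional leading '+'. (Hypothetical helper.)"""
    numbers = []
    for match in PHONE_PATTERN.finditer(ocr_text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)  # drop spaces, brackets, dashes
        # Assumed sanity limits on the number of digits in a phone number.
        if 7 <= len(digits) <= 15:
            numbers.append(("+" if raw.startswith("+") else "") + digits)
    return numbers


# Example with a fictional number, as it might come out of an OCR pass.
print(extract_phone_numbers("Call: +1 (555) 010-9999 after 6pm"))
```

Real OCR output tends to be noisier (for example "O" read in place of "0"), so a production robot would likely need extra cleanup before a pattern like this is applied.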
SOLUTION
We have implemented autotests that check the functionality of the websites. Collecting information from one resource takes 3-6 days. Therefore, before starting the robots, the tests check whether the functionality or the location of the page blocks has changed, so that the robots do not get lost.
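A pre-run layout check of this kind can be sketched as follows. This is a minimal example, assuming each robot depends on a few page blocks that can be located with XPath-style selectors. The selector names, class names, and sample markup are hypothetical. The sketch uses the limited XPath support in Python's standard library on well-formed markup; real pages would more likely be checked with Scrapy selectors or Selenium.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical selectors for the blocks a robot relies on.
REQUIRED_BLOCKS = {
    "title": ".//h1[@class='ad-title']",
    "price": ".//span[@class='ad-price']",
    "phone_image": ".//img[@class='ad-phone']",
}


def find_missing_blocks(page_markup: str, required: dict[str, str]) -> list[str]:
    """Return the names of required blocks that no longer match the page."""
    root = ET.fromstring(page_markup.strip())
    return [name for name, path in required.items() if root.find(path) is None]


# Made-up sample page in the layout the robot expects.
SAMPLE_PAGE = """
<html><body>
  <h1 class="ad-title">Bicycle for sale</h1>
  <span class="ad-price">120</span>
  <img class="ad-phone" src="phone.png" />
</body></html>
"""

# The same page after a simulated redesign that renames the price block.
CHANGED_PAGE = SAMPLE_PAGE.replace('class="ad-price"', 'class="price-new"')

print(find_missing_blocks(SAMPLE_PAGE, REQUIRED_BLOCKS))
print(find_missing_blocks(CHANGED_PAGE, REQUIRED_BLOCKS))
```

Wrapped in a PyTest test that asserts the returned list is empty, a check like this can stop a multi-day crawl before it starts on a site whose layout has changed.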
TECHNOLOGIES
Development: Scrapy, Spark, Scala, Java, Python, Tesseract
Testing Tools: XPath, Selenium, PyTest, JSON, request
7 months of development
10 robots developed
1,000,000 records per day
90% recognition of image data