Parser for the leroymerlin.ru category and product catalog with output in YML format
A parser that exports the categories and products of the site https://leroymerlin.ru/ to a yml file in the shortest possible time. Anti-bot blocking must be bypassed: the site is assumed to set cookies via JavaScript and to be protected by Qrator.
Language: PHP
Code rules:
open (unobfuscated) source code with comments on functions and on the main loops, conditions, variables and constants
Command or file name: pars_lm.php
Options:
without parameter
Creates a yml file with all categories, products and pictures,
file name: pars_lm.yml
The location of the pars_lm.yml file is specified as a constant in the executable file, for example,
$yml_all = __DIR__ . '/catalog/pars_lm-all.yml';
The location of the directory with pictures is specified as a constant in the executable file, for example,
$imgcat = __DIR__ . '/catalog/img';
The picture file name is the product article number with an index suffix (_xx), for example:
Article number: 83800179, then
image file names:
83800179_01.jpg
83800179_02.jpg
83800179_03.jpg
etc.
Note for images:
Do not forget to determine the extension of the original image file.
If an image is available in several sizes, determine and download the highest resolution. For example, for the product with article number 83800179, as for other products, the largest pictures are currently 1200x1200:
https://res.cloudinary.com/lmru/image/upload/f_auto,q_auto,w_1200,h_1200,c_pad,b_white,d_photoiscoming.png/LMCode/18415706.jpg
upd-cat
Updates the yml file with categories,
file name: pars_lm-cat.yml
The location of the pars_lm-cat.yml file is specified as a constant in the executable file, for example,
$yml_cat = __DIR__ . '/catalog/pars_lm-cat.yml';
(take the closing tags into account when resuming an interrupted run, etc.)
upd-offers
Creates/updates the yml file with products (take the closing tags into account when resuming an interrupted run, etc.)
upd-img
Updating/downloading product images
The location of the directory with pictures is specified as a constant in the executable file, for example,
$imgcat = __DIR__ . '/catalog/img';
Note:
re-download an image file if its size has changed or if the file is missing
other parameter
Invalid parameter notification
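The parameter handling described above could be sketched as follows. This is only an illustration: the parameter names come from this specification, while the function names resolveMode() and invalidParameterMessage() and the label 'all' for the run without a parameter are hypothetical.

```php
<?php
// Sketch only: parameter names are from the specification;
// function names and the 'all' label are hypothetical.

/**
 * Map the first command-line argument to a working mode.
 *
 * @param array $argv Script arguments in the form PHP provides them ($argv).
 * @return string|null Mode label, or null for an unknown parameter.
 */
function resolveMode(array $argv): ?string
{
    // No parameter: build the full file (categories, products, pictures).
    if (count($argv) < 2) {
        return 'all';
    }

    // Parameters named in the specification.
    $known = ['upd-cat', 'upd-offers', 'upd-img'];

    return in_array($argv[1], $known, true) ? $argv[1] : null;
}

/**
 * Build the notice printed for an unknown parameter.
 */
function invalidParameterMessage(string $param): string
{
    return sprintf(
        'Invalid parameter "%s". Allowed: upd-cat, upd-offers, upd-img, or no parameter.',
        $param
    );
}

// Possible use inside pars_lm.php:
//   $mode = resolveMode($argv);
//   if ($mode === null) {
//       fwrite(STDERR, invalidParameterMessage($argv[1]) . PHP_EOL);
//       exit(1);
//   }
```

Keeping the mapping in a pure function makes the dispatch easy to check without starting a real parsing run.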
Input data: categories and products with basic data, descriptions and product pictures from the site https://leroymerlin.ru/, plus the city ID (set as a constant in the code)
Output:
1) a yml file with product categories (each linked to its parent category) and products (basic data, parameters, description, image link relative to the path $imgcat = __DIR__ . '/catalog/img'), each product linked to its parent category
2) a yml file with product categories, each linked to its parent category
3) a yml file with products (basic data, parameters, description, image link relative to the path $imgcat = __DIR__ . '/catalog/img'), each product linked to its parent category
4) product images downloaded to the path $imgcat = __DIR__ . '/catalog/img' and named according to the image file naming rules (the list of products, i.e. product links, whose images are to be downloaded has to be taken from somewhere)
5) a log file
6) logging of the parser's progress, so that the point of interruption can be determined and an incompletely processed task for a given parameter can resume from where it stopped
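The image naming rule and the "re-download if the size changed or the file is missing" rule could look roughly like this. The function names are hypothetical, the fallback to "jpg" is an assumption, and how the remote size is obtained (for example from a Content-Length header) is left open here.

```php
<?php
// Sketch only: function names are hypothetical. The naming rule
// (article number + _xx index + original extension) is from the specification.

/**
 * Build the local file name for the N-th picture of a product.
 *
 * @param string $article   Product article number, e.g. "83800179".
 * @param int    $index     Picture index starting from 1.
 * @param string $sourceUrl URL of the original image.
 */
function imageFileName(string $article, int $index, string $sourceUrl): string
{
    // Take the extension from the path part of the source URL.
    $path = (string) parse_url($sourceUrl, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    // Assumption: fall back to "jpg" when the URL carries no extension.
    if ($ext === '') {
        $ext = 'jpg';
    }

    return sprintf('%s_%02d.%s', $article, $index, $ext);
}

/**
 * Decide whether an image has to be downloaded again.
 *
 * @param string   $localPath  Path of the local image file.
 * @param int|null $remoteSize Size of the remote file in bytes, or null if unknown.
 */
function needsDownload(string $localPath, ?int $remoteSize): bool
{
    // PHP caches stat results, so refresh them for this path first.
    clearstatcache(true, $localPath);

    // Missing file: always download.
    if (!is_file($localPath)) {
        return true;
    }

    // Known remote size that differs from the local one: download again.
    return $remoteSize !== null && filesize($localPath) !== $remoteSize;
}
```

With the example URL from the image note, imageFileName('83800179', 1, $url) yields 83800179_01.jpg. If the server converts formats on the fly, the extension in the URL may not match the delivered file, so checking the response Content-Type may be the safer choice.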
Conditions of the script/parser:
keeping a log file
the ability to resume (products into the yml file / images into the directory) from the point of interruption; for this, a json file with the list of products to parse can be created in advance, with fields such as date/time, product, status:
[
  {
    "date": "2022-05-11 14:30",
    "category": "http://178.248.234.184/catalogue/nabory-sadovoy-mebeli",
    "catid": 18,
    "offerid": 18415706,
    "offerhref": "http://178.248.234.184/product/nabor-sadovoy-mebeli-ottoman-uglovoy-polirotang-bezhevyy-s-chernym-stol-i-divan-82467040/",
    "status": "true"
  }
]
Possible "status" values: true, false, error
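Resuming from such a task list could be sketched like this. The function names are hypothetical, and treating both "false" and "error" as "still to be processed" is an assumption, not something the specification states.

```php
<?php
// Sketch only: function names are hypothetical; the field names
// ("offerid", "status", ...) follow the json example in the specification.

/**
 * Load the task list from a json file; returns an empty list if the
 * file is missing or does not contain a json array.
 */
function loadTasks(string $file): array
{
    if (!is_file($file)) {
        return [];
    }

    $data = json_decode((string) file_get_contents($file), true);

    return is_array($data) ? array_values($data) : [];
}

/**
 * Save the task list back to disk (with an exclusive lock while writing).
 */
function saveTasks(string $file, array $tasks): void
{
    $json = json_encode(
        $tasks,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
    );
    file_put_contents($file, $json, LOCK_EX);
}

/**
 * Return the index of the first task that is not finished yet,
 * or null when every task has status "true".
 * Assumption: "false" and "error" both mean "process again".
 */
function firstPendingIndex(array $tasks): ?int
{
    foreach ($tasks as $i => $task) {
        if (($task['status'] ?? 'false') !== 'true') {
            return $i;
        }
    }

    return null;
}
```

A run would then load the list, start at firstPendingIndex(), and call saveTasks() after each processed product, so that an interruption loses at most one item.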
Approximate log file format:
"dd-mm-yyyy/hh:mm parser start/end date/time"
"Creating a yml file with categories"
"Creating a yml file with categories and products, downloading images"
"Update / Download images"
"dd-mm-yyyy/hh:mm - error when downloading image"
"dd-mm-yyyy/hh:mm - error when updating categories"
"dd-mm-yyyy/hh:mm - error when updating products"
And so on.
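A log writer producing lines in this format might look like the sketch below. The function names are hypothetical; the only thing taken from the specification is the dd-mm-yyyy/hh:mm prefix, which corresponds to the PHP date format 'd-m-Y/H:i'.

```php
<?php
// Sketch only: function names are hypothetical.

/**
 * Format one log line as "dd-mm-yyyy/hh:mm - message".
 *
 * @param string                 $message Text of the log entry.
 * @param DateTimeInterface|null $when    Timestamp; defaults to the current time.
 */
function formatLogLine(string $message, ?DateTimeInterface $when = null): string
{
    $when = $when ?? new DateTimeImmutable();

    return $when->format('d-m-Y/H:i') . ' - ' . $message;
}

/**
 * Append one line to the log file.
 */
function writeLog(string $file, string $message): void
{
    // FILE_APPEND keeps earlier entries; LOCK_EX guards against parallel runs.
    file_put_contents($file, formatLogLine($message) . PHP_EOL, FILE_APPEND | LOCK_EX);
}
```

For example, writeLog(__DIR__ . '/pars_lm.log', 'error when downloading image') appends one timestamped line; the log file name here is only a placeholder.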
Category ID numbering (see examples in the appendix)
0 - root
First-level categories: sequentially from 1 to ...
Categories starting from the second level: "number of first-level categories" * 2 + 1
Product ID numbering: the product ID equals the product's article number from the LM website
Product data
offer id
available
categoryId
name
price
description
picture(s)
param name
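One way to emit these fields is PHP's XMLWriter extension, as in the sketch below. The function name buildOfferXml() and the keys of the input array are hypothetical, and the element order simply follows the list above rather than any required order.

```php
<?php
// Sketch only: buildOfferXml() and the keys of $offer are hypothetical.
// Element and attribute names follow the "Product data" list above.

/**
 * Build one <offer> element as an XML string.
 *
 * @param array $offer Keys: id, available (bool), categoryId, name, price,
 *                     description, pictures (list), params (name => value).
 */
function buildOfferXml(array $offer): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);

    $w->startElement('offer');
    $w->writeAttribute('id', (string) $offer['id']);
    $w->writeAttribute('available', $offer['available'] ? 'true' : 'false');

    // XMLWriter escapes special characters such as & and < in text.
    $w->writeElement('categoryId', (string) $offer['categoryId']);
    $w->writeElement('name', (string) $offer['name']);
    $w->writeElement('price', (string) $offer['price']);
    $w->writeElement('description', (string) $offer['description']);

    // One <picture> element per image link.
    foreach ($offer['pictures'] as $picture) {
        $w->writeElement('picture', (string) $picture);
    }

    // One <param name="..."> element per product parameter.
    foreach ($offer['params'] as $name => $value) {
        $w->startElement('param');
        $w->writeAttribute('name', (string) $name);
        $w->text((string) $value);
        $w->endElement();
    }

    $w->endElement(); // </offer>

    return $w->outputMemory();
}
```

Writing offers one at a time in this way also makes it possible to flush each finished element to the file, which fits the requirement to resume after an interruption.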
Annex 1. Approximate format of a yml file with categories
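<yml_catalog date="…">
  <shop>
    <name>Leroymerlin</name>
    <company>Leroy Merlin Vostok Ltd.</company>
    <url>https://leroymerlin.ru/</url>
    <categories>
      <category id="…">Garden</category>
      <category id="…" parentId="…">Garden furniture</category>
      ...
      ...
      ...
      <category id="…" parentId="…">Lighting of commercial premises</category>
      <category id="…" parentId="…">Garage lighting</category>
    </categories>
  </shop>
</yml_catalog>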
21.06.2022 16:21