
Ryan lippman solution #377

Open
lippmanry wants to merge 3 commits into serpapi:master from lippmanry:ryan_lippman_solution


@lippmanry

Created a parser that:

  • searches script tags for base64 image data, handles Unicode characters, and validates each image
  • extracts a header for the result object (e.g. "artworks", "books")
  • handles different page structures (html vs wp-grid-tile)
  • finds and parses all .html files in /files
  • names the .json output dynamically, based on the page title
  • outputs an item object with name, extensions, link, image
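
To illustrate the first bullet, here is a minimal Python sketch of pulling base64 images out of script text and validating them. It is not taken from the PR: the function names, the regex, and the assumption that the escapes look like `\x3d` (for `=`) are all hypothetical, and only JPEG, PNG, and GIF signatures are checked.

```python
import base64
import binascii
import re

# Illustrative pattern (an assumption): a data URI whose payload may still
# contain JavaScript-style backslash escapes such as \x3d.
DATA_URI_RE = re.compile(r"data:image/[a-z]+;base64,[A-Za-z0-9+/=\\]+")

# Magic-byte prefixes for JPEG, PNG, and GIF only.
IMAGE_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def unescape_js(text: str) -> str:
    """Replace \\xNN escapes (e.g. \\x3d) with the character they encode."""
    return re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


def is_valid_image_data_uri(uri: str) -> bool:
    """True if the payload is strict base64 that decodes to known image bytes."""
    header, sep, payload = uri.partition(";base64,")
    if not sep or not header.startswith("data:image/"):
        return False
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(IMAGE_SIGNATURES)


def find_images(script_text: str) -> list[str]:
    """Return every unescaped data URI in the text that passes validation."""
    candidates = (unescape_js(m) for m in DATA_URI_RE.findall(script_text))
    return [uri for uri in candidates if is_valid_image_data_uri(uri)]
```

Checking the decoded bytes against a signature, instead of trusting the declared MIME type, rejects payloads that are valid base64 but not actually images.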

Created tests that check:

  • item types
  • URL structure
  • base64 validity
  • year validity
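
The checks above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the PR's test suite: the field names come from the description, but the expected types, the idea that each entry in `extensions` is a four-digit year, and all function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed field types; the PR description only lists the field names.
REQUIRED_TYPES = {"name": str, "extensions": list, "link": str, "image": str}


def has_valid_url_structure(link) -> bool:
    """True for an absolute http(s) URL with a host."""
    if not isinstance(link, str):
        return False
    parts = urlparse(link)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_valid_year(text) -> bool:
    """True for a string of exactly four digits."""
    return isinstance(text, str) and re.fullmatch(r"\d{4}", text) is not None


def check_item(item: dict) -> list[str]:
    """Return a list of problems found in one parsed item (empty if none)."""
    problems = []
    for field, expected in REQUIRED_TYPES.items():
        if not isinstance(item.get(field), expected):
            problems.append(f"{field}: expected {expected.__name__}")
    if not has_valid_url_structure(item.get("link")):
        problems.append("link: not an absolute http(s) URL")
    extensions = item.get("extensions")
    if isinstance(extensions, list):
        for ext in extensions:
            if not is_valid_year(ext):
                problems.append(f"extensions: {ext!r} is not a four-digit year")
    return problems
```

Returning a list of problems, instead of raising on the first failure, lets a test report every issue with an item at once.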

Added 2 additional .html files to test the parser:

  • chuck.html
  • monet.html

The parser's output is a set of .json files:

  • chuck-wendig-books.json
  • monet-paintings.json
  • van-gogh-paintings.json
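
File names like these can be derived from a page title by slugifying it. A minimal Python sketch, assuming the title has already been reduced to something like "Van Gogh paintings"; the function name and the fallback name are hypothetical, and the PR may derive the names differently.

```python
import re


def output_filename(title: str) -> str:
    """Lowercase the title, collapse non-alphanumeric runs to '-', add .json."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Fall back to a fixed name if the title has no usable characters.
    return f"{slug or 'output'}.json"


print(output_filename("Van Gogh paintings"))  # van-gogh-paintings.json
```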

