Comment by Alifatisk 3 months ago

How could I extract rectangles from a PDF and then do something like this?

Do you mean ingesting the extracted rectangles/bounding boxes? We're actually working on bounding boxes; this is a good insight, and we can add it to the product. However, the way we ingest is by converting each page to an image and then embedding that image, so the text, layout, and diagrams are all encoded together. We'd like to know your exact use case so we can help you better.
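If extracted bounding boxes are to be used alongside rendered page images, they first have to be mapped from PDF coordinates to pixel coordinates. PDF user space is measured in points (1/72 inch) with the origin at the bottom-left, while image rows count down from the top. Below is a minimal sketch, assuming the page's MediaBox starts at (0, 0), the page has no /Rotate entry, and boxes are given as (x, y, width, height) in points; the function name `pdf_box_to_pixels` is hypothetical.

```python
def pdf_box_to_pixels(
    box: tuple[float, float, float, float],
    page_height_pt: float,
    dpi: float = 150.0,
) -> tuple[int, int, int, int]:
    """Map an (x, y, width, height) box in PDF points to (left, top, right, bottom) pixels.

    Assumes a MediaBox origin of (0, 0) and no page rotation (hypothetical helper).
    """
    x, y, w, h = box
    # Width/height may be negative in PDF, so normalize to min/max edges first.
    x0, x1 = sorted((x, x + w))
    y0, y1 = sorted((y, y + h))
    scale = dpi / 72.0  # 1 PDF point = 1/72 inch
    left = x0 * scale
    right = x1 * scale
    # PDF y grows upward from the bottom edge; image rows grow downward from the top.
    top = (page_height_pt - y1) * scale
    bottom = (page_height_pt - y0) * scale
    return round(left), round(top), round(right), round(bottom)
```

For example, on a US Letter page (792 pt tall) rendered at 144 DPI, the box `(72, 72, 144, 36)` maps to the pixel rectangle `(144, 1368, 432, 1440)`.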

mnky9800n 3 months ago

Why do you convert to an image? It's easy to split a PDF into its separate components and then ingest them individually. I also suspect that rasterizing vector graphics will become a pain point eventually.
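A minimal sketch of pulling one kind of component, rectangles, straight out of a PDF: it scans a page content stream for the `re` path operator, which takes four operands (x, y, width, height). It assumes the stream has already been decompressed to text; the function name `extract_rects` is hypothetical, and strings, inline images, and the `cm` transformation matrix are not interpreted, so coordinates stay in the current user space.

```python
import re

# Content-stream operators are postfix: operands come first, then the operator.
# "x y w h re" appends a rectangle to the current path (PDF path construction
# operators). Group 1 matches a number, group 2 an operator-like word.
_TOKEN = re.compile(r"([-+]?(?:\d+(?:\.\d*)?|\.\d+))|([A-Za-z'\"*]+)")


def extract_rects(content: str) -> list[tuple[float, float, float, float]]:
    """Return (x, y, width, height) for each `re` operator in a decoded content stream.

    Simplified sketch (hypothetical helper): strings, names, inline images and the
    `cm` matrix are not interpreted, so coordinates are in the current user space.
    """
    rects: list[tuple[float, float, float, float]] = []
    operands: list[float] = []
    for number, word in _TOKEN.findall(content):
        if number:
            operands.append(float(number))
            continue
        # Any other word is treated as an operator that consumes pending operands.
        if word == "re" and len(operands) == 4:
            x, y, w, h = operands
            rects.append((x, y, w, h))
        operands = []
    return rects
```

Real content streams are usually Flate-compressed and may transform coordinates with `cm`, so in practice a library such as pdfplumber (which exposes drawn rectangles via `page.rects`) is a more robust choice than hand-parsing.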

- Adityav369 3 months ago
  
  Mainly to preserve layout information. It also makes search easier.