GPT PDF & Image Data Extraction (Power Automate)

Views: 13,947

Tyler Kolota

A day ago

Comments: 62

@tylerkolota 1 year ago

A Version 2 is now available that is 2X faster and uses 1/7th the action API calls. This should make it even better for real-time scenarios, like loading data to a Power App screen when a user uploads a document, or for processing many hundreds of documents a day.

@dmvogan 9 months ago

I'm not able to import your flows, I get an error. Can you briefly describe how you optimized them?

@tylerkolota 9 months ago

@dmvogan If the standard import of the flow-only packages below does not work for you, you can also try importing the flows through a Power Apps solution package here: powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/m-p/2201670/highlight/true#M1637

@tylerkolota 11 months ago

A Microsoft Staff member just confirmed that the Create text with GPT action has been updated to use a 16k token model. So this template should now be able to work on 4x as many pages at once!

@madhavilatha7881 23 days ago

Hi @tylerkolota, thank you so much for this solution, it is so helpful. I am looking to sort the text by the top property along with the coordinates. My PDF documents were scanned tilted, so the text does not come out in the expected order, especially the tables. I would appreciate your input on this.

@tylerkolota 23 days ago

@madhavilatha7881 The template orders the text replica based on where the centers of the text boxes are. I don’t have any further adjustments to help with a significantly tilted page on this template. However, if you want, you can try a different method that uses premium HTTP actions to call GPT4o Mini’s image/vision component to extract from documents: community.powerplatform.com/galleries/gallery-posts/?postid=73cdb790-11c9-45b7-80d0-b991d1f43f34
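
For readers who want to experiment with the ordering idea outside the flow, here is a minimal Python sketch of sorting OCR text boxes by their centers. It is not the template's actual logic. The `TextBox` fields, the `order_by_center` helper, and the `row_tolerance` value are all assumptions made for illustration. On a tilted scan, boxes from the same visual row drift apart vertically, so a tight tolerance splits rows; a larger tolerance may help with mild tilt.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical OCR line: text plus a bounding box in page-relative units."""
    text: str
    left: float
    top: float
    width: float
    height: float

    @property
    def center_x(self) -> float:
        return self.left + self.width / 2

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def order_by_center(boxes, row_tolerance=0.01):
    """Group boxes into rows by vertical center, then read each row left to right.

    A box joins the current row when its center_y is within row_tolerance
    of the first box in that row (an assumed heuristic, not the template's).
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b.center_y):
        if rows and abs(box.center_y - rows[-1][0].center_y) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [
        " ".join(b.text for b in sorted(row, key=lambda b: b.center_x))
        for row in rows
    ]


if __name__ == "__main__":
    sample = [
        TextBox("42.00", 0.6, 0.502, 0.1, 0.02),
        TextBox("Invoice", 0.1, 0.1, 0.2, 0.02),
        TextBox("Total", 0.1, 0.5, 0.1, 0.02),
    ]
    for line in order_by_center(sample):
        print(line)
```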

@madhavilatha7881 20 days ago

@tylerkolota Thank you for the input. I checked the approach above, but I may not be able to use it because of the premium actions and Azure Functions logic.

@antoniocgonzalez8013 9 months ago

OMG this is awesome, you sir are a genius. Do you do freelance? How can I contact you?

@tylerkolota 9 months ago

Thank You! You can reach out at takolota@gmail.com or on LinkedIn at www.linkedin.com/in/kolota?

@monching6919 5 months ago

Nice content, really helpful. I have a project that needs to extract info from contracts. I can use this one for automation, but may I know if this workflow consumes AI credits? Another question: since it's using OCR, can it transcribe handwritten text like dates after a signature?

@tylerkolota 5 months ago

Yes & Yes. It consumes AI credits both for the OCR & for the GPT prompt. It also does capture handwritten text.

@tylerbrooks17 3 months ago

Hi Tyler, great video and very helpful. What is the benefit with this flow using GPT vs AI Builder invoice reading? Also should a business be concerned with vendor data flowing through GPT services? Thanks!

@tylerkolota 3 months ago

GPT is much more flexible to different file formats/styles which is especially helpful when one may have files coming in from numerous sources like many different suppliers. There are also times where tagging each thing in AI Builder may not be feasible if there are numerous possible instances of said things in the same file. Also GPT prompts are much more customizable. They can interpret data & can transform the data during the extraction. And here is the MS doc on data privacy of Azure GPT. The data is not shared or used for training or anything. learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy

@emilypierce5944 3 months ago

Also massively cheaper IIRC. And for SMBs like myself more "pay as you go" rather than the huge blocks of $500 for a million credits you have to do for AI builder.

@tylerkolota 3 months ago

Yeah pricing is also another thing. If you have the 5000 AI builder credits per month from the $15/mo premium power automate license then you can process a decent number of pages without needing to upgrade to the $500/mo AI Builder package. Also if you wanted to make things more pay as you go, you could set up an HTTP action to call an Azure instance of GPT instead of using an AI Builder prompt action. That would enable you to pay like $.001 per page for the GPT prompt & use the 5000 AI credits for just the OCR action. I also may set something up to do all this with the GPT4 Vision model in Logic Apps for a true pay as you go set-up.

@rachellim4147 2 months ago

Hi Tyler, thank you for sharing this amazing video. I encountered an error with package version 1.7, which gives the error message: "The 'Create text with GPT' action doesn't have a content approval action after it." I did not add anything to the package. Could you please advise what might be going wrong?

@tylerkolota 2 months ago

Microsoft later added a requirement for an approval action after that preview GPT action. Please use a more recent version where an approval action has been added after the GPT action with a static result or where the new Create text with prompt action is used.

@ManouchehrNorouzi-gd5xg 6 months ago

Hi, and many thanks for this flow and document. I need to extract information from scanned PDF files for different kinds of contracts. Could you suggest how I can adapt this flow for that purpose?

@tylerkolota 6 months ago

Hello, You will mostly want to customize the prompt going to GPT to specify what you want to extract. Now are you saying you have contracts with different formats but you want the same information from them, or different contracts you want different information from depending on their type of contract? If it is the latter, & your use-case doesn't require faster processing speeds, then maybe you would want to split things out to two steps. One with a model, text parsing, or GPT prompt to categorize which type of contract it is, & then a switch action where depending on the contract type you send the text to different instances of the GPT action with a different prompt for each type of contract.
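
The two-step idea above (categorize first, then switch to a type-specific prompt) can be sketched in Python as follows. This is only an illustration under stated assumptions. The contract types, the keyword-based `classify_contract` stand-in, and the prompt wording are all hypothetical. In the flow, the classifier would be a model, text parsing, or a GPT prompt, and the dictionary lookup would be a Switch action.

```python
# Hypothetical prompt per contract type; the wording is illustrative only.
PROMPTS = {
    "lease": "Extract landlord, tenant, start_date, and end_date as JSON.",
    "service": "Extract service_type, provider, receiver, start_date, and end_date as JSON.",
}
DEFAULT_PROMPT = "Extract the parties, start_date, and end_date as JSON."


def classify_contract(text: str) -> str:
    """Step 1: decide the contract type (keyword stand-in for a model or GPT call)."""
    lowered = text.lower()
    if "lease" in lowered or "tenant" in lowered:
        return "lease"
    if "service provider" in lowered or "statement of work" in lowered:
        return "service"
    return "unknown"


def build_prompt(text: str) -> tuple[str, str]:
    """Step 2: pick the prompt for that type, like a Switch action with a default case."""
    contract_type = classify_contract(text)
    instructions = PROMPTS.get(contract_type, DEFAULT_PROMPT)
    return contract_type, f"{instructions}\n\nDocument text:\n{text}"


if __name__ == "__main__":
    kind, prompt = build_prompt("The Service Provider agrees to deliver support.")
    print(kind)
    print(prompt)
```

The trade-off is the one mentioned in the reply: the extra classification step adds a call per document, so it fits best when processing speed is not critical.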

@ManouchehrNorouzi-gd5xg 6 months ago

I have different kinds of contracts, but I need to extract fixed information such as service type, service provider, service receiver, start date and end date of the contract, and so on.

@tylerkolota 6 months ago

@ManouchehrNorouzi-gd5xg Okay, then you should be able to adjust the example JSON fields in the prompt to extract those pieces of data.

@manifesttthat 6 months ago

This is some amazing stuff

@tylerkolota 6 months ago

Thank you! It will probably soon be replaced by GPT 4 Turbo’s built in image & pdf functionality, but this did: 1. Get PDF reading capability to people ~1 year sooner. 2. Ensure we all have a way to use less expensive models on PDFs, especially if Microsoft tries to charge extra for it on GPT4.

@swarnpriyaswarn 6 months ago

Hey, thanks for this amazing tutorial. I just wanted to know how you make the connection with OpExOptimization after importing it into Power Automate. I am kind of stuck there. Please help.

@tylerkolota 6 months ago

Are you stuck on the import screen where you add connections or after the flow actions have loaded?

@brandonvelasquez3530 8 months ago

I am working on extracting data from medical insurance claims. 2 pages out of a potentially 20-page PDF might have info that I need. Can this still process that many pages in a file? The other potentially 18 pages have a bunch of boilerplate disclaimer text that comes with every claim. If I put that text file output inside the GPT action, won't that go over the token limit for input?

@tylerkolota 8 months ago

Hello Brandon, On newer versions of the flow I added some actions after the OCR read where you can set what page numbers you want it to process.

@brandonvelasquez3530 8 months ago

@tylerkolota The thing is, I never know which page numbers I want it to process. Sometimes it might be 2 and 5, another time 3 and 9, another time 10 and 15. Sometimes there might be only 5 pages total and sometimes 20. I tried using an unstructured document extraction custom model and am leveraging the multi-page table field, but that only works on consecutive pages, and the pages are not always consecutive. Any thoughts?

@tylerkolota 8 months ago

@brandonvelasquez3530 Well, you can try hacking something together to cut out some of the material on the non-relevant pages. Otherwise, if I were you, I might wait for GPT4Turbo to come out on Azure. Even if MS doesn't immediately include image/document support in Power Automate for it, it is still possible to set up an LLM service with it that you could call in a flow and pass the text to its much larger context window. I already tested and set something like that up for GPT3.5Turbo in case MS started charging larger AI Builder credit fees for it.

@brandonvelasquez3530 8 months ago

@tylerkolota I only have 2 weeks left in this engagement with the customer, so I will just leverage what I have done so far and give them this advice on how to make it better. There is not much I can do given the situation this engagement is in. I may try to build something like that on my own though, and do my best to make it a reusable solution, because I can imagine this being a common issue that companies are looking to solve.

@tylerkolota 8 months ago

@brandonvelasquez3530 I mean, is the information you’re looking for on these pages usually in a specific part of the page? Like top left/right or something? Because there are ways to use Filter array action(s) to limit the extracted text outputs to just the text in a specific part of each page.
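
As a rough illustration of that region filter, the Python sketch below keeps only text lines whose center falls inside a chosen rectangle, which is the same kind of condition a Filter array action would apply. The record shape (`page`, `text`, `center_x`, `center_y` in 0 to 1 page-relative units) and the `filter_region` helper are assumptions for this example, not the template's actual output schema.

```python
def filter_region(lines, x_min, x_max, y_min, y_max):
    """Keep lines whose center lies inside the rectangle (page-relative units)."""
    return [
        line
        for line in lines
        if x_min <= line["center_x"] <= x_max and y_min <= line["center_y"] <= y_max
    ]


if __name__ == "__main__":
    # Hypothetical OCR output: one record per recognized line of text.
    ocr_lines = [
        {"page": 1, "text": "Claim #123", "center_x": 0.80, "center_y": 0.10},
        {"page": 1, "text": "Standard disclaimer", "center_x": 0.50, "center_y": 0.90},
        {"page": 2, "text": "Member ID 456", "center_x": 0.75, "center_y": 0.15},
    ]
    # Keep only the top-right quarter of every page.
    for line in filter_region(ocr_lines, 0.5, 1.0, 0.0, 0.25):
        print(line["page"], line["text"])
```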

@rameshbabuc5981 5 months ago

Thanks Tyler. One quick query: is it possible to read table rows whose content continues from page 1 to page 2? My use case is below.

I need to extract information in tabular format from the order confirmation PDFs we receive. Each PDF has multiple items, and each item has a name, description, vendor, and delivery date. So the table has four columns (Name, Description, Vendor, Delivery Date), with each row representing an item. The problem arises when some details for an item are at the bottom of one page and the remaining details are on the next page, for example a description that starts at the bottom of page 1 and continues on page 2. I am unable to tag rows that continue across pages like this.

For example, if this is the PDF:

-----some text-----
-----some text-----
code: 1 description: this is first item Vendor: XYZ1 delivery date: 12.01.2024
code: 102 description: this is second item Vendor: XYZ2 delivery date: 13.01.2024
code: 103 description: this is third item
-------page 1 ends here---------
-------page 2 begins here--------
description(Continuing from Page): this is third item Continuing Vendor: XYZ3 delivery date: 14.01.2024
code: 104 description: this is fourth item Vendor: XYZ4 delivery date: 15.01.2024
code: 105 description: this is fifth item Vendor: XYZ5 delivery date: 16.01.2024
-----some text here-----
-----page 2 ends-----
-----pdf ends-----

The document cannot be tagged correctly using a custom model when the description on page 1 continues on page 2.

For the above document, the tagged tables look like this:

Code   Description            Vendor   Delivery Date
101    this is first item     XYZ1     11.01.2024
102    this is second item    XYZ2     12.01.2024
103    this is third item     XYZ3     13.01.2024

Code   Description            Vendor   Delivery Date
Some text are continuing from page 1
104    this is fourth item    XYZ4     14.01.2024
105    this is fifth item     XYZ5     15.01.2024

@tylerkolota 5 months ago

This is a common use-case for this set-up because the GPT prompts generally do a better job determining that text before & after a page break belong to the same item. Feel free to set it up & test it on your files.

@rameshbabuc5981 5 months ago

@tylerkolota Thanks Tyler, I will look into GPT prompts. If you have a reference, could you please provide more details on the GPT prompts?

@tylerkolota 5 months ago

@rameshbabuc5981 Yes, that would be this video that you are commenting on and its associated thread / download page where you can get the template: powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345

@itrmendoza 7 months ago

@tylerkolota, as a use case scenario, how would it handle a checkmark next to text? I have PDFs with checkmarks I'd like to pull into the flow.

@tylerkolota 7 months ago

I’m not sure it would pick up a checkmark. It does often pick up handwritten signatures though, so it may do better with an x in the checkbox.

@tylerkolota 2 months ago

There are now ways to do this with GPT4o powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-PDF-Data-With-GPT4o/m-p/2805514#M2882

@saidajimenez2159 9 months ago

Hello, I am trying to create a flow so that when I receive CVs in my email, it automatically saves them in a SharePoint folder. Up to this point the flow is clear and there is no major problem. My problem comes when I want to extract the text from the CV PDF. All the content is saved in a variable, but I don't know how to send it to a SharePoint list so that I can make requests to GPT. Can anyone think of how to do it?

@tylerkolota 9 months ago

Could you explain more about what the content is in the variable & what you mean by sending it to SharePoint? Are you using this template & the content is the text output? Do you have a multiline text column in SharePoint to send it to? And why are you trying to save the text to SP instead of going directly to the GPT action?

@MsKaryn 1 year ago

Is there a way to extract only specific images from a PDF (not text) and classify those images?

@tylerkolota 1 year ago

Hello Karyn, This template is mainly for extracting text data from pages of a PDF, but if you want to just extract entire images from within a PDF, then there are 3rd party connectors for that & some AI Builder models can help classify them. support.encodian.com/hc/en-gb/articles/360006998058-Extract-Images-from-PDF

@brentallard2087 1 year ago

How would you configure it to work on many pages at once? I'm struggling with the SharePoint Connector to Get file Metadata and Get file Content to pass the File Content to the AI. Any help would be greatly appreciated. Great Template!

@tylerkolota 1 year ago

It automatically works on many pages if the PDF file content you pass it has many PDF pages. If you have multiple PDF files you want to work on at once, then you may need to combine them beforehand or maybe after a txt conversion on each. What error are you getting?

@brentallard2087 1 year ago

@tylerkolota Totally makes sense about the many pages in a PDF. The error I am getting is: {"operationStatus":"Error","error":{"type":"Error","code":"InvalidPredictionInput","message":"Input prompt length cannot exceed 15788 characters or 4097 tokens. Please try again with a shorter prompt","properties":{"BackendErrorCode":"InvalidInferenceInput","DependencyHttpStatusCode":"400"},"innerErrors":[{"scope":"Generic","target":null,"code":"TooManyInputTokens","type":"Error","properties":{"maxCharacters":"15788","MlIssueCode":"TooManyInputTokens"}}]},"predictionId":null} It appears that the SharePoint Get file Content action does not return the PDF content the same way the OneDrive Get file Content action does when passing it to the AI, which is why the error reports too many tokens.

@tylerkolota 1 year ago

Yes, it’s going over the token / character limit for prompts. If you only need select pages for your workflow, I added a new version 1.8 that allows you to customize which page numbers go to the prompt.
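
To make the limit concrete, here is a small Python sketch that builds a prompt from selected pages only and checks it against the character limit quoted in the error above (15788). The `pages` dictionary, the `select_pages` helper, and the way instructions are joined to the page text are assumptions for illustration; version 1.8 of the template does its page selection with flow actions, not with this code.

```python
MAX_PROMPT_CHARS = 15788  # taken from the error message quoted above


def select_pages(pages, wanted_pages, instructions, max_chars=MAX_PROMPT_CHARS):
    """Join the instructions with the text of the wanted pages, in the order given.

    pages: dict mapping page number -> extracted text (hypothetical shape).
    Raises ValueError if the resulting prompt would exceed max_chars.
    """
    selected = [pages[number] for number in wanted_pages if number in pages]
    prompt = "\n\n".join([instructions, *selected])
    if len(prompt) > max_chars:
        raise ValueError(
            f"Prompt is {len(prompt)} characters; the limit is {max_chars}. "
            "Select fewer pages or shorten the instructions."
        )
    return prompt


if __name__ == "__main__":
    extracted = {1: "Cover page text", 2: "Line items ...", 3: "Disclaimer ..."}
    print(select_pages(extracted, [1, 2], "Extract the invoice number as JSON."))
```

The limit is a parameter because it depends on the model behind the action, so it should be adjusted if the action's model changes.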

@tylerkolota 11 months ago

@brentallard2087 A Microsoft Staff member just confirmed that the Create text with GPT action has been updated to use a 16k token model. So this template should now be able to work on 4x as many pages at once!

@brentallard2087 11 months ago

How cool is that! Thanks for the update.

@Life_latelyyy 10 months ago

Hi, Is this able to extract QR code information from any document (pdf or something)?

@tylerkolota 10 months ago

This only pulls text data. It doesn’t copy QR codes.

@Life_latelyyy 10 months ago

@tylerkolota Thanks for the quick reply.

@saidajimenez2159 9 months ago

Hello, in my country the GPT chat function has not been implemented yet, so I have to make an HTTP request to GPT4 chat. Here is what I need: take a CV, send it to GPT chat, and have it return the three jobs from a list of jobs that the person is qualified for. It must also return the person's first name, last name, address, training, experience, and languages spoken. It returns a JSON with all this information inside a message, so what I get is a string array. I have managed to separate the message from the array, but now I need to get the individual values and I don't know how to do it.

@tylerkolota 9 months ago

You could manually parse it with expressions or use a Parse JSON action kzfaq.info/get/bejne/m5aUrbCi1LCrpI0.htmlsi=9lONmcJMMdmH41RS
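
For comparison, the same step outside Power Automate looks like this in Python: take the message string returned by the model, cut out the JSON object, and parse it into individual values. The `extract_fields` helper and the field names in the example are hypothetical. Trimming to the outermost braces is a simple heuristic for replies that wrap the JSON in extra text, and it will fail if the reply contains no valid JSON object.

```python
import json


def extract_fields(message: str) -> dict:
    """Parse the JSON object embedded in a model reply.

    Keeps everything from the first '{' to the last '}' so that any
    leading or trailing chatter around the JSON is ignored.
    """
    start = message.find("{")
    end = message.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the message.")
    return json.loads(message[start : end + 1])


if __name__ == "__main__":
    reply = 'Here is the result:\n{"first_name": "Ana", "languages": ["Spanish", "English"]}'
    fields = extract_fields(reply)
    print(fields["first_name"])
    print(fields["languages"])
```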

@tylerkolota 5 months ago

Anyone concerned about the amount of pages they can feed the model may want to check a new template using GPT 4 Turbo & Retrieval Augmented Generation (RAG) to expand querying to just about any length document here: powerusers.microsoft.com/t5/Power-Automate-Cookbook/Query-Large-PDFs-With-GPT-RAG/td-p/2650178

@saidajimenez2159 9 months ago

Hello, could you upload an example of how this would work for a CV?

@suryagvs9296 1 year ago

Hi @tylerkolota9031, I did not find "Create text with GPT" in my Power Automate action list. Do I need to activate a feature or meet any prerequisites to get it? I am able to see "GPTPromtengineeringmodel" in the Predict action. I gave it a prompt in this action, but I am getting the error below: {"operationStatus":"Error","error":{"type":"Error","code":"InvalidPredictionInput","message":"Parameters JSON string could not be properly deserialized","properties":{"BackendErrorCode":"InvalidInferenceInput","DependencyHttpStatusCode":"400"},"innerErrors":[{"scope":"Generic","target":null,"code":"InvalidModelParameters","type":"Error","properties":{"MlIssueCode":"InvalidModelParameters"}}]},"predictionId":null} Please guide me on this.

@tylerkolota 1 year ago

Hello, The Create text with GPT action is not yet available in all regions. However if you want to try setting something up before the general availability of the action, and if you are not dealing with sensitive data, then you can try requesting access & setting up an OpenAI API connection so you can send HTTP requests to GPT. kzfaq.info/get/bejne/sMySibGGq966mKs.htmlsi=SfkP2gGts6WaQVIP learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart techcommunity.microsoft.com/t5/azure-ai-services-blog/working-with-gpt-4-and-chatgpt-models-on-azure-preview/ba-p/3773595 platform.openai.com/docs/guides/gpt

@suryagvs9296 1 year ago

Thanks for quick reply

@rameshn1195 1 year ago

@tylerkolota9031 I am facing an error like this in the AI Builder Predict action: "{"operationStatus":"Error","error":{"type":"Error","code":"InvalidPredictionInput","message":"Parameters JSON string could not be properly deserialized","properties":{"BackendErrorCode":"InvalidInferenceInput","DependencyHttpStatusCode":"400"},"innerErrors":[{"scope":"Generic","target":null,"code":"InvalidModelParameters","type":"Error","properties":{"MlIssueCode":"InvalidModelParameters"}}]},"predictionId":null}" Could you please assist?