Store Receipts to structured information
As the title suggests, this process transforms receipts into a spreadsheet table. It is designed for receipts that have already been scanned and processed with an OCR tool.
Input: .txt files
Output: a table with the following columns: date, price, category, receipt index, buyer, product description
Roughly speaking, the process is divided into the following steps:
1. .txt2exset: This subprocess segments a receipt .txt file. Every line represents one example, so the example set has as many examples as there are lines in the original file. For every example (i.e. line), two regular expressions extract the price: “(-?([0-9]+([\,\.][0-9]{2})))” and “(((-?([0-9]|[0-9]{2}|[0-9]{3})([\,\.\'\;\s][0-9]{2}))(\s[ab]|[ab]|[12]|\s[12])))”
2. Categorization: Each product should be assigned a category. To keep it simple, I chose to build a dictionary of rules that map products to a category. Each rule has the form categoryX:.*product.keywordX.* . To apply the rules, the process has to transform the values from the data set into documents, loop over these documents, and put the results back into a data set.
3. Manual corrections: Of course, this procedure does not produce a usable table on its own; manual corrections are necessary. That is why the process stores the example set in a sheet several times for manual correction. This means that the process has three break points. The first (“Art und Preis”, i.e. type and price) is for correcting the extracted prices and categories if necessary; the second (“datum und geschäft”, i.e. date and store) is for correcting the extracted date and the extracted store. This was a bit tricky because you have to pay attention to the continuity of the data set during the manual editing.
4. Output: Because of OCR errors, there are still many wrong characters in the extracted prices. You can correct these errors and format the values in the table sheet with formulas like these (German function names; the English equivalents are TRIM, SUBSTITUTE, VALUE and CLEAN): “=GLÄTTEN(WECHSELN(WECHSELN(WECHSELN(WECHSELN(WECHSELN(W2;"b";"");"a";"");";";",");".";",");"'";","))” and “=WERT(SÄUBERN(X2))”. At the end, you may cross-check the result with a pivot table.
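For readers who want to try the idea outside the workflow, steps 1, 2 and 4 above can be sketched in Python. This is only an illustration, not the workflow itself. The two price patterns are copied from step 1; everything else is an assumption: the function names, the order in which the two patterns are tried, the case-insensitive matching, and the two example category rules ("dairy" and "bakery") are all hypothetical.

```python
import re

# The two price patterns from step 1, copied verbatim.
PRICE_SIMPLE = re.compile(r"(-?([0-9]+([\,\.][0-9]{2})))")
PRICE_OCR = re.compile(
    r"(((-?([0-9]|[0-9]{2}|[0-9]{3})([\,\.\'\;\s][0-9]{2}))(\s[ab]|[ab]|[12]|\s[12])))"
)

# Hypothetical rules in the form category: .*keyword.* (step 2).
CATEGORY_RULES = {
    "dairy": r".*milch.*",
    "bakery": r".*brot.*",
}


def extract_price(line):
    """Return the raw price text found in one receipt line, or None.

    Assumption: the OCR-tolerant pattern is tried first and the simple
    pattern is the fallback. The source does not say how they are combined.
    """
    match = PRICE_OCR.search(line) or PRICE_SIMPLE.search(line)
    return match.group(0) if match else None


def categorize(description):
    """Return the first category whose rule matches the whole description."""
    for category, pattern in CATEGORY_RULES.items():
        # Case-insensitive matching is an assumption of this sketch.
        if re.fullmatch(pattern, description, flags=re.IGNORECASE):
            return category
    return "uncategorized"


def clean_price(raw):
    """Mirror the spreadsheet formulas from step 4 and return a float."""
    text = raw
    # Same substitutions, in the same order, as the nested WECHSELN calls.
    for old, new in (("b", ""), ("a", ""), (";", ","), (".", ","), ("'", ",")):
        text = text.replace(old, new)
    text = " ".join(text.split())                       # GLÄTTEN (TRIM)
    text = "".join(ch for ch in text if ord(ch) >= 32)  # SÄUBERN (CLEAN)
    return float(text.replace(",", "."))                # WERT (VALUE)


def receipt_to_examples(text):
    """Turn receipt text into one example (dict) per line, as in step 1."""
    return [
        {"line": line, "price_raw": extract_price(line), "category": categorize(line)}
        for line in text.splitlines()
    ]
```

Like the spreadsheet formula, `clean_price` only handles the substitutions listed in step 4. A raw value that still contains other OCR debris raises a `ValueError` and would need manual correction, just as in step 3.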