Systems that use immediate strict categorization don’t work for some people. At a company I worked at we went with a two-phase approach. A person would write notes (literally iPhone notes) in the following format:
Apr 5 24
-50 Alice tools
-220 Bob home project
Apr 2 24
+20 Bob returned loan
Note that time goes “up” to avoid scrolling too much on open. Later we loaded it into a script which would parse dates, detect keywords and make templates for proper double-entry. It would detect both the outer “agent” and the internal analytics and also put the original text as a comment. What it couldn’t detect had to be manually categorized and (sometimes) added to the script. Of course the script used editable lists for correspondence, not only hardcoded values. These lists were person-specific, so that Alice and Bob wouldn’t have to sync their vocabularies.
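A minimal sketch of what such a second-phase script might look like, assuming the note format shown above. The keyword lists, account names, and sign convention here are hypothetical placeholders (the original script is not shown); anything the keywords do not match is marked `TODO` for manual categorization.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical keyword lists; the real ones were editable and person-specific.
AGENTS = {"alice": "People:Alice", "bob": "People:Bob"}
ANALYTICS = {
    "tools": "Expenses:Tools",
    "home project": "Expenses:HomeProject",
    "loan": "Assets:Loans",
}

DATE_RE = re.compile(r"^[A-Za-z]{3} \d{1,2} \d{2}$")      # e.g. "Apr 5 24"
ENTRY_RE = re.compile(r"^([+-]\d+(?:\.\d+)?)\s+(.+)$")    # e.g. "-50 Alice tools"

SAMPLE = """Apr 5 24
-50 Alice tools
-220 Bob home project
Apr 2 24
+20 Bob returned loan
"""


def first_match(text, keywords):
    """Return the account of the first keyword found in text, or None."""
    lowered = text.lower()
    for keyword, account in keywords.items():
        if keyword in lowered:
            return account
    return None


def parse_notes(text):
    """Turn free-form notes into a list of entry dicts."""
    entries, current_date = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if DATE_RE.match(line):
            # A malformed month name raises ValueError here, which is intended.
            current_date = datetime.strptime(line, "%b %d %y").date()
            continue
        match = ENTRY_RE.match(line)
        if match is None or current_date is None:
            continue  # Unrecognized lines are left for manual handling.
        entries.append({
            "date": current_date,
            "amount": Decimal(match.group(1)),
            "memo": match.group(2),
            "raw": line,
            "agent": first_match(match.group(2), AGENTS),
            "category": first_match(match.group(2), ANALYTICS),
        })
    return entries


def render(entry):
    """Render a double-entry template, keeping the original text as a comment."""
    agent = entry["agent"] or "TODO:Agent"
    category = entry["category"] or "TODO:Category"
    return (
        f"{entry['date'].isoformat()} {entry['memo']}\n"
        f"    ; original: {entry['raw']}\n"
        f"    {category}  {-entry['amount']}\n"   # assumed sign convention
        f"    {agent}  {entry['amount']}\n"
    )


if __name__ == "__main__":
    for entry in parse_notes(SAMPLE):
        print(render(entry))
```

Keeping the keyword tables as plain dictionaries makes it easy to load a separate pair per person, so Alice and Bob never have to agree on vocabulary.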
I cannot imagine any of the people at that company filling this
9/19 (1234) comment here
Cat1:cat2:cat3
Cat4:cat5
every time money moves and they are in a queue, in the car, talking to someone, etc. They’d simply resort to smaller notes again in the notes app or in the chat, or would try to “remember” it, to fill properly at the end of the day (more likely week).
It sounds like what you are doing is plain text accounting!
> Later we loaded it into a script which would...
I too do not manually enter things in the ledger file (well, rarely). I actually enter it into KMyMoney, and have a script to convert to a ledger file.
Overall, your comment is coming across as a fairly weird complaint. It's a tool, and given that it's in text, you're free to build whatever workflow you wish around it - including having people take notes on their iPhone and using a script to convert to the ledger format.
It’s not a complaint, just an observation that anything non-natural language simply didn’t work for almost 100% of my “patients”, including myself. This format isn’t even a format. It’s how they write notes for themselves with a little agreement on regularity to simplify the parser.
Btw, we tried everything: appstore apps, web forms, our own app. Everything sucks irl except just text. I still use this entry despite not having a database anymore for a script to load my data into.
We also don't type in the ledger files directly, but rather than converting from a free-format note we use the comment field of the banking apps, and the comment field of the expense report app (we use Zoho), to indicate what account it should apply to.
Once a quarter I download CSV files from our bank account, credit card and Zoho app, and we have a set of ruby scripts that parse the CSV files into ledger files. They infer the correct accounts from the comment fields. Of course, the script and the ledger output tend to need a few tweaks, but it's >95% automated that way.
As a bonus the scripts also add a few sanity checks to the ledger file, e.g. ensure the balance matches what is downloaded from the bank and entries fall within the current period.
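A rough Python sketch of that kind of conversion (the scripts described above are Ruby and are not shown). The CSV column names, the `Assets:Bank` account, and the `Expenses:Unknown` fallback are assumptions for illustration. Unlike the scripts described, this version performs the sanity checks in the script itself rather than writing them into the ledger file.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def csv_to_ledger(csv_text, period_start, period_end, expected_balance,
                  opening_balance=Decimal("0")):
    """Convert bank CSV rows into ledger-style entries.

    Assumes hypothetical columns: date (ISO), description, amount, comment.
    The comment field carries the account the entry should apply to.
    """
    lines = []
    balance = opening_balance
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        # Sanity check 1: every entry must fall within the current period.
        if not (period_start <= day <= period_end):
            raise ValueError(f"{day} falls outside the current period")
        amount = Decimal(row["amount"])
        balance += amount
        account = row["comment"].strip() or "Expenses:Unknown"
        lines.append(f"{day.isoformat()} {row['description']}")
        lines.append(f"    {account}  {-amount}")
        lines.append("    Assets:Bank")  # amount elided; it balances the entry
        lines.append("")
    # Sanity check 2: the running balance must match the bank's figure.
    if balance != expected_balance:
        raise ValueError(f"balance {balance} != expected {expected_balance}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "date,description,amount,comment\n"
        "2024-01-05,Stationery shop,-25.00,Expenses:Office\n"
        "2024-01-09,Client payment,100.00,Income:Consulting\n"
    )
    print(csv_to_ledger(sample, date(2024, 1, 1), date(2024, 3, 31),
                        Decimal("75.00")))
```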
I've been using ledger (ledger-cli) from the moment I first became self employed (almost twenty years ago). While far from perfect, I'm very happy about it. It's nice that everything is in plain text, which means that I can script things, read everything in VIM, and easily extract data. For one of my two current companies, the ledger file is 2MB of plain text and contains the transactions from 2016 onwards.
While I personally didn't find much value in lots of different "accounts" (categories), it's still been indispensable in keeping track of everything.
Learning double-entry bookkeeping (which tools like ledger use) was really fun (and not that hard in hindsight) and probably a skill that is useful for the rest of my life.
I'm not using any of these tools yet, although this article popped up at just the right time because Quickbooks and my bank conspired to miss a bunch of transactions that I'm cleaning up now... but that's beside the point.
The strategy I've used for different accounts/categories is to make accounts that match the expense categories that the Canada Revenue Agency wants on my tax return. Early on when I was getting into it I made a bunch of accounts for logical categories (Hosting Expenses, Prototype/Manufacturing Expenses, etc) but then after a few years of trying to map those categories into CRA categories... I just realized I could pre-categorize them appropriately and make my taxes simpler at the end of the year.
When I sold my house, the escrow company mailed me a refund check of about $3000 on the closing. I never got that check. Guess how I found out about it, 5 years later, when I started using ledger-cli?
+1 for ledger. It has been the best one I've used. The fact you can script it is f'ing amazing. My taxes have been on-point for 2+ years now since I started using it.
In thinking a bit more, I think the biggest pain point for me is one with plain text in general: it supports no attachments. I have my own system to refer to specific files and built a Mac app that lets me drop a file on a transaction, then it copies that file into a specific folder and adds a tag to the transaction. This could be done much more easily if the data were in SQLite, but then you obviously lose the ability to edit it directly in your text editor.
You can perhaps use local file:// references in the files and open them in a browser, or you can use ledger from within a tool like Obsidian, LogSeq or DevonThink. Perhaps associate attachments in that way?
Oh yes, absolutely! I have a system that works well for me. But it's inherent to plain text files that they don't support attachments. It's plain text, after all.
ZIP can be a useful container for when you want a plaintext file with attachments, but you also want the full folder structure transportable itself as a single file. Automating zip/unzip operations isn't too bad, and you sometimes easily can teach some applications to work directly with zip container streams, without a temporary folder, too. Some existing text editors can do it naturally already; emacs and vim both have native support. Others have plugins available, like VSCode has ZipFS [1] (which is also on the backlog of possible things to support out of the box [2]).
You can paste the file as a base64 string. Disable word wrap and it should only appear as one line.
A good text editor of the future should recognize these base64 strings and make them appear like clickable files, but still let you right-click and go back to the plain text representation.
Fava and beancount have some affordances here. You can make a data directory with a directory structure that matches your chart of accounts. Place datestamped files in here and they'll show up inline in your fava ledger view.
Additionally you can annotate a transaction with a "document" tag and that document will show up directly associated in the ui.
The ui has pretty good previewing for these. I add pdf versions of all of my statements, and attach receipts to specific transactions (business expenses, proof of paying taxes, etc.)
Plain Text Accounting has become significantly easier to do for me on a regular basis, thanks to LLMs. Specifically: importing bank statements into hledger and avoiding manual entry.
I use a JSON file to map bank entries to my hledger accounts. For new transactions without mappings, I run a Python script that generates a prompt for Claude. It lists my hledger accounts and asks for mappings for the new entries.
Claude returns hledger journal entries based on these mappings, which I can quickly review.
Then another script prints out hledger journal entries for that month's bank transactions, all cleanly mapped. It takes me just a few minutes to tweak and finalize.
I can also specify these mapping instructions in plain-language which would've otherwise been a fragile hodgepodge of regexps and conditionals.
Good question. LLMs are surprisingly less fragile than hand-coded parsers for unstructured data like the ones in a bank statement.
And to be clear - I'm not sending the entire statement to Claude; instead, only the account name/narration of those transactions for which I don't already have a mapping. Claude then returns a well-formatted JSON that maps "amzn0026765260@apl" to "expenses:amazon", and "Veena Fuels" to "expenses:vehicle" and so on.
I can also pass in general instructions saying that "Restaurants and food-related accounts are categorized under 'expenses:food'", and it does a good job of mapping most of my dining out expenses to the correct account head.
The actual generation of journal entries is done by a simple Python script. The mapping used to be the hardest part, and what used to need custom classification models is now just a simple prompt to an LLM.
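The mapping step described above might look roughly like this. It is a sketch under assumptions: the function names and prompt wording are made up for illustration, and the call to the LLM is deliberately left out, so the reply is just a JSON string handed in from wherever it came from.

```python
import json


def split_by_mapping(narrations, mapping):
    """Separate narrations that already have a mapping from new ones."""
    unique = list(dict.fromkeys(narrations))  # de-duplicate, keep order
    known = {n: mapping[n] for n in unique if n in mapping}
    unknown = [n for n in unique if n not in mapping]
    return known, unknown


def build_prompt(unknown, accounts, instructions=""):
    """Build a prompt listing the accounts and only the unmapped narrations."""
    parts = [
        "Map each bank narration to exactly one of these accounts:",
        *[f"- {account}" for account in accounts],
    ]
    if instructions:
        parts += ["", "General instructions:", instructions]
    parts += [
        "",
        "Narrations:",
        *[f"- {narration}" for narration in unknown],
        "",
        "Reply with a JSON object of narration -> account.",
    ]
    return "\n".join(parts)


def merge_reply(mapping, reply_json, accounts):
    """Merge a JSON reply into the mapping, dropping unknown accounts."""
    proposed = json.loads(reply_json)
    accepted = {k: v for k, v in proposed.items() if v in accounts}
    return {**mapping, **accepted}


if __name__ == "__main__":
    accounts = ["expenses:amazon", "expenses:vehicle", "expenses:food"]
    mapping = {"Veena Fuels": "expenses:vehicle"}
    _, unknown = split_by_mapping(["Veena Fuels", "amzn0026765260@apl"], mapping)
    print(build_prompt(
        unknown, accounts,
        "Restaurants and food-related accounts are categorized under 'expenses:food'",
    ))
```

Rejecting any account name that is not already in the chart of accounts keeps the review step short, since a hallucinated account can never reach the journal.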
I've been using Beancount, really enjoyed learning it, writing tools for data import and having a hands-on experience with accounting. But it's been almost a year since I last imported data, I planned to do it monthly but it's a bit of a chore (it takes 30 to 60 mins despite a lot of it being automated).
I wish banks weren't actively sabotaging data export. I wrote some scripts for this in the past, reverse engineered the login with OCR, cleaned up the invalid entries, but... It's just too much effort to maintain this. Once the mobile confirmation started being required, I just gave up.
I use beancount and I use ofxget for some accounts and then keep a CSV archive for the rest and wrote some scripts to sync them into beancount when a new file shows up. It's annoying to download everything and keep it organized, but it gives me better peace of mind than all the other SaaS accounting systems.
I've found that so few of my accounts support ofxget that it's not worth the trouble of dealing with it. I just occasionally download year-to-date statements from all my accounts and then run scripts to convert them to ledger format and append only the things after a given cutoff date (things like the credit card script are pretty interactive so I don't want to overwrite the old data). To know which accounts are in most dire need of a check, I have a script that shows the last transaction on each statement-generating account.
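The cutoff-date filter and the "which account is most stale" report can each be a few lines. A minimal sketch, assuming transactions have already been parsed into dicts with hypothetical `account` and `date` keys:

```python
from datetime import date


def after_cutoff(transactions, cutoff):
    """Keep only transactions strictly after the cutoff, so old data is never rewritten."""
    return [t for t in transactions if t["date"] > cutoff]


def last_transaction_dates(transactions):
    """Return (account, latest date) pairs, stalest account first."""
    latest = {}
    for t in transactions:
        account = t["account"]
        if account not in latest or t["date"] > latest[account]:
            latest[account] = t["date"]
    return sorted(latest.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    txns = [
        {"account": "Checking", "date": date(2024, 3, 1)},
        {"account": "HSA", "date": date(2023, 11, 15)},
        {"account": "Checking", "date": date(2024, 4, 2)},
    ]
    for account, day in last_transaction_dates(txns):
        print(f"{day.isoformat()}  {account}")
```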
This basically works for me and catching up doesn't really take long. There are some accounts that I update very infrequently, but I've gotten over my OCD about this and it doesn't really matter. My wife doesn't appreciate being hassled for the statement from her work HSA every month -- nor does it really matter how much is in there at any given moment, unless we have a big medical expense -- so we just occasionally sit down and catch everything up, maybe twice a year.
Overall I find the automation means it's vastly less work than when I used gnucash, and the flexibility in expense structures and ease of assigning things mean I have much better budget data than when I used mint.com.
But even banks that use it make massive mistakes and break the basic assumptions. Unique ID repeating in the same day for example. Or using the minutes in place of the month...
Banks are much less technological than the common stereotype about them. In fact, integrating or automating a bank is one of the worst things you can do. The list of easy integration goes like this:
High-tech internet services
Stock exchanges
Websites with no API
…
Telcos
…
…
Banks
Unreachable rock bottom of insanity
You should see what the major players (especially MS) do to iCalendar. The standard has been around since at least 2009, with the last update in 2016, but none of the major players implements the standard correctly, and most utterly fail on VTODO, I suspect on purpose because to-do lists are actually useful.
I can't decide whether the failure to implement the iCalendar and OFX standards properly is an example of incompetence or malice, but I suspect the latter with an eye to vendor or bank lock-in.
Working on calendar software is boring, so the big players likely give it to interns and new recruits. Office suites are so last century.
I don't think things will improve unless we find a way to shame big companies into paying attention and playing nice with each other, like it was done for HTML standards (pre-Google's monopolization).
I'll go with incompetence or lack of care. The issues I found were so bizarre/stupid, I would be worried for anyone who thought to make them on purpose.
Exactly my experience but with GnuCash. I used to do it weekly, if you miss one or even two or three weeks it's not too bad, but beyond that the effort required to catch up just snowballs. It's also been about a year since I've imported. I've been meaning to get around to reviving it, and the only way I can imagine doing it is to break it up into smaller chunks, like uploading one month at a time over a few days.
Or just start over on the day (or month) you decide to import.
As someone who wrote a budgeting app, historical data > 1 year is a lot less useful than most folks realize. I would say at least 80% of folks never look back further than 6 months. It's not even worthwhile for comparing YoY as your lifestyle, the economy, etc change so much.
I ended up adding roll-up functionality with a per-transaction opt-out for this reason, effectively squashing transactions into monthly and yearly budget summary transactions. Saved a ton of space with little loss of granularity.
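As an illustration of the roll-up idea (not the actual app's implementation, which is not shown), here is a sketch that squashes transactions into one summary per month and category, with a hypothetical `keep` flag as the per-transaction opt-out:

```python
from collections import defaultdict
from decimal import Decimal


def roll_up_monthly(transactions):
    """Squash transactions into monthly per-category summaries.

    Transactions with a truthy "keep" flag opt out and are preserved as-is.
    Dates are assumed to be ISO strings (YYYY-MM-DD).
    """
    kept = []
    totals = defaultdict(Decimal)  # (month, category) -> summed amount
    for t in transactions:
        if t.get("keep"):
            kept.append(t)
        else:
            totals[(t["date"][:7], t["category"])] += t["amount"]
    summaries = [
        {"date": f"{month}-01", "category": category,
         "amount": total, "memo": "monthly roll-up"}
        for (month, category), total in sorted(totals.items())
    ]
    return kept + summaries


if __name__ == "__main__":
    txns = [
        {"date": "2023-01-03", "category": "food", "amount": Decimal("-10")},
        {"date": "2023-01-20", "category": "food", "amount": Decimal("-15")},
        {"date": "2023-01-25", "category": "tax", "amount": Decimal("-500"), "keep": True},
    ]
    for row in roll_up_monthly(txns):
        print(row)
```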
I agree that data much older than a year is less useful, but to me, historical data going back about a year is quite useful. The main time that my records are practically useful is when I need to file a tax return, which obviously requires a year's worth of data. Beyond that, I find it quite useful to track changes in spending, earning and investments over time.
Years ago when I helped people with random computer stuff (circa 2002), I had a client who’d run into trouble with Quicken because they had about 20 years of historical data.
It was so weird to me - they were paying me hundreds of dollars so that they could reference their water bill in 1987. I have my own irrational tendencies like this, of course, but still seemed weird.
Add an "error" account, put transactions there so that your transaction accounts have the correct value, and focus on inputting current data.
If for some reason you care about your historical data enough to import it, you can add new transactions into "error" to subtract whatever part of it they are responsible for. If the account reaches zero, you can even cheat and delete all transactions.
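To make the "error" account trick concrete, here is a small sketch that generates the adjusting transaction. The account names and the entry layout are assumptions for illustration; the amount is simply the difference between the real balance and what the books say.

```python
from decimal import Decimal


def error_adjustment(day, account, book_balance, actual_balance,
                     error_account="Equity:Error"):
    """Return a balanced entry that moves the book balance to the actual one.

    Returns an empty string when the books already agree with reality.
    """
    diff = actual_balance - book_balance
    if diff == 0:
        return ""
    return (
        f"{day} Adjust {account} to statement balance\n"
        f"    {account}  {diff}\n"
        f"    {error_account}  {-diff}\n"
    )


if __name__ == "__main__":
    print(error_adjustment("2024-06-01", "Assets:Checking",
                           Decimal("100"), Decimal("130.50")))
```

Importing a historical transaction later then means posting its amount against the error account instead of inventing a counterpart, which shrinks the error balance toward zero.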
GnuCash actually does that after an import. Sure, one can use the nice reconcile flow, but for importing lots of data I prefer slowly working through the bank statement at my own pace, paired with occasional sanity checks.
What is the bottleneck in your experience? I have around 100 transactions a month spread across various banks and providers. My experience is this:
I spent about a week defining a procedure to export all statements into CSV, converting the CSV into a GnuCash-friendly format where needed with some GPT3-authored PHP scripts. Once a month I do an import into GnuCash and the process takes at most 2 hours.
There are various frictions. I use multiple different banks, credit cards and investment platforms (partially due to my life being split between two countries). Each of them has their own clunky manual process for downloading statements (including MFA). The statements are in different formats. On the most recent version of GnuCash that I used I experienced weird bugs with the CSV importer which meant it would sometimes crash when I tried to use preset rules.
Classifying transactions is probably the biggest time sink. GnuCash's auto-categorise feature is somewhat helpful but requires a lot of manual intervention and honestly I spent a lot of time just trying to figure out what a particular expense related to (often the note on my bank statement is not helpful).
Ultimately, 2 hours per month is probably about what it took me as well. But x12 that's 24 solid hours of importing. And once you go back a few months it probably takes longer because it is more difficult to recall what particular expenses were for.
If you're willing to hold your nose a little bit, Plaid[0] might be worth looking into. They have (had?) a testing tier which is more than enough for personal uses (100 linked accounts).
According to this Reddit thread from 10 months ago, some major US banks like Chase won't work even in development mode without production approval, and some people mention obstacles getting approved for OAuth without being a company or otherwise having that level of security posture. (Other people seem to have gotten successfully approved for OAuth.)
I agree this niche needs a better solution than it currently has, although the commercial and regulatory incentives might not be well-aligned to force that. In my family we have important accounts in the US and in the EU (as well as less important accounts in Canada and Mexico), and I don't know of any personal use-friendly solution that lets me pull my data automatically from all of these.
I'd honestly like some nice integration between Plaid and Excel or GSheets to let me track capital gains transactions for my non-euro accounts in the weird way that German tax law requires (over a surprisingly broad range of accounts starting next year at the latest). But nothing turnkey seems to exist right now.
Also in June of this year Plaid decommissioned its development platform but added a (quite) limited free tier of the production platform: https://www.reddit.com/r/fintech/comments/1c8xxet/plaid_prod... This doesn't really invalidate the main point of my comment, but it does affect a few details.
I use ledger for all bookkeeping and accounting, both personal and for many LLCs. If you’re a terminal rat and CLI master, use vim or emacs, know the basics of sed/awk, and script in bash/python/perl/ruby regularly, then just learn ledger and double-entry accounting and switch to it. I suspect you will be much happier than with whatever you are doing today.
- bal --dc is something US accountants might recognize a bit, but more likely than not they are incapable of understanding negatives correctly in the way ledger uses them, so it's easier to just write a few scripts to convert it to DR CR style for them. I’ve been shocked at how little abstraction the accountants I’ve dealt with are capable of.
I agree with the first two of these, they are great. (And I bet the third is too, I've just never needed it.)
If I had to submit one tip it would be to set everything up with a Makefile or similar. I keep my transactions spread across quite a few separate files for different accounts, and the actual commands I issue to include the right files are very long. Similarly I have various plotting and summary scripts whose exact syntax I don't usually remember. But with make I can just "make cashflow" "make balance 'A=Checking'" "make balance-plot 'A=retirement'" and so on.
I've written a series of posts on practical "recipes" for how to use Ledger (one of the leading plain text accounting systems) effectively in more complex situations beyond the basic tutorials: https://felixcrux.com/blog/ledger-practices
As a heavy ledger user these are great recipes in my opinion.
Since you’re here and clearly you know ledger well I’ll share a pattern I have that perhaps is a good idea or perhaps has an alternative approach.
We need to track customers and suppliers carefully and then assign payments both made and received to specific projects and categories within those projects. Invoices can contain a mix of these projects in either direction. So our workflow is:
- assign transactions to an account labeled by the counterparty when imported from bank account transactions.
- a script uses ledger print command and creates a 2nd journal file for each counterparty with a mirror transaction inverting the original transaction out of the counterparty account and into an Unknown project and Unknown category sub account as a place holder.
- transactions are then matched by bookkeeper to specific invoices and then assigned to Project:Category accounts in the counterparty journal file. transactions will be split if it’s needed at this point. so a single payment can be split to multiple Project:Category accounts.
- by including all bank transaction journals and all counterparty journals, the ledger bal command then shows whether all transactions have been processed by bookkeeping.
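The mirror-transaction step in the workflow above could be sketched like this. The account names (`Counterparty:Acme`, `Projects:Unknown:Unknown`) and the dict layout are hypothetical; the real script works from `ledger print` output, which is not reproduced here.

```python
from decimal import Decimal

# Hypothetical placeholder account for transactions not yet matched to an invoice.
UNKNOWN = "Projects:Unknown:Unknown"


def mirror(txn):
    """Invert a counterparty posting out of the counterparty account into the placeholder."""
    return {
        "date": txn["date"],
        "payee": txn["payee"],
        "postings": [
            (txn["counterparty"], -txn["amount"]),
            (UNKNOWN, txn["amount"]),
        ],
    }


def render(txn):
    """Render a transaction dict as ledger-style text."""
    lines = [f"{txn['date']} {txn['payee']}"]
    lines += [f"    {account}  {amount}" for account, amount in txn["postings"]]
    return "\n".join(lines) + "\n"


def unprocessed_total(mirrors):
    """Sum what is still parked in the placeholder account."""
    return sum(
        (amount for t in mirrors for account, amount in t["postings"]
         if account == UNKNOWN),
        Decimal("0"),
    )


if __name__ == "__main__":
    bank_txn = {
        "date": "2024-02-01",
        "payee": "Acme invoice",
        "counterparty": "Counterparty:Acme",
        "amount": Decimal("-1200.00"),
    }
    print(render(mirror(bank_txn)))
```

Once the bookkeeper reassigns the placeholder posting to real Project:Category accounts, the placeholder total drops to zero, which is the same signal the balance report gives.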
Plain text accounting is cool but I think one of the biggest barriers for people is downloading bank data into a standard format.
The banks are never going to embrace much more than CSV or Excel files... the various data aggregation platforms (Yodlee, Plaid, etc...) are not open source or hobbyist friendly.
Back in ancient times there was a company called Wesabe (https://en.wikipedia.org/wiki/Wesabe) that wrote software that did bank syncing on your desktop. Mint.com basically put them out of business but I still think about that approach. I think it could work for open source.
hledger has tooling to transform fairly arbitrary CSV into transactions it understands[0]. I haven't tried it yet, but after spending 4 hours over multiple days helping my SaaS bookkeeping company troubleshoot their bank connection problems[1], you'd better believe I'm willing to put a little time into trying this out.
Every damn time I reconcile transactions I end up fighting their system that I can't see the workings of or fix myself. It's getting to the point where the juice isn't worth the squeeze. I'd sooner deal with the CSV myself given a half-way decent set of tools to do so.
This is the big advantage of hledger. It has two ways of translating csv into journal form - one simple and one more complicated, but very flexible.
I find it best to have a separate journal for each downloaded account. I just include them into a master journal (along with a manual entry journal) and generate reports from that.
I also use git so I can roll back the latest import, if something goes wrong - but that hasn’t happened yet.
I recently discovered Paisa, which is basically a nice UI over ledger-cli. Import is very convenient. You upload csv (or similar), see the preview, then write a script which converts each row into ledger text format. There is linting and everything. When you like the result, just save the script so you can import again anytime. It also supports downloading commodity prices if you use it to track stocks and similar.
Charts are not generic enough for my taste so I'm exporting data elsewhere, but for data entry it is great.
yeah I looked at that too... and that aspect is useful.
However, I'm not really looking for a layer on top of ledger as much as I'm looking for a configurable web-scraping system (using the local password manager) that can be run to get the csv/pdf/etc.. files needed to create the ledger.
I have run into issues where CSVs are not correct/aligned with PDF statement data that banks are legally obligated to provide. In addition, CSVs almost never have balance data. So I download the PDFs and extract data from them. This is much more painful than it needs to be: providing a sensible machine-readable PDF involves just following a few simple rules to ensure the 3 or 4 transaction fields, and the few needed statement fields (dates and balances), are extractable without fragile heuristics. There is no conflict between branded and human-readable vs machine-readable.
In my country all transactions are reported by SMS. So I implemented a system using Tasker to catch these SMS messages and store them in a CSV file. This CSV file is put inside a folder which is synced to my desktop using SyncThing.
I had planned to process this data and add it to an accounting system, but didn't get a chance, and then my mobile crashed and I lost the Tasker action. Now I can't find the motivation to implement it again.
I’d appreciate hearing how others have used the various plain text accounting tools for their own use. Are you legitimately using it to inform yourself of your spending habits and taking corrective action? Is it simply for tracking your expenses, revenues, net worth, etc? Or is it simply about the process? I can certainly see the appeal of such an orderly, structured process.
Every time I’m reminded of plain text accounting, I have either an irresistible urge to immerse myself fully into the process, or feelings of guilt for not staying committed to my previous attempts. Right now, it’s mainly guilt, since I’ve not updated my personal ledger in a month and a half.
Ultimately, I think I’m unsure about why I’m using it, and eventually feel like I’m logging transactions just for the sake of it.
I’m always using it for spontaneous group accounting, e.g. three guys vacation. Because for group expenses it’s much easier to split the check later in the day or the week. We just accumulate the web of debts and sort it out on demand. It’s very hard to track such chaotic expenses otherwise.
Same for business expenses, I just pay with my card for equipment, services, etc. Later I report it up and get cash/transfer back (doesn’t work for some countries/businesses).
My personal eaten-shitten expenses I know from my bank and experience, there’s no need to account for these.
> Every time I’m reminded of plain text accounting, I have either an irresistible urge to immerse myself fully into the process, or feelings of guilt for not staying committed to my previous attempts
You must have financial motivation, not emotional. Once you know that if you give up on accounting, your buddy will happily drink through hundreds of bucks, or your company will get a free hdmi cable, you enter the damn sum without hesitation. Personal accounting doesn’t work because you can only lose analytics, not money.
I feel like plain text accounting software is overkill for splitting vacation expenses. The Splitwise phone app is very good for this, and shares some of the responsibility.
I'm self employed and use it (Beancount, as I like the more strict approach) for my business accounts and also for my stock portfolio. Fava, the web UI, is very handy for reporting and visualising things, though I also have a few scripts to automate certain processes like importing transactions from Wise and tracking exchange rates. I really don't have the discipline to use it for daily personal expenses or budgeting though.
I don’t use it as some motivational tool to change spending habits or whatever, but I do like to keep a record of all my accounts and assets. I think it gives me a better understanding of my whole financial picture. I also create ‘virtual’ accounts within a cash savings account that divide the balance into pots for things I’m amortizing, like car maintenance or insurance.
- Keeping track of my expenses in various sectors: food (groceries, eating out), books, magazine subscriptions, etc
- Keeping track of my current balance accounts across various currency deposits
- Loans I give to people, and gifts I give to people
- Creating virtual envelopes to segregate my savings account money for my goals like travelling, buying gifts for someone, investment goals, etc
This has helped me tremendously
- for reducing my eating-out habit and eating more at home, by realising just how much money I was wasting by eating out daily, and inputting the saved amount into a compound interest calculator to see the potential lost income and wealth from not putting it into an index fund account
- to keep track of pending payments from clients and calculating my real cashflow against cashflow based on expected income
- for reducing my impulsive spending, by tracking my savings account money with virtual envelopes (e.g. 40% for investment goals, 1% for gifts, etc.). It helps me not to just see a big balance on my account and start spending it away; seeing that money as segregated chunks in my mind helps me stay in my lane.
- I have a program that generates all sorts of charts to track my wealth growth over time and expense growth and decline across categories, which I then dump into a webpage with my notes on how the changes were a net positive or negative outcome on my life. I do it annually to decide what I’ll do next
- I also have a python script that takes my ledger file and converts it into an excel sheet to send it to my chartered accountant to file my annual taxes
I also maintain a separate ledger file for my business (I don’t maintain that one manually, I just export the data from my accountant’s software, to do my own calculations at home)
- I use it to calculate cashflow projections to predict how my expenses might potentially grow with rise in revenue
- Track categories of spending to spot anomalies in spend across departments
- Calculate whether I should hire more or raise marketing spend, calculate metrics like ROIC (Return on Invested Capital)
The double entry helps me catch any discrepancies in accounting: I import bank statements and generate a ledger from them. I have accounts separated by use case (discretionary spends, employee perks, business inputs), with each one getting a deposit from the main account weekly. I use that to check whether something’s odd or the books are all fine.
I have had trouble before with an accountant who ran pseudo expenses through my books without telling me, just to impress me by showing high tax savings. That lack of transparency landed me in court once, with a huge fine plus late penalties.
Now I don’t trust accountants and make sure I double check no matter what.
Plus I have a lot of automation scripts and stuff, imports from stripe account, imports from bank statements, accountant’s own ledger, etc
I match them all with python scripts and try to look for discrepancies.
I love plain text accounting. As a programmer it works for me: I automated a ton of it, and I have tons of my own macros and shortcuts in my code editor (vim) to make things very easy and simple.
I love it overall, I built out my own system on top of hledger across the years.
I've wanted to explore this so that I've got records of all my transactions which I can feed to generative AI when it eventually becomes capable enough.
E.g. "Using my financial records, help me set a budget which allows me to do XYZ within ABC bounds."
I’m a huge ledger fan (hledger specifically) and have used it to run my entire accounting life for the past 8 years or so.
A few tips:
* Resist the urge to break up your various accounts into too many separate files. I tried that and went back to one file per account per year (aka “venmo-2024.hledger”). Also helps with below…
* GitHub Copilot is remarkably and shockingly good at working with ledger files. It will do the balance addition/subtraction on following lines almost perfectly. Also, if you need to manually enter a new line, you can often just enter a shortcut one-line comment and it’ll fill the entire entry, aka:
Is there a text editor that is able to autocomplete the categories in hledger format? It would be great to type Assets:: and then get a list of the possible categories, but I haven't found any editor or extension that does it.
vscode has some code coloring extensions but no code completion that I'm aware of. If you found one, please consider sharing it :). I would happily switch to another editor if one has an extension that autocompletes categories.
I use beancount in emacs using the beancount extension which does autocomplete, and I’ve added some of my own elisp tricks to make navigating around the file easier.
To be precise, hledger-ui is not an editor, but it can open hledger add, hledger-iadd, or your $EDITOR[1] for data entry, all of which can complete account names. Web guis like hledger-web and Paisa tend to complete account names as well.
I love PTA and have been doing it for years, using beancount, fava and emacs.
The main benefit for me is keeping track of everything, including pensions, RSU vests and so on.
I have some scripts that help me connect to banks and translate the transactions into the correct format with some crude rules based categorisation, along with scripts that convert CSV files from investment accounts etc.
It’s a lot of effort at first but I’ve got the system down now to maybe 10-15 minutes work once a week to keep everything updated.
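For anyone wondering what such a conversion script might look like, here is a minimal sketch, not the poster's actual code. It assumes a CSV export with `date`, `description` and `amount` columns and ISO dates; the keyword rules, account names and the `csv_to_beancount` helper are all made up for illustration.

```python
import csv
import io
from decimal import Decimal

# Hypothetical keyword rules: the first match wins. Account names are invented.
RULES = [
    ("grocer", "Expenses:Food:Groceries"),
    ("rail", "Expenses:Transport"),
    ("salary", "Income:Salary"),
]
FALLBACK = "Expenses:Uncategorised"


def categorise(description):
    """Return the account of the first rule whose keyword occurs in the description."""
    lowered = description.lower()
    for keyword, account in RULES:
        if keyword in lowered:
            return account
    return FALLBACK


def csv_to_beancount(csv_text, bank_account="Assets:Bank:Checking", currency="EUR"):
    """Convert rows with date,description,amount columns into beancount-style entries."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = Decimal(row["amount"])
        # Double quotes would end the narration string early, so swap them out.
        description = row["description"].replace('"', "'")
        entries.append(
            f'{row["date"]} * "{description}"\n'
            f"  {bank_account}  {amount} {currency}\n"
            f"  {categorise(row['description'])}  {-amount} {currency}\n"
        )
    return "\n".join(entries)


if __name__ == "__main__":
    sample = "date,description,amount\n2024-04-05,Corner Grocer,-12.50\n"
    print(csv_to_beancount(sample))
```

Anything that lands in the fallback account is what you would review by hand during the weekly pass; beancount would also expect the accounts to be opened elsewhere in the file.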
I'm looking for a plain-text solution that also knows about inventory (counts, FIFO, dollar cost averaging) along with invoices/POs (AP/AR.)
Also, I wish they would use words like "debit" and "credit" instead of trying to hide this under +/- notation. It makes translating from real financial documents or scenarios into plain-text reports somewhat painful because we're not speaking the same language.
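For what it's worth, the translation is mechanical: in ledger, hledger and beancount a positive posting amount corresponds to a debit and a negative one to a credit. A small sketch of that mapping follows; the `to_signed_postings` helper and the account names are hypothetical, not part of any of these tools.

```python
from decimal import Decimal


def to_signed_postings(lines):
    """Turn (account, side, amount) tuples into (account, signed amount) pairs.

    side is "debit" or "credit"; debits become positive, credits negative.
    """
    postings = []
    for account, side, amount in lines:
        value = Decimal(amount)
        if side == "debit":
            postings.append((account, value))
        elif side == "credit":
            postings.append((account, -value))
        else:
            raise ValueError(f"unknown side: {side!r}")
    # Double entry: total debits must equal total credits, so the signed sum is zero.
    if sum(value for _, value in postings) != 0:
        raise ValueError("debits and credits do not balance")
    return postings


if __name__ == "__main__":
    for account, value in to_signed_postings([
        ("Expenses:Tools", "debit", "50.00"),
        ("Assets:Cash", "credit", "50.00"),
    ]):
        print(f"    {account}  {value}")
```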
I just switched off of ledger because multiple accountants and tax preparers didn’t know what to do with it even if I hand generated profit and loss reports.
QuickBooks Online has gotten pretty good. I honestly wouldn’t recommend these plain text tools anymore. QuickBooks has become almost a standard.
Has anyone tried training their own personal machine learning model to take export from their Bank+receipts and auto categorize everything? It seems like it'd be a fairly simple classification model that wouldn't require too much training...
I've been tinkering with that the last few days. My main issue is that most of my bank statement descriptions are really bad. It's often impossible to know what the transaction was about just from the description. Local LLMs (9B parameters, I don't have the required hardware for bigger models) don't help at all as they don't know the context around the transaction.
I use Firefly III for managing my finances, and in the end it is just much better to set up "dumb" rules that look for keywords in the description, IBAN numbers, etc. I have about 30 of them and they cover 90% of my transactions.
I have tried and given up. As others noted, my bank statements are just random short codes and numbers most of the time, so categorizing them requires a fair amount of manual review. There is an article comparing fuzzy searching with Elasticsearch against training an ML model, which concluded that fuzzy search is better. I tried Meilisearch and found it okay, but still ended up doing everything manually in an Excel sheet, because bank descriptions just suck.
Not entirely related to this, but I have always wanted to download my transaction information from my banks periodically and keep it locally, so that I can keep track of the transfers I have made to a specific account. From a business perspective this would be very helpful, but I have not been able to find any free and open-source solution for it. I have built something using Python, but updating it on a periodic basis is difficult, considering there is no API that provides this data.
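To illustrate the fuzzy-search idea without a search server, here is a rough sketch that uses Python's standard `difflib` as a stand-in for Elasticsearch or Meilisearch. The history of hand-categorised descriptions, the account names and the `suggest_account` helper are all invented for the example.

```python
import difflib

# Hypothetical history of descriptions that were already categorised by hand.
HISTORY = {
    "card payment tesco stores 3214": "Expenses:Food:Groceries",
    "direct debit city power ltd": "Expenses:Utilities",
    "card payment shell fuel 0091": "Expenses:Transport:Fuel",
}


def suggest_account(description, cutoff=0.6):
    """Return the account of the most similar known description, or None."""
    matches = difflib.get_close_matches(
        description.lower(), list(HISTORY), n=1, cutoff=cutoff
    )
    return HISTORY[matches[0]] if matches else None


if __name__ == "__main__":
    print(suggest_account("CARD PAYMENT TESCO STORES 3297"))
    print(suggest_account("zzzz qqqq"))
```

A `None` result is the cue for manual review, which with bad enough descriptions may still be most of the work.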
For most PTA users, 1 million transactions would be a large file; we usually split them up more.
On a macbook air m1, here's how hledger 1.40 from homebrew handles it
(it used to be faster; perhaps this will improve again):
$ hledger -f examples/1Mtxns-1kaccts.journal stats
Main file : .../1Mtxns-1kaccts.journal
Included files : 0
Txns span : 2000-01-01 to 4737-11-28 (1000000 days)
Last txn : 4737-11-27 (990974 days from now)
Txns : 1000000 (1.0 per day)
Txns last 30 days : 31 (1.0 per day)
Txns last 7 days : 8 (1.1 per day)
Payees/descriptions : 1000000
Accounts : 1000 (depth 10)
Commodities : 26
Market prices : 1000000
Runtime stats : 80.23 s elapsed, 12465 txns/s, 2584 MB live, 7679 MB alloc
Ledger was traditionally faster on at least some reports, but I haven't been able to reproduce that on my machines for some time. Today, with the same file above, it ran for 40m before I killed it.
In my experience about half the time is spent on parsing and half on report calculation. Long-running apps like hledger-ui and hledger-web do the parsing only once at startup, saving time compared to CLI commands.
I set up the plaintextaccounting.org site in 2016,
to grow a more organised info hub and community around ledger (2003), hledger (2007), beancount (2008), and the many related apps and resources.
I'm happy to answer questions; there's also a FAQ on the site.
I don't spend as much time as I'd like making the site and docs better.
Feedback and help is always welcome.
Stability, efficiency, and longevity are all important, which is one reason it remains fairly simple.
This style of tools and workflows for bookkeeping/accounting, which I named "plain text accounting" for convenience,
but was first popularised by Ledger starting in 2003, has a number of aspects; it's not so easy to explain briefly.
I think a key one is the use of textual domain specific languages for interacting with the accounting software's internal data model.
I mean the various file formats of the PTA tools (describing data),
and also the tools' command line interfaces and related scripts and idioms (describing reports or actions).
Textual languages are more expressive and flexible, version controllable, and modular/scriptable/glueable than GUIs,
which tend to be more static in their capabilities.
Note once you have one text DSL, it's relatively easy to add more, eg custom formats that better fit your needs.
And such DSLs need not preclude GUIs; they can be an alternative (perhaps assisted by smart editors/IDEs), complementary, or a foundation.
I see a lot of people in here using hledger or beancount over ledger. Could somebody explain the differences? Looking to get back into PTA, but am facing some choice paralysis now.
In the past I've tried various free accounting tools, but sadly none of them could track account/card numbers (EDIT: while processing bank statement exports). I don't need to track my many bread and pastry purchases, but I'd like to track things like investments, split lunches, rent from flatmates etc.
I have multiple accounts and I need to track transactions between them, and also distinguish them for all other transactions. Could anyone here recommend a tool that deals well with this?
I'm partial to semi-ad-hoc plain-text bookkeeping, which I already do for other things, but I'd be happy for any recommendations.
Emacs org mode is useful for this. Its plain text tables have spreadsheet capabilities and you can cross reference other tables, to separate out things.
There is ledger-mode and beancount-mode, which are both nice (depending on which program you use). I would say the majority of what I do in practice is Python scripts that convert statements to ledger; the amount of stuff I do by hand is minimal enough that it would be easy to live without the emacs mode.
I use emacs + beancount-mode + some helper elisp scripts.
On average, it takes about 30-40 minutes per sitting to do a weekly review, and that's mostly checking what hides behind Amazon/eBay/Google payments.
I'd embrace plain text accounting more if it had a better schema. Ledger's is absolutely atrocious, and it drives my OCD nuts trying to use vim to "write" accounting entries.
I think my ideal PTA would be some kind of jsonnet-based system where I can create/call functions to generate journal entries.
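Something in that spirit can be approximated today with a general-purpose language that emits journal text. A minimal sketch in Python rather than jsonnet follows; the `entry` and `split_bill` functions and the account names are hypothetical, not part of any existing tool.

```python
from decimal import Decimal


def entry(date, payee, postings, currency="USD"):
    """Render one ledger-style transaction from (account, amount) pairs."""
    if sum(amount for _, amount in postings) != 0:
        raise ValueError("postings must sum to zero")
    lines = [f"{date} {payee}"]
    for account, amount in postings:
        lines.append(f"    {account}  {amount} {currency}")
    return "\n".join(lines)


def split_bill(date, payee, total, people, paid_from="Assets:Checking"):
    """Generate an entry splitting a bill evenly; the first person absorbs rounding."""
    total = Decimal(total)
    share = (total / len(people)).quantize(Decimal("0.01"))
    remainder = total - share * len(people)
    postings = []
    for index, name in enumerate(people):
        amount = share + remainder if index == 0 else share
        postings.append((f"Assets:Receivable:{name}", amount))
    postings.append((paid_from, -total))
    return entry(date, payee, postings)


if __name__ == "__main__":
    print(split_bill("2024-04-05", "Pizza", "100.00", ["Alice", "Bob", "Carol"]))
```

The generated text can be appended to a journal file or piped into the PTA tool, so the hand-written "schema" becomes whatever function signatures you like.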
You never have to look at it. I have an Excel - one file a year - with a schema that makes sense to me, then a Python script to convert that to {year}.beancount. The script is triggered by fswatch.
So I get a browser frontend with fava and data input in Excel.
It’s non-trivial, but then so is any form of accounting.
GNU recutils should become the standard backend. Plain text db which has a powerful albeit simple records format, proper ids, enforces constraints, removes need of silly type conventions better defined in fields with %type, built in encryption. Foreign keys. Auto generated fields. Regex.
No one in their right mind will want to manually edit the data store holding critical accounting data. 99% of the time it should be ETL, with the odd visual check for inspiration.
For a time I threw together a bunch of python scripts which did billing for me based on YAML files which struck a nice balance of readability vs programmability (since YAML turns into lines quite well and has less special character noise).
I hope you've at least once stumbled upon the existence of https://noyaml.com/ and/or, at the time, were familiar with the quirks regarding number interpretation.
I suggest use some other tool for data entry, and as long as you can figure out its internal format, write a script to output to ledger. That's what I do with KMyMoney.
ledger can validate that all account names are pre-defined, are formatted in a specific way with regard to decimal and thousands separator, their currency specified and a few more things. (None of these are enforced by default, but for any serious use of ledger you would want to set `--strict` in your .ledgerrc file)
What else would you want ledger to enforce? We actually have a few very specific (to us) rules we enforce, but we do that via a ruby script. That's the power of plain text accounting: you can just whip up a small script to do validation for you.
That's definitely the 'retro' way of doing it, and has been effectively outlawed in Europe with the new banking file and payment standards laws.
It's rather hilarious: one of the major banks I had a conversation with back in 2019 (as these new laws were barely getting off the ground in tech implementation) noted that they had a service which allowed you to drop raw SWIFT messages (the interbank communication format) into a folder via SFTP to make payments in bulk or download bank statements. It's very real!
I wouldn't be surprised if they have a lot of big firms grandfathered into that, and may be still (unlawfully?) offering it.
There are far more FTP connections where people download files than there are available APIs at a bank. Plopping CSV files on a SFTP server is dirt cheap compared to an API.