Hallucination

Vol. 22 No. 4 – July/August 2024

Hallucination

GPTs and Hallucination:
Why do large language models hallucinate?

The findings in this experiment support the hypothesis that GPTs based on LLMs perform well on prompts that are more popular and have reached a general consensus yet struggle on controversial topics or topics with limited data. The variability in the applications's responses underscores that the models depend on the quantity and quality of their training data, paralleling the system of crowdsourcing that relies on diverse and credible contributions. Thus, while GPTs can serve as useful tools for many mundane tasks, their engagement with obscure and polarized topics should be interpreted with caution. LLMs' reliance on probabilistic models to produce statements about the world ties their accuracy closely to the breadth and quality of the data they're given.

by Jim Waldo, Soline Boussard

Confidential Computing Proofs:
An alternative to cryptographic zero-knowledge

Proofs are powerful tools for integrity and privacy, enabling the verifier to delegate a computation and still verify its correct execution, and enabling the prover to keep the details of the computation private. Both CCP and ZKP can achieve soundness and zero-knowledge but with important differences. CCP relies on hardware trust assumptions, which yield high performance and additional confidentiality protection for the prover but may be unacceptable for some applications. CCP is also often easier to use, notably with existing code, whereas ZKP comes with a large prover overhead that may be unpractical for some applications.

by Mark Russinovich, Cédric Fournet, Greg Zaverucha, Josh Benaloh, Brandon Murdoch, Manuel Costa

Assessing IT Project Success: Perception vs. Reality:
We would not be in the digital age if it were not for the recurrent success of IT projects.

This study has significant implications for practice, research, and education by providing new insights into IT project success. It expands the body of knowledge on project management by reporting project success (and not exclusively project management success), grounded in several objective criteria such as deliverables usage by the client in the post-project stage, hiring of project-related support/maintenance services by the client, contracting of new projects by the client, and vendor recommendation by the client to potential clients. Researchers can find a set of criteria they can use when studying and reporting the success of IT projects, thus expanding the current perspective on evaluation and contributing to more accurate conclusions. For practitioners, this study provides a rich set of criteria that can be used for evaluating their projects, as well as strong evidence of the importance of considering not only project execution, but also post-project outcomes and impacts in the evaluation.

by João Varajão, António Trigo

Questioning the Criteria for Evaluating Non-cryptographic Hash Functions:
Maybe we need to think more about non-cryptographic hash functions.

Although cryptographic and non-cryptographic hash functions are everywhere, there seems to be a gap in how they are designed. Lots of criteria exist for cryptographic hashes motivated by various security requirements, but on the non-cryptographic side there is a certain amount of folklore that, despite the long history of hash functions, has not been fully explored. While targeting a uniform distribution makes a lot of sense for real-world datasets, it can be a challenge when confronted by a dataset with particular patterns.

by Catherine Hayes, David Malone

Program Merge: What's Deep Learning Got to Do with It?:
A discussion with Shuvendu Lahiri, Alexey Svyatkovskiy, Christian Bird, Erik Meijer and Terry Coatta

If you regularly work with open-source code or produce software for a large organization, you're already familiar with many of the challenges posed by collaborative programming at scale. Some of the most vexing of these tend to surface as a consequence of the many independent alterations inevitably made to code, which, unsurprisingly, can lead to updates that don't synchronize. Difficult merges are nothing new, of course, but the scale of the problem has gotten much worse. This is what led a group of researchers at MSR (Microsoft Research) to take on the task of complicated merges as a grand program-repair challenge, one they believed might be addressed at least in part by machine learning.

by Shuvendu Lahiri, Alexey Svyatkovskiy, Christian Bird, Erik Meijer, Terry Coatta

Deterministic Record-and-Replay:
Zeroing in only on the nondeterministic actions of the process

This column describes three recent research advances related to deterministic record-and-replay, with the goal of showing both classical use cases and emerging use cases. A growing number of systems use a weaker form of deterministic record-and-replay. Essentially, these systems exploit the determinism that exists across many program executions but intentionally allow some nondeterminism for performance reasons. This trend is exemplified in GPUReplay in particular, but also in systems such as ShortCut and Dora.

by Peter Alvaro, Andrew Quinn

Unwanted Surprises:
When that joke of an API is on you

There is the higher-order question of whether loosely typed languages with coercion are really a good idea in the first place. If you don't know what you're operating on, or what the expected output range might be, then maybe you ought not to be operating on that data in the first place. But now these languages have gotten into the wild and we'll never be able to hunt them down and kill them soon enough for my liking, or for the greater good.

by George V. Neville-Neil

Test Accounts: A Hidden Risk:
You may decide the risks are acceptable. But, if not, here are some rules for avoiding them.

A test account that's shared among many can be used by anyone who happens to have the password. This leaves a trail of poorly managed or unmanaged accounts that only increases your attack surface. A test account could be a treasure trove of information, even revealing information about internal system details. If you really need to take this approach, give your developers their own test accounts and then educate them about the risks of misusing these accounts. Also, if you can periodically expire these accounts, all the better.

by Phil Vachon