Happy Thought for 16 January 2026
Have a Happy Thought:
Ok, maybe not a happy thought, but it is pretty funny.
Every time you use a Large Language Model (LLM), like ChatGPT or whatever has been stuffed into your search engine, you put in a “prompt”. That is the question you ask, or the words you type in asking it to do something.
Like: “write me a five-paragraph essay about the causes of the Great Depression”, or “what are good ideas for a 13-year-old’s birthday party”.
Now, unless you have an ongoing conversation with your LLM (it’s not actually artificial intelligence, no matter how many times the tech companies use the term AI), you probably thought that this prompt is all that’s going into the program to generate a response.
That is... not at all what is happening.
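If you talk to one of these models through the programming interface instead of the chat window, you can actually see the seam: the text you type is just one “message” in a list, and a hidden “system” message gets slipped in ahead of it. Here is a minimal sketch, assuming the openai Python client – the model name, the stand-in system text, and the question are all just illustrative:

from openai import OpenAI

# A minimal sketch, assuming the openai Python client. The system message here
# is a tiny stand-in; the real one, as the leak below shows, runs to thousands of words.
client = OpenAI()  # reads your API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        # The hidden part: instructions you never typed.
        {"role": "system", "content": "You are a helpful assistant. Do not reproduce song lyrics."},
        # The part you actually typed in.
        {"role": "user", "content": "What are good ideas for a 13-year-old's birthday party?"},
    ],
)

print(response.choices[0].message.content)

When you use the chat website or a search engine’s “AI” box, that system part is written for you – and, as it turns out, it is very, very long.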
Some software engineers kept prompting ChatGPT in ways that finally had it output the rest of the prompt: the hidden background instructions. Here is a short snippet – follow this link if you want to see the whole thing (4,219 words!)
system_message:
role: system
model: gpt-5

You are ChatGPT, a large language model based on the GPT-5 model and trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-07
Image input capabilities: Enabled
Personality: v2

Do not reproduce song lyrics or any other copyrighted material, even if asked. You're an insightful, encouraging assistant who combines meticulous clarity with genuine enthusiasm and gentle humor. Supportive thoroughness: Patiently explain complex topics clearly and comprehensively. Lighthearted interactions: Maintain friendly tone with subtle humor and warmth. Adaptive teaching: Flexibly adjust explanations based on perceived user proficiency. Confidence-building: Foster intellectual curiosity and self-assurance.
Do not end with opt-in questions or hedging closers. Do not say the following: would you like me to; want me to do that; do you want me to; if you want, I can; let me know if you would like me to; should I; shall I. Ask at most one necessary clarifying question at the start, not the end. If the next step is obvious, do it. Example of bad: I can write playful examples. would you like me to? Example of good: Here are three playful examples:..
…
Don't store random, trivial, or overly personal facts. In particular, avoid:
- Overly-personal details that could feel creepy.
- Information that directly asserts the user's personal attributes, such as:
  - Specific criminal record details (except minor non-criminal legal issues)
  - Explicit identification of the user's personal attribute (e.g., "User is Latino," "User identifies as Christian," "User is LGBTQ+").
  - Trade union membership or labor union involvement
And it keeps going. A few things that caught my attention:
1. How different the writing style is throughout this background prompt. Obviously this has grown as different OpenAI engineers have had to make tweaks and adjustments – there is no one coherent author.
2. How specific the prompt is at times – obviously relating to complaints or issues. And since many of the problems with LLMs are baked in, with no way to fix the underlying issue, the prompt engineers just have to patch over the worst-performing bits.
3. How much this LLM must have really, really wanted to write in JSON!
Address your message to=bio and write just plain text. Do not write JSON, under any circumstances.
…
The full contents of your message to=bio are displayed to the user, which is why it is imperative that you write only plain text and never JSON. Except for very rare occasions, your messages to=bio should always start with either "User" (or the user's name if it is known) or "Forget". Follow the style of these examples and, again, never write JSON:
This week for #ShareGoodNewsToo:
A really good example of a Large (ok, medium-sized) Language Model: Papa Reo – an “AI” (like ChatGPT) that is focused on indigenous languages. It started with te reo Māori, using audio recordings from the early 20th century to help capture the sounds – and words – of the language.
They have since offered their methodology to speakers of other indigenous languages, to help preserve those languages elsewhere in the world.
The best part of this, to me, is their development of a Kaitiakitanga licence, which states that data is not owned but is cared for under the principle of kaitiakitanga, and that any benefit derived from the data flows to its source – the speakers of that language, not Big Tech.