About
Orange Words is a playground where I tinker with the combination of hacker news data, search, rag, and machine learning.
Data
The data includes a backfill of all past items from hacker news, as well as an ongoing sync of the latest items (every 30 minutes or so). This includes updating items after their edit window has closed. The processing also includes enriching the data, to make it more searchable or to support interesting features I want to develop.
The data is sourced from the Hacker News API:
Search Engine
The underlying search and data engine is a single node instance of Vespa, which runs on a robust Intel NUC server.
Machine Learning
The language models used for the retrieval augmented generation (RAG) include a handful of interesting "open" models served by together.ai, as well as various gpt models from openai.
Web Stack
The web stack is composed of python, flask, tailwind, htmx, and _hyperscript.
Who
My name is cody, please feel free to reach out!
- web: codycollier.com
- email: cmcollier@gmail.com
- x/twitter: @cmcollier
- linkedin: codycollier
Change Log
Summer 2024:
- Improved ingestion code efficiency
- Added metadata for sync tracking
Spring 2024:
- Improved the search for RAG based chat
- Added support for Llama 3 (via Together)
- Adjusted model options and config
- Misc query and latest item adjustments
Winter 2023-2024:
- Added support for session model switching
- Added support for models from Together.ai
- Added RAG based multi-turn chat
- Added RAG based Q&A with lexical search
- Added support for models from OpenAI
- Initial launch of public site with search