About Orange Words

About

Orange Words is a playground where I tinker with the combination of hacker news data, search, rag, and machine learning.

Data

The data includes a backfill of all past items from hacker news, as well as an ongoing sync of the latest items (every 30 minutes or so). This includes updating items after their edit window has closed. The processing also includes enriching the data, to make it more searchable or to support interesting features I want to develop.

The data is sourced from the Hacker News API:

Search Engine

The underlying search and data engine is a single node instance of Vespa, which runs on a robust Intel NUC server.

Machine Learning

The language models used for the retrieval augmented generation (RAG) include a handful of interesting "open" models served by together.ai, as well as various gpt models from openai.

Web Stack

The web stack is composed of python, flask, tailwind, htmx, and _hyperscript.

Who

My name is cody, please feel free to reach out!

 

Change Log

Summer 2024:

  • Improved ingestion code efficiency
  • Added metadata for sync tracking

Spring 2024:

  • Improved the search for RAG based chat
  • Added support for Llama 3 (via Together)
  • Adjusted model options and config
  • Misc query and latest item adjustments

Winter 2023-2024:

  • Added support for session model switching
  • Added support for models from Together.ai
  • Added RAG based multi-turn chat
  • Added RAG based Q&A with lexical search
  • Added support for models from OpenAI
  • Initial launch of public site with search