a b o u t

The site of analysis for this project is the geography of the semantic embedding space itself. You are looking at a compressed model of this space, populated with 700 self-authored data points written in English.

Processing language and meaning in this way creates a high-dimensional, difficult-to-wrangle topography. So, for the purpose of human understanding, I have distilled the infinite-seeming geography of 1536 dimensions into a three-dimensional realm for you to navigate freely. You will see three clusters of meaning orbiting this space, having arranged themselves according to the following local gravities: a) the human experience; b) Earth's ecologies and biomes; and c) Earth's modern nations.

The purpose of such a visualisation, and of the following analyses, is to illustrate mathematically the potential cultural biases caught in the processing of such a model, and to observe the nuances of the mass authorless opinion that emerges from the synthesis of its fallible human training data.

- Embeddings generated using OpenAI's text-embedding-3-small model.

- Semantic proximity calculated using cosine similarity of the original 1536-dimension embedding vectors, i.e. prior to dimensionality reduction. This is important: the 3D positions are a lossy projection and are not used to measure similarity.

- 3D positions produced via UMAP dimensionality reduction.
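As an illustration of the second note above, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the three toy vectors are assumptions made for this example; they are 4-dimensional stand-ins, not real 1536-dimension embeddings and not data from this project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Invented toy vectors (hypothetical, for illustration only).
point_a = [0.9, 0.1, 0.0, 0.2]
point_b = [0.8, 0.2, 0.1, 0.3]  # points in nearly the same direction as point_a
point_c = [0.0, 0.9, 0.4, 0.1]  # points in a very different direction

# point_a is closer in direction to point_b than to point_c.
print(cosine_similarity(point_a, point_b) > cosine_similarity(point_a, point_c))
```

Cosine similarity compares the direction of two vectors and ignores their length, which is why it is commonly paired with text embeddings.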


How to interact:

Long-click and drag to pan the embedding space visualisation.
Click once to pause or resume the animation.
Hover over any sprite to see the data point it represents.
"Similar" data points in the other semantic clusters, as interpreted by the text-embedding model, will be revealed at the same time.
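
For the curious, the hover behaviour can be sketched roughly as follows. This is an assumption about how such a lookup might work, not the site's actual code: it supposes that one most-similar point is picked from each other cluster, and the helper `most_similar_per_cluster`, the cluster names, the labels and the 3-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_per_cluster(hovered, points):
    """Map each *other* cluster to the label of its point closest to `hovered`."""
    best = {}  # cluster name -> (best score so far, label of that point)
    for point in points:
        if point["cluster"] == hovered["cluster"]:
            continue  # only look across clusters, never within the hovered one
        score = cosine_similarity(hovered["vector"], point["vector"])
        if point["cluster"] not in best or score > best[point["cluster"]][0]:
            best[point["cluster"]] = (score, point["label"])
    return {cluster: label for cluster, (_, label) in best.items()}


# Invented toy data (hypothetical labels and vectors, not project data).
hovered = {"label": "experience-1", "cluster": "experience", "vector": [1.0, 0.0, 0.0]}
points = [
    hovered,
    {"label": "ecology-1", "cluster": "ecology", "vector": [0.9, 0.1, 0.0]},
    {"label": "ecology-2", "cluster": "ecology", "vector": [0.0, 1.0, 0.0]},
    {"label": "nation-1", "cluster": "nation", "vector": [0.1, 0.0, 1.0]},
    {"label": "nation-2", "cluster": "nation", "vector": [0.7, 0.0, 0.3]},
]

print(most_similar_per_cluster(hovered, points))
```

With these toy vectors the lookup selects "ecology-1" and "nation-2", the points whose directions sit closest to the hovered one.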

This project has not been designed for mobile use. Please view on desktop for full functionality.
