Embedding Vector Distance

Cortensor leverages Embedding Vector Distance as a core mechanism in its Proof of Inference (PoI) validation process. This approach ensures consistency, reliability, and trustworthiness of AI inference outputs across decentralized nodes, forming the foundation of quality assurance in the Cortensor network.


What is Embedding Vector Distance?

Embedding vector distance is a mathematical measure of the similarity between two or more embedding vectors derived from AI model outputs. When identical inputs are processed by the same model across multiple nodes, the outputs, despite slight variations due to computational or environmental factors, should exhibit high similarity. This similarity is quantified using metrics such as cosine similarity or Euclidean distance, providing a robust way to evaluate output consistency.
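
As a minimal, self-contained sketch (using numpy; this is illustrative, not Cortensor's actual implementation), the two most common metrics can be computed as follows:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two vectors: 0.0 means identical."""
    return float(np.linalg.norm(a - b))

# Two hypothetical embeddings of near-identical model outputs.
v1 = np.array([0.23, 0.11, 0.87])
v2 = np.array([0.24, 0.10, 0.86])

print(cosine_similarity(v1, v2))   # ~1.0: highly similar
print(euclidean_distance(v1, v2))  # ~0.0: highly similar
```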


How It Works in Cortensor

  1. Input Distribution: A predefined input (Input A) is sent to multiple miner nodes running identical AI models (e.g., LLaMA or LLaVA). Each node processes this input independently.

  2. Output Generation: Each node produces an inference output based on the input and model. Due to decentralized computation and hardware variability, the outputs may slightly differ in form or structure.

  3. Embedding Creation: The inference outputs are transformed into embedding vectors, which capture their semantic meaning in a numerical format. These embeddings are model-agnostic representations of the output content.

  4. Similarity Measurement: Using embedding distance techniques, the system measures the similarity between the embeddings generated by different nodes. For example:

    • Cosine Similarity: Measures the angle between two vectors, where a value close to 1 indicates high similarity.

    • Euclidean Distance: Measures the straight-line distance between two vectors, where smaller values indicate higher similarity.

  5. Reliability Check: Nodes whose outputs deviate beyond a predefined threshold (e.g., a cosine similarity score below 0.85) may be flagged for inconsistent or unreliable performance. A minimal sketch of this pipeline follows the list.
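
The following sketch ties steps 2 through 5 together. It is illustrative only: the `embed` function is a toy bag-of-words encoder standing in for a real sentence-embedding model, and the function names and threshold are assumptions rather than Cortensor's actual API.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # illustrative threshold from step 5

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedding. A real deployment would use a
    proper sentence encoder; this toy version ignores word order, so it
    cannot catch anomalies that merely reorder words."""
    v = np.zeros(dim)
    for token in text.lower().replace(".", "").split():
        v[hash(token) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def validate_outputs(outputs: dict[str, str]) -> dict[str, bool]:
    """Embed each node's output and compare it to the normalized mean
    embedding of all outputs; nodes scoring below the threshold are
    flagged (False)."""
    embeddings = {node: embed(text) for node, text in outputs.items()}
    centroid = np.mean(list(embeddings.values()), axis=0)
    centroid /= np.linalg.norm(centroid)
    return {node: float(np.dot(vec, centroid)) >= SIMILARITY_THRESHOLD
            for node, vec in embeddings.items()}
```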


Why Embedding Vector Distance is Essential

  • Ensures Consistency: By running identical inputs across multiple nodes and measuring the similarity of outputs, Cortensor ensures consistent task execution.

  • Validates Reliability: Embedding distance acts as a safeguard against dishonest nodes or those failing to perform properly.

  • Enhances Decentralized Trust: In a decentralized network, embedding distance provides a quantifiable and verifiable measure of output reliability, fostering trust between miners and users.

  • Leverages Redundancy: By producing multiple inference variants and validating them against one another, the network can select the most reliable outputs or offer users multiple valid results.


Role in Proof of Inference (PoI)

Embedding vector distance forms the backbone of Cortensor's PoI validation process:

  • Redundancy and Validation: Nodes are tasked with the same inference job. The outputs are compared to ensure they align within acceptable similarity thresholds.

  • Consensus Formation: A majority agreement based on embedding similarity establishes a validated inference result.

  • Cheat Detection: Nodes producing significantly different embeddings are flagged, ensuring the network maintains high reliability and accountability (a sketch of this logic follows).
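
A simplified sketch of how this majority agreement and flagging could be computed over unit-normalized embeddings (the actual PoI consensus logic may differ):

```python
import numpy as np

def form_consensus(embeddings: dict[str, np.ndarray],
                   threshold: float = 0.85) -> tuple[list[str], list[str]]:
    """Validate nodes whose embeddings agree (pairwise cosine similarity
    >= threshold, assuming unit-normalized vectors) with at least half of
    their peers; flag the rest as potential cheaters."""
    nodes = list(embeddings)
    agreements = {
        a: sum(1 for b in nodes if a != b and
               float(np.dot(embeddings[a], embeddings[b])) >= threshold)
        for a in nodes
    }
    quorum = len(nodes) // 2  # must agree with at least half of the peers
    validated = [n for n, count in agreements.items() if count >= quorum]
    flagged = [n for n in nodes if n not in validated]
    return validated, flagged
```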


Use Cases

  1. Multi-Node AI Inference: When users request AI tasks, multiple miners process the same input to ensure consistency. Embedding vector distance helps validate the outputs.

  2. Synthetic Data Validation: For synthetic data generation, embedding similarity ensures that the produced data aligns with the expected characteristics of the input model.

  3. Quality Assurance in Training: Developers can use embedding distances to evaluate the performance and robustness of models deployed across diverse hardware in the network.


Future Enhancements

Embedding vector distance will evolve as Cortensor introduces Node Reputation Systems:

  • Time-Series Analysis: Track node reliability over time using embedding metrics.

  • Dynamic Thresholds: Adjust similarity thresholds dynamically based on task complexity or user-defined accuracy preferences (sketched after this list).

  • Cross-Model Validation: Expand embedding comparisons across different but compatible models to support heterogeneous miner nodes.
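
Since these enhancements are planned rather than specified, the following is purely hypothetical, showing one way a dynamic threshold could be parameterized:

```python
def dynamic_threshold(base: float = 0.85,
                      task_complexity: float = 0.0,
                      user_strictness: float = 0.0) -> float:
    """Hypothetical dynamic threshold. Open-ended tasks legitimately produce
    more varied outputs, so the threshold relaxes as complexity rises; it
    tightens when the user requests higher accuracy. Inputs are in [0, 1]."""
    def clamp(x: float) -> float:
        return min(max(x, 0.0), 1.0)
    adjusted = base - 0.10 * clamp(task_complexity) + 0.10 * clamp(user_strictness)
    return clamp(adjusted)
```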


Technical Illustration

Let’s consider an example:

  • Input: "What is the capital of France?"

  • Nodes: 5 miners running the same LLaMA model.

  • Outputs:

    • Node 1: "Paris is the capital of France."

    • Node 2: "The capital of France is Paris."

    • Node 3: "Paris is France's capital."

    • Node 4: "The capital of Paris is France." (anomaly)

    • Node 5: "France's capital city is Paris."

  • Embeddings:

    • Node 1: [0.23, 0.11, 0.87, ...]

    • Node 2: [0.24, 0.10, 0.86, ...]

    • Node 3: [0.22, 0.12, 0.88, ...]

    • Node 4: [0.45, 0.32, 0.74, ...] (outlier)

    • Node 5: [0.21, 0.10, 0.89, ...]

  • Similarity Scores:

    • Nodes 1, 2, 3, 5: High similarity (~0.95 cosine similarity)

    • Node 4: Low similarity (~0.65 cosine similarity)

Result: Nodes 1, 2, 3, and 5 are validated. Node 4 is flagged as an outlier.
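
The computation behind this example can be sketched end to end. The encoder below (sentence-transformers' all-MiniLM-L6-v2) is an assumption chosen for illustration; Cortensor's actual embedding model is not specified here, and the exact scores, including whether Node 4 falls below the threshold, depend on how sensitive the chosen encoder is to word order.

```python
# Requires: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

outputs = {
    "Node 1": "Paris is the capital of France.",
    "Node 2": "The capital of France is Paris.",
    "Node 3": "Paris is France's capital.",
    "Node 4": "The capital of Paris is France.",  # same words, inverted meaning
    "Node 5": "France's capital city is Paris.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice
embeddings = model.encode(list(outputs.values()), normalize_embeddings=True)

# Compare each node's embedding against the normalized mean of all outputs.
centroid = embeddings.mean(axis=0)
centroid /= np.linalg.norm(centroid)

for node, vec in zip(outputs, embeddings):
    score = float(np.dot(vec, centroid))
    verdict = "validated" if score >= 0.85 else "flagged"
    print(f"{node}: cosine similarity {score:.2f} -> {verdict}")
```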


Conclusion

Embedding vector distance ensures Cortensor can maintain high-quality, reliable AI inference results in an untrusted decentralized environment. By integrating this technique into PoI validation, Cortensor sets a standard for trust and accountability in decentralized AI.
