Advances and Challenges in Foundation Agents

This comprehensive survey examines the advancements and challenges in developing intelligent agents powered by large language models (LLMs). It proposes a brain-inspired architecture for these agents, detailing core cognitive, perceptual, and action modules, including memory, world modeling, reward processing, and emotion-like systems. The paper also discusses how these agents self-evolve through optimization techniques and explores the complexities of multi-agent systems, focusing on collaboration and emergent collective intelligence. Finally, it addresses the critical need for safe and beneficial AI, outlining intrinsic and extrinsic threats and strategies for building trustworthy systems.
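
To make the modular structure described above concrete, here is a deliberately simplified sketch of an agent loop that wires together perception, memory, world-model, reward, and action components. Every class, method, and parameter name is an illustrative assumption for this listing, not an API from the survey.

```python
# A minimal, hypothetical agent loop in the spirit of the survey's modular view.
class Agent:
    def __init__(self, perceive, memory, world_model, reward_fn, policy):
        self.perceive, self.memory = perceive, memory
        self.world_model, self.reward_fn, self.policy = world_model, reward_fn, policy

    def step(self, observation):
        percept = self.perceive(observation)              # perception module
        context = self.memory.retrieve(percept)           # memory module
        prediction = self.world_model(percept, context)   # world-model module
        action = self.policy(percept, context, prediction)
        reward = self.reward_fn(prediction, action)       # reward-processing module
        self.memory.store(percept, action, reward)
        return action

class ListMemory:
    """Toy memory that just keeps a list of past experiences."""
    def __init__(self):
        self.items = []
    def retrieve(self, percept):
        return self.items[-3:]       # return the most recent experiences
    def store(self, *record):
        self.items.append(record)

# Toy usage with trivial stand-in modules.
agent = Agent(perceive=lambda obs: obs.lower(),
              memory=ListMemory(),
              world_model=lambda p, ctx: f"expected outcome of '{p}'",
              reward_fn=lambda pred, act: 1.0,
              policy=lambda p, ctx, pred: f"respond to '{p}'")
print(agent.step("User asks for a summary"))
```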

AI Agents · N/A
Attention is All You Need

The provided document introduces the Transformer, a novel neural network architecture for sequence transduction tasks like machine translation. This model uniquely relies entirely on attention mechanisms, discarding traditional recurrent and convolutional layers. Experiments demonstrate the Transformer achieves superior translation quality with significantly improved parallelization and reduced training time compared to existing state-of-the-art models. The authors also show the Transformer's effectiveness on English constituency parsing, indicating its broader applicability. Furthermore, the paper analyzes the computational advantages of self-attention and provides visualizations suggesting the learned attention mechanisms capture linguistic relationships.
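
As a concrete reference for the attention mechanism the summary describes, here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The toy shapes and random projection matrices are assumptions for illustration only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise over the queries."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # numerically stable softmax
    return weights @ V                                    # weighted sum of values

# Toy self-attention over 3 token embeddings of width 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
print(scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv).shape)   # (3, 4)
```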

Language Models · Jun 2017
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

This academic paper introduces AUTOPROMPT, a novel method for automatically generating prompts to evaluate the knowledge stored within pretrained language models (LMs), particularly masked language models (MLMs). Instead of manually crafting "fill-in-the-blank" style questions, AUTOPROMPT uses a gradient-based search to find effective prompts and associated label tokens. Experiments demonstrate that AUTOPROMPT helps reveal MLMs' inherent capabilities in tasks like sentiment analysis, natural language inference, and fact retrieval, often outperforming methods that rely on human-designed prompts. The paper also suggests that automatically generated prompts can be a parameter-free alternative to finetuning, particularly in situations with limited training data.
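
To illustrate the gradient-based search at a high level, the sketch below ranks candidate trigger tokens by a first-order score (the dot product of each token's embedding with the negative loss gradient at the trigger position), in the spirit of the paper's method. The array names and toy sizes are assumptions, not the authors' code.

```python
import numpy as np

def top_candidate_tokens(embedding_matrix, grad_wrt_trigger_embedding, k=5):
    """Rank vocabulary tokens by the first-order estimate of how much swapping
    them into the trigger slot would decrease the loss: score = -e_w . grad."""
    scores = -embedding_matrix @ grad_wrt_trigger_embedding    # (vocab_size,)
    return np.argsort(scores)[::-1][:k]                        # top-k token indices

# Toy vocabulary of 100 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
vocab_embeddings = rng.standard_normal((100, 16))
grad = rng.standard_normal(16)   # gradient of task loss w.r.t. the trigger embedding
print(top_candidate_tokens(vocab_embeddings, grad))
```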

Language Models · Oct 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

This text introduces and explains BERT (Bidirectional Encoder Representations from Transformers), a novel language representation model designed for pre-training deep bidirectional representations from vast amounts of unlabeled text. Unlike prior approaches that were unidirectional, BERT jointly considers both left and right context across all layers during pre-training, enabling it to be effectively fine-tuned with minimal architectural changes for a wide range of natural language processing (NLP) tasks, including question answering and language inference. The paper highlights two unsupervised pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP), which contribute to BERT's strong performance. Extensive experiments are presented demonstrating BERT's state-of-the-art results across numerous NLP benchmarks like GLUE and SQuAD, showcasing its conceptual simplicity and empirical power. The source material also discusses the impact of pre-training tasks, model size, and compares BERT to other prominent representation learning methods like ELMo and OpenAI GPT.
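
The masked-language-model objective is easy to see in code. The sketch below applies the paper's 15% token selection with the 80/10/10 replacement rule to a toy sequence of token IDs; the specific IDs and the -100 "ignore" label are common implementation conventions used here for illustration, not details prescribed by the paper.

```python
import random

MASK_ID = 103        # [MASK] id in the widely used uncased BERT vocabulary
VOCAB_SIZE = 30522   # size of that vocabulary

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Corrupt inputs for MLM pre-training; labels hold the original tokens."""
    rng = random.Random(seed)
    inputs, labels = list(token_ids), [-100] * len(token_ids)   # -100 = not predicted
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:        # select roughly 15% of positions
            labels[i] = tok                 # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID                      # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)    # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

print(mask_tokens([2023, 2003, 1037, 7953, 6251]))
```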

Language Models · May 2019
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

This source introduces chain-of-thought prompting, a method that significantly enhances the reasoning capabilities of large language models (LLMs). By providing LLMs with a few examples that include intermediate reasoning steps (a "chain of thought"), these models become more adept at tackling complex tasks across arithmetic, commonsense, and symbolic reasoning. This technique demonstrates that the ability to perform multi-step reasoning is an emergent property of sufficiently large LLMs, and unlike traditional methods, it doesn't require fine-tuning the models. The paper shows striking performance improvements, particularly for larger models and more challenging problems, facilitating generalization to scenarios outside the training data.
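
A chain-of-thought prompt is simply a few-shot prompt whose exemplar answers spell out the intermediate steps. The sketch below, modeled on the paper's well-known arithmetic exemplar, shows the format; the string would be sent to an LLM completion endpoint (not shown), and a standard prompt would omit the reasoning sentences in the exemplar answer.

```python
# Few-shot prompt with an explicit chain of thought in the exemplar answer.
COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

print(COT_PROMPT)   # the model is expected to continue with its own reasoning steps
```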

Language Models · Jan 2023
Citibank - Agentic AI: Finance and the "Do It For Me" Economy

This Citi GPS (Global Perspectives & Solutions) report, titled "Agentic AI: Finance & the ‘Do It For Me’ Economy," examines the increasing impact of agentic AI – artificial intelligence capable of autonomous decisions and actions – particularly within the financial services sector. It highlights the expected significant growth of agentic AI in 2025, driven by technological advancements and substantial venture capital funding, and contrasts it with traditional AI and Generative AI. The report explores various use cases for agentic AI in finance, including compliance, fraud detection, onboarding, wealth management, and corporate treasury, while also addressing associated risks, such as fraud and cybersecurity, and the need for robust governance frameworks. It also considers the potential for agentic AI to reshape workforces and discusses the role of technologies like crypto and blockchain in this evolving landscape.

AI Agents · Jan 2025
Conformer: Convolution-augmented Transformer for Speech Recognition

This source introduces Conformer, a novel architecture for Automatic Speech Recognition (ASR) that combines the strengths of Transformer models and Convolutional Neural Networks (CNNs). The authors propose that Transformers are effective at capturing global dependencies, while CNNs excel at exploiting local features. Conformer integrates these two types of neural networks to model both local and global information in audio sequences efficiently, achieving state-of-the-art accuracies on the LibriSpeech benchmark, significantly outperforming previous models. The paper details the architecture of Conformer blocks, which include feed-forward modules, multi-headed self-attention modules, and convolution modules, arranged in a "macaron-like" sandwich structure, and presents ablation studies highlighting the contribution of each component to the improved performance.
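
The "macaron" ordering of a Conformer block is compact enough to sketch directly. In the code below, ffn1, mhsa, conv_module, ffn2, and layer_norm are placeholder callables standing in for the real sub-modules; only the half-step residual structure reflects the paper.

```python
import numpy as np

def conformer_block(x, ffn1, mhsa, conv_module, ffn2, layer_norm):
    x = x + 0.5 * ffn1(x)        # first half-step feed-forward module
    x = x + mhsa(x)              # multi-headed self-attention with residual
    x = x + conv_module(x)       # convolution module with residual
    x = x + 0.5 * ffn2(x)        # second half-step feed-forward module
    return layer_norm(x)         # final layer normalization

# Toy usage with identity stand-ins for the sub-modules, just to show the data flow.
identity = lambda t: t
x = np.random.default_rng(0).standard_normal((10, 144))   # (time steps, model dim)
print(conformer_block(x, identity, identity, identity, identity, identity).shape)
```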

Speech Recognition · Nov 2020
Databricks - The Big Book of GenAI

This source from Databricks, titled "The Big Book of Generative AI," outlines the process of building production-quality GenAI applications, emphasizing that this requires evolving data infrastructure to support these advanced technologies effectively and securely. The document describes a multi-stage path to deploying GenAI applications, starting with foundation models and progressing through prompt engineering, retrieval augmented generation (RAG), fine-tuning foundation models, and pretraining from scratch. It provides practical use cases and technical details for each stage, including an introduction to Databricks' state-of-the-art open LLM, DBRX, and discusses the importance of LLM evaluation for continuous monitoring and improvement. Ultimately, the source highlights Databricks' platform and tools as crucial for navigating these stages and achieving high-quality, cost-efficient GenAI deployments.
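
To ground the retrieval augmented generation (RAG) stage mentioned above, here is a minimal retrieve-then-prompt sketch. The cosine-similarity retriever, the toy corpus, and names such as rag_prompt are assumptions for illustration and are not Databricks APIs; a real system would embed documents with a model and send the prompt to an LLM endpoint.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def rag_prompt(question, context_docs):
    """Pack the retrieved documents into the prompt as grounding context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy corpus with random embeddings standing in for a real embedding model.
docs = ["DBRX is an open LLM.", "RAG adds retrieved context to prompts.", "Unrelated note."]
rng = np.random.default_rng(0)
doc_vecs, query_vec = rng.standard_normal((3, 8)), rng.standard_normal(8)
print(rag_prompt("What does RAG do?", retrieve(query_vec, doc_vecs, docs)))
```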

Generative AI · Mar 2024
Nature - Deep Learning

This review paper provides a comprehensive overview of deep learning, a subfield of machine learning that has revolutionized various domains like image and speech recognition. The authors explain how deep learning utilizes multilayer neural networks to automatically learn hierarchical representations of data, overcoming the limitations of traditional machine learning methods that rely on hand-engineered features. Key techniques like backpropagation for training and specific architectures such as convolutional neural networks (ConvNets) for image processing and recurrent neural networks (RNNs) for sequential data are discussed in detail. The paper highlights the significant breakthroughs and current applications of deep learning and speculates on its future potential, particularly in areas like unsupervised learning and natural language understanding.
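
The training procedure at the heart of the review, backpropagation with gradient descent, fits in a few lines. The sketch below trains a toy two-layer network with a ReLU hidden layer and a sigmoid output; the task, layer sizes, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))                        # 64 examples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)    # toy binary target

W1 = rng.standard_normal((3, 8)) * 0.1                  # input -> hidden weights
W2 = rng.standard_normal((8, 1)) * 0.1                  # hidden -> output weights
for step in range(500):
    h = np.maximum(0, X @ W1)                           # forward: ReLU hidden layer
    p = 1 / (1 + np.exp(-(h @ W2)))                     # forward: sigmoid output
    dlogits = (p - y) / len(X)                          # backward: d(cross-entropy)/d(logits)
    dW2 = h.T @ dlogits                                 # chain rule to output weights
    dh = dlogits @ W2.T * (h > 0)                       # gradient through the ReLU
    dW1 = X.T @ dh                                      # chain rule to input weights
    W1 -= 0.5 * dW1                                     # gradient-descent updates
    W2 -= 0.5 * dW2
print("training accuracy:", ((p > 0.5) == y).mean())
```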

Machine Learning · May 2015
Deep Reinforcement Learning from Human Preferences

This document presents research into deep reinforcement learning (RL) systems that learn from human preferences instead of requiring a predefined reward function. The authors propose a method where a reward function is learned by having humans compare pairs of short trajectory segments, and an RL agent then optimizes its behavior based on this learned reward. They demonstrate that this approach allows RL agents to learn complex tasks in environments like Atari games and simulated robotics, even without an explicit reward signal and using a limited amount of human feedback. This method successfully scales to intricate behaviors not easily defined by traditional reward functions, suggesting a practical way to align advanced RL systems with complex human goals.
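
The core of the method is a reward model fit to pairwise human comparisons. The sketch below writes that objective as a Bradley-Terry probability over summed segment rewards with a cross-entropy loss; the linear reward model and the toy data are assumptions for illustration.

```python
import numpy as np

def segment_return(theta, segment):
    """Predicted reward summed over a segment of state feature vectors."""
    return np.sum(segment @ theta)

def preference_loss(theta, seg_a, seg_b, human_prefers_a):
    """Cross-entropy between the human label and P[seg_a preferred], where P
    follows a Bradley-Terry model over the summed predicted rewards."""
    r_a, r_b = segment_return(theta, seg_a), segment_return(theta, seg_b)
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))     # = exp(r_a) / (exp(r_a) + exp(r_b))
    target = 1.0 if human_prefers_a else 0.0
    return -(target * np.log(p_a + 1e-9) + (1 - target) * np.log(1 - p_a + 1e-9))

# Toy comparison between two 10-step segments with 4-dimensional state features.
rng = np.random.default_rng(1)
theta = rng.standard_normal(4) * 0.1
seg_a, seg_b = rng.standard_normal((10, 4)), rng.standard_normal((10, 4))
print(preference_loss(theta, seg_a, seg_b, human_prefers_a=True))
```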

Machine Learning · Jun 2017