Advances and Challenges in Foundation Agents

This comprehensive survey examines the advancements and challenges in developing intelligent agents powered by large language models (LLMs). It proposes a brain-inspired architecture for these agents, detailing core cognitive, perceptual, and action modules, including memory, world modeling, reward processing, and emotion-like systems. The paper also discusses how these agents self-evolve through optimization techniques and explores the complexities of multi-agent systems, focusing on collaboration and emergent collective intelligence. Finally, it addresses the critical need for safe and beneficial AI, outlining intrinsic and extrinsic threats and strategies for building trustworthy systems.
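
To make the modular structure described above concrete, here is a deliberately simplified sketch of an agent loop that wires together perception, memory, world-model, reward, and action components. Every class, method, and parameter name is an illustrative assumption for this listing, not an API from the survey.

```python
# A minimal, hypothetical agent loop in the spirit of the survey's modular view.
class Agent:
    def __init__(self, perceive, memory, world_model, reward_fn, policy):
        self.perceive, self.memory = perceive, memory
        self.world_model, self.reward_fn, self.policy = world_model, reward_fn, policy

    def step(self, observation):
        percept = self.perceive(observation)              # perception module
        context = self.memory.retrieve(percept)           # memory module
        prediction = self.world_model(percept, context)   # world-model module
        action = self.policy(percept, context, prediction)
        reward = self.reward_fn(prediction, action)       # reward-processing module
        self.memory.store(percept, action, reward)
        return action

class ListMemory:
    """Toy memory that just keeps a list of past experiences."""
    def __init__(self):
        self.items = []
    def retrieve(self, percept):
        return self.items[-3:]       # return the most recent experiences
    def store(self, *record):
        self.items.append(record)

# Toy usage with trivial stand-in modules.
agent = Agent(perceive=lambda obs: obs.lower(),
              memory=ListMemory(),
              world_model=lambda p, ctx: f"expected outcome of '{p}'",
              reward_fn=lambda pred, act: 1.0,
              policy=lambda p, ctx, pred: f"respond to '{p}'")
print(agent.step("User asks for a summary"))
```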

AI Agents · N/A
Attention is All You Need

The provided document introduces the Transformer, a novel neural network architecture for sequence transduction tasks like machine translation. This model uniquely relies entirely on attention mechanisms, discarding traditional recurrent and convolutional layers. Experiments demonstrate the Transformer achieves superior translation quality with significantly improved parallelization and reduced training time compared to existing state-of-the-art models. The authors also show the Transformer's effectiveness on English constituency parsing, indicating its broader applicability. Furthermore, the paper analyzes the computational advantages of self-attention and provides visualizations suggesting the learned attention mechanisms capture linguistic relationships.
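
As a concrete reference for the attention mechanism the summary describes, here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The toy shapes and random projection matrices are assumptions for illustration only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise over the queries."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # numerically stable softmax
    return weights @ V                                    # weighted sum of values

# Toy self-attention over 3 token embeddings of width 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
print(scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv).shape)   # (3, 4)
```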

Language Models · Jun 2017
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

This academic paper introduces AUTOPROMPT, a novel method for automatically generating prompts to evaluate the knowledge stored within pretrained language models (LMs), particularly masked language models (MLMs). Instead of manually crafting "fill-in-the-blank" style questions, AUTOPROMPT uses a gradient-based search to find effective prompts and associated label tokens. Experiments demonstrate that AUTOPROMPT helps reveal MLMs' inherent capabilities in tasks like sentiment analysis, natural language inference, and fact retrieval, often outperforming methods that rely on human-designed prompts. The paper also suggests that automatically generated prompts can be a parameter-free alternative to finetuning, particularly in situations with limited training data.
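
To illustrate the gradient-based search at a high level, the sketch below ranks candidate trigger tokens by a first-order score (the dot product of each token's embedding with the negative loss gradient at the trigger position), in the spirit of the paper's method. The array names and toy sizes are assumptions, not the authors' code.

```python
import numpy as np

def top_candidate_tokens(embedding_matrix, grad_wrt_trigger_embedding, k=5):
    """Rank vocabulary tokens by the first-order estimate of how much swapping
    them into the trigger slot would decrease the loss: score = -e_w . grad."""
    scores = -embedding_matrix @ grad_wrt_trigger_embedding    # (vocab_size,)
    return np.argsort(scores)[::-1][:k]                        # top-k token indices

# Toy vocabulary of 100 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
vocab_embeddings = rng.standard_normal((100, 16))
grad = rng.standard_normal(16)   # gradient of task loss w.r.t. the trigger embedding
print(top_candidate_tokens(vocab_embeddings, grad))
```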

Language Models · Oct 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

This text introduces and explains BERT (Bidirectional Encoder Representations from Transformers), a novel language representation model designed for pre-training deep bidirectional representations from vast amounts of unlabeled text. Unlike prior approaches that were unidirectional, BERT jointly considers both left and right context across all layers during pre-training, enabling it to be effectively fine-tuned with minimal architectural changes for a wide range of natural language processing (NLP) tasks, including question answering and language inference. The paper highlights two unsupervised pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP), which contribute to BERT's strong performance. Extensive experiments are presented demonstrating BERT's state-of-the-art results across numerous NLP benchmarks like GLUE and SQuAD, showcasing its conceptual simplicity and empirical power. The source material also discusses the impact of pre-training tasks, model size, and compares BERT to other prominent representation learning methods like ELMo and OpenAI GPT.
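
The masked-language-model objective is easy to see in code. The sketch below applies the paper's 15% token selection with the 80/10/10 replacement rule to a toy sequence of token IDs; the specific IDs and the -100 "ignore" label are common implementation conventions used here for illustration, not details prescribed by the paper.

```python
import random

MASK_ID = 103        # [MASK] id in the widely used uncased BERT vocabulary
VOCAB_SIZE = 30522   # size of that vocabulary

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Corrupt inputs for MLM pre-training; labels hold the original tokens."""
    rng = random.Random(seed)
    inputs, labels = list(token_ids), [-100] * len(token_ids)   # -100 = not predicted
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:        # select roughly 15% of positions
            labels[i] = tok                 # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID                      # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)    # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

print(mask_tokens([2023, 2003, 1037, 7953, 6251]))
```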

Language Models · May 2019
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

This source introduces chain-of-thought prompting, a method that significantly enhances the reasoning capabilities of large language models (LLMs). By providing LLMs with a few examples that include intermediate reasoning steps (a "chain of thought"), these models become more adept at tackling complex tasks across arithmetic, commonsense, and symbolic reasoning. This technique demonstrates that the ability to perform multi-step reasoning is an emergent property of sufficiently large LLMs, and unlike traditional methods, it doesn't require fine-tuning the models. The paper shows striking performance improvements, particularly for larger models and more challenging problems, facilitating generalization to scenarios outside the training data.
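
A chain-of-thought prompt is simply a few-shot prompt whose exemplar answers spell out the intermediate steps. The sketch below, modeled on the paper's well-known arithmetic exemplar, shows the format; the string would be sent to an LLM completion endpoint (not shown), and a standard prompt would omit the reasoning sentences in the exemplar answer.

```python
# Few-shot prompt with an explicit chain of thought in the exemplar answer.
COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

print(COT_PROMPT)   # the model is expected to continue with its own reasoning steps
```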

Language Models · Jan 2023
Citibank - Agentic AI: Finance and the "Do It For Me" Economy

This Citi GPS (Global Perspectives & Solutions) report, titled "Agentic AI: Finance & the ‘Do It For Me’ Economy," examines the increasing impact of agentic AI – artificial intelligence capable of autonomous decisions and actions – particularly within the financial services sector. It highlights the expected significant growth of agentic AI in 2025, driven by technological advancements and substantial venture capital funding, and contrasts it with traditional AI and Generative AI. The report explores various use cases for agentic AI in finance, including compliance, fraud detection, onboarding, wealth management, and corporate treasury, while also addressing associated risks, such as fraud and cybersecurity, and the need for robust governance frameworks. It also considers the potential for agentic AI to reshape workforces and discusses the role of technologies like crypto and blockchain in this evolving landscape.

AI Agents · Jan 2025
Conformer: Convolution-augmented Transformer for Speech Recognition

This source introduces Conformer, a novel architecture for Automatic Speech Recognition (ASR) that combines the strengths of Transformer models and Convolutional Neural Networks (CNNs). The authors propose that Transformers are effective at capturing global dependencies, while CNNs excel at exploiting local features. Conformer integrates these two types of neural networks to model both local and global information in audio sequences efficiently, achieving state-of-the-art accuracies on the LibriSpeech benchmark, significantly outperforming previous models. The paper details the architecture of Conformer blocks, which include feed-forward modules, multi-headed self-attention modules, and convolution modules, arranged in a "macaron-like" sandwich structure, and presents ablation studies highlighting the contribution of each component to the improved performance.
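
The "macaron" ordering of a Conformer block is compact enough to sketch directly. In the code below, ffn1, mhsa, conv_module, ffn2, and layer_norm are placeholder callables standing in for the real sub-modules; only the half-step residual structure reflects the paper.

```python
import numpy as np

def conformer_block(x, ffn1, mhsa, conv_module, ffn2, layer_norm):
    x = x + 0.5 * ffn1(x)        # first half-step feed-forward module
    x = x + mhsa(x)              # multi-headed self-attention with residual
    x = x + conv_module(x)       # convolution module with residual
    x = x + 0.5 * ffn2(x)        # second half-step feed-forward module
    return layer_norm(x)         # final layer normalization

# Toy usage with identity stand-ins for the sub-modules, just to show the data flow.
identity = lambda t: t
x = np.random.default_rng(0).standard_normal((10, 144))   # (time steps, model dim)
print(conformer_block(x, identity, identity, identity, identity, identity).shape)
```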

Speech Recognition · Nov 2020
Databricks - The Big Book of GenAI

This source from Databricks, titled "The Big Book of Generative AI," outlines the process of building production-quality GenAI applications, emphasizing that this requires evolving data infrastructure to support these advanced technologies effectively and securely. The document describes a multi-stage path to deploying GenAI applications, starting with foundation models and progressing through prompt engineering, retrieval augmented generation (RAG), fine-tuning foundation models, and pretraining from scratch. It provides practical use cases and technical details for each stage, including an introduction to Databricks' state-of-the-art open LLM, DBRX, and discusses the importance of LLM evaluation for continuous monitoring and improvement. Ultimately, the source highlights Databricks' platform and tools as crucial for navigating these stages and achieving high-quality, cost-efficient GenAI deployments.
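
To ground the retrieval augmented generation (RAG) stage mentioned above, here is a minimal retrieve-then-prompt sketch. The cosine-similarity retriever, the toy corpus, and names such as rag_prompt are assumptions for illustration and are not Databricks APIs; a real system would embed documents with a model and send the prompt to an LLM endpoint.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def rag_prompt(question, context_docs):
    """Pack the retrieved documents into the prompt as grounding context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy corpus with random embeddings standing in for a real embedding model.
docs = ["DBRX is an open LLM.", "RAG adds retrieved context to prompts.", "Unrelated note."]
rng = np.random.default_rng(0)
doc_vecs, query_vec = rng.standard_normal((3, 8)), rng.standard_normal(8)
print(rag_prompt("What does RAG do?", retrieve(query_vec, doc_vecs, docs)))
```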

Generative AI · Mar 2024
Nature - Deep Learning

This review paper provides a comprehensive overview of deep learning, a subfield of machine learning that has revolutionized various domains like image and speech recognition. The authors explain how deep learning utilizes multilayer neural networks to automatically learn hierarchical representations of data, overcoming the limitations of traditional machine learning methods that rely on hand-engineered features. Key techniques like backpropagation for training and specific architectures such as convolutional neural networks (ConvNets) for image processing and recurrent neural networks (RNNs) for sequential data are discussed in detail. The paper highlights the significant breakthroughs and current applications of deep learning and speculates on its future potential, particularly in areas like unsupervised learning and natural language understanding.
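
The training procedure at the heart of the review, backpropagation with gradient descent, fits in a few lines. The sketch below trains a toy two-layer network with a ReLU hidden layer and a sigmoid output; the task, layer sizes, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))                        # 64 examples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)    # toy binary target

W1 = rng.standard_normal((3, 8)) * 0.1                  # input -> hidden weights
W2 = rng.standard_normal((8, 1)) * 0.1                  # hidden -> output weights
for step in range(500):
    h = np.maximum(0, X @ W1)                           # forward: ReLU hidden layer
    p = 1 / (1 + np.exp(-(h @ W2)))                     # forward: sigmoid output
    dlogits = (p - y) / len(X)                          # backward: d(cross-entropy)/d(logits)
    dW2 = h.T @ dlogits                                 # chain rule to output weights
    dh = dlogits @ W2.T * (h > 0)                       # gradient through the ReLU
    dW1 = X.T @ dh                                      # chain rule to input weights
    W1 -= 0.5 * dW1                                     # gradient-descent updates
    W2 -= 0.5 * dW2
print("training accuracy:", ((p > 0.5) == y).mean())
```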

Machine Learning · May 2015
Deep Reinforcement Learning from Human Preferences

This document presents research into deep reinforcement learning (RL) systems that learn from human preferences instead of requiring a predefined reward function. The authors propose a method where a reward function is learned by having humans compare pairs of short trajectory segments, and an RL agent then optimizes its behavior based on this learned reward. They demonstrate that this approach allows RL agents to learn complex tasks in environments like Atari games and simulated robotics, even without an explicit reward signal and using a limited amount of human feedback. This method successfully scales to intricate behaviors not easily defined by traditional reward functions, suggesting a practical way to align advanced RL systems with complex human goals.
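
The core of the method is a reward model fit to pairwise human comparisons. The sketch below writes that objective as a Bradley-Terry probability over summed segment rewards with a cross-entropy loss; the linear reward model and the toy data are assumptions for illustration.

```python
import numpy as np

def segment_return(theta, segment):
    """Predicted reward summed over a segment of state feature vectors."""
    return np.sum(segment @ theta)

def preference_loss(theta, seg_a, seg_b, human_prefers_a):
    """Cross-entropy between the human label and P[seg_a preferred], where P
    follows a Bradley-Terry model over the summed predicted rewards."""
    r_a, r_b = segment_return(theta, seg_a), segment_return(theta, seg_b)
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))     # = exp(r_a) / (exp(r_a) + exp(r_b))
    target = 1.0 if human_prefers_a else 0.0
    return -(target * np.log(p_a + 1e-9) + (1 - target) * np.log(1 - p_a + 1e-9))

# Toy comparison between two 10-step segments with 4-dimensional state features.
rng = np.random.default_rng(1)
theta = rng.standard_normal(4) * 0.1
seg_a, seg_b = rng.standard_normal((10, 4)), rng.standard_normal((10, 4))
print(preference_loss(theta, seg_a, seg_b, human_prefers_a=True))
```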

Machine Learning · Jun 2017