Zamba: A 7B Mamba-based SSM hybrid model trained on 1T tokens
✦ Zyphra, a leading AI research company, has unveiled Zamba, a novel 7-billion-parameter foundation model.
✦ Zamba represents a significant advancement in the field of large language models, showcasing innovative architectural choices that enable impressive performance with a relatively small parameter count.
🎯 Zamba's Key Highlights:
1. Compute-Efficient Architecture:
→ Zamba's novel hybrid design combines a backbone of Mamba blocks with a single global attention layer whose parameters are shared everywhere it is applied (see the sketch after this list).
→ This architecture is more compute-efficient during both training and inference than vanilla transformer models.
→ It demonstrates the scalability and performance capabilities of state-space model (SSM) architectures.
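To make the layout concrete, here is a minimal PyTorch sketch of a Zamba-style backbone: a stack of Mamba blocks plus one attention block whose weights are reused at every call site, receiving the residual stream concatenated with the original embeddings. The block stub, the dimensions, and the every-N placement are illustrative assumptions, not Zyphra's code.

```python
# Minimal sketch of a Zamba-style hybrid (assumptions, not Zyphra's code):
# many Mamba blocks, ONE attention block with shared weights applied
# periodically. Causal masking is omitted for brevity.
import torch
import torch.nn as nn

class MambaBlockStub(nn.Module):
    """Stand-in for a real Mamba (selective SSM) block, e.g. mamba_ssm.Mamba."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)  # placeholder for the SSM scan
    def forward(self, x):
        return x + self.mixer(self.norm(x))

class SharedAttentionBlock(nn.Module):
    """One attention block reused at every call site (parameters shared)."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        # Input: residual stream concatenated with the original embeddings.
        self.in_proj = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x, emb):
        h = self.norm(self.in_proj(torch.cat([x, emb], dim=-1)))
        a, _ = self.attn(h, h, h, need_weights=False)
        return x + a

class ZambaStyleBackbone(nn.Module):
    def __init__(self, d_model=512, n_mamba=12, attn_every=6):
        super().__init__()
        self.blocks = nn.ModuleList(MambaBlockStub(d_model) for _ in range(n_mamba))
        self.shared_attn = SharedAttentionBlock(d_model)  # single set of weights
        self.attn_every = attn_every
    def forward(self, emb):
        x = emb
        for i, block in enumerate(self.blocks):
            if i % self.attn_every == 0:
                x = self.shared_attn(x, emb)  # same weights at every call site
            x = block(x)
        return x

x = torch.randn(2, 16, 512)          # (batch, seq, d_model)
print(ZambaStyleBackbone()(x).shape) # torch.Size([2, 16, 512])
```

The intuition behind sharing one attention block: it keeps the parameter count low while still giving the SSM backbone periodic access to exact token-to-token retrieval, which pure recurrent state can struggle with.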
2. Impressive Performance:
→ Zamba approaches the performance of state-of-the-art open models such as Mistral 7B and Gemma 7B, despite being trained on significantly fewer tokens.
→ It notably outperforms comparably sized models such as LLaMA-2 7B and OLMo-7B across a wide range of benchmarks, while using less than half their training data.
3. Two-Phase Training Approach:
→ Zamba was trained in two phases: a long first phase on lower-quality web data, followed by an annealing phase on high-quality datasets (a toy schedule of this shape is sketched below).
→ This two-phase approach appears to significantly improve the model's quality.
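As an illustration only, a two-phase schedule of this kind might look like the following: a long cosine-decay phase over web data, then a short annealing phase with a quickly decaying learning rate over the high-quality mix. All numbers here are made up for the sketch; they are not Zyphra's hyperparameters.

```python
# Toy two-phase learning-rate schedule (illustrative numbers only).
import math

def lr_at(step, phase1_steps=100_000, phase2_steps=5_000,
          peak_lr=3e-4, anneal_start_lr=1.5e-4, min_lr=3e-5):
    if step < phase1_steps:
        # Phase 1: cosine decay while training on web-scale data.
        t = step / phase1_steps
        return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * t))
    # Phase 2: fast linear decay while training on the high-quality mix.
    t = (step - phase1_steps) / phase2_steps
    return max(min_lr, anneal_start_lr * (1 - t))

for s in (0, 50_000, 99_999, 100_000, 102_500, 105_000):
    print(s, f"{lr_at(s):.2e}")
```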
4. Open-Source Commitment:
→ Zyphra is releasing all Zamba checkpoints under the open-source Apache 2.0 license (a hedged loading example follows this item).
→ This level of transparency is crucial for advancing the understanding of large-scale language models and enabling further innovation.
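For anyone who wants to try the released weights, here is a loading sketch via Hugging Face transformers. The repo id "Zyphra/Zamba-7B-v1" is an assumption; check Zyphra's Hugging Face page for the exact checkpoint names and whether a custom architecture implementation is required.

```python
# Hedged example of loading a released checkpoint with transformers.
# The repo id below is assumed, not confirmed; verify on Zyphra's HF page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Zyphra/Zamba-7B-v1"  # assumed repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # hybrid SSM models are typically run in bf16
    device_map="auto",
    trust_remote_code=True,      # custom architectures may need remote code
)
inputs = tok("Zamba is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```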
5. Efficient Development:
→ Zamba was developed by a small team of 7 researchers in just 30 days, using 128 NVIDIA H100 GPUs.
→ This demonstrates that highly capable and efficient models can be created without the need for massive teams and computational resources.
🧰 Implications and Impact:
→ Zamba represents a major step toward compact, parameter- and inference-efficient models that can outperform larger, more resource-intensive ones.
→ The open-source availability of Zamba's checkpoints and architectural details will enable researchers and developers to dive deep into the model, explore its unique characteristics, and contribute to further advancements in the field.
→ Zamba's efficiency and accessibility have the potential to democratize advanced AI capabilities, bringing cutting-edge language technologies within reach of a wider audience.
Zamba is a groundbreaking development in the world of large language models, showcasing how innovative architectural choices and a commitment to open science can lead to impressive performance and efficiency. As the AI community explores and builds upon Zamba's unique capabilities, we can expect to see further advancements in the quest for compact, high-performing, and accessible AI systems.
P.S. What do you think: is this a game changer?