What is RAG, and how does it work?
What is a traditional RAG? What is vector-based RAG, and what are the problems associated with it?
And how does this new concept—known as Vectorless RAG (or PageIndex)—effectively solve those problems? Furthermore, how does it use reasoning models to enhance document retrieval? So, with that, let's dive in.
Alright, so before jumping into Vectorless RAG, let's first understand what a traditional RAG is—and even before that, let's define what RAG stands for. RAG basically stands for "Retrieval Augmented Generation." The problem statement here is quite simple. For the moment, let's set Vectorless RAG aside—let's not worry about it just yet. Let's assume that within a traditional application, you have a large collection of documents. For instance, I'll take a PDF file here; let's say this represents one of my PDF files. You might have numerous PDF files, or perhaps a single PDF file containing many pages.
Now, a user wants to perform a Q&A session over this content using AI. So, essentially, I have these pages—it could be three pages, or it could be three hundred—and I need to support Q&A over them. The simplest, most naive solution would be this: consider what the user does. The user provides a query—a specific question, which is also referred to as a "prompt." So, let's assume this is my query. What you can then do is feed this into an LLM—any Large Language Model, whether it's OpenAI's GPT, Anthropic's Claude, or any other model.
Let's assume this represents my model. Basically, you can simply feed *all* of these documents directly into the model. You take all your documents—all your content—and include it within the prompt itself, along with the user's query. The LLM then processes this input and generates an output. This is one simple, "naive" solution—and it will certainly work—wherein you provide the entire content of the documents, supply the user's query, make an LLM call, and receive your output. But as you can see in the diagram, things don't work that simply. There are several problems with this approach.

The first problem is the very large context. These files can be very big, with a lot of content in them, and LLMs have a limited context window. If your PDF has one or two pages, that's completely fine—but if it has 3,000 pages, or even just 100, there is a high probability that your LLM call will fail because the context window limit is exceeded. Still, let's assume that a year or two from now, context windows grow large enough that you can ingest even 3,000 pages.
The second problem is that with so much context, the LLM starts hallucinating. If you ingest a 3,000-page PDF, the model is swimming in context, and the quality of the output suffers. There is no focus—you've dumped the entire book into the LLM—so the answers you get tend to be generic rather than focused.

That's the second problem. The third problem is cost. Suppose the user's query is very simple—maybe it only concerns page number 5—yet you ingest all 3,000 pages into every LLM call. With LLMs, everything is a token, and tokens are costly. This is not an efficient solution: why should I send 3,000 pages for a single user query? It increases my cost and even decreases the quality of the output.
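To put rough numbers on the cost argument—assuming, say, ~500 tokens per page and an illustrative price of $2.50 per million input tokens (both figures are my own assumptions, not quotes from any provider)—a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope cost of stuffing a whole PDF into every LLM call.
# Both figures below are illustrative assumptions, not real provider prices.
TOKENS_PER_PAGE = 500
PRICE_PER_MILLION_INPUT_TOKENS = 2.50  # USD, hypothetical

def cost_per_call(num_pages: int) -> float:
    """Input-token cost (USD) of sending num_pages of context in one call."""
    tokens = num_pages * TOKENS_PER_PAGE
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

print(f"3000-page book, per query: ${cost_per_call(3000):.2f}")   # $3.75
print(f"Top-5 relevant pages only: ${cost_per_call(5):.4f}")      # $0.0063
```

Even with made-up prices, the ratio is what matters: sending the whole book costs hundreds of times more per query than sending only the few relevant pages.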
So this was a known problem with LLMs, and that is where RAG comes into the picture. Again, my friend, I'm not talking about vectors or vectorless yet—we are just understanding the problem statement. A RAG system solves exactly this: how do I use large document collections with LLMs? Inside traditional RAG you have two phases: phase number one is known as the indexing phase, and phase number two is known as the query phase.

Let's talk about the indexing phase first. What is the indexing phase? The user will give you some files—PDFs, Excel sheets, document files, any kind of file—but for this discussion let's assume PDF files only, since that's the most common format. The first thing you have to do is chunk these documents into many, many segments.
You chunk the document using some algorithm. The simplest chunking is page by page: if I have 3,000 pages, I make 3,000 chunks of my PDF file. That's one approach, and it works perfectly fine. Second, if you want to make the chunks smaller, you can do paragraph-by-paragraph chunking.

So basically you are doing some kind of chunking and splitting of the documents. But paragraph-by-paragraph has a problem too: if a paragraph is very big, you can still blow past the context window. So what do people usually do? They use fixed-window chunking. Here I pick one size—let's say 500 words (words, not characters)—and chunk the entire PDF into 500-word pieces: this is my chunk 1, this is my chunk 2, this is my chunk 3, and so on. So the first part was chunking.
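A minimal sketch of that fixed-window chunking (the 500-word window and the overlap-free split are just the choices described above; real pipelines often add some overlap between windows):

```python
def fixed_window_chunks(text: str, window: int = 500) -> list[str]:
    """Split text into consecutive chunks of `window` words each."""
    words = text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]

doc = "word " * 1200  # pretend this is the full PDF text
chunks = fixed_window_chunks(doc, window=500)
print(len(chunks))  # 1200 words / 500-word window -> 3 chunks (500, 500, 200)
```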
Now what you basically do is convert these chunks into vectors using an embedding model. You don't use a regular chat model here—there are special vector models made for this. If you're using OpenAI, for example, you'll see dedicated embedding models like text-embedding-3-small and text-embedding-3-large. So you take these chunks, call your embedding model—OpenAI or whichever provider you want to use—and it returns an array of numbers, because that's all a vector is at the end of the day: an array of numbers.
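In code, the indexing step boils down to mapping each chunk to a list of floats. The `embed` function below is a deliberately fake stand-in (a hashed bag-of-words) just to show the shapes involved; in a real pipeline you would call a hosted embedding model such as OpenAI's text-embedding-3-small instead:

```python
import hashlib

DIM = 8  # real embedding models use hundreds or thousands of dimensions

def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words vector."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]  # unit-normalize, like most embedding APIs

# Pair every chunk with its vector -- this is what gets stored at index time.
index = [(chunk, embed(chunk)) for chunk in ["the cat sat", "cars need fuel"]]
print(len(index[0][1]))  # each chunk is now paired with an 8-dim vector
```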
So that means: I picked up the chunks, gave them to my embedding model, and got the embeddings back. Now you have to store these embeddings somewhere—in a database. You don't save vector embeddings inside traditional databases; there are special databases for them. For example, you have Pinecone—a vector database—and similarly Chroma, Weaviate, Milvus, Qdrant; there are many vector databases to choose from. Even Postgres comes with an extension, known as pgvector, which turns it into a vector database.

So what do we do with these vectors? We save each vector in the DB along with its chunk: this was the chunk, these were its vector embeddings. For every chunk you made, you save a corresponding vector embedding in your database. And that's your first part, indexing—that's it: take the PDF file, chunk it up, make its vectors, and save them in the database. Your indexing phase is done. Now comes the second phase, the user query, when the user wants to chat over their PDF file and ask something about it.
So your user comes along with a query: "This is my query—please answer it according to my PDF file." The first thing you do is create vector embeddings of this query using the same model—we are not using a plain LLM here, but the same vector embedding model you used for indexing. It will return some numbers; let's say the query's embedding came out as something like 3, 2, 5, 6. Now you can search for similar vectors in your database—remember, that was our Pinecone database. You go into the database and perform a vector similarity search: "Here are my numbers—3, 2, 5, 6—find the vectors closest to these and bring them back." Each matching vector points to a chunk, so you get those chunks back. Say the user asked a question about a car; wherever our PDF talks about that car, the embeddings will match, and you'll get the relevant chunks.

You pass another parameter here, called top_k: how many relevant chunks do I need? Say I set top_k to 5—bring me the top 5 relevant chunks, not the entire PDF file, just the chunks. And what was the size of each chunk? Maybe 500 words, maybe one paragraph per chunk—whatever we decided. So you get the top 5 chunks: 1, 2, 3, 4, and 5. Your PDF file could be 3,000 pages, but by using the user's query you smartly fetched only the relevant chunks—"relevant" meaning each chunk actually discusses what the user asked about. Now you can take these chunks, plus the user's original query, and make a simple LLM API call—GPT-4, Claude, Anthropic, whatever you want: "The user has asked this query, and here are the relevant chunks." You perform one generation over that, get a result, and return it to the user. And this is how your traditional RAG system works—you use vectors in it. Got it? Now let's understand the problem behind this vector RAG, as it's called.
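The query phase described above is, at its core, a nearest-neighbour lookup. Here's a minimal self-contained sketch, with hand-made 3-dimensional vectors standing in for real embeddings and the final generation call omitted:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# (chunk_text, embedding) pairs, as stored in the vector DB at indexing time.
index = [
    ("cars need regular fuel and servicing", [0.9, 0.1, 0.0]),
    ("the recipe calls for two eggs",        [0.0, 0.2, 0.9]),
    ("engine oil keeps a car running",       [0.8, 0.3, 0.1]),
]

def retrieve(query_vec: list[float], top_k: int = 2) -> list[str]:
    """Vector similarity search: return the top_k most similar chunks."""
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "tell me about cars"
print(retrieve(query_vec))     # the two car-related chunks come back first
```

The retrieved chunks, plus the user's question, would then go into one generation call to the LLM.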
Right? Vector RAG works fine. It's used a lot—almost every company is using it—and it's the traditional, established way to do document RAG. But the biggest problem in it is chunking, because we don't have any solid justification for how we chunk. See, I'll tell you one thing. Say you have a page with four paragraphs, and you blindly decided on an algorithm: "I will chunk at 500 words." So you split the whole page into 500-word pieces—pick up the first 500 words, then the next 500, then the next 500. The thing is, some information may sit in this chunk and the related information in the next chunk, but because you split in the middle, your context is lost—your data is lost. Open any random paragraph—say this is a storybook—and the first chunk happens to end mid-story at the fixed cutoff. You can see clearly that the rest of the passage should also have been included for the story to be complete, but you cut at a fixed number, so the first chunk got just this much and the second chunk got the rest. Because you were chunking on a static number, part of the context stayed in one chunk and the rest went to the next chunk, and that passage was never completed. That's one problem.

Second: it may be that all three paragraphs together make one story—it's not the case that each paragraph carries its own self-contained story. But if you chunk paragraph by paragraph, you get chunk one, two, and three regardless. And on top of that, one paragraph might be very short and another very long. So whatever chunking you did, there is no justification behind it—we're chunking blindly.
What we'd really want is to chunk semantically—to create genuinely relevant chunks—not in some hard-coded way like "paragraph by paragraph" or "every 500 words." That's not a good way to chunk data; the context gets cut. The next problem: if you've ever seen legal documents, they are full of cross-references—"as per rule 63.7.4 of ..." and so on. You can clearly see there's a reference to another page: the reference sits on, say, page number 4, and somewhere else in the same PDF—say page 578—is where that rule is actually spelled out. That's usually how it works, isn't it? So now you can clearly see that you want to read both pages: the page containing the reference and the page containing the actual content. For this generation I need both pages—but chunking doesn't give you that.
What will vector embeddings do? They match on whichever keywords appear, so they may pick up the referencing chunk but never the referenced one. That's another problem with chunking. The third problem with vector RAG: when you perform vector similarity search over chunks, the search runs on those numbers, and those numbers rely heavily on what kind of question the user is asking. You have no control over the user. If the user happens to use the exact same keywords that were inside my PDF file, then the query's embedding will match the embeddings stored in my Pinecone very easily, and you'll get nice, relevant chunks.

But it doesn't happen every time. Maybe the terminology inside the book I ingested is very different from how the user talks. Users ask vague or very high-level questions—"how do I do this?"—and it's entirely possible that the stored embeddings never match the query's embedding, because the user doesn't know what the keywords inside the original book were, or what exactly to ask. So here we rely on the user's query being good: only if its embedding matches our document embeddings will the vector similarity search return relevant documents. If the user's query is poor, we won't find relevant chunks, and our LLM's output won't be good either. These are some of the problems with traditional RAG—and these problems are now (kind of) solved using Vectorless RAG. As the name says, in Vectorless RAG you don't do vector embeddings at all. It also has two phases—number one the indexing phase, number two the query phase. The phases are exactly the same, but the indexing phase has changed.
No vectors, no Pinecone, no vector embeddings—there's no chunking, even. So what do you use instead? A reasoning model. Because look—over time, LLMs have become smarter and more capable; they can do real reasoning. So you rely heavily on reasoning models to actually read the documents. Here is one article I want to show you, which uses the movie Sholay as its example. This approach is called PageIndex, by the way—that's the other name of Vectorless RAG. How do you build a Vectorless RAG, i.e., one with no vector embeddings and no vector DB? If you start reading this document, you can clearly see: PageIndex is a vectorless, reasoning-based, retrieval-augmented generation (RAG) system. And what does it do? Instead of relying on semantic similarity search—the vector search I was telling you about—PageIndex builds a hierarchical table-of-contents tree. This is a very important line: a hierarchical table of contents. Let's note this down, because this is the indexing phase. Inside the indexing phase, you are not making vector embeddings and not doing any kind of chunking.

What you build instead is something known as a TOC tree—literally a tree, like you studied in data structures: a root, some nodes beneath it, more nodes beneath those, all joined together. That's what a tree looks like. You build a table of contents the way any book has an index: if you have to read something, you open the index, see where the topic is, and go there. That's what we have to build here. Going back to the article: it creates this from the document, using a large language model to reason over its structure—reasoning is used here. The model first identifies the most relevant sections using the document's hierarchy (the tree), then navigates into those sections to generate a precise answer. Okay? So, summing up: traditional RAG worked on similarity; what does PageIndex do? Reasoning. This bit is inspired by humans—if I hand you a very thick book and ask you a question, how does your brain go about answering it? That's essentially what PageIndex does. And by the way, before going on: it also solves the problem of legal documents and legal contracts that I told you about, okay?
So what does PageIndex do? Number one—if we go down a little bit—"structure before search." This is basically the entire pipeline: the document comes in, you create a hierarchical index of it, you do reasoning-based retrieval over that index, and then you get an answer—instead of doing vector embeddings. So first of all, we build the index. Take the Sholay movie—I'm not sure whether you've seen it or not—and say you have the script of Sholay as a book. You can ask the LLM to go page by page and create an index of it. What will that index look like? You'll have a root node—the root document—holding just a summary of the entire Sholay movie. Then inside it, you identify the scenarios; if we look at the article, there is something known as scene headings.

What were the plots inside this movie, what were the scenarios? The LLM works this out itself—reasoning models can do it. So: life in Ramgarh, Gabbar's raid, the final showdown, the recruitment of Veeru and Jai, and so on. You've picked out the main headings—the main scenarios where a plot twist happens, where the story changes, where a story arc completes. What you've created is a table of contents: only headings, no content yet. Within those headings you get structural detection—scenes, characters, act breaks, the major transitions.

So there is no fixed chunk size in this. Instead, based on reasoning, you identified the things that matter: a new character was introduced—you noted it; there was a big twist in the movie—you took it as a boundary; there was an ending somewhere, or an emotional scene—you detected the points where things change. On top of that, you can attach tags: for example, mark the story segments blue, mark everything related to Gabbar purple, mark the critical events gold—again, LLMs can do this well. So you kept giving the documents to the LLM, had it reason over them, and on the basis of that reasoning it generated a tree—a hierarchical mapping: Sholay is my root node, and beneath it are all these first-level branches.

Now, what data do we store inside each node? Number one, the title of that node; then the ID of that node. This ID is very important, because it's basically a reference back into the original document—here in the tree we're only keeping the structure, and the node ID points to where the actual content lives.
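A node like that can be sketched as a small data structure. The exact field names here—`node_id`, `page_range`, and so on—are my own illustration, not PageIndex's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class TocNode:
    """One entry in the hierarchical table-of-contents tree."""
    node_id: str                 # unique ID, used to fetch the original pages
    title: str                   # heading, e.g. "Gabbar's raid"
    summary: str                 # short LLM-written summary of this section
    page_range: tuple[int, int]  # pointer back into the source document
    children: list["TocNode"] = field(default_factory=list)

root = TocNode("n0", "Sholay", "Summary of the whole movie.", (1, 300), [
    TocNode("n1", "Life in Ramgarh", "Introduces the village...", (1, 40)),
    TocNode("n2", "Gabbar's raid", "The bandits attack...", (41, 90)),
])
print(root.children[1].title)  # -> Gabbar's raid
```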
Look, here we're only keeping things in tree format, but the actual page numbers tell you where to find the content in the original document. Against each node ID we also keep a summary, and then the node's child nodes—this is how you hold the tree in memory. Now the query phase: whenever a user asks a query—let's say, "Why did Thakur lose his arms?"—you don't have to send the full movie script to the LLM, because I don't want my context size to balloon. There will be no embeddings and no similarity search. Instead, using the user's question, the LLM traverses your tree and picks up the relevant nodes. We are not working on the original document at all yet—right now we only have the table of contents, the small tree we just created—and we work on that alone. So the LLM is called: "Friend, the user has asked a question: why did Thakur lose his arms? Please search this tree for me—which nodes do you think are relevant? And bring their child nodes too." What does it do? Given the user's question, it walks the hierarchical map, reads each node's summary, and based on the structure picks whichever nodes it finds relevant. Because your tree is well structured, it picks this node, this node, and this other node.

Now that you have the relevant nodes, you have their node IDs, so you can also fetch the original documents—plus, since each node carries a summary, you can use that as well. Only the LLM's reasoning is used here to decide which nodes I need; after that, you just hand over that data and do the retrieval. So, going back for a second: the user asked something, and maybe this node is relevant for me but that node is not—set aside everything that isn't relevant. I picked up the relevant ones, so according to the user's query I got a subset of the tree that matters. How does the LLM decide which node is relevant? Through the summary we stored. Then I can go to the original document, fetch the original passages, give those to the LLM, and do the generation.

So, to recap what goes inside each node: first the node ID, which is a unique ID; then basically a location—maybe a pointer to the original page, or whatever you want to keep; then the title of every node, the description of every node, a summary of every node; and obviously its child nodes, which is again an array of nodes. That's basically how you construct the tree. What's nice here is that the LLM itself decides everything—what to do, what it needs. So that means: no vectors, no vector embeddings, no chunking, no semantic search. It runs purely on the reasoning and capabilities of the LLM.
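The query-phase traversal can be sketched roughly like this. The `llm_is_relevant` function below is a stand-in for a real reasoning-model call—in practice you'd send the question plus the node's title and summary to an LLM and ask it to judge relevance; here it's a crude keyword check just so the sketch runs:

```python
# Each node: title, summary, node_id (pointer into the source), children.
tree = {
    "node_id": "n0", "title": "Sholay", "summary": "Summary of the whole movie.",
    "children": [
        {"node_id": "n1", "title": "Life in Ramgarh",
         "summary": "Introduces the village and its people.", "children": []},
        {"node_id": "n2", "title": "Thakur's backstory",
         "summary": "How Thakur lost his arms to Gabbar.", "children": []},
    ],
}

def llm_is_relevant(question: str, node: dict) -> bool:
    """Stand-in for a reasoning-model call judging node relevance."""
    text = (node["title"] + " " + node["summary"]).lower()
    return any(word in text for word in question.lower().split())

def traverse(question: str, node: dict) -> list[str]:
    """Walk the TOC tree, collecting IDs of nodes the 'LLM' deems relevant."""
    hits = [node["node_id"]] if llm_is_relevant(question, node) else []
    for child in node["children"]:
        hits += traverse(question, child)
    return hits

print(traverse("why did Thakur lose his arms", tree))  # -> ['n2']
```

With the relevant node IDs in hand, you fetch the corresponding pages from the original document and make a single generation call—no embeddings anywhere.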
So this is basically what PageIndex is trying to do: it works on navigation and extraction. This mirrors how humans read—when you want to know something, you navigate the table of contents and then extract from the right section, the same way we would. There is also a repository introducing this: PageIndex (again, not sponsored). It's basically a Python SDK that enables PageIndex for you. You can see how it works: you give it a document, it builds up a tree, then it does LLM reasoning over the query, and you get an answer—that's the whole pipeline. This is a relatively very new thing. In case you want to see what a tree looks like: you have a title, a node ID, a summary, and child nodes; inside each child there's again a title, node ID, start index, end index (where it sits in the original), a summary, and possibly child nodes of its own. That's how the tree is constructed. And because LLMs have gotten smart over time, it's the reasoning models—the smartness of the LLMs—that is being leveraged here. So that is how Vectorless RAG comes into the picture. Just in case you like this approach, let me know—I'm even ready to code it up.
Recently in our very recent project, we have converted our traditional rag to a vectorless rag. The only trade-off that we have to give is number one the cost because reasoning models are expensive and the speed. Because you have to reason something and you have to do a tree traversal, it takes a little bit of time for the LLM to come to the final output.
because before that, it reasons a lot. So the trade-off is that we are trading time for accuracy. Beyond that, there are many relatively new things; of course, it's the AI world, things change very rapidly, and new things keep arriving. So let's wait and see what comes next.
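That navigate-then-extract loop, which is where the extra latency comes from, can be sketched roughly like this. This is a hypothetical illustration: the `llm` callable and the dict-based tree are stand-ins, not the PageIndex SDK's actual API:

```python
def answer_with_page_index(query, node, llm):
    """Navigate the index tree with an LLM, then answer from the chosen leaf.

    `node` is a dict with keys: title, summary, children, text.
    `llm` is a hypothetical callable (prompt -> str); this loop illustrates
    the navigate-then-extract idea, not the SDK's real code.
    """
    while node["children"]:
        # Show the LLM a "menu" of child sections and let it reason about
        # which branch is most relevant to the query.
        menu = "\n".join(f"{i}: {c['title']} - {c['summary']}"
                         for i, c in enumerate(node["children"]))
        choice = llm(f"Question: {query}\n"
                     f"Pick the most relevant section (number only):\n{menu}")
        node = node["children"][int(choice.strip())]
    # Leaf reached: answer using only the original text of that section.
    return llm(f"Answer '{query}' using only this text:\n{node['text']}")

# Usage with a stub LLM that always picks section 0
tree = {"title": "doc", "summary": "", "children": [
    {"title": "Intro", "summary": "Overview", "children": [],
     "text": "Hello world."}]}
stub = lambda prompt: "0" if "number only" in prompt else "stubbed answer"
print(answer_with_page_index("What is this?", tree, stub))  # → stubbed answer
```

Every level of the tree costs one LLM call, which is exactly the time-for-accuracy trade-off described above.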
But let me know in the comments what you think about this Vectorless RAG. What are your takes on it?
Top Flutter App Development Company
Flutter was developed in 2017 by Google (Alphabet Inc.). Flutter provides a single-codebase structure that can be used to develop web applications and mobile applications for both Android and iOS. Previously we had to use Java + XML for Android and Swift for iOS, but Flutter solves this problem with cross-platform development. Startup companies should consider Flutter over native development. Flutter is a UI framework that uses Dart. Its dependency manager is pub.dev, where developers can find the packages they require: payment gateways, state management, sliders, SMS, etc.
Widgets:- In Flutter, everything is a widget: child widgets, stateless widgets, stateful widgets.
Editor:- We can use Flutter in editors like VS Code, Android Studio, etc.
Run:- We can run Flutter on any device: an Android or iOS smartphone, a browser like Chrome or Edge, or any installed emulator.
Apk:- We can build an APK to run on any Android smartphone.
Startups:- Best for any business that wants to build an application: e-commerce, education LMS, water-purifier booking, Wi-Fi booking, matrimonial, bike-taxi booking, online food ordering, etc.
Live:- To make an application live on the Play Store, a business has to purchase a Play Console account for about 2,000 INR. For iOS it will cost around 10,000 INR.
Cross platform:- We can build applications for any device: desktop app, Android app, iOS app.
Ai:- There are many AI platforms that can help with the design and development of any kind of application, such as ChatGPT, DeepSeek, Claude, Grok, etc. Many AI portals now generate full-stack applications for web and mobile. With AI you can solve complex problems in seconds, provided the developer knows how to follow the documentation properly. Some AI websites have limitations, such as a maximum query length. You can build designs from scratch whenever needed; for best results, a premium subscription gives more relevant answers. ChatGPT, Grok, Claude, and DeepSeek are generative technologies: they work from data on their own or third-party servers, and for fresh questions the prompt goes out to a search engine to find relevant sources, which are then checked and summarized for us. Many AI websites use Google's search APIs via Google Cloud Platform (GCP). In recent news, Google limited search-API results with pagination to increase revenue, because AI websites depend on it for real-time data and Google's index lists billions of websites. AI websites also need a large amount of server capacity, so they choose AWS or data centers to maintain load balancers, backups, Kafka, ZooKeeper, etc.
Ecommerce:- We can create shopping mobile applications in any vertical (B2B, B2C, D2C, etc.) with features like home page, search, categories, products, add to cart, checkout, payment, order tracking, privacy policy, terms and conditions, shipping policy, delivery policy, returns and refunds policy, login, register, etc.
Matrimonial:- We can build matrimonial applications like shaadi.com or bharatmatrimony.com with features like login, signup with OTP verification, profile creation, subscriptions, profile matching, payments via a gateway, contact numbers and chat unlocked after purchasing a subscription, privacy page, returns and refunds page, etc.
Online Taxi Booking:- We can build online bike-taxi mobile applications like Rapido, Ola, or Uber, with features such as registration, sign-in with OTP verification, profile details, and ride booking from a start point to a destination. The app then finds the nearest rider at that location; once the captain accepts, the ride starts using the Google Maps API, and on reaching the destination the captain completes the ride and collects payment via UPI or cash, as provided by the platform. Users can see all the rides they have taken, payments, etc. Captains can see all the rides they have provided, the payouts transferred on particular dates, the insurance provided by the company, bank account details, etc.
E-Learning LMS:- EdTech businesses can create LMS applications like Physics Wallah, Allen, Careerwill, Selection Way, etc., with features like login and signup using a mobile number, a catalog of all available courses, and enrollment so learners can buy the relevant course. After enrolling, purchased courses become visible; clicking a course opens its content, such as a video player with notes, Q&A, related queries, assignments, etc. A ticket-creation feature also lets users report any issues they face.
Recruitment App:- For any entity that wants to provide recruitment services like Indeed, Apna, or WorkIndia. Users can sign up on the app and provide basic details like name, email, contact, and an updated CV; based on their interests or job-profile category, they can apply for roles with varying benefits offered by companies, such as salary range, cab facility, hybrid work, travel allowance, CL, GH, etc. Users get notified about new job openings.
There are many categories of mobile application you can develop using Flutter. You can use a backend like Laravel or any other, MySQL or Firebase for the database, and Firebase push notifications for notifications. In the Play Console you can check how many app downloads have been completed.
Webgridsolution:- Global tech provider Webgridsolution provides multiple services such as website design, website development, mobile application development, SaaS platforms, digital marketing, etc.
Webtrills:- India's leading mobile application development services provider, based in Delhi, offering native and hybrid application development.
Webkul:- Webkul is an online tech provider offering services like development, marketing, etc. Its main office is in the Noida region.
Appsinvo:- A leading full-stack mobile application company across the country, with native and IoT-enabled applications.
Winklix:- Winklix offers mobile app development services for various platforms, including iOS, Android, React Native, Flutter, and Salesforce.
For any enquiries,
Contact support info@bindaasboldilse.com
Breaking: Oracle Announces 30,000+ Layoffs Worldwide
Oracle Layoffs Shock Tech Industry: Over 30,000 Jobs Cut
At 6 in the morning your phone vibrated and an email arrived: today is your last working day. No meeting, no conversation, and no warning. Just one cold email, and your career is completely over. 30,000 employees globally, 12,000 of them in India. This was Oracle's biggest mass layoff in its 47-year history.
The company that has every major bank of India on its database. I will tell you three things: three villains who took away the jobs of twelve thousand Indians, and one explanation which you will probably only understand now. Let's start the video. Hello friends, my name is Deepak, and you are watching this.
So first of all, understand what Oracle is, because this is most important for understanding this video. The year was 1977, California, America. There was a man, Larry Ellison: a college dropout, no money at home, no big background, but he had an idea. Companies needed to secure data, find it fast, and never allow it to be deleted; this was the database. And with this one idea, Larry Ellison created Oracle. What does Oracle actually do today? Their first business is the database.
Whenever you insert your card in an ATM, where is the complete data of the bank? On Oracle's database. SBI, HDFC, ICICI: India's biggest banks actually run on Oracle. Consider ERP software as the second business: how does a big company like Reliance manage its finances and its supply chain? On Oracle's software. And the third business is their cloud. Like AWS and Azure, Oracle has its own cloud: Oracle Cloud Infrastructure. And when did Oracle actually come to India? In 1994, when they opened their first office in Bangalore.
And within 30 years, Oracle became one of India's largest tech employers. Before the layoff, 30,000 employees worked in India alone. If you have ever touched any bank software, seen a hospital system, or worked in a big company, you must have encountered Oracle. It was a safe job for 30 years, and now three villains have changed everything. And the first villain is different from what you'd expect.
Listen, read that email once again: today is your last working day. This line came at 6 in the morning, and along with it, system access stopped immediately. No goodbye meeting, no manager's call, no thank you.
In Oracle India, some employees had 10 years of service, some had 15 years, some had 20 years. A manager wrote in a LinkedIn post that a journey of 16 years ended in one email. And all this happened when Oracle's quarterly net income was 6.13 billion dollars, meaning a profit of 6.13 billion dollars in a single quarter, and yet 30,000 people were fired. In reality, this was not about survival; it was the company's decision to increase its margin. Now, Oracle fired the employees, but did they get some severance payment, some compensation? Yes, the company paid them properly: 15 days' salary per year of service, plus 2 months' ex-gratia, plus notice pay. So Oracle is not completely wrong here. They paid severance and followed legal compliance. But one thing they did not do was give advance warning. In fact, they gave no signal to their employees, no retraining offer, no transition support. Just one email, and the job was over.
When a company is making a profit of Rs 50,000 crore and still sends a layoff email at 6 o'clock in the morning, this is not restructuring; this is cold business. Now coming to the second villain, AI automation, and this villain is the most dangerous.
Because it is not visible. In fact, Larry Ellison spoke about this openly in January 2026. He said autonomous software eliminates human labor and human error, lowering operating cost. Understand what they are saying: AI reduces human cost and human error, so they are shifting completely towards AI. This is not a PR line; this is actually Oracle's plan. Oracle's biggest budget came in 2025-26. What did they do? They raised 50 billion dollars of debt. Why? To build AI data centers.
For Nvidia's Blackwell chips. Oracle and Nvidia have actually created the world's largest AI supercluster. And they have launched 22 AI agents, which the company has named Oracle Fusion Agent Applications. What work do these agents actually do? Work that 50 engineers used to do, such as cloud monitoring, detecting server issues, and solving customer support tickets, is now done by one agent.
In 2022, Oracle bought Cerner, America's largest healthcare IT company, for 28 billion dollars. Now, Cerner's entire system was old code, crores of lines of it. Earlier it was estimated that several thousand engineers and developers would be required to rewrite it, and that the work would take 5 to 7 years.
Now, Oracle did all this with AI in 3 years. A thousand software developers woke up to find AI had taken over their work. And where did those developers go? They disappeared. This is the same pattern playing out in Oracle India: cloud operations, level-two support, quality-assurance testing, repetitive engineering tasks. AI agents are doing all of this now, monitoring more than a thousand cloud instances 24x7 without a break, with an accuracy a human team of 50 people cannot achieve. Friends, if your work is repeatable, that means it is also deletable.
You will definitely understand this. Now comes the third villain, which is within us. Understand this: there is the ostrich, which, when danger comes, buries its head in the ground. It thinks, if I don't see the danger, maybe it will go away. This is the same mistake that India's IT professionals actually made.
Twelve thousand were let go, and many of them had 10 to 15 years of Oracle experience. They were expert-level in Oracle Database and certified in Oracle ERP. But they did not learn Oracle Cloud, did not learn AI agents, did not learn generative AI. Why? Because Oracle felt safe. Bank accounts were being credited, EMIs were being deducted.
The job seemed secure, and many of us simply don't upskill. Actually friends, this is not just about Oracle. This is the story of Big Tech layoffs across 2025-26. Globally, jobs are going by the hour: Amazon removed 16,000, Microsoft removed 19,000, Meta removed 36,000; Google and IBM show the same pattern everywhere. If you keep postponing your own upskilling, then you are the culprit, and that is villain number three. Oracle did not give a warning, but Larry Ellison did.
He himself said at the beginning of 2026 that autonomous software eliminates human labor. The warning was there, but no one was really listening. And this is not just Oracle's story; this is the pattern of the entire IT sector. Oracle removed 30,000 people, and their stock went up. Imagine: their stock had been going down for a year, but in one day it hit a high.
Because investors saw that the company is investing in AI: removing humans means margins will increase in the future. This is the signal, and this trend is not stopping. This is just the beginning of Oracle's AI-driven restructuring, and you will see the bigger game in the times to come. So friends, this is the story of Oracle.
But there are three business lessons in this, and if you don't learn them, you too will fall into the same trap. Understand the first lesson: there is no such thing as a safe job. Oracle was considered the safest job in India for 30 years. It ran in banks, in hospitals, in government projects. And then one day, a 6 a.m. email changed everything. Actually, no skill is safe; it is either current or becoming outdated, and that choice is now yours. Understand the second lesson: when a company invests in AI,
then employees should also invest. Oracle plans to invest 50 billion dollars in 2025-26. And the third lesson, the most important one: repeatable work is the risk zone, in India and across the world. Cloud monitoring, L1 and L2 support, quality-assurance testing, basic engineering tasks: all of this is being taken over by AI. But some things cannot be taken away, friends, so please understand this carefully.
Critical thinking, customer relationships, system-design decisions, understanding business context. If you move into these areas, AI will not be able to replace you. And if AI is taking over your job role completely, then you have to become a manager of AI, the CEO of AI. So friends, this was the story of Oracle. We met three villains: the Oracle that fired people without warning, the AI automation that is invisibly replacing people, and our own Indian habit of leaving upskilling for later.
I have not made this video to scare you. I have made it because it is a warning, and a warning is valuable. Friends, today AI is a must-do skill, no matter what your job role is: businessman, student, IT professional, any kind of work, even managing the house. If you don't learn AI, you will be left behind the times. Please comment and let us know if you too have been laid off from Oracle.
Claude Code Source Code Leaked
So Claude Code's source code has been leaked. This fellow right here, ChowFanChow, is the one who figured it out, because he saw that Claude Code's npm package included its source map file. If you don't know what a source map file is, it is a JSON file, which you can see right over here, that contains all of the original source code for a project. So this little .map file getting pushed into the npm package leaked the entire Claude Code source code to the Internet. And not only does it contain all of the source code, it also has a reference pointing directly to the complete, unminified TypeScript source, which was downloadable as a zip archive from Anthropic's own R2 cloud storage bucket. Now, you may be wondering: why don't I have any of the code open right here? Because I'm a little bit scared.
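For context, here's what a simplified, made-up source map looks like. The format is the standard source map v3 JSON; the `sourcesContent` field embeds the full original source text verbatim, which is exactly why shipping the `.map` file leaks everything:

```json
{
  "version": 3,
  "file": "cli.js",
  "sources": ["../src/cli.ts"],
  "sourcesContent": ["// the complete original TypeScript source, embedded verbatim\nexport function main() { /* ... */ }\n"],
  "names": ["main"],
  "mappings": "AAAA"
}
```

The `sources` paths are also what point back to the original file layout, which is the "reference" mentioned above.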
Anthropic loves to enact DMCA takedowns, which they have already done on every single GitHub mirror they have found containing the source code from their leaked source map file. And you can see all of the repos they specifically called out right here containing that source code.
Now, there is a little caveat here. If this code was actually written by Claude Code, as they claim, then it is AI-generated, and US law holds that purely AI-generated works cannot be copyrighted. In that case they would have no standing to DMCA any of this code, because there would be no copyright owner.
Now, do I want to test that and try to fight YouTube's DMCA system, which almost always sides with the claimant, Anthropic? Not particularly. But if we did want to look at any of the code, we could take a look at this repo right here. It's not the exact code.
I'm pretty sure it is the fastest repo ever to hit 100,000 stars on GitHub. It's from an individual (I guess there are four contributors now) who took all of the features from the Claude Code source code, ported the core features to Python from scratch, and pushed it to this repo before the sun came up after they had seen that the leak happened. Then they said Python's not good enough, let's rewrite the entire thing in Rust. So this is awesome. But how in the world did this leak happen? Obviously it was an accident for them to push the .map file. Well, to understand that, we have to take a look at Bun.
I'm not saying it's Bun's fault. Bun is a fast JavaScript bundler, runtime, and test runner; Bun's awesome, by the way, and it's also a company that Anthropic just acquired. So Anthropic owns Bun, and Bun is the bundler for Claude Code. And Bun, by default, generates a source map file. Even though technically the docs say it does not generate a source map by default, and that the default is none, this has not been the experience for me or many others: when you use the sourcemap flag and do not specify the value,
it doesn't default to none; it defaults to linked. And that's on the CLI side. On the JavaScript side it also says it defaults to none. However, there is a bug, open for three weeks now, where the source map is incorrectly served in production, when the docs say it shouldn't be by default.
So you can go into your bunfig.toml file and set your sourcemap to false or to none, and it shouldn't generate any source map, but they're having issues here. So frankly, I don't even know whether it's supposed to be generated by default or not, because what the docs say is not how it actually works.
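As a sketch of the setting being described, the fragment below follows the transcript's description of bunfig.toml; Bun's config surface changes often, so verify the exact key against Bun's current docs before relying on it:

```toml
# bunfig.toml - ask Bun not to generate source maps (as described above)
sourcemap = "none"
```

On the CLI side, `bun build` takes a `--sourcemap` flag with values like `none`, `linked`, `inline`, or `external`.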
And a lot of people thought this bug was what caused the Claude Code source code to leak, letting that source map file sneak through. But Jarred Sumner, the creator of Bun, says this has nothing to do with Claude Code, because that issue is with Bun's front-end development server, and Claude Code is not a front-end app. It's a TUI app.
It doesn't use Bun's serve; it compiles to a single-file executable, so that wasn't the issue. That's not to say there isn't an unreported issue on the CLI side. However, if we take the word of Boris Cherny, the creator of Claude Code himself, it is not related to Bun. It is just a developer error.
So, human error or AI error; obviously they're not going to blame their own AI for doing something like this. Really long story long: they forgot to add `*.map` to their .npmignore file, which would have prevented all of this. And it's an issue they probably would have caught, considering it was probably changed in a PR that nobody reviewed closely. I bet Greptile would have caught it, the sponsor of today's video, because Greptile is an AI code reviewer: an AI agent that catches bugs in your pull requests with full context of your code base. That's how they got the name Greptile: they grep your code base and, like a reptile, eat all the bugs, which honestly I think is an awesome name. It goes over some people's heads; I've had to explain it to a few folks. And I greatly appreciate that Greptile themselves call it your second pair of eyes, because an AI code reviewer, just like AI with coding, should not be the end-all be-all. It is a second pair of eyes.
It is something you can use to help identify bugs you may have missed, or to give a first pass before you review a pull request. And with Greptile, you get context-aware comments on your PRs. You describe your coding standards in English, and you can personalize it for your team so it is consistent across the entire code base.
And you can write rules or point to markdown files. You can scope where a rule should apply: which repository, the file pattern, things of that nature. And you can track rule effectiveness and usage over time, so you can fine-tune the rules you have set for your code base.
Many, many companies use Greptile, including Netflix, PostHog, NVIDIA, and OpenClaw, and you know how many PRs they get. I would highly recommend taking a look at greptile.com/examples (actually, use the link in my description so they know I sent you) to see how Greptile has helped these companies. Take a look at the case studies; you can go to the actual PR itself and see how it responded to these different PRs, the score it gave, and the sequence diagram related to that exact PR. So yeah, take a look at Greptile. They're awesome. I use them for my
projects. Highly recommend. Again, if you want to check them out, use the link in the description. So, there's a lot to go over in this video. I mean, there are 390,000 lines of leaked code. You may have seen the 512,000 or 513,000 line number; when I say 390k, that's actual lines of code, not including comments or blank lines. So that's a lot of code.
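For reference, the missing safeguard mentioned earlier is a one-line ignore rule. An .npmignore entry like this keeps source maps out of anything `npm publish` ships:

```
# .npmignore - patterns listed here are excluded from the published package
*.map
```

An allowlisting `files` field in package.json would achieve the same thing by only including what's explicitly listed.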
Now, as badly as I want to show you all of the code, I will just guide you in the right direction in case you want to look at it yourself. And we have to start with the code names. If you search for Tengu in the code base, you will see that it occurs over 1,500 times. Tengu is Claude Code's internal project name; it means Claude Code. Then you'll see the term FNEC: migrate FNEC to Opus. FNEC latest is Opus. FNEC latest 1 mil is Opus.
Well, you guessed it: FNEC is Opus, Opus 4.6 to be exact. Then they have Capybara. As discussed, this is Mythos, a Claude 4.6 variant with its own name; it's not Opus, it's not Sonnet, it is Mythos. And then there is Numbat, where the code literally says, remove this section when we launch Numbat.
And as you can see, we have that ant user type again, so only these individuals have access to Numbat. It must be some unreleased model we don't know anything about. And I'm sure there are more, but those are the main ones. Now, if you want to get to the brain of Claude Code, this is how Claude Code kind of works.
You have to go to queryengine.ts. It is the core: it owns the entire query life cycle and session state for a conversation. Every single prompt you type goes through this code right here. It handles the LLM API calls and the streaming responses. So if you want to know how Claude Code actually processes your prompts, handles LLM API calls, streaming responses, tool-call loops, thinking mode, retry logic, and things of that nature, this is the file to look at. There's also query.ts.
So this whole query module, and everything tied to it, is where you start. From there you can branch out, or go down to tools.ts and check out all of their tools. It lists everything from internal-only tools to all the tools we know: agent tools, config tools, things of that nature. Their entire tool system is right there.
Their permission system is pretty interesting: they have default, auto, bypass, and then one called YOLO, which is deny everything. I would think that's the opposite of what YOLO actually means, but that's neither here nor there.
And there are a lot of interesting, nerdy things you can dive into in this code base. But unfortunately, that's not what this video is about, since I can't actually show you the code base. What I really want to talk about is not a deep dive into the code, but some of the unreleased features, and some of the silly fun features, starting with undercover mode, because it's absolutely hilarious. It's an entire system that only activates for Anthropic employees: once this user type equals ant, it's been determined that the user is an employee, so it activates these
features. One of those features is undercover mode: safety utilities for contributing to public open-source repos. It's specifically designed to prevent Anthropic's internal information from leaking, which we only know about because their internal information was leaked. I swear, you can't make this stuff up.
It tells Claude to ensure that Claude, or an employee, can't leak this information: never mention that it's an AI, never mention the code names it uses, don't reveal any unreleased model numbers (like Opus 4.7), no internal repo or project names, no Slack channels, no internal short links, no Claude Code anything.
So when they're contributing to any other open-source project, you don't know whether it's AI or not, because there's zero attribution to Claude Code or AI or anything like that at all. And this feature is turned on automatically and cannot be turned off. Obviously, unless they go into the code and turn it off; but the other Anthropic employees who use Claude Code don't have access to that code, presumably.
So they have no control over that. You know what, I think I'm just going to show some of this code and hope for the best. If you're watching this in the future, hopefully it doesn't get taken down. And if it does and everything is blurred out, that means I was able to successfully appeal it, as long as I put blocks over the code and don't show it.
Now, I want to go over undercover mode first because it's just funny. It is ironic that they built an entire system out of concern about leaking internal information, and the only reason we know it exists is that they leaked internal information. And now there is an allow list.
So if Anthropic employees are using Claude Code, this doesn't activate if they're working on an internal Anthropic repo that's on the allow list. And Claude Code itself would probably be on the allow list. Even if it wasn't, this isn't an issue undercover mode would have caught anyway. Again, I just thought it was funny.
But now I want to talk about what is probably the biggest unreleased feature they have, known as Kairos. Now, it's important to note that Kairos is named after a Greek word meaning "the right time" or "the critical moment". And I say that's important because Kairos is basically an always-on Claude: a persistent assistant that runs in the background.
So instead of you typing a prompt and Claude responding (obviously you can still do that), Kairos watches your project. It keeps daily logs of your project, and on a regular interval it gets a tick prompt where it decides: should I do something, or should I stay quiet? And if it decides it should do something, there's a 15-second rule: if whatever it wants to do would block the workflow you're currently on for more than 15 seconds, it defers.
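Purely as an illustration of the described behavior (this is a hypothetical sketch, not the leaked implementation), one tick of that loop might look like:

```python
def kairos_tick(decide, estimate_block_seconds, act):
    """One 'tick' of the hypothetical background loop described above.

    decide() returns an action (or None to stay quiet);
    estimate_block_seconds(action) says how long it would block the user;
    act(action) performs it. Returns True if the action ran.
    """
    action = decide()
    if action is None:
        return False            # stay quiet this tick
    if estimate_block_seconds(action) > 15:
        return False            # 15-second rule: defer blocking work
    act(action)
    return True

# Usage: a quick action runs; a slow one is deferred
ran = kairos_tick(lambda: "fix TODO", lambda a: 3, lambda a: None)
deferred = kairos_tick(lambda: "big refactor", lambda a: 120, lambda a: None)
print(ran, deferred)  # → True False
```

A real assistant would run this on a timer; the interesting part is just the stay-quiet / defer decision gate.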
And it has tools that are exclusive to Kairos, like send user file, so you can get notifications when Kairos does something, as well as summaries. Those summaries have a brief mode: when Kairos is active, there's a special output style called brief, with extremely concise responses designed for a persistent assistant that shouldn't flood your terminal, because you don't want to be faced with a wall of text from what Kairos is doing all the time. And I'm assuming these responses are
the same as those summaries, but I could be wrong on that. And then there are push notifications, not to be confused with notifications; as you know, push notifications are what you get on your phone (which, by the way, check out my phone: I did the halo thing with the orange phone and the green case).
But push notifications, because you can use Claude Code from anywhere now and it's all interconnected. So basically you have this assistant looking at your code base; maybe it sees a bug, or a TODO, or a FIXME, or something along those lines, and I assume, based on what I've seen, that it'll tackle those things in the background, as long as it passes that 15-second rule and the tick prompt and so on.
And then it'll push files directly to you for the changes it made, or any questions it has, or anything like that. I can only imagine how expensive that is going to be. Then they have this thing called the dream system. However, I believe this already leaked in the form of /dream, because this post from eight days ago said that /dream (auto dream) was just quietly released, and this individual right here couldn't find any official announcement from Anthropic about it. So I don't know if they quietly released it, or didn't mean to release it and it leaked out a week ago. But basically,
what it does is serve as a background memory-consolidation engine that runs as a forked sub-agent. And it has to pass three gates: it must have been at least 24 hours since the last dream; there must have been at least five sessions since the last dream; and it must acquire a consolidation lock, which prevents concurrent dreams.
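As an illustrative sketch (not the leaked code), the three gates could be checked like this:

```python
def may_dream(hours_since_last, sessions_since_last, lock_acquired):
    """Check the three 'dream' gates described above: at least 24 hours
    since the last dream, at least five sessions since then, and a
    consolidation lock to prevent concurrent dreams."""
    return (hours_since_last >= 24
            and sessions_since_last >= 5
            and lock_acquired)

print(may_dream(30, 7, True))   # → True
print(may_dream(30, 3, True))   # → False (too few sessions)
```
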
And all three of these must pass. Then what it does is prune out all of the stale information: it effectively dreams up a new conversation that didn't actually happen, containing only the relevant information from those past sessions. And the prompt literally says, you are performing a dream, a reflective pass over your memory file.
Synthesize what you've learned recently into durable, well-organized memories so that future sessions can orient quickly. So that's what it does. And then there's another feature that I am even more afraid of, cost-wise, than even Kairos, and that is coordinator mode. Okay.
Coordinator mode module, get coordinator user context, all of this. Yeah. Conditional import for coordinator mode. So this one turns Claude Code, the single agent as we all know and may or may not love, into an agent orchestrator. This is something that Warp just released as well with Oz.
This is what a lot of AI labs and tools and folks are doing right now. So basically, instead of you being the manager over all of these different Claude Code sessions, you tell a Claude Code session to become a manager over multiple worker agents in parallel. You're just outsourcing yourself as manager to Claude Code as manager, and you are the Claude Code manager's manager. And you can see the prompt for coordinator mode. I sure hope I don't.
This may be the biggest leak: the actual prompts, not just the code. But anyway, it's telling coordinator mode... actually, I'm just going to go to this. Oh, it has all of this in here: "You are Claude Code, an AI assistant that orchestrates software engineering tasks across multiple workers." So that's what it is. And then, under concurrency, it has a mention of "parallelism is your superpower."
Workers are async. Launch independent workers concurrently whenever possible. Don't serialize work that can run simultaneously, and look for opportunities to fan out. When doing research covering multiple angles, launch workers in parallel; make multiple tool calls in a single message. And then I'm not gonna show you the rest of this.
It hopefully will be blurred out, but there's a lot of information. This is coordinatormode.ts. Oh, and I gotta show this. So there are prompt tips for the Claude Code manager agent to prompt the worker agents. And it said bad examples are "fix the bug we discussed", because it has no context; workers can't see your conversation.
It says another bad example is "based on your findings, implement the fix", "create a PR for recent changes", "something went wrong with the test, can you look?" So if you want to know how to properly prompt, just see how the Claude Code team told the Claude Code manager agent to prompt the Claude Code workers.
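The fan-out idea the coordinator prompt keeps hammering on (launch independent workers concurrently, never serialize work that can run at once) is just parallel task dispatch. A generic sketch of that pattern, where every name (`WorkerTask`, `fanOut`) is my own and not from the leaked source:

```typescript
// Generic fan-out: launch every independent worker at once and collect results.
// Illustrative only; not the actual coordinator-mode implementation.

type WorkerTask = {
  id: string;
  run: () => Promise<string>; // each worker is an async unit of work
};

async function fanOut(tasks: WorkerTask[]): Promise<Map<string, string>> {
  // Promise.all starts all workers concurrently instead of serializing them.
  const results = await Promise.all(
    tasks.map(async (t) => [t.id, await t.run()] as const)
  );
  return new Map(results); // results keyed by worker id
}
```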
And then there's a feature called Ultraplan, a 30-minute remote planning session. And I feel like plan mode is slept on; I feel like a lot of people should use plan mode that don't. If you're just sitting there prompting Codex or Claude or whatever every single time, and not going into plan mode, or not telling it to go into plan mode and actually create a plan for what you want, I think you are not only wasting time and wasting money, your hair is going gray sooner. Because in my experience, having it plan out something in its own plan mode based on what I want, the usability of the code that comes out at the end is, well, I'm not gonna give a number to it, but it's a whole lot better.
I face fewer problems when I have the AI plan my problem out first, or plan the solution to my problem out first. And ultra plan mode is where Claude Code offloads a complex planning task to a remote cloud container runtime session, which is running Opus 4.
It gets up to 30 minutes to think and lets you approve the result from your browser. However, something super weird about it is, and oh, they did actually include this: your terminal shows a polling state, and then it checks every three seconds for the result. Why three seconds? That seems like a lot. Now, if it doesn't cost that much to check every three seconds, if it's very minuscule, then okay, so be it.
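A fixed-interval polling loop like the one described (check every three seconds, within the 30-minute budget) might look roughly like this. `pollOnce` and the default values are assumptions for illustration, not the real implementation:

```typescript
// Sketch of a fixed-interval polling loop: keep asking the remote session
// for a result until one appears or the time budget runs out.
// Hypothetical; not the actual Ultraplan code.

async function pollForResult<T>(
  pollOnce: () => Promise<T | null>, // returns null while the remote plan isn't ready
  intervalMs = 3000,                 // the 3-second check described above
  maxAttempts = 600                  // ~30 minutes at one check per 3 seconds
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await pollOnce();
    if (result !== null) return result; // remote session finished
    await new Promise((r) => setTimeout(r, intervalMs)); // wait before next check
  }
  throw new Error("remote planning session timed out");
}
```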
I mean, that is probably the case, so fine. And not only that, but there's a browser-based UI that lets you watch the planning happen and approve or reject it. And when you approve it, there's a sentinel value right here, "ultra plan teleport sentinel", that teleports the plan back to your local terminal. So basically, Claude goes away to think about how to implement your plan while you can go do something else. And all of these are very cool and all, don't get me wrong, but there's one that is wild, and that is anti-distillation, which "sends fake tools, opt-in for one pcli only". I don't know how to interpret that. Basically, what it does: when you have other harnesses and tools running Claude's models, they can see how Claude, like Opus 4.6 for example, operates. And this anti-distillation deal right here is Anthropic's way to actively defend against competitors trying to copy Claude's behavior.
And when this anti-distillation CC feature is on, Claude Code injects fake tool definitions into its API requests. So to clarify: a competitor could be recording the API traffic, trying to distill how Claude works, and this gives them bad data, because it's injecting fake tool definitions into those API requests.
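One plausible reading of "injects fake tool definitions" is mixing decoy tools into the real tool list, so anyone recording the traffic can't tell which definitions the model actually uses. This is entirely my interpretation, with made-up names, not the actual code:

```typescript
// Conceptual sketch: pollute the tool list with decoys so recorded API
// traffic is unreliable for distillation. Purely illustrative.

interface ToolDef {
  name: string;
  description: string;
}

function injectDecoyTools(realTools: ToolDef[], decoys: ToolDef[]): ToolDef[] {
  const mixed = [...realTools, ...decoys];
  // Shuffle (Fisher-Yates) so decoys aren't identifiable by position.
  for (let i = mixed.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [mixed[i], mixed[j]] = [mixed[j], mixed[i]];
  }
  return mixed;
}
```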
So basically the Claude Code team and Anthropic are like: we have the best harness, everything we do is the best, we don't want anybody to steal what we're doing, so we're going to put these measures in place to make sure that nobody can. And then we're going to release the entire source map, our entire source code, for Claude Code.
So that way, all of these people that we gave bad data in the first place, who were trying some abstract way of copying us, can just straight up copy us and see exactly how we do everything. I also don't think that Claude Code is the best harness there is, but that's neither here nor there. Now, those are some pretty solid features that are to be released, but what's way cooler than that are all of the silly things that are in this code base. And one of them, I don't know if it's live, I haven't checked today, but I'm recording this on April 1st and it was supposed to start today, April 1st, and that is their buddy system.
I think they were going to roll it out from April 1st to April 7th, and I do feel bad that they can't. Yes: teaser window April 1 through 7, 2026 only; the command stays live forever after. And it was going to be a cool little Tamagotchi-style pet system called Buddy, where somebody created this nice little site, a Claude buddy viewer, where you can view all of the sprites. It's 18 animated companions across five rarity tiers, and each buddy gets five stats.
Show stats. So you have debugging, patience, chaos, wisdom, snark. This one down here, you can see that it's heavier on wisdom. This one's heavier on chaos. I don't think I'd appreciate that. I would like the wisdom, maybe a little bit of snark, and some really good debugging. Which one's the best at debugging? Oh, some of these, some of the common ones up here. So yeah, that was leaked.
Penguin mode was leaked, which is just the code name internally used for fast mode. But you can see that everything else is penguin-themed as well; it's just penguins all the way down. And I'm sure y'all have seen this by now: 19 million views, Wes Bos, "Claude Code leaked their source map."
Well, okay, we went over that. I immediately went for the one thing that mattered: spinner verbs. There are 187 spinner verbs that they have in this array here. So, you know, when it's working, it says it's "fermenting" or "forging" or "discombobulating"; you can see all of them right here. And there's also 20... Apologies for the parents in the room, hide this from your kids. I don't think I have many kids that watch my channel, but I'm not going to say any of these words because I...
I just don't cuss, or try not to. But they filter out 25 swear words to ensure that these are not included in the random four-character IDs that they generate, which, okay, that makes sense. I also saw, I don't know if Wes posted this, but I also saw that they recognize when the user cusses at Claude Code, and then use that data as an indication that the user is frustrated with Claude Code: let's figure out why and fix it.
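The filtering described above is a classic generate-and-reject loop: draw a random four-character ID, and retry if it happens to contain a blocked word. A sketch with a placeholder blocklist (the real list has 25 words, which I obviously won't reproduce):

```typescript
// Sketch of a blocklist-filtered random ID generator. The alphabet and
// blocklist contents here are placeholders, not the real ones.

const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

function randomId(blocklist: string[], length = 4): string {
  // Keep regenerating until the ID contains no blocked word.
  while (true) {
    let id = "";
    for (let i = 0; i < length; i++) {
      id += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
    }
    if (!blocklist.some((word) => id.includes(word))) return id;
  }
}
```

Rejection sampling like this is simpler than trying to construct only "clean" IDs, and since four-character collisions with a 25-word list are rare, the retry loop almost never runs more than once.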
Oh, it's right here. Yes: swearing at Claude Code logs your prompt as negative in their internal analytics. "Continue", "keep going", and "go on" all match for the agent to keep going. Well, that didn't match exactly what I said, but I read elsewhere that they use that data in order to improve; that's why they log it. They use it to improve their product.
And we can go on and on and on about this, but what does this really mean for competitors and for the industry? Nothing, really. People don't pay for Claude Max in order to use Claude Code and Claude models on Claude Code. They pay for Claude Max because they want the subsidization of the inference.
These Claude Max plans used to give you $2,000 or more worth of inference for the $100 or $200 that you pay for the plan per month. But now, as we hit a thousand different posts just like this one, where people are hitting their limits very quickly this week ("even with the 20X Pro plan, it makes my Claude Code unusable"), that subsidization seems to be quickly coming to an end, and they're just kind of rug-pulling it, too.
And I only use the term "rug pull" in the sense that they're not being transparent about it. They're not saying, hey, we've been giving you all this free stuff for a while now, and we need to scale that back; they're just doing it without saying anything. And that is the biggest issue that I have with Anthropic: they are very bad at PR. Not pull requests.
Use Greptile for pull requests. I'm talking about public relations, because they seemingly never want to listen to us. It's like everything they do, they have to put their lawyers as the PR people. I don't know whose idea that was. That's not a good idea, having lawyers as your PR people. Have devs talking about these things.
Like somebody that's relatable: devs talking to devs. It makes sense. I could go on a whole tangent about this, but I don't really want to; that's why I am stumbling over my words right now, but it's just whatever. One super funny thing that I wanted to include earlier in this video, but I forgot, is this PR right here to the official Anthropic Claude Code repo, which has pretty much always existed, but it just didn't contain the source code.
It contains other things, like skills and things of that nature. Somebody created a pull request to add the entire leaked Claude Code source code to the Claude Code repo, and the pull request itself was generated with Claude Code. And then it got closed, and then it got deleted. And the only way I'm able to be on this page right now is because I never refreshed the page.
But what I'm going to do right now, okay, is log all of this information just for history's sake. As well as my "looks good to me, go ahead and merge" comment right there. Let me go all the way down. Okay. And as I refresh, it'll be gone. Because, like I said, Anthropic has been DMCAing all of the mirrors, and that pull request contained the entirety of the source code on their own repository, just underneath the pull requests. Now, technically it takes you elsewhere to the code files, but that's still technically under the pull request in the repo.
So they had to get rid of that one as well. So that is the Claude code source code leak. And that's all I have to say about it. Y'all have a good one.