There is a sweet spot that lies between embedded and external DSLs.
An external DSL implicitly assumes a particular approach to code distribution: Alice sends Bob some DSL code, which is then parsed and interpreted by a program Bob already possesses, hopefully in a compatible version. The sweet spot is found by tweaking this distribution model: Alice sends Bob some DSL code, and Alice also sends Bob the program to parse and interpret (and optimize) that code. Alice can then address corner cases by updating her interpreter, and she retains all the advantages of external DSLs.
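This distribution model can be sketched as a small package type: the DSL code travels together with the interpreter that understands it. The names (`DslPackage`, `run`, `load_interpreter`) are invented for illustration; a real design would address serialization, security, and versioning.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DslPackage:
    """What Alice ships: the DSL code plus the interpreter for it.
    Because the interpreter travels with the code, Alice can fix
    corner cases by shipping an updated interpreter."""
    dsl_code: str
    interpreter_code: bytes  # the parser/interpreter/optimizer itself

def run(pkg: DslPackage, load_interpreter):
    """Bob's side: materialize the interpreter, then apply it to the code."""
    interp = load_interpreter(pkg.interpreter_code)
    return interp(pkg.dsl_code)
```

The point of the shape is that Bob's host only needs one generic entry point (`run`), not prior knowledge of any particular DSL.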
Of course, design at this sweet spot presents a few performance challenges.
First, those parsers/interpreters/compilers/optimizers can be relatively heavy, weighing in at hundreds of kilobytes to a few megabytes of code. Alice doesn’t want to send the full interpreter every time a web-app is loaded, and Bob, similarly, doesn’t want to parse, compile, and link a big interpreter every time. If a DSL is popular across dozens of websites, Bob also doesn’t want to load dozens of copies to accommodate his tab addiction. We need a good way to cache and reuse the interpreter code, at which point Bob can afford to optimize it heavily and use it hundreds of times.
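One plausible way to get that caching is content addressing: name each interpreter by a secure hash of its own code, so Bob fetches and builds any given interpreter at most once, no matter how many sites ship it. A minimal sketch, where `InterpreterCache` and the `build` callback are hypothetical names:

```python
import hashlib

class InterpreterCache:
    """Content-addressed cache: interpreters are keyed by a hash of
    their own code, so two websites shipping byte-identical
    interpreter code share one cached, optimized copy."""

    def __init__(self):
        self._cache = {}  # hex digest -> built interpreter

    def load(self, interpreter_code: bytes, build):
        key = hashlib.sha256(interpreter_code).hexdigest()
        if key not in self._cache:
            # Pay the parse/compile/optimize cost only once per
            # distinct interpreter, then reuse across sites and tabs.
            self._cache[key] = build(interpreter_code)
        return self._cache[key]
```

Content addressing also sidesteps version skew: a new interpreter release simply hashes to a new key, while old cached copies keep working for sites that still ship them.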
Second, we ideally want the DSL code itself to run with near-native performance. This suggests some form of just-in-time or dynamic compilation, i.e. translating the DSL into the host language or its bytecode. People care about performance, and it would undermine the use of DSLs if developers had to jump through hoops to squeeze excellent performance from them.
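The translate-to-host step can be illustrated with a toy example, here using Python as a stand-in host: the "DSL" is just an arithmetic expression over two variables, and the "compiler" emits host source and compiles it once, so repeated calls run as ordinary host code rather than through an interpreter loop. A real implementation would parse and validate the DSL before emitting anything (the sketch trusts its input, which is unsafe outside an example).

```python
def compile_dsl(program: str):
    """Compile a toy expression DSL (e.g. "x * y + 2") into a real
    host-language function.  The DSL syntax here is invented for
    illustration; the point is the translate-then-compile shape."""
    source = f"def _dsl_fn(x, y):\n    return {program}\n"
    namespace = {}
    # One compilation up front; every later call is plain host code.
    exec(compile(source, "<dsl>", "exec"), namespace)
    return namespace["_dsl_fn"]

f = compile_dsl("x * y + 2")
print(f(3, 4))  # 14
```

After the one-time compile, `f` carries no per-call interpretation overhead, which is exactly the property that lets DSL code approach host-language performance.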