Is this Any better or worse than this Contains?
2020-11-21

I am using EF6 and I would like to get the records in a table whose IDs are in a given set of IDs.

In my test, for example, I am using 4 IDs.

I tried two options. The first uses Any.

dbContext.MyTable
.Where(x => myIDS.Any(y=> y == x.MyID));

And the T-SQL that this LINQ expression generates is:

SELECT 
    *
    FROM [dbo].[MiTabla] AS [Extent1]
    WHERE  EXISTS (SELECT 
        1 AS [C1]
        FROM  (SELECT 
            [UnionAll2].[C1] AS [C1]
            FROM  (SELECT 
                [UnionAll1].[C1] AS [C1]
                FROM  (SELECT 
                    cast(130 as bigint) AS [C1]
                    FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
                UNION ALL
                    SELECT 
                    cast(139 as bigint) AS [C1]
                    FROM  ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
            UNION ALL
                SELECT 
                cast(140 as bigint) AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
        UNION ALL
            SELECT 
            cast(141 as bigint) AS [C1]
            FROM  ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
        WHERE [UnionAll3].[C1] = [Extent1].[MiID]
    )

As can be seen, the T-SQL is a "where exists" that uses many subqueries and unions.

The second option uses Contains.

dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));

And the T-SQL:

SELECT 
    *
    FROM [dbo].[MiTabla] AS [Extent1]
    WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))

Contains is translated into a "where in", and the query is much less complex.

I have read that Any is usually faster, so I wonder whether the Any version, although it looks more complex at first glance, is actually faster or not.

Thanks so much.

EDIT: I ran some tests (I don't know if this is the best way to test this).

System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
    IQueryable<MyTable> iq = dbContext.MyTable
        .Where(x => myIDS.Contains(x.MyID));

    iq.ToArrayAsync();
}
miswContains.Stop();

System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
    IQueryable<MyTable> iq = dbContext.MyTable
        .Where(x => myIDS.Any(y => y == x.MyID));

    iq.ToArrayAsync();
}
miswAny.Stop();

The results are that miswAny is about 850 ms and miswContains is about 4251 ms.

So the second option, with Contains, is slower.

Talk1:
Did Entity Framework really generate the code with select *? Are the IDs of type long? How do you know the results are not cached? Did you try swapping the order of the queries?
Solutions1

Your second option is the fastest solution I can think of (at least for arrays of IDs that are not very large), provided MiTabla.MiID is indexed.

If you want to read more about IN clause performance, see: Is SQL IN bad for performance?

Talk1:
I have edited the original post to add a test, but I don't know if it is the best way to test these cases.
Talk2:
No, it isn't, because you are not taking into account some external factors, like the SQL cache or network latency, although it may give you a rough estimate. Anyway, you are iterating 20 times in the second test and 100 times in the first one (and the results show it's about 5 times faster, probably because of that). Also, you should probably await the call to iq.ToArrayAsync().
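A minimal sketch of a fairer timing harness along the lines of this comment. QueryTiming and MeasureAsync are hypothetical names, not part of EF6; the helper only assumes that each query can be wrapped in a delegate returning a Task. It runs one untimed warm-up call, uses the same iteration count for every variant, and awaits each call so the full query duration is measured. It does not control for the SQL cache or network latency.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class QueryTiming
{
    // Hypothetical helper: times `iterations` awaited runs of `runQuery`
    // after one untimed warm-up run.
    public static async Task<TimeSpan> MeasureAsync(Func<Task> runQuery, int iterations)
    {
        // Warm-up: first-run costs (query compilation, connection setup)
        // are paid here and stay out of the measurement.
        await runQuery();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Awaited, so the loop does not move on before the query finishes.
            await runQuery();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Assuming the dbContext and myIDS from the question, a call could look like `await QueryTiming.MeasureAsync(() => dbContext.MyTable.Where(x => myIDS.Contains(x.MyID)).ToArrayAsync(), 100)`, with the same iteration count passed for the Any variant.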
Solutions2

If you know the ID, then using the LINQ2SQL Count() method would create much cleaner and faster SQL code (than both Any and Contains):

dbContext.MyTable
.Where(x => myIDS.Count(y=> y == x.MyID) > 0);

The generated SQL for the count should look something like this:

DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = @p0
Talk1:
I know the IDs in this case. But when I have many IDs, what would the query look like?
Talk2:
You check the count of the matches: if the count is positive (at least 1, possibly more if the ID is not unique), then the list of IDs contains the specified ID.
Solutions3

You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (probably around 50) to get a SQL exception saying that the maximum nesting level has been exceeded.

Contains is much better in this respect. It can handle a couple of thousand elements before its performance is severely affected.

So I would go for the scalable solution, even though Any may be faster with small numbers. It is possible to make Contains scale even better.
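One common way to keep Contains manageable for very long ID lists is to split the list into batches and run one Contains query per batch. The answer does not say which technique it has in mind, so the sketch below is only an assumption; IdBatching and InBatches are hypothetical names, and the batch size is something to tune by measurement.

```csharp
using System;
using System.Collections.Generic;

public static class IdBatching
{
    // Hypothetical helper: splits `ids` into consecutive batches
    // of at most `batchSize` elements, preserving order.
    public static IEnumerable<List<long>> InBatches(IEnumerable<long> ids, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<long>(batchSize);
        foreach (long id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<long>(batchSize);
            }
        }

        // Emit the final, possibly shorter batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

With the dbContext and myIDS from the question, each batch would then be used as `dbContext.MyTable.Where(x => batch.Contains(x.MyID))` and the per-batch results concatenated, so each generated IN list stays short.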

I have read that Any is usually faster,

In LINQ-to-objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
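The LINQ-to-objects behaviour can be seen with a small in-memory sketch. ShortCircuitDemo and Tracked are hypothetical names introduced only for this illustration; Tracked counts how many elements are actually read from the source. Nothing here touches a database, so it says nothing about the SQL that EF6 generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Yields the ids one by one and calls `onRead` each time an element is read.
    public static IEnumerable<long> Tracked(IEnumerable<long> ids, Action onRead)
    {
        foreach (long id in ids)
        {
            onRead();
            yield return id;
        }
    }

    public static void Main()
    {
        long[] myIDS = { 130, 139, 140, 141 };
        int readsAny = 0;
        int readsCount = 0;

        // Any stops enumerating as soon as the predicate matches.
        bool viaAny = Tracked(myIDS, () => readsAny++).Any(y => y == 130);

        // Count with a predicate has to walk the whole sequence.
        bool viaCount = Tracked(myIDS, () => readsCount++).Count(y => y == 130) > 0;

        Console.WriteLine($"Any: {viaAny}, elements read: {readsAny}");       // Any: True, elements read: 1
        Console.WriteLine($"Count: {viaCount}, elements read: {readsCount}"); // Count: True, elements read: 4
    }
}
```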

Reposted from: https://stackoverflow.com/questions/28398486/this-any-is-better-or-not-than-this-contains
