Monday, 20 June 2016

Get All Items in 5000+ large list with CSOM in PowerShell

Last week I had to write a script that needed to take all items in large SharePoint Online list.
By large I mean above 5000 items. This means that the list is above the list view threshold in SharePoint Online, which is 5000 and we cannot change that. The way to get all items in SharePoint Online is to use CAML query. However if it is just an empty query without any filtering it will fail, if you use unindexed column for filtering or ordering the query will fail, if you filter/order by indexed column and the query returns more than 5000 items it will fail again. The error in this and other scenarios is similar to the one below.

Exception calling "ExecuteQuery" with "0" argument(s): "The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator."

The way to workaround this is pagination of the view. This means that we will have a row limit of the result that the query can return that should be less or equal to 5000. Once we get the first 5000 items we can do another query for the next 5000 starting from the position where the first result(page) ends. This is the same with what we do in the UI scrolling foreword in the list view. Below is an example PowerShell snippet that will take all items from a list using 5000 items page size ordering the items by ID.

$list = $ctx.Web.Lists.GetByTitle($DocLibName)
$ctx.Load($list)
$ctx.ExecuteQuery()
## View XML
$qCommand = @"
<View Scope="RecursiveAll">
    <Query>
        <OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy>
    </Query>
    <RowLimit Paged="TRUE">5000</RowLimit>
</View>
"@
## Page Position
$position = $null
 
## All Items
$allItems = @()
Do{
    $camlQuery = New-Object Microsoft.SharePoint.Client.CamlQuery
    $camlQuery.ListItemCollectionPosition = $position
    $camlQuery.ViewXml = $qCommand
 ## Executing the query
    $currentCollection = $list.GetItems($camlQuery)
    $ctx.Load($currentCollection)
    $ctx.ExecuteQuery()
 
 ## Getting the position of the previous page
    $position = $currentCollection.ListItemCollectionPosition
 
 # Adding current collection to the allItems collection
    $allItems += $currentCollection
}
# the position of the last page will be Null
Until($position -eq $null) 

Few word about the query, I am using RecursiveAll because I used it against library and I wanted to get all items in all folders, the size of the page is 5000, just on the edge of the threshold and I am ordering the result by ID because this column is always indexed.
I am using Do-Until loop to get all pages and setting the position to be the position of the last item collection that was retrieved.
This is really a powerful and quick way to workaround the annoying 5000 list view threshold. I hope you find it useful!

12 comments:

  1. Thank you, this was super helpful for me

    ReplyDelete
  2. Thanks for the code. The query of the example has not a "where" clause.
    I have tried adding "where" clause that returns less than 5000 items and it works properly.
    However if more than 5000 items are returned I get the error message, it seems as if RowLimit were not applied in that case.
    Anyone has experienced this? How did you solve it?

    Thanks again

    ReplyDelete
    Replies
    1. Hello, I think that you can apply the same principal with where close. Keep your row limit below/equal to 5000 and the fields used in the where clause should also be indexed. https://support.office.com/en-us/article/Manage-lists-and-libraries-with-many-items-fba0c8b7-79bb-4ea4-9aff-bf145a689e8e?ui=en-US&rs=en-US&ad=US&fromAR=1#bm2

      Delete
    2. Hi Ivan, I have read the article you say, my fields are indexed, however if the result with the filter is more than 5000 items it seems that the rowlimit is not applied.
      If I apply no filter, only the rowlimit it works perfectly, returning result in batch of 5000 items.
      I don't nkow if I'm missing something, any help would be appreciated.

      Thanks again

      Delete
    3. I have just tried to query with where clause based on ID, which is always indexed and it works perfectly, but not with other indexed fields. Desperate :-(

      Delete
    4. Hi When did you set the columns to be indexed? Maybe you should give it some time to get the columns actually indexed on DB level. You should not have issues if the columns are indexed. To test this use where with ID and search for a range of IDs that you know it will return more than 5000. Like where ID is more than 100 and greater than 5300.

      Delete
    5. Hello Ivan
      It's months since my columns were indexed, so that shouldn't be the problem.
      I have tried with ID >= 0 and it returns 7000+ items, so that's ok, but with other fields I have not the same behavior. That the strange thing.
      Have you ever tried wih other columns different to ID?

      Delete
    6. Hello,

      No actually I have not tried with other columns, so you say that instead of ID you are ordering by different field. Have you included that field in the query ?

      Delete
    7. I think we have set index columns before reaching the 5K items in list otherwise index columns or not working.

      Delete