Managing Lists and Libraries with Thousands or Millions of Items

One of the biggest challenges for a document management system built on SharePoint is scaling lists.  There is actually quite a bit of knowledge spread across the help files, technet documentation, in whitepapers and on blogs… but it’s still quite difficult to navigate this challenge.  I also see this as a step in the progress toward upgrade.  People that are looking to optimize their environments should be looking at large lists sooner than later… here are some of the impacts…

  1. Can slow down indexing
  2. Can affect performance of the list, site, database, network and server(s)
  3. Can affect adoption and user perception
  4. Can cause you to reorganize and restructure your content causing headaches and business issues due to poor planning or misunderstanding
  5. Can affect your ability to scale your application

Managing large lists is not necessarily only an end user or power user problem it is a developer issue and an IT issue….

Did you know that Microsoft IT scans for lists over 5000 and assists in either cleaning up or helps the users optimize them? See: SharePoint Performance Optimization Microsoft IT

Did you know that PreUpgradeCheck (in SP2) and SPDiag (in the Admin Toolkit) will help you identify large lists?  For PreUpgradeCheck I’ve been telling people to make copies of their environments, but if you’ve already got SP2 you can run it today.  It’s read only and an STSADM command that’s actually pretty quick and gives you amazing reports. Ultimately this PreUpgradeCheck is a best practices Analyzer.

While the planning software boundaries Technet article mentions you can scale a list to 5 million documents, it is extremely important how you do it.  “Item – 2,000 per view – Testing indicates a reduction in performance beyond two thousand items. Using indexing on a flat folder view can improve performance.” I’ve mentioned SQL blocking that can happen not only on other areas of the site or site collection, but on the server and blocking in the database.  Blocking SPIDs… ever heard of em?  That’s why you have to be careful ever killing a blocking SPID from SQL on a SharePoint box.  Why?  You could end up with data or site or web orphans.  Ever killed a blocking SPID in SharePoint?  Better pull out your trusty SP2 featured PreUpgradeCheck (Don’t be afraid of it.  It doesn’t write, and it will save your life in identifying best practices.)

image

I noticed some content in help and online info has changed over time to help clarify help and assistance on this topic.  The content is very important, I’m not sure why the content isn’t as well understood in broad SharePoint circles.  It’s very common to have someone rattle off responses to the question… “What methods can you use to optimize a large SharePoint list or library?”

  1. Folders
  2. Indexed Column

There are many more options you might not be aware of:

  1. Configure View Item Limit (see screenshot below)
  2. Limit all views and use Search to query (see large list whitepaper)
  3. Filtered View (subset of content optimum performance would be based on an indexed column)
  4. Tree View in Document Center (MOSS only)
  5. Webparts –  Upcoming Tasks, Relevant Documents (Personalized views of documents you create, checked out, and modified)
  6. Combination of Folders for Explorer view and enable options to stop folders for Browser based Views
  7. Limit max limit item on all views and use Access for editing
  8. RowLimit (SPQuery Developer Option)
  9. GetCachedListItemsbyQuery (MOSS Only Developer Option)

 

Recommended custom views & indexes:

Office Online on large listsEach additional column index consumes extra resources in the database. Therefore, you should add indexes only to columns that will be used actively in views on the list or library.”

Recently changed (Index based on “Modified” includes new documents added, create filter based on Last 7 or last 30, etc…)

My Files – (Index based on Created by, sort & filter based on this or create a search based on this attribute)

Due in the next week (Indexed based on Date Due, such as in a task list)

Contact list – Use Outlook to manage and edit the list it’s much more efficient for editing

Task List – use Outlook to manage and edit the task list… much more efficient for editing

Custom Lists – Consider using Excel 2003 or Access 2007 for editing large custom lists as well as datasheet view with limited max “Item limit”

Office Online “You might want to remove the Manage Personal Views permission from contributors for a large list or library. By removing this permission, you can prevent users from creating a view that spans all of the items and that might adversely affect the performance of the rest of the site.”

 

Ultimately the best information written on the top of large lists is the whitepaper done by Steve Peshka which provides charts and detail on the best retrieval methods.  I quickly extrapolate from that paper, that the out of the box queries and pages are the worst way to work with SharePoint and that search is the best method.  Reza’s interpretation of the paper I believe, do make the paper itself much easier to understand. 

20 key Points Arising, or Inferred, From “Working with large lists in MOSS 2007” Paper (PDF)

<Update July 16, 09> RE: Copy/paste mistake in Fastest methods to query large lists in MOSS. Anyway,I have corrected the issue and updated the docuement so please g o ahead and update your post with the following section (just copy and paste below into your blog post please) </Update>

The Fastest Methods to query Large Lists in WSS & MOSS based on Reza’s  Dev Horizon blog.

 

Fastest methods to query large lists in WSS (Not explicitly mentioned in the paper) :

I)   Looping through SPListItemCollection of SPList.GetItems(SPQuery)

II)  Lists Web Service 

III) Looping through each DataRow of SPListItemCollection.GetDataTable()  

Fastest methods to query large lists in MOSS :

I)   PortalSiteMapProvider. Depending on the column you use in the WHERE clause 

     of the SPQuery object used in this method and whether  or not that column is 

     indexed there might be some performance degradation. More about this in tip #10. 

II)  Search API 

III)  Looping through each DataRow of SPListItemCollection.GetDataTable (). Pleas e 

     be note d that, for smaller sets  of data, for example 1500 list items, this approach  

     may outperform querying using Search API technique. 

Reza’s book just got published on Professional SharePoint & Reporting Services.

 

Datagrid View

Some may think that using datagrid view will allow you to continue to scale with your data for editing.  Unfortunately despite that you can begin reading the data in your list with dataview, the query continues to “retrieve” data and keep it read only until the query is complete.  If it can’t get a lock or meet various other critera you’ll get a read-only view.

Using list max “Item Limit” and then Limiting the columns “display”-ed will help return the results for much easier editing.

image

Note the bottom right hand corner says “ready-only”

 

Folders

Folders are seen as very old school, but in the past this was the most common means of managing large lists.  Now with the recommended 2000 max per folder it’s straight forward to explain to users who manage these lists.  Employing this strategy with the max limit Item limit of somewhere between 100 or 2000.  You’re better off.  You’ll have to play with this, it isn’t as simple as it sounds.

My users commonly use Web Dav to access my document library, what can I do to optimize my list?

While it may be easy to say, don’t use ‘em.  You can actually potentially satisfy both types by using folders and views.

image

More and more I see people blocking SharePoint Designer.  It wouldn’t be difficult to block the explorer view.  You could turn off the “browse directories” right which would disable viewing folders in Web DAV interfaces such as Explorer View and SharePoint Designer in one go.

image

 

Here’s a Quote from the Comments from GetThePoint Blog on “Manage SharePoint Large Lists for Performance”

“Hi Dina,

Do you have any tips to increase the performance of the "Save As" dialog in the Office clients, when working with large lists?

Let’s say you’ve got 10,000 documents in a document library, and you are careful to display them only 100 at a time in the web browser,  using a 100 item limit on all views. Indexed columns are also in use.

When you "Save As" in Word, the item limit on your views appears to be ignored; Word’s "Save As" dialog will try to display all 10,000 documents and performance will degrade.

Is there any way to work around this, to make the Office clients only use the carefully crafted views?

Thanks

Pete

Your best option here is to folderize. Having less than 2,000 items per folder should improve your performance significantly for the Office "Save As" dialog, which does not utilize SharePoint views and does not support paging. So, it should be smoother if you split up the items inside your document library into at least 5 folders.

Dina”

I found this Q&A at askville helpful on improving client performance with WebDav Web Folders. There are third party clients that can help you map and connect to https WebDav for example.

Confused about why web folders are so odd and often tough to troubleshoot?  Here are some resources:

While dated I found this KB article 838028 on “How documents are opened from a Web site in Office 2003” to be the most technical resource on the topic, of check out 841215 for common WebDav Client Auth issues.

There are a lot more methods that can be used, and ultimately both of the solutions above can oversimplify the problem and solution.

Here’s an example of a list with more than 2000 items.  First off you can see… This list or library contains a large number of items.

image

When you click on the link you get some help content that explains that “When the number of items in a list or library grows to more than one thousand (MOSS lists two thousand in help), the performance of the list or library may begin to slow down or adversely affect other areas of the site.”  Essentially this list scale issue explains that not only is the list slow to load, but can affect the performance of other areas of the site.

As well the article explains “[large list] issues can be caused by user actions such as sorting or retrieving a large number of items at the same time.”  It does go onto explain that as long as you plan and configure the list or library properly, you can scale to millions of items.

The root of the list performance problem is actually in the view.  When you closely read the capacity boundaries document you’ll see that recommend list performance is recommended at a maximum of 2000 per view.  The default all items view and actually all views created by default have no limit by default.

 

Configure all Views with Item Limit

I see a new best practice to simply either configure a limit of 2000 less… your choice on all large lists.  If you’re creating a template, list template, site tempate, or feature that creates a list… would be great to see more use of this feature.

If you edit all of your views you can actually limit the total number of items returned

 

 

 

More Resources many of which are referred to above:

Joel’s Blog: Key Capacity Planning Info, Resources, and Links (I’ve been updating this over time)

Case Study by MCS and Knowledge Lake on Extreme Scalability

Office Online – SharePoint Server: Manage Lists and Libraries with many items

GetThePoint Blog Manage Large Lists for better performance

Whitepaper: Working with Large Lists in Office SharePoint Server 2007

Three Will’s Community Wiki a snippet from the TechNet article on SP1

Office Online: WSS Manage Lists and Libraries with many items (smaller set of ideas)

Downloadable book: Planning and deploying Service Pack 1 for Office SharePoint Server 2007 in a multi-server environment.

White paper: Performance recommendations for storage planning and monitoring (newer)

Whitepaper: SharePoint Performance Optimization Microsoft IT (IT Showcase)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: