Identifying data frame rows in R with specific pairs of values in two columns
I would like to identify all rows in a data frame (or matrix) whose values in columns 1 and 2 match a specific pair. For example, if I have a matrix
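For concreteness, here is a hypothetical matrix. The values are illustrative, not the asker's data, and were chosen so that rows 2, 3 and 5 behave as described further down:

```r
# Hypothetical example data: five rows, two columns (stored as doubles)
testmat <- matrix(c(3, 1, 1, 5, 2,   # column 1
                    1, 2, 4, 6, 4),  # column 2
                  ncol = 2)
testmat
```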
I would like to identify the rows that contain any of the following pairs, i.e. all rows whose first and second columns hold either the combination 1,2 or the combination 2,4
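Continuing the hypothetical example, the two query pairs can be stored one pair per row:

```r
# Hypothetical query pairs (1, 2) and (2, 4), one pair per row
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)
of_interest
```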
The following does not work
because, as expected, it matches any of 1,2 in the first column combined with any of 2,4 in the second column (returning rows 2, 3 and 5 rather than only rows 2 and 5, which is what I want), so the row containing 1,4 is included even though it is not one of the pairs I'm querying for. There must be some simple way to use which() with %in% to match specific pairs like this, but I haven't been able to find a working example.
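A minimal reconstruction of that kind of attempt, assuming the hypothetical testmat and of_interest values below, shows why it fails: each column is tested against its own set of values independently, so the pairing between the columns is lost.

```r
# Hypothetical data (see the example matrix above)
testmat <- matrix(c(3, 1, 1, 5, 2,
                    1, 2, 4, 6, 4), ncol = 2)
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)

# Column 1 is checked against {1, 2} and column 2 against {2, 4} separately,
# so a mixed row such as (1, 4) also passes
naive <- which(testmat[, 1] %in% of_interest[, 1] &
               testmat[, 2] %in% of_interest[, 2])
naive
```

With this data the result is rows 2, 3 and 5, including the unwanted row 3.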
Note that I need the positions (row numbers) of the rows that match the desired condition.
I assume that, since you're using which(), you want the positions rather than just whether there is a match. You can cbind() the row numbers to testmat and then merge() the result with of_interest.
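A minimal sketch of that idea, assuming hypothetical testmat and of_interest values and made-up column names a and b. The row numbers are attached with data.frame() rather than cbind() so that both inputs are data frames with matching column names for merge() to join on:

```r
# Hypothetical data
testmat <- matrix(c(3, 1, 1, 5, 2,
                    1, 2, 4, 6, 4), ncol = 2)
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)

# Attach each row's original position to testmat
tm <- data.frame(row = seq_len(nrow(testmat)),
                 a = testmat[, 1], b = testmat[, 2])
oi <- data.frame(a = of_interest[, 1], b = of_interest[, 2])

# Inner join on both value columns: only rows whose (a, b) pair is in oi survive
hits <- merge(tm, oi, by = c("a", "b"))

# merge() sorts by the key columns, so sort the recovered positions
matched_rows <- sort(hits$row)
matched_rows
```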
You mention in your comment that you have 10e8 rows. This makes me think two things:
Given this, I would avoid which() and other approaches that do not exit early. Here's some Rcpp code that should be much faster than merge() on large datasets:
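As a plain-R illustration of the same early-exit idea (not the Rcpp code itself), here is a hypothetical helper, first_pair_match(), that stops scanning at the first row whose pair matches. Interpreted R loops like this will be far slower than compiled Rcpp code, so treat it as a description of the algorithm rather than a fast implementation:

```r
# Hypothetical data
testmat <- matrix(c(3, 1, 1, 5, 2,
                    1, 2, 4, 6, 4), ncol = 2)
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)

# Hypothetical helper: position of the first row of `m` whose (column 1,
# column 2) pair equals any row of `pairs`, or NA_integer_ if there is none
first_pair_match <- function(m, pairs) {
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(nrow(pairs))) {
      if (m[i, 1] == pairs[j, 1] && m[i, 2] == pairs[j, 2]) {
        return(i)  # exit as soon as a match is found
      }
    }
  }
  NA_integer_
}

first_pair_match(testmat, of_interest)
```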
I think accessing rows as sub-matrices is more idiomatic Rcpp than a double for-loop with matrix indexing, but I have no idea which is faster, so if performance is your primary concern I'd try several approaches and benchmark them.
Here is an approach with which() + asplit():
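A sketch of how that can look, assuming hypothetical testmat and of_interest values. asplit(x, 1) turns each matrix row into one list element, so %in% compares whole rows. match() coerces list elements to character via deparsing (a row becomes something like "c(1, 2)"), so both matrices should share the same storage type (an integer row deparses as "1:2", not "c(1, 2)") and the same column names, if any:

```r
# Hypothetical data (both matrices stored as doubles)
testmat <- matrix(c(3, 1, 1, 5, 2,
                    1, 2, 4, 6, 4), ncol = 2)
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)

# One list element per row; %in% then tests whole rows for membership
rows <- which(asplit(testmat, 1) %in% asplit(of_interest, 1))
rows
```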
This might be a bit inefficient due to asplit(), but it should work well for small datasets where speed is not a concern.
You could paste() the two column values of each row in your example (testmat and of_interest) into a single string and then do one %in% evaluation. For example:
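A minimal sketch of the paste() approach, assuming hypothetical testmat and of_interest values. A separator is used so that, for example, the pair (1, 23) cannot collide with (12, 3):

```r
# Hypothetical data
testmat <- matrix(c(3, 1, 1, 5, 2,
                    1, 2, 4, 6, 4), ncol = 2)
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)

# One string key per row, e.g. "1_2"
row_keys  <- paste(testmat[, 1], testmat[, 2], sep = "_")
pair_keys <- paste(of_interest[, 1], of_interest[, 2], sep = "_")

# A single %in% call on the combined keys
matched <- which(row_keys %in% pair_keys)
matched
```

Because the keys are built by converting numbers to character, this is safest for integer-valued data; non-integer doubles are converted with up to 15 significant digits, so values that differ only beyond that precision would produce the same key.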
If %in% is not fast enough for you, consider trying %fin% or fmatch() from the fastmatch package as faster alternatives to %in%.
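A hedged sketch of swapping in fastmatch::fmatch(), which takes the same x and table arguments as match() and returns NA for non-matches. It assumes hypothetical testmat and of_interest values and falls back to base %in% when fastmatch is not installed:

```r
# Hypothetical data
testmat <- matrix(c(3, 1, 1, 5, 2,
                    1, 2, 4, 6, 4), ncol = 2)
of_interest <- matrix(c(1, 2,
                        2, 4), ncol = 2, byrow = TRUE)

row_keys  <- paste(testmat[, 1], testmat[, 2], sep = "_")
pair_keys <- paste(of_interest[, 1], of_interest[, 2], sep = "_")

if (requireNamespace("fastmatch", quietly = TRUE)) {
  # fmatch() returns the matching position in pair_keys, or NA if none
  matched <- which(!is.na(fastmatch::fmatch(row_keys, pair_keys)))
} else {
  # Base R fallback when fastmatch is not installed
  matched <- which(row_keys %in% pair_keys)
}
matched
```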
We can use row.names() + {ivs}.
Set-up:
Index,
compare,
and index again: